Type ClusterResolver
Namespace tensorflow.distribute.cluster_resolver
Parent PythonObjectContainer
Interfaces IClusterResolver
Abstract class for all implementations of ClusterResolvers. This defines the skeleton for all implementations of ClusterResolvers.

ClusterResolvers are a way for TensorFlow to communicate with various cluster management systems (e.g. GCE, AWS, etc.). By letting TensorFlow communicate with these systems, we will be able to automatically discover and resolve IP addresses for various TensorFlow workers. This will eventually allow us to automatically recover from underlying machine failures and scale TensorFlow worker clusters up and down.

Note to Implementors: in addition to these abstract methods, you must also implement the task_type, task_id, and rpc_layer attributes. You may choose to implement them either as properties with getters and setters, or by setting the attributes directly (see the sketch after this list).
- task_type is the name of the server's current named job (e.g. 'worker' or 'ps' in a distributed parameterized training job).
- task_id is the ordinal index of the server within the task type.
- rpc_layer is the protocol used by TensorFlow to communicate with other TensorFlow servers in a distributed environment.
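As a rough illustration of the implementor note above, the following sketch is written against the underlying TensorFlow Python API; the class name, job name, and worker address are made up. It sets task_type, task_id, and rpc_layer as plain attributes and implements the abstract methods:

```python
import tensorflow as tf


class StaticClusterResolver(tf.distribute.cluster_resolver.ClusterResolver):
  """Hypothetical resolver that always reports one fixed worker."""

  def __init__(self, address="10.0.0.1:8470"):
    self._address = address
    # Per the implementor note: task_type, task_id, and rpc_layer may simply
    # be set as plain attributes instead of properties.
    self.task_type = "worker"
    self.task_id = 0
    self.rpc_layer = "grpc"

  def cluster_spec(self):
    # Latest known state of the (static) cluster.
    return tf.train.ClusterSpec({"worker": [self._address]})

  def master(self, task_type=None, task_id=None, rpc_layer=None):
    # Connection string to use for the session master.
    return "%s://%s" % (rpc_layer or self.rpc_layer, self._address)

  def num_accelerators(self, task_type=None, task_id=None, config_proto=None):
    # Accelerator type -> count per worker; assume none for this sketch.
    return {}


resolver = StaticClusterResolver()
print(resolver.cluster_spec().as_dict())   # {'worker': ['10.0.0.1:8470']}
print(resolver.master())                   # grpc://10.0.0.1:8470
```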
Public instance methods
ClusterSpec cluster_spec()
Returns a ClusterSpec object based on the latest TPU information. We retrieve the information from the GCE APIs every time this method is called (see the usage sketch below).
Returns
- ClusterSpec: A ClusterSpec containing host information returned from Cloud TPUs, or None.
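A usage sketch via the underlying TensorFlow Python API, with a made-up two-worker cluster wrapped in a SimpleClusterResolver standing in for a concrete resolver:

```python
import tensorflow as tf

# Hypothetical two-worker cluster for illustration.
spec = tf.train.ClusterSpec({"worker": ["10.0.0.1:8470", "10.0.0.2:8470"]})
resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    spec, task_type="worker", task_id=0, rpc_layer="grpc")

# cluster_spec() reports the latest known state of the cluster.
print(resolver.cluster_spec().as_dict())
# {'worker': ['10.0.0.1:8470', '10.0.0.2:8470']}
```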
object cluster_spec_dyn()
Returns a ClusterSpec object based on the latest TPU information. We retrieve the information from the GCE APIs every time this method is called.
Returns
- object: A ClusterSpec containing host information returned from Cloud TPUs, or None.
string master(string task_type, Nullable<int> task_id, string rpc_layer)
Get the Master string to be used for the session. In the normal case, this returns the grpc path (grpc://1.2.3.4:8470) of the first instance in the ClusterSpec returned by the cluster_spec function. If a non-TPU name is used when constructing a TPUClusterResolver, that will be returned instead (e.g. if the tpus argument's value when constructing this TPUClusterResolver was 'grpc://10.240.1.2:8470', 'grpc://10.240.1.2:8470' will be returned). A usage sketch follows the parameter table below.
Parameters
- string task_type: (Optional, string) The type of the TensorFlow task of the master.
- Nullable<int> task_id: (Optional, integer) The index of the TensorFlow task of the master.
- string rpc_layer: (Optional, string) The RPC protocol TensorFlow should use to communicate with TPUs.
Returns
- string: The connection string to use when creating a session.
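A short usage sketch via the underlying Python API (again with a made-up cluster and a SimpleClusterResolver standing in for a concrete resolver):

```python
import tensorflow as tf

spec = tf.train.ClusterSpec({"worker": ["10.0.0.1:8470", "10.0.0.2:8470"]})
resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    spec, task_type="worker", task_id=0, rpc_layer="grpc")

# With task_type/task_id, master() looks the task up in the ClusterSpec and
# prefixes the configured rpc_layer.
print(resolver.master(task_type="worker", task_id=1))
# grpc://10.0.0.2:8470
```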
object master_dyn(object task_type, object task_id, object rpc_layer)
Get the Master string to be used for the session. In the normal case, this returns the grpc path (grpc://1.2.3.4:8470) of the first instance in the ClusterSpec returned by the cluster_spec function. If a non-TPU name is used when constructing a TPUClusterResolver, that will be returned instead (e.g. if the tpus argument's value when constructing this TPUClusterResolver was 'grpc://10.240.1.2:8470', 'grpc://10.240.1.2:8470' will be returned).
Parameters
- object task_type: (Optional, string) The type of the TensorFlow task of the master.
- object task_id: (Optional, integer) The index of the TensorFlow task of the master.
- object rpc_layer: (Optional, string) The RPC protocol TensorFlow should use to communicate with TPUs.
Returns
- object: A string, the connection string to use when creating a session.
object num_accelerators(string task_type, Nullable<int> task_id, object config_proto)
Returns the number of TPU cores per worker. Connects to the master, lists all the devices present on the master, and counts them up. Also verifies that the device counts per host in the cluster are the same before returning the number of TPU cores per host. A usage sketch follows the parameter table below.
Parameters
- string task_type: Unused.
- Nullable<int> task_id: Unused.
- object config_proto: Used to create a connection to a TPU master in order to retrieve the system metadata.
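A sketch of the corresponding call in the underlying Python API, using a SimpleClusterResolver where the accelerator counts are supplied up front rather than queried from a TPU master:

```python
import tensorflow as tf

spec = tf.train.ClusterSpec({"worker": ["10.0.0.1:8470"]})
resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    spec, task_type="worker", task_id=0,
    num_accelerators={"TPU": 8})

# Mapping of accelerator type to count per worker.
print(resolver.num_accelerators())   # {'TPU': 8}
```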
object num_accelerators_dyn(object task_type, object task_id, object config_proto)
Returns the number of TPU cores per worker. Connects to the master, lists all the devices present on the master, and counts them up. Also verifies that the device counts per host in the cluster are the same before returning the number of TPU cores per host.
Parameters
- object task_type: Unused.
- object task_id: Unused.
- object config_proto: Used to create a connection to a TPU master in order to retrieve the system metadata.
Public properties
string environment get;
Returns the current environment in which TensorFlow is running. There are two possible return values: "google" (when TensorFlow is running in a Google-internal environment) or an empty string (when TensorFlow is running elsewhere).

If you are implementing a ClusterResolver that works in both the Google environment and the open-source world (for instance, a TPU ClusterResolver or similar), you will have to return the appropriate string depending on the environment, which you will have to detect. Otherwise, if you are implementing a ClusterResolver that will only work in open-source TensorFlow, you do not need to implement this property.
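A small sketch of reading this property via the underlying Python API; a SimpleClusterResolver running outside Google-internal infrastructure reports the empty string:

```python
import tensorflow as tf

spec = tf.train.ClusterSpec({"worker": ["10.0.0.1:8470"]})
resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(spec)

# Empty string: not running in a Google-internal environment.
print(repr(resolver.environment))   # ''
```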
object environment_dyn get;
Returns the current environment in which TensorFlow is running. There are two possible return values: "google" (when TensorFlow is running in a Google-internal environment) or an empty string (when TensorFlow is running elsewhere).

If you are implementing a ClusterResolver that works in both the Google environment and the open-source world (for instance, a TPU ClusterResolver or similar), you will have to return the appropriate string depending on the environment, which you will have to detect. Otherwise, if you are implementing a ClusterResolver that will only work in open-source TensorFlow, you do not need to implement this property.