LostTech.TensorFlow : API Documentation

Type ClusterResolver

Namespace tensorflow.distribute.cluster_resolver

Parent PythonObjectContainer

Interfaces IClusterResolver

Abstract class for all implementations of ClusterResolvers.

This defines the skeleton for all implementations of ClusterResolvers. ClusterResolvers are a way for TensorFlow to communicate with various cluster management systems (e.g. GCE, AWS).

By letting TensorFlow communicate with these systems, we will be able to automatically discover and resolve IP addresses for various TensorFlow workers. This will eventually allow us to automatically recover from underlying machine failures and scale TensorFlow worker clusters up and down.

Note to Implementors: In addition to these abstract methods, you must also implement the task_type, task_id, and rpc_layer attributes. You may implement them either as properties with getters and setters, or by setting the attributes directly.

- task_type is the name of the server's current named job (e.g. 'worker', 'ps' in a distributed parameterized training job).
- task_id is the ordinal index of the server within the task type.
- rpc_layer is the protocol used by TensorFlow to communicate with other TensorFlow servers in a distributed environment.
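
As a minimal sketch of the note above, the following plain-Python class shows one way the three attributes could be exposed as properties with getters and setters. The class name StaticClusterResolver and its constructor arguments are hypothetical and are not part of TensorFlow; a real implementation would derive from ClusterResolver and also implement its abstract methods.

```python
class StaticClusterResolver:
    """Hypothetical resolver holding a fixed cluster description.

    Illustrative only: it does not derive from TensorFlow's ClusterResolver.
    """

    def __init__(self, cluster, task_type=None, task_id=None, rpc_layer="grpc"):
        # cluster maps a job name to the "host:port" addresses of its tasks.
        self._cluster = cluster
        self._task_type = task_type
        self._task_id = task_id
        self._rpc_layer = rpc_layer

    @property
    def task_type(self):
        """Name of the job this server belongs to, e.g. 'worker' or 'ps'."""
        return self._task_type

    @task_type.setter
    def task_type(self, value):
        self._task_type = value

    @property
    def task_id(self):
        """Ordinal index of this server within its task type."""
        return self._task_id

    @task_id.setter
    def task_id(self, value):
        self._task_id = value

    @property
    def rpc_layer(self):
        """Protocol used to talk to other servers, e.g. 'grpc'."""
        return self._rpc_layer

    @rpc_layer.setter
    def rpc_layer(self, value):
        self._rpc_layer = value


# Usage: describe a small cluster, then change which task this process is.
resolver = StaticClusterResolver(
    {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"], "ps": ["10.0.0.3:2222"]},
    task_type="worker",
    task_id=1,
)
resolver.task_id = 0
```

Setting the attributes directly in the constructor, without properties, would satisfy the same requirement; properties are mainly useful when a value has to be validated or computed on access.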

Public instance methods

ClusterSpec cluster_spec()

Returns a ClusterSpec object based on the latest TPU information.

We retrieve the information from the GCE APIs every time this method is called.
Returns
ClusterSpec
A ClusterSpec containing host information returned from Cloud TPUs, or None.

object cluster_spec_dyn()

Returns a ClusterSpec object based on the latest TPU information.

We retrieve the information from the GCE APIs every time this method is called.
Returns
object
A ClusterSpec containing host information returned from Cloud TPUs, or None.

string master(string task_type, Nullable<int> task_id, string rpc_layer)

Get the Master string to be used for the session.

In the normal case, this returns the grpc path (grpc://1.2.3.4:8470) of the first instance in the ClusterSpec returned by the cluster_spec function.

If a non-TPU name is used when constructing a TPUClusterResolver, that name is returned instead. For example, if the tpus argument passed when constructing this TPUClusterResolver was 'grpc://10.240.1.2:8470', then 'grpc://10.240.1.2:8470' is returned.
Parameters
string task_type
(Optional, string) The type of the TensorFlow task of the master.
Nullable<int> task_id
(Optional, integer) The index of the TensorFlow task of the master.
string rpc_layer
(Optional, string) The RPC protocol TensorFlow should use to communicate with TPUs.
Returns
string
string, the connection string to use when creating a session.
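
The way a master string is assembled from a task address and an RPC protocol can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names format_master and master_for are hypothetical, and the cluster is assumed to be a dictionary mapping job names to lists of "host:port" addresses.

```python
def format_master(address, rpc_layer=None):
    """Prefix the address with the RPC protocol, if one is given."""
    if rpc_layer:
        return f"{rpc_layer}://{address}"
    return address


def master_for(cluster, task_type, task_id=0, rpc_layer="grpc"):
    """Return the connection string for one task in a cluster description."""
    address = cluster[task_type][task_id]
    return format_master(address, rpc_layer)


# Assumed cluster layout: job name -> list of "host:port" addresses.
cluster = {"worker": ["1.2.3.4:8470", "1.2.3.5:8470"]}

# With the defaults this picks the first worker and the grpc protocol.
first_worker = master_for(cluster, "worker")
```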

object master_dyn(object task_type, object task_id, object rpc_layer)

Get the Master string to be used for the session.

In the normal case, this returns the grpc path (grpc://1.2.3.4:8470) of the first instance in the ClusterSpec returned by the cluster_spec function.

If a non-TPU name is used when constructing a TPUClusterResolver, that name is returned instead. For example, if the tpus argument passed when constructing this TPUClusterResolver was 'grpc://10.240.1.2:8470', then 'grpc://10.240.1.2:8470' is returned.
Parameters
object task_type
(Optional, string) The type of the TensorFlow task of the master.
object task_id
(Optional, integer) The index of the TensorFlow task of the master.
object rpc_layer
(Optional, string) The RPC protocol TensorFlow should use to communicate with TPUs.
Returns
object
string, the connection string to use when creating a session.

object num_accelerators(string task_type, Nullable<int> task_id, object config_proto)

Returns the number of TPU cores per worker.

Connects to the master, lists all the devices present on it, and counts them. Also verifies that the device count per host is the same across the cluster before returning the number of TPU cores per host.
Parameters
string task_type
Unused.
Nullable<int> task_id
Unused.
object config_proto
Used to create a connection to a TPU master in order to retrieve the system metadata.
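
The counting and consistency check described above can be sketched as follows. This is a plain-Python illustration, not the library's code: the function name cores_per_host is hypothetical, and the input is assumed to be a list holding one host index per TPU core, already extracted from whatever device listing the master returns.

```python
from collections import Counter


def cores_per_host(device_hosts):
    """Count cores per host and check that every host reports the same number.

    device_hosts: one host index per TPU core (assumed input format).
    """
    counts = Counter(device_hosts)
    if not counts:
        return 0  # no devices were reported at all
    distinct = set(counts.values())
    if len(distinct) > 1:
        raise RuntimeError(f"Hosts report different core counts: {dict(counts)}")
    return distinct.pop()


# Two hosts with eight cores each.
cores = cores_per_host([0] * 8 + [1] * 8)
```

Raising an error on a mismatch, rather than returning the smallest count, is a design choice made for this sketch so that a partially healthy cluster is noticed early.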

object num_accelerators_dyn(object task_type, object task_id, object config_proto)

Returns the number of TPU cores per worker.

Connects to the master, lists all the devices present on it, and counts them. Also verifies that the device count per host is the same across the cluster before returning the number of TPU cores per host.
Parameters
object task_type
Unused.
object task_id
Unused.
object config_proto
Used to create a connection to a TPU master in order to retrieve the system metadata.

Public properties

string environment get;

Returns the current environment which TensorFlow is running in.

There are two possible return values, "google" (when TensorFlow is running in a Google-internal environment) or an empty string (when TensorFlow is running elsewhere).

If you are implementing a ClusterResolver that works in both the Google environment and the open-source world (for instance, a TPU ClusterResolver or similar), you will have to return the appropriate string depending on the environment, which you will have to detect.

Otherwise, if you are implementing a ClusterResolver that will only work in open-source TensorFlow, you do not need to implement this property.
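
A resolver that must run in both environments could expose the property along these lines. This is only a sketch: the class name EnvironmentAwareResolver and the environment variable MY_RESOLVER_ENVIRONMENT are hypothetical stand-ins, since how the environment is actually detected is left to the implementor.

```python
import os


class EnvironmentAwareResolver:
    """Hypothetical resolver fragment; only the environment property is shown."""

    def __init__(self, environ=None):
        # Accept a mapping so the detection logic can be exercised without
        # touching the real process environment.
        self._environ = os.environ if environ is None else environ

    @property
    def environment(self):
        # MY_RESOLVER_ENVIRONMENT is an assumed name used only for this sketch.
        if self._environ.get("MY_RESOLVER_ENVIRONMENT") == "google":
            return "google"
        return ""  # running elsewhere


open_source = EnvironmentAwareResolver({}).environment
internal = EnvironmentAwareResolver({"MY_RESOLVER_ENVIRONMENT": "google"}).environment
```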

object environment_dyn get;

Returns the current environment which TensorFlow is running in.

There are two possible return values, "google" (when TensorFlow is running in a Google-internal environment) or an empty string (when TensorFlow is running elsewhere).

If you are implementing a ClusterResolver that works in both the Google environment and the open-source world (for instance, a TPU ClusterResolver or similar), you will have to return the appropriate string depending on the environment, which you will have to detect.

Otherwise, if you are implementing a ClusterResolver that will only work in open-source TensorFlow, you do not need to implement this property.

object PythonObject get;