Type ParameterServerStrategy
Namespace tensorflow.contrib.distribute
Parent Strategy
Interfaces IParameterServerStrategy
A parameter server DistributionStrategy.

*** contrib version ***

This strategy class works for both local training and between-graph replicated
training for multiple workers. If `cluster_spec` is specified, either passed in
to the `__init__()` method or parsed from the
["TF_CONFIG" environment
variable](https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig),
variables and updates to those variables are assigned to parameter servers and
other operations are assigned to workers. If `cluster_spec` is not set, training
is local: variables are assigned to the local CPU or to the only GPU. When each
worker has more than one GPU, operations are replicated on those GPUs. In both
cases, operations are replicated but variables are not, and all workers share a
common view of which parameter server a variable is assigned to.
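For example, the cluster can be described through the "TF_CONFIG" environment
variable. The sketch below is illustrative only: host names, ports, and the task
index are hypothetical, and how the parsed cluster spec actually reaches the
strategy (an `__init__()` argument or a `tf.estimator.RunConfig`) depends on the
surrounding training setup.

```python
import json
import os

import tensorflow as tf

# Hypothetical two-worker, one-parameter-server cluster. With a cluster spec
# available, variables are placed on the "ps" task and other operations on the
# "worker" tasks; without it, the strategy falls back to local training.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        "ps": ["ps0.example.com:2222"],
    },
    "task": {"type": "worker", "index": 0},
})

strategy = tf.contrib.distribute.ParameterServerStrategy()
```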
This class assumes between-graph replication will be used and works on a graph
for a particular worker. Note that each graph and worker is independent. This
means that while each worker synchronously computes a single gradient update
across all of its GPUs, updates between workers proceed asynchronously.
Operations that occur only on the first replica (such as incrementing the
global step) will occur on the first replica *of every worker*.
It is expected that `call_for_each_replica(fn, ...)` is called for any
operations which can potentially be replicated across replicas (i.e. multiple
GPUs), even if there is only one CPU or one GPU. When defining `fn`, extra
caution needs to be taken (see the sketch after this list):

1) Always use `tf.compat.v1.get_variable` instead of `tf.Variable`, which is
not able to refer to the same variable on different replicas.

2) It is generally not recommended to open a device scope under the strategy's
scope. A device scope (i.e. calling `tf.device`) will be merged with or
override the device for operations, but will not change the device for
variables.

3) It is also not recommended to open a colocation scope (i.e. calling
`tf.compat.v1.colocate_with`) under the strategy's scope. For colocating
variables, use `strategy.extended.colocate_vars_with` instead. Colocation of
ops may create device-assignment conflicts.
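The sketch below illustrates cautions 1) and 3) using the TF 1.x Python API that
this docstring mirrors. The variable names and shapes are made up, and the exact
entry point for `call_for_each_replica` (on the strategy itself or on
`strategy.extended`) has varied between releases, so treat it as the shape of
the call rather than a drop-in snippet.

```python
import tensorflow as tf

strategy = tf.contrib.distribute.ParameterServerStrategy()

def step_fn():
    # Caution 1: tf.compat.v1.get_variable lets every replica refer to the
    # same underlying variable on the parameter server; tf.Variable does not.
    w = tf.compat.v1.get_variable("w", shape=[10, 10])

    # Caution 3: colocate related variables through the strategy rather than
    # with tf.compat.v1.colocate_with.
    with strategy.extended.colocate_vars_with(w):
        b = tf.compat.v1.get_variable("b", shape=[10])

    return w, b

with strategy.scope():
    # Replicated across the worker's GPUs (or run once on CPU / a single GPU).
    w, b = strategy.extended.call_for_each_replica(step_fn)
```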