Type ParameterServerStrategy
Namespace tensorflow.contrib.distribute
Parent Strategy
Interfaces IParameterServerStrategy
A parameter server DistributionStrategy.

*** contrib version ***

This strategy class works for both local training and between-graph replicated
training for multiple workers. If `cluster_spec` is specified, either passed in
to the `__init__()` method or parsed from the
["TF_CONFIG" environment
variable](https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig),
variables and updates to those variables are assigned to parameter servers and
other operations are assigned to workers. If `cluster_spec` is not set, training
is local: variables are assigned to the local CPU or to the only GPU. When each
worker has more than one GPU, operations are replicated on those GPUs. In both
cases, operations are replicated but variables are not, and all workers share a
common view of which parameter server a variable is assigned to.
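For example, the cluster can be described through the "TF_CONFIG" environment
variable. The sketch below is illustrative only: host names, ports, and the task
index are hypothetical, and how the parsed cluster spec actually reaches the
strategy (an `__init__()` argument or a `tf.estimator.RunConfig`) depends on the
surrounding training setup.

```python
import json
import os

import tensorflow as tf

# Hypothetical two-worker, one-parameter-server cluster. With a cluster spec
# available, variables are placed on the "ps" task and other operations on the
# "worker" tasks; without it, the strategy falls back to local training.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        "ps": ["ps0.example.com:2222"],
    },
    "task": {"type": "worker", "index": 0},
})

strategy = tf.contrib.distribute.ParameterServerStrategy()
```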
This class assumes between-graph replication will be used and works on a graph
for a particular worker. Note that each graph and worker is independent. This
means that while each worker synchronously computes a single gradient update
across all of its GPUs, updates between workers proceed asynchronously.
Operations that occur only on the first replica (such as incrementing the
global step) will occur on the first replica *of every worker*.
It is expected that `call_for_each_replica(fn, ...)` is called for any
operations which can potentially be replicated across replicas (i.e. multiple
GPUs), even if there is only one CPU or one GPU. When defining `fn`, extra
caution needs to be taken (see the sketch after this list):

1) Always use `tf.compat.v1.get_variable` instead of `tf.Variable`, which is
not able to refer to the same variable on different replicas.

2) It is generally not recommended to open a device scope under the strategy's
scope. A device scope (i.e. calling `tf.device`) will be merged with or
override the device for operations, but will not change the device for
variables.

3) It is also not recommended to open a colocation scope (i.e. calling
`tf.compat.v1.colocate_with`) under the strategy's scope. For colocating
variables, use `strategy.extended.colocate_vars_with` instead. Colocation of
ops may create device-assignment conflicts.
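The sketch below illustrates cautions 1) and 3) using the TF 1.x Python API that
this docstring mirrors. The variable names and shapes are made up, and the exact
entry point for `call_for_each_replica` (on the strategy itself or on
`strategy.extended`) has varied between releases, so treat it as the shape of
the call rather than a drop-in snippet.

```python
import tensorflow as tf

strategy = tf.contrib.distribute.ParameterServerStrategy()

def step_fn():
    # Caution 1: tf.compat.v1.get_variable lets every replica refer to the
    # same underlying variable on the parameter server; tf.Variable does not.
    w = tf.compat.v1.get_variable("w", shape=[10, 10])

    # Caution 3: colocate related variables through the strategy rather than
    # with tf.compat.v1.colocate_with.
    with strategy.extended.colocate_vars_with(w):
        b = tf.compat.v1.get_variable("b", shape=[10])

    return w, b

with strategy.scope():
    # Replicated across the worker's GPUs (or run once on CPU / a single GPU).
    w, b = strategy.extended.call_for_each_replica(step_fn)
```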