Type MirroredStrategy
Namespace tensorflow.contrib.distribute
Parent Strategy
Interfaces IMirroredStrategy
Mirrors variables to distribute across multiple devices and machines.

*** contrib version ***

This strategy uses one replica per device and synchronous replication for its multi-GPU version. When a `cluster_spec` is given via the `configure` method, it turns into the multi-worker version that works on multiple workers with in-graph replication.
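For the single-machine, multi-GPU case and for switching to the multi-worker version via `configure`, a minimal sketch could look like the following; the device list and worker addresses are illustrative assumptions, not taken from this page.

```python
import tensorflow as tf

# Single-machine, multi-GPU: one synchronous replica per listed device.
strategy = tf.contrib.distribute.MirroredStrategy(
    devices=["/device:GPU:0", "/device:GPU:1"])

# Giving `configure` a cluster_spec turns the strategy into its multi-worker,
# in-graph replication version. Higher-level APIs normally call this for you.
strategy.configure(cluster_spec={
    "worker": ["host1:2222", "host2:2222"],  # hypothetical worker addresses
})
```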
Note: `configure` will be called by higher-level APIs if running in a distributed environment.

There are several important concepts for distributed TensorFlow, e.g. `client`, `job`, `task`, `cluster`, `in-graph replication` and `synchronous training`, which are defined in [TensorFlow's documentation](https://www.tensorflow.org/deploy/distributed). The distribution strategy inherits these concepts and clarifies several more:

* **In-graph replication**: the `client` creates a single `tf.Graph` that specifies tasks for devices on all workers. The `client` then creates a client session which will talk to the `master` service of a `worker`. The `master` then partitions the graph and distributes the work to all participating workers.
* **Worker**: A `worker` is a TensorFlow `task` that usually maps to one physical machine. There will be multiple `worker`s with different `task` indices. They all do similar things, except that one worker also checkpoints model variables, writes summaries, etc. in addition to its ordinary work.

The multi-worker version of this class maps one replica to one device on a worker. It mirrors all model variables on all replicas. For example, if you have two `worker`s and each `worker` has 4 GPUs, it will create 8 copies of the model variables on these 8 GPUs. Then, as in MirroredStrategy, each replica performs its computation with its own copy of the variables unless in cross-replica mode, where variable or tensor reduction happens.
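As a rough sketch of the two-worker, 4-GPUs-per-worker example above (the `num_gpus_per_worker` argument and host addresses are assumptions here, not taken from this page):

```python
import tensorflow as tf

# Two workers with 4 GPUs each -> 8 mirrored replicas in total.
strategy = tf.contrib.distribute.MirroredStrategy(num_gpus_per_worker=4)
strategy.configure(cluster_spec={
    "worker": ["host1:2222", "host2:2222"],  # hypothetical worker addresses
})

with strategy.scope():
    # Variables created under the scope are mirrored on all 8 replicas; each
    # replica computes with its own copy until a cross-replica reduction.
    v = tf.get_variable("v", shape=[], initializer=tf.zeros_initializer())
```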