Mirrors variables to distribute across multiple devices and machines.

*** contrib version ***

This strategy uses one replica per device and sync replication for its multi-GPU version.

When `cluster_spec` is given by the `configure` method, it turns into the multi-worker version that works on multiple workers with in-graph replication. Note: `configure` will be called by higher-level APIs if running in a distributed environment.

There are several important concepts for distributed TensorFlow, e.g. `client`, `job`, `task`, `cluster`, `in-graph replication` and `synchronous training`, which are already defined in [TensorFlow's documentation](https://www.tensorflow.org/deploy/distributed). The distribution strategy inherits these concepts, and in addition we clarify several more:

* **In-graph replication**: the `client` creates a single `tf.Graph` that specifies tasks for devices on all workers. The `client` then creates a client session which will talk to the `master` service of a `worker`. Then the `master` will partition the graph and distribute the work to all participating workers.
* **Worker**: A `worker` is a TensorFlow `task` that usually maps to one physical machine. We will have multiple `worker`s with different `task` indices. They all do similar things, except that one worker also checkpoints model variables, writes summaries, etc. in addition to its ordinary work.

The multi-worker version of this class maps one replica to one device on a worker. It mirrors all model variables on all replicas. For example, if you have two `worker`s and each `worker` has 4 GPUs, it will create 8 copies of the model variables on these 8 GPUs. Then, as in MirroredStrategy, each replica performs its computation with its own copy of the variables, except in cross-replica mode, where variable or tensor reduction happens.
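The replica-to-device mapping in the example above can be sketched in plain Python. This is only an illustration of the counting, not the strategy's actual implementation: `replica_devices` is a hypothetical helper, the host addresses are made up, and the `cluster_spec` is assumed to be a dict with a `"worker"` list. The device strings follow TensorFlow's usual `/job:.../task:.../device:...` naming.

```python
def replica_devices(cluster_spec, gpus_per_worker):
    """Hypothetical helper: return one device string per replica.

    Assumes `cluster_spec` is a dict with a "worker" key listing one
    address per worker task, and that every worker has the same number
    of GPUs.
    """
    devices = []
    for task_id, _address in enumerate(cluster_spec["worker"]):
        for gpu_id in range(gpus_per_worker):
            devices.append(
                "/job:worker/task:%d/device:GPU:%d" % (task_id, gpu_id))
    return devices


# Two workers (addresses are placeholders), four GPUs each.
cluster_spec = {"worker": ["host1:2222", "host2:2222"]}
devices = replica_devices(cluster_spec, gpus_per_worker=4)

# One replica, and so one mirrored copy of each variable, per device.
for device in devices:
    print(device)
print("number of replicas:", len(devices))
```

With two workers and four GPUs each, the list has eight entries, matching the eight copies of the model variables described above.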


