Type CrossShardOptimizer
Namespace tensorflow.tpu
Parent Optimizer
Interfaces ICrossShardOptimizer
An optimizer that averages gradients across TPU shards.
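The members below mirror TensorFlow's Python API (tf.compat.v1.tpu.CrossShardOptimizer). As orientation, here is a minimal sketch of the typical usage in that underlying Python API; the toy model, loss, and learning rate are illustrative placeholders, not part of this reference.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph-mode, as the v1 optimizer API expects

# Toy model and loss, purely for illustration.
features = tf.placeholder(tf.float32, shape=[None, 10])
labels = tf.placeholder(tf.float32, shape=[None, 1])
predictions = tf.layers.dense(features, 1)
loss = tf.losses.mean_squared_error(labels, predictions)

# Wrap a regular optimizer so that gradients are averaged across TPU shards.
opt = tf.tpu.CrossShardOptimizer(tf.train.GradientDescentOptimizer(0.01))

# minimize() calls compute_gradients() and apply_gradients() internally;
# the cross-shard aggregation happens inside apply_gradients().
train_op = opt.minimize(loss)
```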
Methods
- compute_gradients
- compute_gradients
- compute_gradients
- compute_gradients
- compute_gradients_dyn
- NewDyn
Properties
Public instance methods
object compute_gradients(object loss, IDictionary<string, object> var_list, IDictionary<string, object> kwargs)
Compute gradients of `loss` for the variables in `var_list`. This simply wraps the underlying optimizer's compute_gradients().
The gradients are aggregated later, in apply_gradients(), so that the user can first modify them if needed,
for example by clipping with a per-replica global norm.
Clipping by the global norm of the already-aggregated gradients can be harmful, because one replica's huge
gradients can hurt the gradients from the other replicas. A usage sketch follows the compute_gradients overloads below.
Parameters
-
object
loss - A Tensor containing the value to minimize.
-
IDictionary<string, object>
var_list - Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
-
IDictionary<string, object>
kwargs - Keyword arguments for compute_gradients().
Returns
-
object
- A list of (gradient, variable) pairs.
object compute_gradients(object loss, IEnumerable<object> var_list, IDictionary<string, object> kwargs)
Compute gradients of `loss` for the variables in `var_list`. This simply wraps the underlying optimizer's compute_gradients().
The gradients are aggregated later, in apply_gradients(), so that the user can first modify them if needed,
for example by clipping with a per-replica global norm.
Clipping by the global norm of the already-aggregated gradients can be harmful, because one replica's huge
gradients can hurt the gradients from the other replicas.
Parameters
-
object
loss - A Tensor containing the value to minimize.
-
IEnumerable<object>
var_list - Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
-
IDictionary<string, object>
kwargs - Keyword arguments for compute_gradients().
Returns
-
object
- A list of (gradient, variable) pairs.
object compute_gradients(PythonFunctionContainer loss, IDictionary<string, object> var_list, IDictionary<string, object> kwargs)
Compute gradients of `loss` for the variables in `var_list`. This simply wraps the underlying optimizer's compute_gradients().
The gradients are aggregated later, in apply_gradients(), so that the user can first modify them if needed,
for example by clipping with a per-replica global norm.
Clipping by the global norm of the already-aggregated gradients can be harmful, because one replica's huge
gradients can hurt the gradients from the other replicas.
Parameters
-
PythonFunctionContainer
loss - A Tensor containing the value to minimize.
-
IDictionary<string, object>
var_list - Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
-
IDictionary<string, object>
kwargs - Keyword arguments for compute_gradients().
Returns
-
object
- A list of (gradient, variable) pairs.
object compute_gradients(PythonFunctionContainer loss, IEnumerable<object> var_list, IDictionary<string, object> kwargs)
Compute gradients of `loss` for the variables in `var_list`. This simply wraps the underlying optimizer's compute_gradients().
The gradients are aggregated later, in apply_gradients(), so that the user can first modify them if needed,
for example by clipping with a per-replica global norm.
Clipping by the global norm of the already-aggregated gradients can be harmful, because one replica's huge
gradients can hurt the gradients from the other replicas.
Parameters
-
PythonFunctionContainer
loss - A Tensor containing the value to minimize.
-
IEnumerable<object>
var_list - Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
-
IDictionary<string, object>
kwargs - Keyword arguments for compute_gradients().
Returns
-
object
- A list of (gradient, variable) pairs.
object compute_gradients_dyn(object loss, object var_list, IDictionary<string, object> kwargs)
Compute gradients of `loss` for the variables in `var_list`. This simply wraps the underlying optimizer's compute_gradients().
The gradients are aggregated later, in apply_gradients(), so that the user can first modify them if needed,
for example by clipping with a per-replica global norm.
Clipping by the global norm of the already-aggregated gradients can be harmful, because one replica's huge
gradients can hurt the gradients from the other replicas.
Parameters
-
object
loss - A Tensor containing the value to minimize.
-
object
var_list - Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
-
IDictionary<string, object>
kwargs - Keyword arguments for compute_gradients().
Returns
-
object
- A list of (gradient, variable) pairs.
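All of the compute_gradients overloads above share this same description, so a single sketch of the pattern it points at may suffice: compute the per-replica gradients, clip them by this replica's own global norm, and only then let apply_gradients() do the cross-shard aggregation. The sketch below uses the underlying TensorFlow Python API; the loss tensor, learning rate, and clip norm are placeholders, not part of this reference.

```python
import tensorflow.compat.v1 as tf

def clipped_train_op(loss, learning_rate=0.01, clip_norm=1.0):
    """Per-replica gradient clipping before cross-shard aggregation."""
    opt = tf.tpu.CrossShardOptimizer(
        tf.train.GradientDescentOptimizer(learning_rate))

    # A list of (gradient, variable) pairs, as documented above.
    grads_and_vars = opt.compute_gradients(loss)

    # Clip using this replica's own global norm. Aggregation has not
    # happened yet, so one replica's huge gradients cannot distort the
    # clipping applied on the other replicas.
    grads, variables = zip(*[(g, v) for g, v in grads_and_vars if g is not None])
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm)

    # apply_gradients() performs the cross-shard aggregation.
    return opt.apply_gradients(list(zip(clipped, variables)))
```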
Public static methods
CrossShardOptimizer NewDyn(object opt, ImplicitContainer<T> reduction, ImplicitContainer<T> name, object group_assignment)
Construct a new cross-shard optimizer. A construction sketch follows the parameter list below.
Parameters
-
object
opt - An existing `Optimizer` to encapsulate.
-
ImplicitContainer<T>
reduction - The reduction to apply to the shard losses.
-
ImplicitContainer<T>
name - Optional name prefix for the operations created when applying gradients. Defaults to "CrossShardOptimizer".
-
object
group_assignment - Optional 2D int32 list with shape [num_groups, num_replicas_per_group] which describes how to apply the optimizer to subgroups.
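NewDyn corresponds to the constructor of the underlying Python class, which takes the same four arguments. A minimal sketch of the equivalent Python construction, assuming an 8-replica topology split into two groups of four (the group sizes and learning rate are illustrative):

```python
import tensorflow.compat.v1 as tf

# Replicas 0-3 form one group and 4-7 another; gradients are then
# aggregated only within each group instead of across all shards.
group_assignment = [[0, 1, 2, 3], [4, 5, 6, 7]]

opt = tf.tpu.CrossShardOptimizer(
    tf.train.GradientDescentOptimizer(0.01),
    reduction=tf.losses.Reduction.MEAN,   # average the shard losses (the default)
    name="CrossShardOptimizer",           # default name prefix
    group_assignment=group_assignment)
```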