LostTech.TensorFlow : API Documentation

Type CrossShardOptimizer

Namespace tensorflow.tpu

Parent Optimizer

Interfaces ICrossShardOptimizer

An optimizer that averages gradients across TPU shards.
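
Example

A minimal usage sketch: wrap an existing optimizer so that each TPU shard computes its own gradients and the cross-shard average is applied. The GradientDescentOptimizer type, the "mean" reduction value, and the minimize call inherited from Optimizer are assumptions about the LostTech.TensorFlow binding, not details confirmed on this page.

    using tensorflow;
    using tensorflow.tpu;
    using tensorflow.train;

    // Wrap a regular optimizer. Each shard computes gradients locally;
    // the wrapper averages them across shards before applying updates.
    var baseOptimizer = new GradientDescentOptimizer(learning_rate: 0.01);  // assumed binding type
    var optimizer = CrossShardOptimizer.NewDyn(
        opt: baseOptimizer,
        reduction: "mean",               // average the per-shard losses (assumed value)
        name: "CrossShardOptimizer",
        group_assignment: null);         // null: average across all shards

    // Inside the TPU-replicated model function, use it like any other optimizer
    // (minimize is assumed to be inherited from Optimizer).
    var trainOp = optimizer.minimize(loss);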

Public instance methods

object compute_gradients(object loss, IDictionary<string, object> var_list, IDictionary<string, object> kwargs)

Compute gradients of "loss" for the variables in "var_list".

This simply wraps compute_gradients() from the underlying optimizer. Aggregation of the gradients is deferred to apply_gradients(), so the caller can still modify the per-replica gradients if needed, for example by clipping with a per-replica global norm. A global norm computed over the already aggregated gradients can be misleading, because one replica's very large gradients can drown out the gradients from the other replicas.
Parameters
object loss
A Tensor containing the value to minimize.
IDictionary<string, object> var_list
Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
IDictionary<string, object> kwargs
Keyword arguments for compute_gradients().
Returns
object
A list of (gradient, variable) pairs.
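
Example

A minimal sketch of the split described above: because aggregation happens only in apply_gradients(), any modification made between the two calls operates on per-replica gradients. The cast used for overload selection, the null defaults, and the apply_gradients signature are assumptions about the binding.

    // Per-replica gradients; aggregation is deferred to apply_gradients().
    object gradsAndVars = optimizer.compute_gradients(
        loss: loss,
        var_list: (IEnumerable<object>)null,   // defaults to GraphKeys.TRAINABLE_VARIABLES
        kwargs: null);

    // Optionally clip or rescale the per-replica (gradient, variable) pairs here,
    // before the optimizer aggregates them across shards.
    var trainOp = optimizer.apply_gradients(gradsAndVars);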

object compute_gradients(object loss, IEnumerable<object> var_list, IDictionary<string, object> kwargs)

Compute gradients of "loss" for the variables in "var_list".

This simply wraps compute_gradients() from the underlying optimizer. Aggregation of the gradients is deferred to apply_gradients(), so the caller can still modify the per-replica gradients if needed, for example by clipping with a per-replica global norm. A global norm computed over the already aggregated gradients can be misleading, because one replica's very large gradients can drown out the gradients from the other replicas.
Parameters
object loss
A Tensor containing the value to minimize.
IEnumerable<object> var_list
Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
IDictionary<string, object> kwargs
Keyword arguments for compute_gradients().
Returns
object
A list of (gradient, variable) pairs.

object compute_gradients(PythonFunctionContainer loss, IDictionary<string, object> var_list, IDictionary<string, object> kwargs)

Compute gradients of "loss" for the variables in "var_list".

This simply wraps compute_gradients() from the underlying optimizer. Aggregation of the gradients is deferred to apply_gradients(), so the caller can still modify the per-replica gradients if needed, for example by clipping with a per-replica global norm. A global norm computed over the already aggregated gradients can be misleading, because one replica's very large gradients can drown out the gradients from the other replicas.
Parameters
PythonFunctionContainer loss
A Tensor containing the value to minimize.
IDictionary<string, object> var_list
Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
IDictionary<string, object> kwargs
Keyword arguments for compute_gradients().
Returns
object
A list of (gradient, variable) pairs.

object compute_gradients(PythonFunctionContainer loss, IEnumerable<object> var_list, IDictionary<string, object> kwargs)

Compute gradients of "loss" for the variables in "var_list".

This simply wraps compute_gradients() from the underlying optimizer. Aggregation of the gradients is deferred to apply_gradients(), so the caller can still modify the per-replica gradients if needed, for example by clipping with a per-replica global norm. A global norm computed over the already aggregated gradients can be misleading, because one replica's very large gradients can drown out the gradients from the other replicas.
Parameters
PythonFunctionContainer loss
A Tensor containing the value to minimize.
IEnumerable<object> var_list
Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
IDictionary<string, object> kwargs
Keyword arguments for compute_gradients().
Returns
object
A list of (gradient, variable) pairs.

object compute_gradients_dyn(object loss, object var_list, IDictionary<string, object> kwargs)

Compute gradients of "loss" for the variables in "var_list".

This simply wraps compute_gradients() from the underlying optimizer. Aggregation of the gradients is deferred to apply_gradients(), so the caller can still modify the per-replica gradients if needed, for example by clipping with a per-replica global norm. A global norm computed over the already aggregated gradients can be misleading, because one replica's very large gradients can drown out the gradients from the other replicas.
Parameters
object loss
A Tensor containing the value to minimize.
object var_list
Optional list or tuple of tf.Variable to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
IDictionary<string, object> kwargs
Keyword arguments for compute_gradients().
Returns
object
A list of (gradient, variable) pairs.

Public static methods

CrossShardOptimizer NewDyn(object opt, ImplicitContainer<T> reduction, ImplicitContainer<T> name, object group_assignment)

Construct a new cross-shard optimizer.
Parameters
object opt
An existing `Optimizer` to encapsulate.
ImplicitContainer<T> reduction
The reduction to apply to the shard losses.
ImplicitContainer<T> name
Optional name prefix for the operations created when applying gradients. Defaults to "CrossShardOptimizer".
object group_assignment
Optional 2D list of int32 values with shape [num_groups, num_replicas_per_group] that describes how to apply the optimizer to subgroups of replicas.
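
Example

A sketch of the group_assignment argument: with eight replicas split into two groups of four, gradients are averaged only within each group. The nested array literal and how the binding marshals it to a Python list of lists are assumptions.

    // Shape [num_groups, num_replicas_per_group] = [2, 4].
    var groupAssignment = new[] {
        new[] { 0, 1, 2, 3 },   // group 0: replicas 0-3
        new[] { 4, 5, 6, 7 },   // group 1: replicas 4-7
    };

    var optimizer = CrossShardOptimizer.NewDyn(
        opt: baseOptimizer,              // an existing Optimizer, as above
        reduction: "mean",
        name: "CrossShardOptimizer",
        group_assignment: groupAssignment);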

Public properties

object PythonObject get;