LostTech.TensorFlow : API Documentation

Type tf.tpu

Namespace tensorflow

Public static methods

IList<object> batch_parallel(object computation, object inputs, int num_shards, object infeed_queue, object device_assignment, string name)

Shards `computation` along the batch dimension for parallel execution.

Convenience wrapper around shard().

`inputs` must be a list of Tensors or None (equivalent to an empty list). Each input is split into `num_shards` pieces along the 0-th dimension, and computation is applied to each shard in parallel.

Tensors are broadcast to all shards if they are lexically captured by `computation`. e.g.,

x = tf.constant(7)
def computation():
  return x + 3
... = shard(computation, ...)

The outputs from all shards are concatenated back together along their 0-th dimension.

Inputs and outputs of the computation must be at least rank-1 Tensors.
Parameters
object computation
A Python function that builds a computation to apply to each shard of the input.
object inputs
A list of input tensors or None (equivalent to an empty list). The 0-th dimension of each Tensor must have size divisible by `num_shards`.
int num_shards
The number of shards.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to `computation`.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each shard of the computation uses only one core, and there is either only one shard, or the number of shards is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
Returns
IList<object>
A list of output tensors.
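The split-apply-concatenate behavior described above can be sketched in plain Python. This is an illustration of the semantics only (list slicing stands in for splitting a tensor along dimension 0), not the actual TPU implementation:

```python
def batch_parallel_sketch(computation, inputs, num_shards):
    """Illustrative only: split each input along dimension 0 into
    num_shards pieces, apply computation to each shard, and
    concatenate the per-shard outputs back along dimension 0."""
    shard_size = len(inputs[0]) // num_shards  # size must be divisible by num_shards
    outputs = []
    for shard in range(num_shards):
        lo, hi = shard * shard_size, (shard + 1) * shard_size
        shard_inputs = [inp[lo:hi] for inp in inputs]
        outputs.append(computation(*shard_inputs))
    # Concatenate the shard outputs along the 0-th dimension.
    return [x for out in outputs for x in out]

# A "batch" of 8 values split across 2 shards; computation adds 3 elementwise.
result = batch_parallel_sketch(lambda xs: [x + 3 for x in xs],
                               inputs=[list(range(8))], num_shards=2)
# result is [3, 4, 5, 6, 7, 8, 9, 10]
```

On a real TPU the shards run in parallel on separate cores; the sketch runs them sequentially but produces the same shape of result.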

object batch_parallel_dyn(object computation, object inputs, ImplicitContainer<T> num_shards, object infeed_queue, object device_assignment, object name)

Shards `computation` along the batch dimension for parallel execution.

Convenience wrapper around shard().

`inputs` must be a list of Tensors or None (equivalent to an empty list). Each input is split into `num_shards` pieces along the 0-th dimension, and computation is applied to each shard in parallel.

Tensors are broadcast to all shards if they are lexically captured by `computation`. e.g.,

x = tf.constant(7)
def computation():
  return x + 3
... = shard(computation, ...)

The outputs from all shards are concatenated back together along their 0-th dimension.

Inputs and outputs of the computation must be at least rank-1 Tensors.
Parameters
object computation
A Python function that builds a computation to apply to each shard of the input.
object inputs
A list of input tensors or None (equivalent to an empty list). The 0-th dimension of each Tensor must have size divisible by `num_shards`.
ImplicitContainer<T> num_shards
The number of shards.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to `computation`.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each shard of the computation uses only one core, and there is either only one shard, or the number of shards is equal to the number of cores in the TPU system.
object name
(Deprecated) Does nothing.
Returns
object
A list of output tensors.

IContextManager<T> bfloat16_scope()

Scope class for bfloat16 variables, so that the model uses a custom getter.

This enables variables to be read as bfloat16 type when using get_variable.

object bfloat16_scope_dyn()

Scope class for bfloat16 variables, so that the model uses a custom getter.

This enables variables to be read as bfloat16 type when using get_variable.

int core(int num)

Returns the device name for a core in a replicated TPU computation.
Parameters
int num
The virtual core number within each replica to which operators should be assigned.
Returns
int
A device name, suitable for passing to `tf.device()`.
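In the underlying TensorFlow implementation, the returned device name is believed to take the form sketched below (an assumption based on the TPU runtime's replicated-core naming, not stated in this document):

```python
def core_sketch(num):
    # Believed format of the name returned by tf.tpu.core(num);
    # suitable for passing to tf.device() inside a replicated computation.
    return "device:TPU_REPLICATED_CORE:{}".format(num)

name = core_sketch(0)  # "device:TPU_REPLICATED_CORE:0"
```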

object core_dyn(object num)

Returns the device name for a core in a replicated TPU computation.
Parameters
object num
The virtual core number within each replica to which operators should be assigned.
Returns
object
A device name, suitable for passing to `tf.device()`.

Tensor cross_replica_sum(object x, object group_assignment, string name)

Sum the input tensor across replicas according to group_assignment.
Parameters
object x
The local tensor to sum.
object group_assignment
Optional 2-D int32 list with shape [num_groups, num_replicas_per_group]. `group_assignment[i]` represents the replica ids in the ith subgroup.
string name
Optional op name.
Returns
Tensor
A `Tensor` which is summed across replicas.
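The role of `group_assignment` can be sketched in plain Python: every replica in a subgroup ends up holding the sum over that subgroup. This illustrates the semantics only (scalars stand in for the per-replica tensors):

```python
def cross_replica_sum_sketch(per_replica_values, group_assignment=None):
    """Illustrative only: each replica receives the sum over its subgroup.
    group_assignment is a 2-D list [num_groups, num_replicas_per_group];
    None means a single group containing every replica."""
    num_replicas = len(per_replica_values)
    if group_assignment is None:
        group_assignment = [list(range(num_replicas))]
    result = [None] * num_replicas
    for group in group_assignment:
        total = sum(per_replica_values[r] for r in group)
        for r in group:  # every replica in the subgroup sees the same sum
            result[r] = total
    return result

# Four replicas in two subgroups, {0, 1} and {2, 3}.
sums = cross_replica_sum_sketch([1, 2, 3, 4], group_assignment=[[0, 1], [2, 3]])
# sums is [3, 3, 7, 7]
```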

Tensor cross_replica_sum(IGraphNodeBase x, object group_assignment, string name)

Sum the input tensor across replicas according to group_assignment.
Parameters
IGraphNodeBase x
The local tensor to sum.
object group_assignment
Optional 2-D int32 list with shape [num_groups, num_replicas_per_group]. `group_assignment[i]` represents the replica ids in the ith subgroup.
string name
Optional op name.
Returns
Tensor
A `Tensor` which is summed across replicas.

Tensor cross_replica_sum(PerReplica x, object group_assignment, string name)

Sum the input tensor across replicas according to group_assignment.
Parameters
PerReplica x
The local tensor to sum.
object group_assignment
Optional 2-D int32 list with shape [num_groups, num_replicas_per_group]. `group_assignment[i]` represents the replica ids in the ith subgroup.
string name
Optional op name.
Returns
Tensor
A `Tensor` which is summed across replicas.

Tensor cross_replica_sum(ValueTuple<PythonClassContainer, PythonClassContainer> x, object group_assignment, string name)

Sum the input tensor across replicas according to group_assignment.
Parameters
ValueTuple<PythonClassContainer, PythonClassContainer> x
The local tensor to sum.
object group_assignment
Optional 2-D int32 list with shape [num_groups, num_replicas_per_group]. `group_assignment[i]` represents the replica ids in the ith subgroup.
string name
Optional op name.
Returns
Tensor
A `Tensor` which is summed across replicas.

Tensor cross_replica_sum(IEnumerable<object> x, object group_assignment, string name)

Sum the input tensor across replicas according to group_assignment.
Parameters
IEnumerable<object> x
The local tensor to sum.
object group_assignment
Optional 2-D int32 list with shape [num_groups, num_replicas_per_group]. `group_assignment[i]` represents the replica ids in the ith subgroup.
string name
Optional op name.
Returns
Tensor
A `Tensor` which is summed across replicas.

Tensor cross_replica_sum(IDictionary<object, object> x, object group_assignment, string name)

Sum the input tensor across replicas according to group_assignment.
Parameters
IDictionary<object, object> x
The local tensor to sum.
object group_assignment
Optional 2-D int32 list with shape [num_groups, num_replicas_per_group]. `group_assignment[i]` represents the replica ids in the ith subgroup.
string name
Optional op name.
Returns
Tensor
A `Tensor` which is summed across replicas.

Tensor initialize_system(object embedding_config, string job)

Initializes a distributed TPU system for use with TensorFlow.
Parameters
object embedding_config
If not None, a `TPUEmbeddingConfiguration` proto describing the desired configuration of the hardware embedding lookup tables. If embedding_config is None, no hardware embeddings can be used.
string job
The job (the XXX in TensorFlow device specification /job:XXX) that contains the TPU devices that will be initialized. If job=None it is assumed there is only one job in the TensorFlow flock, and an error will be returned if this assumption does not hold.
Returns
Tensor
A serialized `TopologyProto` that describes the TPU system. Note: the topology must be evaluated using `Session.run` before it can be used.

object initialize_system_dyn(object embedding_config, object job)

Initializes a distributed TPU system for use with TensorFlow.
Parameters
object embedding_config
If not None, a `TPUEmbeddingConfiguration` proto describing the desired configuration of the hardware embedding lookup tables. If embedding_config is None, no hardware embeddings can be used.
object job
The job (the XXX in TensorFlow device specification /job:XXX) that contains the TPU devices that will be initialized. If job=None it is assumed there is only one job in the TensorFlow flock, and an error will be returned if this assumption does not hold.
Returns
object
A serialized `TopologyProto` that describes the TPU system. Note: the topology must be evaluated using `Session.run` before it can be used.

object outside_compilation(PythonFunctionContainer computation, IDictionary<string, object> kwargs, Object[] args)

Builds part of a computation outside any current TPU replicate scope.
Parameters
PythonFunctionContainer computation
A Python function that builds the computation to place on the host.
IDictionary<string, object> kwargs
the keyword arguments for the computation.
Object[] args
the positional arguments for the computation.
Returns
object
The Tensors returned by computation.

object outside_compilation(PythonFunctionContainer computation, Object[] args)

Builds part of a computation outside any current TPU replicate scope.
Parameters
PythonFunctionContainer computation
A Python function that builds the computation to place on the host.
Object[] args
the positional arguments for the computation.
Returns
object
The Tensors returned by computation.

object outside_compilation_dyn(object computation, Object[] args)

Builds part of a computation outside any current TPU replicate scope.
Parameters
object computation
A Python function that builds the computation to place on the host.
Object[] args
the positional arguments for the computation.
Returns
object
The Tensors returned by computation.

object outside_compilation_dyn(object computation, IDictionary<string, object> kwargs, Object[] args)

Builds part of a computation outside any current TPU replicate scope.
Parameters
object computation
A Python function that builds the computation to place on the host.
IDictionary<string, object> kwargs
the keyword arguments for the computation.
Object[] args
the positional arguments for the computation.
Returns
object
The Tensors returned by computation.

object replicate(object computation, IEnumerable<object> inputs, object infeed_queue, DeviceAssignment device_assignment, string name, PythonClassContainer maximum_shapes)

Builds a graph operator that runs a replicated TPU computation.
Parameters
object computation
A Python function that builds the computation to replicate.
IEnumerable<object> inputs
A list of lists of input tensors or `None` (equivalent to `[[]]`), indexed by `[replica_num][input_num]`. All replicas must have the same number of inputs. Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to computation.
DeviceAssignment device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each replica of the computation uses only one core, and there is either only one replica, or the number of replicas is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
PythonClassContainer maximum_shapes
A nested structure of tf.TensorShape representing the shape to which the respective component of each input element in each replica should be padded. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension over all replicas. The structure of `maximum_shapes` needs to be the same as `inputs[0]`.
Returns
object
A list of outputs, indexed by `[replica_num]`. Each output can be a nested structure with the same layout as what computation() returns, with a few exceptions.

Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.
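The `[replica_num][input_num]` indexing described above can be sketched in plain Python. This illustrates the semantics only (each replica runs the same computation on its own slice of `inputs`; on a real TPU the replicas run in parallel):

```python
def replicate_sketch(computation, inputs):
    """Illustrative only: run computation once per replica on that
    replica's inputs and collect outputs indexed by [replica_num]."""
    if inputs is None:
        inputs = [[]]  # None is equivalent to [[]]
    num_inputs = len(inputs[0])
    assert all(len(replica_inputs) == num_inputs for replica_inputs in inputs), \
        "all replicas must have the same number of inputs"
    return [computation(*replica_inputs) for replica_inputs in inputs]

# Two replicas, each receiving two inputs.
outputs = replicate_sketch(lambda a, b: a + b, inputs=[[1, 2], [10, 20]])
# outputs is [3, 30], indexed by replica number
```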

object replicate(object computation, IEnumerable<object> inputs, object infeed_queue, DeviceAssignment device_assignment, string name, int maximum_shapes)

Builds a graph operator that runs a replicated TPU computation.
Parameters
object computation
A Python function that builds the computation to replicate.
IEnumerable<object> inputs
A list of lists of input tensors or `None` (equivalent to `[[]]`), indexed by `[replica_num][input_num]`. All replicas must have the same number of inputs. Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to computation.
DeviceAssignment device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each replica of the computation uses only one core, and there is either only one replica, or the number of replicas is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
int maximum_shapes
A nested structure of tf.TensorShape representing the shape to which the respective component of each input element in each replica should be padded. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension over all replicas. The structure of `maximum_shapes` needs to be the same as `inputs[0]`.
Returns
object
A list of outputs, indexed by `[replica_num]`. Each output can be a nested structure with the same layout as what computation() returns, with a few exceptions.

Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

object replicate(object computation, IEnumerable<object> inputs, object infeed_queue, DeviceAssignment device_assignment, string name, TensorShape maximum_shapes)

Builds a graph operator that runs a replicated TPU computation.
Parameters
object computation
A Python function that builds the computation to replicate.
IEnumerable<object> inputs
A list of lists of input tensors or `None` (equivalent to `[[]]`), indexed by `[replica_num][input_num]`. All replicas must have the same number of inputs. Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to computation.
DeviceAssignment device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each replica of the computation uses only one core, and there is either only one replica, or the number of replicas is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
TensorShape maximum_shapes
A nested structure of tf.TensorShape representing the shape to which the respective component of each input element in each replica should be padded. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension over all replicas. The structure of `maximum_shapes` needs to be the same as `inputs[0]`.
Returns
object
A list of outputs, indexed by `[replica_num]`. Each output can be a nested structure with the same layout as what computation() returns, with a few exceptions.

Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

object replicate(object computation, IEnumerable<object> inputs, object infeed_queue, DeviceAssignment device_assignment, string name, Dimension maximum_shapes)

Builds a graph operator that runs a replicated TPU computation.
Parameters
object computation
A Python function that builds the computation to replicate.
IEnumerable<object> inputs
A list of lists of input tensors or `None` (equivalent to `[[]]`), indexed by `[replica_num][input_num]`. All replicas must have the same number of inputs. Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to computation.
DeviceAssignment device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each replica of the computation uses only one core, and there is either only one replica, or the number of replicas is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
Dimension maximum_shapes
A nested structure of tf.TensorShape representing the shape to which the respective component of each input element in each replica should be padded. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension over all replicas. The structure of `maximum_shapes` needs to be the same as `inputs[0]`.
Returns
object
A list of outputs, indexed by `[replica_num]`. Each output can be a nested structure with the same layout as what computation() returns, with a few exceptions.

Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

object replicate(object computation, IEnumerable<object> inputs, object infeed_queue, DeviceAssignment device_assignment, string name, IEnumerable<TensorShape> maximum_shapes)

Builds a graph operator that runs a replicated TPU computation.
Parameters
object computation
A Python function that builds the computation to replicate.
IEnumerable<object> inputs
A list of lists of input tensors or `None` (equivalent to `[[]]`), indexed by `[replica_num][input_num]`. All replicas must have the same number of inputs. Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to computation.
DeviceAssignment device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each replica of the computation uses only one core, and there is either only one replica, or the number of replicas is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
IEnumerable<TensorShape> maximum_shapes
A nested structure of tf.TensorShape representing the shape to which the respective component of each input element in each replica should be padded. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension over all replicas. The structure of `maximum_shapes` needs to be the same as `inputs[0]`.
Returns
object
A list of outputs, indexed by `[replica_num]`. Each output can be a nested structure with the same layout as what computation() returns, with a few exceptions.

Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

object replicate_dyn(object computation, object inputs, object infeed_queue, object device_assignment, object name, object maximum_shapes)

Builds a graph operator that runs a replicated TPU computation.
Parameters
object computation
A Python function that builds the computation to replicate.
object inputs
A list of lists of input tensors or `None` (equivalent to `[[]]`), indexed by `[replica_num][input_num]`. All replicas must have the same number of inputs. Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to computation.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each replica of the computation uses only one core, and there is either only one replica, or the number of replicas is equal to the number of cores in the TPU system.
object name
(Deprecated) Does nothing.
object maximum_shapes
A nested structure of tf.TensorShape representing the shape to which the respective component of each input element in each replica should be padded. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension over all replicas. The structure of `maximum_shapes` needs to be the same as `inputs[0]`.
Returns
object
A list of outputs, indexed by `[replica_num]`. Each output can be a nested structure with the same layout as what computation() returns, with a few exceptions.

Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

object rewrite(object computation, Nullable<ValueTuple<object>> inputs, object infeed_queue, object device_assignment, string name)

Rewrites `computation` for execution on a TPU system.
Parameters
object computation
A Python function that builds a computation to apply to the input. If the function takes n inputs, `inputs` should be a list of n tensors.

`computation` may return a list of operations and tensors. Tensors must come before operations in the returned list. The return value of `rewrite` is a list of tensors corresponding to the tensors from the output of `computation`.

All `Operation`s constructed during `computation` will be executed when evaluating any of the returned output tensors, not just the ones returned.
Nullable<ValueTuple<object>> inputs
A list of input tensors or `None` (equivalent to an empty list). Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to `computation`.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. May be omitted for a single-core computation, in which case the core attached to task 0, TPU device 0 is used.
string name
(Deprecated) Does nothing.
Returns
object
The same data structure as if computation(*inputs) were called directly, with some exceptions for correctness. Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

object rewrite_dyn(object computation, object inputs, object infeed_queue, object device_assignment, object name)

Rewrites `computation` for execution on a TPU system.
Parameters
object computation
A Python function that builds a computation to apply to the input. If the function takes n inputs, `inputs` should be a list of n tensors.

`computation` may return a list of operations and tensors. Tensors must come before operations in the returned list. The return value of `rewrite` is a list of tensors corresponding to the tensors from the output of `computation`.

All `Operation`s constructed during `computation` will be executed when evaluating any of the returned output tensors, not just the ones returned.
object inputs
A list of input tensors or `None` (equivalent to an empty list). Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimension list of compatible values will result in an N-dimension list of scalar tensors rather than a single rank-N tensor. If you need different behavior, convert part of the inputs to tensors with tf.convert_to_tensor.
object infeed_queue
If not `None`, the `InfeedQueue` from which to append a tuple of arguments as inputs to `computation`.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. May be omitted for a single-core computation, in which case the core attached to task 0, TPU device 0 is used.
object name
(Deprecated) Does nothing.
Returns
object
The same data structure as if computation(*inputs) were called directly, with some exceptions for correctness. Exceptions include: 1) None output: a NoOp would be returned which control-depends on computation. 2) Single-value output: a tuple containing the value would be returned. 3) Operation-only outputs: a NoOp would be returned which control-depends on computation. TODO(b/121383831): Investigate removing these special cases.

IList<object> shard(object computation, object inputs, int num_shards, object input_shard_axes, bool outputs_from_all_shards, object output_shard_axes, object infeed_queue, object device_assignment, string name)

Shards `computation` for parallel execution.

`inputs` must be a list of Tensors or None (equivalent to an empty list), each of which has a corresponding split axis (from `input_shard_axes`). Each input is split into `num_shards` pieces along the corresponding axis, and computation is applied to each shard in parallel.

Tensors are broadcast to all shards if they are lexically captured by `computation`. e.g.,

x = tf.constant(7)
def computation():
  return x + 3
... = shard(computation, ...)

TODO(phawkins): consider adding support for broadcasting Tensors passed as inputs.

If `outputs_from_all_shards` is true, the outputs from all shards of `computation` are concatenated back together along their `output_shard_axes`. Otherwise, each output is taken from an arbitrary shard.

Inputs and outputs of the computation must be at least rank-1 Tensors.
Parameters
object computation
A Python function that builds a computation to apply to each shard of the input.
object inputs
A list of input tensors or None (equivalent to an empty list). Each input tensor has a corresponding shard axis, given by `input_shard_axes`; the tensor's size along that axis must be divisible by `num_shards`.
int num_shards
The number of shards.
object input_shard_axes
A list of dimensions along which to shard `inputs`, or `None`. `None` means "shard all inputs along dimension 0". If not `None`, there must be one dimension per input.
bool outputs_from_all_shards
Boolean or list of boolean. For each output, if `True`, outputs from all shards are concatenated along the corresponding `output_shard_axes` entry. Otherwise, each output is taken from an arbitrary shard. If the argument is a boolean, the argument's value is used for each output.
object output_shard_axes
A list of dimensions along which to concatenate the outputs of `computation`, or `None`. `None` means "concatenate all outputs along dimension 0". If not `None`, there must be one dimension per output. Ignored if `outputs_from_all_shards` is False.
object infeed_queue
If not `None`, the `InfeedQueue` to use to augment the inputs of `computation`.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each shard of the computation uses only one core, and there is either only one shard, or the number of shards is equal to the number of cores in the TPU system.
string name
(Deprecated) Does nothing.
Returns
IList<object>
A list of output tensors.
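The interaction between sharding and `outputs_from_all_shards` can be sketched in plain Python. This illustrates the semantics only, with every shard axis fixed at dimension 0 (the `input_shard_axes`/`output_shard_axes` generalization to other axes is omitted):

```python
def shard_sketch(computation, inputs, num_shards, outputs_from_all_shards=True):
    """Illustrative only: split inputs along dimension 0, run computation
    per shard, then either concatenate every shard's output or take the
    output from one arbitrary shard."""
    shard_size = len(inputs[0]) // num_shards
    shard_outputs = [
        computation(*[inp[i * shard_size:(i + 1) * shard_size] for inp in inputs])
        for i in range(num_shards)
    ]
    if outputs_from_all_shards:
        # Concatenate the per-shard outputs along the 0-th dimension.
        return [x for out in shard_outputs for x in out]
    # Any shard would do when the outputs are known to be identical.
    return shard_outputs[0]

concat = shard_sketch(lambda xs: [x * 2 for x in xs], [[1, 2, 3, 4]], num_shards=2)
# concat is [2, 4, 6, 8]
```

Passing `outputs_from_all_shards=False` here returns only the first shard's output, mirroring the "taken from an arbitrary shard" behavior documented above.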

object shard_dyn(object computation, object inputs, ImplicitContainer<T> num_shards, object input_shard_axes, ImplicitContainer<T> outputs_from_all_shards, object output_shard_axes, object infeed_queue, object device_assignment, object name)

Shards `computation` for parallel execution.

`inputs` must be a list of Tensors or None (equivalent to an empty list), each of which has a corresponding split axis (from `input_shard_axes`). Each input is split into `num_shards` pieces along the corresponding axis, and computation is applied to each shard in parallel.

Tensors are broadcast to all shards if they are lexically captured by `computation`. e.g.,

x = tf.constant(7)
def computation():
  return x + 3
... = shard(computation, ...)

TODO(phawkins): consider adding support for broadcasting Tensors passed as inputs.

If `outputs_from_all_shards` is true, the outputs from all shards of `computation` are concatenated back together along their `output_shard_axes`. Otherwise, each output is taken from an arbitrary shard.

Inputs and outputs of the computation must be at least rank-1 Tensors.
Parameters
object computation
A Python function that builds a computation to apply to each shard of the input.
object inputs
A list of input tensors or None (equivalent to an empty list). Each input tensor has a corresponding shard axis, given by `input_shard_axes`; the tensor's size along that axis must be divisible by `num_shards`.
ImplicitContainer<T> num_shards
The number of shards.
object input_shard_axes
A list of dimensions along which to shard `inputs`, or `None`. `None` means "shard all inputs along dimension 0". If not `None`, there must be one dimension per input.
ImplicitContainer<T> outputs_from_all_shards
Boolean or list of boolean. For each output, if `True`, outputs from all shards are concatenated along the corresponding `output_shard_axes` entry. Otherwise, each output is taken from an arbitrary shard. If the argument is a boolean, the argument's value is used for each output.
object output_shard_axes
A list of dimensions along which to concatenate the outputs of `computation`, or `None`. `None` means "concatenate all outputs along dimension 0". If not `None`, there must be one dimension per output. Ignored if `outputs_from_all_shards` is False.
object infeed_queue
If not `None`, the `InfeedQueue` to use to augment the inputs of `computation`.
object device_assignment
If not `None`, a `DeviceAssignment` describing the mapping between logical cores in the computation and physical cores in the TPU topology. Uses a default device assignment if `None`. The `DeviceAssignment` may be omitted if each shard of the computation uses only one core, and there is either only one shard, or the number of shards is equal to the number of cores in the TPU system.
object name
(Deprecated) Does nothing.
Returns
object
A list of output tensors.

object shutdown_system(object job)

Shuts down a running distributed TPU system.
Parameters
object job
The job (the XXX in TensorFlow device specification /job:XXX) that contains the TPU devices that will be shut down. If job=None it is assumed there is only one job in the TensorFlow flock, and an error will be returned if this assumption does not hold.

object shutdown_system_dyn(object job)

Shuts down a running distributed TPU system.
Parameters
object job
The job (the XXX in TensorFlow device specification /job:XXX) that contains the TPU devices that will be shut down. If job=None it is assumed there is only one job in the TensorFlow flock, and an error will be returned if this assumption does not hold.

Public properties

PythonFunctionContainer batch_parallel_fn get;

PythonFunctionContainer bfloat16_scope_fn get;

PythonFunctionContainer cross_replica_sum_fn get;

PythonFunctionContainer initialize_system_fn get;

PythonFunctionContainer outside_compilation_fn get;

PythonFunctionContainer replicate_fn get;

PythonFunctionContainer rewrite_fn get;

PythonFunctionContainer shard_fn get;

PythonFunctionContainer shutdown_system_fn get;