Type Dataset
Namespace tensorflow.data
Parent Dataset
Interfaces IDataset
Represents a potentially large set of elements. A `Dataset` can be used to represent an input pipeline as a
collection of elements and a "logical plan" of transformations that act on
those elements.
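To make the methods below concrete, here is a minimal sketch of an input pipeline written against the TensorFlow Python API that these bindings mirror (the examples elsewhere on this page use the same convention); it assumes eager execution so the dataset can be iterated directly.
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

# Slice a tensor into individual elements, then group them into batches of 2.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
dataset = dataset.batch(2)

for batch in dataset:
  print(batch.numpy())
# [1 2]
# [3 4]
# [5 6]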
Methods
- batch
- batch
- batch
- batch
- batch
- batch_dyn
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_sparse_tensor_slices
- from_sparse_tensor_slices_dyn
- from_tensor_slices
- from_tensor_slices_dyn
- from_tensors
- from_tensors_dyn
- make_one_shot_iterator
- make_one_shot_iterator_dyn
- padded_batch
- padded_batch
- padded_batch
- shard
- shard_dyn
- skip
- take
- take_dyn
- zip
- zip
- zip
Properties
- element_spec
- element_spec_dyn
- output_classes
- output_classes_dyn
- output_shapes
- output_shapes_dyn
- output_types
- output_types_dyn
Public instance methods
object batch(TensorShape batch_size, bool drop_remainder)
Combines consecutive elements of this dataset into batches. The components of the resulting element will have an additional outer
dimension, which will be `batch_size` (or `N % batch_size` for the last
element if `batch_size` does not divide the number of input elements `N`
evenly and `drop_remainder` is `False`). If your program depends on the
batches having the same outer dimension, you should set the `drop_remainder`
argument to `True` to prevent the smaller batch from being produced.
Parameters
-
TensorShape
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
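A short sketch of the batching and `drop_remainder` behavior described above, written against the Python tf.data API that this overload wraps (for illustration only; assumes eager execution):
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

dataset = tf.data.Dataset.range(7)

# Default: the last batch is smaller because 7 % 3 == 1.
for batch in dataset.batch(3):
  print(batch.numpy())
# [0 1 2]
# [3 4 5]
# [6]

# drop_remainder=True: every batch has exactly batch_size elements.
for batch in dataset.batch(3, drop_remainder=True):
  print(batch.numpy())
# [0 1 2]
# [3 4 5]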
object batch(IGraphNodeBase batch_size, bool drop_remainder)
Combines consecutive elements of this dataset into batches. The components of the resulting element will have an additional outer
dimension, which will be `batch_size` (or `N % batch_size` for the last
element if `batch_size` does not divide the number of input elements `N`
evenly and `drop_remainder` is `False`). If your program depends on the
batches having the same outer dimension, you should set the `drop_remainder`
argument to `True` to prevent the smaller batch from being produced.
Parameters
-
IGraphNodeBase
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object batch(int batch_size, bool drop_remainder)
Combines consecutive elements of this dataset into batches. The components of the resulting element will have an additional outer
dimension, which will be `batch_size` (or `N % batch_size` for the last
element if `batch_size` does not divide the number of input elements `N`
evenly and `drop_remainder` is `False`). If your program depends on the
batches having the same outer dimension, you should set the `drop_remainder`
argument to `True` to prevent the smaller batch from being produced.
Parameters
-
int
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object batch(ndarray batch_size, bool drop_remainder)
Combines consecutive elements of this dataset into batches. The components of the resulting element will have an additional outer
dimension, which will be `batch_size` (or `N % batch_size` for the last
element if `batch_size` does not divide the number of input elements `N`
evenly and `drop_remainder` is `False`). If your program depends on the
batches having the same outer dimension, you should set the `drop_remainder`
argument to `True` to prevent the smaller batch from being produced.
Parameters
-
ndarray
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object batch(Dimension batch_size, bool drop_remainder)
Combines consecutive elements of this dataset into batches. The components of the resulting element will have an additional outer
dimension, which will be `batch_size` (or `N % batch_size` for the last
element if `batch_size` does not divide the number of input elements `N`
evenly and `drop_remainder` is `False`). If your program depends on the
batches having the same outer dimension, you should set the `drop_remainder`
argument to `True` to prevent the smaller batch from being produced.
Parameters
-
Dimension
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object batch_dyn(object batch_size, ImplicitContainer<T> drop_remainder)
Combines consecutive elements of this dataset into batches. The components of the resulting element will have an additional outer
dimension, which will be `batch_size` (or `N % batch_size` for the last
element if `batch_size` does not divide the number of input elements `N`
evenly and `drop_remainder` is `False`). If your program depends on the
batches having the same outer dimension, you should set the `drop_remainder`
argument to `True` to prevent the smaller batch from being produced.
Parameters
-
object
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
ImplicitContainer<T>
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object make_one_shot_iterator()
Creates an `Iterator` for enumerating the elements of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using
tf.estimator
, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`. Note: The returned iterator will be initialized automatically.
A "one-shot" iterator does not currently support re-initialization.
Returns
-
object
- An `Iterator` over the elements of this dataset.
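A minimal, self-contained sketch of the last-resort compat pattern mentioned above, assuming graph execution (in eager code, the recommended `for element in dataset:` loop replaces all of this):
Show Example
import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # graph mode, as in TF 1.x

dataset = tf.data.Dataset.range(3)

# Last-resort compat pattern: explicit one-shot iterator.
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
next_element = iterator.get_next()

with tf.compat.v1.Session() as sess:
  print(sess.run(next_element))  # 0
  print(sess.run(next_element))  # 1
  print(sess.run(next_element))  # 2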
object make_one_shot_iterator_dyn()
Creates an `Iterator` for enumerating the elements of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using
tf.estimator
, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`. Note: The returned iterator will be initialized automatically.
A "one-shot" iterator does not currently support re-initialization.
Returns
-
object
- An `Iterator` over the elements of this dataset.
object padded_batch(int batch_size, IGraphNodeBase padded_shapes, Nullable<ValueTuple<int, string>> padding_values, bool drop_remainder)
Combines consecutive elements of this dataset into padded batches. This transformation combines multiple consecutive elements of the input
dataset into a single element. Like
tf.data.Dataset.batch
, the components of the resulting element will
have an additional outer dimension, which will be `batch_size` (or
`N % batch_size` for the last element if `batch_size` does not divide the
number of input elements `N` evenly and `drop_remainder` is `False`). If
your program depends on the batches having the same outer dimension, you
should set the `drop_remainder` argument to `True` to prevent the smaller
batch from being produced. Unlike tf.data.Dataset.batch
, the input elements to be batched may have
different shapes, and this transformation will pad each component to the
respective shape in `padded_shapes`. The `padded_shapes` argument
determines the resulting shape for each dimension of each component in an
output element:
* If the dimension is a constant (e.g. `tf.compat.v1.Dimension(37)`), the
component will be padded out to that length in that dimension.
* If the dimension is unknown (e.g. `tf.compat.v1.Dimension(None)`), the
component will be padded out to the maximum length of all elements in that
dimension.
See also tf.data.experimental.dense_to_sparse_batch, which combines
elements that may have different shapes into a tf.SparseTensor.
Parameters
-
int
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
IGraphNodeBase
padded_shapes - A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. `tf.compat.v1.Dimension(None)` in a tf.TensorShape or `-1` in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
-
Nullable<ValueTuple<int, string>>
padding_values - (Optional.) A nested structure of scalar-shaped tf.Tensor, representing the padding values to use for the respective components. Defaults are `0` for numeric types and the empty string for string types.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
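A small sketch of the padding behavior described above, using the Python tf.data API for illustration (assumes eager execution; `padded_shapes=[None]` marks the single dimension as pad-to-longest):
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

# Elements of increasing length: [0], [0 1], [0 1 2], [0 1 2 3]
dataset = tf.data.Dataset.range(1, 5).map(lambda x: tf.range(x))

# Each component is padded to the longest element in its batch.
for batch in dataset.padded_batch(2, padded_shapes=[None]):
  print(batch.numpy())
# [[0 0]
#  [0 1]]
# [[0 1 2 0]
#  [0 1 2 3]]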
object padded_batch(int batch_size, ValueTuple<IEnumerable<int>, object> padded_shapes, Nullable<ValueTuple<int, string>> padding_values, bool drop_remainder)
Combines consecutive elements of this dataset into padded batches. This transformation combines multiple consecutive elements of the input
dataset into a single element. Like
tf.data.Dataset.batch
, the components of the resulting element will
have an additional outer dimension, which will be `batch_size` (or
`N % batch_size` for the last element if `batch_size` does not divide the
number of input elements `N` evenly and `drop_remainder` is `False`). If
your program depends on the batches having the same outer dimension, you
should set the `drop_remainder` argument to `True` to prevent the smaller
batch from being produced. Unlike tf.data.Dataset.batch
, the input elements to be batched may have
different shapes, and this transformation will pad each component to the
respective shape in `padded_shapes`. The `padded_shapes` argument
determines the resulting shape for each dimension of each component in an
output element:
* If the dimension is a constant (e.g. `tf.compat.v1.Dimension(37)`), the
component will be padded out to that length in that dimension.
* If the dimension is unknown (e.g. `tf.compat.v1.Dimension(None)`), the
component will be padded out to the maximum length of all elements in that
dimension.
See also tf.data.experimental.dense_to_sparse_batch, which combines
elements that may have different shapes into a tf.SparseTensor.
Parameters
-
int
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
ValueTuple<IEnumerable<int>, object>
padded_shapes - A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. `tf.compat.v1.Dimension(None)` in a tf.TensorShape or `-1` in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
-
Nullable<ValueTuple<int, string>>
padding_values - (Optional.) A nested structure of scalar-shaped tf.Tensor, representing the padding values to use for the respective components. Defaults are `0` for numeric types and the empty string for string types.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object padded_batch(int batch_size, IEnumerable<object> padded_shapes, Nullable<ValueTuple<int, string>> padding_values, bool drop_remainder)
Combines consecutive elements of this dataset into padded batches. This transformation combines multiple consecutive elements of the input
dataset into a single element. Like
tf.data.Dataset.batch
, the components of the resulting element will
have an additional outer dimension, which will be `batch_size` (or
`N % batch_size` for the last element if `batch_size` does not divide the
number of input elements `N` evenly and `drop_remainder` is `False`). If
your program depends on the batches having the same outer dimension, you
should set the `drop_remainder` argument to `True` to prevent the smaller
batch from being produced. Unlike tf.data.Dataset.batch
, the input elements to be batched may have
different shapes, and this transformation will pad each component to the
respective shape in `padded_shapes`. The `padded_shapes` argument
determines the resulting shape for each dimension of each component in an
output element:
* If the dimension is a constant (e.g. `tf.compat.v1.Dimension(37)`), the
component will be padded out to that length in that dimension.
* If the dimension is unknown (e.g. `tf.compat.v1.Dimension(None)`), the
component will be padded out to the maximum length of all elements in that
dimension.
See also tf.data.experimental.dense_to_sparse_batch, which combines
elements that may have different shapes into a tf.SparseTensor.
Parameters
-
int
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
IEnumerable<object>
padded_shapes - A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. `tf.compat.v1.Dimension(None)` in a tf.TensorShape or `-1` in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
-
Nullable<ValueTuple<int, string>>
padding_values - (Optional.) A nested structure of scalar-shaped tf.Tensor, representing the padding values to use for the respective components. Defaults are `0` for numeric types and the empty string for string types.
-
bool
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
object skip(IEnumerable<int> count)
Creates a `Dataset` that skips `count` elements from this dataset.
Parameters
-
IEnumerable<int>
count - The number of elements of this dataset that should be skipped to form the new dataset.
Returns
-
object
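For illustration, the equivalent Python usage (assumes eager execution):
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

dataset = tf.data.Dataset.range(5)   # 0, 1, 2, 3, 4
for element in dataset.skip(2):
  print(element.numpy())
# 2
# 3
# 4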
object take(int count)
Creates a `Dataset` with at most `count` elements from this dataset.
Parameters
-
int
count - The maximum number of elements of this dataset that will be taken to form the new dataset.
Returns
-
object
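For illustration, the equivalent Python usage (assumes eager execution):
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

dataset = tf.data.Dataset.range(10)
for element in dataset.take(3):
  print(element.numpy())
# 0
# 1
# 2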
object take_dyn(object count)
Creates a `Dataset` with at most `count` elements from this dataset.
Parameters
-
object
count - The maximum number of elements of this dataset that will be taken to form the new dataset.
Returns
-
object
Public static methods
Dataset from_generator(PythonFunctionContainer generator, IEnumerable<object> output_types, IEnumerable<object> output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
IEnumerable<object>
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
IEnumerable<object>
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, IDictionary<string, object> output_types, PythonClassContainer output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
IDictionary<string, object>
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
PythonClassContainer
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, IEnumerable<object> output_types, PythonClassContainer output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
IEnumerable<object>
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
PythonClassContainer
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, DType output_types, IEnumerable<object> output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
DType
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
IEnumerable<object>
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, DType output_types, int output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
DType
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
int
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, IDictionary<string, object> output_types, int output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
IDictionary<string, object>
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
int
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, IDictionary<string, object> output_types, IEnumerable<object> output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
IDictionary<string, object>
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
IEnumerable<object>
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, IEnumerable<object> output_types, int output_shapes, Nullable<ValueTuple<object>> args)
Creates a `Dataset` whose elements are generated by `generator`. The `generator` argument must be a callable object that returns
an object that supports the `iter()` protocol (e.g. a generator function).
The elements generated by `generator` must be compatible with the given
`output_types` and (optional) `output_shapes` arguments.
NOTE: The current implementation of `Dataset.from_generator()` uses
tf.numpy_function
and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed
on a device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be
serialized in a `GraphDef`, and you should not use this method if you
need to serialize your model and restore it in a different environment. NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times
(in order to support repeating the `Dataset`) and at any time
between the call to `Dataset.from_generator()` and the production of the
first element from the generator. Mutating global variables or external
state can cause undefined behavior, and we recommend that you explicitly
cache any external state in `generator` before calling
`Dataset.from_generator()`.
Parameters
-
PythonFunctionContainer
generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
-
IEnumerable<object>
output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
-
int
output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
-
Nullable<ValueTuple<object>>
args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
DatasetV1Adapter from_sparse_tensor_slices(SparseTensor sparse_tensor)
Splits each rank-N
tf.SparseTensor
in this dataset row-wise. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.from_tensor_slices()`.
Parameters
-
SparseTensor
sparse_tensor - A tf.SparseTensor.
Returns
object from_sparse_tensor_slices_dyn(object sparse_tensor)
Splits each rank-N
tf.SparseTensor
in this dataset row-wise. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.from_tensor_slices()`.
Parameters
-
object
sparse_tensor - A tf.SparseTensor.
Returns
-
object
DatasetV1Adapter from_tensor_slices(object tensors)
Creates a `Dataset` whose elements are slices of the given tensors. Note that if `tensors` contains a NumPy array, and eager execution is not
enabled, the values will be embedded in the graph as one or more
tf.constant
operations. For large datasets (> 1 GB), this can waste
memory and run into byte limits of graph serialization. If `tensors`
contains one or more large NumPy arrays, consider the alternative described
in [this guide](
https://tensorflow.org/guide/datasets#consuming_numpy_arrays).
Parameters
-
object
tensors - A dataset element, with each component having the same size in the 0th dimension.
Returns
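A short sketch of slicing along the 0th dimension, in the Python API for illustration (assumes eager execution):
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

# Two components; both are sliced along their shared 0th dimension.
features = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1, 0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

for feature, label in dataset:
  print(feature.numpy(), label.numpy())
# [1 2] 0
# [3 4] 1
# [5 6] 0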
object from_tensor_slices_dyn(object tensors)
Creates a `Dataset` whose elements are slices of the given tensors. Note that if `tensors` contains a NumPy array, and eager execution is not
enabled, the values will be embedded in the graph as one or more
tf.constant
operations. For large datasets (> 1 GB), this can waste
memory and run into byte limits of graph serialization. If `tensors`
contains one or more large NumPy arrays, consider the alternative described
in [this guide](
https://tensorflow.org/guide/datasets#consuming_numpy_arrays).
Parameters
-
object
tensors - A dataset element, with each component having the same size in the 0th dimension.
Returns
-
object
DatasetV1Adapter from_tensors(object tensors)
Creates a `Dataset` with a single element, comprising the given tensors. Note that if `tensors` contains a NumPy array, and eager execution is not
enabled, the values will be embedded in the graph as one or more
tf.constant
operations. For large datasets (> 1 GB), this can waste
memory and run into byte limits of graph serialization. If `tensors`
contains one or more large NumPy arrays, consider the alternative described
in [this
guide](https://tensorflow.org/guide/datasets#consuming_numpy_arrays).
Parameters
-
object
tensors - A dataset element.
Returns
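For contrast with `from_tensor_slices`, a sketch showing that `from_tensors` produces a single element containing the whole structure (Python API, assumes eager execution):
Show Example
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

dataset = tf.data.Dataset.from_tensors([[1, 2], [3, 4]])

for element in dataset:
  print(element.numpy())   # the whole 2x2 tensor, as one element
# [[1 2]
#  [3 4]]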
object from_tensors_dyn(object tensors)
Creates a `Dataset` with a single element, comprising the given tensors. Note that if `tensors` contains a NumPy array, and eager execution is not
enabled, the values will be embedded in the graph as one or more
tf.constant
operations. For large datasets (> 1 GB), this can waste
memory and run into byte limits of graph serialization. If `tensors`
contains one or more large NumPy arrays, consider the alternative described
in [this
guide](https://tensorflow.org/guide/datasets#consuming_numpy_arrays).
Parameters
-
object
tensors - A dataset element.
Returns
-
object
DatasetV1Adapter zip(Dataset datasets)
Creates a `Dataset` by zipping together the given datasets. This method has similar semantics to the built-in `zip()` function
in Python, with the main difference being that the `datasets`
argument can be an arbitrary nested structure of `Dataset` objects.
Parameters
-
Dataset
datasets - A nested structure of datasets.
Returns
Show Example
a = Dataset.range(1, 4)            # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 7)            # ==> [ 4, 5, 6 ]
c = Dataset.range(7, 13).batch(2)  # ==> [ [7, 8], [9, 10], [11, 12] ]
d = Dataset.range(13, 15)          # ==> [ 13, 14 ]

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b))  # ==> [ (1, 4), (2, 5), (3, 6) ]
Dataset.zip((b, a))  # ==> [ (4, 1), (5, 2), (6, 3) ]

# The `datasets` argument may contain an arbitrary number of datasets.
Dataset.zip((a, b, c))  # ==> [ (1, 4, [7, 8]),
                        #       (2, 5, [9, 10]),
                        #       (3, 6, [11, 12]) ]

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d))  # ==> [ (1, 13), (2, 14) ]
DatasetV1Adapter zip(DatasetV1Adapter datasets)
Creates a `Dataset` by zipping together the given datasets. This method has similar semantics to the built-in `zip()` function
in Python, with the main difference being that the `datasets`
argument can be an arbitrary nested structure of `Dataset` objects.
Parameters
-
DatasetV1Adapter
datasets - A nested structure of datasets.
Returns
Show Example
a = Dataset.range(1, 4)            # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 7)            # ==> [ 4, 5, 6 ]
c = Dataset.range(7, 13).batch(2)  # ==> [ [7, 8], [9, 10], [11, 12] ]
d = Dataset.range(13, 15)          # ==> [ 13, 14 ]

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b))  # ==> [ (1, 4), (2, 5), (3, 6) ]
Dataset.zip((b, a))  # ==> [ (4, 1), (5, 2), (6, 3) ]

# The `datasets` argument may contain an arbitrary number of datasets.
Dataset.zip((a, b, c))  # ==> [ (1, 4, [7, 8]),
                        #       (2, 5, [9, 10]),
                        #       (3, 6, [11, 12]) ]

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d))  # ==> [ (1, 13), (2, 14) ]
DatasetV1Adapter zip(object datasets)
Creates a `Dataset` by zipping together the given datasets. This method has similar semantics to the built-in `zip()` function
in Python, with the main difference being that the `datasets`
argument can be an arbitrary nested structure of `Dataset` objects.
Parameters
-
object
datasets - A nested structure of datasets.
Returns
Show Example
a = Dataset.range(1, 4)            # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 7)            # ==> [ 4, 5, 6 ]
c = Dataset.range(7, 13).batch(2)  # ==> [ [7, 8], [9, 10], [11, 12] ]
d = Dataset.range(13, 15)          # ==> [ 13, 14 ]

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b))  # ==> [ (1, 4), (2, 5), (3, 6) ]
Dataset.zip((b, a))  # ==> [ (4, 1), (5, 2), (6, 3) ]

# The `datasets` argument may contain an arbitrary number of datasets.
Dataset.zip((a, b, c))  # ==> [ (1, 4, [7, 8]),
                        #       (2, 5, [9, 10]),
                        #       (3, 6, [11, 12]) ]

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d))  # ==> [ (1, 13), (2, 14) ]
Public properties
object element_spec get;
The type specification of an element of this dataset.
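A quick way to inspect this property from Python, for illustration (assumes a TF version where `element_spec` is available on datasets):
Show Example
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(([1, 2, 3], [4.0, 5.0, 6.0]))
print(dataset.element_spec)
# (TensorSpec(shape=(), dtype=tf.int32, name=None),
#  TensorSpec(shape=(), dtype=tf.float32, name=None))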
object element_spec_dyn get;
The type specification of an element of this dataset.
object output_classes get;
Returns the class of each component of an element of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(dataset)`.
object output_classes_dyn get;
Returns the class of each component of an element of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(dataset)`.
object output_shapes get;
Returns the shape of each component of an element of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
object output_shapes_dyn get;
Returns the shape of each component of an element of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
object output_types get;
Returns the type of each component of an element of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(dataset)`.
object output_types_dyn get;
Returns the type of each component of an element of this dataset. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(dataset)`.