Type Dataset
Namespace tensorflow.compat.v2.data
Parent PythonObjectContainer
Interfaces Trackable, CompositeTensor, IDataset
Methods
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator_dyn
- from_tensor_slices
- from_tensors
- list_files
- list_files
- list_files
- list_files_dyn
- range
- range_dyn
- zip
- zip_dyn
Public static methods
Dataset from_generator(PythonFunctionContainer generator, DType output_types, PythonClassContainer output_shapes, Nullable<ValueTuple> args)
Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that supports the `iter()` protocol (e.g. a generator function). The elements generated by `generator` must be compatible with the given `output_types` and (optional) `output_shapes` arguments.

NOTE: The current implementation of `Dataset.from_generator()` uses `tf.numpy_function` and inherits the same constraints. In particular, it requires the `Dataset`- and `Iterator`-related operations to be placed on a device in the same process as the Python program that called `Dataset.from_generator()`. The body of `generator` will not be serialized in a `GraphDef`, and you should not use this method if you need to serialize your model and restore it in a different environment.

NOTE: If `generator` depends on mutable global variables or other external state, be aware that the runtime may invoke `generator` multiple times (in order to support repeating the `Dataset`) and at any time between the call to `Dataset.from_generator()` and the production of the first element from the generator. Mutating global variables or external state can cause undefined behavior, and we recommend that you explicitly cache any external state in `generator` before calling `Dataset.from_generator()`.
Parameters
- PythonFunctionContainer generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
- DType output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
- PythonClassContainer output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
- Nullable<ValueTuple> args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
- Dataset
Show Example
import tensorflow as tf
import itertools

tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
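The example above never passes the optional `args` parameter. Below is a minimal sketch, in the same plain TensorFlow Python used by the examples on this page, of how `args` values are evaluated and handed to the generator as NumPy arguments; the generator `gen_with_stop` is illustrative, not part of this API.
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

def gen_with_stop(stop):
  # `stop` arrives as a NumPy value because it was passed via `args`.
  for i in range(stop):
    yield i

ds = tf.data.Dataset.from_generator(
    gen_with_stop, tf.int64, tf.TensorShape([]), args=(3,))
for value in ds:
  print(value)  # scalar int64 tensors: 0, 1, 2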
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, PythonClassContainer output_shapes, Nullable<ValueTuple> args)
Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that supports the `iter()` protocol (e.g. a generator function). The elements generated by `generator` must be compatible with the given `output_types` and (optional) `output_shapes` arguments.

NOTE: The current implementation of `Dataset.from_generator()` uses `tf.numpy_function` and inherits the same constraints. In particular, it requires the `Dataset`- and `Iterator`-related operations to be placed on a device in the same process as the Python program that called `Dataset.from_generator()`. The body of `generator` will not be serialized in a `GraphDef`, and you should not use this method if you need to serialize your model and restore it in a different environment.

NOTE: If `generator` depends on mutable global variables or other external state, be aware that the runtime may invoke `generator` multiple times (in order to support repeating the `Dataset`) and at any time between the call to `Dataset.from_generator()` and the production of the first element from the generator. Mutating global variables or external state can cause undefined behavior, and we recommend that you explicitly cache any external state in `generator` before calling `Dataset.from_generator()`.
Parameters
- PythonFunctionContainer generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
- PythonClassContainer output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
- PythonClassContainer output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
- Nullable<ValueTuple> args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
- Dataset
Show Example
import tensorflow as tf
import itertools

tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, int output_shapes, Nullable<ValueTuple> args)
Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that supports the `iter()` protocol (e.g. a generator function). The elements generated by `generator` must be compatible with the given `output_types` and (optional) `output_shapes` arguments.

NOTE: The current implementation of `Dataset.from_generator()` uses `tf.numpy_function` and inherits the same constraints. In particular, it requires the `Dataset`- and `Iterator`-related operations to be placed on a device in the same process as the Python program that called `Dataset.from_generator()`. The body of `generator` will not be serialized in a `GraphDef`, and you should not use this method if you need to serialize your model and restore it in a different environment.

NOTE: If `generator` depends on mutable global variables or other external state, be aware that the runtime may invoke `generator` multiple times (in order to support repeating the `Dataset`) and at any time between the call to `Dataset.from_generator()` and the production of the first element from the generator. Mutating global variables or external state can cause undefined behavior, and we recommend that you explicitly cache any external state in `generator` before calling `Dataset.from_generator()`.
Parameters
- PythonFunctionContainer generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
- PythonClassContainer output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
- int output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
- Nullable<ValueTuple> args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
- Dataset
Show Example
import tensorflow as tf
import itertools

tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_generator(PythonFunctionContainer generator, IDictionary<string, object> output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, IEnumerable<object> output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, DType output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, IEnumerable<object> output_shapes, Nullable<ValueTuple> args)
Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that supports the `iter()` protocol (e.g. a generator function). The elements generated by `generator` must be compatible with the given `output_types` and (optional) `output_shapes` arguments.

NOTE: The current implementation of `Dataset.from_generator()` uses `tf.numpy_function` and inherits the same constraints. In particular, it requires the `Dataset`- and `Iterator`-related operations to be placed on a device in the same process as the Python program that called `Dataset.from_generator()`. The body of `generator` will not be serialized in a `GraphDef`, and you should not use this method if you need to serialize your model and restore it in a different environment.

NOTE: If `generator` depends on mutable global variables or other external state, be aware that the runtime may invoke `generator` multiple times (in order to support repeating the `Dataset`) and at any time between the call to `Dataset.from_generator()` and the production of the first element from the generator. Mutating global variables or external state can cause undefined behavior, and we recommend that you explicitly cache any external state in `generator` before calling `Dataset.from_generator()`.
Parameters
- PythonFunctionContainer generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
- PythonClassContainer output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
- IEnumerable<object> output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
- Nullable<ValueTuple> args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
- Dataset
Show Example
import tensorflow as tf
import itertools

tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
object from_generator_dyn(object generator, object output_types, object output_shapes, object args)
Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that supports the `iter()` protocol (e.g. a generator function). The elements generated by `generator` must be compatible with the given `output_types` and (optional) `output_shapes` arguments.

NOTE: The current implementation of `Dataset.from_generator()` uses `tf.numpy_function` and inherits the same constraints. In particular, it requires the `Dataset`- and `Iterator`-related operations to be placed on a device in the same process as the Python program that called `Dataset.from_generator()`. The body of `generator` will not be serialized in a `GraphDef`, and you should not use this method if you need to serialize your model and restore it in a different environment.

NOTE: If `generator` depends on mutable global variables or other external state, be aware that the runtime may invoke `generator` multiple times (in order to support repeating the `Dataset`) and at any time between the call to `Dataset.from_generator()` and the production of the first element from the generator. Mutating global variables or external state can cause undefined behavior, and we recommend that you explicitly cache any external state in `generator` before calling `Dataset.from_generator()`.
Parameters
- object generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
- object output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
- object output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
- object args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
- object
Show Example
import tensorflow as tf
import itertools

tf.compat.v1.enable_eager_execution()

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
  print(value)
# (1, array([1]))
# (2, array([1, 1]))
Dataset from_tensor_slices(IEnumerable<object> tensors)
Dataset from_tensors(IEnumerable<object> tensors)
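Neither overload above carries a docstring in this build. As a rough guide to the underlying TensorFlow semantics: `from_tensors` wraps its input as a single dataset element, while `from_tensor_slices` slices the input along its first dimension to produce one element per slice. A minimal Python sketch, mirroring the examples elsewhere on this page:
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

data = [[1, 2], [3, 4]]

# from_tensors: a single element of shape (2, 2).
for value in tf.data.Dataset.from_tensors(data):
  print(value.shape)  # (2, 2)

# from_tensor_slices: two elements, each of shape (2,).
for value in tf.data.Dataset.from_tensor_slices(data):
  print(value.shape)  # (2,)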
object list_files(IEnumerable<object> file_pattern, Nullable<bool> shuffle, Nullable<int> seed)
A dataset of all files matching one or more glob patterns.

NOTE: The default behavior of this method is to return filenames in a non-deterministic random shuffled order. Pass a `seed` or `shuffle=False` to get results in a deterministic order.

Example: If we had the following files on our filesystem:
- /path/to/dir/a.txt
- /path/to/dir/b.py
- /path/to/dir/c.py

If we pass "/path/to/dir/*.py" as the pattern, the dataset would produce:
- /path/to/dir/b.py
- /path/to/dir/c.py
Parameters
- IEnumerable<object> file_pattern - A string, a list of strings, or a tf.Tensor of string type (scalar or vector), representing the filename glob (i.e. shell wildcard) pattern(s) that will be matched.
- Nullable<bool> shuffle - (Optional.) If `True`, the file names will be shuffled randomly. Defaults to `True`.
- Nullable<int> seed - (Optional.) A tf.int64 scalar tf.Tensor, representing the random seed that will be used to create the distribution. See `tf.compat.v1.set_random_seed` for behavior.
Returns
- object - Dataset: A `Dataset` of strings corresponding to file names.
object list_files(string file_pattern, Nullable<bool> shuffle, Nullable<int> seed)
A dataset of all files matching one or more glob patterns.

NOTE: The default behavior of this method is to return filenames in a non-deterministic random shuffled order. Pass a `seed` or `shuffle=False` to get results in a deterministic order.

Example: If we had the following files on our filesystem:
- /path/to/dir/a.txt
- /path/to/dir/b.py
- /path/to/dir/c.py

If we pass "/path/to/dir/*.py" as the pattern, the dataset would produce:
- /path/to/dir/b.py
- /path/to/dir/c.py
Parameters
- string file_pattern - A string, a list of strings, or a tf.Tensor of string type (scalar or vector), representing the filename glob (i.e. shell wildcard) pattern(s) that will be matched.
- Nullable<bool> shuffle - (Optional.) If `True`, the file names will be shuffled randomly. Defaults to `True`.
- Nullable<int> seed - (Optional.) A tf.int64 scalar tf.Tensor, representing the random seed that will be used to create the distribution. See `tf.compat.v1.set_random_seed` for behavior.
Returns
- object - Dataset: A `Dataset` of strings corresponding to file names.
object list_files(IGraphNodeBase file_pattern, Nullable<bool> shuffle, Nullable<int> seed)
object list_files_dyn(object file_pattern, object shuffle, object seed)
A dataset of all files matching one or more glob patterns.

NOTE: The default behavior of this method is to return filenames in a non-deterministic random shuffled order. Pass a `seed` or `shuffle=False` to get results in a deterministic order.

Example: If we had the following files on our filesystem:
- /path/to/dir/a.txt
- /path/to/dir/b.py
- /path/to/dir/c.py

If we pass "/path/to/dir/*.py" as the pattern, the dataset would produce:
- /path/to/dir/b.py
- /path/to/dir/c.py
Parameters
- object file_pattern - A string, a list of strings, or a tf.Tensor of string type (scalar or vector), representing the filename glob (i.e. shell wildcard) pattern(s) that will be matched.
- object shuffle - (Optional.) If `True`, the file names will be shuffled randomly. Defaults to `True`.
- object seed - (Optional.) A tf.int64 scalar tf.Tensor, representing the random seed that will be used to create the distribution. See `tf.compat.v1.set_random_seed` for behavior.
Returns
- object - Dataset: A `Dataset` of strings corresponding to file names.
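A minimal usage sketch in Python; the directory and file names are the hypothetical ones from the example above, and `shuffle=False` gives the deterministic order it describes:
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

# Assuming /path/to/dir contains a.txt, b.py and c.py as above.
ds = tf.data.Dataset.list_files("/path/to/dir/*.py", shuffle=False)
for filename in ds:
  print(filename)  # b'/path/to/dir/b.py', then b'/path/to/dir/c.py'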
Dataset range(Object[] args)
Creates a `Dataset` of a step-separated range of values.
Parameters
- Object[] args - Follows the same semantics as Python's xrange: len(args) == 1 -> start = 0, stop = args[0], step = 1; len(args) == 2 -> start = args[0], stop = args[1], step = 1; len(args) == 3 -> start = args[0], stop = args[1], step = args[2].
Returns
- Dataset
Show Example
Dataset.range(5) == [0, 1, 2, 3, 4]
Dataset.range(2, 5) == [2, 3, 4]
Dataset.range(1, 5, 2) == [1, 3]
Dataset.range(1, 5, -2) == []
Dataset.range(5, 1) == []
Dataset.range(5, 1, -2) == [5, 3]
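The equalities above are pseudo-notation. With eager execution enabled, the elements can be materialized directly, e.g.:
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

ds = tf.data.Dataset.range(1, 5, 2)
print([int(x) for x in ds])  # [1, 3]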
object range_dyn(Object[] args)
Creates a `Dataset` of a step-separated range of values.
Parameters
- Object[] args - Follows the same semantics as Python's xrange: len(args) == 1 -> start = 0, stop = args[0], step = 1; len(args) == 2 -> start = args[0], stop = args[1], step = 1; len(args) == 3 -> start = args[0], stop = args[1], step = args[2].
Returns
- object
Show Example
Dataset.range(5) == [0, 1, 2, 3, 4]
Dataset.range(2, 5) == [2, 3, 4]
Dataset.range(1, 5, 2) == [1, 3]
Dataset.range(1, 5, -2) == []
Dataset.range(5, 1) == []
Dataset.range(5, 1, -2) == [5, 3]
Dataset zip(IEnumerable<Dataset> datasets)
Creates a `Dataset` by zipping together the given datasets.

This method has similar semantics to the built-in `zip()` function in Python, with the main difference being that the `datasets` argument can be an arbitrary nested structure of `Dataset` objects.
Parameters
- IEnumerable<Dataset> datasets - A nested structure of datasets.
Returns
- Dataset
Show Example
a = Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
c = Dataset.range(7, 13).batch(2)  # ==> [ [7, 8], [9, 10], [11, 12] ]
d = Dataset.range(13, 15)  # ==> [ 13, 14 ]

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b))  # ==> [ (1, 4), (2, 5), (3, 6) ]
Dataset.zip((b, a))  # ==> [ (4, 1), (5, 2), (6, 3) ]

# The `datasets` argument may contain an arbitrary number of
# datasets.
Dataset.zip((a, b, c))  # ==> [ (1, 4, [7, 8]),
                        #       (2, 5, [9, 10]),
                        #       (3, 6, [11, 12]) ]

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d))  # ==> [ (1, 13), (2, 14) ]
object zip_dyn(object datasets)
Creates a `Dataset` by zipping together the given datasets.

This method has similar semantics to the built-in `zip()` function in Python, with the main difference being that the `datasets` argument can be an arbitrary nested structure of `Dataset` objects.
Parameters
- object datasets - A nested structure of datasets.
Returns
- object
Show Example
a = Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
c = Dataset.range(7, 13).batch(2)  # ==> [ [7, 8], [9, 10], [11, 12] ]
d = Dataset.range(13, 15)  # ==> [ 13, 14 ]

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b))  # ==> [ (1, 4), (2, 5), (3, 6) ]
Dataset.zip((b, a))  # ==> [ (4, 1), (5, 2), (6, 3) ]

# The `datasets` argument may contain an arbitrary number of
# datasets.
Dataset.zip((a, b, c))  # ==> [ (1, 4, [7, 8]),
                        #       (2, 5, [9, 10]),
                        #       (3, 6, [11, 12]) ]

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d))  # ==> [ (1, 13), (2, 14) ]