Type Dataset
Namespace tensorflow.compat.v2.data
Parent PythonObjectContainer
Interfaces Trackable, CompositeTensor, IDataset
Methods
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator
- from_generator_dyn
- from_tensor_slices
- from_tensors
- list_files
- list_files
- list_files
- list_files_dyn
- range
- range_dyn
- zip
- zip_dyn
Properties
Public static methods
Dataset from_generator(PythonFunctionContainer generator, DType output_types, PythonClassContainer output_shapes, Nullable<ValueTuple> args)
Creates a `Dataset` whose elements are generated by `generator`.

The `generator` argument must be a callable object that returns an object that
supports the `iter()` protocol (e.g. a generator function). The elements
generated by `generator` must be compatible with the given `output_types` and
(optional) `output_shapes` arguments.

NOTE: The current implementation of `Dataset.from_generator()` uses
`tf.numpy_function` and inherits the same constraints. In particular, it
requires the `Dataset`- and `Iterator`-related operations to be placed on a
device in the same process as the Python program that called
`Dataset.from_generator()`. The body of `generator` will not be serialized in
a `GraphDef`, so you should not use this method if you need to serialize your
model and restore it in a different environment.

NOTE: If `generator` depends on mutable global variables or other external
state, be aware that the runtime may invoke `generator` multiple times (in
order to support repeating the `Dataset`) and at any time between the call to
`Dataset.from_generator()` and the production of the first element from the
generator. Mutating global variables or external state can cause undefined
behavior, and we recommend that you explicitly cache any external state in
`generator` before calling `Dataset.from_generator()`.
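To follow that recommendation, snapshot any external state into locals before constructing the dataset. A minimal sketch of the pattern in the Python API (the names and values are illustrative, not part of this API):

import tensorflow as tf

external_config = {"limit": 3}  # mutable external state (illustrative)

def make_dataset():
    limit = external_config["limit"]  # cache the state once, up front
    def gen():
        # `gen` closes over the cached value rather than the mutable dict,
        # so later mutations of `external_config` cannot change its output.
        for i in range(limit):
            yield i
    return tf.data.Dataset.from_generator(gen, output_types=tf.int64)

ds = make_dataset()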
Parameters
- PythonFunctionContainer generator - A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
- DType output_types - A nested structure of tf.DType objects corresponding to each component of an element yielded by `generator`.
- PythonClassContainer output_shapes - (Optional.) A nested structure of tf.TensorShape objects corresponding to each component of an element yielded by `generator`.
- Nullable<ValueTuple> args - (Optional.) A tuple of tf.Tensor objects that will be evaluated and passed to `generator` as NumPy-array arguments.
Returns
- Dataset - A `Dataset` whose elements are generated by `generator`.
Show Example
import itertools
tf.compat.v1.enable_eager_execution()

def gen():
    for i in itertools.count(1):
        yield (i, [1] * i)

ds = tf.data.Dataset.from_generator(
    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))

for value in ds.take(2):
    print(value)
# (1, array([1]))
# (2, array([1, 1]))
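Each tf.Tensor in `args` is evaluated and handed to `generator` as a NumPy value. A minimal sketch of this in the Python API (the generator name and values are illustrative):

import tensorflow as tf

def gen(limit):
    # `limit` arrives as a NumPy scalar, converted from the tf.Tensor in `args`.
    for i in range(limit):
        yield i

ds = tf.data.Dataset.from_generator(
    gen,
    output_types=tf.int64,
    output_shapes=tf.TensorShape([]),
    args=(tf.constant(3, dtype=tf.int64),))

for value in ds:
    print(value.numpy())  # 0, 1, 2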
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, PythonClassContainer output_shapes, Nullable<ValueTuple> args)
This overload differs from the one above only in the declared type of `output_types` (`PythonClassContainer` instead of `DType`). The behavior, parameters, and example are identical to the first `from_generator` overload above.
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, int output_shapes, Nullable<ValueTuple> args)
This overload differs from the first only in the declared types of `output_types` (`PythonClassContainer`) and `output_shapes` (`int`). The behavior, parameters, and example are identical to the first `from_generator` overload above.
Dataset from_generator(PythonFunctionContainer generator, IDictionary<string, object> output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, IEnumerable<object> output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, DType output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, TensorShape output_shapes, Nullable<ValueTuple> args)
Dataset from_generator(PythonFunctionContainer generator, PythonClassContainer output_types, IEnumerable<object> output_shapes, Nullable<ValueTuple> args)
These overloads differ from the first only in the declared types of `output_types` and `output_shapes`. The behavior, parameters, and example are identical to the first `from_generator` overload above.
object from_generator_dyn(object generator, object output_types, object output_shapes, object args)
Dynamically-typed variant of `from_generator`; all parameters are passed as `object`. The behavior, parameters, and example are identical to the first `from_generator` overload above.
Returns
- object
Dataset from_tensor_slices(IEnumerable<object> tensors)
Dataset from_tensors(IEnumerable<object> tensors)
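Neither overload is documented here. In the underlying TensorFlow API, `from_tensors` produces a dataset containing a single element built from the given tensors, while `from_tensor_slices` produces one element per slice along their first dimension. A minimal sketch in the Python API:

import tensorflow as tf

# from_tensors: the whole structure becomes one element.
ds_one = tf.data.Dataset.from_tensors([[1, 2], [3, 4]])
# ==> 1 element: [[1, 2], [3, 4]]

# from_tensor_slices: one element per slice along the first dimension.
ds_rows = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
# ==> 2 elements: [1, 2] and [3, 4]

for row in ds_rows:
    print(row.numpy())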
object list_files(IEnumerable<object> file_pattern, Nullable<bool> shuffle, Nullable<int> seed)
A dataset of all files matching one or more glob patterns.

NOTE: The default behavior of this method is to return filenames in a
non-deterministic, randomly shuffled order. Pass a `seed` or `shuffle=False`
to get results in a deterministic order.

Example: if we had the following files on our filesystem:
- /path/to/dir/a.txt
- /path/to/dir/b.py
- /path/to/dir/c.py
and we pass "/path/to/dir/*.py" as the pattern, the dataset would produce:
- /path/to/dir/b.py
- /path/to/dir/c.py
Parameters
- IEnumerable<object> file_pattern - A string, a list of strings, or a tf.Tensor of string type (scalar or vector), representing the filename glob (i.e. shell wildcard) pattern(s) that will be matched.
- Nullable<bool> shuffle - (Optional.) If `True`, the file names will be shuffled randomly. Defaults to `True`.
- Nullable<int> seed - (Optional.) A tf.int64 scalar tf.Tensor, representing the random seed that will be used to create the distribution. See `tf.compat.v1.set_random_seed` for behavior.
Returns
- object - Dataset: A `Dataset` of strings corresponding to file names.
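For instance, to enumerate files in a deterministic order (a sketch in the Python API; the path is illustrative):

import tensorflow as tf

# shuffle=False yields a deterministic order; alternatively pass a fixed seed.
ds = tf.data.Dataset.list_files("/path/to/dir/*.py", shuffle=False)
for filename in ds:
    print(filename.numpy())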
object list_files(string file_pattern, Nullable<bool> shuffle, Nullable<int> seed)
Identical to the previous overload except that `file_pattern` is a single string. See the first `list_files` overload above for the full description, parameters, and return value.
object list_files(IGraphNodeBase file_pattern, Nullable<bool> shuffle, Nullable<int> seed)
object list_files_dyn(object file_pattern, object shuffle, object seed)
Dynamically-typed variant of `list_files`; all parameters are passed as `object`. See the first `list_files` overload above for the full description and parameters.
Returns
- object - Dataset: A `Dataset` of strings corresponding to file names.
Dataset range(Object[] args)
Creates a `Dataset` of a step-separated range of values.
Parameters
- Object[] args - Follows the same semantics as Python's `xrange`:
  len(args) == 1 -> start = 0, stop = args[0], step = 1
  len(args) == 2 -> start = args[0], stop = args[1], step = 1
  len(args) == 3 -> start = args[0], stop = args[1], step = args[2]
Returns
- Dataset - A `RangeDataset`.
Show Example
Dataset.range(5) == [0, 1, 2, 3, 4]
Dataset.range(2, 5) == [2, 3, 4]
Dataset.range(1, 5, 2) == [1, 3]
Dataset.range(1, 5, -2) == []
Dataset.range(5, 1) == []
Dataset.range(5, 1, -2) == [5, 3]
object range_dyn(Object[] args)
Dynamically-typed variant of `range`. See `range` above for the full description, parameters, and example.
Returns
- object
Dataset zip(IEnumerable<Dataset> datasets)
Creates a `Dataset` by zipping together the given datasets. This method has similar semantics to the built-in `zip()` function
in Python, with the main difference being that the `datasets`
argument can be an arbitrary nested structure of `Dataset` objects.
Parameters
- IEnumerable<Dataset> datasets - A nested structure of datasets.
Returns
- Dataset - A `Dataset`.
Show Example
a = Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
c = Dataset.range(7, 13).batch(2)  # ==> [ [7, 8], [9, 10], [11, 12] ]
d = Dataset.range(13, 15)  # ==> [ 13, 14 ]

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b))  # ==> [ (1, 4), (2, 5), (3, 6) ]
Dataset.zip((b, a))  # ==> [ (4, 1), (5, 2), (6, 3) ]

# The `datasets` argument may contain an arbitrary number of datasets.
Dataset.zip((a, b, c))  # ==> [ (1, 4, [7, 8]),
                        #      (2, 5, [9, 10]),
                        #      (3, 6, [11, 12]) ]

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d))  # ==> [ (1, 13), (2, 14) ]
object zip_dyn(object datasets)
Dynamically-typed variant of `zip`. See `zip` above for the full description, parameters, and example.
Returns
- object