Type RandomDataset
Namespace tensorflow.data.experimental
Parent DatasetV1Adapter
Interfaces IRandomDataset
A `Dataset` of pseudorandom values.
Methods
- apply
- filter
- filter_dyn
- filter_with_legacy_function_dyn
- flat_map_dyn
- interleave
- interleave_dyn
- reduce
- reduce
- reduce
- reduce
- reduce_dyn
- unbatch
- unbatch_dyn
- window
- window_dyn
Properties
Public instance methods
object apply(PythonFunctionContainer transformation_func)
Applies a transformation function to this dataset. `apply` enables chaining of custom `Dataset` transformations, which are
represented as functions that take one `Dataset` argument and return a
transformed `Dataset`. For example:
```
dataset = (dataset.map(lambda x: x ** 2)
           .apply(group_by_window(key_func, reduce_func, window_size))
           .map(lambda x: x ** 3))
```
Parameters
-
PythonFunctionContainer
transformation_func - A function that takes one `Dataset` argument and returns a `Dataset`.
Returns
-
object
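For a runnable illustration, here is a minimal sketch against the underlying Python `tf.data` API that this binding wraps; the `key_func` and `reduce_func` below are placeholder choices (bucket by parity, then batch each bucket) used only to make the snippet self-contained.
```
import tensorflow as tf

# Placeholder grouping functions for group_by_window (illustrative only).
key_func = lambda x: x % 2                          # bucket elements by parity
reduce_func = lambda key, window: window.batch(4)   # batch each bucket

dataset = tf.data.Dataset.range(10)
dataset = (dataset.map(lambda x: x ** 2)
           .apply(tf.data.experimental.group_by_window(
               key_func, reduce_func, window_size=4))
           .map(lambda x: x ** 3))
```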
object filter(PythonFunctionContainer predicate)
Filters this dataset according to `predicate`.
Parameters
-
PythonFunctionContainer
predicate - A function mapping a dataset element to a boolean.
Returns
-
object
Show Example
d = tf.data.Dataset.from_tensor_slices([1, 2, 3])
d = d.filter(lambda x: x < 3)  # ==> [1, 2]

# `tf.math.equal(x, y)` is required for equality comparison
def filter_fn(x):
  return tf.math.equal(x, 1)

d = d.filter(filter_fn)  # ==> [1]
object filter_dyn(object predicate)
Filters this dataset according to `predicate`.
Parameters
-
object
predicate - A function mapping a dataset element to a boolean.
Returns
-
object
Show Example
d = tf.data.Dataset.from_tensor_slices([1, 2, 3])
d = d.filter(lambda x: x < 3)  # ==> [1, 2]

# `tf.math.equal(x, y)` is required for equality comparison
def filter_fn(x):
  return tf.math.equal(x, 1)

d = d.filter(filter_fn)  # ==> [1]
object filter_with_legacy_function_dyn(object predicate)
Filters this dataset according to `predicate`. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.filter()`. NOTE: This is an escape hatch for existing uses of `filter` that do not work
with V2 functions. New uses are strongly discouraged and existing uses
should migrate to `filter` as this method will be removed in V2.
Parameters
-
object
predicate - A function mapping a nested structure of tensors (having shapes
and types defined by `self.output_shapes` and `self.output_types`) to a
scalar `tf.bool` tensor.
Returns
-
object
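Since the deprecation notice points to `filter`, a minimal migration sketch in the underlying Python `tf.data` API (illustrative, not this binding's exact surface) would be:
```
import tensorflow as tf

d = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# The predicate still has to evaluate to a scalar `tf.bool` tensor,
# exactly as with the legacy-function variant.
d = d.filter(lambda x: tf.math.equal(x, 1))  # ==> [1]
```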
object flat_map_dyn(object map_func)
Maps `map_func` across this dataset and flattens the result. Use `flat_map` if you want to make sure that the order of your dataset
stays the same. For example, to flatten a dataset of batches into a
dataset of their elements, see the example below.
`tf.data.Dataset.interleave()` is a generalization of `flat_map`, since
`flat_map` produces the same output as
`tf.data.Dataset.interleave(cycle_length=1)`.
Parameters
-
object
map_func - A function mapping a dataset element to a dataset.
Returns
-
object
Show Example
a = Dataset.from_tensor_slices([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])

a.flat_map(lambda x: Dataset.from_tensor_slices(x + 1))  # ==>
# [ 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
object interleave(PythonFunctionContainer map_func, ImplicitContainer<T> cycle_length, int block_length, Nullable<int> num_parallel_calls)
Maps `map_func` across this dataset, and interleaves the results. For example, you can use `Dataset.interleave()` to process many input files
concurrently; see the example below.
The `cycle_length` and `block_length` arguments control the order in which
elements are produced. `cycle_length` controls the number of input elements
that are processed concurrently. If you set `cycle_length` to 1, this
transformation will handle one input element at a time, and will produce
identical results to `tf.data.Dataset.flat_map`. In general,
this transformation will apply `map_func` to `cycle_length` input elements,
open iterators on the returned `Dataset` objects, and cycle through them
producing `block_length` consecutive elements from each iterator, and
consuming the next input element each time it reaches the end of an
iterator.
NOTE: The order of elements yielded by this transformation is
deterministic, as long as `map_func` is a pure function. If
`map_func` contains any stateful operations, the order in which
that state is accessed is undefined.
Parameters
-
PythonFunctionContainer
map_func - A function mapping a dataset element to a dataset.
-
ImplicitContainer<T>
cycle_length - (Optional.) The number of input elements that will be
processed concurrently. If not specified, the value will be derived from
the number of available CPU cores. If the `num_parallel_calls` argument
is set to `tf.data.experimental.AUTOTUNE`, the `cycle_length` argument also
identifies the maximum degree of parallelism.
-
int
block_length - (Optional.) The number of consecutive elements to produce from each input element before cycling to another input element.
-
Nullable<int>
num_parallel_calls - (Optional.) If specified, the implementation creates a
threadpool, which is used to fetch inputs from cycle elements
asynchronously and in parallel. The default behavior is to fetch inputs
from cycle elements synchronously with no parallelism. If the value
`tf.data.experimental.AUTOTUNE` is used, then the number of parallel calls is
set dynamically based on available CPU.
Returns
-
object
Show Example
# Preprocess 4 files concurrently, and interleave blocks of 16 records from
# each file.
filenames = ["/var/data/file1.txt", "/var/data/file2.txt",...]
dataset = (Dataset.from_tensor_slices(filenames)
           .interleave(lambda x:
               TextLineDataset(x).map(parse_fn, num_parallel_calls=1),
               cycle_length=4, block_length=16))
object interleave_dyn(object map_func, ImplicitContainer<T> cycle_length, ImplicitContainer<T> block_length, object num_parallel_calls)
Maps `map_func` across this dataset, and interleaves the results. For example, you can use `Dataset.interleave()` to process many input files
concurrently; see the example below.
The `cycle_length` and `block_length` arguments control the order in which
elements are produced. `cycle_length` controls the number of input elements
that are processed concurrently. If you set `cycle_length` to 1, this
transformation will handle one input element at a time, and will produce
identical results to `tf.data.Dataset.flat_map`. In general,
this transformation will apply `map_func` to `cycle_length` input elements,
open iterators on the returned `Dataset` objects, and cycle through them
producing `block_length` consecutive elements from each iterator, and
consuming the next input element each time it reaches the end of an
iterator.
NOTE: The order of elements yielded by this transformation is
deterministic, as long as `map_func` is a pure function. If
`map_func` contains any stateful operations, the order in which
that state is accessed is undefined.
Parameters
-
object
map_func - A function mapping a dataset element to a dataset.
-
ImplicitContainer<T>
cycle_length - (Optional.) The number of input elements that will be
processed concurrently. If not specified, the value will be derived from
the number of available CPU cores. If the `num_parallel_calls` argument
is set to `tf.data.experimental.AUTOTUNE`, the `cycle_length` argument also
identifies the maximum degree of parallelism.
-
ImplicitContainer<T>
block_length - (Optional.) The number of consecutive elements to produce from each input element before cycling to another input element.
-
object
num_parallel_calls - (Optional.) If specified, the implementation creates a
threadpool, which is used to fetch inputs from cycle elements
asynchronously and in parallel. The default behavior is to fetch inputs
from cycle elements synchronously with no parallelism. If the value
`tf.data.experimental.AUTOTUNE` is used, then the number of parallel calls is
set dynamically based on available CPU.
Returns
-
object
Show Example
# Preprocess 4 files concurrently, and interleave blocks of 16 records from
# each file.
filenames = ["/var/data/file1.txt", "/var/data/file2.txt",...]
dataset = (Dataset.from_tensor_slices(filenames)
           .interleave(lambda x:
               TextLineDataset(x).map(parse_fn, num_parallel_calls=1),
               cycle_length=4, block_length=16))
object reduce(int64 initial_state, PythonFunctionContainer reduce_func)
object reduce(ValueTuple<IGraphNodeBase, object> initial_state, PythonFunctionContainer reduce_func)
object reduce(int initial_state, PythonFunctionContainer reduce_func)
object reduce(IGraphNodeBase initial_state, PythonFunctionContainer reduce_func)
object reduce_dyn(object initial_state, object reduce_func)
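These overloads differ only in the accepted type of `initial_state`. In the underlying Python `tf.data` API, `reduce` folds the whole dataset into a single element by repeatedly applying `reduce_func` to the accumulated state and the next element; a minimal sketch, assuming that standard behavior:
```
import tensorflow as tf

d = tf.data.Dataset.range(5)

# Count the elements: start from 0 and add 1 per element.
count = d.reduce(0, lambda state, _: state + 1)  # ==> 5

# Sum the elements: start from an int64 zero so dtypes match range(5).
total = d.reduce(tf.constant(0, tf.int64), lambda state, x: state + x)  # ==> 10
```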
_UnbatchDataset unbatch()
Splits elements of a dataset into multiple elements on the batch dimension. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.unbatch()`. For example, if elements of the dataset are shaped `[B, a0, a1,...]`,
where `B` may vary for each input element, then for each element in the
dataset, the unbatched dataset will contain `B` consecutive elements
of shape `[a0, a1,...]`.
Returns
-
_UnbatchDataset
- A `Dataset` transformation function, which can be passed to
`tf.data.Dataset.apply`.
Show Example
# NOTE: The following example uses `{... }` to represent the contents
# of a dataset.
a = { ['a', 'b', 'c'], ['a', 'b'], ['a', 'b', 'c', 'd'] }

a.apply(tf.data.experimental.unbatch()) == {
    'a', 'b', 'c', 'a', 'b', 'a', 'b', 'c', 'd'}
object unbatch_dyn()
Splits elements of a dataset into multiple elements on the batch dimension. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.unbatch()`. For example, if elements of the dataset are shaped `[B, a0, a1,...]`,
where `B` may vary for each input element, then for each element in the
dataset, the unbatched dataset will contain `B` consecutive elements
of shape `[a0, a1,...]`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
`tf.data.Dataset.apply`.
Show Example
# NOTE: The following example uses `{... }` to represent the contents
# of a dataset.
a = { ['a', 'b', 'c'], ['a', 'b'], ['a', 'b', 'c', 'd'] }

a.apply(tf.data.experimental.unbatch()) == {
    'a', 'b', 'c', 'a', 'b', 'a', 'b', 'c', 'd'}
object window(int size, Nullable<int> shift, int stride, bool drop_remainder)
Combines (nests of) input elements into a dataset of (nests of) windows. A "window" is a finite dataset of flat elements of size `size` (or possibly
fewer if there are not enough input elements to fill the window and
`drop_remainder` evaluates to false). The `stride` argument determines the stride of the input elements, and the
`shift` argument determines the shift of the window. For example, letting `{...}` represent a Dataset:
- `tf.data.Dataset.range(7).window(2)` produces
`{{0, 1}, {2, 3}, {4, 5}, {6}}`
- `tf.data.Dataset.range(7).window(3, 2, 1, True)` produces
`{{0, 1, 2}, {2, 3, 4}, {4, 5, 6}}`
- `tf.data.Dataset.range(7).window(3, 1, 2, True)` produces
`{{0, 2, 4}, {1, 3, 5}, {2, 4, 6}}`
Note that when the `window` transformation is applied to a dataset of
nested elements, it produces a dataset of nested windows. For example:
- `tf.data.Dataset.from_tensor_slices((range(4), range(4))).window(2)`
produces `{({0, 1}, {0, 1}), ({2, 3}, {2, 3})}`
- `tf.data.Dataset.from_tensor_slices({"a": range(4)}).window(2)`
produces `{{"a": {0, 1}}, {"a": {2, 3}}}`
Parameters
-
int
size - A `tf.int64` scalar `tf.Tensor`, representing the number of elements of the input dataset to combine into a window.
-
Nullable<int>
shift - (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the forward shift of the sliding window in each iteration. Defaults to `size`.
-
int
stride - (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the stride of the input elements in the sliding window.
-
bool
drop_remainder - (Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether a window should be dropped in case its size is smaller than `window_size`.
Returns
-
object
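To make the windowing rules above concrete, a short sketch in the underlying Python `tf.data` API (each window is itself a small dataset, so it is batched here just to materialize its contents):
```
import tensorflow as tf

# Windows of size 3, shifted by 2, keeping only full windows:
# produces {0, 1, 2}, {2, 3, 4}, {4, 5, 6}.
windows = tf.data.Dataset.range(7).window(3, shift=2, stride=1,
                                          drop_remainder=True)

# Each element of `windows` is a nested Dataset; batch each one so the
# values can be printed as plain tensors.
for w in windows.flat_map(lambda w: w.batch(3)):
    print(w.numpy())  # [0 1 2], [2 3 4], [4 5 6]
```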
object window_dyn(object size, object shift, ImplicitContainer<T> stride, ImplicitContainer<T> drop_remainder)
Combines (nests of) input elements into a dataset of (nests of) windows. A "window" is a finite dataset of flat elements of size `size` (or possibly
fewer if there are not enough input elements to fill the window and
`drop_remainder` evaluates to false). The `stride` argument determines the stride of the input elements, and the
`shift` argument determines the shift of the window. For example, letting `{...}` represent a Dataset:
- `tf.data.Dataset.range(7).window(2)` produces
`{{0, 1}, {2, 3}, {4, 5}, {6}}`
- `tf.data.Dataset.range(7).window(3, 2, 1, True)` produces
`{{0, 1, 2}, {2, 3, 4}, {4, 5, 6}}`
- `tf.data.Dataset.range(7).window(3, 1, 2, True)` produces
`{{0, 2, 4}, {1, 3, 5}, {2, 4, 6}}`
Note that when the `window` transformation is applied to a dataset of
nested elements, it produces a dataset of nested windows. For example:
- `tf.data.Dataset.from_tensor_slices((range(4), range(4))).window(2)`
produces `{({0, 1}, {0, 1}), ({2, 3}, {2, 3})}`
- `tf.data.Dataset.from_tensor_slices({"a": range(4)}).window(2)`
produces `{{"a": {0, 1}}, {"a": {2, 3}}}`
Parameters
-
object
size - A `tf.int64` scalar `tf.Tensor`, representing the number of elements of the input dataset to combine into a window.
-
object
shift - (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the forward shift of the sliding window in each iteration. Defaults to `size`.
-
ImplicitContainer<T>
stride - (Optional.) A `tf.int64` scalar `tf.Tensor`, representing the stride of the input elements in the sliding window.
-
ImplicitContainer<T>
drop_remainder - (Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether a window should be dropped in case its size is smaller than `window_size`.
Returns
-
object