Type TFRecordDataset
Namespace tensorflow.data
Parent DatasetV1Adapter
Interfaces ITFRecordDataset
A `Dataset` comprising records from one or more TFRecord files.
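For example, a minimal sketch of constructing and reading such a dataset with the Python tf.data API (assuming eager execution; the file path is a placeholder):

import tensorflow as tf

dataset = tf.data.TFRecordDataset(["/path/to/file.tfrecord"])  # placeholder path
for raw_record in dataset.take(1):
    print(raw_record)  # a scalar string tensor holding one serialized record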
Methods
- concatenate
- flat_map
- make_initializable_iterator
- map
- map
- map_dyn
- map_with_legacy_function_dyn
- padded_batch_dyn
- prefetch
- prefetch
- prefetch_dyn
Public instance methods
Dataset concatenate(Dataset dataset)
Creates a `Dataset` by concatenating the given dataset with this dataset.
Parameters
-
Dataset
dataset - `Dataset` to be concatenated.
Returns
-
Dataset
Show Example
a = Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 8)  # ==> [ 4, 5, 6, 7 ]

# The input dataset and dataset to be concatenated should have the same
# nested structures and output types.
# c = Dataset.range(8, 14).batch(2)  # ==> [ [8, 9], [10, 11], [12, 13] ]
# d = Dataset.from_tensor_slices([14.0, 15.0, 16.0])
# a.concatenate(c) and a.concatenate(d) would result in error.

a.concatenate(b)  # ==> [ 1, 2, 3, 4, 5, 6, 7 ]
object flat_map(PythonFunctionContainer map_func)
Maps `map_func` across this dataset and flattens the result.

Use `flat_map` if you want to make sure that the order of your dataset stays the same. For example, to flatten a dataset of batches into a dataset of their elements, see the example below.

`tf.data.Dataset.interleave()` is a generalization of `flat_map`, since `flat_map` produces the same output as `tf.data.Dataset.interleave(cycle_length=1)`. (A sketch of this equivalence follows the example below.)
Parameters
-
PythonFunctionContainer
map_func - A function mapping a dataset element to a dataset.
Returns
-
object
Show Example
a = Dataset.from_tensor_slices([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])

a.flat_map(lambda x: Dataset.from_tensor_slices(x + 1))  # ==>
#  [ 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
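As a hedged sketch of the equivalence noted above (values are illustrative), both pipelines below yield the same elements in the same order:

import tensorflow as tf

a = tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5, 6]])
f = lambda x: tf.data.Dataset.from_tensor_slices(x)

flat = a.flat_map(f)                     # ==> [ 1, 2, 3, 4, 5, 6 ]
inter = a.interleave(f, cycle_length=1)  # ==> [ 1, 2, 3, 4, 5, 6 ]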
Iterator make_initializable_iterator(string shared_name)
Creates an `Iterator` for enumerating the elements of this dataset. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use `for ... in dataset:` to iterate over a dataset. If using tf.estimator, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_initializable_iterator(dataset)`.

Note: The returned iterator will be in an uninitialized state, and you must run the `iterator.initializer` operation before using it, as in the example below.
Parameters
-
string
shared_name - (Optional.) If non-empty, the returned iterator will be shared under the given name across multiple sessions that share the same devices (e.g. when using a remote server).
Returns
-
Iterator
- An `Iterator` over the elements of this dataset.
Show Example
dataset = ...
iterator = dataset.make_initializable_iterator()
# ...
sess.run(iterator.initializer)
object map(PythonFunctionContainer map_func, Nullable<int> num_parallel_calls)
Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input.

The input signature of `map_func` is determined by the structure of each element in this dataset. The value or values returned by `map_func` determine the structure of each element in the returned dataset. `map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have two options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1). (A sketch of this option follows the example below.)
Parameters
-
PythonFunctionContainer
map_func - A function mapping a dataset element to another dataset element.
-
Nullable<int>
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
Show Example
a = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]

a.map(lambda x: x + 1)  # ==> [ 2, 3, 4, 5, 6 ]
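A hedged sketch of option 2 above: wrapping arbitrary Python code in tf.py_function inside `map`, with tf.data.experimental.AUTOTUNE supplied for num_parallel_calls (the upper-casing function is purely illustrative):

import tensorflow as tf

def to_upper(s):
    # Arbitrary Python code; executed outside the traced graph.
    return s.numpy().decode("utf-8").upper()

ds = tf.data.Dataset.from_tensor_slices(["hello", "world"])
ds = ds.map(
    lambda s: tf.py_function(to_upper, [s], tf.string),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
# ==> [ "HELLO", "WORLD" ]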
object map(object map_func, Nullable<int> num_parallel_calls)
Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input.

The input signature of `map_func` is determined by the structure of each element in this dataset. The value or values returned by `map_func` determine the structure of each element in the returned dataset. `map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have two options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1).
Parameters
-
object
map_func - A function mapping a dataset element to another dataset element.
-
Nullable<int>
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
Show Example
a = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]

a.map(lambda x: x + 1)  # ==> [ 2, 3, 4, 5, 6 ]
object map_dyn(object map_func, object num_parallel_calls)
Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input.

The input signature of `map_func` is determined by the structure of each element in this dataset. The value or values returned by `map_func` determine the structure of each element in the returned dataset. `map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have two options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1).
Parameters
-
object
map_func - A function mapping a dataset element to another dataset element.
-
object
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
Show Example
a = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]

a.map(lambda x: x + 1)  # ==> [ 2, 3, 4, 5, 6 ]
object map_with_legacy_function_dyn(object map_func, object num_parallel_calls)
Maps `map_func` across the elements of this dataset. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.map()`.

NOTE: This is an escape hatch for existing uses of `map` that do not work with V2 functions. New uses are strongly discouraged and existing uses should migrate to `map`, as this method will be removed in V2. (A migration sketch follows the Returns section below.)
Parameters
-
object
map_func - A function mapping a nested structure of tensors (having shapes and types defined by `self.output_shapes` and `self.output_types`) to another nested structure of tensors.
-
object
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
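As a hedged illustration of the migration path described in the deprecation note (the mapping function is purely hypothetical):

import tensorflow as tf

def parse_fn(x):  # hypothetical element-mapping function
    return x * 2

dataset = tf.data.Dataset.range(5)
# Deprecated: dataset.map_with_legacy_function(parse_fn)
dataset = dataset.map(parse_fn)  # recommended replacement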
object padded_batch_dyn(object batch_size, object padded_shapes, object padding_values, ImplicitContainer<T> drop_remainder)
Combines consecutive elements of this dataset into padded batches.

This transformation combines multiple consecutive elements of the input dataset into a single element. Like tf.data.Dataset.batch, the components of the resulting element will have an additional outer dimension, which will be `batch_size` (or `N % batch_size` for the last element if `batch_size` does not divide the number of input elements `N` evenly and `drop_remainder` is `False`). If your program depends on the batches having the same outer dimension, you should set the `drop_remainder` argument to `True` to prevent the smaller batch from being produced.

Unlike tf.data.Dataset.batch, the input elements to be batched may have different shapes, and this transformation will pad each component to the respective shape in `padded_shapes`. The `padded_shapes` argument determines the resulting shape for each dimension of each component in an output element:

* If the dimension is a constant (e.g. `tf.compat.v1.Dimension(37)`), the component will be padded out to that length in that dimension.
* If the dimension is unknown (e.g. `tf.compat.v1.Dimension(None)`), the component will be padded out to the maximum length of all elements in that dimension.

See also tf.data.experimental.dense_to_sparse_batch, which combines elements that may have different shapes into a tf.SparseTensor. (A padding sketch follows the Returns section below.)
Parameters
-
object
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
object
padded_shapes - A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. `tf.compat.v1.Dimension(None)` in a tf.TensorShape or `-1` in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
-
object
padding_values - (Optional.) A nested structure of scalar-shaped tf.Tensor, representing the padding values to use for the respective components. Defaults are `0` for numeric types and the empty string for string types.
-
ImplicitContainer<T>
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
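A hedged sketch of the padding behavior described above (shapes and values are illustrative):

import tensorflow as tf

# Elements of varying length: [1], [2, 2], [3, 3, 3]
a = tf.data.Dataset.range(1, 4)
a = a.map(lambda x: tf.fill([tf.cast(x, tf.int32)], x))

# padded_shapes=[None] pads each batch to its longest element.
a = a.padded_batch(2, padded_shapes=[None])
# ==> [ [1, 0], [2, 2] ], [ [3, 3, 3] ]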
Dataset prefetch(int buffer_size)
Creates a `Dataset` that prefetches elements from this dataset.

Note: Like other `Dataset` methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. `examples.prefetch(2)` will prefetch two elements (2 examples), while `examples.batch(20).prefetch(2)` will prefetch 2 elements (2 batches, of 20 examples each). (A usage sketch follows below.)
Parameters
-
int
buffer_size - The maximum number of elements that will be buffered when prefetching.
Returns
-
Dataset
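A hedged usage sketch of the element-vs-batch distinction noted above:

import tensorflow as tf

examples = tf.data.Dataset.range(100)
a = examples.prefetch(2)            # prefetches 2 elements (2 examples)
b = examples.batch(20).prefetch(2)  # prefetches 2 elements (2 batches of 20 examples each)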
Dataset prefetch(IGraphNodeBase buffer_size)
Creates a `Dataset` that prefetches elements from this dataset.

Note: Like other `Dataset` methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. `examples.prefetch(2)` will prefetch two elements (2 examples), while `examples.batch(20).prefetch(2)` will prefetch 2 elements (2 batches, of 20 examples each).
Parameters
-
IGraphNodeBase
buffer_size - A tf.int64 scalar tf.Tensor, representing the maximum number of elements that will be buffered when prefetching.
Returns
-
Dataset
object prefetch_dyn(object buffer_size)
Creates a `Dataset` that prefetches elements from this dataset.

Note: Like other `Dataset` methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. `examples.prefetch(2)` will prefetch two elements (2 examples), while `examples.batch(20).prefetch(2)` will prefetch 2 elements (2 batches, of 20 examples each).
Parameters
-
object
buffer_size - A tf.int64 scalar tf.Tensor, representing the maximum number of elements that will be buffered when prefetching.
Returns
-
object