Type TFRecordDataset
Namespace tensorflow.data
Parent DatasetV1Adapter
Interfaces ITFRecordDataset
A `Dataset` comprising records from one or more TFRecord files.
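For example, a minimal sketch of constructing and reading such a dataset with the Python tf.data API (assuming eager execution; the file path is a placeholder):

import tensorflow as tf

dataset = tf.data.TFRecordDataset(["/path/to/file.tfrecord"])  # placeholder path
for raw_record in dataset.take(1):
    print(raw_record)  # a scalar string tensor holding one serialized record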
Methods
- concatenate
- flat_map
- make_initializable_iterator
- map
- map
- map_dyn
- map_with_legacy_function_dyn
- padded_batch_dyn
- prefetch
- prefetch
- prefetch_dyn
Public instance methods
Dataset concatenate(Dataset dataset)
Creates a `Dataset` by concatenating the given dataset with this dataset.
Parameters
-
Dataset
dataset - `Dataset` to be concatenated.
Returns
-
Dataset
Show Example
a = Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
b = Dataset.range(4, 8)  # ==> [ 4, 5, 6, 7 ]

# The input dataset and dataset to be concatenated should have the same
# nested structures and output types.
# c = Dataset.range(8, 14).batch(2)  # ==> [ [8, 9], [10, 11], [12, 13] ]
# d = Dataset.from_tensor_slices([14.0, 15.0, 16.0])
# a.concatenate(c) and a.concatenate(d) would result in error.

a.concatenate(b)  # ==> [ 1, 2, 3, 4, 5, 6, 7 ]
object flat_map(PythonFunctionContainer map_func)
Maps `map_func` across this dataset and flattens the result.

Use `flat_map` if you want to make sure that the order of your dataset stays the same. For example, to flatten a dataset of batches into a dataset of their elements, see the example below.

`tf.data.Dataset.interleave()` is a generalization of `flat_map`, since `flat_map` produces the same output as `tf.data.Dataset.interleave(cycle_length=1)`. (A sketch of this equivalence follows the example below.)
Parameters
-
PythonFunctionContainer
map_func - A function mapping a dataset element to a dataset.
Returns
-
object
Show Example
a = Dataset.from_tensor_slices([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])

a.flat_map(lambda x: Dataset.from_tensor_slices(x + 1))  # ==>
#  [ 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
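As a hedged sketch of the equivalence noted above (values are illustrative), both pipelines below yield the same elements in the same order:

import tensorflow as tf

a = tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5, 6]])
f = lambda x: tf.data.Dataset.from_tensor_slices(x)

flat = a.flat_map(f)                     # ==> [ 1, 2, 3, 4, 5, 6 ]
inter = a.interleave(f, cycle_length=1)  # ==> [ 1, 2, 3, 4, 5, 6 ]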
Iterator make_initializable_iterator(string shared_name)
Creates an `Iterator` for enumerating the elements of this dataset. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use `for ... in dataset:` to iterate over a dataset. If using tf.estimator, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_initializable_iterator(dataset)`.

Note: The returned iterator will be in an uninitialized state, and you must run the `iterator.initializer` operation before using it, as in the example below.
Parameters
-
string
shared_name - (Optional.) If non-empty, the returned iterator will be shared under the given name across multiple sessions that share the same devices (e.g. when using a remote server).
Returns
-
Iterator
- An `Iterator` over the elements of this dataset.
Show Example
dataset = ...
iterator = dataset.make_initializable_iterator()
# ...
sess.run(iterator.initializer)
object map(PythonFunctionContainer map_func, Nullable<int> num_parallel_calls)
Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input.

The input signature of `map_func` is determined by the structure of each element in this dataset. The value or values returned by `map_func` determine the structure of each element in the returned dataset. `map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have two options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1). (A sketch of this option follows the example below.)
Parameters
-
PythonFunctionContainer
map_func - A function mapping a dataset element to another dataset element.
-
Nullable<int>
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
Show Example
a = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]

a.map(lambda x: x + 1)  # ==> [ 2, 3, 4, 5, 6 ]
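A hedged sketch of option 2 above: wrapping arbitrary Python code in tf.py_function inside `map`, with tf.data.experimental.AUTOTUNE supplied for num_parallel_calls (the upper-casing function is purely illustrative):

import tensorflow as tf

def to_upper(s):
    # Arbitrary Python code; executed outside the traced graph.
    return s.numpy().decode("utf-8").upper()

ds = tf.data.Dataset.from_tensor_slices(["hello", "world"])
ds = ds.map(
    lambda s: tf.py_function(to_upper, [s], tf.string),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
# ==> [ "HELLO", "WORLD" ]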
object map(object map_func, Nullable<int> num_parallel_calls)
Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input.

The input signature of `map_func` is determined by the structure of each element in this dataset. The value or values returned by `map_func` determine the structure of each element in the returned dataset. `map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have two options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1).
Parameters
-
object
map_func - A function mapping a dataset element to another dataset element.
-
Nullable<int>
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
Show Example
a = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]

a.map(lambda x: x + 1)  # ==> [ 2, 3, 4, 5, 6 ]
object map_dyn(object map_func, object num_parallel_calls)
Maps `map_func` across the elements of this dataset.

This transformation applies `map_func` to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input.

The input signature of `map_func` is determined by the structure of each element in this dataset. The value or values returned by `map_func` determine the structure of each element in the returned dataset. `map_func` can accept as arguments and return any type of dataset element.

Note that irrespective of the context in which `map_func` is defined (eager vs. graph), tf.data traces the function and executes it as a graph. To use Python code inside of the function you have two options:

1) Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.

2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1).
Parameters
-
object
map_func - A function mapping a dataset element to another dataset element.
-
object
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
Show Example
a = Dataset.range(1, 6)  # ==> [ 1, 2, 3, 4, 5 ]

a.map(lambda x: x + 1)  # ==> [ 2, 3, 4, 5, 6 ]
object map_with_legacy_function_dyn(object map_func, object num_parallel_calls)
Maps `map_func` across the elements of this dataset. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.map()`.

NOTE: This is an escape hatch for existing uses of `map` that do not work with V2 functions. New uses are strongly discouraged and existing uses should migrate to `map`, as this method will be removed in V2. (A migration sketch follows the Returns section below.)
Parameters
-
object
map_func - A function mapping a nested structure of tensors (having shapes and types defined by `self.output_shapes` and `self.output_types`) to another nested structure of tensors.
-
object
num_parallel_calls - (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process asynchronously in parallel. If not specified, elements will be processed sequentially. If the value tf.data.experimental.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
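As a hedged illustration of the migration path described in the deprecation note (the mapping function is purely hypothetical):

import tensorflow as tf

def parse_fn(x):  # hypothetical element-mapping function
    return x * 2

dataset = tf.data.Dataset.range(5)
# Deprecated: dataset.map_with_legacy_function(parse_fn)
dataset = dataset.map(parse_fn)  # recommended replacement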
object padded_batch_dyn(object batch_size, object padded_shapes, object padding_values, ImplicitContainer<T> drop_remainder)
Combines consecutive elements of this dataset into padded batches.

This transformation combines multiple consecutive elements of the input dataset into a single element. Like tf.data.Dataset.batch, the components of the resulting element will have an additional outer dimension, which will be `batch_size` (or `N % batch_size` for the last element if `batch_size` does not divide the number of input elements `N` evenly and `drop_remainder` is `False`). If your program depends on the batches having the same outer dimension, you should set the `drop_remainder` argument to `True` to prevent the smaller batch from being produced.

Unlike tf.data.Dataset.batch, the input elements to be batched may have different shapes, and this transformation will pad each component to the respective shape in `padded_shapes`. The `padded_shapes` argument determines the resulting shape for each dimension of each component in an output element:

* If the dimension is a constant (e.g. `tf.compat.v1.Dimension(37)`), the component will be padded out to that length in that dimension.
* If the dimension is unknown (e.g. `tf.compat.v1.Dimension(None)`), the component will be padded out to the maximum length of all elements in that dimension.

See also tf.data.experimental.dense_to_sparse_batch, which combines elements that may have different shapes into a tf.SparseTensor. (A padding sketch follows the Returns section below.)
Parameters
-
object
batch_size - A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
-
object
padded_shapes - A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. `tf.compat.v1.Dimension(None)` in a tf.TensorShape or `-1` in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
-
object
padding_values - (Optional.) A nested structure of scalar-shaped tf.Tensor, representing the padding values to use for the respective components. Defaults are `0` for numeric types and the empty string for string types.
-
ImplicitContainer<T>
drop_remainder - (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
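A hedged sketch of the padding behavior described above (shapes and values are illustrative):

import tensorflow as tf

# Elements of varying length: [1], [2, 2], [3, 3, 3]
a = tf.data.Dataset.range(1, 4)
a = a.map(lambda x: tf.fill([tf.cast(x, tf.int32)], x))

# padded_shapes=[None] pads each batch to its longest element.
a = a.padded_batch(2, padded_shapes=[None])
# ==> [ [1, 0], [2, 2] ], [ [3, 3, 3] ]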
Dataset prefetch(int buffer_size)
Creates a `Dataset` that prefetches elements from this dataset.

Note: Like other `Dataset` methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. `examples.prefetch(2)` will prefetch two elements (2 examples), while `examples.batch(20).prefetch(2)` will prefetch 2 elements (2 batches, of 20 examples each). (A usage sketch follows below.)
Parameters
-
int
buffer_size - The maximum number of elements that will be buffered when prefetching.
Returns
-
Dataset
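A hedged usage sketch of the element-vs-batch distinction noted above:

import tensorflow as tf

examples = tf.data.Dataset.range(100)
a = examples.prefetch(2)            # prefetches 2 elements (2 examples)
b = examples.batch(20).prefetch(2)  # prefetches 2 elements (2 batches of 20 examples each)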
Dataset prefetch(IGraphNodeBase buffer_size)
Creates a `Dataset` that prefetches elements from this dataset.

Note: Like other `Dataset` methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. `examples.prefetch(2)` will prefetch two elements (2 examples), while `examples.batch(20).prefetch(2)` will prefetch 2 elements (2 batches, of 20 examples each).
Parameters
-
IGraphNodeBase
buffer_size - A tf.int64 scalar tf.Tensor, representing the maximum number of elements that will be buffered when prefetching.
Returns
-
Dataset
object prefetch_dyn(object buffer_size)
Creates a `Dataset` that prefetches elements from this dataset.

Note: Like other `Dataset` methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. `examples.prefetch(2)` will prefetch two elements (2 examples), while `examples.batch(20).prefetch(2)` will prefetch 2 elements (2 batches, of 20 examples each).
Parameters
-
object
buffer_size - A tf.int64 scalar tf.Tensor, representing the maximum number of elements that will be buffered when prefetching.
Returns
-
object