Type tf.data.experimental
Namespace tensorflow
Methods
- bucket_by_sequence_length
- bucket_by_sequence_length_dyn
- bytes_produced_stats
- bytes_produced_stats
- bytes_produced_stats_dyn
- cardinality
- cardinality
- cardinality_dyn
- copy_to_device
- dense_to_sparse_batch
- from_variant
- from_variant
- from_variant_dyn
- get_next_as_optional
- get_next_as_optional
- get_next_as_optional
- get_next_as_optional
- get_next_as_optional
- get_next_as_optional_dyn
- get_single_element
- get_structure
- get_structure
- get_structure_dyn
- group_by_reducer
- group_by_window
- latency_stats
- latency_stats
- latency_stats_dyn
- make_batched_features_dataset
- make_batched_features_dataset
- make_csv_dataset
- make_saveable_from_iterator
- map_and_batch
- map_and_batch_with_legacy_function
- map_and_batch_with_legacy_function_dyn
- parallel_interleave
- parallel_interleave
- parse_example_dataset
- prefetch_to_device
- rejection_resample
- scan
- scan
- scan
- scan
- scan
- scan
- shuffle_and_repeat
- take_while
- take_while_dyn
- to_variant
- to_variant
- to_variant_dyn
Properties
- AUTOTUNE
- bucket_by_sequence_length_fn
- bytes_produced_stats_fn
- cardinality_fn
- choose_from_datasets_fn
- copy_to_device_fn
- Counter_fn
- dense_to_sparse_batch_fn
- enumerate_dataset_fn
- from_variant_fn
- get_next_as_optional_fn
- get_single_element_fn
- get_structure_fn
- group_by_reducer_fn
- group_by_window_fn
- ignore_errors_fn
- INFINITE_CARDINALITY
- latency_stats_fn
- make_batched_features_dataset_fn
- make_csv_dataset_fn
- make_saveable_from_iterator_fn
- map_and_batch_fn
- map_and_batch_with_legacy_function_fn
- parallel_interleave_fn
- parse_example_dataset_fn
- prefetch_to_device_fn
- rejection_resample_fn
- sample_from_datasets_fn
- scan_fn
- shuffle_and_repeat_fn
- take_while_fn
- to_variant_fn
- unbatch_fn
- unique_fn
- UNKNOWN_CARDINALITY
Public static methods
object bucket_by_sequence_length(object element_length_func, IEnumerable<int> bucket_boundaries, IEnumerable<int> bucket_batch_sizes, object padded_shapes, object padding_values, bool pad_to_bucket_boundary, bool no_padding, bool drop_remainder)
A transformation that buckets elements in a `Dataset` by length. Elements of the `Dataset` are grouped together by length and then are padded
and batched. This is useful for sequence tasks in which the elements have variable length.
Grouping together elements that have similar lengths reduces the total
fraction of padding in a batch which increases training step efficiency.
Parameters
-
object
element_length_func - function from element in `Dataset` to
tf.int32
, determines the length of the element, which will determine the bucket it goes into. -
IEnumerable<int>
bucket_boundaries - `list<int>`, upper length boundaries of the buckets. -
IEnumerable<int>
bucket_batch_sizes - `list<int>`, batch size per bucket. Length should be `len(bucket_boundaries) + 1`. -
object
padded_shapes - Nested structure of
tf.TensorShape
to pass to tf.data.Dataset.padded_batch
. If not provided, will use `dataset.output_shapes`, which will result in variable length dimensions being padded out to the maximum length in each batch. -
object
padding_values - Values to pad with, passed to
tf.data.Dataset.padded_batch
. Defaults to padding with 0. -
bool
pad_to_bucket_boundary - bool, if `False`, will pad dimensions with unknown size to maximum length in batch. If `True`, will pad dimensions with unknown size to bucket boundary minus 1 (i.e., the maximum length in each bucket), and caller must ensure that the source `Dataset` does not contain any elements with length longer than `max(bucket_boundaries)`.
-
bool
no_padding - `bool`, indicates whether to pad the batch features (features
need to be either of type
tf.SparseTensor
or of same shape). -
bool
drop_remainder - (Optional.) A
tf.bool
scalar tf.Tensor
, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
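For illustration, a minimal Python sketch of the equivalent TensorFlow call (the generator, length function, boundaries, and batch sizes below are made-up values, not part of this reference):
import tensorflow as tf

# Hypothetical dataset of variable-length integer sequences.
dataset = tf.data.Dataset.from_generator(
    lambda: ([1] * n for n in [3, 12, 25]), tf.int64, tf.TensorShape([None]))
dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
    element_length_func=lambda seq: tf.shape(seq)[0],  # tf.int32 length of each element
    bucket_boundaries=[10, 20],                        # buckets: <10, 10-19, >=20
    bucket_batch_sizes=[4, 4, 4]))                     # len(bucket_boundaries) + 1 sizes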
object bucket_by_sequence_length_dyn(object element_length_func, object bucket_boundaries, object bucket_batch_sizes, object padded_shapes, object padding_values, ImplicitContainer<T> pad_to_bucket_boundary, ImplicitContainer<T> no_padding, ImplicitContainer<T> drop_remainder)
object bytes_produced_stats(IEnumerable<string> tag)
Records the number of bytes produced by each element of the input dataset. To consume the statistics, associate a `StatsAggregator` with the output
dataset.
Parameters
-
IEnumerable<string>
tag - String. All statistics recorded by the returned transformation will be associated with the given `tag`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object bytes_produced_stats(string tag)
Records the number of bytes produced by each element of the input dataset. To consume the statistics, associate a `StatsAggregator` with the output
dataset.
Parameters
-
string
tag - String. All statistics recorded by the returned transformation will be associated with the given `tag`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object bytes_produced_stats_dyn(object tag)
Records the number of bytes produced by each element of the input dataset. To consume the statistics, associate a `StatsAggregator` with the output
dataset.
Parameters
-
object
tag - String. All statistics recorded by the returned transformation will be associated with the given `tag`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
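As a hedged Python sketch (TF 1.x/2.0-era API; the tag string and dataset are arbitrary), the statistics are consumed by applying the transformation and attaching a `StatsAggregator` through the dataset options:
import tensorflow as tf

dataset = tf.data.Dataset.range(100)
dataset = dataset.apply(tf.data.experimental.bytes_produced_stats("bytes_produced"))

aggregator = tf.data.experimental.StatsAggregator()
options = tf.data.Options()
options.experimental_stats.aggregator = aggregator  # collect the recorded statistics
dataset = dataset.with_options(options)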
object cardinality_dyn(object dataset)
Returns the cardinality of `dataset`, if known. The operation returns the cardinality of `dataset`. The operation may return
tf.data.experimental.INFINITE_CARDINALITY
if `dataset` contains an infinite
number of elements or tf.data.experimental.UNKNOWN_CARDINALITY
if the
analysis fails to determine the number of elements in `dataset` (e.g. when the
dataset source is a file).
Parameters
-
object
dataset - A
tf.data.Dataset
for which to determine cardinality.
Returns
-
object
- A scalar
tf.int64
`Tensor` representing the cardinality of `dataset`. If the cardinality is infinite or unknown, the operation returns the named constant `INFINITE_CARDINALITY` or `UNKNOWN_CARDINALITY`, respectively.
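A brief Python sketch of the corresponding TensorFlow call and the two sentinel constants (the example datasets are illustrative):
import tensorflow as tf

n = tf.data.experimental.cardinality(tf.data.Dataset.range(42))          # scalar int64 Tensor, 42
inf = tf.data.experimental.cardinality(tf.data.Dataset.range(1).repeat())
# inf == tf.data.experimental.INFINITE_CARDINALITY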
object copy_to_device(string target_device, string source_device)
A transformation that copies dataset elements to the given `target_device`.
Parameters
-
string
target_device - The name of a device to which elements will be copied.
-
string
source_device - The original device on which `input_dataset` will be placed.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object dense_to_sparse_batch(int batch_size, IEnumerable<int> row_shape)
A transformation that batches ragged elements into
tf.SparseTensor
s. Like `Dataset.padded_batch()`, this transformation combines multiple
consecutive elements of the dataset, which might have different
shapes, into a single element. The resulting element has three
components (`indices`, `values`, and `dense_shape`), which
comprise a tf.SparseTensor
that represents the same data. The
`row_shape` represents the dense shape of each row in the
resulting tf.SparseTensor
, to which the effective batch size is
prepended.
Parameters
-
int
batch_size - A
tf.int64
scalar tf.Tensor
, representing the number of consecutive elements of this dataset to combine in a single batch. -
IEnumerable<int>
row_shape - A
tf.TensorShape
or tf.int64
vector tensor-like object representing the equivalent dense shape of a row in the resulting tf.SparseTensor
. Each element of this dataset must have the same rank as `row_shape`, and must have size less than or equal to `row_shape` in each dimension.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
Show Example
# NOTE: The following examples use `{ ... }` to represent the
# contents of a dataset.
a = { ['a', 'b', 'c'], ['a', 'b'], ['a', 'b', 'c', 'd'] }

a.apply(tf.data.experimental.dense_to_sparse_batch(
    batch_size=2, row_shape=[6])) ==
{
    ([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]],  # indices
     ['a', 'b', 'c', 'a', 'b'],                 # values
     [2, 6]),                                   # dense_shape
    ([[0, 0], [0, 1], [0, 2], [0, 3]],
     ['a', 'b', 'c', 'd'],
     [1, 6])
}
_VariantDataset from_variant(IGraphNodeBase variant, TensorSpec structure)
Constructs a dataset from the given variant and structure.
Parameters
-
IGraphNodeBase
variant - A scalar
tf.variant
tensor representing a dataset. -
TensorSpec
structure - A
tf.data.experimental.Structure
object representing the structure of each element in the dataset.
Returns
-
_VariantDataset
- A
tf.data.Dataset
instance.
_VariantDataset from_variant(IGraphNodeBase variant, ValueTuple<TensorSpec, object, object> structure)
Constructs a dataset from the given variant and structure.
Parameters
-
IGraphNodeBase
variant - A scalar
tf.variant
tensor representing a dataset. -
ValueTuple<TensorSpec, object, object>
structure - A
tf.data.experimental.Structure
object representing the structure of each element in the dataset.
Returns
-
_VariantDataset
- A
tf.data.Dataset
instance.
object from_variant_dyn(object variant, object structure)
Constructs a dataset from the given variant and structure.
Parameters
-
object
variant - A scalar
tf.variant
tensor representing a dataset. -
object
structure - A
tf.data.experimental.Structure
object representing the structure of each element in the dataset.
Returns
-
object
- A
tf.data.Dataset
instance.
_OptionalImpl get_next_as_optional(Trackable iterator)
Returns an `Optional` that contains the next value from the iterator. If `iterator` has reached the end of the sequence, the returned `Optional`
will have no value.
Parameters
-
Trackable
iterator - A `tf.compat.v1.data.Iterator` object.
Returns
-
_OptionalImpl
- An `Optional` object representing the next value from the iterator (if it has one) or no value.
_OptionalImpl get_next_as_optional(IEnumerable<object> iterator)
Returns an `Optional` that contains the next value from the iterator. If `iterator` has reached the end of the sequence, the returned `Optional`
will have no value.
Parameters
-
IEnumerable<object>
iterator - A `tf.compat.v1.data.Iterator` object.
Returns
-
_OptionalImpl
- An `Optional` object representing the next value from the iterator (if it has one) or no value.
_OptionalImpl get_next_as_optional(IEnumerator<object> iterator)
Returns an `Optional` that contains the next value from the iterator. If `iterator` has reached the end of the sequence, the returned `Optional`
will have no value.
Parameters
-
IEnumerator<object>
iterator - A `tf.compat.v1.data.Iterator` object.
Returns
-
_OptionalImpl
- An `Optional` object representing the next value from the iterator (if it has one) or no value.
_OptionalImpl get_next_as_optional(MultiDeviceIteratorV2 iterator)
Returns an `Optional` that contains the next value from the iterator. If `iterator` has reached the end of the sequence, the returned `Optional`
will have no value.
Parameters
-
MultiDeviceIteratorV2
iterator - A `tf.compat.v1.data.Iterator` object.
Returns
-
_OptionalImpl
- An `Optional` object representing the next value from the iterator (if it has one) or no value.
_OptionalImpl get_next_as_optional(object iterator)
Returns an `Optional` that contains the next value from the iterator. If `iterator` has reached the end of the sequence, the returned `Optional`
will have no value.
Parameters
-
object
iterator - A `tf.compat.v1.data.Iterator` object.
Returns
-
_OptionalImpl
- An `Optional` object representing the next value from the iterator (if it has one) or no value.
object get_next_as_optional_dyn(object iterator)
Returns an `Optional` that contains the next value from the iterator. If `iterator` has reached the end of the sequence, the returned `Optional`
will have no value.
Parameters
-
object
iterator - A `tf.compat.v1.data.Iterator` object.
Returns
-
object
- An `Optional` object representing the next value from the iterator (if it has one) or no value.
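A hedged Python sketch using a TF 1.x-style iterator; `has_value` and `get_value` are methods of the TensorFlow `Optional` type:
import tensorflow as tf

dataset = tf.data.Dataset.range(3)
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
opt = tf.data.experimental.get_next_as_optional(iterator)
has_next = opt.has_value()   # tf.bool scalar tensor
value = opt.get_value()      # only meaningful when has_next evaluates to True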
object get_single_element(Dataset dataset)
object get_structure(object dataset_or_iterator)
Returns the type specification of an element of a `Dataset` or `Iterator`.
Parameters
-
object
dataset_or_iterator - A
tf.data.Dataset
or tf.data.Iterator
.
Returns
-
object
- A nested structure of
tf.TypeSpec
objects matching the structure of an element of `dataset_or_iterator` and specifying the type of individual components.
object get_structure(IEnumerable<IGraphNodeBase> dataset_or_iterator)
Returns the type specification of an element of a `Dataset` or `Iterator`.
Parameters
-
IEnumerable<IGraphNodeBase>
dataset_or_iterator - A
tf.data.Dataset
or tf.data.Iterator
.
Returns
-
object
- A nested structure of
tf.TypeSpec
objects matching the structure of an element of `dataset_or_iterator` and specifying the type of individual components.
object get_structure_dyn(object dataset_or_iterator)
Returns the type specification of an element of a `Dataset` or `Iterator`.
Parameters
-
object
dataset_or_iterator - A
tf.data.Dataset
or tf.data.Iterator
.
Returns
-
object
- A nested structure of
tf.TypeSpec
objects matching the structure of an element of `dataset_or_iterator` and specifying the type of individual components.
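For illustration, a small Python sketch (the element structure below is made up):
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(([1, 2, 3], ["a", "b", "c"]))
spec = tf.data.experimental.get_structure(dataset)
# e.g. a tuple of TensorSpec objects: (TensorSpec(shape=(), dtype=tf.int32, ...),
#                                      TensorSpec(shape=(), dtype=tf.string, ...))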
object group_by_reducer(PythonFunctionContainer key_func, Reducer reducer)
object group_by_window(PythonFunctionContainer key_func, PythonFunctionContainer reduce_func, Nullable<int> window_size, object window_size_func)
object latency_stats(string tag)
Records the latency of producing each element of the input dataset. To consume the statistics, associate a `StatsAggregator` with the output
dataset.
Parameters
-
string
tag - String. All statistics recorded by the returned transformation will be associated with the given `tag`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object latency_stats(IEnumerable<string> tag)
Records the latency of producing each element of the input dataset. To consume the statistics, associate a `StatsAggregator` with the output
dataset.
Parameters
-
IEnumerable<string>
tag - String. All statistics recorded by the returned transformation will be associated with the given `tag`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object latency_stats_dyn(object tag)
Records the latency of producing each element of the input dataset. To consume the statistics, associate a `StatsAggregator` with the output
dataset.
Parameters
-
object
tag - String. All statistics recorded by the returned transformation will be associated with the given `tag`.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
DatasetV1Adapter make_batched_features_dataset(IEnumerable<object> file_pattern, int batch_size, IDictionary<string, object> features, ImplicitContainer<T> reader, string label_key, object reader_args, Nullable<int> num_epochs, bool shuffle, int shuffle_buffer_size, Nullable<int> shuffle_seed, object prefetch_buffer_size, Nullable<int> reader_num_threads, Nullable<int> parser_num_threads, bool sloppy_ordering, bool drop_final_batch)
DatasetV1Adapter make_batched_features_dataset(IEnumerable<object> file_pattern, int batch_size, IDictionary<string, object> features, PythonClassContainer reader, string label_key, object reader_args, Nullable<int> num_epochs, bool shuffle, int shuffle_buffer_size, Nullable<int> shuffle_seed, object prefetch_buffer_size, Nullable<int> reader_num_threads, Nullable<int> parser_num_threads, bool sloppy_ordering, bool drop_final_batch)
DatasetV1Adapter make_csv_dataset(IEnumerable<object> file_pattern, int batch_size, object column_names, object column_defaults, object label_name, object select_columns, string field_delim, bool use_quote_delim, string na_value, bool header, Nullable<int> num_epochs, bool shuffle, int shuffle_buffer_size, object shuffle_seed, object prefetch_buffer_size, object num_parallel_reads, bool sloppy, int num_rows_for_inference, object compression_type, bool ignore_errors)
Reads CSV files into a dataset, where each element is a (features, labels)
tuple that corresponds to a batch of CSV rows. The features dictionary
maps feature column names to `Tensor`s containing the corresponding
feature data, and labels is a `Tensor` containing the batch's label data.
Parameters
-
IEnumerable<object>
file_pattern - List of files or patterns of file paths containing CSV
records. See
tf.io.gfile.glob
for pattern rules. -
int
batch_size - An int representing the number of records to combine in a single batch.
-
object
column_names - An optional list of strings that corresponds to the CSV columns, in order. One per column of the input record. If this is not provided, infers the column names from the first row of the records. These names will be the keys of the features dict of each dataset element.
-
object
column_defaults - An optional list of default values for the CSV fields. One item per selected column of the input record. Each item in the list is either a valid CSV dtype (float32, float64, int32, int64, or string), or a `Tensor` with one of the aforementioned types. The tensor can either be a scalar default value (if the column is optional), or an empty tensor (if the column is required). If a dtype is provided instead of a tensor, the column is also treated as required. If this list is not provided, tries to infer types based on reading the first num_rows_for_inference rows of files specified, and assumes all columns are optional, defaulting to `0` for numeric values and `""` for string values. If both this and `select_columns` are specified, these must have the same lengths, and `column_defaults` is assumed to be sorted in order of increasing column index.
-
object
label_name - An optional string corresponding to the label column. If provided, the data for this column is returned as a separate `Tensor` from the features dictionary, so that the dataset complies with the format expected by a `tf.Estimator.train` or `tf.Estimator.evaluate` input function.
-
object
select_columns - An optional list of integer indices or string column names, that specifies a subset of columns of CSV data to select. If column names are provided, these must correspond to names provided in `column_names` or inferred from the file header lines. When this argument is specified, only a subset of CSV columns will be parsed and returned, corresponding to the columns specified. Using this results in faster parsing and lower memory usage. If both this and `column_defaults` are specified, these must have the same lengths, and `column_defaults` is assumed to be sorted in order of increasing column index.
-
string
field_delim - An optional `string`. Defaults to `","`. Char delimiter to separate fields in a record.
-
bool
use_quote_delim - An optional bool. Defaults to `True`. If false, treats double quotation marks as regular characters inside of the string fields.
-
string
na_value - Additional string to recognize as NA/NaN.
-
bool
header - A bool that indicates whether the first rows of provided CSV files correspond to header lines with column names, and should not be included in the data.
-
Nullable<int>
num_epochs - An int specifying the number of times this dataset is repeated. If None, cycles through the dataset forever.
-
bool
shuffle - A bool that indicates whether the input should be shuffled.
-
int
shuffle_buffer_size - Buffer size to use for shuffling. A large buffer size ensures better shuffling, but increases memory usage and startup time.
-
object
shuffle_seed - Randomization seed to use for shuffling.
-
object
prefetch_buffer_size - An int specifying the number of feature batches to prefetch for performance improvement. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.
-
object
num_parallel_reads - Number of threads used to read CSV records from files. If >1, the results will be interleaved. Defaults to `1`.
-
bool
sloppy - If `True`, reading performance will be improved at the cost of non-deterministic ordering. If `False`, the order of elements produced is deterministic prior to shuffling (elements are still randomized if `shuffle=True`. Note that if the seed is set, then order of elements after shuffling is deterministic). Defaults to `False`.
-
int
num_rows_for_inference - Number of rows of a file to use for type inference if record_defaults is not provided. If None, reads all the rows of all the files. Defaults to 100.
-
object
compression_type - (Optional.) A
tf.string
scalar evaluating to one of `""` (no compression), `"ZLIB"`, or `"GZIP"`. Defaults to no compression. -
bool
ignore_errors - (Optional.) If `True`, ignores errors with CSV file parsing, such as malformed data or empty lines, and moves on to the next valid CSV record. Otherwise, the dataset raises an error and stops processing when encountering any invalid records. Defaults to `False`.
Returns
-
DatasetV1Adapter
- A dataset, where each element is a (features, labels) tuple that corresponds to a batch of `batch_size` CSV rows. The features dictionary maps feature column names to `Tensor`s containing the corresponding column data, and labels is a `Tensor` containing the column data for the label column specified by `label_name`.
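A hedged Python sketch of the corresponding TensorFlow call; the file pattern and the `label` column name are hypothetical:
import tensorflow as tf

dataset = tf.data.experimental.make_csv_dataset(
    "/path/to/data/*.csv",   # hypothetical file pattern
    batch_size=32,
    label_name="label",      # hypothetical label column
    num_epochs=1)
for features, labels in dataset.take(1):
    print(sorted(features.keys()), labels.shape)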
object map_and_batch(PythonFunctionContainer map_func, int batch_size, Nullable<int> num_parallel_batches, bool drop_remainder, Nullable<int> num_parallel_calls)
object map_and_batch_with_legacy_function(object map_func, object batch_size, object num_parallel_batches, bool drop_remainder, object num_parallel_calls)
Fused implementation of `map` and `batch`. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch()`. NOTE: This is an escape hatch for existing uses of `map_and_batch` that do not
work with V2 functions. New uses are strongly discouraged and existing uses
should migrate to `map_and_batch` as this method will not be removed in V2.
Parameters
-
object
map_func - A function mapping a nested structure of tensors to another nested structure of tensors.
-
object
batch_size - A
tf.int64
scalar tf.Tensor
, representing the number of consecutive elements of this dataset to combine in a single batch. -
object
num_parallel_batches - (Optional.) A
tf.int64
scalar tf.Tensor
, representing the number of batches to create in parallel. On one hand, higher values can help mitigate the effect of stragglers. On the other hand, higher values can increase contention if CPU is scarce. -
bool
drop_remainder - (Optional.) A
tf.bool
scalar tf.Tensor
, representing whether the last batch should be dropped in case its size is smaller than desired; the default behavior is not to drop the smaller batch. -
object
num_parallel_calls - (Optional.) A
tf.int32
scalar tf.Tensor
, representing the number of elements to process in parallel. If not specified, `batch_size * num_parallel_batches` elements will be processed in parallel. If the value tf.data.experimental.AUTOTUNE
is used, then the number of parallel calls is set dynamically based on available CPU.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
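Since the documentation above recommends migrating to `map_and_batch`, a hedged Python sketch of that call (the dataset, map function, and batch size are illustrative):
import tensorflow as tf

dataset = tf.data.Dataset.range(1000)
dataset = dataset.apply(tf.data.experimental.map_and_batch(
    map_func=lambda x: x * 2,
    batch_size=32,
    num_parallel_calls=tf.data.experimental.AUTOTUNE))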
object map_and_batch_with_legacy_function_dyn(object map_func, object batch_size, object num_parallel_batches, ImplicitContainer<T> drop_remainder, object num_parallel_calls)
object parallel_interleave(PythonFunctionContainer map_func, Nullable<int> cycle_length, int block_length, Nullable<bool> sloppy, Nullable<int> buffer_output_elements, Nullable<int> prefetch_input_elements)
object parallel_interleave(object map_func, Nullable<int> cycle_length, int block_length, Nullable<bool> sloppy, Nullable<int> buffer_output_elements, Nullable<int> prefetch_input_elements)
A parallel version of the `Dataset.interleave()` transformation. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`. `parallel_interleave()` maps `map_func` across its input to produce nested
datasets, and outputs their elements interleaved. Unlike
tf.data.Dataset.interleave
, it gets elements from `cycle_length` nested
datasets in parallel, which increases the throughput, especially in the
presence of stragglers. Furthermore, the `sloppy` argument can be used to
improve performance, by relaxing the requirement that the outputs are produced
in a deterministic order, and allowing the implementation to skip over nested
datasets whose elements are not readily available when requested. Example usage is shown below.
WARNING: If `sloppy` is `True`, the order of produced elements is not
deterministic.
Parameters
-
object
map_func - A function mapping a nested structure of tensors to a `Dataset`.
-
Nullable<int>
cycle_length - The number of input `Dataset`s to interleave from in parallel.
-
int
block_length - The number of consecutive elements to pull from an input `Dataset` before advancing to the next input `Dataset`.
-
Nullable<bool>
sloppy - If false, elements are produced in deterministic order. Otherwise, the implementation is allowed, for the sake of expediency, to produce elements in a non-deterministic order.
-
Nullable<int>
buffer_output_elements - The number of elements each iterator being interleaved should buffer (similar to the `.prefetch()` transformation for each interleaved iterator).
-
Nullable<int>
prefetch_input_elements - The number of input elements to transform to iterators before they are needed for interleaving.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
Show Example
# Preprocess 4 files concurrently.
filenames = tf.data.Dataset.list_files("/path/to/data/train*.tfrecords")
dataset = filenames.apply(
    tf.data.experimental.parallel_interleave(
        lambda filename: tf.data.TFRecordDataset(filename),
        cycle_length=4))
object parse_example_dataset(IDictionary<string, object> features, Nullable<int> num_parallel_calls)
A transformation that parses `Example` protos into a `dict` of tensors. Parses a number of serialized `Example` protos given in `serialized`. We refer
to `serialized` as a batch with `batch_size` many entries of individual
`Example` protos. This op parses serialized examples into a dictionary mapping keys to `Tensor`
and `SparseTensor` objects. `features` is a dict from keys to `VarLenFeature`,
`SparseFeature`, and `FixedLenFeature` objects. Each `VarLenFeature`
and `SparseFeature` is mapped to a `SparseTensor`, and each
`FixedLenFeature` is mapped to a `Tensor`. See
tf.io.parse_example
for more
details about feature dictionaries.
Returns
-
object
- A dataset transformation function, which can be passed to
tf.data.Dataset.apply
.
object prefetch_to_device(string device, object buffer_size)
A transformation that prefetches dataset values to the given `device`. NOTE: Although the transformation creates a
tf.data.Dataset
, the
transformation must be the final `Dataset` in the input pipeline.
Parameters
-
string
device - A string. The name of a device to which elements will be prefetched.
-
object
buffer_size - (Optional.) The number of elements to buffer on `device`. Defaults to an automatically chosen value.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
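A brief Python sketch; the device string is illustrative, and the transformation must come last in the pipeline:
import tensorflow as tf

dataset = tf.data.Dataset.range(10).batch(2)
dataset = dataset.apply(
    tf.data.experimental.prefetch_to_device("/gpu:0", buffer_size=1))
# No further transformations should be applied after prefetch_to_device.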
object rejection_resample(object class_func, IEnumerable<double> target_dist, IEnumerable<double> initial_dist, Nullable<int> seed)
A transformation that resamples a dataset to achieve a target distribution. **NOTE** Resampling is performed via rejection sampling; some fraction
of the input values will be dropped.
Parameters
-
object
class_func - A function mapping an element of the input dataset to a scalar
tf.int32
tensor. Values should be in `[0, num_classes)`. -
IEnumerable<double>
target_dist - A floating point type tensor, shaped `[num_classes]`.
-
IEnumerable<double>
initial_dist - (Optional.) A floating point type tensor, shaped `[num_classes]`. If not provided, the true class distribution is estimated live in a streaming fashion.
-
Nullable<int>
seed - (Optional.) Python integer seed for the resampler.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
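A hedged Python sketch assuming a dataset of `(features, label)` pairs with two classes; the data and target distribution are illustrative:
import tensorflow as tf

# Hypothetical imbalanced dataset of (features, label) pairs.
dataset = tf.data.Dataset.from_tensor_slices(
    ([0.1, 0.2, 0.3, 0.4], [0, 0, 0, 1]))
resampled = dataset.apply(tf.data.experimental.rejection_resample(
    class_func=lambda features, label: label,  # tf.int32 class in [0, num_classes)
    target_dist=[0.5, 0.5]))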
object scan(int64 initial_state, PythonFunctionContainer scan_func)
object scan(ValueTuple<IEnumerable<int>, int> initial_state, PythonFunctionContainer scan_func)
object scan(IEnumerable<int> initial_state, PythonFunctionContainer scan_func)
object scan(TensorArray initial_state, PythonFunctionContainer scan_func)
object scan(int initial_state, PythonFunctionContainer scan_func)
object scan(IGraphNodeBase initial_state, PythonFunctionContainer scan_func)
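For illustration, a Python sketch of a running sum with `scan`, where `scan_func` maps (old_state, input_element) to (new_state, output_element):
import tensorflow as tf

initial_state = tf.constant(0, dtype=tf.int64)
dataset = tf.data.Dataset.range(5).apply(
    tf.data.experimental.scan(
        initial_state,
        lambda state, x: (state + x, state + x)))  # emit the cumulative total
# yields 0, 1, 3, 6, 10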
object shuffle_and_repeat(Nullable<int> buffer_size, Nullable<int> count, Nullable<int> seed)
Shuffles and repeats a Dataset returning a new permutation for each epoch. (deprecated) Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.shuffle(buffer_size, seed)` followed by `tf.data.Dataset.repeat(count)`. Static tf.data optimizations will take care of using the fused implementation.
`dataset.apply(tf.data.experimental.shuffle_and_repeat(buffer_size, count))` is equivalent to `dataset.shuffle(buffer_size, reshuffle_each_iteration=True).repeat(count)`.
The difference is that the latter dataset is not serializable. So, if you need to checkpoint an input pipeline with reshuffling you must use this implementation.
Parameters
-
Nullable<int>
buffer_size - A
tf.int64
scalar tf.Tensor
, representing the maximum number elements that will be buffered when prefetching. -
Nullable<int>
count - (Optional.) A
tf.int64
scalar tf.Tensor
, representing the number of times the dataset should be repeated. The default behavior (if `count` is `None` or `-1`) is for the dataset to be repeated indefinitely. -
Nullable<int>
seed - (Optional.) A
tf.int64
scalar tf.Tensor
, representing the random seed that will be used to create the distribution. See `tf.compat.v1.set_random_seed` for behavior.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object take_while(object predicate)
A transformation that stops dataset iteration based on a `predicate`.
Parameters
-
object
predicate - A function that maps a nested structure of tensors (having shapes
and types defined by `self.output_shapes` and `self.output_types`) to a
scalar
tf.bool
tensor.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
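A minimal Python sketch (the predicate is illustrative):
import tensorflow as tf

dataset = tf.data.Dataset.range(10).apply(
    tf.data.experimental.take_while(lambda x: x < 5))
# yields 0, 1, 2, 3, 4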
object take_while_dyn(object predicate)
A transformation that stops dataset iteration based on a `predicate`.
Parameters
-
object
predicate - A function that maps a nested structure of tensors (having shapes
and types defined by `self.output_shapes` and `self.output_types`) to a
scalar
tf.bool
tensor.
Returns
-
object
- A `Dataset` transformation function, which can be passed to
tf.data.Dataset.apply
.
object to_variant(Dataset dataset)
Returns a variant representing the given dataset.
Parameters
-
Dataset
dataset - A
tf.data.Dataset
.
Returns
-
object
- A scalar
tf.variant
tensor representing the given dataset.
object to_variant(Dataset dataset)
Returns a variant representing the given dataset.
Parameters
-
Dataset
dataset - A
tf.data.Dataset
.
Returns
-
object
- A scalar
tf.variant
tensor representing the given dataset.
object to_variant_dyn(object dataset)
Returns a variant representing the given dataset.
Parameters
-
object
dataset - A
tf.data.Dataset
.
Returns
-
object
- A scalar
tf.variant
tensor representing the given dataset.
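As a hedged Python sketch, `to_variant`, `get_structure`, and `from_variant` compose into a round trip over an illustrative dataset:
import tensorflow as tf

dataset = tf.data.Dataset.range(3)
variant = tf.data.experimental.to_variant(dataset)        # scalar tf.variant tensor
structure = tf.data.experimental.get_structure(dataset)   # element type specification
rebuilt = tf.data.experimental.from_variant(variant, structure)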