Type tf.strings
Namespace tensorflow
Methods
- bytes_split
- bytes_split
- bytes_split_dyn
- format
- format
- format
- format
- format
- format
- format
- format
- format
- format
- format
- format
- format_dyn
- length
- length_dyn
- lower
- lower_dyn
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams
- ngrams_dyn
- regex_full_match
- regex_full_match
- split
- split
- split
- split
- split
- split
- split
- split
- split
- split_dyn
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- substr
- unicode_decode
- unicode_decode
- unicode_decode
- unicode_decode_dyn
- unicode_decode_with_offsets
- unicode_decode_with_offsets
- unicode_decode_with_offsets_dyn
- unicode_encode
- unicode_encode
- unicode_encode
- unicode_encode
- unicode_encode
- unicode_encode
- unicode_encode_dyn
- unicode_script
- unicode_script_dyn
- unicode_split
- unicode_split
- unicode_split
- unicode_split_dyn
- unicode_split_with_offsets
- unicode_split_with_offsets
- unicode_split_with_offsets_dyn
- unicode_transcode
- unicode_transcode_dyn
- unsorted_segment_join
- unsorted_segment_join_dyn
- upper
- upper_dyn
Properties
Public static methods
object bytes_split(RaggedTensor input, string name)
Split string elements of `input` into bytes.
Note that this op splits strings into bytes, not Unicode characters. To split strings into Unicode characters, use tf.strings.unicode_split.
See also: tf.io.decode_raw, tf.strings.split, tf.strings.unicode_split.
Parameters
-
RaggedTensor
input - A string `Tensor` or `RaggedTensor`: the strings to split. Must have a statically known rank (`N`).
-
string
name - A name for the operation (optional).
Returns
-
object
- A `RaggedTensor` of rank `N+1`: the bytes that make up the source strings.
Example
>>> tf.strings.bytes_split('hello')
['h', 'e', 'l', 'l', 'o']
>>> tf.strings.bytes_split(['hello', '123'])
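The difference between splitting into bytes and splitting into Unicode characters only shows up for non-ASCII input. The following is a minimal plain-Python sketch of that distinction for UTF-8 text. It does not call TensorFlow, and the helper names `split_bytes` and `split_unicode_chars` are hypothetical, chosen only for this illustration.

```python
from typing import List


def split_bytes(text: str) -> List[bytes]:
    """Model of byte splitting: one element per byte of the UTF-8 encoding."""
    encoded = text.encode("utf-8")
    return [encoded[i:i + 1] for i in range(len(encoded))]


def split_unicode_chars(text: str) -> List[bytes]:
    """Model of character splitting: one UTF-8 encoded element per code point."""
    return [char.encode("utf-8") for char in text]


# ASCII text: every character is exactly one byte, so both views agree.
ascii_bytes = split_bytes("hello")

# U+00E9 (e with acute accent) occupies two bytes in UTF-8.
accented_bytes = split_bytes("\u00e9")
accented_chars = split_unicode_chars("\u00e9")

print(ascii_bytes)
print(accented_bytes)
print(accented_chars)
```

For ASCII-only strings the two helpers return the same elements; for the accented character the byte view has two elements and the character view has one.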
object bytes_split(IGraphNodeBase input, string name)
Split string elements of `input` into bytes.
Note that this op splits strings into bytes, not Unicode characters. To split strings into Unicode characters, use tf.strings.unicode_split.
See also: tf.io.decode_raw, tf.strings.split, tf.strings.unicode_split.
Parameters
-
IGraphNodeBase
input - A string `Tensor` or `RaggedTensor`: the strings to split. Must have a statically known rank (`N`).
-
string
name - A name for the operation (optional).
Returns
-
object
- A `RaggedTensor` of rank `N+1`: the bytes that make up the source strings.
Example
>>> tf.strings.bytes_split('hello')
['h', 'e', 'l', 'l', 'o']
>>> tf.strings.bytes_split(['hello', '123'])
object bytes_split_dyn(object input, object name)
Split string elements of `input` into bytes.
Note that this op splits strings into bytes, not Unicode characters. To split strings into Unicode characters, use tf.strings.unicode_split.
See also: tf.io.decode_raw, tf.strings.split, tf.strings.unicode_split.
Parameters
-
object
input - A string `Tensor` or `RaggedTensor`: the strings to split. Must have a statically known rank (`N`).
-
object
name - A name for the operation (optional).
Returns
-
object
- A `RaggedTensor` of rank `N+1`: the bytes that make up the source strings.
Example
>>> tf.strings.bytes_split('hello')
['h', 'e', 'l', 'l', 'o']
>>> tf.strings.bytes_split(['hello', '123'])
Tensor format(string template, IGraphNodeBase inputs, string placeholder, PythonFunctionContainer summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
string
template - A string template to format tensor values into.
-
IGraphNodeBase
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
PythonFunctionContainer
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
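The placeholder and `summarize` rules described in the parameters can be modeled without TensorFlow. The sketch below is a simplified plain-Python model for 1-D integer inputs only; `abbreviate` and `format_template` are hypothetical helper names, and the exact spacing around the `...` marker is an assumption of this sketch rather than a statement about the op's output.

```python
from typing import List, Sequence


def abbreviate(values: Sequence[int], summarize: int = 3) -> List[str]:
    """Keep the first and last `summarize` entries; -1 keeps everything."""
    if summarize < 0 or len(values) <= 2 * summarize:
        return [str(v) for v in values]
    head = [str(v) for v in values[:summarize]]
    tail = [str(v) for v in values[len(values) - summarize:]]
    return head + ["..."] + tail


def format_template(template: str, inputs: Sequence[Sequence[int]],
                    placeholder: str = "{}", summarize: int = 3) -> str:
    """Insert one rendered 1-D input at each placeholder, in order."""
    pieces = template.split(placeholder)
    if len(pieces) - 1 != len(inputs):
        raise ValueError("placeholder count must match the number of inputs")
    result = pieces[0]
    for values, text_after in zip(inputs, pieces[1:]):
        # Rendering as "[a b c]" is a choice made for this sketch.
        result += "[" + " ".join(abbreviate(values, summarize)) + "]" + text_after
    return result


# A multi-input template: each placeholder consumes the next input.
two_inputs = format_template("first: {}, second: {}, suffix", [[0, 1], [1, 3]])

# With summarize=3, ten elements are reduced to the first and last three.
shortened = abbreviate(list(range(10)), summarize=3)

# With summarize=-1, all elements are kept.
everything = abbreviate(list(range(10)), summarize=-1)

print(two_inputs)
print(shortened)
print(everything)
```

The model raises an error when the number of placeholders and inputs differ; that check is a design choice of the sketch and is not taken from the reference text.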
Tensor format(string template, ValueTuple inputs, string placeholder, PythonFunctionContainer summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
string
template - A string template to format tensor values into.
-
ValueTuple
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
PythonFunctionContainer
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(IEnumerable<object> template, ValueTuple inputs, string placeholder, PythonFunctionContainer summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
IEnumerable<object>
template - A string template to format tensor values into.
-
ValueTuple
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
PythonFunctionContainer
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(IEnumerable<object> template, IGraphNodeBase inputs, string placeholder, PythonFunctionContainer summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
IEnumerable<object>
template - A string template to format tensor values into.
-
IGraphNodeBase
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
PythonFunctionContainer
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(IEnumerable<object> template, ValueTuple inputs, string placeholder, ImplicitContainer<T> summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
IEnumerable<object>
template - A string template to format tensor values into.
-
ValueTuple
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(IEnumerable<object> template, IEnumerable<object> inputs, string placeholder, PythonFunctionContainer summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
IEnumerable<object>
template - A string template to format tensor values into.
-
IEnumerable<object>
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
PythonFunctionContainer
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(IEnumerable<object> template, IEnumerable<object> inputs, string placeholder, ImplicitContainer<T> summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
IEnumerable<object>
template - A string template to format tensor values into.
-
IEnumerable<object>
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(IEnumerable<object> template, IGraphNodeBase inputs, string placeholder, ImplicitContainer<T> summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
IEnumerable<object>
template - A string template to format tensor values into.
-
IGraphNodeBase
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(string template, IEnumerable<object> inputs, string placeholder, ImplicitContainer<T> summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
string
template - A string template to format tensor values into.
-
IEnumerable<object>
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(string template, IEnumerable<object> inputs, string placeholder, PythonFunctionContainer summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
string
template - A string template to format tensor values into.
-
IEnumerable<object>
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
PythonFunctionContainer
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(string template, ValueTuple inputs, string placeholder, ImplicitContainer<T> summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
string
template - A string template to format tensor values into.
-
ValueTuple
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor format(string template, IGraphNodeBase inputs, string placeholder, ImplicitContainer<T> summarize, string name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
string
template - A string template to format tensor values into.
-
IGraphNodeBase
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
string
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
object format_dyn(object template, object inputs, ImplicitContainer<T> placeholder, ImplicitContainer<T> summarize, object name)
Formats a string template using a list of tensors, abbreviating tensors by only printing the first and last `summarize` elements of each dimension (recursively). If formatting only one tensor into a template, the tensor does not have to be wrapped in a list.
Parameters
-
object
template - A string template to format tensor values into.
-
object
inputs - A list of `Tensor` objects, or a single Tensor. The list of tensors to format into the template string. If a solitary tensor is passed in, the input tensor will automatically be wrapped as a list.
-
ImplicitContainer<T>
placeholder - An optional `string`. Defaults to `{}`. At each placeholder occurring in the template, a subsequent tensor will be inserted.
-
ImplicitContainer<T>
summarize - An optional `int`. Defaults to `3`. When formatting the tensors, show the first and last `summarize` entries of each tensor dimension (recursively). If set to -1, all elements of the tensor will be shown.
-
object
name - A name for the operation (optional).
Returns
-
object
- A scalar `Tensor` of type `string`.
Example (formatting a single-tensor template)
sess = tf.compat.v1.Session()
with sess.as_default():
    tensor = tf.range(10)
    formatted = tf.strings.format("tensor: {}, suffix", tensor)
    out = sess.run(formatted)
    expected = "tensor: [0 1 2... 7 8 9], suffix"
    assert(out.decode() == expected)
Tensor length(IEnumerable<Byte[]> input, string name, string unit)
String lengths of `input`. Computes the length of each string given in the input tensor.
Parameters
-
IEnumerable<Byte[]>
input - A `Tensor` of type `string`. The string for which to compute the length.
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is counted to compute string length. One of: `"BYTE"` (for the number of bytes in each string) or `"UTF8_CHAR"` (for the number of UTF-8 encoded Unicode code points in each string). Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `int32`.
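The two `unit` values give different counts only for strings that contain multi-byte characters. A minimal plain-Python sketch of the counting rule follows. It does not call TensorFlow, and `string_length` is a hypothetical helper written for this illustration.

```python
def string_length(value: bytes, unit: str = "BYTE") -> int:
    """Model of the two counting units for a single byte string."""
    if unit == "BYTE":
        # Number of bytes in the string.
        return len(value)
    if unit == "UTF8_CHAR":
        # Number of UTF-8 encoded code points. Python raises
        # UnicodeDecodeError on invalid UTF-8, whereas the reference text
        # only says results are undefined in that case.
        return len(value.decode("utf-8"))
    raise ValueError(f"unsupported unit: {unit!r}")


# "h", U+00E9, "l", "l", "o": five code points, six bytes in UTF-8.
sample = "h\u00e9llo".encode("utf-8")

byte_count = string_length(sample)
char_count = string_length(sample, unit="UTF8_CHAR")

print(byte_count, char_count)
```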
object length_dyn(object input, object name, ImplicitContainer<T> unit)
String lengths of `input`. Computes the length of each string given in the input tensor.
Parameters
-
object
input - A `Tensor` of type `string`. The string for which to compute the length.
-
object
name - A name for the operation (optional).
-
ImplicitContainer<T>
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is counted to compute string length. One of: `"BYTE"` (for the number of bytes in each string) or `"UTF8_CHAR"` (for the number of UTF-8 encoded Unicode code points in each string). Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
object
- A `Tensor` of type `int32`.
Tensor lower(IGraphNodeBase input, string encoding, string name)
Converts each string in `input` to lowercase.
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`.
-
string
encoding - An optional `string`. Defaults to `""`.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `string`.
object lower_dyn(object input, ImplicitContainer<T> encoding, object name)
Converts each string in `input` to lowercase.
Parameters
-
object
input - A `Tensor` of type `string`.
-
ImplicitContainer<T>
encoding - An optional `string`. Defaults to `""`.
-
object
name - A name for the operation (optional).
Returns
-
object
- A `Tensor` of type `string`.
object ngrams(IEnumerable<object> data, IEnumerable<int> ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by joining windows of `ngram_width` adjacent strings from the inner axis of `data` using `separator`.
The input data can be padded on both the start and end of the sequence, if desired, using the `pad_values` argument. If set, `pad_values` should contain either a tuple of strings or a single string; the 0th element of the tuple will be used to pad the left side of the sequence and the 1st element of the tuple will be used to pad the right side of the sequence. The `padding_width` arg controls how many padding values are added to each side; it defaults to `ngram_width-1`.
If this op is configured to not have padding, or if it is configured to add padding with `padding_width` set to less than `ngram_width-1`, it is possible that a sequence, or a sequence plus padding, is smaller than the ngram width. In that case, no ngrams will be generated for that sequence. This can be prevented by setting `preserve_short_sequences`, which will cause the op to always generate at least one ngram per non-empty sequence.
Parameters
-
IEnumerable<object>
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
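The windowing and padding rules above can be illustrated on a single flat sequence. The following plain-Python sketch models only what this reference describes (one `ngram_width`, one sequence, no `preserve_short_sequences`). It does not call TensorFlow, `ngrams_1d` is a hypothetical helper, and edge cases such as a `padding_width` larger than `ngram_width-1` are not modeled.

```python
from typing import List, Optional, Sequence, Tuple, Union

PadValues = Union[None, str, Tuple[str, str]]


def ngrams_1d(tokens: Sequence[str], ngram_width: int, separator: str = " ",
              pad_values: PadValues = None,
              padding_width: Optional[int] = None) -> List[str]:
    """Join every window of `ngram_width` adjacent tokens with `separator`."""
    if pad_values is None or ngram_width == 1:
        # No padding requested, or 1-grams, which are never padded.
        left_pad = right_pad = ""
        pad_count = 0
    else:
        if isinstance(pad_values, str):
            left_pad = right_pad = pad_values
        else:
            left_pad, right_pad = pad_values
        # Default padding width is ngram_width - 1.
        pad_count = ngram_width - 1 if padding_width is None else padding_width
    padded = [left_pad] * pad_count + list(tokens) + [right_pad] * pad_count
    # A non-positive window count yields an empty result.
    window_count = len(padded) - ngram_width + 1
    return [separator.join(padded[i:i + ngram_width]) for i in range(window_count)]


unpadded = ngrams_1d(["a", "b", "c"], 2)
padded = ngrams_1d(["a", "b", "c"], 2, pad_values=("LP", "RP"))
too_short = ngrams_1d(["a"], 3)

print(unpadded)
print(padded)
print(too_short)
```

With `S=3`, `ngram_width=2`, and the default `padding_width` of 1, the padded call produces `S-ngram_width+1+2*padding_width = 4` n-grams, which agrees with the `NUM_NGRAMS` formula in the Returns entry.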
object ngrams(IEnumerable<object> data, int ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by joining windows of `ngram_width` adjacent strings from the inner axis of `data` using `separator`.
The input data can be padded on both the start and end of the sequence, if desired, using the `pad_values` argument. If set, `pad_values` should contain either a tuple of strings or a single string; the 0th element of the tuple will be used to pad the left side of the sequence and the 1st element of the tuple will be used to pad the right side of the sequence. The `padding_width` arg controls how many padding values are added to each side; it defaults to `ngram_width-1`.
If this op is configured to not have padding, or if it is configured to add padding with `padding_width` set to less than `ngram_width-1`, it is possible that a sequence, or a sequence plus padding, is smaller than the ngram width. In that case, no ngrams will be generated for that sequence. This can be prevented by setting `preserve_short_sequences`, which will cause the op to always generate at least one ngram per non-empty sequence.
Parameters
-
IEnumerable<object>
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
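The effect of `preserve_short_sequences` is easiest to see on a sequence shorter than the n-gram width. The sketch below is a plain-Python model of the unpadded case only. It does not call TensorFlow, and `ngrams_preserving_short` is a hypothetical helper written for this illustration.

```python
from typing import List, Sequence


def ngrams_preserving_short(tokens: Sequence[str], ngram_width: int,
                            separator: str = " ",
                            preserve_short_sequences: bool = False) -> List[str]:
    """Unpadded n-grams, optionally keeping one n-gram for short sequences."""
    window_count = len(tokens) - ngram_width + 1
    windows = [separator.join(tokens[i:i + ngram_width])
               for i in range(window_count)]
    if not windows and preserve_short_sequences and tokens:
        # The sequence is non-empty but shorter than the window:
        # emit a single n-gram containing the entire sequence.
        return [separator.join(tokens)]
    return windows


dropped = ngrams_preserving_short(["a", "b"], 3)
kept = ngrams_preserving_short(["a", "b"], 3, preserve_short_sequences=True)
still_empty = ngrams_preserving_short([], 3, preserve_short_sequences=True)

print(dropped)
print(kept)
print(still_empty)
```

An empty input stays empty even with the flag set, matching the wording "at least one ngram per non-empty sequence".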
object ngrams(IEnumerable<object> data, int ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
IEnumerable<object>
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
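The windowing and padding rules described above can be sketched in plain Python. This is only an illustration of the documented behavior for a single flat sequence, not the TensorFlow op itself; the helper name `ngrams_sketch` is hypothetical.

```python
def ngrams_sketch(tokens, ngram_width, separator=" ",
                  pad_values=None, padding_width=None):
    """Illustrative sliding-window n-grams over one flat list of strings."""
    if pad_values is None:
        # No padding requested.
        left, right, width = None, None, 0
    else:
        # A single string pads both sides; a tuple is (left, right).
        if isinstance(pad_values, str):
            left, right = pad_values, pad_values
        else:
            left, right = pad_values
        # Default padding width is ngram_width - 1, as documented.
        width = ngram_width - 1 if padding_width is None else padding_width
    padded = [left] * width + list(tokens) + [right] * width
    # One ngram per window start; an empty range yields no ngrams.
    return [separator.join(padded[i:i + ngram_width])
            for i in range(len(padded) - ngram_width + 1)]


print(ngrams_sketch(["A", "B", "C", "D"], 2))
# ['A B', 'B C', 'C D']
print(ngrams_sketch(["A", "B", "C", "D"], 3, pad_values=("LP", "RP")))
# ['LP LP A', 'LP A B', 'A B C', 'B C D', 'C D RP', 'D RP RP']
```

With `ngram_width` of 1 the default padding width is 0, which matches the note that 1-grams are never padded.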
object ngrams(RaggedTensor data, IEnumerable<int> ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
RaggedTensor
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
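As a quick check of the `NUM_NGRAMS` formula in the Returns entry, the arithmetic can be written out as below. The helper `num_ngrams` is hypothetical, and the clamp at zero is an assumption that reflects the statement that a too-short sequence produces no ngrams.

```python
def num_ngrams(seq_len, ngram_width, padding_width=0):
    """NUM_NGRAMS = S - ngram_width + 1 + 2 * padding_width, never below 0."""
    return max(0, seq_len - ngram_width + 1 + 2 * padding_width)


# 4 tokens, bigrams, no padding: 4 - 2 + 1 = 3
print(num_ngrams(4, 2))
# 4 tokens, trigrams, padding_width = ngram_width - 1 = 2: 4 - 3 + 1 + 4 = 6
print(num_ngrams(4, 3, padding_width=2))
# 2 tokens, trigrams, no padding: sequence is too short, so 0
print(num_ngrams(2, 3))
```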
object ngrams(RaggedTensor data, IEnumerable<int> ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
RaggedTensor
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
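The effect of `preserve_short_sequences` on a sequence shorter than the ngram width can be illustrated with a small pure-Python sketch. It covers only the unpadded case, and the helper name `ngrams_preserving_short` is hypothetical rather than part of the library.

```python
def ngrams_preserving_short(tokens, ngram_width, separator=" ",
                            preserve_short_sequences=False):
    """Unpadded n-grams with an optional fallback for short sequences."""
    windows = [separator.join(tokens[i:i + ngram_width])
               for i in range(len(tokens) - ngram_width + 1)]
    # Fallback: a non-empty sequence that produced no ngram yields one
    # ngram containing the whole sequence.
    if not windows and preserve_short_sequences and tokens:
        windows = [separator.join(tokens)]
    return windows


print(ngrams_preserving_short(["A", "B"], 3))
# []
print(ngrams_preserving_short(["A", "B"], 3, preserve_short_sequences=True))
# ['A B']
print(ngrams_preserving_short([], 3, preserve_short_sequences=True))
# []
```

An empty sequence still yields nothing, in line with the wording "at least one ngram per non-empty sequence".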
object ngrams(RaggedTensor data, int ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
RaggedTensor
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(RaggedTensor data, int ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
RaggedTensor
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(IEnumerable<object> data, IEnumerable<int> ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
IEnumerable<object>
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
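When `ngram_width` is a collection, the description says ngrams of all specified arities are returned in list order. A minimal sketch of that ordering, assuming it means all ngrams for the first width followed by all ngrams for the next, might look like this (the helper `multi_width_ngrams` is hypothetical and ignores padding):

```python
def multi_width_ngrams(tokens, ngram_widths, separator=" "):
    """Concatenate unpadded n-grams for each width, in the order given."""
    out = []
    for width in ngram_widths:
        out.extend(separator.join(tokens[i:i + width])
                   for i in range(len(tokens) - width + 1))
    return out


print(multi_width_ngrams(["A", "B", "C", "D"], [2, 3]))
# ['A B', 'B C', 'C D', 'A B C', 'B C D']
print(multi_width_ngrams(["A", "B", "C", "D"], [3, 2]))
# ['A B C', 'B C D', 'A B', 'B C', 'C D']
```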
object ngrams(int data, int ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
int
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(int data, IEnumerable<int> ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
int
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(int data, IEnumerable<int> ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
int
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(int data, int ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
int
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(IGraphNodeBase data, IEnumerable<int> ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
IGraphNodeBase
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
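The shape rule `[D1...DN, S]` to `[D1...DN, NUM_NGRAMS]` says that only the inner axis is windowed and the outer dimensions are preserved. A rough pure-Python analogue, using nested lists in place of a ragged tensor (the helper `ngrams_inner_axis` is hypothetical and ignores padding), could be:

```python
def ngrams_inner_axis(data, ngram_width, separator=" "):
    """Apply unpadded n-grams to the innermost lists of a nested list."""
    if data and isinstance(data[0], list):
        # Outer dimension: recurse into each row and keep the nesting.
        return [ngrams_inner_axis(row, ngram_width, separator) for row in data]
    # Innermost axis: slide a window over the strings.
    return [separator.join(data[i:i + ngram_width])
            for i in range(len(data) - ngram_width + 1)]


rows = [["a", "b", "c"], ["d", "e"], ["f"]]
print(ngrams_inner_axis(rows, 2))
# [['a b', 'b c'], ['d e'], []]
```

Rows of different lengths produce different numbers of ngrams, which is why the result is described as ragged.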
object ngrams(IGraphNodeBase data, IEnumerable<int> ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Creates a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than `ngram_width-1`, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
IGraphNodeBase
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each non-empty input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(IGraphNodeBase data, int ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
IGraphNodeBase
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
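The windowing and padding rules described above can be sketched in plain Python for a single sequence and a single width. This is an illustration of the documented behavior only, not the TensorFlow implementation; the helper name `ngrams_sketch` and the `<s>`/`</s>` pad strings are assumptions made for the example.

```python
def ngrams_sketch(tokens, ngram_width, separator=" ", pad_values=None,
                  padding_width=None):
    """Illustrative only: one sequence, one width, plain Python lists."""
    # Resolve pad_values: None -> no padding, str -> same value on both
    # sides, tuple -> (left_pad_value, right_pad_value).
    if pad_values is None:
        n_pad, left, right = 0, "", ""
    else:
        if isinstance(pad_values, str):
            left, right = pad_values, pad_values
        else:
            left, right = pad_values
        # padding_width defaults to ngram_width - 1 when padding is enabled.
        n_pad = ngram_width - 1 if padding_width is None else padding_width
    padded = [left] * n_pad + list(tokens) + [right] * n_pad
    # Slide a window of ngram_width items over the padded sequence and
    # join each window with the separator.
    return [separator.join(padded[i:i + ngram_width])
            for i in range(len(padded) - ngram_width + 1)]


bigrams = ngrams_sketch(["a", "b", "c"], 2)
padded_bigrams = ngrams_sketch(["a", "b", "c"], 2, pad_values=("<s>", "</s>"))
too_short = ngrams_sketch(["a"], 3)  # sequence shorter than the width
```

Here `bigrams` is `["a b", "b c"]`, `padded_bigrams` gains one padded ngram on each side, and `too_short` is empty because the unpadded sequence is shorter than the ngram width.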
object ngrams(IGraphNodeBase data, int ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
IGraphNodeBase
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
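As a quick arithmetic check of the `NUM_NGRAMS` formula, the following sketch evaluates it for a few sequence lengths. The function name `num_ngrams` is hypothetical, and clamping at zero is an assumption that reflects the statement that no ngrams are generated for sequences shorter than the ngram width.

```python
def num_ngrams(seq_len, ngram_width, padding_width=0):
    """NUM_NGRAMS = S - ngram_width + 1 + 2*padding_width, clamped at 0."""
    return max(seq_len - ngram_width + 1 + 2 * padding_width, 0)


print(num_ngrams(3, 2))                   # three tokens, bigrams, no padding
print(num_ngrams(3, 2, padding_width=1))  # one pad value on each side
print(num_ngrams(1, 3))                   # sequence shorter than the width
```

With three tokens and `ngram_width=2` the formula gives 2 ngrams without padding and 4 with `padding_width=1`; a single token with `ngram_width=3` gives 0.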
object ngrams(object data, IEnumerable<int> ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
object
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
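The effect of `preserve_short_sequences` can be illustrated with a small pure-Python sketch for the unpadded case. The helper name `ngrams_preserving_short` is hypothetical and the code only mirrors the behavior described above; it is not how the op is implemented.

```python
def ngrams_preserving_short(tokens, ngram_width, separator=" ",
                            preserve_short_sequences=False):
    """Illustrative only: unpadded ngrams for one sequence."""
    windows = [separator.join(tokens[i:i + ngram_width])
               for i in range(len(tokens) - ngram_width + 1)]
    # A non-empty sequence that is too short yields one ngram holding the
    # whole sequence when preserve_short_sequences is set.
    if not windows and preserve_short_sequences and tokens:
        return [separator.join(tokens)]
    return windows


dropped = ngrams_preserving_short(["a", "b"], 3)
kept = ngrams_preserving_short(["a", "b"], 3, preserve_short_sequences=True)
empty = ngrams_preserving_short([], 3, preserve_short_sequences=True)
```

In this sketch `dropped` is empty, `kept` contains the single ngram `"a b"`, and `empty` stays empty because the description only promises an ngram for non-empty sequences.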
object ngrams(object data, IEnumerable<int> ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
object
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
IEnumerable<int>
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(object data, int ngram_width, Byte[] separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
object
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
Byte[]
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams(object data, int ngram_width, string separator, object pad_values, Nullable<int> padding_width, bool preserve_short_sequences, string name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
object
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
int
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
string
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
Nullable<int>
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
bool
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
string
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
object ngrams_dyn(object data, object ngram_width, ImplicitContainer<T> separator, object pad_values, object padding_width, ImplicitContainer<T> preserve_short_sequences, object name)
Create a tensor of n-grams based on `data`. The n-grams are created by
joining windows of `ngram_width` adjacent strings from the inner axis of `data`
using `separator`. The input data can be padded on both the start and end of the sequence, if
desired, using the `pad_values` argument. If set, `pad_values` should contain
either a tuple of strings or a single string; the 0th element of the tuple
will be used to pad the left side of the sequence and the 1st element of the
tuple will be used to pad the right side of the sequence. The `padding_width`
arg controls how many padding values are added to each side; it defaults to
`ngram_width-1`. If this op is configured to not have padding, or if it is configured to add
padding with `padding_width` set to less than ngram_width-1, it is possible
that a sequence, or a sequence plus padding, is smaller than the ngram
width. In that case, no ngrams will be generated for that sequence. This can
be prevented by setting `preserve_short_sequences`, which will cause the op
to always generate at least one ngram per non-empty sequence.
Parameters
-
object
data - A Tensor or RaggedTensor containing the source data for the ngrams.
-
object
ngram_width - The width(s) of the ngrams to create. If this is a list or tuple, the op will return ngrams of all specified arities in list order. Values must be non-Tensor integers greater than 0.
-
ImplicitContainer<T>
separator - The separator string used between ngram elements. Must be a string constant, not a Tensor.
-
object
pad_values - A tuple of (left_pad_value, right_pad_value), a single string, or None. If None, no padding will be added; if a single string, then that string will be used for both left and right padding. Values must be Python strings.
-
object
padding_width - If set, `padding_width` pad values will be added to both sides of each sequence. Defaults to `ngram_width`-1. Must be greater than 0. (Note that 1-grams are never padded, regardless of this value.)
-
ImplicitContainer<T>
preserve_short_sequences - If true, then ensure that at least one ngram is generated for each input sequence. In particular, if an input sequence is shorter than `min(ngram_width) + 2*padding_width`, then generate a single ngram containing the entire sequence. If false, then no ngrams are generated for these short input sequences.
-
object
name - The op name.
Returns
-
object
- A RaggedTensor of ngrams. If `data.shape=[D1...DN, S]`, then `output.shape=[D1...DN, NUM_NGRAMS]`, where `NUM_NGRAMS=S-ngram_width+1+2*padding_width`.
Tensor regex_full_match(IGraphNodeBase input, string pattern, string name)
Check if the input matches the regex pattern. The input is a string tensor of any shape. The pattern is a scalar
string tensor which is applied to every element of the input tensor.
The boolean values (True or False) of the output tensor indicate
if the input matches the regex pattern provided. The pattern follows the RE2 syntax (https://github.com/google/re2/wiki/Syntax).
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. A string tensor of the text to be processed.
-
string
pattern - A `Tensor` of type `string`. A scalar string tensor containing the regular expression to match the input.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `bool`.
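The element-wise, whole-string matching described above can be approximated in plain Python. This sketch uses Python's `re` module rather than RE2, so it only agrees with the op for patterns whose syntax both engines share; the helper name `regex_full_match_sketch` is hypothetical.

```python
import re


def regex_full_match_sketch(strings, pattern):
    """Illustrative only: apply one pattern to every string, full match."""
    compiled = re.compile(pattern)
    # fullmatch succeeds only if the pattern consumes the entire string.
    return [compiled.fullmatch(s) is not None for s in strings]


results = regex_full_match_sketch(["abc", "abcd", "xabc"], r"ab.")
```

Only `"abc"` matches in full here; `"abcd"` and `"xabc"` contain a match but are not matched in their entirety, so their entries are `False`.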
Tensor regex_full_match(IGraphNodeBase input, ValueTuple<Byte[], string> pattern, string name)
Check if the input matches the regex pattern. The input is a string tensor of any shape. The pattern is a scalar
string tensor which is applied to every element of the input tensor.
The boolean values (True or False) of the output tensor indicate
if the input matches the regex pattern provided. The pattern follows the RE2 syntax (https://github.com/google/re2/wiki/Syntax).
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. A string tensor of the text to be processed.
-
ValueTuple<Byte[], string>
pattern - A `Tensor` of type `string`. A scalar string tensor containing the regular expression to match the input.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `bool`.
object split(IEnumerable<object> input, object sep, int maxsplit, string result_type, IEnumerable<object> source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
IEnumerable<object>
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter character.
-
int
maxsplit - An `int`. If `maxsplit > 0`, limit of the split of the result.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
IEnumerable<object>
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
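Because the separator rules are stated to match Python's `str.split`, they can be checked directly in plain Python without TensorFlow. The input strings below are taken from the description or made up for illustration.

```python
# With an explicit separator, consecutive delimiters delimit empty strings.
with_sep = "1<>2<><>3".split("<>")

# With no separator, runs of whitespace act as one separator and leading
# or trailing whitespace produces no empty strings.
whitespace = "  hello   world ".split()

print(with_sep)    # ['1', '2', '', '3']
print(whitespace)  # ['hello', 'world']
```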
object split(IGraphNodeBase input, object sep, int maxsplit, string result_type, IGraphNodeBase source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
IGraphNodeBase
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter character.
-
int
maxsplit - An `int`. If `maxsplit > 0`, limit of the split of the result.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
IGraphNodeBase
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
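The two `result_type` layouts carry the same tokens. The sketch below, in plain Python with a hypothetical helper `ragged_to_sparse_parts`, shows how per-string token rows (the ragged view) correspond to the `indices`, `values`, and `dense_shape` fields shown in the sparse example; it is an illustration, not TensorFlow code.

```python
def ragged_to_sparse_parts(rows):
    """Illustrative only: derive sparse-style fields from ragged rows."""
    # One [row, column] index per token, in row-major order.
    indices = [[r, c] for r, row in enumerate(rows) for c in range(len(row))]
    values = [token for row in rows for token in row]
    # The dense shape is [number of rows, length of the longest row].
    dense_shape = [len(rows), max((len(row) for row in rows), default=0)]
    return indices, values, dense_shape


rows = [s.split() for s in ["hello world", "a b c"]]
indices, values, dense_shape = ragged_to_sparse_parts(rows)
```

For the two input strings this reproduces the indices, values, and `[2, 3]` dense shape from the example above.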
object split(IGraphNodeBase input, object sep, int maxsplit, string result_type, int source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
IGraphNodeBase
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter character.
-
int
maxsplit - An `int`. If `maxsplit > 0`, limit of the split of the result.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
int
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
object split(IEnumerable<object> input, object sep, int maxsplit, string result_type, int source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
IEnumerable<object>
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter character.
-
int
maxsplit - An `int`. If `maxsplit > 0`, limit of the split of the result.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
int
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
object split(IGraphNodeBase input, object sep, int maxsplit, string result_type, IEnumerable<object> source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
IGraphNodeBase
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter character.
-
int
maxsplit - An `int`. If `maxsplit > 0`, limit of the split of the result.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
IEnumerable<object>
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
object split(IEnumerable<object> input, object sep, int maxsplit, string result_type, IGraphNodeBase source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
IEnumerable<object>
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter character.
-
int
maxsplit - An `int`. If `maxsplit > 0`, limit of the split of the result.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
IGraphNodeBase
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
object split(int input, object sep, int maxsplit, string result_type, IGraphNodeBase source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each
element of `input` based on `sep` and return a `SparseTensor` or
`RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, `input` of `"1<>2<><>3"` and
`sep` of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace characters are regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace. This behavior matches Python's `str.split`.
Parameters
-
int
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter string.
-
int
maxsplit - An `int`. If `maxsplit > 0`, the maximum number of splits to perform on each string.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
IGraphNodeBase
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
<tf.RaggedTensor [['hello', 'world'], ['a', 'b', 'c']]>
object split(int input, object sep, int maxsplit, string result_type, int source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each element of `input` based on `sep` and return a `SparseTensor` or `RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings. For example, an `input` of `"1<>2<><>3"` and a `sep` of `"<>"` returns `["1", "2", "", "3"]`.
If `sep` is None or an empty string, consecutive whitespace characters are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
Note that this behavior matches Python's `str.split`.
Parameters
-
int
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter string.
-
int
maxsplit - An `int`. If `maxsplit > 0`, the maximum number of splits to perform on each string.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
int
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
<tf.RaggedTensor [['hello', 'world'], ['a', 'b', 'c']]>
object split(int input, object sep, int maxsplit, string result_type, IEnumerable<object> source, string name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each element of `input` based on `sep` and return a `SparseTensor` or `RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings. For example, an `input` of `"1<>2<><>3"` and a `sep` of `"<>"` returns `["1", "2", "", "3"]`.
If `sep` is None or an empty string, consecutive whitespace characters are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
Note that this behavior matches Python's `str.split`.
Parameters
-
int
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter string.
-
int
maxsplit - An `int`. If `maxsplit > 0`, the maximum number of splits to perform on each string.
-
string
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
IEnumerable<object>
source - alias for "input" argument.
-
string
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
<tf.RaggedTensor [['hello', 'world'], ['a', 'b', 'c']]>
object split_dyn(object input, object sep, ImplicitContainer<T> maxsplit, ImplicitContainer<T> result_type, object source, object name)
Split elements of `input` based on `sep`. Let N be the size of `input` (typically N will be the batch size). Split each element of `input` based on `sep` and return a `SparseTensor` or `RaggedTensor` containing the split tokens. Empty tokens are ignored.
If `sep` is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings. For example, an `input` of `"1<>2<><>3"` and a `sep` of `"<>"` returns `["1", "2", "", "3"]`.
If `sep` is None or an empty string, consecutive whitespace characters are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
Note that this behavior matches Python's `str.split`.
Parameters
-
object
input - A string `Tensor` of rank `N`, the strings to split. If `rank(input)` is not known statically, then it is assumed to be `1`.
-
object
sep - `0-D` string `Tensor`, the delimiter string.
-
ImplicitContainer<T>
maxsplit - An `int`. If `maxsplit > 0`, the maximum number of splits to perform on each string.
-
ImplicitContainer<T>
result_type - The tensor type for the result: one of `"RaggedTensor"` or `"SparseTensor"`.
-
object
source - alias for "input" argument.
-
object
name - A name for the operation (optional).
Returns
-
object
- A `SparseTensor` or `RaggedTensor` of rank `N+1`, the strings split according to the delimiter.
Show Example
>>> tf.strings.split(['hello world', 'a b c'])
tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
                values=['hello', 'world', 'a', 'b', 'c'],
                dense_shape=[2, 3])
>>> tf.strings.split(['hello world', 'a b c'], result_type="RaggedTensor")
<tf.RaggedTensor [['hello', 'world'], ['a', 'b', 'c']]>
Tensor substr(IEnumerable<object> input, ndarray pos, ndarray len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
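The per-string rules above (negative `pos`, clipping an over-long `len`, and the out-of-range error) can be sketched for the byte unit in plain Python. The helper `substr_bytes` is hypothetical, `ValueError` stands in for `InvalidArgumentError`, and the exact boundary check is an assumption rather than something the text specifies.

```python
def substr_bytes(s, pos, length):
    """Byte-wise substring following the rules described above (a sketch)."""
    # Assumption: a position counts as in range when abs(pos) <= len(s).
    if abs(pos) > len(s):
        # Stand-in for the InvalidArgumentError mentioned in the text.
        raise ValueError(f"pos {pos} out of range for {s!r}")
    # A negative pos counts backwards from the end of the string.
    start = pos if pos >= 0 else len(s) + pos
    # Slicing clips at the end, so an over-long length takes what is available.
    return s[start:start + length]


print(substr_bytes(b"Hello", 1, 3))   # b'ell'
print(substr_bytes(b"Hello", 3, 10))  # b'lo'
print(substr_bytes(b"Hello", -3, 2))  # b'll'
```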
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
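The difference between the two `unit` values described above shows up as soon as a string contains a multi-byte character. The following plain-Python sketch contrasts slicing by bytes with slicing by code points; the sample string is made up, and the code only mimics the two interpretations, it does not call the op.

```python
text = "h\u00e9llo".encode("utf-8")  # the second character takes two bytes

# Byte units: position 1, length 2 covers exactly the two bytes of that character.
by_bytes = text[1:3]
print(by_bytes)  # b'\xc3\xa9'

# Character units: decode first, slice by code point, then re-encode.
by_chars = text.decode("utf-8")[1:3].encode("utf-8")
print(by_chars)  # b'\xc3\xa9l'
```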
Tensor substr(Byte[] input, ndarray pos, ndarray len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
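The two broadcasting cases above can be reproduced with plain Python lists, which may make the pairing of elements easier to follow. The helper `take` is hypothetical and handles only non-negative, in-range positions; the code covers just these two cases, not general broadcasting.

```python
def take(s, pos, length):
    # Byte-wise substring for non-negative, in-range positions only.
    return s[pos:pos + length]


# Case 1: a 1-D pos/len of size 3 is applied to every row of a 4x3 input,
# so column j always uses position[j] and length[j].
grid = [[b'ten', b'eleven', b'twelve'],
        [b'thirteen', b'fourteen', b'fifteen'],
        [b'sixteen', b'seventeen', b'eighteen'],
        [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
rows_out = [[take(s, p, n) for s, p, n in zip(row, position, length)]
            for row in grid]
print(rows_out[0])  # [b'e', b'ev', b'lve']

# Case 2: a scalar input is paired with each (pos, len) entry.
scalar_out = [take(b'thirteen', p, n) for p, n in zip([1, 5, 7], [3, 2, 1])]
print(scalar_out)  # [b'hir', b'ee', b'n']
```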
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, double pos, double len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, int pos, int len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, int pos, ndarray len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, int pos, double len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, ndarray pos, int len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, ndarray pos, ndarray len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, ndarray pos, double len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, double pos, int len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IGraphNodeBase input, double pos, ndarray len, string name, string unit)
Return substrings from a `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples: `pos` and `len` may be scalars, or may have the same shape as `input`, or may be broadcast as in the two cases here.
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
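As a rough illustration of the position and length rules described for this overload, the following plain-Python sketch mimics the `BYTE` unit on a single `bytes` value. It is not the TensorFlow kernel: the helper name `substr_bytes` is hypothetical, `ValueError` stands in for `InvalidArgumentError`, and accepting a position equal to the string length (which yields an empty result) is an assumption of the sketch. A non-negative length is assumed throughout.

```python
def substr_bytes(s: bytes, pos: int, length: int) -> bytes:
    """Illustrative stand-in for the documented BYTE-unit rules (hypothetical helper)."""
    n = len(s)
    # A negative pos counts backwards from the end of the string.
    start = pos + n if pos < 0 else pos
    # Out-of-range positions are rejected. The op reports InvalidArgumentError;
    # this sketch raises ValueError. Allowing start == n is an assumption.
    if start < 0 or start > n:
        raise ValueError(f"pos {pos} out of range for string {s!r}")
    # Slicing stops at the end of the string, so an over-long length
    # returns as many bytes as are available.
    return s[start:start + length]


print([substr_bytes(s, 1, 3) for s in [b"Hello", b"World"]])  # [b'ell', b'orl']
print(substr_bytes(b"Hello", -3, 2))   # b'll'
print(substr_bytes(b"Hello", 3, 10))   # b'lo'
```

The first call reproduces the scalar example above; the other two show a negative position and a length that runs past the end of the string.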
Tensor substr(IGraphNodeBase input, double pos, double len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
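The two broadcasting cases in the description can be traced by hand with ordinary Python slicing. The sketch below is only an illustration under stated assumptions: the helpers `substr_columns` and `substr_shared_input` are hypothetical names, positions are assumed non-negative and in range, and the code covers just the two documented shapes rather than general broadcasting.

```python
def substr_columns(rows, positions, lengths):
    """Apply positions[j] and lengths[j] to column j of every row (hypothetical helper)."""
    return [
        [s[p:p + n] for s, p, n in zip(row, positions, lengths)]
        for row in rows
    ]


def substr_shared_input(s, positions, lengths):
    """Apply each (position, length) pair to the same string (hypothetical helper)."""
    return [s[p:p + n] for p, n in zip(positions, lengths)]


grid = [[b'ten', b'eleven', b'twelve'],
        [b'thirteen', b'fourteen', b'fifteen'],
        [b'sixteen', b'seventeen', b'eighteen'],
        [b'nineteen', b'twenty', b'twentyone']]

# pos and len of shape [3] are stretched across the four rows of the input.
print(substr_columns(grid, [1, 2, 3], [1, 2, 3]))

# A scalar input is reused for every (pos, len) pair.
print(substr_shared_input(b'thirteen', [1, 5, 7], [3, 2, 1]))  # [b'hir', b'ee', b'n']
```

Both calls give the same values as the `output` lines in the description, which makes it easier to see which position and length are paired with which string.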
Tensor substr(Byte[] input, ndarray pos, double len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
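The difference between the `"BYTE"` and `"UTF8_CHAR"` values of `unit` shows up as soon as a string contains a multi-byte character. The following plain-Python sketch is a hedged illustration, not the op itself: `substr_utf8_chars` is a hypothetical helper, positions are assumed non-negative, and Python raises `UnicodeDecodeError` on invalid UTF-8 where the op's results are documented as undefined.

```python
def substr_utf8_chars(data: bytes, pos: int, length: int) -> bytes:
    """Slice by Unicode code points instead of bytes (hypothetical helper)."""
    text = data.decode("utf-8")          # requires structurally valid UTF-8
    return text[pos:pos + length].encode("utf-8")


data = "h\u00e9llo".encode("utf-8")      # 5 code points, 6 bytes

# Counting in bytes: position 1, length 3 covers the two bytes of the
# accented letter plus one more byte.
print(data[1:4])                         # b'\xc3\xa9l'

# Counting in code points: position 1, length 3 covers three characters.
print(substr_utf8_chars(data, 1, 3))     # b'\xc3\xa9ll'
```

With the same position and length, the byte-based slice returns one character fewer because the accented letter occupies two bytes.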
Tensor substr(IEnumerable<object> input, int pos, int len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, int pos, double len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, ndarray pos, int len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, ndarray pos, double len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, double pos, int len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, double pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, double pos, double len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(Byte[] input, int pos, int len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(Byte[] input, int pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(Byte[] input, int pos, double len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(Byte[] input, ndarray pos, int len, string name, string unit)
Return substrings from `Tensor` of strings.
For each string in the input `Tensor`, creates a substring starting at index `pos` with a total length of `len`.
If `len` defines a substring that would extend beyond the length of the input string, then as many characters as possible are used.
A negative `pos` indicates distance within the string backwards from the end.
If `pos` specifies an index which is out of range for any of the input strings, then an `InvalidArgumentError` is thrown.
`pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on Op creation.
*NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting [here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]

output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]

output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(IEnumerable<object> input, int pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
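The two broadcasting cases above can be reproduced in plain Python. This is only a sketch of the documented `BYTE` semantics, not the TensorFlow kernel; the helper names `substr_broadcast_rows` and `substr_broadcast_input` are hypothetical.

```python
def substr_broadcast_rows(rows, positions, lengths):
    """Apply positions[j] and lengths[j] to column j of every row."""
    return [
        [s[p:p + n] for s, p, n in zip(row, positions, lengths)]
        for row in rows
    ]


def substr_broadcast_input(s, positions, lengths):
    """Apply every (position, length) pair to the same scalar byte string."""
    return [s[p:p + n] for p, n in zip(positions, lengths)]


rows = [[b'ten', b'eleven', b'twelve'],
        [b'thirteen', b'fourteen', b'fifteen'],
        [b'sixteen', b'seventeen', b'eighteen'],
        [b'nineteen', b'twenty', b'twentyone']]

# `pos` and `len` of shape [3] broadcast across the four rows of `input`.
matrix_output = substr_broadcast_rows(rows, [1, 2, 3], [1, 2, 3])

# A scalar `input` broadcast against `pos` and `len` of shape [3].
vector_output = substr_broadcast_input(b'thirteen', [1, 5, 7], [3, 2, 1])

print(matrix_output)
print(vector_output)
```

Both results match the `output` values listed in the examples above.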
Parameters
-
IEnumerable<object>
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
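The difference between the two `unit` values is easiest to see on a string with multi-byte characters. The following plain-Python sketch mimics the documented behaviour without calling TensorFlow; `substr_byte` and `substr_utf8_char` are hypothetical helper names, and the sample text is an assumption chosen for illustration.

```python
def substr_byte(s, pos, length):
    """unit="BYTE": `pos` and `length` count bytes."""
    return s[pos:pos + length]


def substr_utf8_char(s, pos, length):
    """unit="UTF8_CHAR": `pos` and `length` count Unicode code points."""
    chars = s.decode('utf-8')
    return chars[pos:pos + length].encode('utf-8')


# 'Gr\u00fc\u00dfe' has 5 code points but 7 bytes in UTF-8,
# because U+00FC and U+00DF take two bytes each.
data = 'Gr\u00fc\u00dfe'.encode('utf-8')

byte_result = substr_byte(data, 1, 2)       # cuts U+00FC in half
char_result = substr_utf8_char(data, 1, 2)  # keeps U+00FC whole

print(len(data), byte_result, char_result)
```

With `BYTE` the result can end in the middle of a character and so is no longer valid UTF-8, which is worth keeping in mind before decoding the output.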
Tensor substr(Byte[] input, double pos, int len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
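The rules for a scalar `pos` and `len` (clamping at the end of the string, negative positions, and the out-of-range error) can be sketched in plain Python. This is an emulation under stated assumptions, not the TensorFlow implementation: `substr_bytes` is a hypothetical helper, it raises `ValueError` in place of `InvalidArgumentError`, and it assumes a start position equal to the string length is still in range.

```python
def substr_bytes(strings, pos, length):
    """Emulate the documented scalar `pos`/`len` behaviour with unit="BYTE"."""
    out = []
    for s in strings:
        # A negative `pos` counts backwards from the end of the string.
        start = pos + len(s) if pos < 0 else pos
        # Assumption: start == len(s) is allowed and yields an empty string.
        if start < 0 or start > len(s):
            raise ValueError(
                f"pos {pos} out of range for string of length {len(s)}")
        # Python slicing clamps when `length` runs past the end of the string.
        out.append(s[start:start + length])
    return out


result = substr_bytes([b'Hello', b'World'], 1, 3)
clamped = substr_bytes([b'Hello'], 3, 10)   # only two bytes are available
from_end = substr_bytes([b'Hello'], -2, 2)  # starts two bytes before the end

print(result, clamped, from_end)
```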
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, double pos, int len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(Byte[] input, double pos, double len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, int pos, int len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, int pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, int pos, double len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
int
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(Byte[] input, double pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
Byte[]
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, ndarray pos, int len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
int
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, ndarray pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, ndarray pos, double len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
ndarray
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
double
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
Tensor substr(PythonClassContainer input, double pos, ndarray len, string name, string unit)
Return substrings from `Tensor` of strings. For each string in the input `Tensor`, creates a substring starting at index
`pos` with a total length of `len`. If `len` defines a substring that would extend beyond the length of the input
string, then as many characters as possible are used. A negative `pos` indicates distance within the string backwards from the end. If `pos` specifies an index which is out of range for any of the input strings,
then an `InvalidArgumentError` is thrown. `pos` and `len` must have the same shape, otherwise a `ValueError` is thrown on
Op creation. *NOTE*: `Substr` supports broadcasting up to two dimensions. More about broadcasting
[here](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
Examples
Using scalar `pos` and `len`:
Using `pos` and `len` with same shape as `input`:
Broadcasting `pos` and `len` onto `input`:
```
input = [[b'ten', b'eleven', b'twelve'],
         [b'thirteen', b'fourteen', b'fifteen'],
         [b'sixteen', b'seventeen', b'eighteen'],
         [b'nineteen', b'twenty', b'twentyone']]
position = [1, 2, 3]
length = [1, 2, 3]
output = [[b'e', b'ev', b'lve'],
          [b'h', b'ur', b'tee'],
          [b'i', b've', b'hte'],
          [b'i', b'en', b'nty']]
```
Broadcasting `input` onto `pos` and `len`:
```
input = b'thirteen'
position = [1, 5, 7]
length = [3, 2, 1]
output = [b'hir', b'ee', b'n']
```
Parameters
-
PythonClassContainer
input - A `Tensor` of type `string`. Tensor of strings
-
double
pos - A `Tensor`. Must be one of the following types: `int32`, `int64`. Scalar defining the position of first character in each substring
-
ndarray
len - A `Tensor`. Must have the same type as `pos`. Scalar defining the number of characters to include in each substring
-
string
name - A name for the operation (optional).
-
string
unit - An optional `string` from: `"BYTE", "UTF8_CHAR"`. Defaults to `"BYTE"`. The unit that is used to create the substring. One of: `"BYTE"` (for defining position and length by bytes) or `"UTF8_CHAR"` (for the UTF-8 encoded Unicode code points). The default is `"BYTE"`. Results are undefined if `unit=UTF8_CHAR` and the `input` strings do not contain structurally valid UTF-8.
Returns
-
Tensor
- A `Tensor` of type `string`.
Example (scalar `pos` and `len`):
input = [b'Hello', b'World']
position = 1
length = 3
output = [b'ell', b'orl']
object unicode_decode(IGraphNodeBase input, string input_encoding, string errors, int replacement_char, bool replace_control_characters, string name)
Decodes each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the Unicode codepoint for the `j`th character in
`input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
IGraphNodeBase
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of:
* `'strict'`: Raise an exception for any illegal substrings.
* `'replace'`: Replace illegal substrings with `replacement_char`.
* `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
bool
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `int32` tensor with shape `[D1...DN, (num_chars)]`.
The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.
#### Example:
```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_decode(input, 'UTF-8').tolist()
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
```
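The mapping from strings to code points can be checked without TensorFlow. The sketch below uses only the Python standard library to reproduce the values in the example; `decode_to_code_points` is a hypothetical helper, not part of `tf.strings`.

```python
def decode_to_code_points(strings, encoding='utf-8'):
    """Return one list of Unicode code points per input byte string."""
    return [[ord(ch) for ch in s.decode(encoding)] for s in strings]


inputs = [s.encode('utf8') for s in ('G\xf6\xf6dnight', '\U0001f60a')]
code_points = decode_to_code_points(inputs)

# The rows have different lengths (9 and 1), which is why a non-scalar
# input yields a ragged result rather than a rectangular tensor.
print(code_points)
```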
object unicode_decode(IEnumerable<Byte[]> input, string input_encoding, string errors, int replacement_char, bool replace_control_characters, string name)
Decodes each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the Unicode codepoint for the `j`th character in
`input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
IEnumerable<Byte[]>
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of:
* `'strict'`: Raise an exception for any illegal substrings.
* `'replace'`: Replace illegal substrings with `replacement_char`.
* `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
bool
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `int32` tensor with shape `[D1...DN, (num_chars)]`.
The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.
#### Example:
```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_decode(input, 'UTF-8').tolist()
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
```
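Python's own `bytes.decode` accepts the same three `errors` names, so it gives a rough feel for the modes described above. This is an analogy only: the exact number of replacement code points TensorFlow emits for a longer invalid sequence may differ, and `decode_with_errors` is a hypothetical helper.

```python
def decode_with_errors(data, errors):
    """Decode UTF-8 bytes to code points using Python's error handlers."""
    return [ord(ch) for ch in data.decode('utf-8', errors=errors)]


bad = b'A\xffB'  # 0xFF can never appear in valid UTF-8

# 'replace' substitutes U+FFFD (65533) for the invalid byte.
replaced = decode_with_errors(bad, 'replace')

# 'ignore' drops the invalid byte.
ignored = decode_with_errors(bad, 'ignore')

# 'strict' raises instead of returning a result.
try:
    decode_with_errors(bad, 'strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(replaced, ignored, strict_failed)
```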
object unicode_decode(ndarray input, string input_encoding, string errors, int replacement_char, bool replace_control_characters, string name)
Decodes each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the Unicode codepoint for the `j`th character in
`input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
ndarray
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of:
* `'strict'`: Raise an exception for any illegal substrings.
* `'replace'`: Replace illegal substrings with `replacement_char`.
* `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
bool
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `int32` tensor with shape `[D1...DN, (num_chars)]`.
The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.
#### Example:
```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_decode(input, 'UTF-8').tolist()
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
```
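The effect of `replace_control_characters` can be sketched as a post-processing step over decoded code points. The helper `replace_c0_controls` is hypothetical, and its default of `0xFFFD` for the replacement code point is an assumption made for this illustration rather than something stated on this page.

```python
def replace_c0_controls(code_points, replacement_char=0xFFFD):
    """Swap C0 control characters (U+0000 - U+001F) for `replacement_char`."""
    return [
        replacement_char if 0x00 <= cp <= 0x1F else cp
        for cp in code_points
    ]


# Tab (U+0009) and newline (U+000A) both fall in the C0 range.
decoded = [ord(ch) for ch in b'a\tb\n'.decode('utf-8')]
cleaned = replace_c0_controls(decoded)

print(decoded, cleaned)
```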
object unicode_decode_dyn(object input, object input_encoding, ImplicitContainer<T> errors, ImplicitContainer<T> replacement_char, ImplicitContainer<T> replace_control_characters, object name)
Decodes each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the Unicode codepoint for the `j`th character in
`input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
object
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
object
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
ImplicitContainer<T>
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
ImplicitContainer<T>
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
ImplicitContainer<T>
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
object
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `int32` tensor with shape `[D1...DN, (num_chars)]`. The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_decode(input, 'UTF-8').to_list()
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
```
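To make the `errors` and `replace_control_characters` options of `unicode_decode` more concrete, the following is a minimal pure-Python sketch of the documented behavior for a single UTF-8 string. It uses Python's built-in codec error handlers as a stand-in and is not TensorFlow's implementation. The helper name `decode_codepoints` and its default argument values are assumptions made for illustration, and the number of replacement codepoints emitted for a given malformed sequence may differ from what the op produces.

```python
def decode_codepoints(data, errors="replace", replacement_char=0xFFFD,
                      replace_control_characters=False):
    """Decode UTF-8 bytes into a list of code points (illustrative only)."""
    if errors not in ("strict", "replace", "ignore"):
        raise ValueError("errors must be 'strict', 'replace' or 'ignore'")
    # Python's handlers: 'strict' raises UnicodeDecodeError, 'ignore' drops
    # the offending bytes, 'replace' substitutes U+FFFD for them.
    text = data.decode("utf-8", errors=errors)
    codepoints = []
    for ch in text:
        cp = ord(ch)
        if errors == "replace" and cp == 0xFFFD:
            # Swap Python's fixed marker for the caller's replacement_char.
            # Simplification: a genuine U+FFFD in the input is swapped too.
            cp = replacement_char
        elif replace_control_characters and cp <= 0x1F:
            # C0 control characters (U+0000 - U+001F) are replaced on request.
            cp = replacement_char
        codepoints.append(cp)
    return codepoints


valid = decode_codepoints("G\xf6".encode("utf-8"))
replaced = decode_codepoints(b"a\xffb", errors="replace", replacement_char=63)
ignored = decode_codepoints(b"a\xffb", errors="ignore")
```

In this sketch, `'strict'` surfaces the problem as a Python `UnicodeDecodeError`, whereas the op is documented only as raising an exception.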
object unicode_decode_with_offsets(IEnumerable<Byte[]> input, string input_encoding, string errors, int replacement_char, bool replace_control_characters, string name)
Decodes each string into a sequence of code points with start offsets. This op is similar to `tf.strings.unicode_decode(...)`, but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence. Returns a tuple `(codepoints, start_offsets)` where:
* `codepoints[i1...iN, j]` is the Unicode codepoint for the `j`th character
in `input[i1...iN]`, when decoded using `input_encoding`.
* `start_offsets[i1...iN, j]` is the start byte offset for the `j`th
character in `input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
IEnumerable<Byte[]>
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
bool
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
string
name - A name for the operation (optional).
Returns
-
object
- A tuple of `N+1` dimensional tensors `(codepoints, start_offsets)`.
* `codepoints` is an `int32` tensor with shape `[D1...DN, (num_chars)]`.
* `start_offsets` is an `int64` tensor with shape `[D1...DN, (num_chars)]`.

The returned tensors are tf.Tensors if `input` is a scalar, or tf.RaggedTensors otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> result = tf.strings.unicode_decode_with_offsets(input, 'UTF-8')
>>> result[0].to_list()  # codepoints
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> result[1].to_list()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
```
object unicode_decode_with_offsets(IGraphNodeBase input, string input_encoding, string errors, int replacement_char, bool replace_control_characters, string name)
Decodes each string into a sequence of code points with start offsets. This op is similar to `tf.strings.unicode_decode(...)`, but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence. Returns a tuple `(codepoints, start_offsets)` where:
* `codepoints[i1...iN, j]` is the Unicode codepoint for the `j`th character
in `input[i1...iN]`, when decoded using `input_encoding`.
* `start_offsets[i1...iN, j]` is the start byte offset for the `j`th
character in `input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
IGraphNodeBase
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
bool
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
string
name - A name for the operation (optional).
Returns
-
object
- A tuple of `N+1` dimensional tensors `(codepoints, start_offsets)`.
* `codepoints` is an `int32` tensor with shape `[D1...DN, (num_chars)]`.
* `start_offsets` is an `int64` tensor with shape `[D1...DN, (num_chars)]`.

The returned tensors are tf.Tensors if `input` is a scalar, or tf.RaggedTensors otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> result = tf.strings.unicode_decode_with_offsets(input, 'UTF-8')
>>> result[0].to_list()  # codepoints
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> result[1].to_list()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
```
object unicode_decode_with_offsets_dyn(object input, object input_encoding, ImplicitContainer<T> errors, ImplicitContainer<T> replacement_char, ImplicitContainer<T> replace_control_characters, object name)
Decodes each string into a sequence of code points with start offsets. This op is similar to `tf.strings.unicode_decode(...)`, but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence. Returns a tuple `(codepoints, start_offsets)` where:
* `codepoints[i1...iN, j]` is the Unicode codepoint for the `j`th character
in `input[i1...iN]`, when decoded using `input_encoding`.
* `start_offsets[i1...iN, j]` is the start byte offset for the `j`th
character in `input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
object
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
object
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
ImplicitContainer<T>
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
ImplicitContainer<T>
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`; and in place of C0 control characters in `input` when `replace_control_characters=True`.
-
ImplicitContainer<T>
replace_control_characters - Whether to replace the C0 control characters `(U+0000 - U+001F)` with the `replacement_char`.
-
object
name - A name for the operation (optional).
Returns
-
object
- A tuple of `N+1` dimensional tensors `(codepoints, start_offsets)`.
* `codepoints` is an `int32` tensor with shape `[D1...DN, (num_chars)]`.
* `start_offsets` is an `int64` tensor with shape `[D1...DN, (num_chars)]`.

The returned tensors are tf.Tensors if `input` is a scalar, or tf.RaggedTensors otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> result = tf.strings.unicode_decode_with_offsets(input, 'UTF-8')
>>> result[0].to_list()  # codepoints
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> result[1].to_list()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
```
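The relationship between `codepoints` and `start_offsets` can be reproduced for valid UTF-8 input with a few lines of plain Python. This is only a sketch of the documented semantics, not the op itself; `decode_with_offsets` is a hypothetical helper, and error handling is deliberately left out.

```python
def decode_with_offsets(data):
    """Return (codepoints, start_offsets) for one valid UTF-8 byte string."""
    codepoints, start_offsets = [], []
    offset = 0
    for ch in data.decode("utf-8"):
        codepoints.append(ord(ch))
        start_offsets.append(offset)
        # Advance by the number of bytes this character occupies in UTF-8.
        offset += len(ch.encode("utf-8"))
    return codepoints, start_offsets


strings = [s.encode("utf-8") for s in ("G\xf6\xf6dnight", "\U0001f60a")]
results = [decode_with_offsets(s) for s in strings]
codepoints = [r[0] for r in results]
start_offsets = [r[1] for r in results]
```

Each `ö` occupies two bytes in UTF-8, which is why the offsets in the first row jump from 1 to 3 to 5 before advancing one byte at a time.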
object unicode_encode(RaggedTensor input, string output_encoding, string errors, int replacement_char, string name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
RaggedTensor
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
string
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
string
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
int
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
object unicode_encode(ndarray input, string output_encoding, string errors, int replacement_char, string name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
ndarray
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
string
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
string
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
int
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
object unicode_encode(IEnumerable<object> input, string output_encoding, string errors, int replacement_char, string name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
IEnumerable<object>
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
string
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
string
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
int
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
object unicode_encode(int input, string output_encoding, string errors, int replacement_char, string name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
int
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
string
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
string
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
int
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
object unicode_encode(object input, string output_encoding, string errors, int replacement_char, string name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
object
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
string
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
string
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
int
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
object unicode_encode(IGraphNodeBase input, string output_encoding, string errors, int replacement_char, string name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
IGraphNodeBase
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
string
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
string
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
int
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
object unicode_encode_dyn(object input, object output_encoding, ImplicitContainer<T> errors, ImplicitContainer<T> replacement_char, object name)
Encodes each sequence of Unicode code points in `input` into a string. `result[i1...iN]` is the string formed by concatenating the Unicode
codepoints `input[i1...iN, :]`, encoded using `output_encoding`.
Parameters
-
object
input - An `N+1` dimensional potentially ragged integer tensor with shape `[D1...DN, num_chars]`.
-
object
output_encoding - Unicode encoding that should be used to encode each codepoint sequence. Can be `"UTF-8"`, `"UTF-16-BE"`, or `"UTF-32-BE"`.
-
ImplicitContainer<T>
errors - Specifies the response when an invalid codepoint is encountered (optional). One of: * `'replace'`: Replace invalid codepoint with the `replacement_char`. (default) * `'ignore'`: Skip invalid codepoints. * `'strict'`: Raise an exception for any invalid codepoint.
-
ImplicitContainer<T>
replacement_char - The replacement character codepoint to be used in place of any invalid input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the default unicode replacement character, U+FFFD (decimal 65533).
-
object
name - A name for the operation (optional).
Returns
-
object
- An `N` dimensional `string` tensor with shape `[D1...DN]`.

#### Example:

```python
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> tf.strings.unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
```
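As a rough illustration of how `output_encoding`, `errors`, and `replacement_char` interact in `unicode_encode`, here is a pure-Python sketch that encodes one row of codepoints at a time. It is not the op's implementation: `encode_codepoints` and `is_valid_codepoint` are hypothetical helpers, and treating surrogates and out-of-range values as the "invalid" codepoints is an assumption of this sketch.

```python
# Map the encoding names accepted by the op to Python codec names.
ENCODINGS = {"UTF-8": "utf-8", "UTF-16-BE": "utf-16-be", "UTF-32-BE": "utf-32-be"}


def is_valid_codepoint(cp):
    # Assumption: valid means 0..0x10FFFF, excluding the surrogate range.
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def encode_codepoints(codepoints, output_encoding, errors="replace",
                      replacement_char=0xFFFD):
    """Encode one sequence of code points into bytes (illustrative only)."""
    if errors not in ("replace", "ignore", "strict"):
        raise ValueError("errors must be 'replace', 'ignore' or 'strict'")
    codec = ENCODINGS[output_encoding]
    chars = []
    for cp in codepoints:
        if not is_valid_codepoint(cp):
            if errors == "strict":
                raise ValueError(f"invalid code point: {cp}")
            if errors == "ignore":
                continue  # skip the invalid code point entirely
            cp = replacement_char  # errors == 'replace'
        chars.append(chr(cp))
    return "".join(chars).encode(codec)


rows = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
encoded = [encode_codepoints(row, "UTF-8") for row in rows]
```

Encoding row by row mirrors the documented shape change: an `N+1` dimensional input of codepoints collapses to an `N` dimensional result with one string per row.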
Tensor unicode_script(IGraphNodeBase input, string name)
Determine the script codes of a given tensor of Unicode integer code points. This operation converts Unicode code points to script codes corresponding to
each code point. Script codes correspond to International Components for
Unicode (ICU) UScriptCode values. See http://icu-project.org/apiref/icu4c/uscript_8h.html.
Returns -1 (USCRIPT_INVALID_CODE) for invalid codepoints. Output shape will
match input shape.
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `int32`. A Tensor of int32 Unicode code points.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `int32`.
object unicode_script_dyn(object input, object name)
Determine the script codes of a given tensor of Unicode integer code points. This operation converts Unicode code points to script codes corresponding to
each code point. Script codes correspond to International Components for
Unicode (ICU) UScriptCode values. See http://icu-project.org/apiref/icu4c/uscript_8h.html.
Returns -1 (USCRIPT_INVALID_CODE) for invalid codepoints. Output shape will
match input shape.
Parameters
-
object
input - A `Tensor` of type `int32`. A Tensor of int32 Unicode code points.
-
object
name - A name for the operation (optional).
Returns
-
object
- A `Tensor` of type `int32`.
object unicode_split(IGraphNodeBase input, string input_encoding, string errors, int replacement_char, string name)
Splits each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
Parameters
-
IGraphNodeBase
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `string` tensor with shape `[D1...DN, (num_chars)]`. The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_split(input, 'UTF-8').to_list()
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
```
object unicode_split(IEnumerable<Byte[]> input, string input_encoding, string errors, int replacement_char, string name)
Splits each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
Parameters
-
IEnumerable<Byte[]>
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `string` tensor with shape `[D1...DN, (num_chars)]`. The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_split(input, 'UTF-8').to_list()
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
```
object unicode_split(ndarray input, string input_encoding, string errors, int replacement_char, string name)
Splits each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
Parameters
-
ndarray
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
string
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `string` tensor with shape `[D1...DN, (num_chars)]`. The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_split(input, 'UTF-8').to_list()
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
```
object unicode_split_dyn(object input, object input_encoding, ImplicitContainer<T> errors, ImplicitContainer<T> replacement_char, object name)
Splits each string in `input` into a sequence of Unicode code points. `result[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
Parameters
-
object
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
object
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
ImplicitContainer<T>
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
ImplicitContainer<T>
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
object
name - A name for the operation (optional).
Returns
-
object
- An `N+1` dimensional `string` tensor with shape `[D1...DN, (num_chars)]`. The returned tensor is a tf.Tensor if `input` is a scalar, or a tf.RaggedTensor otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> tf.strings.unicode_split(input, 'UTF-8').to_list()
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
```
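The difference between splitting by character (`unicode_split`) and splitting by byte (`bytes_split`) is easy to miss, so the following pure-Python sketch contrasts the two for a single valid UTF-8 string. Both helpers, `split_chars` and `split_bytes`, are hypothetical stand-ins written for this illustration rather than calls into TensorFlow.

```python
def split_chars(data):
    """Split valid UTF-8 bytes into one byte substring per character."""
    return [ch.encode("utf-8") for ch in data.decode("utf-8")]


def split_bytes(data):
    """Split bytes into single-byte substrings, ignoring character boundaries."""
    return [data[i:i + 1] for i in range(len(data))]


word = "G\xf6\xf6d".encode("utf-8")  # 'Good' with two o-umlauts: 4 chars, 6 bytes
by_char = split_chars(word)
by_byte = split_bytes(word)
```

Here `by_char` keeps each two-byte `ö` together as one element, while `by_byte` breaks it into two separate elements; concatenating either list yields the original string.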
ValueTuple<object, object> unicode_split_with_offsets(IEnumerable<Byte[]> input, string input_encoding, string errors, int replacement_char, string name)
Splits each string into a sequence of code points with start offsets. This op is similar to `tf.strings.unicode_split(...)`, but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence. Returns a tuple `(chars, start_offsets)` where:
* `chars[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
* `start_offsets[i1...iN, j]` is the start byte offset for the `j`th
character in `input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
IEnumerable<Byte[]>
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
string
name - A name for the operation (optional).
Returns
-
ValueTuple<object, object>
- A tuple of `N+1` dimensional tensors `(chars, start_offsets)`.
* `chars` is a `string` tensor with shape `[D1...DN, (num_chars)]`.
* `start_offsets` is an `int64` tensor with shape `[D1...DN, (num_chars)]`.

The returned tensors are tf.Tensors if `input` is a scalar, or tf.RaggedTensors otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> result = tf.strings.unicode_split_with_offsets(input, 'UTF-8')
>>> result[0].to_list()  # character substrings
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
>>> result[1].to_list()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
```
ValueTuple<object, object> unicode_split_with_offsets(IGraphNodeBase input, string input_encoding, string errors, int replacement_char, string name)
Splits each string into a sequence of code points with start offsets. This op is similar to `tf.strings.unicode_split(...)`, but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence. Returns a tuple `(chars, start_offsets)` where:
* `chars[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
* `start_offsets[i1...iN, j]` is the start byte offset for the `j`th
character in `input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
IGraphNodeBase
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
string
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
string
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
int
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
string
name - A name for the operation (optional).
Returns
-
ValueTuple<object, object>
- A tuple of `N+1` dimensional tensors `(chars, start_offsets)`.
* `chars` is a `string` tensor with shape `[D1...DN, (num_chars)]`.
* `start_offsets` is an `int64` tensor with shape `[D1...DN, (num_chars)]`.

The returned tensors are tf.Tensors if `input` is a scalar, or tf.RaggedTensors otherwise.

#### Example:

```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> result = tf.strings.unicode_split_with_offsets(input, 'UTF-8')
>>> result[0].to_list()  # character substrings
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
>>> result[1].to_list()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
```
object unicode_split_with_offsets_dyn(object input, object input_encoding, ImplicitContainer<T> errors, ImplicitContainer<T> replacement_char, object name)
Splits each string into a sequence of code points with start offsets. This op is similar to `tf.strings.unicode_split(...)`, but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence. Returns a tuple `(chars, start_offsets)` where:
* `chars[i1...iN, j]` is the substring of `input[i1...iN]` that encodes its
`j`th character, when decoded using `input_encoding`.
* `start_offsets[i1...iN, j]` is the start byte offset for the `j`th
character in `input[i1...iN]`, when decoded using `input_encoding`.
Parameters
-
object
input - An `N` dimensional potentially ragged `string` tensor with shape `[D1...DN]`. `N` must be statically known.
-
object
input_encoding - String name for the unicode encoding that should be used to decode each string.
-
ImplicitContainer<T>
errors - Specifies the response when an input string can't be converted using the indicated encoding. One of: * `'strict'`: Raise an exception for any illegal substrings. * `'replace'`: Replace illegal substrings with `replacement_char`. * `'ignore'`: Skip illegal substrings.
-
ImplicitContainer<T>
replacement_char - The replacement codepoint to be used in place of invalid substrings in `input` when `errors='replace'`.
-
object
name - A name for the operation (optional).
Returns
-
object
- A tuple of `N+1` dimensional tensors `(chars, start_offsets)`.
* `chars` is a `string` tensor with shape `[D1...DN, (num_chars)]`.
* `start_offsets` is an `int64` tensor with shape `[D1...DN, (num_chars)]`.
The returned tensors are `tf.Tensor`s if `input` is a scalar, or `tf.RaggedTensor`s otherwise.
#### Example:
```python
>>> input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
>>> result = tf.strings.unicode_split_with_offsets(input, 'UTF-8')
>>> result[0].tolist()  # character substrings
[['G', '\xc3\xb6', '\xc3\xb6', 'd', 'n', 'i', 'g', 'h', 't'], ['\xf0\x9f\x98\x8a']]
>>> result[1].tolist()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
```
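The relationship between the character substrings and their start offsets can be sketched in plain Python for a single UTF-8 byte string. This is an illustration of the semantics only, not the TensorFlow implementation: the helper name `split_with_offsets` is hypothetical, it handles one string rather than a tensor, and it always decodes strictly.

```python
def split_with_offsets(data: bytes):
    """Return (chars, start_offsets) for one UTF-8 encoded byte string."""
    chars, start_offsets = [], []
    position = 0
    for character in data.decode("utf-8"):  # strict decoding: invalid input raises
        encoded = character.encode("utf-8")
        chars.append(encoded)
        start_offsets.append(position)
        position += len(encoded)  # the next character starts after these bytes
    return chars, start_offsets


source = "G\u00f6\u00f6dnight".encode("utf-8")
chars, start_offsets = split_with_offsets(source)
print(chars)
print(start_offsets)
```

Each offset indexes into the original bytes, so `source[start_offsets[j]:start_offsets[j] + len(chars[j])]` recovers `chars[j]`.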
Tensor unicode_transcode(IGraphNodeBase input, string input_encoding, string output_encoding, string errors, int replacement_char, bool replace_control_characters, string name)
Transcode the input text from a source encoding to a destination encoding.
The input is a string tensor of any shape. The output is a string tensor of
the same shape containing the transcoded strings. Output strings are always
valid unicode. If the input contains invalid encoding positions, the
`errors` attribute sets the policy for how to deal with them. If the default
error-handling policy is used, invalid formatting will be substituted in the
output by the `replacement_char`. If the errors policy is `ignore`, any
invalid encoding positions in the input are skipped and not included in the
output. If it is set to `strict`, any invalid formatting will result in an
InvalidArgument error.
This operation can be used with `output_encoding = input_encoding` to enforce
correct formatting for inputs even if they are already in the desired encoding.
If the input is prefixed by a Byte Order Mark needed to determine encoding
(e.g. if the encoding is UTF-16 and the BOM indicates big-endian), then that
BOM will be consumed and not emitted into the output. If the input encoding
is marked with an explicit endianness (e.g. UTF-16-BE), then the BOM is
interpreted as a zero-width no-break space (U+FEFF) and is preserved in the
output (this is always the case for UTF-8).
The end result is that if the input is marked with an explicit endianness, the
transcoding is faithful to all codepoints in the source. If it is not marked
with an explicit endianness, the BOM is not considered part of the string itself
but as metadata, and so is not preserved in the output.
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`. The text to be processed. Can have any shape.
-
string
input_encoding - A `string`. Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
-
string
output_encoding - A `string` from: `"UTF-8", "UTF-16-BE", "UTF-32-BE"`. The unicode encoding to use in the output. Multi-byte encodings will be big-endian.
-
string
errors - An optional `string` from: `"strict", "replace", "ignore"`. Defaults to `"replace"`. Error handling policy when invalid formatting is found in the input. `'strict'` causes the operation to produce an InvalidArgument error on any invalid input formatting. `'replace'` (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. `'ignore'` causes the operation to skip any invalid formatting in the input and produce no corresponding output character.
-
int
replacement_char - An optional `int`. Defaults to `65533`. The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533). Note that for UTF-8, passing a replacement character expressible in 1 byte, such as ' ', will preserve string alignment to the source since invalid bytes will be replaced with a 1-byte replacement. For UTF-16-BE and UTF-16-LE, any 1 or 2 byte replacement character will preserve byte alignment to the source.
-
bool
replace_control_characters - An optional `bool`. Defaults to `False`. Whether to replace the C0 control characters (00-1F) with the `replacement_char`.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `string`.
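The three `errors` policies described for `unicode_transcode` can be illustrated with Python's built-in codecs. This is only an analogy, not the TensorFlow op: it uses the standard library rather than ICU, works on a single byte string rather than a tensor, always uses U+FFFD as the replacement, and the helper name `transcode_utf8` is hypothetical.

```python
def transcode_utf8(data: bytes, errors: str = "replace") -> bytes:
    """Re-encode UTF-8 bytes as UTF-8 under one of three error policies."""
    if errors not in ("strict", "replace", "ignore"):
        raise ValueError(f"unsupported errors policy: {errors!r}")
    # Decoding applies the policy to invalid bytes; re-encoding then
    # always yields valid UTF-8.
    return data.decode("utf-8", errors=errors).encode("utf-8")


broken = b"a\xffb"  # 0xFF can never appear in valid UTF-8

replaced = transcode_utf8(broken, "replace")  # invalid byte becomes U+FFFD
ignored = transcode_utf8(broken, "ignore")    # invalid byte is dropped

try:
    transcode_utf8(broken, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True  # plays the role of the InvalidArgument error

print(replaced, ignored, strict_failed)
```

Using the same encoding for input and output, as here, mirrors the use of the op to enforce correct formatting on strings that are already in the desired encoding.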
object unicode_transcode_dyn(object input, object input_encoding, object output_encoding, ImplicitContainer<T> errors, ImplicitContainer<T> replacement_char, ImplicitContainer<T> replace_control_characters, object name)
Transcode the input text from a source encoding to a destination encoding.
The input is a string tensor of any shape. The output is a string tensor of
the same shape containing the transcoded strings. Output strings are always
valid unicode. If the input contains invalid encoding positions, the
`errors` attribute sets the policy for how to deal with them. If the default
error-handling policy is used, invalid formatting will be substituted in the
output by the `replacement_char`. If the errors policy is `ignore`, any
invalid encoding positions in the input are skipped and not included in the
output. If it is set to `strict`, any invalid formatting will result in an
InvalidArgument error.
This operation can be used with `output_encoding = input_encoding` to enforce
correct formatting for inputs even if they are already in the desired encoding.
If the input is prefixed by a Byte Order Mark needed to determine encoding
(e.g. if the encoding is UTF-16 and the BOM indicates big-endian), then that
BOM will be consumed and not emitted into the output. If the input encoding
is marked with an explicit endianness (e.g. UTF-16-BE), then the BOM is
interpreted as a zero-width no-break space (U+FEFF) and is preserved in the
output (this is always the case for UTF-8).
The end result is that if the input is marked with an explicit endianness, the
transcoding is faithful to all codepoints in the source. If it is not marked
with an explicit endianness, the BOM is not considered part of the string itself
but as metadata, and so is not preserved in the output.
Parameters
-
object
input - A `Tensor` of type `string`. The text to be processed. Can have any shape.
-
object
input_encoding - A `string`. Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
-
object
output_encoding - A `string` from: `"UTF-8", "UTF-16-BE", "UTF-32-BE"`. The unicode encoding to use in the output. Multi-byte encodings will be big-endian.
-
ImplicitContainer<T>
errors - An optional `string` from: `"strict", "replace", "ignore"`. Defaults to `"replace"`. Error handling policy when invalid formatting is found in the input. `'strict'` causes the operation to produce an InvalidArgument error on any invalid input formatting. `'replace'` (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. `'ignore'` causes the operation to skip any invalid formatting in the input and produce no corresponding output character.
-
ImplicitContainer<T>
replacement_char - An optional `int`. Defaults to `65533`. The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533). Note that for UTF-8, passing a replacement character expressible in 1 byte, such as ' ', will preserve string alignment to the source since invalid bytes will be replaced with a 1-byte replacement. For UTF-16-BE and UTF-16-LE, any 1 or 2 byte replacement character will preserve byte alignment to the source.
-
ImplicitContainer<T>
replace_control_characters - An optional `bool`. Defaults to `False`. Whether to replace the C0 control characters (00-1F) with the `replacement_char`.
-
object
name - A name for the operation (optional).
Returns
-
object
- A `Tensor` of type `string`.
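The Byte Order Mark behaviour described for this op has a close analogue in Python's standard codecs, which may help make the distinction concrete. The sketch below is not TensorFlow code and assumes only the standard library: a generic `utf-16` decoder treats the BOM as metadata, while an explicit `utf-16-be` decoder keeps it as a character.

```python
import codecs

# A big-endian UTF-16 payload prefixed with a BOM (bytes FE FF).
payload = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

# Generic "utf-16": the BOM selects the byte order and is consumed.
generic = payload.decode("utf-16")

# Explicit "utf-16-be": the byte order is already known, so the leading
# U+FEFF is kept as an ordinary character in the result.
explicit = payload.decode("utf-16-be")

print(repr(generic))
print(repr(explicit))
```

Re-encoding `explicit` to UTF-8 keeps the U+FEFF code point, which corresponds to the "faithful to all codepoints" case in the description.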
Tensor unsorted_segment_join(IGraphNodeBase inputs, IGraphNodeBase segment_ids, IGraphNodeBase num_segments, string separator, string name)
Joins the elements of `inputs` based on `segment_ids`. Computes the string join along segments of a tensor.
Given `segment_ids` with rank `N` and `inputs` (called `data` in this description) with rank `N+M`: `output[i, k1...kM] = strings.join([data[j1...jN, k1...kM]])` where the join is over all `[j1...jN]` such that `segment_ids[j1...jN] = i`.
Strings are joined in row-major order.
Parameters
-
IGraphNodeBase
inputs - A `Tensor` of type `string`. The input to be joined.
-
IGraphNodeBase
segment_ids - A `Tensor`. Must be one of the following types: `int32`, `int64`. A tensor whose shape is a prefix of data.shape. Negative segment ids are not supported.
-
IGraphNodeBase
num_segments - A `Tensor`. Must be one of the following types: `int32`, `int64`. A scalar.
-
string
separator - An optional `string`. Defaults to `""`. The separator to use when joining.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `string`.
Show Example
from tensorflow.python.ops import string_ops
inputs = [['Y', 'q', 'c'], ['Y', '6', '6'], ['p', 'G', 'a']]
output_array = string_ops.unsorted_segment_join(inputs=inputs, segment_ids=[1, 0, 1], num_segments=2, separator=':')
# output_array ==> [['Y', '6', '6'], ['Y:p', 'q:G', 'c:a']]
inputs = ['this', 'is', 'a', 'test']
output_array = string_ops.unsorted_segment_join(inputs=inputs, segment_ids=[0, 0, 0, 0], num_segments=1, separator=':')
# output_array ==> ['this:is:a:test']
object unsorted_segment_join_dyn(object inputs, object segment_ids, object num_segments, ImplicitContainer<T> separator, object name)
Joins the elements of `inputs` based on `segment_ids`. Computes the string join along segments of a tensor.
Given `segment_ids` with rank `N` and `inputs` (called `data` in this description) with rank `N+M`: `output[i, k1...kM] = strings.join([data[j1...jN, k1...kM]])` where the join is over all `[j1...jN]` such that `segment_ids[j1...jN] = i`.
Strings are joined in row-major order.
Parameters
-
object
inputs - A `Tensor` of type `string`. The input to be joined.
-
object
segment_ids - A `Tensor`. Must be one of the following types: `int32`, `int64`. A tensor whose shape is a prefix of data.shape. Negative segment ids are not supported.
-
object
num_segments - A `Tensor`. Must be one of the following types: `int32`, `int64`. A scalar.
-
ImplicitContainer<T>
separator - An optional `string`. Defaults to `""`. The separator to use when joining.
-
object
name - A name for the operation (optional).
Returns
-
object
- A `Tensor` of type `string`.
Show Example
from tensorflow.python.ops import string_ops
inputs = [['Y', 'q', 'c'], ['Y', '6', '6'], ['p', 'G', 'a']]
output_array = string_ops.unsorted_segment_join(inputs=inputs, segment_ids=[1, 0, 1], num_segments=2, separator=':')
# output_array ==> [['Y', '6', '6'], ['Y:p', 'q:G', 'c:a']]
inputs = ['this', 'is', 'a', 'test']
output_array = string_ops.unsorted_segment_join(inputs=inputs, segment_ids=[0, 0, 0, 0], num_segments=1, separator=':')
# output_array ==> ['this:is:a:test']
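The join rule `output[i, k] = join(data[j, k])` over all `j` with `segment_ids[j] = i` can be sketched in plain Python for the common case of rank-1 `segment_ids` and rank-2 `inputs`. This is an illustration of the semantics under those assumptions, not the TensorFlow kernel, and the helper name `segment_join_2d` is hypothetical.

```python
def segment_join_2d(inputs, segment_ids, num_segments, separator=""):
    """Join rows of a 2-D list of strings column by column, grouped by segment id."""
    num_columns = len(inputs[0]) if inputs else 0
    # buckets[i][k] collects column k of every row assigned to segment i,
    # in the order the rows appear (row-major order).
    buckets = [[[] for _ in range(num_columns)] for _ in range(num_segments)]
    for row, segment in zip(inputs, segment_ids):
        if segment < 0:
            raise ValueError("negative segment ids are not supported")
        for column, value in enumerate(row):
            buckets[segment][column].append(value)
    return [[separator.join(cell) for cell in segment] for segment in buckets]


rows = [["Y", "q", "c"], ["Y", "6", "6"], ["p", "G", "a"]]
joined = segment_join_2d(rows, segment_ids=[1, 0, 1], num_segments=2, separator=":")
print(joined)
```

In this sketch a segment id that no row maps to produces a row of empty strings.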
Tensor upper(IGraphNodeBase input, string encoding, string name)
Converts all lowercase characters in each string of `input` into their respective uppercase replacements.
Parameters
-
IGraphNodeBase
input - A `Tensor` of type `string`.
-
string
encoding - An optional `string`. Defaults to `""`.
-
string
name - A name for the operation (optional).
Returns
-
Tensor
- A `Tensor` of type `string`.
object upper_dyn(object input, ImplicitContainer<T> encoding, object name)
Converts all lowercase characters in each string of `input` into their respective uppercase replacements.
Parameters
-
object
input - A `Tensor` of type `string`.
-
ImplicitContainer<T>
encoding - An optional `string`. Defaults to `""`.
-
object
name - A name for the operation (optional).
Returns
-
object
- A `Tensor` of type `string`.