Type tf.keras.datasets.reuters
Namespace tensorflow
Public static methods
ValueTuple<object, object> load_data(string path, object num_words, int skip_top, object maxlen, double test_split, int seed, int start_char, int oov_char, int index_from, IDictionary<string, object> kwargs)
Loads the Reuters newswire classification dataset.
Parameters
- string path - where to cache the data (relative to `~/.keras/dataset`).
- object num_words - max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept.
- int skip_top - skip the top N most frequently occurring words (which may not be informative).
- object maxlen - truncate sequences after this length.
- double test_split - fraction of the dataset to be used as test data.
- int seed - random seed for sample shuffling.
- int start_char - the start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
- int oov_char - words that were cut out because of the `num_words` or `skip_top` limit will be replaced with this character.
- int index_from - index actual words with this index and higher.
- IDictionary<string, object> kwargs - used for backwards compatibility.
Returns
- ValueTuple<object, object> - Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`. Note that the 'out of vocabulary' character is only used for words that were present in the training set but were excluded because they did not make the `num_words` cut. Words that appear in the test set but were never seen in the training set have simply been skipped.
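The sketch below shows one way this overload might be called. It is illustrative only: the `using tensorflow;` import and the static `tf` entry point are assumptions based on the namespace shown on this page, and the argument values mirror the upstream Keras defaults rather than anything documented here.

```csharp
// Minimal usage sketch (not from the binding's own samples). Assumes the
// `tensorflow` namespace exposes a static `tf` entry point as implied above;
// argument values follow the upstream Keras defaults and may need adjusting.
using System.Collections.Generic;
using tensorflow;

static class ReutersLoadExample
{
    static void Main()
    {
        // This overload has no optional parameters, so every argument is passed explicitly.
        var (train, test) = tf.keras.datasets.reuters.load_data(
            path: "reuters.npz",   // cached under ~/.keras/dataset
            num_words: null,       // keep the full vocabulary
            skip_top: 0,           // do not drop the most frequent words
            maxlen: null,          // no truncation
            test_split: 0.2,       // 20% of the samples form the test set
            seed: 113,             // upstream Keras default shuffling seed
            start_char: 1,         // index marking the start of each sequence
            oov_char: 2,           // index substituted for out-of-vocabulary words
            index_from: 3,         // real word indices start at this offset
            kwargs: null);         // backwards-compatibility arguments, none needed

        // `train` and `test` each hold an (x, y) pair of NumPy arrays.
    }
}
```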
object load_data_dyn(ImplicitContainer<T> path, object num_words, ImplicitContainer<T> skip_top, object maxlen, ImplicitContainer<T> test_split, ImplicitContainer<T> seed, ImplicitContainer<T> start_char, ImplicitContainer<T> oov_char, ImplicitContainer<T> index_from, IDictionary<string, object> kwargs)
Loads the Reuters newswire classification dataset.
Parameters
- ImplicitContainer<T> path - where to cache the data (relative to `~/.keras/dataset`).
- object num_words - max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept.
- ImplicitContainer<T> skip_top - skip the top N most frequently occurring words (which may not be informative).
- object maxlen - truncate sequences after this length.
- ImplicitContainer<T> test_split - fraction of the dataset to be used as test data.
- ImplicitContainer<T> seed - random seed for sample shuffling.
- ImplicitContainer<T> start_char - the start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
- ImplicitContainer<T> oov_char - words that were cut out because of the `num_words` or `skip_top` limit will be replaced with this character.
- ImplicitContainer<T> index_from - index actual words with this index and higher.
- IDictionary<string, object> kwargs - used for backwards compatibility.
Returns
- object - Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`. Note that the 'out of vocabulary' character is only used for words that were present in the training set but were excluded because they did not make the `num_words` cut. Words that appear in the test set but were never seen in the training set have simply been skipped.
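The `start_char`, `oov_char`, and `index_from` conventions matter when turning a returned sequence back into words. The sketch below is purely illustrative: it assumes the common Keras values (start_char 1, oov_char 2, index_from 3) and a hypothetical rank-to-word dictionary, since this page does not document a word-index lookup.

```csharp
// Illustration only (not part of the binding): undoing the start_char /
// oov_char / index_from convention when decoding an encoded newswire.
// `indexToWord` maps a word's frequency rank to the word itself; how to
// obtain such a dictionary is not covered by this page.
using System;
using System.Collections.Generic;

static class ReutersDecodeSketch
{
    const int StartChar = 1;  // marks the beginning of a sequence
    const int OovChar = 2;    // stands in for out-of-vocabulary words
    const int IndexFrom = 3;  // real word indices are shifted up by this amount

    static string Decode(IEnumerable<int> sequence, IReadOnlyDictionary<int, string> indexToWord)
    {
        var words = new List<string>();
        foreach (int index in sequence)
        {
            if (index == StartChar) continue;                    // skip the sequence marker
            if (index == OovChar) { words.Add("<unk>"); continue; }
            int rank = index - IndexFrom;                        // undo the index_from offset
            words.Add(indexToWord.TryGetValue(rank, out var w) ? w : "<unk>");
        }
        return string.Join(" ", words);
    }

    static void Main()
    {
        // Toy vocabulary keyed by frequency rank (1 = most frequent), purely illustrative.
        var indexToWord = new Dictionary<int, string> { { 1, "the" }, { 2, "of" }, { 3, "said" } };
        // Encoded sample: start marker, "the" (1 + 3), an OOV token, "said" (3 + 3).
        Console.WriteLine(Decode(new[] { 1, 4, 2, 6 }, indexToWord));  // prints: the <unk> said
    }
}
```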