Type KMeans
Namespace tensorflow.contrib.factorization
Parent PythonObjectContainer
Interfaces IKMeans
Creates the graph for kmeans clustering.
Methods
Properties
Public instance methods
ValueTuple<object, object, object, object, object, object> training_graph()
Generates a training graph for the k-means algorithm. This returns, among other things, an op that chooses initial centers
(init_op), a boolean variable that is set to True when the initial centers
are chosen (cluster_centers_initialized), and an op to perform either an
entire Lloyd iteration or a minibatch of a Lloyd iteration (training_op).
The caller should use these components as follows. A single worker should
execute init_op multiple times until cluster_centers_initialized becomes
True. Then multiple workers may execute training_op any number of times.
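The init-then-train protocol described above can be illustrated with a minimal pure-Python sketch. It does not call TensorFlow or this binding. The class `KMeansState` and its methods `init_step` (standing in for init_op) and `lloyd_step` (standing in for training_op in full-batch mode) are hypothetical. For simplicity, the initializer takes the first points of each batch instead of sampling them.

```python
# Hypothetical stand-ins for init_op, cluster_centers_initialized and
# training_op; this is NOT the TensorFlow API, only the usage protocol.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KMeansState:
    def __init__(self, num_clusters):
        self.k = num_clusters
        self.centers = []
        self.initialized = False  # role of cluster_centers_initialized

    def init_step(self, batch):
        """Role of init_op: take centers from one batch until k are chosen."""
        if self.initialized:
            return
        need = self.k - len(self.centers)
        self.centers.extend(list(p) for p in batch[:need])
        self.initialized = len(self.centers) == self.k

    def lloyd_step(self, points):
        """Role of training_op in full-batch mode: one full Lloyd iteration."""
        dim = len(points[0])
        sums = [[0.0] * dim for _ in range(self.k)]
        counts = [0] * self.k
        for p in points:
            # Assignment step: index of the nearest current center.
            j = min(range(self.k), key=lambda c: sq_dist(p, self.centers[c]))
            counts[j] += 1
            sums[j] = [s + x for s, x in zip(sums[j], p)]
        # Update step: move each non-empty cluster to the mean of its points.
        for j in range(self.k):
            if counts[j]:
                self.centers[j] = [s / counts[j] for s in sums[j]]


data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
batches = [data[0:1], data[1:2], data[2:4]]
state = KMeansState(num_clusters=2)

# A single worker runs the init step until the centers are initialized ...
i = 0
while not state.initialized:
    state.init_step(batches[i % len(batches)])
    i += 1

# ... after which the training step may run any number of times.
for _ in range(5):
    state.lloyd_step(data)

print(sorted(state.centers))
```

Here the two one-point batches are needed before initialization completes, which mirrors the case where a single batch does not yield num_clusters centers.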
Returns

ValueTuple<object, object, object, object, object, object>
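 A tuple consisting of:
 all_scores  A matrix (or list of matrices) of shape (num_inputs, num_clusters) holding the distance between each input vector and each cluster center.
 cluster_idx  A vector (or list of vectors) giving, for each input row, the id of the cluster it is assigned to.
 scores  Like cluster_idx, but holding the distance to the assigned cluster instead of its id.
 cluster_centers_initialized  A scalar indicating whether the cluster centers have been initialized.
 init_op  An op that initializes the cluster centers.
 training_op  An op that runs one iteration of training.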
object training_graph_dyn()
Generates a training graph for the k-means algorithm. This returns, among other things, an op that chooses initial centers
(init_op), a boolean variable that is set to True when the initial centers
are chosen (cluster_centers_initialized), and an op to perform either an
entire Lloyd iteration or a minibatch of a Lloyd iteration (training_op).
The caller should use these components as follows. A single worker should
execute init_op multiple times until cluster_centers_initialized becomes
True. Then multiple workers may execute training_op any number of times.
Returns

object
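 A tuple consisting of:
 all_scores  A matrix (or list of matrices) of shape (num_inputs, num_clusters) holding the distance between each input vector and each cluster center.
 cluster_idx  A vector (or list of vectors) giving, for each input row, the id of the cluster it is assigned to.
 scores  Like cluster_idx, but holding the distance to the assigned cluster instead of its id.
 cluster_centers_initialized  A scalar indicating whether the cluster centers have been initialized.
 init_op  An op that initializes the cluster centers.
 training_op  An op that runs one iteration of training.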
Public static methods
KMeans NewDyn(object inputs, object num_clusters, ImplicitContainer<T> initial_clusters, ImplicitContainer<T> distance_metric, ImplicitContainer<T> use_mini_batch, ImplicitContainer<T> mini_batch_steps_per_iteration, ImplicitContainer<T> random_seed, ImplicitContainer<T> kmeans_plus_plus_num_retries, ImplicitContainer<T> kmc2_chain_length)
Creates an object for generating a k-means clustering graph. This class implements the following variants of the k-means algorithm:

If use_mini_batch is False, it runs standard full-batch k-means. Each step runs a single iteration of k-means. This step can be sharded across multiple workers by passing a list of sharded inputs to this class. Note, however, that a single step needs to process the full input at once.

If use_mini_batch is True, it runs a generalization of the minibatch k-means algorithm. It runs multiple iterations, where each iteration is composed of mini_batch_steps_per_iteration steps. Two copies of the cluster centers are maintained: one that is updated at the end of each iteration, and one that is updated every step. The first copy is used to compute cluster allocations for each step and for inference, while the second copy is the one updated each step using the minibatch update rule. After each iteration is complete, the second copy is copied back to the first copy.

Note that for use_mini_batch=True, when mini_batch_steps_per_iteration=1, the algorithm reduces to the standard minibatch algorithm. Also, by setting mini_batch_steps_per_iteration = num_inputs / batch_size, the algorithm becomes an asynchronous version of the full-batch algorithm. Note, however, that this implementation does not guarantee that each input is seen exactly once per iteration. Also, different updates are applied asynchronously without locking, so this asynchronous version may not behave exactly like a full-batch version.
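The two-copy minibatch scheme described above can be sketched in pure Python. This is an illustration under assumptions, not the TensorFlow implementation: `minibatch_kmeans` is a hypothetical helper, it runs single-threaded (so there are no asynchronous, unlocked updates), and it assumes a Sculley-style per-center running-mean update with step size 1/count, which may differ in detail from the update rule the library actually uses.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def minibatch_kmeans(points, init_centers, batch_size, steps_per_iteration,
                     num_iterations, seed=0):
    """Hypothetical sketch of the two-copy minibatch scheme (single-threaded)."""
    rng = random.Random(seed)
    master = [list(c) for c in init_centers]   # copy 1: allocations + inference
    working = [list(c) for c in init_centers]  # copy 2: updated every step
    counts = [0] * len(init_centers)           # points absorbed per center
    for _ in range(num_iterations):
        for _ in range(steps_per_iteration):
            batch = rng.sample(points, batch_size)
            for p in batch:
                j = nearest(p, master)         # allocate with the master copy
                counts[j] += 1
                lr = 1.0 / counts[j]           # assumed running-mean step size
                working[j] = [w + lr * (x - w) for w, x in zip(working[j], p)]
        # End of iteration: copy the working centers back to the master copy.
        master = [list(c) for c in working]
    return master


pts = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],
       [5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2]]
centers = minibatch_kmeans(pts, init_centers=[[0.0, 0.0], [5.0, 5.0]],
                           batch_size=3, steps_per_iteration=2,
                           num_iterations=4)
print(centers)
```

With steps_per_iteration=1, the master copy is refreshed after every step, which corresponds to the standard minibatch case mentioned above.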
Parameters

object
inputs  An input tensor or list of input tensors. It is assumed that the data points have been previously randomly permuted.

object
num_clusters  An integer tensor specifying the number of clusters. This argument is ignored if initial_clusters is a tensor or numpy array.

ImplicitContainer<T>
initial_clusters  Specifies the clusters used during initialization. One of the following:
- a tensor or numpy array with the initial cluster centers;
- a function f(inputs, k) that returns up to k centers from `inputs`;
- "random": choose centers randomly from `inputs`;
- "kmeans_plus_plus": use kmeans++ to choose centers from `inputs`;
- "kmc2": use the fast kMC2 algorithm to choose centers from `inputs`.
In the last three cases, one batch of `inputs` may not yield `num_clusters` centers, in which case initialization will require multiple batches until enough centers are chosen. In the case of "random" or "kmeans_plus_plus", if the input size is <= `num_clusters`, the entire batch is chosen as cluster centers.
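
How three of the initial_clusters options behave across batches can be sketched as follows. The helpers `initialize`, `choose_random` and `first_k` are hypothetical and cover only explicit centers, a callable, and "random". The sketch assumes the callable is passed the number of centers still missing, and it consumes batches until `num_clusters` centers have been collected.

```python
import random


def choose_random(batch, k, rng):
    """'random' option: if the batch has at most k points, take all of them."""
    if len(batch) <= k:
        return [list(p) for p in batch]
    return [list(p) for p in rng.sample(batch, k)]


def initialize(batches, num_clusters, initial_clusters, seed=0):
    """Hypothetical sketch of three initial_clusters options."""
    if isinstance(initial_clusters, list):
        # Explicit centers: num_clusters is ignored.
        return [list(c) for c in initial_clusters]
    if not callable(initial_clusters) and initial_clusters != "random":
        raise ValueError("this sketch only handles centers, callables, 'random'")
    rng = random.Random(seed)
    centers = []
    for batch in batches:  # consume batches until enough centers are chosen
        need = num_clusters - len(centers)
        if need <= 0:
            break
        if callable(initial_clusters):
            # Assumption: the callable receives the number of missing centers.
            centers.extend(list(p) for p in initial_clusters(batch, need)[:need])
        else:
            centers.extend(choose_random(batch, need, rng))
    return centers


def first_k(batch, k):
    """Hypothetical f(inputs, k): return up to k points from the batch."""
    return batch[:k]


print(initialize([[[0, 0], [1, 1]], [[2, 2], [3, 3]]], 3, first_k))
# [[0, 0], [1, 1], [2, 2]]
print(initialize([], 99, [[1, 2], [3, 4]]))
# [[1, 2], [3, 4]]
print(initialize([[[0, 0]], [[1, 1], [2, 2], [3, 3]]], 3, "random"))
```

In the last call, the one-point first batch is taken whole (its size is <= num_clusters), and two more centers are sampled from the second batch.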

ImplicitContainer<T>
distance_metric  Distance metric used for clustering. Supported options: "squared_euclidean", "cosine".
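
For reference, the two supported metrics can be written out as below. The squared Euclidean form is standard. Defining cosine distance as 1 minus the cosine similarity is an assumption about how the library scores points, and the function names are hypothetical.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition); ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(squared_euclidean([0.0, 0.0], [3.0, 4.0]))  # 25.0
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))    # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))    # 1.0 (orthogonal)
```

The second call shows why "cosine" suits data where only direction matters: scaling a vector does not change its distance.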

ImplicitContainer<T>
use_mini_batch  If True, use the minibatch k-means algorithm; otherwise, use full-batch k-means.

ImplicitContainer<T>
mini_batch_steps_per_iteration  Number of steps after which the updated cluster centers are synced back to a master copy.

ImplicitContainer<T>
random_seed  Seed for the pseudo-random number generator used during cluster-center initialization.

ImplicitContainer<T>
kmeans_plus_plus_num_retries  For each point that is sampled during kmeans++ initialization, this parameter specifies the number of additional points to draw from the current distribution before selecting the best. If a negative value is specified, a heuristic is used to sample O(log(num_to_sample)) additional points.
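
The retry mechanism can be sketched as greedy k-means++ seeding. Each candidate is drawn with probability proportional to its squared distance to the nearest chosen center, and among 1 + num_retries candidates the one that leaves the smallest total squared distance (the potential) is kept. The function `kmeans_plus_plus` is hypothetical, and the negative-value heuristic `2 + int(log(k))` is an assumed instance of O(log(num_to_sample)), not necessarily the one the library uses.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus(points, k, num_retries, seed=0):
    """Hypothetical greedy k-means++ seeding with retries."""
    rng = random.Random(seed)
    if len(points) <= k:
        return [list(p) for p in points]  # whole batch becomes the centers
    if num_retries < 0:
        num_retries = 2 + int(math.log(k))  # assumed O(log k) heuristic
    centers = [list(rng.choice(points))]
    d2 = [sq_dist(p, centers[0]) for p in points]  # D(x)^2 to nearest center
    while len(centers) < k:
        best = None
        for _ in range(1 + num_retries):
            # Draw a candidate with probability proportional to D(x)^2.
            if sum(d2) > 0:
                idx = rng.choices(range(len(points)), weights=d2)[0]
            else:
                idx = rng.randrange(len(points))
            new_d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
            potential = sum(new_d2)
            # Keep the candidate that leaves the smallest total potential.
            if best is None or potential < best[0]:
                best = (potential, idx, new_d2)
        _, idx, d2 = best
        centers.append(list(points[idx]))
    return centers


pts = [[0, 0], [0, 0], [10, 0], [10, 0], [0, 10], [0, 10]]
print(sorted(kmeans_plus_plus(pts, k=3, num_retries=2)))
# [[0, 0], [0, 10], [10, 0]]
```

Because points that coincide with a chosen center get zero weight, each of the three well-separated groups ends up contributing exactly one center in this example.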

ImplicitContainer<T>
kmc2_chain_length  Determines how many candidate points are used by the kMC2 algorithm to produce one new cluster center. If a (mini)batch contains fewer points, one new cluster center is generated from the (mini)batch.
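
A single kMC2 draw can be sketched as a Metropolis-Hastings chain over proposed candidates: each proposal y replaces the current point x with probability min(1, d(y)/d(x)), where d is the squared distance to the nearest existing center. The helpers `kmc2_pick` and `kmc2_step` are hypothetical. Using the batch itself as the chain when it has at most kmc2_chain_length points follows the description above, while the uniform proposal distribution is an assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmc2_pick(candidates, centers, rng):
    """Run a Metropolis-Hastings chain over candidates; return its final state."""
    def d2(p):
        return min(sq_dist(p, c) for c in centers)

    current = candidates[0]
    current_d = d2(current)
    for cand in candidates[1:]:
        cand_d = d2(cand)
        # Accept with probability min(1, cand_d / current_d).
        if current_d == 0 or rng.random() * current_d < cand_d:
            current, current_d = cand, cand_d
    return list(current)


def kmc2_step(batch, centers, chain_length, seed=0):
    """Hypothetical: produce one new center from a (mini)batch."""
    rng = random.Random(seed)
    if len(batch) <= chain_length:
        candidates = list(batch)  # small batch: the batch itself is the chain
    else:
        # Assumed uniform proposal distribution over the batch.
        candidates = rng.choices(batch, k=chain_length)
    return kmc2_pick(candidates, centers, rng)


existing = [[0.0, 0.0]]
batch = [[0.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
print(kmc2_step(batch, existing, chain_length=5))  # [10.0, 10.0]
```

Unlike k-means++, this avoids computing distances for every point in the batch: only the chain_length candidates are scored, which is what makes kMC2 fast on large batches.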