Type FtrlOptimizer
Namespace tensorflow.train
Parent Optimizer
Interfaces IFtrlOptimizer
Optimizer that implements the FTRL algorithm. See this [paper](https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf).
This version has support for both online L2 (the L2 penalty given in the paper
above) and shrinkage-type L2 (which is the addition of an L2 penalty to the
loss function).
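This type mirrors TensorFlow's Python `tf.train.FtrlOptimizer`. As a rough orientation only (a sketch of the underlying TF 1.x Python API, not this binding's C# surface), the snippet below constructs the optimizer with both the online L2 and the shrinkage-type L2 set; the toy variable, loss, and hyperparameter values are illustrative assumptions.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Illustrative toy model: fit w so that the dot product w.x approaches 3.0.
w = tf.Variable([0.0, 0.0], dtype=tf.float32)
x = tf.constant([1.0, 2.0], dtype=tf.float32)
loss = tf.square(tf.reduce_sum(w * x) - 3.0)

opt = tf.train.FtrlOptimizer(
    learning_rate=0.1,
    learning_rate_power=-0.5,                   # <= 0; 0 gives a fixed learning rate
    initial_accumulator_value=0.1,
    l1_regularization_strength=0.001,
    l2_regularization_strength=0.001,           # online L2 (stabilization penalty)
    l2_shrinkage_regularization_strength=0.01,  # shrinkage-type L2 (magnitude penalty)
)
train_op = opt.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))
```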
Public static methods
FtrlOptimizer NewDyn(object learning_rate, ImplicitContainer<T> learning_rate_power, ImplicitContainer<T> initial_accumulator_value, ImplicitContainer<T> l1_regularization_strength, ImplicitContainer<T> l2_regularization_strength, ImplicitContainer<T> use_locking, ImplicitContainer<T> name, object accum_name, object linear_name, ImplicitContainer<T> l2_shrinkage_regularization_strength)
Construct a new FTRL optimizer.
Parameters
- `learning_rate` (object) - A float value or a constant float `Tensor`.
- `learning_rate_power` (ImplicitContainer<T>) - A float value; must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. See section 3.1 in the [paper](https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf).
- `initial_accumulator_value` (ImplicitContainer<T>) - The starting value for accumulators. Only zero or positive values are allowed.
- `l1_regularization_strength` (ImplicitContainer<T>) - A float value; must be greater than or equal to zero.
- `l2_regularization_strength` (ImplicitContainer<T>) - A float value; must be greater than or equal to zero.
- `use_locking` (ImplicitContainer<T>) - If `True`, use locks for update operations.
- `name` (ImplicitContainer<T>) - Optional name prefix for the operations created when applying gradients. Defaults to "Ftrl".
- `accum_name` (object) - The suffix for the variable that keeps the gradient squared accumulator. If not present, defaults to `name`.
- `linear_name` (object) - The suffix for the variable that keeps the linear gradient accumulator. If not present, defaults to `name + "_1"`.
- `l2_shrinkage_regularization_strength` (ImplicitContainer<T>) - A float value; must be greater than or equal to zero. This differs from the L2 regularization above: that L2 is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. The FTRL formulation can be written as w_{t+1} = argmin_w(\hat{g}_{1:t} w + L1*||w||_1 + L2*||w||_2^2), where \hat{g} = g + 2*L2_shrinkage*w and g is the gradient of the loss function with respect to the weights w. In the absence of L1 regularization, this is equivalent to the update rule w_{t+1} = w_t - lr_t / (1 + 2*L2*lr_t) * g_t - 2*L2_shrinkage*lr_t / (1 + 2*L2*lr_t) * w_t, where lr_t is the learning rate at step t. When the input is sparse, shrinkage only applies to the active weights.