Stochastic gradient descent and momentum optimizer.

Without momentum (`momentum = 0`), computes:

```
theta(t+1) = theta(t) - learning_rate * gradient
gradient is evaluated at theta(t).
```
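As a minimal sketch on a single scalar parameter (not the library's implementation), the plain update can be written as follows. `sgd_step` and its `grad_fn` argument are hypothetical names used only for illustration.

```python
def sgd_step(theta, grad_fn, learning_rate):
    """One plain SGD step: theta(t+1) = theta(t) - learning_rate * gradient."""
    # The gradient is evaluated at the current parameter value theta(t).
    gradient = grad_fn(theta)
    return theta - learning_rate * gradient


# Example: minimise f(x) = x**2, whose gradient is 2*x.
theta = 1.0
for _ in range(3):
    theta = sgd_step(theta, lambda x: 2.0 * x, learning_rate=0.1)
```

Each step multiplies `theta` by 0.8 here, so after three steps it is approximately 0.512.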

With momentum (`momentum` greater than 0), computes:

```
v(t+1) = momentum * v(t) - learning_rate * gradient
theta(t+1) = theta(t) + v(t+1)
if `nesterov` is False, gradient is evaluated at theta(t).
if `nesterov` is True, gradient is evaluated at theta(t) + momentum * v(t),
  and the variables always store theta + momentum * v instead of theta.
```
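The sketch below follows these formulas for one scalar parameter and is meant only as an illustration, not as the library's code. `momentum_step` is a hypothetical helper. In Nesterov mode it updates the stored value theta + momentum * v directly; substituting the formulas above shows that the stored value advances by momentum * v(t+1) - learning_rate * gradient.

```python
def momentum_step(theta, v, grad_fn, learning_rate, momentum, nesterov=False):
    """One momentum step on a scalar parameter; returns (new_theta, new_v).

    With nesterov=False, `theta` is the parameter itself.
    With nesterov=True, `theta` is the stored variable theta + momentum * v.
    """
    # The gradient is taken at whatever the variable currently holds:
    # theta(t) without Nesterov, theta(t) + momentum * v(t) with Nesterov.
    gradient = grad_fn(theta)
    v_new = momentum * v - learning_rate * gradient
    if not nesterov:
        return theta + v_new, v_new
    # stored(t+1) = theta(t+1) + momentum * v(t+1)
    #             = stored(t) - momentum * v(t) + (1 + momentum) * v(t+1)
    #             = stored(t) + momentum * v(t+1) - learning_rate * gradient
    return theta + momentum * v_new - learning_rate * gradient, v_new


# Example on f(x) = x**2 (gradient 2*x), starting at theta = 1.0 with v = 0,
# so the stored Nesterov variable also starts at 1.0.
grad = lambda x: 2.0 * x
plain = momentum_step(1.0, 0.0, grad, learning_rate=0.1, momentum=0.9)
nest = momentum_step(1.0, 0.0, grad, learning_rate=0.1, momentum=0.9, nesterov=True)
```

Here `plain` is approximately (0.8, -0.2). `nest[0]` is approximately 0.62, which equals theta(1) + momentum * v(1) = 0.8 + 0.9 * (-0.2), as the stored-variable convention requires.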

Some of the arguments are hyperparameters. A hyperparameter is a scalar Tensor, a regular Python value, or a callable that returns a scalar Tensor or a Python value. A callable is evaluated when `apply_gradients` is called.
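To illustrate the "value or callable" rule, here is a minimal sketch that handles only Python numbers and plain callables (scalar Tensors are left out). `resolve_hyperparameter` is a hypothetical helper, not part of this API, and does not describe how the library resolves hyperparameters internally.

```python
from typing import Callable, Union

# Hypothetical type aliases; scalar Tensors are left out of this sketch.
Number = Union[int, float]
Hyperparameter = Union[Number, Callable[[], Number]]


def resolve_hyperparameter(value: Hyperparameter) -> Number:
    """Return a hyperparameter's current value (hypothetical helper).

    Callables are invoked each time this is called, mimicking evaluation at
    `apply_gradients` time; plain values are returned unchanged.
    """
    return value() if callable(value) else value


# A learning rate that halves after the first step, via a callable.
state = {"step": 0}
learning_rate = lambda: 0.1 if state["step"] == 0 else 0.05

first = resolve_hyperparameter(learning_rate)   # 0.1
state["step"] = 1
second = resolve_hyperparameter(learning_rate)  # 0.05
fixed = resolve_hyperparameter(0.01)            # 0.01
```

Passing a callable lets a value such as the learning rate change between calls without rebuilding the optimizer.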

References

For `nesterov = True`, see [Sutskever et al., 2013](http://jmlr.org/proceedings/papers/v28/sutskever13.pdf).


Public properties

object clipnorm { get; set; }

object clipvalue { get; set; }

object iterations { get; set; }

object iterations_dyn { get; set; }

bool nesterov { get; set; }

object PythonObject { get; }

IList<object> weights { get; }

object weights_dyn { get; }