Type SGD
Namespace tensorflow.keras.optimizers
Parent Optimizer
Interfaces ISGD
Stochastic gradient descent and momentum optimizer. When `momentum` is 0, computes:
```
theta(t+1) = theta(t) - learning_rate * gradient
gradient is evaluated at theta(t).
```
or, when `momentum` is greater than 0, computes:
```
v(t+1) = momentum * v(t) - learning_rate * gradient
theta(t+1) = theta(t) + v(t+1)
if `nesterov` is False, gradient is evaluated at theta(t).
if `nesterov` is True, gradient is evaluated at theta(t) + momentum * v(t),
  and the variables store theta(t) + momentum * v(t) instead of theta(t).
```
Some of the args below are hyperparameters, where a hyperparameter is
defined as a scalar Tensor, a regular Python value, or a callable (which
will be evaluated when `apply_gradients` is called) returning a scalar
Tensor or a Python value.
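As a rough illustration of the update rules above, here is a minimal NumPy sketch (not the library implementation; `sgd_step`, `grad_fn`, and the toy quadratic are hypothetical names used only for this example):
```
import numpy as np

def sgd_step(theta, v, grad_fn, learning_rate=0.01, momentum=0.0, nesterov=False):
    # Plain SGD: theta(t+1) = theta(t) - learning_rate * gradient, gradient at theta(t).
    if momentum == 0.0:
        return theta - learning_rate * grad_fn(theta), v
    # Momentum SGD: gradient is evaluated at theta(t), or at theta(t) + momentum * v(t)
    # when nesterov=True (the library stores theta + momentum * v; this sketch keeps theta).
    g = grad_fn(theta + momentum * v) if nesterov else grad_fn(theta)
    v_new = momentum * v - learning_rate * g   # v(t+1)
    return theta + v_new, v_new                # theta(t+1) = theta(t) + v(t+1)

# Toy usage: minimize f(theta) = theta^2, so grad_fn(theta) = 2 * theta.
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = sgd_step(theta, v, lambda t: 2.0 * t, learning_rate=0.05, momentum=0.9)
print(theta)  # converges toward 0.0
```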
# References

For `nesterov = True`, see [Sutskever et al., 2013](
http://jmlr.org/proceedings/papers/v28/sutskever13.pdf).
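For reference, a small usage sketch against the Python-side `tf.keras.optimizers.SGD` constructor, assuming the standard TensorFlow 2.x API; it shows `learning_rate` passed as a callable, which is evaluated when `apply_gradients` is called:
```
import tensorflow as tf

# learning_rate given as a callable hyperparameter (a plain float or a schedule also works).
opt = tf.keras.optimizers.SGD(learning_rate=lambda: 0.01, momentum=0.9, nesterov=True)

var = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = var * var
grads = tape.gradient(loss, [var])
opt.apply_gradients(zip(grads, [var]))
print(var.numpy())  # the variable has moved one SGD step away from 3.0
```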