Type: Nadam
Namespace: tensorflow.keras.optimizers
Parent: Optimizer
Interfaces: INadam
Optimizer that implements the NAdam algorithm. Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.
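This page does not ship a usage example, so here is a minimal sketch assuming the standard Python `tf.keras` API that this binding mirrors; the hyperparameter values shown are the library defaults:

```python
import tensorflow as tf

# Toy model: the optimizer is independent of the architecture.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

optimizer = tf.keras.optimizers.Nadam(
    learning_rate=0.001,  # lr in the update rule below
    beta_1=0.9,           # decay rate for the 1st moment estimate m
    beta_2=0.999,         # decay rate for the 2nd moment estimate v
    epsilon=1e-7,         # added to sqrt(v') for numerical stability
)
model.compile(optimizer=optimizer, loss="mse")
```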
Initialization:

$$m_0 := 0 \text{ (Initialize 1st moment vector)}$$
$$v_0 := 0 \text{ (Initialize 2nd moment vector)}$$
$$\mu_0 := 1$$
$$t := 0 \text{ (Initialize timestep)}$$

Computes:
$$t := t + 1$$
$$\mu_t := \beta_1 * (1 - 0.5 * 0.96^{0.004 * t})$$
$$g' := g / (1 - \prod_{i=1}^{t}{\mu_i})$$
$$m_t := \beta_1 * m_{t-1} + (1 - \beta_1) * g$$
$$m' := m_t / (1 - \prod_{i=1}^{t+1}{\mu_i})$$
$$v_t := \beta_2 * v_{t-1} + (1 - \beta_2) * g * g$$
$$v' := v_t / (1 - \beta_2^t)$$
$$\bar{m} := (1 - \mu_t) * g' + \mu_{t+1} * m'$$
$$\theta_t := \theta_{t-1} - lr * \bar{m} / (\sqrt{v'} + \epsilon)$$

Note that the gradient is evaluated at `theta(t) + momentum * v(t)`, and the variables always store `theta + beta_1 * m / sqrt(v)` instead of `theta`.
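To make the update rule concrete, here is a step-by-step NumPy transcription of the equations above. This is a readable sketch, not the library's implementation; `mu_prod` is an assumed helper name carrying the running product of the `mu_i`, initialized to `mu_0 = 1`.

```python
import numpy as np

def nadam_step(theta, g, m, v, mu_prod, t,
               lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Nadam update for parameters theta given gradient g.

    State threaded between calls: m (1st moment), v (2nd moment),
    mu_prod (product of mu_i so far, starts at mu_0 = 1), t (timestep).
    """
    t += 1
    mu_t = beta_1 * (1 - 0.5 * 0.96 ** (0.004 * t))            # mu_t
    mu_next = beta_1 * (1 - 0.5 * 0.96 ** (0.004 * (t + 1)))   # mu_{t+1}
    mu_prod_t = mu_prod * mu_t                # prod_{i=1}^{t} mu_i
    mu_prod_next = mu_prod_t * mu_next        # prod_{i=1}^{t+1} mu_i

    g_prime = g / (1 - mu_prod_t)                        # g'
    m = beta_1 * m + (1 - beta_1) * g                    # m_t
    m_prime = m / (1 - mu_prod_next)                     # m'
    v = beta_2 * v + (1 - beta_2) * g * g                # v_t
    v_prime = v / (1 - beta_2 ** t)                      # v'
    m_bar = (1 - mu_t) * g_prime + mu_next * m_prime     # m-bar
    theta = theta - lr * m_bar / (np.sqrt(v_prime) + epsilon)
    return theta, m, v, mu_prod_t, t
```

Threading the returned state through successive calls, starting from `m = v = 0`, `mu_prod = 1`, and `t = 0`, reproduces the schedule defined by the initialization above.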
References

See [Dozat, T., 2015](http://cs229.stanford.edu/proj2015/054_report.pdf).