# LostTech.TensorFlow : API Documentation

Type Nadam

Namespace tensorflow.keras.optimizers

Parent Optimizer

Interfaces INadam

Optimizer that implements the NAdam algorithm.

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

Initialization:

$$m_0 := 0 \text{ (Initialize 1st moment vector)}$$
$$v_0 := 0 \text{ (Initialize 2nd moment vector)}$$
$$\mu_0 := 1$$
$$t := 0 \text{ (Initialize timestep)}$$

Computes:

$$t := t + 1$$
$$\mu_t := \beta_1 * (1 - 0.5 * 0.96^{0.004 * t})$$
$$g' := g / (1 - \prod_{i=1}^{t}{\mu_i})$$
$$m_t := \beta_1 * m_{t-1} + (1 - \beta_1) * g$$
$$m' := m_t / (1 - \prod_{i=1}^{t+1}{\mu_i})$$
$$v_t := \beta_2 * v_{t-1} + (1 - \beta_2) * g * g$$
$$v' := v_t / (1 - \beta_2^t)$$
$$\bar{m} := (1 - \mu_t) * g' + \mu_{t+1} * m'$$
$$\theta_t := \theta_{t-1} - lr * \bar{m} / (\sqrt{v'} + \epsilon)$$
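The update rule above can be traced step by step in a few lines of plain Python. The following is only an illustrative sketch for a single scalar parameter, not the code this library runs: the names `mu_schedule` and `nadam_minimize` are hypothetical, and the default values for `beta_1`, `beta_2`, and `epsilon` are assumptions chosen for the example rather than values stated on this page.

```python
import math


def mu_schedule(t, beta_1=0.9):
    """Momentum schedule: mu_t = beta_1 * (1 - 0.5 * 0.96^(0.004 * t))."""
    return beta_1 * (1.0 - 0.5 * 0.96 ** (0.004 * t))


def nadam_minimize(grad_fn, theta, steps, lr,
                   beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply `steps` Nadam updates to a scalar `theta` (hypothetical helper).

    `grad_fn(theta)` returns the gradient g at the current parameter value.
    The default hyperparameters are assumptions made for this sketch.
    """
    m = 0.0        # m_0 := 0 (1st moment)
    v = 0.0        # v_0 := 0 (2nd moment)
    mu_prod = 1.0  # running product mu_1 * ... * mu_t (empty product is 1)

    for t in range(1, steps + 1):          # t := t + 1
        g = grad_fn(theta)
        mu_t = mu_schedule(t, beta_1)
        mu_next = mu_schedule(t + 1, beta_1)

        mu_prod *= mu_t                    # prod_{i=1}^{t} mu_i
        mu_prod_next = mu_prod * mu_next   # prod_{i=1}^{t+1} mu_i

        g_prime = g / (1.0 - mu_prod)
        m = beta_1 * m + (1.0 - beta_1) * g
        m_prime = m / (1.0 - mu_prod_next)
        v = beta_2 * v + (1.0 - beta_2) * g * g
        v_prime = v / (1.0 - beta_2 ** t)

        m_bar = (1.0 - mu_t) * g_prime + mu_next * m_prime
        theta = theta - lr * m_bar / (math.sqrt(v_prime) + epsilon)

    return theta


# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
result = nadam_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0, steps=500, lr=0.1)
```

Keeping the product of the $\mu_i$ as a running value avoids recomputing it at every step; the product up to $t+1$ is only peeked at, so the running value stays aligned with the current timestep.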

The gradient is evaluated at theta(t) + momentum * v(t), and the variables always store theta + beta_1 * m / sqrt(v) instead of theta.

References: See [Dozat, T., 2015](http://cs229.stanford.edu/proj2015/054_report.pdf).