Type: Adagrad
Namespace: tensorflow.keras.optimizers
Parent: Optimizer
Interfaces: IAdagrad
Optimizer that implements the Adagrad algorithm.

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training: the more updates a parameter receives, the smaller the updates.

Initialization:

$$accum_{g_0} := \text{initial\_accumulator\_value}$$

Update step:
$$t := t + 1$$
$$accum_{g_t} := accum_{g_{t-1}} + g^2$$
$$\theta_t := \theta_{t-1} - lr * g / (\sqrt{accum_{g_t}} + \epsilon)$$
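A minimal NumPy sketch of the update step above, for a single dense gradient; `lr`, `epsilon`, and `initial_accumulator_value` mirror the quantities in the equations, and all other names are illustrative rather than part of this class's API:

```python
import numpy as np

lr = 0.001
epsilon = 1e-7
initial_accumulator_value = 0.1

theta = np.array([0.5, -0.3])                            # parameters theta
accum = np.full_like(theta, initial_accumulator_value)   # accum_{g_0}

def adagrad_step(theta, accum, grad):
    # accum_{g_t} := accum_{g_{t-1}} + g^2
    accum = accum + grad ** 2
    # theta_t := theta_{t-1} - lr * g / (sqrt(accum_{g_t}) + epsilon)
    theta = theta - lr * grad / (np.sqrt(accum) + epsilon)
    return theta, accum

grad = np.array([0.1, -0.2])                             # example gradient g
theta, accum = adagrad_step(theta, accum, grad)
```

Because the accumulator only grows, the effective step size for a parameter shrinks as that parameter keeps receiving large gradients.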
References:

* [Paper](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
* [Introduction](https://ppasupat.github.io/a9online/uploads/proximal_notes.pdf)
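For orientation, a short usage sketch with the Python `tf.keras` API; this page may document a language binding of the same class, so treat the snippet as illustrative. The constructor arguments `learning_rate`, `initial_accumulator_value`, and `epsilon` correspond to `lr`, the accumulator initialization, and `epsilon` in the equations above.

```python
import tensorflow as tf

# Construct the optimizer with its hyperparameters stated explicitly.
optimizer = tf.keras.optimizers.Adagrad(
    learning_rate=0.001,
    initial_accumulator_value=0.1,
    epsilon=1e-7,
)

# Apply one Adagrad update to a single trainable variable.
var = tf.Variable([0.5, -0.3])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(var))
grads = tape.gradient(loss, [var])
optimizer.apply_gradients(zip(grads, [var]))
```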