Type LossScaleOptimizer
Namespace tensorflow.keras.mixed_precision.experimental
Parent Optimizer
Interfaces ILossScaleOptimizer
An optimizer that applies loss scaling.
Loss scaling is a process that multiplies the loss by a multiplier called the loss scale, and divides each gradient by the same multiplier. The pseudocode for this process is:
```
loss = ...
loss *= loss_scale
grads = gradients(loss, vars)
grads /= loss_scale
```
Mathematically, loss scaling has no effect, but it can help avoid numerical underflow in intermediate gradients when float16 tensors are used. Because the loss is multiplied by the loss scale, each intermediate gradient has the same multiplier applied.
The loss scale can either be a fixed constant, chosen by the user, or be dynamically determined. Dynamically determining the loss scale is convenient because a loss scale does not have to be explicitly chosen. However, it reduces performance.
This optimizer wraps another optimizer and applies loss scaling to it via a `LossScale`. Loss scaling is applied whenever gradients are computed, either through `minimize()` or `get_gradients()`. The loss scale is updated via `LossScale.update()` whenever gradients are applied, either through `minimize()` or `apply_gradients()`.
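To make the underflow argument concrete, the sketch below rounds a small gradient to half precision with and without scaling. It emulates float16 rounding with Python's standard `struct` module only; the helper `to_float16`, the gradient value, and the loss scale of 2**15 are illustrative assumptions, not values taken from this API.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


true_grad = 1e-8        # hypothetical gradient, below float16's smallest subnormal (~6e-8)
loss_scale = 2.0 ** 15  # hypothetical fixed loss scale

# Without scaling, the gradient underflows to zero when stored as float16.
unscaled_fp16 = to_float16(true_grad)

# With scaling, the stored value is representable; dividing by the loss
# scale afterwards (in higher precision) recovers the gradient approximately.
scaled_fp16 = to_float16(true_grad * loss_scale)
recovered = scaled_fp16 / loss_scale

print(unscaled_fp16)  # 0.0
print(recovered)      # close to 1e-8, up to float16 rounding error
```

The recovered value is only approximately equal to the original gradient, because the scaled value is still rounded to half precision before it is unscaled.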
If a `tf.GradientTape` is used to compute gradients instead of `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, the loss and gradients must be scaled manually. This can be done by calling `LossScaleOptimizer.get_scaled_loss` before passing the loss to `tf.GradientTape`, and `LossScaleOptimizer.get_unscaled_gradients` after computing the gradients with `tf.GradientTape`.
Example
```python
import tensorflow as tf

opt = tf.keras.optimizers.SGD(0.1)
opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(opt, "dynamic")
# 'minimize' applies loss scaling to the loss and updates the loss scale.
opt.minimize(loss_fn)
```
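The manual path described above (scale the loss, differentiate, unscale the gradients, then apply them) can be sketched without TensorFlow. Everything here is a stand-in: `ToyLossScaler` is a hypothetical class that mimics only the two scaling helpers, the gradient of the toy loss `w ** 2` is written out analytically where real code would use a gradient tape, and the loss scale and learning rate are arbitrary.

```python
class ToyLossScaler:
    """Hypothetical stand-in that mimics only the two scaling helpers."""

    def __init__(self, loss_scale: float):
        self.loss_scale = loss_scale

    def get_scaled_loss(self, loss: float) -> float:
        return loss * self.loss_scale

    def get_unscaled_gradients(self, grads):
        return [g / self.loss_scale for g in grads]


opt = ToyLossScaler(loss_scale=1024.0)
w = 3.0

# 1. Compute the loss, then scale it before differentiating.
loss = w ** 2
scaled_loss = opt.get_scaled_loss(loss)

# 2. Differentiate the *scaled* loss: d(scale * w**2)/dw = scale * 2w.
#    In real code a gradient tape would produce this value.
scaled_grads = [opt.loss_scale * 2.0 * w]

# 3. Unscale the gradients before using them.
grads = opt.get_unscaled_gradients(scaled_grads)

# 4. Apply a plain gradient-descent step with an assumed learning rate of 0.1.
w -= 0.1 * grads[0]

print(grads)  # [6.0]
```

The point of the sketch is the ordering: the two helpers are always used as a pair, with differentiation happening between them.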
Methods
- get_scaled_loss
- get_scaled_loss_dyn
- get_unscaled_gradients
- get_unscaled_gradients_dyn
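Properties
- clipnorm
- clipvalue
- iterations
- iterations_dyn
- learning_rate
- learning_rate_dyn
- loss_scale
- loss_scale_dyn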
Public instance methods
object get_scaled_loss(int loss)
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to scale the loss before passing the loss to `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_unscaled_gradients` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- int loss - The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor.
Returns
- object - `loss` multiplied by `LossScaleOptimizer.loss_scale()`.
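The parameter description allows the loss to be either a value or a callable returning one. The sketch below shows one plausible way to handle both cases with plain Python floats. It is a toy function, not the library's implementation, and it assumes that a callable input produces a callable result (evaluation is deferred), which the text above does not state explicitly.

```python
def get_scaled_loss(loss, loss_scale):
    """Toy version: scale a numeric loss, or wrap a callable that returns one."""
    if callable(loss):
        # Assumption: defer evaluation and return a new callable.
        return lambda: loss() * loss_scale
    return loss * loss_scale


scaled_value = get_scaled_loss(0.25, 128.0)
scaled_fn = get_scaled_loss(lambda: 0.25, 128.0)

print(scaled_value)  # 32.0
print(scaled_fn())   # 32.0
```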
object get_scaled_loss(IGraphNodeBase loss)
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to scale the loss before passing the loss to `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_unscaled_gradients` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- IGraphNodeBase loss - The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor.
Returns
- object - `loss` multiplied by `LossScaleOptimizer.loss_scale()`.
object get_scaled_loss(object loss)
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to scale the loss before passing the loss to `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_unscaled_gradients` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- object loss - The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor.
Returns
- object - `loss` multiplied by `LossScaleOptimizer.loss_scale()`.
object get_scaled_loss_dyn(object loss)
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to scale the loss before passing the loss to `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_unscaled_gradients` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- object loss - The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor.
Returns
- object - `loss` multiplied by `LossScaleOptimizer.loss_scale()`.
IList<object> get_unscaled_gradients(IEnumerable<object> grads)
Unscales the gradients by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to unscale the gradients after computing them with `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_scaled_loss` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- IEnumerable<object> grads - A list of tensors, each of which will be divided by the loss scale. Can have None values, which are ignored.
Returns
- IList<object> - A new list the same size as `grads`, where every non-None value in `grads` is divided by `LossScaleOptimizer.loss_scale()`.
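The handling of None entries described above can be illustrated with a short stand-in that uses plain floats instead of tensors. The function below is a toy version written for this illustration, not the library's code; it returns a new list of the same length and passes None through untouched.

```python
def get_unscaled_gradients(grads, loss_scale):
    """Toy version: divide each gradient by the loss scale, passing None through."""
    return [None if g is None else g / loss_scale for g in grads]


scaled_grads = [2048.0, None, 512.0]
grads = get_unscaled_gradients(scaled_grads, 1024.0)

print(grads)  # [2.0, None, 0.5]
```

A None gradient typically corresponds to a variable that did not contribute to the loss, so keeping the entry in place preserves the pairing between gradients and variables.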
IList<object> get_unscaled_gradients(PythonClassContainer grads)
Unscales the gradients by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to unscale the gradients after computing them with `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_scaled_loss` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- PythonClassContainer grads - A list of tensors, each of which will be divided by the loss scale. Can have None values, which are ignored.
Returns
- IList<object> - A new list the same size as `grads`, where every non-None value in `grads` is divided by `LossScaleOptimizer.loss_scale()`.
object get_unscaled_gradients_dyn(object grads)
Unscales the gradients by the loss scale.
This method is only needed if you compute gradients manually, e.g. with `tf.GradientTape`. In that case, call this method to unscale the gradients after computing them with `tf.GradientTape`. If you use `LossScaleOptimizer.minimize` or `LossScaleOptimizer.get_gradients`, loss scaling is automatically applied and this method is unneeded.
If this method is called, `get_scaled_loss` should also be called. See the `tf.keras.mixed_precision.experimental.LossScaleOptimizer` doc for an example.
Parameters
- object grads - A list of tensors, each of which will be divided by the loss scale. Can have None values, which are ignored.
Returns
- object - A new list the same size as `grads`, where every non-None value in `grads` is divided by `LossScaleOptimizer.loss_scale()`.
Public properties
object clipnorm get; set;
object clipvalue get; set;
object iterations get; set;
Variable. The number of training steps this Optimizer has run.
object iterations_dyn get; set;
Variable. The number of training steps this Optimizer has run.
object learning_rate get; set;
object learning_rate_dyn get; set;
object loss_scale get;
The `LossScale` instance associated with this optimizer.
object loss_scale_dyn get;
The `LossScale` instance associated with this optimizer.
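The overview notes that the loss scale may be determined dynamically and is updated whenever gradients are applied, but this page does not spell out the update rule. The sketch below shows one common scheme as an assumption: shrink the scale when a gradient is non-finite, and grow it after a run of finite steps. `ToyDynamicLossScale`, its parameter names, and its default values are hypothetical and are not the library's `LossScale` class.

```python
import math


class ToyDynamicLossScale:
    """Hypothetical dynamic loss scale; not the library's `LossScale` class."""

    def __init__(self, initial_scale=2.0 ** 15, growth_interval=2000, factor=2.0):
        self.scale = initial_scale
        self.growth_interval = growth_interval
        self.factor = factor
        self.good_steps = 0  # consecutive steps with finite gradients

    def update(self, grads) -> bool:
        """Adjust the scale; return True if the gradients are safe to apply."""
        finite = all(math.isfinite(g) for g in grads if g is not None)
        if not finite:
            # Overflow: shrink the scale (never below 1) and skip this step.
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # A long run of finite gradients: try a larger scale.
            self.scale *= self.factor
            self.good_steps = 0
        return True


scale = ToyDynamicLossScale(initial_scale=8.0, growth_interval=2)
first = scale.update([1.0])            # finite, one good step
second = scale.update([2.0, None])     # finite, scale grows to 16.0
third = scale.update([float("inf")])   # overflow, scale shrinks to 8.0
print(scale.scale)  # 8.0
```

Under this scheme a step with non-finite gradients is skipped rather than applied, which is one reason a dynamic loss scale can cost some training throughput.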