Type MaskedAutoregressiveFlow
Namespace tensorflow.contrib.distributions.bijectors
Parent Bijector
Interfaces IMaskedAutoregressiveFlow
Affine MaskedAutoregressiveFlow bijector for vector-valued events.

The affine autoregressive flow [(Papamakarios et al., 2016)][3] provides a relatively simple framework for user-specified (deep) architectures to learn a distribution over vector-valued events.

Regarding terminology, "Autoregressive models decompose the joint density as a product of conditionals, and model each conditional in turn. Normalizing flows transform a base density (e.g. a standard Gaussian) into the target density by an invertible transformation with tractable Jacobian." [(Papamakarios et al., 2016)][3]

In other words, the "autoregressive property" is equivalent to the decomposition, `p(x) = prod{ p(x[i] | x[0:i]) : i=0, ..., d }`. The provided `shift_and_log_scale_fn`, `masked_autoregressive_default_template`, achieves this property by zeroing out weights in its `masked_dense` layers.
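For intuition, here is a minimal, hypothetical sketch (not the library's actual `masked_dense` implementation) of how zeroed weights enforce that decomposition for a 3-dimensional event:

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)
w = rng.normal(size=(d, d))             # unconstrained dense weights
mask = np.tril(np.ones((d, d)), k=-1)   # mask[i, j] = 1 only when j < i
x = rng.normal(size=d)

# With the mask applied, output i is a function of x[0:i] alone,
# which is exactly the autoregressive decomposition above.
h = (w * mask) @ x
```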
In the `tfp` framework, a "normalizing flow" is implemented as a `tfp.bijectors.Bijector`. The `forward` "autoregression" is implemented using a `tf.while_loop` and a deep neural network (DNN) with masked weights such that the autoregressive property is automatically met in the `inverse`.

A `TransformedDistribution` using `MaskedAutoregressiveFlow(...)` uses the
(expensive) forward-mode calculation to draw samples and the (cheap)
reverse-mode calculation to compute log-probabilities. Conversely, a
`TransformedDistribution` using `Invert(MaskedAutoregressiveFlow(...))` uses
the (expensive) forward-mode calculation to compute log-probabilities and the
(cheap) reverse-mode calculation to compute samples. See "Example Use"
[below] for more details. Given a `shift_and_log_scale_fn`, the forward and inverse transformations are
(a sequence of) affine transformations. A "valid" `shift_and_log_scale_fn`
must compute each `shift` (aka `loc` or "mu" in [Germain et al. (2015)][1])
and `log(scale)` (aka "alpha" in [Germain et al. (2015)][1]) such that each
are broadcastable with the arguments to `forward` and `inverse`, i.e., such
that the calculations in `forward`, `inverse` [below] are possible.
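As an illustration of this shape contract only, consider a hypothetical, trivially "valid" function: returning zero `shift` and `log(scale)` makes the flow an identity map, and depending on no inputs satisfies the autoregressive property vacuously.

```python
import tensorflow as tf

def identity_shift_and_log_scale_fn(y):
  # Both outputs have the same shape as `y`, so they broadcast with the
  # arguments to `forward` and `inverse` as required.
  return tf.zeros_like(y), tf.zeros_like(y)
```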
For convenience, `masked_autoregressive_default_template` is offered as a possible `shift_and_log_scale_fn` function. It implements the MADE architecture [(Germain et al., 2015)][1]. MADE is a feed-forward network that computes a `shift` and `log(scale)` using `masked_dense` layers in a deep neural network. Weights are masked to ensure the autoregressive property. It is possible that this architecture is suboptimal for your task. To build alternative networks, either change the arguments to `masked_autoregressive_default_template`, use the `masked_dense` function to roll out your own, or use some other architecture, e.g., using `tf.layers`.
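For example, a sketch of swapping in non-default template arguments (the argument names follow the `tfp`-style Python API referenced above; the specific values are illustrative, not recommendations):

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfb = tfp.bijectors

# A deeper, narrower MADE than the default, with ELU activations.
shift_and_log_scale_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[256, 256, 256],
    activation=tf.nn.elu)

# A shift-only variant: log_scale is None, so the scale is 1 and the
# resulting flow is volume-preserving (unit Jacobian determinant).
shift_only_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[512, 512],
    shift_only=True)
```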
enforces the "autoregressive property". Assuming `shift_and_log_scale_fn` has valid shape and autoregressive
semantics, the forward transformation is:
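```python
def forward(x):
  y = zeros_like(x)
  event_size = x.shape[-1]
  # Each pass fixes one more event dimension; after `event_size` passes
  # every y[i] is consistent with shift/log_scale computed from y[0:i].
  for _ in range(event_size):
    shift, log_scale = shift_and_log_scale_fn(y)
    y = x * math_ops.exp(log_scale) + shift
  return y
```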
and the inverse transformation is:
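```python
def inverse(y):
  # `y` is fully known here, so a single evaluation suffices to invert
  # the affine map above: x = (y - shift) / exp(log_scale).
  shift, log_scale = shift_and_log_scale_fn(y)
  return (y - shift) / math_ops.exp(log_scale)
```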
Notice that the `inverse` does not need a for-loop. This is because in the forward pass each calculation of `shift` and `log_scale` is based on the `y` calculated so far (not `x`). In the `inverse`, `y` is fully known from the start, so a single evaluation of `shift_and_log_scale_fn(y)` yields the same scaling that `forward` reaches only after `event_size` passes, i.e., via the "last" `y` used to compute `shift`, `log_scale`. (Roughly speaking, this also proves the transform is bijective.)

#### Examples
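A minimal usage sketch, assuming the `tfp`-style Python API referenced above (the `tfd`/`tfb` aliases, the 2-dimensional event size, and the `[512, 512]` hidden layers are illustrative choices):

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

dims = 2

# MaskedAutoregressiveFlow: cheap inverse, hence cheap log_prob
# (suited to density estimation).
maf = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=tfb.masked_autoregressive_default_template(
            hidden_layers=[512, 512])),
    event_shape=[dims])

x = maf.sample()            # expensive: event_size passes of the forward loop
log_prob_x = maf.log_prob(x)  # cheap: a single inverse pass

# Invert(MaskedAutoregressiveFlow): cheap sampling, expensive log_prob,
# i.e., the Inverse Autoregressive Flow of [Kingma et al. (2016)][2].
iaf = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.Invert(tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=tfb.masked_autoregressive_default_template(
            hidden_layers=[512, 512]))),
    event_shape=[dims])

y = iaf.sample()            # cheap: one pass of the inverted bijector
log_prob_y = iaf.log_prob(y)  # expensive: the while_loop runs in inverse
```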
#### References

[1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. In _International Conference on Machine Learning_, 2015. https://arxiv.org/abs/1502.03509

[2]: Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improving Variational Inference with Inverse Autoregressive Flow. In _Neural Information Processing Systems_, 2016. https://arxiv.org/abs/1606.04934

[3]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked Autoregressive Flow for Density Estimation. In _Neural Information Processing Systems_, 2017. https://arxiv.org/abs/1705.07057
Show Example

```python
def forward(x):
  y = zeros_like(x)
  event_size = x.shape[-1]
  for _ in range(event_size):
    shift, log_scale = shift_and_log_scale_fn(y)
    y = x * math_ops.exp(log_scale) + shift
  return y
```
Methods
Properties
Public static methods
MaskedAutoregressiveFlow NewDyn(object shift_and_log_scale_fn, ImplicitContainer<T> is_constant_jacobian, ImplicitContainer<T> validate_args, ImplicitContainer<T> unroll_loop, object name)
Creates the MaskedAutoregressiveFlow bijector. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed after 2018-10-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.contrib.distributions`.
Parameters
- `shift_and_log_scale_fn` (object): Python `callable` which computes `shift` and `log_scale` from both the forward domain (`x`) and the inverse domain (`y`). Calculation must respect the "autoregressive property" (see class docstring). Suggested default: `masked_autoregressive_default_template(hidden_layers=...)`. Typically the function contains `tf.Variables` and is wrapped using `tf.compat.v1.make_template`. Returning `None` for either (or both of) `shift` and `log_scale` is equivalent to (but more efficient than) returning zero.
- `is_constant_jacobian` (ImplicitContainer&lt;T&gt;): Python `bool`. Default: `False`. When `True` the implementation assumes `log_scale` does not depend on the forward domain (`x`) or inverse domain (`y`) values. (No validation is made; `is_constant_jacobian=False` is always safe but possibly computationally inefficient.)
- `validate_args` (ImplicitContainer&lt;T&gt;): Python `bool` indicating whether arguments should be checked for correctness.
- `unroll_loop` (ImplicitContainer&lt;T&gt;): Python `bool` indicating whether the `tf.while_loop` in `_forward` should be replaced with a static for loop. Requires that the final dimension of `x` be known at graph construction time. Defaults to `False`.
- `name` (object): Python `str`, name given to ops managed by this object.