Type RealNVP
Namespace tensorflow.contrib.distributions.bijectors
Parent Bijector
Interfaces IRealNVP
RealNVP "affine coupling layer" for vector-valued events. Real NVP models a normalizing flow on a `D`-dimensional distribution via a
single `D-d`-dimensional conditional distribution [(Dinh et al., 2017)][1]: `y[d:D] = y[d:D] * math_ops.exp(log_scale_fn(y[d:D])) + shift_fn(y[d:D])`
`y[0:d] = x[0:d]` The last `D-d` units are scaled and shifted based on the first `d` units only,
while the first `d` units are 'masked' and left unchanged. Real NVP's
`shift_and_log_scale_fn` computes vector-valued quantities. For
scale-and-shift transforms that do not depend on any masked units, i.e.
`d=0`, use the `tfb.Affine` bijector with learned parameters instead. Masking is currently only supported for base distributions with
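
To make the transform concrete, here is a minimal NumPy sketch of the forward
and inverse coupling computations (illustrative only; `shift_fn` and
`log_scale_fn` stand in for the two outputs of `shift_and_log_scale_fn`):

```python
import numpy as np

def coupling_forward(x, d, shift_fn, log_scale_fn):
  """Toy sketch of `y = forward(x)` for the coupling layer above."""
  y0 = x[..., :d]  # First `d` units are masked: passed through unchanged.
  y1 = x[..., d:] * np.exp(log_scale_fn(y0)) + shift_fn(y0)
  return np.concatenate([y0, y1], axis=-1)

def coupling_inverse(y, d, shift_fn, log_scale_fn):
  """Inverse of the coupling layer; equally parallel, no sequential solve."""
  x0 = y[..., :d]
  x1 = (y[..., d:] - shift_fn(x0)) * np.exp(-log_scale_fn(x0))
  return np.concatenate([x0, x1], axis=-1)

# Round trip with toy conditioners (any broadcast-compatible fns work):
x = np.random.randn(4, 5)
shift = lambda z: z.sum(axis=-1, keepdims=True)
log_scale = lambda z: 0.1 * z.mean(axis=-1, keepdims=True)
y = coupling_forward(x, 2, shift, log_scale)
assert np.allclose(coupling_inverse(y, 2, shift, log_scale), x)
```
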
Masking is currently only supported for base distributions with
`event_ndims=1`. For more sophisticated masking schemes like checkerboard or
channel-wise masking [(Dinh et al., 2017)][1], use the `tfb.Permute`
bijector to re-order desired masked units into the first `d` units. For base
distributions with `event_ndims > 1`, use the `tfb.Reshape` bijector to
flatten the event shape.
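
For instance, a hypothetical re-ordering that masks the *last* two of five
units could sandwich the coupling layer between a `tfb.Permute` and its
inverse (the permutation indices and layer sizes here are illustrative only):

```python
import tensorflow_probability as tfp
tfb = tfp.bijectors

# Move units 3 and 4 to the front so RealNVP's `num_masked=2` masks them.
permute = tfb.Permute(permutation=[3, 4, 0, 1, 2])
bijector = tfb.Chain([
    tfb.Invert(permute),  # Applied last: restores the original ordering.
    tfb.RealNVP(
        num_masked=2,
        shift_and_log_scale_fn=tfb.real_nvp_default_template(
            hidden_layers=[256, 256])),
    permute,              # Applied first: re-orders the units.
])
```
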
Recall that the MAF bijector [(Papamakarios et al., 2017)][4] implements a
normalizing flow via an autoregressive transformation. MAF and IAF have
opposite computational tradeoffs: MAF can train all units in parallel but
must sample units sequentially, while IAF must train units sequentially but
can sample in parallel. In contrast, Real NVP can compute both forward and
inverse computations in parallel. However, the lack of an autoregressive
transformation makes it less expressive on a per-bijector basis.
"mu" in [Papamakarios et al. (2016)][4]) and `log(scale)` (aka "alpha" in
[Papamakarios et al. (2016)][4]) such that each are broadcastable with the
arguments to `forward` and `inverse`, i.e., such that the calculations in
`forward`, `inverse` [below] are possible. For convenience,
`real_nvp_default_nvp` is offered as a possible `shift_and_log_scale_fn`
function. NICE [(Dinh et al., 2014)][2] is a special case of the Real NVP bijector
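
As a sketch of what such a callable might look like, assuming the
contrib-era `tf.layers` API and the `(x, output_units)` calling convention
used by `real_nvp_default_template` (the layer sizes are made up):

```python
import tensorflow as tf

def my_shift_and_log_scale_fn(x, output_units):
  # One hidden layer, then a linear layer emitting `2 * output_units`
  # values that are split into `shift` and `log_scale` halves.
  h = tf.layers.dense(x, 512, activation=tf.nn.relu)
  params = tf.layers.dense(h, 2 * output_units)
  shift, log_scale = tf.split(params, 2, axis=-1)
  return shift, log_scale
```

Unlike this bare function, `real_nvp_default_template` additionally wraps the
network in a `tf.make_template` so that the same variables are reused across
calls to `forward` and `inverse`.
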
NICE [(Dinh et al., 2014)][2] is a special case of the Real NVP bijector
which discards the scale transformation, resulting in a constant-time
inverse-log-determinant-Jacobian. To use a NICE bijector instead of Real
NVP, `shift_and_log_scale_fn` should return `(shift, None)`, and
`is_constant_jacobian` should be set to `True` in the `RealNVP` constructor.
Calling `real_nvp_default_template` with `shift_only=True` returns one such
NICE-compatible `shift_and_log_scale_fn`.
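
A minimal sketch of such a NICE-style construction, using the same default
template as the example below:

```python
import tensorflow_probability as tfp
tfb = tfp.bijectors

# Shift-only coupling: the returned `log_scale` is None, so the Jacobian
# is constant and `is_constant_jacobian=True` may be set.
nice = tfb.RealNVP(
    num_masked=2,
    shift_and_log_scale_fn=tfb.real_nvp_default_template(
        hidden_layers=[512, 512], shift_only=True),
    is_constant_jacobian=True)
```
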
Caching: the scalar input depth `D` of the base distribution is not known at
construction time. The first call to any of `forward(x)`, `inverse(x)`,
`inverse_log_det_jacobian(x)`, or `forward_log_det_jacobian(x)` memoizes
`D`, which is re-used in subsequent calls. This shape must be known prior to
graph execution (which is the case if using `tf.layers`).

#### Example Use

```python
import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

# A common choice for a normalizing flow is to use a Gaussian for the base
# distribution. (However, any continuous distribution would work.) E.g.,
num_dims = 3
num_samples = 1
nvp = tfd.TransformedDistribution(
    distribution=tfd.MultivariateNormalDiag(loc=np.zeros(num_dims)),
    bijector=tfb.RealNVP(
        num_masked=2,
        shift_and_log_scale_fn=tfb.real_nvp_default_template(
            hidden_layers=[512, 512])))

x = nvp.sample(num_samples)
nvp.log_prob(x)
nvp.log_prob(np.zeros([num_samples, num_dims]))
```

For more examples, see [Jang (2018)][3].

#### References

[1]: Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation
     using Real NVP. In _International Conference on Learning
     Representations_, 2017. https://arxiv.org/abs/1605.08803

[2]: Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear
     Independent Components Estimation. _arXiv preprint arXiv:1410.8516_,
     2014. https://arxiv.org/abs/1410.8516

[3]: Eric Jang. Normalizing Flows Tutorial, Part 2: Modern Normalizing Flows.
     _Technical Report_, 2018. http://blog.evjang.com/2018/01/nf2.html

[4]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked
     Autoregressive Flow for Density Estimation. In _Neural Information
     Processing Systems_, 2017. https://arxiv.org/abs/1705.07057