VectorDiffeomixture - LostTech.TensorFlow Documentation

Type VectorDiffeomixture

Namespace tensorflow.contrib.distributions

VectorDiffeomixture distribution.

A vector diffeomixture (VDM) is a distribution parameterized by a convex combination of `K` component `loc` vectors, `loc[k], k = 0,...,K-1`, and `K` `scale` matrices `scale[k], k = 0,..., K-1`. It approximates the following [compound distribution](https://en.wikipedia.org/wiki/Compound_probability_distribution):

```none
p(x) = int p(x | z) p(z) dz,
where z is in the K-simplex, and
p(x | z) := p(x | loc=sum_k z[k] loc[k], scale=sum_k z[k] scale[k])
```

The integral `int p(x | z) p(z) dz` is approximated with a quadrature scheme adapted to the mixture density `p(z)`. The `N` quadrature points `z_{N, n}` and weights `w_{N, n}` (which are non-negative and sum to 1) are chosen such that

```
q_N(x) := sum_{n=1}^N w_{N, n} p(x | z_{N, n})  -->  p(x)
```

as `N --> infinity`.

Since `q_N(x)` is in fact a mixture (of `N` points), we may sample from `q_N` exactly. It is important to note that the VDM is *defined* as `q_N` above, and *not* `p(x)`. Therefore, sampling and pdf may be implemented as exact (up to floating point error) methods.
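Because `q_N` is a finite mixture, exact sampling and exact density evaluation can be sketched in a few lines. The sketch below is illustrative only: the grid `z_grid`, the weights `w_grid`, and the scalar endpoints `loc` and `scale` are hypothetical values chosen for a `K = 2`, one-dimensional case, and a Normal conditional is assumed.

```python
import math
import random

# Hypothetical quadrature grid for K = 2: z is the weight on component 0,
# w is the quadrature weight (non-negative, sums to 1).
z_grid = [0.1, 0.35, 0.65, 0.9]
w_grid = [0.25, 0.25, 0.25, 0.25]

# Hypothetical scalar endpoints (loc[k], scale[k]) for k = 0, 1.
loc = [-2.0, 2.0]
scale = [0.5, 1.5]


def interpolate(z):
    """Return (loc, scale) of p(x | z) as a convex combination of the endpoints."""
    return (z * loc[0] + (1.0 - z) * loc[1],
            z * scale[0] + (1.0 - z) * scale[1])


def sample_q(rng):
    """Draw one exact sample from q_N: pick a grid point, then sample p(x | z)."""
    (z,) = rng.choices(z_grid, weights=w_grid, k=1)
    m, s = interpolate(z)
    return rng.gauss(m, s)


def pdf_q(x):
    """Evaluate q_N(x) = sum_n w[n] * Normal(x; loc(z[n]), scale(z[n]))."""
    total = 0.0
    for z, w in zip(z_grid, w_grid):
        m, s = interpolate(z)
        total += w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return total


rng = random.Random(0)
samples = [sample_q(rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
# The mean of a mixture is the weighted mean of the component means.
exact_mean = sum(w * interpolate(z)[0] for z, w in zip(z_grid, w_grid))
```

No approximation enters either function beyond floating point: the only approximation in the VDM is the replacement of `p(x)` by `q_N(x)` itself.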

A common choice for the conditional `p(x | z)` is a multivariate Normal.

The implemented marginal `p(z)` is the `SoftmaxNormal`, which is a `K-1` dimensional Normal transformed by a `SoftmaxCentered` bijector, making it a density on the `K`-simplex. That is,

```
Z = SoftmaxCentered(X),
X = Normal(mix_loc / temperature, 1 / temperature)
```
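As a rough illustration of this construction, the sketch below draws one `Z` in pure Python. It assumes that `SoftmaxCentered` appends a zero logit to its `K-1` dimensional input and then applies a softmax; the helper names `softmax_centered` and `sample_softmax_normal` are hypothetical and not part of the library.

```python
import math
import random


def softmax_centered(x):
    """Map a length-(K-1) vector to the K-simplex: softmax of x with a 0 appended."""
    logits = list(x) + [0.0]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_softmax_normal(mix_loc, temperature, rng):
    """Draw Z = SoftmaxCentered(X), X ~ Normal(mix_loc / temperature, 1 / temperature)."""
    x = [rng.gauss(m / temperature, 1.0 / temperature) for m in mix_loc]
    return softmax_centered(x)


rng = random.Random(0)
# mix_loc has K-1 = 2 entries, so z lies on the 3-simplex (K = 3).
z = sample_softmax_normal(mix_loc=[0.0, 1.0], temperature=1.0, rng=rng)
```

The returned `z` has `K` strictly positive entries that sum to 1, which is what makes it usable as a set of convex-combination weights.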

The default quadrature scheme chooses `z_{N, n}` as `N` midpoints of the quantiles of `p(z)` (generalized quantiles if `K > 2`).

See [Dillon and Langmore (2018)][1] for more details.
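The convergence of `q_N` can be checked numerically in a small case. The following sketch assumes `K = 2` (where `SoftmaxCentered` reduces to a sigmoid), a scalar Normal conditional, and a simplified quantile rule that evaluates the quantile function at the midpoint probabilities `(n + 1/2) / N` with equal weights. The library's default scheme may place its points differently, and `LOC`, `SCALE`, and all function names here are hypothetical.

```python
import math
from statistics import NormalDist

LOC = (-2.0, 2.0)    # hypothetical scalar loc[0], loc[1]
SCALE = (0.5, 1.5)   # hypothetical scalar scale[0], scale[1]


def quantile_grid(n_points, mix_loc=0.0, temperature=1.0):
    """Return (z, w): z[n] is the weight on component 0, w[n] = 1 / N."""
    base = NormalDist(mu=mix_loc / temperature, sigma=1.0 / temperature)
    # The sigmoid is monotone, so quantiles of Z are sigmoids of quantiles of X.
    z = [1.0 / (1.0 + math.exp(-base.inv_cdf((n + 0.5) / n_points)))
         for n in range(n_points)]
    w = [1.0 / n_points] * n_points
    return z, w


def normal_pdf(x, loc, scale):
    return math.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def q_pdf(x, grid):
    """Evaluate q_N(x) on the given quadrature grid."""
    z, w = grid
    return sum(
        wn * normal_pdf(x,
                        zn * LOC[0] + (1.0 - zn) * LOC[1],
                        zn * SCALE[0] + (1.0 - zn) * SCALE[1])
        for zn, wn in zip(z, w))


xs = [-4.0 + 0.25 * i for i in range(33)]
fine_grid = quantile_grid(4096)  # stands in for the limit p(x)
reference = [q_pdf(x, fine_grid) for x in xs]


def max_error(n_points):
    """Largest gap between q_N and the fine-grid reference over xs."""
    grid = quantile_grid(n_points)
    return max(abs(q_pdf(x, grid) - r) for x, r in zip(xs, reference))


errors = {n: max_error(n) for n in (2, 8, 32)}
```

Inspecting `errors` shows how quickly the mixture approaches the reference as the number of quadrature points grows, which is the trade-off controlled by `quadrature_size`.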

#### About `Vector` distributions in TensorFlow.

The `VectorDiffeomixture` is a non-standard distribution that has properties particularly useful in [variational Bayesian methods](https://en.wikipedia.org/wiki/Variational_Bayesian_methods).

Conditioned on a draw `z` from the SoftmaxNormal, `X|z` is a vector whose components are linear combinations of affine transformations, and is therefore itself an affine transformation.
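A small numeric check makes this concrete: interpolating the `loc` and `scale` parameters and applying one affine map gives the same result as applying each endpoint map and averaging the outputs. The endpoints, the simplex point `z`, and the base draws `eps` below are hypothetical values for `K = 2` and `d = 2`.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def affine(loc, scale, eps):
    """Apply x = loc + scale @ eps."""
    return [l + y for l, y in zip(loc, mat_vec(scale, eps))]


# Hypothetical endpoints for K = 2 components in d = 2 dimensions.
locs = [[0.0, 0.0], [2.0, 2.0]]
scales = [[[1.1, 0.0], [0.0, 1.1]],
          [[2.5, 0.0], [0.0, 3.5]]]

z = [0.3, 0.7]      # a point on the simplex (one value of the mixing variable)
eps = [0.4, -1.2]   # stand-in for d iid draws from the base distribution

# Route 1: interpolate the parameters, then apply a single affine map.
loc_z = [sum(zk * lk[i] for zk, lk in zip(z, locs)) for i in range(2)]
scale_z = [[sum(zk * sk[i][j] for zk, sk in zip(z, scales)) for j in range(2)]
           for i in range(2)]
x_interpolated = affine(loc_z, scale_z, eps)

# Route 2: apply each endpoint map to the same eps, then combine with weights z.
endpoint_outputs = [affine(l, s, eps) for l, s in zip(locs, scales)]
x_combined = [sum(zk * out[i] for zk, out in zip(z, endpoint_outputs))
              for i in range(2)]
```

Both routes agree up to floating point error, which is why the interpolated transformation can be treated as a single affine map per quadrature point.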

Note: The marginals `X_1|z,..., X_d|z` are *not* generally identical to some parameterization of `distribution`. This is because a sum of draws from `distribution` does not generally follow the same `distribution`.

#### About `Diffeomixture`s and reparameterization.

The `VectorDiffeomixture` is designed to be reparameterized, i.e., its parameters are only used to transform samples from a distribution which has no trainable parameters. This property is important because backprop stops at sources of stochasticity. That is, as long as the parameters are used *after* the underlying source of stochasticity, the computed gradient is accurate.

Reparameterization means that we can use gradient descent (via backprop) to optimize Monte Carlo objectives. Such objectives are a finite-sample approximation of an expectation and arise throughout scientific computing.
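To illustrate the idea outside of TensorFlow, the sketch below estimates the gradient of a Monte Carlo objective for a scalar Normal by differentiating through the transformation `x = mu + sigma * eps`. The objective `E[x**2]` and the parameter values are hypothetical, picked because the closed form `mu**2 + sigma**2` makes the result easy to compare.

```python
import random

rng = random.Random(0)
# Parameter-free source of randomness: eps ~ Normal(0, 1).
eps = [rng.gauss(0.0, 1.0) for _ in range(50000)]

mu, sigma = 1.5, 0.8  # hypothetical trainable parameters


def objective(mu, sigma):
    """Monte Carlo estimate of E[x**2] with x = mu + sigma * eps."""
    return sum((mu + sigma * e) ** 2 for e in eps) / len(eps)


# Pathwise (reparameterization) gradients: the parameters enter only after
# the random draw, so the chain rule applies sample by sample.
grad_mu = sum(2.0 * (mu + sigma * e) for e in eps) / len(eps)
grad_sigma = sum(2.0 * (mu + sigma * e) * e for e in eps) / len(eps)

# Closed form for comparison: E[x**2] = mu**2 + sigma**2.
exact_grad_mu, exact_grad_sigma = 2.0 * mu, 2.0 * sigma
```

The estimates `grad_mu` and `grad_sigma` match the closed-form values up to Monte Carlo error, and they are exactly the gradients of `objective` for the fixed set of `eps` draws.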

WARNING: If you backprop through a VectorDiffeomixture sample and the "base" distribution is both not `FULLY_REPARAMETERIZED` and a function of trainable variables, then the gradient is not guaranteed to be correct!

#### References

[1]: Joshua Dillon and Ian Langmore. Quadrature Compound: An approximating family of distributions. _arXiv preprint arXiv:1801.03080_, 2018. https://arxiv.org/abs/1801.03080

#### Examples

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Create two batches of VectorDiffeomixtures, one with mix_loc=[0.],
# another with mix_loc=[1]. In both cases, `K=2` and the affine
# transformations involve:
# k=0: loc=zeros(dims)  scale=LinearOperatorScaledIdentity
# k=1: loc=[2.]*dims    scale=LinOpDiag
dims = 5
vdm = tfd.VectorDiffeomixture(
    mix_loc=[[0.], [1]],
    temperature=[1.],
    distribution=tfd.Normal(loc=0., scale=1.),
    loc=[
        None,  # Equivalent to `np.zeros(dims, dtype=np.float32)`.
        np.float32([2.]*dims),
    ],
    scale=[
        tf.linalg.LinearOperatorScaledIdentity(
          num_rows=dims,
          multiplier=np.float32(1.1),
          is_positive_definite=True),
        tf.linalg.LinearOperatorDiag(
          diag=np.linspace(2.5, 3.5, dims, dtype=np.float32),
          is_positive_definite=True),
    ],
    validate_args=True)

Methods

NewDyn

Properties

Public static methods

VectorDiffeomixture NewDyn(object mix_loc, object temperature, object distribution, object loc, object scale, ImplicitContainer<T> quadrature_size, ImplicitContainer<T> quadrature_fn, ImplicitContainer<T> validate_args, ImplicitContainer<T> allow_nan_stats, ImplicitContainer<T> name)

Constructs the VectorDiffeomixture on `R^d`. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed after 2018-10-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of tf.contrib.distributions.

The vector diffeomixture (VDM) approximates the compound distribution

```none
p(x) = int p(x | z) p(z) dz,
where z is in the K-simplex, and
p(x | z) := p(x | loc=sum_k z[k] loc[k], scale=sum_k z[k] scale[k])
```

Parameters

object mix_loc: `float`-like `Tensor` with shape `[b1,..., bB, K-1]`. In terms of samples, larger `mix_loc[..., k]` ==> `Z` is more likely to put more weight on its `kth` component.
object temperature: `float`-like `Tensor`. Broadcastable with `mix_loc`. In terms of samples, smaller `temperature` means one component is more likely to dominate. I.e., smaller `temperature` makes the VDM look more like a standard mixture of `K` components.
object distribution: `tf.Distribution`-like instance. Distribution from which `d` iid samples are used as input to the selected affine transformation. Must be a scalar-batch, scalar-event distribution. Typically `distribution.reparameterization_type = FULLY_REPARAMETERIZED` or it is a function of non-trainable parameters. WARNING: If you backprop through a VectorDiffeomixture sample and the `distribution` is not `FULLY_REPARAMETERIZED` yet is a function of trainable variables, then the gradient will be incorrect!
object loc: Length-`K` list of `float`-type `Tensor`s. The `k`-th element represents the `shift` used for the `k`-th affine transformation. If the `k`-th item is `None`, `loc` is implicitly `0`. When specified, must have shape `[B1,..., Bb, d]` where `b >= 0` and `d` is the event size.
object scale: Length-`K` list of `LinearOperator`s. Each should be positive-definite and operate on a `d`-dimensional vector space. The `k`-th element represents the `scale` used for the `k`-th affine transformation. `LinearOperator`s must have shape `[B1,..., Bb, d, d]`, `b >= 0`, i.e., each characterizes `b`-batches of `d x d` matrices.
ImplicitContainer<T> quadrature_size: Python `int` scalar representing number of quadrature points. Larger `quadrature_size` means `q_N(x)` better approximates `p(x)`.
ImplicitContainer<T> quadrature_fn: Python callable taking `normal_loc`, `normal_scale`, `quadrature_size`, `validate_args` and returning `tuple(grid, probs)` representing the SoftmaxNormal grid and corresponding normalized weight. Default value: `quadrature_scheme_softmaxnormal_quantiles`.
ImplicitContainer<T> validate_args: Python `bool`, default `False`. When `True`, distribution parameters are checked for validity despite possibly degrading runtime performance. When `False`, invalid inputs may silently render incorrect outputs.
ImplicitContainer<T> allow_nan_stats: Python `bool`, default `True`. When `True`, statistics (e.g., mean, mode, variance) use the value "`NaN`" to indicate the result is undefined. When `False`, an exception is raised if one or more of the statistic's batch members are undefined.
ImplicitContainer<T> name: Python `str` name prefixed to Ops created by this class.
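The effect of `mix_loc` and `temperature` on the mixing weights can be previewed without TensorFlow. This sketch assumes `K = 2`, where the weight on component 0 is the sigmoid of a `Normal(mix_loc / temperature, 1 / temperature)` draw, and it evaluates that weight at equally spaced quantile midpoints. The helpers `mixing_weights` and `mean_distance_to_vertex` are hypothetical names, not library functions.

```python
import math
from statistics import NormalDist


def mixing_weights(temperature, mix_loc=0.0, n_points=8):
    """Weight on component 0 at each quantile midpoint of the K = 2 SoftmaxNormal."""
    base = NormalDist(mu=mix_loc / temperature, sigma=1.0 / temperature)
    return [1.0 / (1.0 + math.exp(-base.inv_cdf((n + 0.5) / n_points)))
            for n in range(n_points)]


def mean_distance_to_vertex(weights):
    """Average of min(z, 1 - z): 0 means every point sits at a vertex of the simplex."""
    return sum(min(z, 1.0 - z) for z in weights) / len(weights)


warm = mixing_weights(temperature=1.0)
cold = mixing_weights(temperature=0.1)
shifted = mixing_weights(temperature=1.0, mix_loc=2.0)
```

Lowering `temperature` pushes the weights toward 0 or 1, so one component dominates at each grid point, while raising `mix_loc` shifts every weight toward component 0.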

Public properties

object allow_nan_stats get;

object allow_nan_stats_dyn get;

TensorShape batch_shape get;

object batch_shape_dyn get;

object distribution get;

Base scalar-event, scalar-batch distribution.

object distribution_dyn get;

Base scalar-event, scalar-batch distribution.

object dtype get;

object dtype_dyn get;

IList<AffineLinearOperator> endpoint_affine get;

Affine transformation for each of `K` components.

object endpoint_affine_dyn get;

Affine transformation for each of `K` components.

TensorShape event_shape get;

object event_shape_dyn get;

object grid get;

Grid of mixing probabilities, one for each grid point.

object grid_dyn get;

Grid of mixing probabilities, one for each grid point.

IList<AffineLinearOperator> interpolated_affine get;

Affine transformation for each convex combination of `K` components.

object interpolated_affine_dyn get;

Affine transformation for each convex combination of `K` components.

Categorical mixture_distribution get;

Distribution used to select a convex combination of affine transforms.

object mixture_distribution_dyn get;