Flows for simultaneous manifold learning and density estimation

Mar 30, 2020
15 pages
Abstract: (arXiv)
We introduce manifold-learning flows (M-flows), a new class of generative models that simultaneously learn the data manifold as well as a tractable probability density on that manifold. Combining aspects of normalizing flows, GANs, autoencoders, and energy-based models, they have the potential to represent datasets with a manifold structure more faithfully and provide handles on dimensionality reduction, denoising, and out-of-distribution detection. We argue why such models should not be trained by maximum likelihood alone and present a new training algorithm that separates manifold and density updates. In a range of experiments we demonstrate how M-flows learn the data manifold and allow for better inference than standard flows in the ambient data space.
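The key idea of separating manifold and density updates can be illustrated with a toy sketch. The following is our own minimal illustration, not the paper's code: the "manifold" is a learnable linear chart in a 2-dimensional ambient space, the "density" is a Gaussian on the latent coordinate, and the two are fitted in the two phases the abstract describes (reconstruction first, then maximum likelihood with the manifold frozen). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: points on a 1-D linear manifold embedded in 2-D ambient space.
# True generative process: x = w_true * z with z ~ N(0, sigma_true^2).
w_true = np.array([3.0, 4.0]) / 5.0      # unit direction spanning the manifold
sigma_true = 2.0
z = rng.normal(0.0, sigma_true, size=1000)
X = np.outer(z, w_true)                  # (1000, 2) training points

# Model: decoder f(u) = u * w with learnable direction w (the manifold part)
# and a zero-mean Gaussian N(0, sigma^2) on the latent u (the density part).
w = np.array([1.0, 0.0])                 # initial guess for the manifold

# Phase 1 (manifold update): minimise the reconstruction error ||x - f(g(x))||^2
# by gradient descent, holding the projected latents fixed at each step.
for _ in range(200):
    u = X @ w / (w @ w)                  # encoder g(x): project onto w
    recon = np.outer(u, w)               # decoder f(u)
    grad = -2.0 * ((X - recon).T @ u) / len(X)
    w -= 0.1 * grad
w /= np.linalg.norm(w)

# Phase 2 (density update): with the manifold frozen, fit the latent density
# by maximum likelihood; for a zero-mean Gaussian the MLE is the latent std.
u = X @ w
sigma = np.sqrt(np.mean(u ** 2))

print(abs(w @ w_true))                   # close to 1: manifold direction found
print(sigma)                             # close to sigma_true: density scale fit
```

Training by maximum likelihood alone would not penalise a wrong manifold, since off-manifold data have zero likelihood under any chart; the reconstruction phase is what anchors the manifold to the data, which is the motivation for the split given in the abstract.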
Note:
  • [1] n = 2, sampling z0 and z1 from a unit Gaussian. We generate a training set of 10^4 images.
  • [2] n = 64, sampling z0, ..., z63 from a Gaussian with mean 0 and variance exp(θ)^2. The model parameter θ is drawn from a unit Gaussian. We generate a training set of 2·10^4 images.

  All images are downsampled to a resolution of 64 × 64. The images thus populate a 2-dimensional or 64-dimensional manifold embedded in a 64 × 64 × 3-dimensional ambient space. We preprocess the 8-bit training data through uniform dequantization [43].

  Architectures. We consider AF, PIE, M-flow, and Me-flow models based on rational-quadratic neural spline flows [29].
  • The AF models closely follow the setup described in Reference [29], which in turn is based on the Glow [43] and RealNVP [5] architectures. A multi-scale setup [5] with four levels is used. On each level, seven steps are stacked. Each step entails an actnorm layer, an invertible 1 × 1 convolution, and a rational-quadratic coupling transformation. Overall there are thus 28 coupling transformation layers.
  • The M-flow models use a similar setup for the transformation f, except that each level uses only five steps. The output of this multi-scale transformation is then transformed with an invertible linear (LU-decomposed) layer, an invertible activation function, and another invertible linear