Flow-based generative model by Wasserstein proximal gradient descent



Normalizing flows have become a popular class of deep generative models for efficient sampling and density estimation. Recently, the remarkable success of score-based diffusion models has inspired flow-based models that are closely related to the diffusion process. In particular, the celebrated Jordan–Kinderlehrer–Otto (JKO) scheme captures the variational nature of the Fokker-Planck equation of a diffusion process as a Wasserstein gradient flow. In the context of normalizing flows, this naturally suggests a progressive way of training a flow model that implements a proximal gradient descent in the Wasserstein-2 space. In this talk, we introduce such a JKO flow model, which achieves competitive performance compared to existing diffusion and flow models in generating high-dimensional real data. The proposed flow network stacks residual blocks one after another, each block corresponding to a JKO step, and it allows efficient block-wise training that reduces the memory load and avoids possible difficulties of end-to-end training. Thanks to the invertibility of the neural ODE model, both the forward (data-to-noise) and reverse (noise-to-data) processes are deterministic, avoiding the need to sample SDE trajectories. On the theoretical side, the connection to Wasserstein proximal gradient descent allows us to prove exponentially fast convergence of the discrete-time flow in both directions, leading to a Kullback–Leibler (KL) guarantee of data generation at $O(\varepsilon^2)$ error when using $\log (1/\varepsilon)$ many JKO steps (residual blocks).
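For concreteness, the JKO step that each residual block corresponds to can be written in its standard variational form (a sketch using generic symbols not fixed in the abstract: $h$ denotes the step size, $\pi$ the target density, and $\rho_k$ the density after $k$ steps):

```latex
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
\; \mathrm{KL}(\rho \,\|\, \pi) \;+\; \frac{1}{2h}\, W_2^2(\rho, \rho_k),
```

i.e., a proximal gradient descent step on the KL divergence in the Wasserstein-2 metric; as $h \to 0$, the sequence $\{\rho_k\}$ recovers the Fokker-Planck evolution, and each residual block of the flow network parameterizes the transport map realizing one such step.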

Joint work with Yao Xie, Chen Xu (Georgia Tech), and Jianfeng Lu, Yixin Tan (Duke).