INTERPRETING THE SPECTRAL EMBEDDING IN FINITE MIXTURE MODELS

YUEKAI SUN – UNIVERSITY OF MICHIGAN

As datasets grow more heterogeneous, practitioners are turning to mixture models to account for that heterogeneity. Spectral clustering is a nonparametric approach that embeds the data into a low-dimensional space in a way that reveals its cluster structure. We show that the spectral embedding in finite mixture models is, up to an orthogonal transformation, a perturbed version of the posterior probabilities of the unobserved labels. We quantify the magnitude of the perturbation in terms of two quantities: the separation between the mixture components and how easily any component can be divided into sub-components. Building on this connection, we design an algorithm based on the QR factorization that recovers the posterior probabilities from the spectral embedding.
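
To make the relationship between the spectral embedding and the label posteriors concrete, the following is a minimal sketch and not necessarily the algorithm developed in the paper. It assumes a Gaussian kernel affinity with a hand-picked bandwidth, a symmetrically normalized affinity matrix, and a greedy pivoting rule that mimics QR with column pivoting. The function names (`spectral_embedding`, `pivoted_rows`, `soft_labels_from_embedding`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_embedding(X, k, bandwidth=1.0):
    """Top-k eigenvectors of the symmetrically normalized Gaussian affinity matrix.

    Assumption: Gaussian kernel with a fixed bandwidth; other kernels are possible.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))       # D^{-1/2} A D^{-1/2}
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return eigvecs[:, -k:]                # n x k embedding


def pivoted_rows(V):
    """Greedily pick k rows of V, as column-pivoted QR would on V.T."""
    R = V.copy()
    pivots = []
    for _ in range(V.shape[1]):
        j = int(np.argmax(np.sum(R ** 2, axis=1)))  # largest remaining row
        pivots.append(j)
        q = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ q, q)                  # project out that direction
    return pivots


def soft_labels_from_embedding(V):
    """Undo the unknown transformation using the pivot rows, then normalize rows."""
    pivots = pivoted_rows(V)
    W = V @ np.linalg.inv(V[pivots])      # pivot rows map to the identity
    W = np.clip(W, 0.0, None)             # heuristic: discard small negative entries
    row_sums = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    return W / row_sums, pivots


# Toy data: two well-separated Gaussian components in the plane (illustrative only).
rng = np.random.default_rng(0)
n_per = 60
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(n_per, 2)),
    rng.normal(loc=[6.0, 0.0], scale=0.5, size=(n_per, 2)),
])
truth = np.repeat([0, 1], n_per)

V = spectral_embedding(X, k=2)
P, pivots = soft_labels_from_embedding(V)
hard = P.argmax(axis=1)
# Cluster indices are only identified up to a permutation of the labels.
accuracy = max(np.mean(hard == truth), np.mean(hard == 1 - truth))
```

In this sketch the rows of `P` are nonnegative and sum to one, so they can be read as approximate posterior probabilities. The clipping and row normalization are heuristics added here, and how closely `P` tracks the true posteriors depends on the separation between components, as the abstract indicates.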