A Tale of Two Clusters: Cluster Quilting & Conformal Clustering
GENEVERA ALLEN – COLUMBIA UNIVERSITY
ABSTRACT
In this two-part talk, I will highlight two recent works on clustering.
First, we consider patchwork learning, a new data collection paradigm in which both samples and features are observed in fragmented subsets; such data commonly arise in neuroscience and healthcare, among other fields. We focus on clustering in the patchwork learning setting, aiming to discover clusters among all samples even when some samples are never jointly observed for any feature. To address this, we propose a novel spectral clustering method called Cluster Quilting, study its theoretical properties, and validate its empirical performance in simulations and on neuroscience and genomics examples. (This is joint work with Lili Zheng and Andersen Chang.)
Second, we consider the challenging task of uncertainty quantification for clustering. Inspired by conformal inference approaches for supervised learning, we propose a novel Conformal Clustering framework that provides valid confidence sets for the possible cluster labels of a new observation. We study this new problem, develop general theory on the under-coverage of naive approaches, and give conditions under which asymptotically valid coverage can be achieved. We also show that any finite mixture model satisfies these conditions. Finally, we validate our approach through simulations and real scientific clustering examples from genomics and astronomy. (This is joint work with YoonHaeng Hur and Anirban Nath.)

