MIXED MEMBERSHIP ESTIMATION FOR LARGE SOCIAL NETWORKS

ZHENG TRACY KE – HARVARD UNIVERSITY

ABSTRACT

Given a large network, we assume that there are K perceivable communities and that each node can belong to multiple communities via a mixed membership vector. We are interested in estimating these mixed membership vectors, as they represent the latent social structure of nodes. We propose a spectral method, Mixed-SCORE. It applies pre-PCA and post-PCA normalizations to simultaneously maximize the signal-to-noise ratios at all entries of the eigenvectors, and then estimates the mixed membership vectors from a simplex geometry in the spectral domain. Under a degree-corrected mixed membership model, we show that Mixed-SCORE is “optimally adaptive”: it achieves the optimal rate across many different combinations of network sparsity, degree heterogeneity, and signal strength, without needing any prior information about the model parameters.
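
To make the simplex geometry concrete, the following is a minimal Python sketch, not the implementation from Ke and Wang (2022). It makes several assumptions that are not stated in the abstract: the expected adjacency matrix has the form Omega = Theta Pi P Pi' Theta with a unit-diagonal P, the pre-PCA normalization is omitted, the post-PCA normalization is taken to be the entrywise eigenvector ratio used in the SCORE family, and the vertices of the simplex are found with a successive projection step standing in for whichever vertex-hunting routine the method actually uses. The function name mixed_score_sketch and all parameter values are hypothetical.

```python
import numpy as np

def mixed_score_sketch(A, K):
    """Illustrative membership estimate from a symmetric n x n matrix A."""
    n = A.shape[0]
    # PCA step: keep the K eigenpairs with the largest absolute eigenvalues.
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:K]
    lam, xi = vals[order], vecs[:, order]
    if xi[:, 0].sum() < 0:          # fix the sign of the leading eigenvector
        xi[:, 0] = -xi[:, 0]
    # Post-PCA normalization (assumed form): divide eigenvectors 2..K
    # entrywise by the first one, which cancels the degree parameters.
    R = xi[:, 1:] / xi[:, [0]]      # n x (K-1); rows lie in a simplex
    # Vertex hunting via successive projection (an assumed stand-in).
    Y = np.hstack([np.ones((n, 1)), R])
    resid, idx = Y.copy(), []
    for _ in range(K):
        j = int(np.argmax((resid ** 2).sum(axis=1)))
        idx.append(j)
        u = resid[j] / np.linalg.norm(resid[j])
        resid = resid - np.outer(resid @ u, u)
    V = R[idx]                      # K x (K-1) estimated vertices
    # Barycentric coordinates of every row of R with respect to the vertices.
    V_aug = np.hstack([np.ones((K, 1)), V])
    W = np.linalg.solve(V_aug.T, Y.T).T
    W = np.clip(W, 0.0, None)
    # Undo the community-specific scaling b1 (valid when diag(P) = 1).
    b1 = 1.0 / np.sqrt(lam[0] + (V ** 2) @ lam[1:])
    Pi = W / b1
    return Pi / Pi.sum(axis=1, keepdims=True), idx

# Noiseless demo on a hypothetical 60-node, 3-community network.
rng = np.random.default_rng(0)
n, K = 60, 3
P = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.25], [0.2, 0.25, 1.0]])
Pi_true = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
theta = rng.uniform(0.2, 1.0, size=n)      # degree heterogeneity
M = theta[:, None] * Pi_true
Omega = M @ P @ M.T                        # expected adjacency matrix
Pi_hat, idx = mixed_score_sketch(Omega, K)
```

Because the demo uses the expected adjacency matrix rather than a sampled network, the recovered memberships should match the true ones up to a relabeling of communities. On a real, noisy adjacency matrix the quantity under the square root is not guaranteed to stay positive, and the choices of normalization and vertex hunting matter much more; this sketch does not attempt to reproduce them.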

As a real application, we apply our method to the MADStat dataset (Ji et al., 2022), which contains attributes of 83K papers published in 36 statistics-related journals. We construct a co-citation network of statisticians and apply Mixed-SCORE to discover a Research Triangle and a Research Map of the academic statistics community. The results provide evidence for the philosophical Research Triangle conjectured by Bradley Efron.

Reference Papers:

Dataset (Ji et al., 2022): http://zke.fas.harvard.edu/papers/MADStat-combined.pdf
Method and theory (Ke and Wang, 2022): https://arxiv.org/abs/2204.12087