Statistical Inference for Cataloging the Visible Universe



A key task in astronomy is to locate astronomical objects in images and to characterize them by physical parameters such as brightness, color, and morphology. This task, known as cataloging, is challenging for several reasons: many astronomical objects are much dimmer than the sky background, labeled data is generally unavailable, overlapping objects must be resolved jointly, and the datasets are enormous (terabytes now, petabytes soon). Previous approaches to cataloging rely largely on algorithmic software pipelines. In this talk, I present a new approach to cataloging based on inference in a fully specified probabilistic model. I consider two inference procedures: one based on variational inference (VI) and another based on Markov chain Monte Carlo (MCMC). A distributed implementation of VI, written in Julia and run on a supercomputer, achieves petascale performance, a first for any high-productivity programming language. The run is the largest-scale application of Bayesian inference reported to date. In an extension that uses new ideas from variational autoencoders and deep learning, I avoid many of the traditional disadvantages of VI relative to MCMC and improve model fit.