Statistical Methods for Assessing the Factual Accuracy of Large Language Models
JOHN CHERIAN – STANFORD UNIVERSITY
ABSTRACT
The deployment of machine learning in high-stakes settings has raised fundamental questions about the reliability and fairness of black-box models. For example, does a model treat different groups equitably? Can we quantify a model's uncertainty before acting on each of its predictions? While numerous assumption-lean methods appear to address these questions, their guarantees are often misaligned with practitioners’ needs. My research program aims to resolve a tension inherent in model-free statistical inference: the generic validity of such methods is appealing, but without a well-specified model, it is challenging to identify guarantees that are also useful for decision-making.
To illustrate my approach, this talk will focus primarily on a set of new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in language modeling identifies a subset of an LLM’s response that carries a high-probability guarantee of factuality. These methods remove a claim from the original response whenever a scoring function evaluated on that claim fails to exceed an estimated threshold. Existing methods in this area suffer from two deficiencies. First, the guarantee is not conditionally valid: the trustworthiness of the filtering step may vary with the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. Our work addresses both challenges with two new conformal prediction methods. First, we show how to issue an error guarantee that is both valid and adaptive: the guarantee remains well-calibrated even when it depends on the prompt (e.g., when it is chosen so that the final output retains most claims). Second, we show how to optimize the accuracy of the scoring function used in this procedure, e.g., by ensembling multiple scoring approaches. This is joint work with Isaac Gibbs and Emmanuel Candès.
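The threshold-based filtering step can be illustrated with split conformal prediction. What follows is a minimal Python sketch, not the authors' implementation. It assumes that each calibration response has already been split into claims, each with a score and a binary correctness label, and that each response has a topic label; all function names and toy data are hypothetical. For each calibration response, the conformity score is the largest score among its false claims, i.e., the smallest threshold that removes every false claim. Using the ⌈(n+1)(1−α)⌉-th smallest of these scores as the threshold gives, under exchangeability, a marginal probability of at least 1 − α that every retained test claim is true. The `groupwise_thresholds` variant repeats the calibration within each topic. This Mondrian-style split yields topic-conditional validity only over a fixed set of disjoint groups. It is included solely to illustrate the first deficiency and is not the adaptive conditional method presented in the talk.

```python
import math
from collections import defaultdict


def calibration_score(claim_scores, claim_is_true):
    """Smallest threshold that removes every false claim from one response.

    A claim is kept only if its score is strictly above the threshold, so the
    threshold must be at least the largest score among the false claims.
    A response with no false claims gets -inf (any threshold works).
    """
    false_scores = [s for s, ok in zip(claim_scores, claim_is_true) if not ok]
    return max(false_scores) if false_scores else -math.inf


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: filter out every claim.
        return math.inf
    return sorted(cal_scores)[k - 1]


def filter_claims(claims, scores, threshold):
    """Keep only the claims whose score strictly exceeds the threshold."""
    return [c for c, s in zip(claims, scores) if s > threshold]


def groupwise_thresholds(cal_data, alpha):
    """Calibrate a separate threshold within each (hypothetical) topic group."""
    by_group = defaultdict(list)
    for group, scores, truth in cal_data:
        by_group[group].append(calibration_score(scores, truth))
    return {g: conformal_threshold(s, alpha) for g, s in by_group.items()}


# Hypothetical toy calibration set: (topic, per-claim scores, per-claim correctness).
cal_data = [
    ("history", [0.9, 0.4, 0.2], [True, True, False]),
    ("history", [0.8, 0.3], [True, False]),
    ("history", [0.7, 0.6], [True, True]),
    ("medicine", [0.95, 0.5, 0.45], [True, False, True]),
    ("medicine", [0.9, 0.7], [True, False]),
    ("medicine", [0.6, 0.55], [False, True]),
]
alpha = 0.5  # unrealistically large, only because the toy set is tiny

marginal_tau = conformal_threshold(
    [calibration_score(s, t) for _, s, t in cal_data], alpha
)
group_tau = groupwise_thresholds(cal_data, alpha)

print(marginal_tau)  # pooled threshold over all topics
print(group_tau)     # one threshold per topic
print(filter_claims(["a", "b", "c"], [0.9, 0.55, 0.4], group_tau["medicine"]))
```

On this toy data the pooled threshold is 0.5, while the per-topic thresholds are 0.2 (history) and 0.6 (medicine). A single threshold can therefore be calibrated on average yet stricter or looser than needed for a given topic. Per-group calibration requires enough data in every group. With very small groups the threshold becomes infinite and every claim is removed, which is one reason richer conditional methods are of interest.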
Related Paper