Matching as Subset Selection: Design and Optimization for Causal Inference

KWONSANG LEE – SEOUL NATIONAL UNIVERSITY

ABSTRACT

Observational studies commonly rely on matching to construct comparable treated and control groups. However, in high-dimensional settings, traditional matching methods based on unit-to-unit comparisons can become unstable, as similarity at the individual level is often driven by noise rather than meaningful structure. This suggests that matching at the unit level may not be the right abstraction in complex settings.

In this talk, I revisit matching from a design perspective and argue that the core problem is not pairing individuals but directly selecting a subset of controls that collectively resembles the treated group. I propose a subset-selection approach to matching, which I call optimal subset matching, that explicitly targets distributional balance between the treated units and the selected control group. This perspective provides a principled approach to bias control, but it leads to a challenging combinatorial optimization problem.

To address this, I introduce a quadratic optimization framework that generalizes subset matching and unifies design-based approaches. The resulting formulation naturally maps to QUBO (quadratic unconstrained binary optimization), highlighting deep connections between causal inference and modern optimization. Empirically, I show that applying standard estimators, such as double machine learning, to matched subsets can improve performance relative to using the full dataset.
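As an illustration of how a subset-selection objective can be written as a QUBO, the sketch below is offered under stated assumptions and is not the formulation presented in the talk. It assumes a simple mean-balance objective: choose binary indicators z over n controls to minimize ||X'z/m - mu_t||^2 + lam * (sum(z) - m)^2, where the subset size m, the penalty weight lam, and the function names build_qubo and solve_brute_force are all hypothetical choices made for this example. Because z_i^2 = z_i for binary variables, the linear terms fold into the diagonal of Q. A brute-force search stands in for an annealer and is only feasible for very small n.

```python
import itertools
import numpy as np

def build_qubo(X_control, mu_treated, m, lam):
    """Return (Q, offset) such that z @ Q @ z + offset equals
    ||X_control.T @ z / m - mu_treated||^2 + lam * (z.sum() - m)^2
    for every binary vector z (illustrative objective, an assumption)."""
    n = X_control.shape[0]
    # Quadratic part: balance term plus cardinality penalty.
    Q = X_control @ X_control.T / m**2 + lam * np.ones((n, n))
    # Linear part goes on the diagonal because z_i**2 == z_i.
    linear = -2.0 / m * (X_control @ mu_treated) - 2.0 * lam * m
    Q[np.diag_indices(n)] += linear
    offset = float(mu_treated @ mu_treated) + lam * m**2
    return Q, offset

def solve_brute_force(Q):
    """Enumerate all binary vectors; a stand-in for a QUBO solver."""
    n = Q.shape[0]
    best_z, best_val = None, np.inf
    for bits in itertools.product((0, 1), repeat=n):
        z = np.array(bits)
        val = z @ Q @ z
        if val < best_val:
            best_z, best_val = z, val
    return best_z, best_val

# Toy data: 8 controls with 3 covariates. The treated mean is set to the
# mean of the first three controls, so a perfectly balanced subset exists.
rng = np.random.default_rng(0)
X_control = rng.normal(size=(8, 3))
m, lam = 3, 10.0
mu_treated = X_control[:3].mean(axis=0)

Q, offset = build_qubo(X_control, mu_treated, m, lam)
best_z, best_val = solve_brute_force(Q)
print("selected controls:", np.flatnonzero(best_z))
print("objective value:", best_val + offset)
```

In this sketch the penalty term only encourages subsets of size m; richer balance criteria (higher moments, kernel-based distances) would change Q but not the overall structure.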

Finally, I discuss ongoing work on scalable optimization strategies, including emerging approaches such as quantum annealing, which offer a promising direction for solving large-scale combinatorial design problems in causal inference.