Learning Targeted Treatment Assignment Policies with Adaptive Experiments: Theory and Applications



Randomized experiments have long been used to learn which treatment works best, but uniform randomization is not the most efficient way to accomplish that goal, particularly when there are many potential treatment arms or when the best treatment depends on individual characteristics. In the latter case, the goal is to learn a targeted treatment assignment policy. Adaptive experiments (“contextual bandits”) have great potential both to improve the outcomes of individuals during an experiment (minimizing “cumulative regret”) and to improve the ability to estimate the best policy after the experiment (policy learning, or minimizing “simple regret”). This talk will start with some real-world implementations of contextual bandits and identify the challenges they raise, including model misspecification and the tradeoff between cumulative regret and policy learning. It will then review recent theoretical developments that make progress in addressing these challenges.
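To make the two objectives concrete, the following is a minimal illustrative sketch, not the method presented in the talk. It assumes a toy setting with one binary covariate, two treatment arms, and Bernoulli outcomes, and it uses a simple epsilon-greedy assignment rule as a stand-in for a contextual bandit algorithm. The names `TRUE_MEAN` and `run_experiment`, and all numeric values, are hypothetical. The function reports the cumulative regret incurred during the experiment and the simple regret of the policy learned at its end.

```python
import random

# Hypothetical ground truth: TRUE_MEAN[x][a] is the success probability of
# arm a for an individual with covariate value x. The best arm depends on x.
TRUE_MEAN = [[0.8, 0.2], [0.2, 0.8]]


def run_experiment(horizon, epsilon, seed):
    """Simulate one experiment with an epsilon-greedy assignment rule.

    epsilon=1.0 corresponds to uniform randomization; smaller values
    assign the currently best-looking arm more often.
    Returns (cumulative_regret, simple_regret).
    """
    rng = random.Random(seed)
    counts = [[0, 0], [0, 0]]        # assignments per (covariate, arm)
    sums = [[0.0, 0.0], [0.0, 0.0]]  # total observed outcome per (covariate, arm)
    cumulative_regret = 0.0

    for _ in range(horizon):
        x = rng.randrange(2)  # covariate of the arriving individual
        unexplored = [a for a in range(2) if counts[x][a] == 0]
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # explore: assign uniformly at random
        elif unexplored:
            arm = unexplored[0]     # no estimate yet for this arm: try it
        else:
            # exploit: assign the arm with the highest sample mean for this x
            arm = max(range(2), key=lambda a: sums[x][a] / counts[x][a])

        reward = 1.0 if rng.random() < TRUE_MEAN[x][arm] else 0.0
        counts[x][arm] += 1
        sums[x][arm] += reward
        # Regret of this assignment relative to the best arm for this x.
        cumulative_regret += max(TRUE_MEAN[x]) - TRUE_MEAN[x][arm]

    # Policy learned after the experiment: best estimated arm per covariate value.
    simple_regret = 0.0
    for x in range(2):
        est = [sums[x][a] / counts[x][a] if counts[x][a] else 0.0 for a in range(2)]
        chosen = max(range(2), key=lambda a: est[a])
        # Both covariate values are equally likely, so average with weight 1/2.
        simple_regret += 0.5 * (max(TRUE_MEAN[x]) - TRUE_MEAN[x][chosen])

    return cumulative_regret, simple_regret


if __name__ == "__main__":
    for label, eps in [("uniform", 1.0), ("epsilon-greedy", 0.1)]:
        cum, simple = run_experiment(horizon=2000, epsilon=eps, seed=0)
        print(f"{label}: cumulative regret = {cum:.1f}, simple regret = {simple:.2f}")
```

In this toy setting the arms are easy to tell apart, so both designs typically recover a good policy while the adaptive rule incurs much less regret along the way. The sketch is correctly specified by construction and does not exhibit the model misspecification or the tradeoff between the two objectives that the talk addresses.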


Related Papers: