Tao Wang, Shuo Li, Yan Sun, Dongsheng Ding, Edgar Dobriban, Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR.
Sunay Joshi, Tao Wang, Seyed Hamed Hassani, Edgar Dobriban, Risk-Controlled Post-Processing of Decision Policies.
Tao Wang, Edgar Dobriban, Optimal Decision-Making Based on Prediction Sets.
Tao Wang, Yan Sun, Edgar Dobriban, Singleton-Optimized Conformal Prediction.
Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban (Working), Foundations of Top-k Decoding For Language Models.