Title |
---|
![]() DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization Gang Li Ming Lin Tomer Galanti Zhengzhong Tu Tianbao Yang |
![]() SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization Minghan Chen Guikun Chen Wenguan Wang Yi Yang |
![]() Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment Siliang Zeng Quan Wei William Brown Oana Frunza Yuriy Nevmyvaka Mingyi Hong |
![]() Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO Peter Chen Xiaopeng Li Ziniu Li Xi Chen Tianyi Lin |
![]() Reinforcement Learning Finetunes Small Subnetworks in Large Language Models Sagnik Mukherjee Lifan Yuan Dilek Hakkani-Tur Hao Peng |