Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

26 February 2025

Papers citing "Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems"


Title
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Xiaobao Wu | LRM | 72 1 0 | 05 May 2025
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions | Emre Can Acikgoz, Cheng Qian, Hongru Wang, Vardhan Dongre, Xiusi Chen, Heng Ji, Dilek Hakkani-Tur, Gokhan Tur | LM&Ro ELM | 55 1 0 | 07 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling | Zijun Liu, P. Wang, Ran Xu, Shirong Ma, Chong Ruan, Peng Li, Yang Liu, Y. Wu | OffRL LRM | 46 11 0 | 03 Apr 2025