2505.23363
Cited By
Discriminative Policy Optimization for Token-Level Reward Models
29 May 2025
Hongzhan Chen
Tao Yang
Shiping Gao
Ruijun Chen
Xiaojun Quan
Hongtao Tian
Ting Yao
Papers citing
"Discriminative Policy Optimization for Token-Level Reward Models"
9 / 9 papers shown
Title
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Noam Razin
Zixuan Wang
Hubert Strauss
Stanley Wei
Jason D. Lee
Sanjeev Arora
81
9
0
19 Mar 2025
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
76
56
0
10 Oct 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
110
62
0
29 Apr 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
128
250
0
20 Mar 2024
Dense Reward for Free in Reinforcement Learning from Human Feedback
Alex J. Chan
Hao Sun
Samuel Holt
M. Schaar
52
41
0
01 Feb 2024
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
120
321
0
02 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
84
25
0
01 Jun 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
106
581
0
22 May 2023
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
285
18,685
0
20 Jul 2017