arXiv: 2501.02790
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
7 January 2025
Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Awadalla, Weizhu Chen, Mingyuan Zhou
Papers citing "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model" (4 papers)
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia
17 Jun 2025
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li, Yifan Wang, A. Grama, Ruqi Zhang
24 Jun 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
29 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori Hashimoto
06 Apr 2024