arXiv:2506.14574
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
17 June 2025
Mingkang Zhu
Xi Chen
Zhongdao Wang
Bei Yu
Hengshuang Zhao
Jiaya Jia
Papers citing
"TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization"
4 / 4 papers shown
Title
Fine-tuning Language Models for Factuality
Katherine Tian
Eric Mitchell
Huaxiu Yao
Christopher D. Manning
Chelsea Finn
KELM
HILM
SyDa
75
182
0
14 Nov 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
391
4,388
0
09 Jun 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
880
13,148
0
04 Mar 2022
Hindsight Experience Replay
Marcin Andrychowicz
Dwight Crow
Alex Ray
Jonas Schneider
Rachel Fong
Peter Welinder
Bob McGrew
Joshua Tobin
Pieter Abbeel
Wojciech Zaremba
OffRL
262
2,337
0
05 Jul 2017