TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
arXiv: 2506.14574 (17 June 2025)
Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia
Papers citing "TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization" (7 papers):
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Ruichen Shao, Yangqiu Song, Gangao Liu, Yang Chen, Xiang Zhou, Jiadong Wang, Xunliang Cai, Peng Li (21 Feb 2025)
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang (29 Apr 2024)
Fine-tuning Language Models for Factuality
Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn (14 Nov 2023)
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, ..., Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica (09 Jun 2023)
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mi Zhou (01 Jun 2023)
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022)
Hindsight Experience Replay
Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba (05 Jul 2017)