Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.00398
Cited By
Preference-grounded Token-level Guidance for Language Model Fine-tuning
1 June 2023
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Preference-grounded Token-level Guidance for Language Model Fine-tuning"
36 / 36 papers shown
Title
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
Doyoung Kim
Youngjun Lee
Joeun Kim
Jihwan Bang
Hwanjun Song
Susik Yoon
Jae-Gil Lee
46
0
0
10 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
48
1
0
05 May 2025
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Shuai Zhao
Linchao Zhu
Yi Yang
53
2
0
14 Apr 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
65
0
0
06 Mar 2025
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu
Yi-Chen Li
Xuqin Zhang
Tianyi Zhang
Yiming Zhang
Zongzhang Zhang
Yang Yu
ALM
58
1
0
01 Mar 2025
Advantage-Guided Distillation for Preference Alignment in Small Language Models
Shiping Gao
Fanqi Wan
Jiajian Guo
Xiaojun Quan
Qifan Wang
ALM
64
0
0
25 Feb 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
81
7
0
28 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
54
1
0
07 Jan 2025
Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning
Ziang Ye
Zizhuo Zhang
Yang Zhang
Jianxin Ma
Junyang Lin
Fuli Feng
LRM
97
0
0
19 Dec 2024
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu
Yifu Lu
Yifan Zeng
Jiacheng Guo
Jiayi Geng
Huazheng Wang
Kaixuan Huang
Yue Wu
Mengdi Wang
61
25
0
18 Oct 2024
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
Hao Ma
Tianyi Hu
Zhiqiang Pu
Boyin Liu
Xiaolin Ai
Yanyan Liang
Min Chen
81
3
0
08 Oct 2024
Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance
Adarsh MS
Jithin VG
Ditto PS
25
1
0
15 Sep 2024
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang
Bo Huang
Yufei Wang
Xingshan Zeng
Liangyou Li
Yasheng Wang
Xin Jiang
Lifeng Shang
Ruiming Tang
Wei Wang
49
7
0
14 Aug 2024
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Eunseop Yoon
Hee Suk Yoon
Soohwan Eom
Gunsoo Han
D. W. Nam
DaeJin Jo
Kyoung-Woon On
M. Hasegawa-Johnson
Sungwoong Kim
C. Yoo
ALM
47
16
0
23 Jul 2024
FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering
Tianchi Cai
Zhiwen Tan
Xierui Song
Tao Sun
Jiyan Jiang
Yunqi Xu
Yinger Zhang
Jinjie Gu
43
7
0
19 Jun 2024
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang
Korawat Tanwisuth
Chengyue Gong
Pengcheng He
Mi Zhou
BDL
49
0
0
07 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
86
61
0
29 Apr 2024
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Wendi Li
Xiaoye Qu
Kaihe Xu
Wenfeng Xie
Dangyang Chen
Yu Cheng
55
7
0
18 Mar 2024
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang
Tianqi Chen
Mingyuan Zhou
EGVM
51
26
0
13 Feb 2024
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Junchen Wan
Fuzheng Zhang
Di Zhang
Ji-Rong Wen
KELM
44
33
0
11 Jan 2024
Let's Reinforce Step by Step
Sarah Pan
Vladislav Lialin
Sherin Muckatira
Anna Rumshisky
ReLM
LRM
27
8
0
10 Nov 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
43
9
0
24 May 2023
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
Guosheng Lin
SyDa
ALM
137
243
0
05 Jul 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
144
107
0
05 Jun 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
253
261
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
457
12,345
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
254
1,668
0
15 Oct 2021
Mismatched No More: Joint Model-Policy Optimization for Model-Based RL
Benjamin Eysenbach
Alexander Khazatsky
Sergey Levine
Ruslan Salakhutdinov
OffRL
206
44
0
06 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
298
3,917
0
18 Apr 2021
WARP: Word-level Adversarial ReProgramming
Karen Hambardzumyan
Hrant Khachatrian
Jonathan May
AAML
260
343
0
01 Jan 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
245
1,941
0
31 Dec 2020
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
119
59
0
20 Oct 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
261
1,601
0
21 Jan 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
310
1,646
0
18 Sep 2019
Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task
Marcin Junczys-Dowmunt
Roman Grundkiewicz
Shubha Guha
Kenneth Heafield
51
193
0
16 Apr 2018
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
214
3,521
0
10 Jun 2015
1