ResearchTrend.AI
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

29 May 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
    ALM

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

37 / 2,637 papers shown
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
71
120
0
14 Aug 2023
Large Language Models for Information Retrieval: A Survey
Yutao Zhu
Huaying Yuan
Shuting Wang
Jiongnan Liu
Wenhan Liu
Chenlong Deng
Haonan Chen
Zhicheng Dou
Ji-Rong Wen
KELM
57
290
0
14 Aug 2023
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Keming Lu
Hongyi Yuan
Zheng Yuan
Runji Lin
Junyang Lin
Chuanqi Tan
Chang Zhou
Jingren Zhou
ALM
LRM
35
65
0
14 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM
VLM
36
156
0
11 Aug 2023
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Weiran Yao
Shelby Heinecke
Juan Carlos Niebles
Zhiwei Liu
Yihao Feng
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAG
LM&Ro
41
74
0
04 Aug 2023
Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli
Mirco Musolesi
AI4CE
40
20
0
31 Jul 2023
Scaling Sentence Embeddings with Large Language Models
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
46
40
0
31 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
52
481
0
27 Jul 2023
Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang
Stephen Roller
Kyunghyun Cho
He He
Jason Weston
51
7
0
26 Jul 2023
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
Kevin Kaichuang Yang
Dan Klein
Asli Celikyilmaz
Nanyun Peng
Yuandong Tian
ALM
38
30
0
24 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
70
538
0
12 Jul 2023
TIM: Teaching Large Language Models to Translate with Comparison
Jiali Zeng
Fandong Meng
Yongjing Yin
Jie Zhou
37
55
0
10 Jul 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
31
4
0
01 Jul 2023
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Zhen Qin
R. Jagerman
Kai Hui
Honglei Zhuang
Junru Wu
...
Tianqi Liu
Jialu Liu
Donald Metzler
Xuanhui Wang
Michael Bendersky
ALM
RALM
56
224
0
30 Jun 2023
Preference Ranking Optimization for Human Alignment
Feifan Song
Yu Bowen
Minghao Li
Haiyang Yu
Fei Huang
Yongbin Li
Houfeng Wang
ALM
28
239
0
30 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
62
562
0
23 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
41
23
0
13 Jun 2023
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
Fabien Roger
SILM
20
0
0
13 Jun 2023
Artificial General Intelligence for Medical Imaging
Xiang Li
Lu Zhang
Zihao Wu
Zheng Liu
Lin Zhao
...
Pingkuan Yan
Quanzheng Li
Wen Liu
Tianming Liu
Dinggang Shen
LM&MA
AI4CE
30
40
0
08 Jun 2023
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Hussein Mozannar
Gagan Bansal
Adam Fourney
Eric Horvitz
42
26
0
08 Jun 2023
Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs
Alexander K. Lew
Tan Zhi-Xuan
Gabriel Grand
Vikash K. Mansinghka
LLMSV
LRM
65
35
0
05 Jun 2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
OSLM
33
39
0
04 Jun 2023
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
37
46
0
26 May 2023
Inverse Preference Learning: Preference-based RL without a Reward Function
Joey Hejna
Dorsa Sadigh
OffRL
32
48
0
24 May 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
38
9
0
24 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
43
71
0
23 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
45
549
0
22 May 2023
Diffusion Language Models Generation Can Be Halted Early
Sofia Maria Lo Cicero Vaina
Nikita Balagansky
Daniil Gavrilov
DiffM
57
0
0
18 May 2023
Consistency Regularization for Domain Generalization with Logit Attribution Matching
Han Gao
Kaican Li
Weiyan Xie
Zhi Lin
Yongxiang Huang
Luning Wang
Caleb Chen Cao
N. Zhang
13
2
0
13 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
117
56
0
01 May 2023
On the Creativity of Large Language Models
Giorgio Franceschelli
Mirco Musolesi
74
54
0
27 Mar 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
369
3,029
0
22 Mar 2023
Direct Preference-based Policy Optimization without Reward Modeling
Gaon An
Junhyeok Lee
Xingdong Zuo
Norio Kosaka
KyungHyun Kim
Hyun Oh Song
OffRL
32
26
0
30 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
384
12,081
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
218
1,663
0
15 Oct 2021
Creativity and Machine Learning: A Survey
Giorgio Franceschelli
Mirco Musolesi
VLM
AI4CE
34
40
0
06 Apr 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,616
0
18 Sep 2019