2305.18290
Cited By
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
29 May 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
Papers citing
"Direct Preference Optimization: Your Language Model is Secretly a Reward Model"
50 / 2,637 papers shown
Title
Revisiting Ensemble Methods for Stock Trading and Crypto Trading Tasks at ACM ICAIF FinRL Contest 2023-2024
Nikolaus Holzer
Keyi Wang
Kairong Xiao
Xiao-Yang Liu Yanglet
AIFin
35
1
0
18 Jan 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
68
5
0
17 Jan 2025
A General Framework for Inference-time Scaling and Steering of Diffusion Models
R. Singhal
Zachary Horvitz
Ryan Teehan
Mengye Ren
Zhou Yu
Kathleen McKeown
Rajesh Ranganath
DiffM
74
16
0
17 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
81
14
0
17 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jingyang Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
93
19
0
17 Jan 2025
Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis
Lanling Xu
Junjie Zhang
Bingqian Li
Jinpeng Wang
Sheng Chen
Wayne Xin Zhao
Ji-Rong Wen
85
18
0
17 Jan 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther
Adish Singla
Goran Radanović
AAML
67
0
0
14 Jan 2025
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Evangelos Anagnostopoulos
Christos Varytimidis
Antonio del Rio Chanona
45
0
0
13 Jan 2025
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee
Jongha Kim
Jeehye Na
Jinyoung Park
H. Kim
VGen
43
0
0
12 Jan 2025
Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions
Aidan Hogan
Xin Luna Dong
Denny Vrandečić
Gerhard Weikum
62
2
0
12 Jan 2025
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing
Avinab Saha
Junfeng He
Susan Hao
Paul Vicol
...
Sahil Singla
Sarah Young
Yinxiao Li
Feng Yang
Deepak Ramachandran
DiffM
57
1
0
11 Jan 2025
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
Natalia Zhang
X. Wang
Qiwen Cui
Runlong Zhou
Sham Kakade
Simon S. Du
OffRL
61
0
0
10 Jan 2025
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
Giorgio Giannone
Ruoteng Li
Qianli Feng
Evgeny Perevodchikov
Rui Chen
Aleix M. Martinez
VLM
71
0
0
08 Jan 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James V. Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
57
8
0
08 Jan 2025
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
Fengxiang Wang
Ranjie Duan
Peng Xiao
Xiaojun Jia
Shiji Zhao
...
Hang Su
Jialing Tao
Hui Xue
Jun Zhu
LLMAG
69
7
0
08 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
52
1
0
07 Jan 2025
IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment
Yiming Zhang
Zheng Chang
Wentao Cai
MengXing Ren
Kang Yuan
Yining Sun
Zenghui Ding
LM&MA
46
3
0
06 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yang Liu
54
0
0
06 Jan 2025
Improving GenIR Systems Based on User Feedback
Qingyao Ai
Zhicheng Dou
Min Zhang
242
0
0
06 Jan 2025
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li
Yupeng Su
Runming Yang
C. Xie
Zehua Wang
Zhongwei Xie
Ngai Wong
Hongxia Yang
MQ
LRM
64
3
0
06 Jan 2025
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
Zhe Chen
Yusheng Liao
Shuyang Jiang
Pingjie Wang
Yu Guo
Yucheng Wang
Yu Wang
48
3
0
05 Jan 2025
SR-Reward: Taking The Path More Traveled
Seyed Mahdi Basiri Azad
Zahra Padar
Gabriel Kalweit
Joschka Boedecker
OffRL
77
0
0
04 Jan 2025
Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
Yachao Zhao
Bo Wang
Yan Wang
63
2
0
04 Jan 2025
Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria
Joonwon Jang
Jaehee Kim
Wonbin Kweon
Hwanjo Yu
LRM
47
1
0
03 Jan 2025
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Junqiao Wang
Zeng Zhang
Yangfan He
Yuyang Song
Tianyu Shi
...
Hengyuan Xu
Kunyu Wu
Guangwu Qian
Qiuwu Chen
Lewei He
48
11
0
03 Jan 2025
DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh
Souradip Chakraborty
Wesley A Suttle
Brian M. Sadler
Vinay P. Namboodiri
Amrit Singh Bedi
OffRL
58
0
0
03 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
78
0
0
03 Jan 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
Shuangtao Li
Shuaihao Dong
Kexin Luan
Xinhan Di
Chaofan Ding
LRM
57
2
0
02 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
235
0
0
31 Dec 2024
Genetic-guided GFlowNets for Sample Efficient Molecular Optimization
Hyeon-Seob Kim
Minsu Kim
Sanghyeok Choi
Jinkyoo Park
56
3
0
31 Dec 2024
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
51
1
0
31 Dec 2024
Zero-Indexing Internet Search Augmented Generation for Large Language Models
Guangxin He
Zonghong Dai
Jiangcheng Zhu
Binqiang Zhao
Qicheng Hu
Chenyue Li
You Peng
Chen Wang
Binhang Yuan
71
0
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
68
19
0
31 Dec 2024
LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots
Dongge Han
Trevor A. McInroe
Adam Jelley
Stefano V. Albrecht
Peter Bell
Amos Storkey
70
11
0
31 Dec 2024
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
LLM-jp
Akiko Aizawa
Eiji Aramaki
Bowen Chen
Fei Cheng
...
Yuya Yamamoto
Yusuke Yamauchi
Hitomi Yanaka
Rio Yokota
Koichiro Yoshino
62
14
0
31 Dec 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
99
12
0
31 Dec 2024
Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang
Jun Bai
Yangqiu Song
Yanmeng Wang
Rumei Li
Chenghua Lin
Wenge Rong
49
0
0
31 Dec 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta
Kuang-Huei Lee
Shixiang Shane Gu
Y. Matsuo
Aleksandra Faust
Heiga Zen
Izzeddin Gur
60
7
0
31 Dec 2024
AlignAb: Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies
Yibo Wen
Chenwei Xu
Jerry Yao-Chieh Hu
Han Liu
DiffM
53
4
0
31 Dec 2024
Nash CoT: Multi-Path Inference with Preference Equilibrium
Ziqi Zhang
Cunxiang Wang
Xiong Xiao
Yue Zhang
Donglin Wang
LRM
49
1
0
31 Dec 2024
Natural Language Fine-Tuning
Jiaheng Liu
Yue Wang
Zhiqi Lin
Min Chen
Yixue Hao
Long Hu
36
1
0
31 Dec 2024
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
Yang Han
Ziping Wan
Lu Chen
Kai Yu
Xin Chen
LM&MA
42
1
0
31 Dec 2024
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
68
8
0
31 Dec 2024
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Chia-Yu Hung
Navonil Majumder
Zhifeng Kong
Ambuj Mehrish
Rafael Valle
Bryan Catanzaro
Soujanya Poria
57
6
0
30 Dec 2024
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Zizhuo Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
72
118
0
30 Dec 2024
FaGeL: Fabric LLMs Agent empowered Embodied Intelligence Evolution with Autonomous Human-Machine Collaboration
Jia Liu
Min Chen
LM&Ro
AI4CE
47
2
0
28 Dec 2024
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
66
0
0
25 Dec 2024
A Statistical Framework for Ranking LLM-Based Chatbots
Siavash Ameli
Siyuan Zhuang
Ion Stoica
Michael W. Mahoney
ELM
48
1
0
24 Dec 2024
Multimodal Preference Data Synthetic Alignment with Reward Model
Robert Wijaya
Ngoc-Bao Nguyen
Ngai-man Cheung
MLLM
SyDa
67
3
0
23 Dec 2024
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning
Huchen Jiang
Yangyang Ma
Chaofan Ding
Kexin Luan
Xinhan Di
ReLM
LRM
54
2
0
23 Dec 2024