InfoPO: On Mutual Information Maximization for Large Language Model Alignment
arXiv 2505.08507 · 13 May 2025
Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi

Papers citing "InfoPO: On Mutual Information Maximization for Large Language Model Alignment" (41 of 41 papers shown)

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Ziyang Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, V. Honavar
21 Feb 2025

How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, V. Honavar
14 Oct 2024 · ALM

SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng, Mengzhou Xia, Danqi Chen
23 May 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
22 Apr 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weiling Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
16 Apr 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Hassan Awadallah, Tengyang Xie
04 Apr 2024

Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, ..., Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun
02 Apr 2024 · LRM

Disentangling Length from Quality in Direct Preference Optimization
Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn
28 Mar 2024 · ALM

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White
20 Feb 2024 · OSLM

Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Z. Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila-Pires, Bilal Piot
08 Feb 2024

Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, Jun Zhu
08 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
02 Feb 2024

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton W. Murray, Young Jin Kim
16 Jan 2024 · ALM

A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
18 Oct 2023

A General Offline Reinforcement Learning Framework for Interactive Recommendation
Teng Xiao, Donglin Wang
01 Oct 2023 · OffRL

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Yuguang Yang, Han Liu, Yuxin Chen
28 Sep 2023

Statistical Rejection Sampling Improves Preference Optimization
Tianqi Liu, Yao-Min Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, Jialu Liu
13 Sep 2023

Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, ..., Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
11 Jul 2023 · ALM, OffRL

LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
05 Jun 2023 · ELM

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
29 May 2023 · ALM

SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Yao-Min Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu
17 May 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, ..., USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
03 Apr 2023

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan
12 Apr 2022

Program Synthesis with Large Language Models
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, ..., Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, Charles Sutton
16 Aug 2021 · ELM, AIMat, ReCod, ALM

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
07 Jul 2021 · ELM, ALM

Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning
Martin Q. Ma, Yao-Hung Hubert Tsai, Paul Pu Liang, Han Zhao, Kun Zhang, Ruslan Salakhutdinov, Louis-Philippe Morency
05 Jun 2021 · SSL

Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
02 Sep 2020 · ALM

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, L. Rudolph, Aleksander Madry
25 May 2020 · AAML

A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey E. Hinton
13 Feb 2020 · SSL

Understanding the Limitations of Variational Mutual Information Estimators
Jiaming Song, Stefano Ermon
14 Oct 2019 · SSL, DRL

On Mutual Information Maximization for Representation Learning
Michael Tschannen, Josip Djolonga, Paul Kishan Rubenstein, Sylvain Gelly, Mario Lucic
31 Jul 2019 · SSL

Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman, R. Devon Hjelm, William Buchwalter
03 Jun 2019 · SSL

On Variational Bounds of Mutual Information
Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
16 May 2019 · SSL

Variational Information Distillation for Knowledge Transfer
SungSoo Ahn, S. Hu, Andreas C. Damianou, Neil D. Lawrence, Zhenwen Dai
11 Apr 2019

Representation Learning with Contrastive Predictive Coding
Aaron van den Oord, Yazhe Li, Oriol Vinyals
10 Jul 2018 · DRL, SSL

Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
16 Feb 2018

MINE: Mutual Information Neural Estimation
Mohamed Ishmael Belghazi, A. Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm
12 Jan 2018 · DRL

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
20 Jul 2017 · OffRL

Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
12 Jun 2017

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
22 Dec 2014 · ODL

Estimating divergence functionals and the likelihood ratio by convex risk minimization
X. Nguyen, Martin J. Wainwright, Michael I. Jordan
04 Sep 2008