Secrets of RLHF in Large Language Models Part I: PPO
arXiv:2307.04964 · 11 July 2023
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Bing Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Luyao Chen, Zhiheng Xi, Nuo Xu, Wen-De Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wen-Chun Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Tags: ALM, OffRL
Papers citing "Secrets of RLHF in Large Language Models Part I: PPO" (showing 50 of 126)
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
Sunghwan Kim, Dongjin Kang, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee, Jinyoung Yeo · ALM · 19 May 2025

InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi · 13 May 2025

A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem · 05 May 2025
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula, Benjamin LeBrun, Li Du, Ben Lipkin, Clemente Pasti, ..., Ryan Cotterell, Vikash K. Mansinghka, Alexander K. Lew, Tim Vieira, Timothy J. O'Donnell · 17 Apr 2025
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu, W. Shi, Yuchen Zhuang, Yue Yu, Joyce C. Ho, Haoyu Wang, Carl Yang · 07 Apr 2025

SyLeR: A Framework for Explicit Syllogistic Legal Reasoning in Large Language Models
Kepu Zhang, Weijie Yu, Zhongxiang Sun, Jun Xu · AILaw, ELM, LRM · 05 Apr 2025

InCo-DPO: Balancing Distribution Shift and Data Quality for Enhanced Preference Optimization
Yunan Wang, Jijie Li, Bo Zhang, Liangdong Wang, Guang Liu · 20 Mar 2025

BalancedDPO: Adaptive Multi-Metric Alignment
Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, B. Banerjee, Amrit Singh Bedi, Vaneet Aggarwal · EGVM · 16 Mar 2025

RankPO: Preference Optimization for Job-Talent Matching
Yuyao Zhang, Hao Wu, Yu Wang, Xiaohui Wang · 13 Mar 2025

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu, Feiyu Xiong, Chengqi Lyu, Dahua Lin, Kai Chen · 04 Mar 2025

PEO: Improving Bi-Factorial Preference Alignment with Post-Training Policy Extrapolation
Yuxuan Liu · 03 Mar 2025

Improving Plasticity in Non-stationary Reinforcement Learning with Evidential Proximal Policy Optimization
Abdullah Akgul, Gulcin Baykal, Manuel Haußmann, M. Kandemir · 03 Mar 2025
Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Han Wang, Qi Han, Yanghua Xiao · 26 Feb 2025

Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xuben Wang · 25 Feb 2025

RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, ..., Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra · 24 Feb 2025

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan, Yilei Jiang, Heng Chang, Jiaheng Liu, Xingyuan Bu, Wenbo Su, Xiangyu Yue, Xiaoyong Zhu, Bo Zheng · ALM · 17 Feb 2025

Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation
Guofu Xie, Xiao Zhang, Ting Yao, Yunsheng Shi · MoMe · 15 Feb 2025

IPO: Iterative Preference Optimization for Text-to-Video Generation
Xiaomeng Yang, Zhiyu Tan, Xuecheng Nie · VGen · 04 Feb 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. Schaar · 28 Jan 2025

Graph Generative Pre-trained Transformer
Xiaohui Chen, Yinkai Wang, Jiaxing He, Yuanqi Du, S. Hassoun, Xiaolin Xu, Li Liu · 03 Jan 2025

ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong, Xinwei Wu, Renren Jin, Shaoyang Xu, Deyi Xiong · 31 Dec 2024

Robust Multi-bit Text Watermark with LLM-based Paraphrasers
Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li · 04 Dec 2024

Explainable CTR Prediction via LLM Reasoning
Xiaohan Yu, Li Zhang, C. L. Philip Chen · OffRL, LRM · 03 Dec 2024

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao, Wendong Bu, Bingchen Miao, Yang Wu, Yunfei Li, Juncheng Billy Li, Siliang Tang, Qi Wu, Yueting Zhuang, Meng Wang · LM&Ro · 17 Nov 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, ..., Jinguo Zhu, X. Zhu, Lewei Lu, Yu Qiao, Jifeng Dai · LRM · 15 Nov 2024

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie Su, Yaodong Yang · 22 Oct 2024
On Designing Effective RL Reward at Training Time for LLM Reasoning
Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, Yi Wu · OffRL, LRM · 19 Oct 2024

SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen · 18 Oct 2024

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Enyu Zhou, Guodong Zheng, Binghui Wang, Zhiheng Xi, Shihan Dou, ..., Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang · ALM · 13 Oct 2024

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy · 11 Oct 2024

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin · 11 Oct 2024

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang, Zhihan Liu, Boyi Liu, Wenjie Qu, Yingxiang Yang, Y. Liu, Liyu Chen, Tao Sun, Ziyi Wang · 10 Oct 2024

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, Sijia Liu · MU · 09 Oct 2024

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Yi Ding, Bolian Li, Ruqi Zhang · MLLM · 09 Oct 2024

Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
Hao Ma, Tianyi Hu, Zhiqiang Pu, Boyin Liu, Xiaolin Ai, Yanyan Liang, Min Chen · 08 Oct 2024
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths
Yew Ken Chia, Guizhen Chen, Weiwen Xu, Luu Anh Tuan, Soujanya Poria, Lidong Bing · LRM · 07 Oct 2024

Investigating on RLHF methodology
Alexey Kutalev, Sergei Markoff · 02 Oct 2024

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Rameswar Panda, Nicolas Le Roux · OffRL, LRM · 02 Oct 2024

FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
Mingye Zhu, Yi Liu, Quan Wang, Junbo Guo, Zhendong Mao · 01 Oct 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, ..., Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang · 30 Sep 2024

VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
Guoxi Zhang, Jiuding Duan · 27 Sep 2024

Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang, Zihan Qiu, Zili Wang, Edoardo M. Ponti, Ivan Titov · 25 Sep 2024

Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization
Jianing Wang, Yang Zhou, Xiaocheng Zhang, Mengjiao Bao, Peng Yan · 17 Sep 2024

Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
Judy Hanwen Shen, Archit Sharma, Jun Qin · 15 Sep 2024

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny, Adel Bibi · 27 Aug 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei, Shenghua He, Tian Xia, Andy H. Wong, Jingyang Lin, Mei Han · ALM, ELM · 23 Aug 2024
SEAL: Systematic Error Analysis for Value ALignment
Manon Revel, Matteo Cargnelutti, Tyna Eloundou, Greg Leppert · 16 Aug 2024
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
Jung Hyun Lee, June Yong Yang, Byeongho Heo, Dongyoon Han, Kang Min Yoo, Eunho Yang · LRM · 12 Jul 2024
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, ..., Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang · 08 Jul 2024

BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models
Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun · 30 Jun 2024