Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.07691
Cited By
v1
v2 (latest)
ORPO: Monolithic Preference Optimization without Reference Model
12 March 2024
Jiwoo Hong
Noah Lee
James Thorne
OSLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ORPO: Monolithic Preference Optimization without Reference Model"
24 / 174 papers shown
Title
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning
Shiva K. Pentyala
Zhichao Wang
Bin Bi
Kiran Ramnath
Xiang-Bo Mao
Regunathan Radhakrishnan
S. Asur
Na
Cheng
MoMe
47
8
0
25 Jun 2024
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou
Abdalgader Abubaker
Hakim Hacid
LRM
112
5
0
23 Jun 2024
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Florin Cuconasu
Giovanni Trappolini
Nicola Tonellotto
Fabrizio Silvestri
83
2
0
21 Jun 2024
Aligning Large Language Models with Diverse Political Viewpoints
Dominik Stammbach
Philine Widmer
Eunjung Cho
Çağlar Gülçehre
Elliott Ash
96
5
0
20 Jun 2024
Low-Redundant Optimization for Large Language Model Alignment
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Jingyuan Wang
Ji-Rong Wen
83
3
0
18 Jun 2024
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
Leonidas Gee
Milan Gritta
Gerasimos Lampouras
Ignacio Iacobacci
107
10
0
18 Jun 2024
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou
Ravi Agrawal
Shujian Zhang
Sathish Indurthi
Sanqiang Zhao
Kaiqiang Song
Silei Xu
Chenguang Zhu
102
20
0
17 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
128
9
0
17 Jun 2024
Step-level Value Preference Optimization for Mathematical Reasoning
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
LRM
95
42
0
16 Jun 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong
Sayak Paul
Noah Lee
Kashif Rasul
James Thorne
Jongheon Jeong
99
18
0
10 Jun 2024
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization
Yi Gu
Zhendong Wang
Yueqin Yin
Yujia Xie
Mingyuan Zhou
98
17
0
10 Jun 2024
PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
Huiping Zhuang
Jianwei Wang
Zhengdong Lu
Huiping Zhuang
Haoran Li
Huiping Zhuang
Cen Chen
RALM
KELM
123
8
0
03 Jun 2024
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
Keming Lu
Bowen Yu
Fei Huang
Yang Fan
Runji Lin
Chang Zhou
MoMe
83
21
0
28 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
145
62
0
26 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
161
492
0
23 May 2024
360Zhinao Technical Report
360Zhinao Team
62
0
0
22 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
179
27
0
20 May 2024
Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA
Marco Polignano
Pierpaolo Basile
Giovanni Semeraro
76
20
0
11 May 2024
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal
Nathan Lambert
S. Niekum
Tanya Goyal
Greg Durrett
OffRL
EGVM
69
6
0
02 May 2024
Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
Hyeonbin Hwang
Doyoung Kim
Seungone Kim
Seonghyeon Ye
Minjoon Seo
LRM
ReLM
81
17
0
16 Apr 2024
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai
Tetsuro Morimura
Kaito Ariu
Kenshi Abe
135
8
0
01 Apr 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
157
558
0
20 Mar 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen
Guande He
Lifan Yuan
Ganqu Cui
Hang Su
Jun Zhu
110
56
0
08 Feb 2024
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
142
4
0
01 Jul 2023
Previous
1
2
3
4