ORPO: Monolithic Preference Optimization without Reference Model

12 March 2024
Jiwoo Hong
Noah Lee
James Thorne
    OSLM

Papers citing "ORPO: Monolithic Preference Optimization without Reference Model"

50 / 174 papers shown
Cross-lingual Transfer of Reward Models in Multilingual Alignment
Jiwoo Hong
Noah Lee
Rodrigo Martínez-Castaño
César Rodríguez
James Thorne
137
6
0
23 Oct 2024
Augmenting Legal Decision Support Systems with LLM-based NLI for Analyzing Social Media Evidence
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Kanwal Mehreen
Subhasya Tippareddy
Ashay Srivastava
AILaw
63
1
0
21 Oct 2024
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning
H. Fernando
Han Shen
Parikshit Ram
Yi Zhou
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
CLL
166
4
0
20 Oct 2024
Optimizing Preference Alignment with Differentiable NDCG Ranking
Jiacong Zhou
Xianyun Wang
Jun Yu
108
2
0
17 Oct 2024
Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models
Iaroslav Chelombitko
Egor Safronov
Aleksey Komissarov
76
1
0
16 Oct 2024
PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking
Markus J. Buehler
ReLM LRM
74
6
0
16 Oct 2024
CREAM: Consistency Regularized Self-Rewarding Language Models
Zhaoxiang Wang
Weilei He
Zhiyuan Liang
Xuchao Zhang
Chetan Bansal
Ying Wei
Weitong Zhang
Huaxiu Yao
ALM
181
12
0
16 Oct 2024
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Zhengyan Shi
Sander Land
Acyr Locatelli
Matthieu Geist
Max Bartolo
110
8
0
15 Oct 2024
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao
Wenxuan Ding
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
71
2
0
14 Oct 2024
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng
Chengsong Huang
Banghua Zhu
Jiaxin Huang
123
16
0
13 Oct 2024
Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath
Changsoo Jung
Ethan Seefried
Nikhil Krishnaswamy
484
4
0
11 Oct 2024
Evolutionary Contrastive Distillation for Language Model Alignment
Julian Katz-Samuels
Zheng Li
Hyokun Yun
Priyanka Nigam
Yi Xu
Vaclav Petricek
Bing Yin
Trishul Chilimbi
ALM SyDa
31
0
0
10 Oct 2024
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao
Xu Chu
Yasha Wang
LRM
136
8
0
10 Oct 2024
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang
Zhihan Liu
Boyi Liu
Yanzhe Zhang
Yingxiang Yang
Yunxing Liu
Liyu Chen
Tao Sun
Ziyi Wang
167
3
0
10 Oct 2024
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
Fenia Christopoulou
Ronald Cardenas
Gerasimos Lampouras
Haitham Bou-Ammar
Jun Wang
79
2
0
07 Oct 2024
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths
Yew Ken Chia
Guizhen Chen
Weiwen Xu
Luu Anh Tuan
Soujanya Poria
Lidong Bing
LRM
57
1
0
07 Oct 2024
MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
Guanzhen Li
Yuxi Xie
Min-Yen Kan
VLM
392
1
0
06 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
140
7
0
06 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
79
25
0
05 Oct 2024
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao
Genta Indra Winata
Anirban Das
Shi-Xiong Zhang
D. Yao
Wenpin Tang
Sambit Sahu
109
9
0
05 Oct 2024
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Haoran Xu
Kenton W. Murray
Philipp Koehn
Hieu T. Hoang
Akiko Eriguchi
Huda Khayrallah
140
15
0
04 Oct 2024
Investigating on RLHF methodology
Alexey Kutalev
Sergei Markoff
43
0
0
02 Oct 2024
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Qingyao Ai
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
120
17
0
01 Oct 2024
Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford
Yanjun Gao
Nicholas Pellegrino
Karen K. Wong
Graham Wills
Elliot First
Frank J. Liao
Cherodeep Goswami
Brian Patterson
Majid Afshar
HILM ELM LM&MA
129
1
0
26 Sep 2024
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li
Haojing Huang
Yujia Zhang
Pengfei Xu
Xi Chen
Rui Song
Lida Shi
Jingwen Wang
Hao Xu
48
0
0
26 Sep 2024
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang
61
0
0
26 Sep 2024
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
Ruijie Xu
Zhihan Liu
Yongfei Liu
Shipeng Yan
Zhaoran Wang
Zhi-Li Zhang
Xuming He
ALM
80
1
0
26 Sep 2024
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
105
2
0
20 Sep 2024
CamelEval: Advancing Culturally Aligned Arabic Language Models and Benchmarks
Zhaozhi Qian
Faroq Altam
Muhammad Alqurishi
Riad Souissi
35
3
0
19 Sep 2024
From Lists to Emojis: How Format Bias Affects Model Alignment
Xuanchang Zhang
Wei Xiong
Lichang Chen
Dinesh Manocha
Heng Huang
Tong Zhang
ALM
104
13
0
18 Sep 2024
KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
Neel Rajani
Lilli Kiessling
Aleksandr Ogaltsov
Claus Lang
ALM
55
0
0
13 Sep 2024
AIPO: Improving Training Objective for Iterative Preference Optimization
Yaojie Shen
Xinyao Wang
Yulei Niu
Ying Zhou
Lexin Tang
Libo Zhang
Fan Chen
Longyin Wen
89
2
0
13 Sep 2024
Propaganda is all you need
Paul Kronlund-Drouault
139
0
0
13 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Zhiyong Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
153
14
0
04 Sep 2024
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
Dian Yu
Baolin Peng
Ye Tian
Linfeng Song
Haitao Mi
Dong Yu
ALM LRM
73
3
0
28 Aug 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
162
6
0
22 Aug 2024
Value Alignment from Unstructured Text
Inkit Padhi
Karthikeyan N. Ramamurthy
P. Sattigeri
Manish Nagireddy
Pierre Dognin
Kush R. Varshney
93
0
0
19 Aug 2024
Minor DPO reject penalty to increase training robustness
Shiming Xie
Hong Chen
Fred Yu
Zeye Sun
Xiuyu Wu
Yingfan Hu
71
4
0
19 Aug 2024
The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation
Samee Arif
Sualeha Farid
Abdul Hameed Azeemi
Awais Athar
Agha Ali Raza
LLMAG
114
8
0
16 Aug 2024
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang
Bo Huang
Yufei Wang
Xingshan Zeng
Liangyou Li
Yasheng Wang
Xin Jiang
Lifeng Shang
Ruiming Tang
Wei Wang
127
7
0
14 Aug 2024
Exploring Applications of State Space Models and Advanced Training Techniques in Sequential Recommendations: A Comparative Study on Efficiency and Performance
M. Obozov
Makar Baderko
Stepan Kulibaba
N. Kutuzov
Alexander Gasnikov
Mamba OffRL
99
0
0
10 Aug 2024
Towards Explainable Network Intrusion Detection using Large Language Models
Paul R. B. Houssel
Priyanka Singh
S. Layeghy
Marius Portmann
73
4
0
08 Aug 2024
ABC Align: Large Language Model Alignment for Safety & Accuracy
Gareth Seneque
Lap-Hang Ho
Peter W. Glynn
Yinyu Ye
Jeffrey Molendijk
90
1
0
01 Aug 2024
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
68
17
0
22 Jul 2024
Weak-to-Strong Reasoning
Yuqing Yang
Yan Ma
Pengfei Liu
LRM
76
18
0
18 Jul 2024
Research on Tibetan Tourism Viewpoints information generation system based on LLM
Jinhu Qi
Shuai Yan
Wentao Zhang
Yibo Zhang
Zirui Liu
Ke Wang
53
1
0
18 Jul 2024
New Desiderata for Direct Preference Optimization
Xiangkun Hu
Tong He
David Wipf
91
3
0
12 Jul 2024
LIONs: An Empirically Optimized Approach to Align Language Models
Xiao Yu
Qingyang Wu
Yu Li
Zhou Yu
ALM
95
6
0
09 Jul 2024
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham
Simeng Sun
Mohit Iyyer
ALM LRM
121
23
0
27 Jun 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
170
126
0
26 Jun 2024