SimPO: Simple Preference Optimization with a Reference-Free Reward
23 May 2024
Yu Meng, Mengzhou Xia, Danqi Chen

Papers citing "SimPO: Simple Preference Optimization with a Reference-Free Reward"

50 / 197 papers shown

Frictional Agent Alignment Framework: Slow Down and Don't Break Things
Abhijnan Nath, Carine Graff, Andrei Bachinin, Nikhil Krishnaswamy
26 May 2025

Token-Importance Guided Direct Preference Optimization
Yang Ning, Lin Hai, Liu Yibo, Tian Baoliang, Liu Guoqing, Zhang Haijun
26 May 2025

An Embarrassingly Simple Defense Against LLM Abliteration Attacks
Harethah Shairah, Hasan Hammoud, Bernard Ghanem, G. Turkiyyah
25 May 2025

VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt
25 May 2025 · LRM

Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Meng Li, Guangda Huzhang, Haibo Zhang, Xiting Wang, Anxiang Zeng
24 May 2025

Flex-Judge: Think Once, Judge Anywhere
Jongwoo Ko, S. Kim, Sungwoo Cho, Se-Young Yun
24 May 2025 · ELM, LRM

Rethinking Direct Preference Optimization in Diffusion Models
Junyong Kang, Seohyun Lim, Kyungjune Baek, Hyunjung Shim
24 May 2025

Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue, Bowen Jin, Huimin Zeng, Honglei Zhuang, Zhen Qin, Jinsung Yoon, Lanyu Shang, Jiawei Han, Dong Wang
24 May 2025 · OffRL, BDL, LRM

VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
23 May 2025 · LRM

ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, Liangming Pan
22 May 2025 · ReLM, LRM

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, J. Wen, Chongxuan Li
22 May 2025 · MLLM, VLM

MPO: Multilingual Safety Alignment via Reward Gap Optimization
Weixiang Zhao, Yulin Hu, Yang Deng, Tongtong Wu, Wenxuan Zhang, ..., An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
22 May 2025

Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Yanjun Qi, Shangtong Zhang
21 May 2025 · ReLM, LRM

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, ..., Teng Fu, Hao Feng, Jingqun Tang, Hongru Wang, Can Huang
21 May 2025 · LRM

When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning
Rongzhi Zhu, Yi Liu, Zequn Sun, Yiwei Wang, Wei Hu
21 May 2025 · OffRL, LRM, AI4CE

ThinkSwitcher: When to Think Hard, When to Think Fast
Guosheng Liang, Longguang Zhong, Ziyi Yang, Xiaojun Quan
20 May 2025 · LRM

Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Angelard-Gontier, Yoshua Bengio, Ehsan Kamalloo
20 May 2025 · ReLM, LRM

WikiPersonas: What Can We Learn From Personalized Alignment to Famous People?
Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya
19 May 2025

SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Wenqiao Zhu, Ji Liu, Lulu Wang, Jun Wu, Yulun Zhang
18 May 2025

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti, Zhengzhong Tu, Tianbao Yang
18 May 2025

Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo, Zhaomin Wu, Philip S. Yu
18 May 2025

Mutual-Taught for Co-adapting Policy and Reward Models
Tianyuan Shi, Canbin Huang, Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xiaojun Quan, Ming Yan
17 May 2025

ShiQ: Bringing back Bellman to LLMs
Pierre Clavier, Nathan Grinsztajn, Raphaël Avalos, Yannis Flet-Berliac, Irem Ergun, ..., Eugene Tarassov, Olivier Pietquin, Pierre Harvey Richemond, Florian Strub, Matthieu Geist
16 May 2025 · OffRL

Towards Self-Improvement of Diffusion Models via Group Preference Optimization
Renjie Chen, Wenfeng Lin, Yichen Zhang, Jiangchuan Wei, Boyuan Liu, Chao Feng, Jiao Ran, Mingyu Guo
16 May 2025

Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen, Xiaopeng Li, Zhiyu Li, Xi Chen, Tianyi Lin
16 May 2025

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
Zijia Liu, Peixuan Han, Haofei Yu, Haoru Li, Jiaxuan You
16 May 2025 · AI4TS, LRM

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, ..., Weizhou Shen, Fanqi Wan, Ming Yan, J.N. Zhang, Fei Huang
16 May 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Ziyi Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev
16 May 2025

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng
16 May 2025

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang, Tiancheng Li, Guo-Jun Qi
15 May 2025 · DiffM, LRM, AI4CE

InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi
13 May 2025

ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang
08 May 2025 · LRM

LLAMAPIE: Proactive In-Ear Conversation Assistants
Tuochao Chen, Nicholas Batchelder, Alisa Liu, Noah A. Smith, Shyamnath Gollakota
07 May 2025

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee
06 May 2025 · OffRL

FairPO: Robust Preference Optimization for Fair Multi-Label Learning
Soumen Kumar Mondal, Akshit Varmora, Prateek Chanda, Ganesh Ramakrishnan
05 May 2025

SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li, Daniel Khashabi
05 May 2025

Bielik 11B v2 Technical Report
Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej, Remigiusz Kinas
05 May 2025

Calibrating Translation Decoding with Quality Estimation on LLMs
Di Wu, Yibin Lei, Christof Monz
26 Apr 2025

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan, Wei Shen, Shulin Huang, Qiji Zhou, Yue Zhang
22 Apr 2025

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu
18 Apr 2025

ToolRL: Reward is All Tool Learning Needs
Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tur, Gokhan Tur, Heng Ji
16 Apr 2025 · OffRL, LRM

Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Shuai Zhao, Linchao Zhu, Yi Yang
14 Apr 2025

FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong, Fanqi Wan, Ziyi Yang, Guosheng Liang, Tianyuan Shi, Xiaojun Quan
09 Apr 2025 · MoMe

Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao, Haoran Xu, Amy Zhang, Weinan Zhang, Chenjia Bai
08 Apr 2025

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding, Kai Zhang, Jinahua Han, Lanqing Hong, Hang Xu, Xuelong Li
08 Apr 2025 · MLLM, VLM

SEA-LION: Southeast Asian Languages in One Network
Raymond Ng, Thanh Ngan Nguyen, Yuli Huang, Ngee Chia Tai, Wai Yi Leong, ..., David Ong Tat-Wee, B. Liu, William-Chandra Tjhi, Min Zhang, Leslie Teo
08 Apr 2025

Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie, Azalia Mirhoseini, Hao Zhou, Irene Cai, Christopher D. Manning
07 Apr 2025 · SyDa, OffRL, ReLM, LRM

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins
01 Apr 2025 · ALM

Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag, Norbert Tihanyi, Merouane Debbah
26 Mar 2025 · ELM, OffRL, LRM, AI4CE