SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng, Mengzhou Xia, Danqi Chen
23 May 2024 (arXiv:2405.14734)
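Since every paper below cites the SimPO objective, a minimal sketch of that loss may help orient readers. This is a sketch, not the authors' reference implementation: it assumes summed token log-probabilities and response lengths are computed elsewhere, and the function name, variable names, and default hyperparameters are illustrative (beta and gamma are chosen within the ranges the paper reports).

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=1.0):
    """Sketch of the SimPO objective (Meng et al., 2024).

    chosen_logps / rejected_logps: summed token log-probs of each response
    under the policy model (tensors); chosen_lens / rejected_lens: response
    lengths in tokens. beta scales the implicit reward and gamma is the
    target reward margin, following the paper's notation.
    """
    # Length-normalized implicit rewards: average log-probability under the
    # policy itself, so no reference model is needed.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Bradley-Terry-style loss with a target margin gamma between the
    # chosen and rejected rewards.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()

if __name__ == "__main__":
    # Toy batch of two preference pairs with made-up numbers.
    chosen_logps = torch.tensor([-42.0, -37.5])
    rejected_logps = torch.tensor([-51.0, -40.2])
    chosen_lens = torch.tensor([30.0, 25.0])
    rejected_lens = torch.tensor([33.0, 26.0])
    print(simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens))
```

The two design choices the title advertises are both visible here: the implicit reward is the length-normalized log probability of the policy itself rather than a log-ratio against a frozen reference model, and gamma enforces a target margin between the chosen and rejected responses.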

Papers citing "SimPO: Simple Preference Optimization with a Reference-Free Reward" (showing 50 of 197):
1. Unlocking Post-hoc Dataset Inference with Synthetic Data
   Bihe Zhao, Pratyush Maini, Franziska Boenisch, Adam Dziedzic. 18 Jun 2025.
2. TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
   Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia. 17 Jun 2025.
3. Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent
   Xueyang Feng, Jingsen Zhang, Jiakai Tang, Wei Li, Guohao Cai, X. Chen, Quanyu Dai, Y. Zhu, Zhenhua Dong. 17 Jun 2025.
4. Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
   Haonan Wang, Brian K Chen, Siquan Li, Xinhe Liang, Hwee Kuan Lee, Kenji Kawaguchi, Tianyang Hu. 16 Jun 2025.
5. Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing
   Zhuoying Li, Zhu Xu, Yuxin Peng, Yang Liu. 15 Jun 2025.
6. Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs
   Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Tingyue Pan, Jintao Zhang. 12 Jun 2025 (AI4TS, LRM).
7. Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting
   Fei Ding, Baiqiao Wang. 11 Jun 2025 (CLL).
8. DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
   Xiandong Zou, Ruihao Xia, Hongsong Wang, Pan Zhou. 11 Jun 2025 (AI4TS).
9. Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
   Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara. 11 Jun 2025.
10. A Survey on Large Language Models for Mathematical Reasoning
   Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, ..., Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu. 10 Jun 2025 (LRM).
11. Reinforce LLM Reasoning through Multi-Agent Reflection
   Yurun Yuan, Tengyang Xie. 10 Jun 2025 (LRM).
12. Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation
   Mingfeng Fan, Jianan Zhou, Yifeng Zhang, Yaoxin Wu, Jinbiao Chen, Guillaume Sartoretti. 10 Jun 2025 (AI4CE).
13. Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation
   Giacomo Baldan, Qiang Liu, Alberto Guardone, Nils Thuerey. 10 Jun 2025 (AI4CE).
14. Reinforcing Multimodal Understanding and Generation with Dual Self-rewards
   Jixiang Hong, Yiran Zhang, Guanzhong Wang, Yi Liu, Ji-Rong Wen, Rui Yan. 09 Jun 2025 (LRM).
15. Explicit Preference Optimization: No Need for an Implicit Reward Model
   Xiangkun Hu, Lemin Kong, Tong He, David Wipf. 09 Jun 2025.
16. Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
   Feifan Song, Shaohang Wei, Wen Luo, Yuxuan Fan, Tianyu Liu, Guoyin Wang, Houfeng Wang. 09 Jun 2025.
17. How Far Are We from Optimal Reasoning Efficiency?
   Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu. 08 Jun 2025 (LRM).
18. From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
   Kyubyung Chae, Hyunbin Jin, Taesup Kim. 07 Jun 2025.
19. Unlocking Recursive Thinking of LLMs: Alignment via Refinement
   Haoke Zhang, Xiaobo Liang, Cunxiang Wang, Juntao Li, Min Zhang. 06 Jun 2025 (LRM).
20. Debiasing Online Preference Learning via Preference Feature Preservation
   Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim. 06 Jun 2025.
21. SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
   Yuru Jiang, Wenxuan Ding, Shangbin Feng, Greg Durrett, Yulia Tsvetkov. 05 Jun 2025.
22. Aligning Large Language Models with Implicit Preferences from User-Generated Content
   Zhaoxuan Tan, Zheng Li, Tianyi Liu, Haodong Wang, Hyokun Yun, ..., Yifan Gao, Ruijie Wang, Priyanka Nigam, Bing Yin, Meng Jiang. 04 Jun 2025.
23. Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising
   Zhenhui Liu, Chunyuan Yuan, Ming Pang, Zheng Fang, Li Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo, Jingping Shao. 04 Jun 2025.
24. Robust Preference Optimization via Dynamic Target Margins
   Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang. 04 Jun 2025.
25. Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
   Yinjie Wang, Ling Yang, Ye Tian, Ke Shen, Mengdi Wang. 03 Jun 2025 (LRM).
26. daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
   Zhengze Zhang, Shiqi Wang, Yiqun Shen, Simin Guo, Dahua Lin, Xiaoliang Wang, Nguyen Cam-Tu, Fei Tan. 03 Jun 2025.
27. Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
   S. Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, ..., Yang Yue, S. Song, Bowen Yu, Gao Huang, Junyang Lin. 02 Jun 2025 (LRM).
28. Towards Human-like Preference Profiling in Sequential Recommendation
   Z. Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi. 02 Jun 2025 (HAI).
29. Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
   Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov. 01 Jun 2025 (OffRL).
30. K-order Ranking Preference Optimization for Large Language Models
   Shihao Cai, Chongming Gao, Yang Zhang, Wentao Shi, Jizhi Zhang, Keqin Bao, Qifan Wang, Fuli Feng. 31 May 2025 (ALM).
31. MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
   Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu. 30 May 2025 (OffRL).
32. MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning
   Yunze Lin. 30 May 2025 (LRM).
33. Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
   Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi. 30 May 2025 (LRM).
34. On Symmetric Losses for Robust Policy Optimization with Noisy Preferences
   Soichiro Nishimori, Yu Zhang, Thanawat Lodkaew, Masashi Sugiyama. 30 May 2025 (NoLa).
35. Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
   Sheng Ouyang, Yulan Hu, Ge Chen, Qingyang Li, Fuzheng Zhang, Yong Liu. 29 May 2025.
36. Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
   Kaiyang Guo, Yinchuan Li, Zhitang Chen. 29 May 2025.
37. Learning Parametric Distributions from Samples and Preferences
   Marc Jourdan, Gizem Yüce, Nicolas Flammarion. 29 May 2025.
38. Discriminative Policy Optimization for Token-Level Reward Models
   Hongzhan Chen, Tao Yang, Shiping Gao, Ruijun Chen, Xiaojun Quan, Hongtao Tian, Ting Yao. 29 May 2025.
39. Differential Information: An Information-Theoretic Perspective on Preference Optimization
   Yunjae Won, Hyunji Lee, Hyeonbin Hwang, Minjoon Seo. 29 May 2025.
40. MenTeR: A fully-automated Multi-agenT workflow for end-to-end RF/Analog Circuits Netlist Design
   Pin-Han Chen, Y. Lin, Wei-Cheng Lee, Tin-Yu Leu, Po-Hsiang Hsu, Anjana Dissanayake, Sungjin Oh, Chinq-Shiun Chiu. 29 May 2025.
41. Dataset Cartography for Large Language Model Alignment: Mapping and Diagnosing Preference Data
   Seohyeong Lee, Eunwon Kim, Hwaran Lee, Buru Chang. 29 May 2025.
42. Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
   Paul Gölz, Nika Haghtalab, Kunhe Yang. 29 May 2025.
43. VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
   Qiuchen Wang, Ruixue Ding, Y. Zeng, Zehui Chen, Lin Yen-Chen, Shihang Wang, Pengjun Xie, Fei Huang, Feng Zhao. 28 May 2025 (VLM, LRM).
44. SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
   Xiaomeng Yang, Zhiyu Tan, Junyan Wang, Zhijian Zhou, Hao Li. 28 May 2025.
45. Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
   Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem. 28 May 2025.
46. Improved Representation Steering for Language Models
   Zhengxuan Wu, Qinan Yu, Aryaman Arora, Christopher D. Manning, Christopher Potts. 27 May 2025 (LLMSV).
47. Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
   Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh. 27 May 2025 (KELM, LRM).
48. Frictional Agent Alignment Framework: Slow Down and Don't Break Things
   Abhijnan Nath, Carine Graff, Andrei Bachinin, Nikhil Krishnaswamy. 26 May 2025.
49. Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
   Yi Liu, Dianqing Liu, Mingye Zhu, Junbo Guo, Yongdong Zhang, Zhendong Mao. 26 May 2025.
50. Interleaved Reasoning for Large Language Models via Reinforcement Learning
   Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong-Jun Wang, Saloni Potdar, Bhuwan Dhingra. 26 May 2025 (KELM, LRM).