ResearchTrend.AI
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment


18 May 2025
Wenqiao Zhu
Ji Liu
Lulu Wang
Jun Wu
Yulun Zhang

Papers citing "SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment"

PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
107
1
0
18 May 2025
A Comprehensive Survey on Long Context Language Modeling
Jiaheng Liu
Dawei Zhu
Zhiqi Bai
Yancheng He
Huanxuan Liao
...
Bo Zheng
Wangchunshu Zhou
Wenhao Huang
Sujian Li
Zhenru Zhang
LLMAG
112
11
0
20 Mar 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Ziyang Chen
Mingxiao Li
Shangsong Liang
Zhaochun Ren
V. Honavar
262
11
0
21 Feb 2025
Trustworthy Federated Learning: Privacy, Security, and Beyond
Chunlu Chen
Ji Liu
Haowen Tan
Xingjian Li
Kevin I-Kai Wang
Peng Li
Kouichi Sakurai
Dejing Dou
FedML
105
11
0
03 Nov 2024
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin
Sadhika Malladi
Adithya Bhaskar
Danqi Chen
Sanjeev Arora
Boris Hanin
212
35
0
11 Oct 2024
Efficient Federated Learning Using Dynamic Update and Adaptive Pruning with Momentum on Shared Server Data
Ji Liu
Juncheng Jia
Hong Zhang
Yuhui Yun
Leye Wang
Yang Zhou
H. Dai
Dejing Dou
FedML
85
7
0
11 Aug 2024
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu
Jiazheng Li
Siyu An
Meng Zhao
Yulan He
Di Yin
Xing Sun
94
20
0
16 Jun 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
166
492
0
23 May 2024
Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective
Duanyu Feng
Bowen Qin
Chen Huang
Zheng Zhang
Wenqiang Lei
67
41
0
06 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
164
403
0
06 Apr 2024
Binary Classifier Optimization for Large Language Model Alignment
Seungjae Jung
Gunsoo Han
D. W. Nam
Kyoung-Woon On
80
25
0
06 Apr 2024
ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
Zhenyu Hou
Yilin Niu
Zhengxiao Du
Xiaohan Zhang
Xiao Liu
...
Qinkai Zheng
Minlie Huang
Hongning Wang
Jie Tang
Yuxiao Dong
ALM
107
19
0
01 Apr 2024
Disentangling Length from Quality in Direct Preference Optimization
Ryan Park
Rafael Rafailov
Stefano Ermon
Chelsea Finn
ALM
98
145
0
28 Mar 2024
ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong
Noah Lee
James Thorne
OSLM
113
267
0
12 Mar 2024
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal
Deep Karkhanis
Samuel Dooley
Manley Roberts
Siddartha Naidu
Colin White
OSLM
106
155
0
20 Feb 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen
Guande He
Lifan Yuan
Ganqu Cui
Hang Su
Jun Zhu
110
56
0
08 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
297
570
0
02 Feb 2024
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Haoran Xu
Amr Sharaf
Yunmo Chen
Weiting Tan
Lingfeng Shen
Benjamin Van Durme
Kenton W. Murray
Young Jin Kim
ALM
120
266
0
16 Jan 2024
Efficient Asynchronous Federated Learning with Sparsification and Quantization
Juncheng Jia
Ji Liu
Chendi Zhou
Hao Tian
M. Dong
Dejing Dou
FedML
88
13
0
23 Dec 2023
AEDFL: Efficient Asynchronous Decentralized Federated Learning with Heterogeneous Devices
Ji Liu
Tianshi Che
Yang Zhou
Ruoming Jin
H. Dai
Dejing Dou
P. Valduriez
101
13
0
18 Dec 2023
FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-aware Model Update
Ji Liu
Juncheng Jia
Tianshi Che
Chao Huo
Jiaxiang Ren
Yang Zhou
H. Dai
Dejing Dou
71
37
0
10 Dec 2023
Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization
Tianshi Che
Ji Liu
Yang Zhou
Jiaxiang Ren
Jiwen Zhou
Victor S. Sheng
H. Dai
Dejing Dou
90
56
0
23 Oct 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
207
647
0
18 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
541
4,453
0
09 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
399
4,184
0
29 May 2023
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
150
554
0
23 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
152
608
0
22 May 2023
Distributed and Deep Vertical Federated Learning with Big Data
Ji Liu
Xuehai Zhou
L. Mo
Shilei Ji
Yuan Liao
Zhu Li
Qinhua Gu
Dejing Dou
FedML
79
18
0
08 Mar 2023
Multi-Job Intelligent Scheduling with Cross-Device Federated Learning
Ji Liu
Juncheng Jia
Beichen Ma
Chen Zhou
Jingbo Zhou
Yang Zhou
H. Dai
Dejing Dou
FedML
104
24
0
24 Nov 2022
FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server
Hong Zhang
Ji Liu
Juncheng Jia
Yang Zhou
H. Dai
Dejing Dou
FedML
78
45
0
25 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
258
2,630
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
924
13,266
0
04 Mar 2022
Efficient Device Scheduling with Multi-Job Federated Learning
Chen Zhou
Ji Liu
Juncheng Jia
Jingbo Zhou
Yang Zhou
H. Dai
Dejing Dou
FedML
100
41
0
11 Dec 2021
HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments
Ji Liu
Zhihua Wu
Dianhai Yu
Yanjun Ma
Danlei Feng
Minxu Zhang
Xinxuan Wu
Xuefeng Yao
Dejing Dou
76
49
0
20 Nov 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
151
1,951
0
08 Sep 2021
From Distributed Machine Learning to Federated Learning: A Survey
Ji Liu
Jizhou Huang
Yang Zhou
Xuhong Li
Shilei Ji
Haoyi Xiong
Dejing Dou
FedML, OOD
144
262
0
29 Apr 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
204
4,580
0
07 Sep 2020
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
282
2,195
0
02 Sep 2020
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD, LRM
246
1,851
0
26 Nov 2019
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM, RALM, LRM
237
2,676
0
14 Mar 2018