LiPO: Listwise Preference Optimization through Learning-to-Rank

28 January 2025
Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao-Min Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang
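
LiPO frames aligning a language model to human preference data as a listwise learning-to-rank problem: each prompt comes with a ranked list of responses, and the policy is trained with ranking objectives (e.g., ListMLE- or LambdaLoss-style losses) rather than a single pairwise comparison. As a rough illustration of that framing, the sketch below implements a ListMLE-style (Plackett-Luce) listwise loss over DPO-style implicit rewards; the function name, the `beta` scaling, and the input layout are assumptions made for this example, not the paper's reference implementation.

```python
import torch

def listwise_preference_loss(policy_logps: torch.Tensor,
                             ref_logps: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    """ListMLE-style (Plackett-Luce) loss over K responses per prompt.

    policy_logps, ref_logps: [B, K] sequence log-probabilities, with the
    K responses already sorted best-to-worst by the preference labels.
    (Illustrative sketch, not LiPO's reference code.)
    """
    # DPO-style implicit reward for each response in the list.
    rewards = beta * (policy_logps - ref_logps)                          # [B, K]
    # Plackett-Luce log-likelihood of the labeled ranking: position k
    # contributes s_k - logsumexp(s_k, ..., s_K). Flipping before and
    # after logcumsumexp turns the cumulative sum into a suffix log-sum-exp.
    suffix_lse = torch.logcumsumexp(rewards.flip(-1), dim=-1).flip(-1)   # [B, K]
    log_pl = (rewards - suffix_lse).sum(dim=-1)                          # [B]
    return -log_pl.mean()
```

For K = 2 the Plackett-Luce term collapses to log sigmoid(s_w - s_l), i.e. the standard pairwise DPO loss, which is why listwise objectives of this form generalize pairwise preference optimization.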

Papers citing "LiPO: Listwise Preference Optimization through Learning-to-Rank"

39 papers shown

In-context Ranking Preference Optimization
Junda Wu, Rohan Surana, Zhouhang Xie, Yiran Shen, Yu Xia, Tong Yu, Ryan Rossi, Prithviraj Ammanabrolu, Julian McAuley
38 · 0 · 0 · 21 Apr 2025

A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong, Wei Shen, Yanzeng Li, Songyang Gao, Hua Lu, Yicheng Chen, Yang Zhang, Wei Zhou, Jinjie Gu, Lei Zou
LRM
45 · 2 · 0 · 12 Apr 2025

2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization
Mengyang Li, Zhong Zhang
27 · 0 · 0 · 10 Apr 2025

Perception in Reflection
Yana Wei, Liang Zhao, Kangheng Lin, En Yu, Yuang Peng, ..., Jianjian Sun, Haoran Wei, Zheng Ge, Xiangyu Zhang, Vishal M. Patel
31 · 0 · 0 · 09 Apr 2025

Controllable Protein Sequence Generation with LLM Preference Optimization
Xiangyu Liu, Yi Liu, Silei Chen, Wei Hu
36 · 0 · 0 · 28 Jan 2025

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
Yang Wu, Huayi Zhang, Yizheng Jiao, Lin Ma, Xiaozhong Liu, Jinhong Yu, Dongyu Zhang, Dezhi Yu, Wei Xu
82 · 1 · 0 · 01 Dec 2024

Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu, Yu Pan, Guanting Chen, Xiaocheng Li
80 · 2 · 0 · 19 Nov 2024

$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization
Jiaqi Han, Mingjian Jiang, Yuxuan Song, J. Leskovec, Stefano Ermon
53 · 3 · 0 · 29 Oct 2024

Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, J. Ren, Anil Kag
EGVM
57 · 5 · 0 · 23 Oct 2024

Optimizing Preference Alignment with Differentiable NDCG Ranking
Jiacong Zhou, Xianyun Wang, Jun Yu
30 · 2 · 0 · 17 Oct 2024

SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation
Xinping Zhao, Dongfang Li, Yan Zhong, Boren Hu, Yibin Chen, Baotian Hu, Min Zhang
23 · 2 · 0 · 15 Oct 2024

Offline Model-Based Optimization by Learning to Rank
Rong-Xi Tan, Ke Xue, Shen-Huan Lyu, Haopu Shang, Yao Wang, Yaoyuan Wang, Sheng Fu, Chao Qian
OffRL
81 · 2 · 0 · 15 Oct 2024

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy
134 · 1 · 0 · 11 Oct 2024

HyperDPO: Hypernetwork-based Multi-Objective Fine-Tuning Framework
Yinuo Ren, Tesi Xiao, Michael Shavlovsky, Lexing Ying, Holakou Rahmanian
23 · 0 · 0 · 10 Oct 2024

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu
48 · 3 · 0 · 03 Oct 2024

FlashMask: Efficient and Rich Mask Extension of FlashAttention
Guoxia Wang, Jinle Zeng, Xiyuan Xiao, Siming Wu, Jiabin Yang, Lujing Zheng, Zeyu Chen, Jiang Bian, Dianhai Yu, Haifeng Wang
136 · 2 · 0 · 02 Oct 2024

RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ..., Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh
AAML
56 · 13 · 0 · 20 Sep 2024

Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Zhiqiang Xu, Haoyi Xiong, James Kwok, Sumi Helal, Zeke Xie
45 · 12 · 0 · 11 Sep 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Z. Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang
50 · 11 · 0 · 04 Sep 2024

LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification
Zhen Qin, Junru Wu, Jiaming Shen, Tianqi Liu, Xuanhui Wang
58 · 3 · 0 · 06 Aug 2024

Aligning Diffusion Models with Noise-Conditioned Perception
Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov
44 · 5 · 0 · 25 Jun 2024

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, ..., Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi
KELM
30 · 2 · 0 · 24 Jun 2024

A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
49 · 8 · 0 · 17 Jun 2024

Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
Chen Zheng, Ke Sun, Xun Zhou
MoE
49 · 0 · 0 · 12 Jun 2024

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar
MedIm
47 · 3 · 0 · 10 Jun 2024

Prompt Optimization with Human Feedback
Xiaoqiang Lin, Zhongxiang Dai, Arun Verma, See-Kiong Ng, P. Jaillet, K. H. Low
AAML
36 · 8 · 0 · 27 May 2024

SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng, Mengzhou Xia, Danqi Chen
62 · 350 · 0 · 23 May 2024

Hummer: Towards Limited Competitive Preference Dataset
Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng
34 · 6 · 0 · 19 May 2024

Filtered Direct Preference Optimization
Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu
48 · 13 · 0 · 22 Apr 2024

Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, Yaroslav Aksenov, Ian Maksimov, Nikita Balagansky, Daniil Gavrilov
OffRL
54 · 26 · 0 · 15 Apr 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Hassan Awadallah, Tengyang Xie
152 · 114 · 0 · 04 Apr 2024

Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe
66 · 7 · 0 · 01 Apr 2024

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover
ALM
64 · 9 · 0 · 31 Mar 2024

Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, Sathwik Tejaswi Madhusudhan
31 · 18 · 0 · 12 Mar 2024

CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models
S. Nguyen, Uma-Naresh Niranjan, Theja Tulabandhula
36 · 0 · 0 · 05 Mar 2024

Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
Chen Zheng, Ke Sun, Hang Wu, Chenguang Xi, Xun Zhou
52 · 12 · 0 · 04 Mar 2024

LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, Lucas Dixon
51 · 21 · 0 · 16 Feb 2024

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Zhen Qin, R. Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, ..., Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky
ALM · RALM
42 · 218 · 0 · 30 Jun 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM · ALM
313 · 11,953 · 0 · 04 Mar 2022