Rethinking Diverse Human Preference Learning through Principal Component Analysis
arXiv 2502.13131 · v2 (latest) · 18 February 2025
Feng Luo, Rui Yang, Hao Sun, Chunyuan Deng, Jiarui Yao, Jingyan Shen, Huan Zhang, Hanjie Chen
Links: ArXiv (abs) · PDF · HTML
Papers citing "Rethinking Diverse Human Preference Learning through Principal Component Analysis" (33 papers)
1. Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs
   Hao Sun, Yunyi Shen, Jean-Francois Ton, M. Schaar (04 Feb 2025)

2. Reviving The Classics: Active Reward Modeling in Large Language Model Alignment
   Yunyi Shen, Hao Sun, Jean-Francois Ton (04 Feb 2025)

3. Test-Time Alignment via Hypothesis Reweighting
   Yoonho Lee, Jonathan Williams, Henrik Marklund, Archit Sharma, E. Mitchell, Anikait Singh, Chelsea Finn (11 Dec 2024)

4. Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
   Chris Yuhao Liu, Liang Zeng, Qingbin Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou (24 Oct 2024) [AI4TS]

5. How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
   Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, V. Honavar (14 Oct 2024) [ALM]

6. Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
   Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu (03 Oct 2024)

7. Improving Context-Aware Preference Modeling for Language Models
   Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni (20 Jul 2024)

8. Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
   Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang (18 Jun 2024)

9. Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
   Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang (14 Jun 2024)

10. Scalable Ensembling For Mitigating Reward Overoptimisation
    Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Oluwasanmi Koyejo (03 Jun 2024)

11. Direct Preference Optimization With Unobserved Preference Heterogeneity
    Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis (23 May 2024)

12. DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
    Shanghaoran Quan (02 Mar 2024) [MoE, OffRL]

13. Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
    Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze (28 Feb 2024) [ELM, AI4MH]

14. Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
    Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener (22 Feb 2024)

15. Uncovering Latent Human Wellbeing in Language Model Embeddings
    Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons (19 Feb 2024)

16. Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
    Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen (15 Feb 2024)

17. MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
    Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang (14 Feb 2024) [ALM]

18. HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
    Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, ..., Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev (16 Nov 2023) [3DV]

19. ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
    Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang (10 Nov 2023) [AI4MH, AI4CE, LM&MA]

20. Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
    Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu (17 Oct 2023) [MoMe]

21. Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
    Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao (05 Oct 2023)

22. Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL
    Hao Sun, Alihan Huyuk, M. Schaar (13 Sep 2023) [OffRL, LRM]

23. ExpeL: LLM Agents Are Experiential Learners
    Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yang Liu, Gao Huang (20 Aug 2023) [LLMAG]

24. LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
    Dongfu Jiang, Xiang Ren, Bill Yuchen Lin (05 Jun 2023) [ELM]

25. Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
    Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi (02 Jun 2023) [ALM]

26. GPT-4 Technical Report
    OpenAI: Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph (15 Mar 2023) [LLMAG, MLLM]

27. Exploring Dimensionality Reduction Techniques in Multilingual Transformers
    Álvaro Huertas-García, Alejandro Martín, Javier Huertas-Tato, David Camacho (18 Apr 2022)

28. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
    Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan (12 Apr 2022)

29. Training language models to follow instructions with human feedback
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022) [OSLM, ALM]

30. WebGPT: Browser-assisted question-answering with human feedback
    Reiichiro Nakano, Jacob Hilton, S. Balaji, Jeff Wu, Long Ouyang, ..., Gretchen Krueger, Kevin Button, Matthew Knight, B. Chess, John Schulman (17 Dec 2021) [ALM, RALM]

31. Learning to summarize from human feedback
    Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano (02 Sep 2020) [ALM]

32. DeepMDP: Learning Continuous Latent Space Models for Representation Learning
    Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare (06 Jun 2019) [BDL]

33. Deep reinforcement learning from human preferences
    Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei (12 Jun 2017)