arXiv: 2312.08358 (v2, latest)
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
13 December 2023 · Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
Links: ArXiv (abs) · PDF · HTML · GitHub (29★)
Papers citing "Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF" (50 / 52 papers shown)
Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory
Jiancong Xiao, Zhekun Shi, Kaizhao Liu, Q. Long, Weijie J. Su · 14 Jun 2025

Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach
Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, P. Parrilo · 05 Jun 2025

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
Jingyan Shen, Jiarui Yao, Rui Yang, Yifan Sun, Feng Luo, Boyao Wang, Tong Zhang, Han Zhao · 30 May 2025

Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Paul Gölz, Nika Haghtalab, Kunhe Yang · 29 May 2025

Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching
Zhekun Shi, Kaizhao Liu, Qi Long, Weijie J. Su, Jiancong Xiao · 27 May 2025

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee · 26 May 2025 · AAML

WikiPersonas: What Can We Learn From Personalized Alignment to Famous People?
Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya · 19 May 2025

Metric Distortion for Tournament Voting and Beyond
Moses Charikar, Prasanna Ramakrishnan, Zihan Tan, Kangning Wang · 19 May 2025
Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge
Luyu Chen, Zeyu Zhang, Haoran Tan, Quanyu Dai, Hao-ran Yang, Zhenhua Dong, Xu Chen · 18 May 2025

Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira · 17 May 2025

Learning Guarantee of Reward Modeling Using Deep Neural Networks
Yuanhang Luo, Yeheng Ge, Ruijian Han, Guohao Shen · 10 May 2025

The Mind in the Machine: A Survey of Incorporating Psychological Theories in LLMs
Zizhou Liu, Ziwei Gong, Lin Ai, Zheng Hui, Run Chen, Colin Wayne Leach, Michelle R. Greene, Julia Hirschberg · 28 Mar 2025 · LLMAG

Capturing Individual Human Preferences with Reward Features
André Barreto, Vincent Dumoulin, Yiran Mao, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle · 21 Mar 2025 · ALM

A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian Guan, Jian Wu, Jia-Nan Li, Chuanqi Cheng, Wei Wu · 21 Mar 2025 · LM&MA

From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
Jia-Nan Li, Jian Guan, Songhao Wu, Wei Wu, Rui Yan · 19 Mar 2025

Strategyproof Reinforcement Learning from Human Feedback
Thomas Kleine Buening, Jiarui Gan, Debmalya Mandal, Marta Z. Kwiatkowska · 13 Mar 2025

Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang, Michael J.Q. Zhang, Eunsol Choi · 04 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower · 03 Mar 2025 · AAML

CoPL: Collaborative Preference Learning for Personalizing LLMs
Youngbin Choi, Seunghyuk Cho, M. Lee, Moonjeong Park, Yesong Ko, Jungseul Ok, Dongwoo Kim · 03 Mar 2025

FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users
Anikait Singh, Sheryl Hsu, Kyle Hsu, E. Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn · 26 Feb 2025 · SyDa, OffRL

Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang · 24 Feb 2025 · ALM, LM&MA

Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation
Guofu Xie, Xiao Zhang, Ting Yao, Yunsheng Shi · 15 Feb 2025 · MoMe

CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yuxiao Chen, ..., Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang · 05 Feb 2025 · OffRL

Clone-Robust AI Alignment
Ariel D. Procaccia, Benjamin G. Schiffer, Shirley Zhang · 17 Jan 2025

Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Y. Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur · 31 Dec 2024
Test-Time Alignment via Hypothesis Reweighting
Yoonho Lee, Jonathan Williams, Henrik Marklund, Archit Sharma, E. Mitchell, Anikait Singh, Chelsea Finn · 11 Dec 2024

Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
Vishakh Padmakumar, Chuanyang Jin, Hannah Rose Kirk, He He · 05 Dec 2024

Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
Chaitanya Malaviya, Joseph Chee Chang, Dan Roth, Mohit Iyyer, Mark Yatskar, Kyle Lo · 11 Nov 2024 · ELM

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo, Yige Li, Xingjun Ma, Yu-Gang Jiang · 28 Oct 2024 · VLM, AAML

Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models
Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, ..., Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, Lifu Huang · 26 Oct 2024

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine · 22 Oct 2024

Diverging Preferences: When do Annotators Disagree and do Models Know?
Michael J.Q. Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin · 18 Oct 2024

Quantile Regression for Distributional Reward Models in RLHF
Nicolai Dorka · 16 Sep 2024
Beyond Preferences in AI Alignment
Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton · 30 Aug 2024

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
S. Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques · 19 Aug 2024

Improving Context-Aware Preference Modeling for Language Models
Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni · 20 Jul 2024

Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li · 05 Jul 2024 · AAML

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models
Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, ..., Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He · 21 Jun 2024 · VLM, AI4MH

Pareto-Optimal Learning from Preferences with Hidden Context
Ryan Boldi, Li Ding, Lee Spector, S. Niekum · 21 Jun 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak · 15 Jun 2024

Multi-objective Reinforcement learning from AI Feedback
Marcus Williams · 11 Jun 2024

Aligning to Thousands of Preferences via System Message Generalization
Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo · 28 May 2024 · ALM
Direct Preference Optimization With Unobserved Preference Heterogeneity
Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis · 23 May 2024

Axioms for AI Alignment from Human Feedback
Luise Ge, Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira, Yevgeniy Vorobeychik, Junlin Wu · 23 May 2024

Hummer: Towards Limited Competitive Preference Dataset
Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, ZuJie Wen, Jun Zhou, Xiaotie Deng · 19 May 2024

Mapping Social Choice Theory to RLHF
Jessica Dai, Eve Fleisig · 19 Apr 2024

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker · 16 Apr 2024

Post-Hoc Reversal: Are We Selecting Models Prematurely?
Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Chase Lipton · 11 Apr 2024

Scalable Interactive Machine Learning for Future Command and Control
Anna Madison, Ellen R. Novoseller, Vinicius G. Goecks, Benjamin T. Files, Nicholas R. Waytowich, Alfred Yu, Vernon J. Lawhern, Steven Thurman, Christopher Kelshaw, Kaleb McDowell · 09 Feb 2024

A Roadmap to Pluralistic Alignment
Taylor Sorensen, Jared Moore, Jillian R. Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, ..., Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi · 07 Feb 2024