arXiv:2408.10075
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
19 August 2024
S. Poddar
Yanming Wan
Hamish Ivison
Abhishek Gupta
Natasha Jaques
Papers citing
"Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning"
14 / 14 papers shown
Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering
Jessica Y. Bo
Tianyu Xu
Ishan Chatterjee
Katrina Passarella-Ward
Achin Kulshrestha
D Shin
LLMSV
82
0
0
07 May 2025
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
Kunal Jha
Wilka Carvalho
Yancheng Liang
S. Du
Max Kleiman-Weiner
Natasha Jaques
27
0
0
17 Apr 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian-Yu Guan
Jian Wu
J. Li
Chuanqi Cheng
Wei Yu Wu
LM&MA
74
0
0
21 Mar 2025
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
J. Li
Jian-Yu Guan
Songhao Wu
Wei Yu Wu
Rui Yan
64
1
0
19 Mar 2025
A Survey of Personalized Large Language Models: Progress and Future Directions
Jiahong Liu
Zexuan Qiu
Zhongyang Li
Quanyu Dai
Jieming Zhu
Minda Hu
Menglin Yang
Irwin King
LM&MA
54
2
0
17 Feb 2025
CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen
Wei Feng
Zhenbang Du
Weizhen Wang
Y. Chen
...
Jingping Shao
Yuanjie Shao
Xinge You
Changxin Gao
Nong Sang
OffRL
47
2
0
05 Feb 2025
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model
Lifan Jiang
Zhihui Wang
Siqi Yin
Guangxiao Ma
Peng Zhang
Boxi Wu
DiffM
59
0
0
28 Aug 2024
Pareto-Optimal Learning from Preferences with Hidden Context
Ryan Boldi
Li Ding
Lee Spector
S. Niekum
70
6
0
21 Jun 2024
A Roadmap to Pluralistic Alignment
Taylor Sorensen
Jared Moore
Jillian R. Fisher
Mitchell L. Gordon
Niloofar Mireshghallah
...
Liwei Jiang
Ximing Lu
Nouha Dziri
Tim Althoff
Yejin Choi
65
80
0
07 Feb 2024
Personalized Language Modeling from Personalized Human Feedback
Xinyu Li
Zachary C. Lipton
Liu Leqi
ALM
63
47
0
06 Feb 2024
Crowd-PrefRL: Preference-Based Reward Learning from Crowds
David Chhan
Ellen R. Novoseller
Vernon J. Lawhern
29
5
0
17 Jan 2024
vec2text with Round-Trip Translations
Geoffrey Cideron
Sertan Girgin
Anton Raichuk
Olivier Pietquin
Olivier Bachem
Léonard Hussenot
48
3
0
14 Sep 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
843
0
12 Oct 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019