Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

28 February 2024
Haoxiang Wang
Yong Lin
Wei Xiong
Rui Yang
Shizhe Diao
Shuang Qiu
Han Zhao
Tong Zhang
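
As background on the paper this listing covers: its abstract describes modeling a user's preference as a unit vector over multiple reward objectives (for example, helpfulness and verbosity) and scoring a response by the inner product of that direction with the response's multi-objective reward vector. The Python sketch below illustrates only that scoring rule; the attribute names, reward values, and the pick_best helper are illustrative assumptions, not the authors' released code.

import math

def directional_reward(rewards, theta):
    """Scalar reward for a user preference direction.

    `rewards` holds per-attribute scores from a multi-objective reward model,
    e.g. {"helpfulness": 6.0, "verbosity": 1.0} (illustrative values).
    `theta` (radians) picks a unit vector v = (cos t, sin t); the scalar
    reward is the inner product v . (r_helpfulness, r_verbosity), matching the
    directional-preference formulation described in the abstract.
    """
    v = (math.cos(theta), math.sin(theta))
    return v[0] * rewards["helpfulness"] + v[1] * rewards["verbosity"]

def pick_best(candidates, theta):
    """Rejection-sampling-style selection (illustrative): keep the candidate
    response whose rewards score highest under the user's direction."""
    return max(candidates, key=lambda c: directional_reward(c["rewards"], theta))

# Example: a user who weights helpfulness heavily over verbosity (theta near 0).
candidates = [
    {"text": "terse answer",   "rewards": {"helpfulness": 6.0, "verbosity": 1.0}},
    {"text": "verbose answer", "rewards": {"helpfulness": 5.0, "verbosity": 4.0}},
]
print(pick_best(candidates, theta=math.radians(10))["text"])  # -> "terse answer"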

Papers citing "Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards"

50 / 63 papers shown
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang
Chris
Jiangbo Pei
Wei Shen
Yi Peng
...
Ai Jian
Tianyidan Xie
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
LRM
72
1
0
12 May 2025
Drift: Decoding-time Personalized Alignments with Implicit User Preferences
Minbeom Kim
Kang-il Lee
Seongho Joo
Hwaran Lee
Thibaut Thonet
Kyomin Jung
AI4TS
138
1
0
20 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
146
6
0
07 Nov 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
115
4
0
30 Oct 2024
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng
Chengsong Huang
Banghua Zhu
Jiaxin Huang
63
11
0
13 Oct 2024
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
84
11
0
10 Oct 2024
From Lists to Emojis: How Format Bias Affects Model Alignment
Xuanchang Zhang
Wei Xiong
Lichang Chen
Dinesh Manocha
Heng Huang
Tong Zhang
ALM
74
12
0
18 Sep 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip Torr
Mohamed Elhoseiny
Adel Bibi
132
10
0
27 Aug 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
ALM
90
6
0
21 Aug 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
110
62
0
29 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
115
17
0
07 Apr 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
287
312
0
18 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
83
105
0
08 Jan 2024
Nash Learning from Human Feedback
Rémi Munos
Michal Valko
Daniele Calandriello
M. G. Azar
Mark Rowland
...
Nikola Momchev
Olivier Bachem
D. Mankowitz
Doina Precup
Bilal Piot
76
137
0
01 Dec 2023
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Zhilin Wang
Yi Dong
Jiaqi Zeng
Virginia Adams
Makesh Narsimhan Sreedhar
...
Olivier Delalleau
Jane Polak Scowcroft
Neel Kant
Aidan Swope
Oleksii Kuchaiev
3DV
44
70
0
16 Nov 2023
Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall
E. Beeching
Nathan Lambert
Nazneen Rajani
Kashif Rasul
...
Nathan Habib
Nathan Sarrazin
Omar Sanseviero
Alexander M. Rush
Thomas Wolf
ALM
74
382
0
25 Oct 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
135
597
0
18 Oct 2023
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
Joel Jang
Seungone Kim
Bill Yuchen Lin
Yizhong Wang
Jack Hessel
Luke Zettlemoyer
Hannaneh Hajishirzi
Yejin Choi
Prithviraj Ammanabrolu
MoMe
84
140
0
17 Oct 2023
Verbosity Bias in Preference Labeling by Large Language Models
Keita Saito
Akifumi Wachi
Koki Wataoka
Youhei Akimoto
ALM
37
32
0
16 Oct 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
46
2,102
0
10 Oct 2023
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Yi Dong
Zhilin Wang
Makesh Narsimhan Sreedhar
Xianchao Wu
Oleksii Kuchaiev
ALM
LLMSV
66
68
0
09 Oct 2023
A Long Way to Go: Investigating Length Correlations in RLHF
Prasann Singhal
Tanya Goyal
Jiacheng Xu
Greg Durrett
92
148
0
05 Oct 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
134
2,049
0
12 Sep 2023
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal
John Dang
Aditya Grover
ALM
63
20
0
30 Aug 2023
Revisiting Scalarization in Multi-Task Learning: A Theoretical Perspective
Yuzheng Hu
Ruicheng Xian
Qilong Wu
Qiuling Fan
Lang Yin
Han Zhao
62
41
0
27 Aug 2023
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
103
293
0
17 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
96
497
0
27 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
221
11,636
0
18 Jul 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji
Mickel Liu
Juntao Dai
Xuehai Pan
Chi Zhang
Ce Bian
Chi Zhang
Ruiyang Sun
Yizhou Wang
Yaodong Yang
ALM
71
460
0
10 Jul 2023
Is RLHF More Difficult than Standard RL?
Yuanhao Wang
Qinghua Liu
Chi Jin
OffRL
29
60
0
25 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Shizhe Diao
Rui Pan
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
48
64
0
21 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
295
4,186
0
09 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
48
144
0
07 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
120
321
0
02 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
295
3,712
0
29 May 2023
Moral Machine or Tyranny of the Majority?
Michael Feffer
Hoda Heidari
Zachary Chase Lipton
43
26
0
27 May 2023
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
91
511
0
23 May 2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Yao-Min Zhao
Rishabh Joshi
Tianqi Liu
Misha Khalman
Mohammad Saleh
Peter J. Liu
59
284
0
17 May 2023
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Andreas Kopf
Yannic Kilcher
Dimitri von Rutte
Sotiris Anagnostidis
Zhi Rui Tam
...
Arnav Dantuluri
Andrew Maguire
Christoph Schuhmann
Huu Nguyen
A. Mattick
ALM
LM&MA
94
611
0
14 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
54
439
0
13 Apr 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan
Hongyi Yuan
Chuanqi Tan
Wei Wang
Songfang Huang
Feiran Huang
ALM
87
358
0
11 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
46
130
0
06 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
797
13,788
0
15 Mar 2023
Optimizing Prompts for Text-to-Image Generation
Y. Hao
Zewen Chi
Li Dong
Furu Wei
76
143
0
19 Dec 2022
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
156
1,583
0
15 Dec 2022
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
89
219
0
28 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
311
2,364
0
09 Nov 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
224
2,457
0
12 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
351
6,132
0
05 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
721
12,525
0
04 Mar 2022