Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.06256
Cited By
Mitigating the Alignment Tax of RLHF
12 September 2023
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
Jipeng Zhang
Lu Tan
Haoxiang Wang
Wenbin Hu
Hanning Zhang
Hanze Dong
Renjie Pi
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMe
CLL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mitigating the Alignment Tax of RLHF"
45 / 45 papers shown
Title
Paying Alignment Tax with Contrastive Learning
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Antonio del Rio Chanona
50
0
0
25 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLL
MoMe
153
0
0
23 May 2025
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects
Jipeng Zhang
Haolin Yang
Kehao Miao
Ruiyuan Zhang
Renjie Pi
Jiahui Gao
Xiaofang Zhou
141
0
0
22 May 2025
CONGRAD:Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li
Thuy-Trang Vu
Christian Herold
Amirhossein Tebbifakhr
Shahram Khadivi
Gholamreza Haffari
103
0
0
31 Mar 2025
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Gangwei Jiang
Caigao Jiang
Zhaoyi Li
Siqiao Xue
Jun-ping Zhou
Linqi Song
Defu Lian
Yin Wei
CLL
MU
100
1
0
16 Feb 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
247
118
0
28 Jan 2025
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
Ruoyu Sun
CLL
131
3
0
30 Jul 2024
Language Model Alignment with Elastic Reset
Michael Noukhovitch
Samuel Lavoie
Florian Strub
Aaron Courville
KELM
125
26
0
06 Dec 2023
Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall
E. Beeching
Nathan Lambert
Nazneen Rajani
Kashif Rasul
...
Nathan Habib
Nathan Sarrazin
Omar Sanseviero
Alexander M. Rush
Thomas Wolf
ALM
89
387
0
25 Oct 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
163
615
0
18 Oct 2023
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
105
297
0
17 Aug 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
53
149
0
07 Jun 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
62
450
0
13 Apr 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan
Hongyi Yuan
Chuanqi Tan
Wei Wang
Songfang Huang
Feiran Huang
ALM
140
369
0
11 Apr 2023
Progressive Prompts: Continual Learning for Language Models
Anastasia Razdaibiedina
Yuning Mao
Rui Hou
Madian Khabsa
M. Lewis
Amjad Almahairi
VLM
KELM
CLL
92
135
0
29 Jan 2023
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
353
2,377
0
09 Nov 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
82
246
0
03 Oct 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
420
6,202
0
05 Apr 2022
ELLE: Efficient Lifelong Pre-training for Emerging Data
Yujia Qin
Jiajie Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
76
72
0
12 Mar 2022
Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
Ananya Kumar
Aditi Raghunathan
Robbie Jones
Tengyu Ma
Percy Liang
OODD
107
669
0
21 Feb 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
114
775
0
01 Dec 2021
Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora
Xisen Jin
Dejiao Zhang
Henghui Zhu
Wei Xiao
Shang-Wen Li
Xiaokai Wei
Andrew O. Arnold
Xiang Ren
KELM
CLL
83
116
0
16 Oct 2021
Recursively Summarizing Books with Human Feedback
Jeff Wu
Long Ouyang
Daniel M. Ziegler
Nissan Stiennon
Ryan J. Lowe
Jan Leike
Paul Christiano
ALM
139
302
0
22 Sep 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
114
721
0
04 Sep 2021
Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
Seunghyun Lee
Younggyo Seo
Kimin Lee
Pieter Abbeel
Jinwoo Shin
OffRL
OnRL
57
188
0
01 Jul 2021
Co
2
^2
2
L: Contrastive Continual Learning
Hyuntak Cha
Jaeho Lee
Jinwoo Shin
CLL
SSL
VLM
97
301
0
28 Jun 2021
Continual Learning for Text Classification with Information Disentanglement Based Regularization
Yufan Huang
Yanzhe Zhang
Jiaao Chen
Xuezhi Wang
Diyi Yang
CLL
59
111
0
12 Apr 2021
New Insights on Reducing Abrupt Representation Change in Online Continual Learning
Lucas Caccia
Rahaf Aljundi
Nader Asadi
Tinne Tuytelaars
Joelle Pineau
Eugene Belilovsky
CLL
46
200
0
11 Apr 2021
SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha
Sanghyuk Chun
Kyungjae Lee
Han-Cheol Cho
Seunghyun Park
Yunsung Lee
Sungrae Park
MoMe
275
451
0
17 Feb 2021
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu
Yuanzhi Li
FedML
114
370
0
17 Dec 2020
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
Logan Engstrom
Andrew Ilyas
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
L. Rudolph
Aleksander Madry
AAML
51
226
0
25 May 2020
Dark Experience for General Continual Learning: a Strong, Simple Baseline
Pietro Buzzega
Matteo Boschini
Angelo Porrello
Davide Abati
Simone Calderara
BDL
CLL
72
909
0
15 Apr 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
452
1,717
0
18 Sep 2019
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Leshem Choshen
Lior Fox
Zohar Aizenbud
Omri Abend
77
107
0
03 Jul 2019
Efficient Lifelong Learning with A-GEM
Arslan Chaudhry
MarcÁurelio Ranzato
Marcus Rohrbach
Mohamed Elhoseiny
CLL
188
1,449
0
02 Dec 2018
Experience Replay for Continual Learning
David Rolnick
Arun Ahuja
Jonathan Richard Schwarz
Timothy Lillicrap
Greg Wayne
CLL
112
1,156
0
28 Nov 2018
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM
ELM
247
2,837
0
11 Jun 2018
Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
H. Ritter
Aleksandar Botev
David Barber
BDL
CLL
81
329
0
20 May 2018
Progress & Compress: A scalable framework for continual learning
Jonathan Richard Schwarz
Jelena Luketina
Wojciech M. Czarnecki
A. Grabska-Barwinska
Yee Whye Teh
Razvan Pascanu
R. Hadsell
CLL
108
886
0
16 May 2018
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
Xuhong Li
Yves Grandvalet
Franck Davoine
SSL
74
354
0
05 Feb 2018
Memory Aware Synapses: Learning what (not) to forget
Rahaf Aljundi
F. Babiloni
Mohamed Elhoseiny
Marcus Rohrbach
Tinne Tuytelaars
KELM
CLL
83
1,628
0
27 Nov 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
444
18,931
0
20 Jul 2017
Overcoming catastrophic forgetting in neural networks
J. Kirkpatrick
Razvan Pascanu
Neil C. Rabinowitz
J. Veness
Guillaume Desjardins
...
A. Grabska-Barwinska
Demis Hassabis
Claudia Clopath
D. Kumaran
R. Hadsell
CLL
317
7,478
0
02 Dec 2016
iCaRL: Incremental Classifier and Representation Learning
Sylvestre-Alvise Rebuffi
Alexander Kolesnikov
G. Sperl
Christoph H. Lampert
CLL
OOD
129
3,741
0
23 Nov 2016
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler
Rob Fergus
FAtt
SSL
486
15,861
0
12 Nov 2013
1