Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.08699
Cited By
v1
v2 (latest)
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
11 July 2024
Anton Alexandrov
Veselin Raychev
Mark Niklas Muller
Ce Zhang
Martin Vechev
Kristina Toutanova
MoMe
CLL
KELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Mitigating Catastrophic Forgetting in Language Transfer via Model Merging"
50 / 55 papers shown
Title
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMe
KELM
CLL
83
0
0
22 May 2025
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects
Jipeng Zhang
Haolin Yang
Kehao Miao
Ruiyuan Zhang
Renjie Pi
Jiahui Gao
Xiaofang Zhou
173
0
0
22 May 2025
SEA-LION: Southeast Asian Languages in One Network
Raymond Ng
Thanh Ngan Nguyen
Yuli Huang
Ngee Chia Tai
Wai Yi Leong
...
David Ong Tat-Wee
B. Liu
William-Chandra Tjhi
Min Zhang
Leslie Teo
116
14
0
08 Apr 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
186
1
0
28 Mar 2025
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Han Wu
Yuxuan Yao
Shuqi Liu
Zehua Liu
Xiaojin Fu
Xiongwei Han
Xianrui Li
Hui-Ling Zhen
Tao Zhong
Mingxuan Yuan
MoMe
LRM
127
14
0
26 Mar 2025
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
Shuqi Liu
Han Wu
Bowei He
Xiongwei Han
Mingxuan Yuan
Linqi Song
MoMe
114
3
0
20 Feb 2025
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation
Samin Mahdizadeh Sani
Pouya Sadeghi
Thuy-Trang Vu
Yadollah Yaghoobzadeh
Gholamreza Haffari
153
2
0
17 Dec 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMe
VLM
102
5
0
23 Sep 2024
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
114
141
0
15 May 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
124
87
0
25 Apr 2024
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang
Sangdoo Yun
Dongyoon Han
OODD
MoMe
77
45
0
28 Mar 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard
Shamane Siriwardhana
Malikeh Ehghaghi
Luke Meyers
Vladimir Karpukhin
Brian Benedict
Mark McQuade
Jacob Solawetz
MoMe
KELM
153
101
0
20 Mar 2024
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Yiming Cui
Xin Yao
30
5
0
04 Mar 2024
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
Jianheng Huang
Leyang Cui
Ante Wang
Chengyi Yang
Xinting Liao
Linfeng Song
Junfeng Yao
Jinsong Su
KELM
CLL
71
46
0
02 Mar 2024
Instruction-tuned Language Models are Better Knowledge Learners
Zhengbao Jiang
Zhiqing Sun
Weijia Shi
Pedro Rodriguez
Chunting Zhou
Graham Neubig
Xi Lin
Wen-tau Yih
Srinivasan Iyer
KELM
75
41
0
20 Feb 2024
Fast Vocabulary Transfer for Language Model Compression
Leonidas Gee
Andrea Zugarini
Leonardo Rigutini
Paolo Torroni
59
32
0
15 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
189
412
0
01 Feb 2024
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
Mohammad-Javad Davari
Eugene Belilovsky
MoMe
80
70
0
11 Dec 2023
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Yu Bowen
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
109
335
0
06 Nov 2023
CITB: A Benchmark for Continual Instruction Tuning
Zihan Zhang
Meng Fang
Ling-Hao Chen
Mohammad-Reza Namazi-Rad
ALM
CLL
90
25
0
23 Oct 2023
At Which Training Stage Does Code Data Help LLMs Reasoning?
Xiaogang Jia
Yue Liu
Yue Yu
Yuanliang Zhang
Yu Jiang
Changjian Wang
Shanshan Li
LRM
SyDa
91
68
0
28 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
68
49
0
19 Sep 2023
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Thuat Nguyen
Chien Van Nguyen
Viet Dac Lai
Hieu Man
Nghia Trung Ngo
Franck Dernoncourt
Ryan Rossi
Thien Huu Nguyen
102
112
0
17 Sep 2023
Mitigating the Alignment Tax of RLHF
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMe
CLL
85
80
0
12 Sep 2023
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Ariel N. Lee
Cole J. Hunter
Nataniel Ruiz
ALM
ObjD
82
142
0
14 Aug 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
117
1,326
0
17 Jul 2023
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
120
317
0
02 Jun 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
136
203
0
17 May 2023
ZipIt! Merging Models from Different Tasks without Training
George Stoica
Daniel Bolya
J. Bjorner
Pratik Ramesh
Taylor N. Hearn
Judy Hoffman
VLM
MoMe
108
125
0
04 May 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.5K
13,472
0
27 Feb 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
114
677
0
31 Jan 2023
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya
Elad Venezian
Colin Raffel
Noam Slonim
Yoav Katz
Leshem Choshen
MoMe
82
55
0
02 Dec 2022
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Michael G. Rabbat
Ari S. Morcos
77
24
0
19 Oct 2022
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Margaret Li
Suchin Gururangan
Tim Dettmers
M. Lewis
Tim Althoff
Noah A. Smith
Luke Zettlemoyer
MoMe
95
153
0
05 Aug 2022
Fine-tuned Language Models are Continual Learners
Thomas Scialom
Tuhin Chakrabarty
Smaranda Muresan
CLL
LRM
189
122
0
24 May 2022
Fusing finetuned models for better pretraining
Leshem Choshen
Elad Venezian
Noam Slonim
Yoav Katz
FedML
AI4CE
MoMe
119
96
0
06 Apr 2022
Merging Models with Fisher-Weighted Averaging
Michael Matena
Colin Raffel
FedML
MoMe
96
402
0
18 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
350
4,596
0
27 Oct 2021
EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering
Momchil Hardalov
Todor Mihaylov
Dimitrina Zlatkova
Yoan Dinkov
Ivan Koychev
Preslav Nakov
AI4Ed
ELM
156
54
0
05 Nov 2020
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
148
2,560
0
22 Oct 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
184
4,572
0
07 Sep 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
547
42,639
0
03 Dec 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
ALM
AI4CE
82
919
0
04 Oct 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
275
209
0
25 Sep 2019
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
Yinfei Yang
Y. Zhang
Chris Tar
Jason Baldridge
AAML
75
368
0
30 Aug 2019
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Aida Amini
Saadia Gabriel
Shanchuan Lin
Rik Koncel-Kedziorski
Yejin Choi
Hannaneh Hajishirzi
AIMat
ReLM
AI4CE
127
581
0
30 May 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
182
2,532
0
19 May 2019
Experience Replay for Continual Learning
David Rolnick
Arun Ahuja
Jonathan Richard Schwarz
Timothy Lillicrap
Greg Wayne
CLL
116
1,171
0
28 Nov 2018
XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau
Guillaume Lample
Ruty Rinott
Adina Williams
Samuel R. Bowman
Holger Schwenk
Veselin Stoyanov
ELM
90
1,388
0
13 Sep 2018
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
206
3,531
0
19 Aug 2018
1
2
Next