ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.08699
  4. Cited By
Mitigating Catastrophic Forgetting in Language Transfer via Model
  Merging
v1v2 (latest)

Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

11 July 2024
Anton Alexandrov
Veselin Raychev
Mark Niklas Muller
Ce Zhang
Martin Vechev
Kristina Toutanova
    MoMeCLLKELM
ArXiv (abs)PDFHTML

Papers citing "Mitigating Catastrophic Forgetting in Language Transfer via Model Merging"

50 / 55 papers shown
Title
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMeKELMCLL
83
0
0
22 May 2025
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects
Jipeng Zhang
Haolin Yang
Kehao Miao
Ruiyuan Zhang
Renjie Pi
Jiahui Gao
Xiaofang Zhou
173
0
0
22 May 2025
SEA-LION: Southeast Asian Languages in One Network
SEA-LION: Southeast Asian Languages in One Network
Raymond Ng
Thanh Ngan Nguyen
Yuli Huang
Ngee Chia Tai
Wai Yi Leong
...
David Ong Tat-Wee
B. Liu
William-Chandra Tjhi
Min Zhang
Leslie Teo
116
14
0
08 Apr 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
186
1
0
28 Mar 2025
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Han Wu
Yuxuan Yao
Shuqi Liu
Zehua Liu
Xiaojin Fu
Xiongwei Han
Xianrui Li
Hui-Ling Zhen
Tao Zhong
Mingxuan Yuan
MoMeLRM
127
14
0
26 Mar 2025
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
Shuqi Liu
Han Wu
Bowei He
Xiongwei Han
Mingxuan Yuan
Linqi Song
MoMe
114
3
0
20 Feb 2025
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation
Samin Mahdizadeh Sani
Pouya Sadeghi
Thuy-Trang Vu
Yadollah Yaghoobzadeh
Gholamreza Haffari
153
2
0
17 Dec 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMeVLM
102
5
0
23 Sep 2024
LoRA Learns Less and Forgets Less
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
114
141
0
15 May 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLLKELMLRM
124
87
0
25 Apr 2024
Model Stock: All we need is just a few fine-tuned models
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang
Sangdoo Yun
Dongyoon Han
OODDMoMe
77
45
0
28 Mar 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard
Shamane Siriwardhana
Malikeh Ehghaghi
Luke Meyers
Vladimir Karpukhin
Brian Benedict
Mark McQuade
Jacob Solawetz
MoMeKELM
153
101
0
20 Mar 2024
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Yiming Cui
Xin Yao
30
5
0
04 Mar 2024
Mitigating Catastrophic Forgetting in Large Language Models with
  Self-Synthesized Rehearsal
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
Jianheng Huang
Leyang Cui
Ante Wang
Chengyi Yang
Xinting Liao
Linfeng Song
Junfeng Yao
Jinsong Su
KELMCLL
71
46
0
02 Mar 2024
Instruction-tuned Language Models are Better Knowledge Learners
Instruction-tuned Language Models are Better Knowledge Learners
Zhengbao Jiang
Zhiqing Sun
Weijia Shi
Pedro Rodriguez
Chunting Zhou
Graham Neubig
Xi Lin
Wen-tau Yih
Srinivasan Iyer
KELM
75
41
0
20 Feb 2024
Fast Vocabulary Transfer for Language Model Compression
Fast Vocabulary Transfer for Language Model Compression
Leonidas Gee
Andrea Zugarini
Leonardo Rigutini
Paolo Torroni
59
32
0
15 Feb 2024
OLMo: Accelerating the Science of Language Models
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
189
412
0
01 Feb 2024
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
Mohammad-Javad Davari
Eugene Belilovsky
MoMe
80
70
0
11 Dec 2023
Language Models are Super Mario: Absorbing Abilities from Homologous
  Models as a Free Lunch
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Yu Bowen
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
109
335
0
06 Nov 2023
CITB: A Benchmark for Continual Instruction Tuning
CITB: A Benchmark for Continual Instruction Tuning
Zihan Zhang
Meng Fang
Ling-Hao Chen
Mohammad-Reza Namazi-Rad
ALMCLL
90
25
0
23 Oct 2023
At Which Training Stage Does Code Data Help LLMs Reasoning?
At Which Training Stage Does Code Data Help LLMs Reasoning?
Xiaogang Jia
Yue Liu
Yue Yu
Yuanliang Zhang
Yu Jiang
Changjian Wang
Shanshan Li
LRMSyDa
91
68
0
28 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
68
49
0
19 Sep 2023
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large
  Language Models in 167 Languages
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Thuat Nguyen
Chien Van Nguyen
Viet Dac Lai
Hieu Man
Nghia Trung Ngo
Franck Dernoncourt
Ryan Rossi
Thien Huu Nguyen
102
112
0
17 Sep 2023
Mitigating the Alignment Tax of RLHF
Mitigating the Alignment Tax of RLHF
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMeCLL
85
80
0
12 Sep 2023
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Ariel N. Lee
Cole J. Hunter
Nataniel Ruiz
ALMObjD
82
142
0
14 Aug 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
117
1,326
0
17 Jul 2023
TIES-Merging: Resolving Interference When Merging Models
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
120
317
0
02 Jun 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMeMoE
136
203
0
17 May 2023
ZipIt! Merging Models from Different Tasks without Training
ZipIt! Merging Models from Different Tasks without Training
George Stoica
Daniel Bolya
J. Bjorner
Pratik Ramesh
Taylor N. Hearn
Judy Hoffman
VLMMoMe
108
125
0
04 May 2023
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALMPILM
1.5K
13,472
0
27 Feb 2023
The Flan Collection: Designing Data and Methods for Effective
  Instruction Tuning
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
114
677
0
31 Jan 2023
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya
Elad Venezian
Colin Raffel
Noam Slonim
Yoav Katz
Leshem Choshen
MoMe
82
55
0
02 Dec 2022
lo-fi: distributed fine-tuning without communication
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Michael G. Rabbat
Ari S. Morcos
77
24
0
19 Oct 2022
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language
  Models
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Margaret Li
Suchin Gururangan
Tim Dettmers
M. Lewis
Tim Althoff
Noah A. Smith
Luke Zettlemoyer
MoMe
95
153
0
05 Aug 2022
Fine-tuned Language Models are Continual Learners
Fine-tuned Language Models are Continual Learners
Thomas Scialom
Tuhin Chakrabarty
Smaranda Muresan
CLLLRM
189
122
0
24 May 2022
Fusing finetuned models for better pretraining
Fusing finetuned models for better pretraining
Leshem Choshen
Elad Venezian
Noam Slonim
Yoav Katz
FedMLAI4CEMoMe
119
96
0
06 Apr 2022
Merging Models with Fisher-Weighted Averaging
Merging Models with Fisher-Weighted Averaging
Michael Matena
Colin Raffel
FedMLMoMe
96
402
0
18 Nov 2021
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
350
4,596
0
27 Oct 2021
EXAMS: A Multi-Subject High School Examinations Dataset for
  Cross-Lingual and Multilingual Question Answering
EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering
Momchil Hardalov
Todor Mihaylov
Dimitrina Zlatkova
Yoan Dinkov
Ivan Koychev
Preslav Nakov
AI4EdELM
156
54
0
05 Nov 2020
mT5: A massively multilingual pre-trained text-to-text transformer
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
148
2,560
0
22 Oct 2020
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELMRALM
184
4,572
0
07 Sep 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
547
42,639
0
03 Dec 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
ALMAI4CE
82
919
0
04 Oct 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained
  Language Models
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
275
209
0
25 Sep 2019
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase
  Identification
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
Yinfei Yang
Y. Zhang
Chris Tar
Jason Baldridge
AAML
75
368
0
30 Aug 2019
MathQA: Towards Interpretable Math Word Problem Solving with
  Operation-Based Formalisms
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Aida Amini
Saadia Gabriel
Shanchuan Lin
Rik Koncel-Kedziorski
Yejin Choi
Hannaneh Hajishirzi
AIMatReLMAI4CE
127
581
0
30 May 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
182
2,532
0
19 May 2019
Experience Replay for Continual Learning
Experience Replay for Continual Learning
David Rolnick
Arun Ahuja
Jonathan Richard Schwarz
Timothy Lillicrap
Greg Wayne
CLL
116
1,171
0
28 Nov 2018
XNLI: Evaluating Cross-lingual Sentence Representations
XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau
Guillaume Lample
Ruty Rinott
Adina Williams
Samuel R. Bowman
Holger Schwenk
Veselin Stoyanov
ELM
90
1,388
0
13 Sep 2018
SentencePiece: A simple and language independent subword tokenizer and
  detokenizer for Neural Text Processing
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
206
3,531
0
19 Aug 2018
12
Next