2410.03617
What Matters for Model Merging at Scale?
4 October 2024
Prateek Yadav
Tu Vu
Jonathan Lai
Alexandra Chronopoulou
Manaal Faruqui
Mohit Bansal
Tsendsuren Munkhdalai
MoMe
Papers citing
"What Matters for Model Merging at Scale?"
44 / 44 papers shown
Title
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Alessandro Sordoni
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
LM&MA
102
0
0
15 May 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
189
1
0
01 Feb 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
276
121
0
28 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
102
24
0
08 Jan 2025
FuseChat: Knowledge Fusion of Chat Models
Fanqi Wan
Longguang Zhong
Ziyi Yang
Ruijun Chen
Xiaojun Quan
ALM
KELM
MoMe
81
25
0
15 Aug 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
101
19
0
24 Jun 2024
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang
Sangdoo Yun
Dongyoon Han
OODD
MoMe
75
42
0
28 Mar 2024
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah
Nataniel Ruiz
Forrester Cole
Erika Lu
Svetlana Lazebnik
Yuanzhen Li
Varun Jampani
DiffM
107
111
0
22 Nov 2023
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Bowen Yu
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
104
320
0
06 Nov 2023
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
LRM
OSLM
203
455
0
18 Aug 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
70
152
0
07 Jun 2023
An Empirical Study of Multimodal Model Merging
Yi-Lin Sung
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Mohit Bansal
Lijuan Wang
MoMe
91
41
0
28 Apr 2023
Exploring the Benefits of Training Expert Language Models over Instruction Tuning
Joel Jang
Seungone Kim
Seonghyeon Ye
Doyoung Kim
Lajanugen Logeswaran
Moontae Lee
Kyungjae Lee
Minjoon Seo
LRM
ALM
93
79
0
07 Feb 2023
Re-basin via implicit Sinkhorn differentiation
F. Guerrero-Peña
H. R. Medeiros
Thomas Dubail
Masih Aminbeidokhti
Eric Granger
M. Pedersoli
MoMe
71
49
0
22 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
185
496
0
08 Dec 2022
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan
Hanie Sedghi
O. Saukh
R. Entezari
Behnam Neyshabur
MoMe
66
99
0
15 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
189
3,128
0
20 Oct 2022
Fusing finetuned models for better pretraining
Leshem Choshen
Elad Venezian
Noam Slonim
Yoav Katz
FedML
AI4CE
MoMe
112
94
0
06 Apr 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
342
1,702
0
15 Oct 2021
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
R. Entezari
Hanie Sedghi
O. Saukh
Behnam Neyshabur
MoMe
89
231
0
12 Oct 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
129
725
0
04 Sep 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
929
29,436
0
26 Feb 2021
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton
Wesley J. Maddox
Sanae Lotfi
A. Wilson
UQCV
89
69
0
25 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
654
41,103
0
22 Oct 2020
Optimizing Mode Connectivity via Neuron Alignment
N. Joseph Tatro
Pin-Yu Chen
Payel Das
Igor Melnyk
P. Sattigeri
Rongjie Lai
MoMe
272
82
0
05 Sep 2020
What is being transferred in transfer learning?
Behnam Neyshabur
Hanie Sedghi
Chiyuan Zhang
103
522
0
26 Aug 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
159
2,737
0
05 Jun 2020
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle
Gintare Karolina Dziugaite
Daniel M. Roy
Michael Carbin
MoMe
149
619
0
11 Dec 2019
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie
Adina Williams
Emily Dinan
Mohit Bansal
Jason Weston
Douwe Kiela
125
1,006
0
31 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
439
20,181
0
23 Oct 2019
Model Fusion via Optimal Transport
Sidak Pal Singh
Martin Jaggi
MoMe
FedML
106
237
0
12 Oct 2019
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
Lifu Huang
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
AIMat
RALM
LRM
112
454
0
31 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
659
24,464
0
26 Jul 2019
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension
Kai Sun
Dian Yu
Jianshu Chen
Dong Yu
Yejin Choi
Claire Cardie
RALM
AIMat
57
295
0
01 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
171
2,655
0
25 Sep 2018
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
Mohammad Taher Pilehvar
Jose Camacho-Collados
195
489
0
28 Aug 2018
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan
Shay B. Cohen
Mirella Lapata
AILaw
126
1,676
0
27 Aug 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,159
0
20 Apr 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
76
1,048
0
11 Apr 2018
Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler
K. Veschgini
M. Salmhofer
Fred Hamprecht
MoMe
111
434
0
02 Mar 2018
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
298
4,019
0
14 Apr 2017
Topology and Geometry of Half-Rectified Network Optimization
C. Freeman
Joan Bruna
199
235
0
04 Nov 2016
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan
Eider Moore
Daniel Ramage
S. Hampson
Blaise Agüera y Arcas
FedML
406
17,486
0
17 Feb 2016