arXiv 2204.03044 — Cited By

Fusing finetuned models for better pretraining
Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz
6 April 2022 · Tags: FedML, AI4CE, MoMe
Papers citing "Fusing finetuned models for better pretraining" (30 / 80 papers shown)
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li, Zhenyu (Allen) Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
02 Oct 2023 · Tags: MoMe

Deep Model Fusion: A Survey
Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
27 Sep 2023 · Tags: FedML, MoMe

Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia Classifiers with a Multilingual Understanding
Dean Ninalga
24 Sep 2023

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor, Corentin Dancette, Alexandre Ramé, Matthieu Cord
30 Jul 2023 · Tags: MoMe, MLLM

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study
Damith Premasiri, Tharindu Ranasinghe, R. Mitkov
18 Jul 2023 · Tags: VLM

Tangent Transformers for Composition, Privacy and Removal
Tian Yu Liu, Aditya Golatkar, Stefano Soatto
16 Jul 2023

Tangent Model Composition for Ensembling and Continual Fine-tuning
Tianlin Liu, Stefano Soatto
16 Jul 2023 · Tags: LRM, MoMe, CLL

Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Max Zimmer, Christoph Spiegel, Sebastian Pokutta
29 Jun 2023 · Tags: MoMe

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
A. Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang
18 Jun 2023 · Tags: VLM

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel
07 Jun 2023 · Tags: VLM

Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth, Haokun Liu, Colin Raffel
06 Jun 2023 · Tags: MoMe, MoE

TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Joey Tianyi Zhou
02 Jun 2023 · Tags: MoMe

Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez, Alessandro Favero, P. Frossard
22 May 2023 · Tags: MoMe

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg
17 May 2023

ZipIt! Merging Models from Different Tasks without Training
George Stoica, Daniel Bolya, J. Bjorner, Pratik Ramesh, Taylor N. Hearn, Judy Hoffman
04 May 2023 · Tags: VLM, MoMe

Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
Daniel Lawson, A. H. Qureshi
14 Mar 2023 · Tags: MoMe, OffRL

Towards Zero-Shot Functional Compositionality of Language Models
Hangyeol Yu, Myeongho Jeong, Jamin Shin, Hyeongdon Moon, Juneyoung Park, Seungtaek Choi
06 Mar 2023

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
Ruisi Cai, Zhenyu (Allen) Zhang, Zhangyang Wang
24 Feb 2023 · Tags: AAML, OOD

Knowledge is a Region in Weight Space for Fine-tuned Language Models
Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
09 Feb 2023

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz
20 Dec 2022 · Tags: MoMe, OODD

Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng
19 Dec 2022 · Tags: FedML, MoMe

Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
08 Dec 2022 · Tags: KELM, MoMe, MU

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
02 Dec 2022 · Tags: MoMe

Where to start? Analyzing the potential value of intermediate models
Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz
31 Oct 2022 · Tags: MoMe

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos
19 Oct 2022

Patching open-vocabulary models by interpolating weights
Gabriel Ilharco, Mitchell Wortsman, S. Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt
10 Aug 2022 · Tags: VLM, KELM

Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, A. Rakotomamonjy, Patrick Gallinari, Matthieu Cord
19 May 2022 · Tags: OOD

On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
Gal Patel, Leshem Choshen, Omri Abend
06 Oct 2021

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse
22 Apr 2021 · Tags: MoMe

e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom
04 Dec 2018 · Tags: LRM