Scalable Model Merging with Progressive Layer-wise Distillation

18 February 2025
Jing Xu
Jiazheng Li
J.N. Zhang
    MoMe
    FedML

Papers citing "Scalable Model Merging with Progressive Layer-wise Distillation"

39 / 89 papers shown
Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin
Xiang Ren
Daniel Preoţiuc-Pietro
Pengxiang Cheng
FedML
MoMe
47
231
0
19 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
154
474
0
08 Dec 2022
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
277
326
0
11 Sep 2022
Factorizing Knowledge in Neural Networks
Xingyi Yang
Jingwen Ye
Xinchao Wang
MoMe
65
121
0
04 Jul 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
112
953
1
10 Mar 2022
Merging Models with Fisher-Weighted Averaging
Michael Matena
Colin Raffel
FedML
MoMe
79
379
0
18 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
214
4,175
0
27 Oct 2021
Multi-Task Self-Training for Learning General Representations
Golnaz Ghiasi
Barret Zoph
E. D. Cubuk
Quoc V. Le
Tsung-Yi Lin
SSL
51
100
0
25 Aug 2021
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
140
1,893
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
180
5,328
0
07 Jul 2021
GAN Cocktail: mixing GANs without dataset access
Omri Avrahami
Dani Lischinski
Ohad Fried
DiffM
MoMe
40
10
0
07 Jun 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
339
4,873
0
24 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
462
40,217
0
22 Oct 2020
Linear Mode Connectivity in Multitask and Continual Learning
Seyed Iman Mirzadeh
Mehrdad Farajtabar
Dilan Görür
Razvan Pascanu
H. Ghasemzadeh
CLL
59
140
0
09 Oct 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
155
4,222
0
07 Sep 2020
Knowledge Distillation for Multi-task Learning
Weihong Li
Hakan Bilen
MoMe
34
72
0
14 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
546
41,106
0
28 May 2020
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle
Gintare Karolina Dziugaite
Daniel M. Roy
Michael Carbin
MoMe
138
613
0
11 Dec 2019
Model Fusion via Optimal Transport
Sidak Pal Singh
Martin Jaggi
MoMe
FedML
89
231
0
12 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
469
24,160
0
26 Jul 2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Kaifeng Lyu
Jian Li
73
332
0
13 Jun 2019
Multi-Task Learning as Multi-Objective Optimization
Ozan Sener
V. Koltun
128
1,266
0
10 Oct 2018
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
194
1,390
0
31 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
761
7,080
0
20 Apr 2018
End-to-End Multi-Task Learning with Attention
Shikun Liu
Edward Johns
Andrew J. Davison
CVBM
48
1,036
0
28 Mar 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
104
1,643
0
14 Mar 2018
Convergence of Gradient Descent on Separable Data
Mor Shpigel Nacson
Jason D. Lee
Suriya Gunasekar
Pedro H. P. Savarese
Nathan Srebro
Daniel Soudry
62
167
0
05 Mar 2018
Model-Ensemble Trust-Region Policy Optimization
Thanard Kurutach
I. Clavera
Yan Duan
Aviv Tamar
Pieter Abbeel
60
450
0
28 Feb 2018
EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
P. Helber
B. Bischke
Andreas Dengel
Damian Borth
109
1,790
0
31 Aug 2017
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
I. Lopez-Gazpio
Lucia Specia
270
1,870
0
31 Jul 2017
Forward Thinking: Building and Training Neural Networks One Layer at a Time
Chris Hettinger
Tanner Christensen
Ben Ehlert
J. Humpherys
Tyler J. Jarvis
Sean Wade
AI4CE
38
45
0
08 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
456
4,444
0
18 Apr 2017
Layer-wise training of deep networks using kernel similarity
Mandar M. Kulkarni
Shirish S. Karande
46
35
0
21 Mar 2017
Remote Sensing Image Scene Classification: Benchmark and State of the Art
Gong Cheng
Junwei Han
Xiaoqiang Lu
81
2,237
0
01 Mar 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
184
8,067
0
16 Jun 2016
Cross-stitch Networks for Multi-task Learning
Ishan Misra
Abhinav Shrivastava
Abhinav Gupta
M. Hebert
81
1,339
0
12 Apr 2016
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
290
19,523
0
09 Mar 2015
FitNets: Hints for Thin Deep Nets
Adriana Romero
Nicolas Ballas
Samira Ebrahimi Kahou
Antoine Chassang
C. Gatta
Yoshua Bengio
FedML
258
3,862
0
19 Dec 2014
Describing Textures in the Wild
Mircea Cimpoi
Subhransu Maji
Iasonas Kokkinos
S. Mohamed
Andrea Vedaldi
3DV
85
2,632
0
14 Nov 2013