Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.05482
Cited By
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
10 March 2022
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
Ari S. Morcos
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time"
50 / 94 papers shown
Title
Understanding Mode Connectivity via Parameter Space Symmetry
B. Zhao
Nima Dehmamy
Robin Walters
Rose Yu
111
7
0
29 May 2025
Text Generation Beyond Discrete Token Sampling
Yufan Zhuang
Liyuan Liu
Chandan Singh
Jingbo Shang
Jianfeng Gao
OOD
95
1
0
20 May 2025
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Alessandro Sordoni
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
LM&MA
74
0
0
15 May 2025
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
Lingxiao Kong
Cong Yang
Susanne Neufang
Oya Beyan
Zeyd Boukhers
OffRL
53
0
0
05 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
55
0
0
05 May 2025
Embedding based retrieval for long tail search queries in ecommerce
Akshay Kekuda
Yuyang Zhang
Arun Udayashankar
RALM
124
0
0
03 May 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
129
5
0
15 Apr 2025
A Model Zoo of Vision Transformers
Damian Falk
Léo Meynent
Florence Pfammatter
Konstantin Schurholt
Damian Borth
131
0
0
14 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
85
0
0
09 Apr 2025
FedMerge: Federated Personalization via Model Merging
Shutong Chen
Tianyi Zhou
Guodong Long
Jing Jiang
Chengqi Zhang
FedML
MoMe
82
0
0
09 Apr 2025
BECAME: BayEsian Continual Learning with Adaptive Model MErging
Mei Li
Yuxiang Lu
Qinyan Dai
Suizhi Huang
Yue Ding
Hongtao Lu
CLL
MoMe
92
0
0
03 Apr 2025
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Han Wu
Yuxuan Yao
Shuqi Liu
Zehua Liu
Xiaojin Fu
Xiongwei Han
Xianrui Li
Hui-Ling Zhen
Tao Zhong
Mingxuan Yuan
MoMe
LRM
94
10
0
26 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
118
5
0
08 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
98
3
0
07 Mar 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
99
2
0
25 Feb 2025
PICASO: Permutation-Invariant Context Composition with State Space Models
Tian Yu Liu
Alessandro Achille
Matthew Trager
Aditya Golatkar
Luca Zancato
Stefano Soatto
LRM
89
0
0
24 Feb 2025
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Jesus Rios
Pierre Dognin
Ronny Luss
Karthikeyan N. Ramamurthy
118
1
0
21 Feb 2025
Robust Concept Erasure Using Task Vectors
Minh Pham
Kelly O. Marshall
Chinmay Hegde
Niv Cohen
141
20
0
21 Feb 2025
Scalable Model Merging with Progressive Layer-wise Distillation
Jing Xu
Jiazheng Li
J.N. Zhang
MoMe
FedML
207
2
0
18 Feb 2025
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
79
0
0
18 Feb 2025
Linear Mode Connectivity in Differentiable Tree Ensembles
Ryuichi Kanoh
M. Sugiyama
136
1
0
17 Feb 2025
SuperMerge: An Approach For Gradient-Based Model Merging
Haoyu Yang
Zheng Zhang
Saket Sathe
MoMe
161
0
0
17 Feb 2025
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang
Haris Šikić
Lothar Thiele
O. Saukh
79
1
0
17 Feb 2025
Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy
Zhenyuan Guo
Yi Shi
Wenlong Meng
Chen Gong
Chengkun Wei
Wenzhi Chen
MoMe
89
0
0
17 Feb 2025
Superpose Singular Features for Model Merging
Haiquan Qiu
You Wu
Quanming Yao
MoMe
109
0
0
15 Feb 2025
1bit-Merging: Dynamic Quantized Merging for Large Language Models
Shuqi Liu
Yuxuan Yao
Bowei He
Zehua Liu
Xiongwei Han
Mingxuan Yuan
Han Wu
Linqi Song
MoMe
MQ
105
2
0
15 Feb 2025
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Ziyi Wang
Muneeza Azmart
Ang Li
R. Horesh
Mikhail Yurochkin
142
1
0
11 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
170
0
0
10 Feb 2025
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
Atsushi Nitanda
Anzelle Lee
Damian Tan Xing Kai
Mizuki Sakaguchi
Taiji Suzuki
AI4CE
87
1
0
09 Feb 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou
Giannis Karamanolakis
Victor Soto
Anna Rumshisky
Mayank Kulkarni
Furong Huang
Wei Ai
Jianhua Lu
MoMe
127
2
0
03 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
108
0
0
01 Feb 2025
Learning Priors of Human Motion With Vision Transformers
Placido Falqueto
Alberto Sanfeliu
Luigi Palopoli
Daniele Fontanelli
ViT
187
0
0
30 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
206
115
0
28 Jan 2025
Multi-Task Model Merging via Adaptive Weight Disentanglement
Feng Xiong
Runxi Cheng
Wang Chen
Zhanqiu Zhang
Yiwen Guo
Chun Yuan
Ruifeng Xu
MoMe
158
6
0
10 Jan 2025
Training-free Heterogeneous Model Merging
Zhengqi Xu
Han Zheng
Jie Song
Li Sun
Mingli Song
MoMe
171
1
0
03 Jan 2025
Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo
Donato Crisostomi
Maria Sofia Bucarelli
Simone Scardapane
Fabrizio Silvestri
Emanuele Rodolà
MoMe
119
14
0
26 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
413
3
0
20 Nov 2024
ATM: Improving Model Merging by Alternating Tuning and Merging
Luca Zhou
Daniele Solombrino
Donato Crisostomi
Maria Sofia Bucarelli
Fabrizio Silvestri
Emanuele Rodolà
MoMe
80
5
0
05 Nov 2024
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang
Anke Tang
Didi Zhu
Zhengyu Chen
Li Shen
Leilei Gan
MoMe
AAML
110
4
0
17 Oct 2024
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Michael J.Q. Zhang
W. Bradley Knox
Eunsol Choi
58
5
0
17 Oct 2024
Agent Skill Acquisition for Large Language Models via CycleQD
So Kuroki
Taishi Nakamura
Takuya Akiba
Yujin Tang
MoMe
62
2
0
16 Oct 2024
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Jiamu Zheng
Jinghuai Zhang
Tianyu Du
Xuhong Zhang
Jianwei Yin
Tao Lin
KELM
88
0
0
12 Oct 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng
Yize Zhao
V. Vakilian
Minghui Chen
Xiaoxiao Li
Christos Thrampoulidis
121
6
0
12 Oct 2024
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar
Benjamin Muller
Pritish Yuvraj
Rui Hou
Nayan Singhal
Hongjiang Lv
Bing-Quan Liu
KELM
LRM
MoMe
70
4
0
02 Oct 2024
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Da-Wei Zhou
Zi-Wen Cai
Han-Jia Ye
Lijun Zhang
De-Chuan Zhan
CLL
AI4CE
137
2
0
01 Oct 2024
Towards understanding evolution of science through language model series
Junjie Dong
Zhuoqi Lyu
Qing Ke
AI4TS
102
0
0
15 Sep 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
Ruoyu Sun
CLL
125
3
0
30 Jul 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhan Qin
Han Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
50
2
0
24 Jun 2024
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Zhuoxiao Chen
Junjie Meng
Mahsa Baktashmotlagh
Yonggang Zhang
Zi Huang
Yadan Luo
125
1
0
21 Jun 2024
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Rickard Brüel-Gabrielsson
Jiacheng Zhu
Onkar Bhardwaj
Leshem Choshen
Kristjan Greenewald
Mikhail Yurochkin
Justin Solomon
96
8
0
17 Jun 2024
1
2
Next