ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.05482
  4. Cited By
Model soups: averaging weights of multiple fine-tuned models improves
  accuracy without increasing inference time

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

10 March 2022
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
Ari S. Morcos
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
    MoMe
ArXivPDFHTML

Papers citing "Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time"

50 / 94 papers shown
Title
Understanding Mode Connectivity via Parameter Space Symmetry
Understanding Mode Connectivity via Parameter Space Symmetry
B. Zhao
Nima Dehmamy
Robin Walters
Rose Yu
111
7
0
29 May 2025
Text Generation Beyond Discrete Token Sampling
Text Generation Beyond Discrete Token Sampling
Yufan Zhuang
Liyuan Liu
Chandan Singh
Jingbo Shang
Jianfeng Gao
OOD
95
1
0
20 May 2025
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Alessandro Sordoni
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
LM&MA
74
0
0
15 May 2025
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
Lingxiao Kong
Cong Yang
Susanne Neufang
Oya Beyan
Zeyd Boukhers
OffRL
53
0
0
05 May 2025
Bielik 11B v2 Technical Report
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
55
0
0
05 May 2025
Embedding based retrieval for long tail search queries in ecommerce
Embedding based retrieval for long tail search queries in ecommerce
Akshay Kekuda
Yuyang Zhang
Arun Udayashankar
RALM
124
0
0
03 May 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
129
5
0
15 Apr 2025
A Model Zoo of Vision Transformers
A Model Zoo of Vision Transformers
Damian Falk
Léo Meynent
Florence Pfammatter
Konstantin Schurholt
Damian Borth
131
0
0
14 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
85
0
0
09 Apr 2025
FedMerge: Federated Personalization via Model Merging
FedMerge: Federated Personalization via Model Merging
Shutong Chen
Tianyi Zhou
Guodong Long
Jing Jiang
Chengqi Zhang
FedML
MoMe
82
0
0
09 Apr 2025
BECAME: BayEsian Continual Learning with Adaptive Model MErging
BECAME: BayEsian Continual Learning with Adaptive Model MErging
Mei Li
Yuxiang Lu
Qinyan Dai
Suizhi Huang
Yue Ding
Hongtao Lu
CLL
MoMe
92
0
0
03 Apr 2025
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Han Wu
Yuxuan Yao
Shuqi Liu
Zehua Liu
Xiaojin Fu
Xiongwei Han
Xianrui Li
Hui-Ling Zhen
Tao Zhong
Mingxuan Yuan
MoMe
LRM
94
10
0
26 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
118
5
0
08 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
98
3
0
07 Mar 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
99
2
0
25 Feb 2025
PICASO: Permutation-Invariant Context Composition with State Space Models
PICASO: Permutation-Invariant Context Composition with State Space Models
Tian Yu Liu
Alessandro Achille
Matthew Trager
Aditya Golatkar
Luca Zancato
Stefano Soatto
LRM
89
0
0
24 Feb 2025
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Jesus Rios
Pierre Dognin
Ronny Luss
Karthikeyan N. Ramamurthy
118
1
0
21 Feb 2025
Robust Concept Erasure Using Task Vectors
Robust Concept Erasure Using Task Vectors
Minh Pham
Kelly O. Marshall
Chinmay Hegde
Niv Cohen
141
20
0
21 Feb 2025
Scalable Model Merging with Progressive Layer-wise Distillation
Scalable Model Merging with Progressive Layer-wise Distillation
Jing Xu
Jiazheng Li
J.N. Zhang
MoMe
FedML
207
2
0
18 Feb 2025
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
79
0
0
18 Feb 2025
Linear Mode Connectivity in Differentiable Tree Ensembles
Linear Mode Connectivity in Differentiable Tree Ensembles
Ryuichi Kanoh
M. Sugiyama
136
1
0
17 Feb 2025
SuperMerge: An Approach For Gradient-Based Model Merging
SuperMerge: An Approach For Gradient-Based Model Merging
Haoyu Yang
Zheng Zhang
Saket Sathe
MoMe
161
0
0
17 Feb 2025
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang
Haris Šikić
Lothar Thiele
O. Saukh
79
1
0
17 Feb 2025
Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy
Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy
Zhenyuan Guo
Yi Shi
Wenlong Meng
Chen Gong
Chengkun Wei
Wenzhi Chen
MoMe
89
0
0
17 Feb 2025
Superpose Singular Features for Model Merging
Superpose Singular Features for Model Merging
Haiquan Qiu
You Wu
Quanming Yao
MoMe
109
0
0
15 Feb 2025
1bit-Merging: Dynamic Quantized Merging for Large Language Models
1bit-Merging: Dynamic Quantized Merging for Large Language Models
Shuqi Liu
Yuxuan Yao
Bowei He
Zehua Liu
Xiongwei Han
Mingxuan Yuan
Han Wu
Linqi Song
MoMe
MQ
105
2
0
15 Feb 2025
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Ziyi Wang
Muneeza Azmart
Ang Li
R. Horesh
Mikhail Yurochkin
142
1
0
11 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
170
0
0
10 Feb 2025
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
Atsushi Nitanda
Anzelle Lee
Damian Tan Xing Kai
Mizuki Sakaguchi
Taiji Suzuki
AI4CE
87
1
0
09 Feb 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou
Giannis Karamanolakis
Victor Soto
Anna Rumshisky
Mayank Kulkarni
Furong Huang
Wei Ai
Jianhua Lu
MoMe
127
2
0
03 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
108
0
0
01 Feb 2025
Learning Priors of Human Motion With Vision Transformers
Learning Priors of Human Motion With Vision Transformers
Placido Falqueto
Alberto Sanfeliu
Luigi Palopoli
Daniele Fontanelli
ViT
187
0
0
30 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
206
115
0
28 Jan 2025
Multi-Task Model Merging via Adaptive Weight Disentanglement
Multi-Task Model Merging via Adaptive Weight Disentanglement
Feng Xiong
Runxi Cheng
Wang Chen
Zhanqiu Zhang
Yiwen Guo
Chun Yuan
Ruifeng Xu
MoMe
158
6
0
10 Jan 2025
Training-free Heterogeneous Model Merging
Zhengqi Xu
Han Zheng
Jie Song
Li Sun
Mingli Song
MoMe
171
1
0
03 Jan 2025
Task Singular Vectors: Reducing Task Interference in Model Merging
Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo
Donato Crisostomi
Maria Sofia Bucarelli
Simone Scardapane
Fabrizio Silvestri
Emanuele Rodolà
MoMe
119
14
0
26 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
413
3
0
20 Nov 2024
ATM: Improving Model Merging by Alternating Tuning and Merging
ATM: Improving Model Merging by Alternating Tuning and Merging
Luca Zhou
Daniele Solombrino
Donato Crisostomi
Maria Sofia Bucarelli
Fabrizio Silvestri
Emanuele Rodolà
MoMe
80
5
0
05 Nov 2024
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang
Anke Tang
Didi Zhu
Zhengyu Chen
Li Shen
Leilei Gan
MoMe
AAML
110
4
0
17 Oct 2024
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Michael J.Q. Zhang
W. Bradley Knox
Eunsol Choi
58
5
0
17 Oct 2024
Agent Skill Acquisition for Large Language Models via CycleQD
Agent Skill Acquisition for Large Language Models via CycleQD
So Kuroki
Taishi Nakamura
Takuya Akiba
Yujin Tang
MoMe
62
2
0
16 Oct 2024
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Jiamu Zheng
Jinghuai Zhang
Tianyu Du
Xuhong Zhang
Jianwei Yin
Tao Lin
KELM
88
0
0
12 Oct 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng
Yize Zhao
V. Vakilian
Minghui Chen
Xiaoxiao Li
Christos Thrampoulidis
121
6
0
12 Oct 2024
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar
Benjamin Muller
Pritish Yuvraj
Rui Hou
Nayan Singhal
Hongjiang Lv
Bing-Quan Liu
KELM
LRM
MoMe
70
4
0
02 Oct 2024
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Da-Wei Zhou
Zi-Wen Cai
Han-Jia Ye
Lijun Zhang
De-Chuan Zhan
CLL
AI4CE
137
2
0
01 Oct 2024
Towards understanding evolution of science through language model series
Towards understanding evolution of science through language model series
Junjie Dong
Zhuoqi Lyu
Qing Ke
AI4TS
102
0
0
15 Sep 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
Ruoyu Sun
CLL
125
3
0
30 Jul 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhan Qin
Han Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
50
2
0
24 Jun 2024
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Zhuoxiao Chen
Junjie Meng
Mahsa Baktashmotlagh
Yonggang Zhang
Zi Huang
Yadan Luo
125
1
0
21 Jun 2024
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Rickard Brüel-Gabrielsson
Jiacheng Zhu
Onkar Bhardwaj
Leshem Choshen
Kristjan Greenewald
Mikhail Yurochkin
Justin Solomon
96
8
0
17 Jun 2024
12
Next