Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML, MoMe
arXiv:1803.05407

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 1,040 papers shown
Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations
Dongkyu Cho
Rumi Chunara
28
0
0
01 Jul 2025
Subspace-Boosted Model Merging
Ronald Skorobogat
Karsten Roth
Mariana-Iuliana Georgescu
Zeynep Akata
MoMe
20
0
0
19 Jun 2025
Weight Factorization and Centralization for Continual Learning in Speech Recognition
Enes Yavuz Ugan
Ngoc-Quan Pham
Alexander Waibel
CLL, MoMe
23
0
0
19 Jun 2025
Sequential Policy Gradient for Adaptive Hyperparameter Optimization
Zheng Li
Jerry Q. Cheng
Huanying Gu
OffRL
20
0
0
18 Jun 2025
Detecting immune cells with label-free two-photon autofluorescence and deep learning
Lucas Kreiss
A. Chaware
Maryam Roohian
Sarah Lemire
Oana-Maria Thoma
Birgitta Carlé
Maximilian Waldner
Sebastian Schürmann
O. Friedrich
R. Horstmeyer
15
0
0
17 Jun 2025
Symmetry in Neural Network Parameter Spaces
Bo Zhao
Robin Walters
Rose Yu
27
0
0
16 Jun 2025
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
Devin Kwok
Gül Sena Altıntaş
Colin Raffel
David Rolnick
19
0
0
16 Jun 2025
An Effective End-to-End Solution for Multimodal Action Recognition
Songping Wang
Xiantao Hu
Yueming Lyu
Caifeng Shan
67
0
0
11 Jun 2025
Data-Efficient Challenges in Visual Inductive Priors: A Retrospective
Robert-Jan Bruintjes
A. Lengyel
O. Kayhan
Davide Zambrano
Nergis Tomen
Hadi Jamali Rad
Jan van Gemert
VLM
31
0
0
10 Jun 2025
Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models
Ngoc-Quan Pham
Tuan Truong
Quyen Tran
T. H. Nguyen
Dinh Q. Phung
T. Le
36
1
0
08 Jun 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
35
0
0
08 Jun 2025
SAFE: Finding Sparse and Flat Minima to Improve Pruning
Dongyeop Lee
Kwanhee Lee
Jinseok Chung
Namhoon Lee
37
0
0
07 Jun 2025
Towards Better Generalization via Distributional Input Projection Network
Yifan Hao
Yanxin Lu
Xinwei Shen
Tong Zhang
95
0
0
05 Jun 2025
StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation
Ranjith Merugu
Bryan Bo Cao
Shubham Jain
FedML, MoMe
119
0
0
05 Jun 2025
Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation
Ahmed Elhady
Eneko Agirre
Mikel Artetxe
CLL, KELM, ELM
37
0
0
30 May 2025
Understanding Mode Connectivity via Parameter Space Symmetry
B. Zhao
Nima Dehmamy
Robin Walters
Rose Yu
236
8
0
29 May 2025
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
C. Tan
Yubo Zhou
Haishan Ye
Guang Dai
Junmin Liu
Zengjie Song
Jiangshe Zhang
Zixiang Zhao
Yunda Hao
Yong Xu
AAML
34
0
0
29 May 2025
Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Yuatyong Chaichana
Thanapat Trachu
Peerat Limkonchotiwat
Konpat Preechakul
Tirasan Khandhawit
Ekapol Chuangsuwanich
MoMe
76
0
0
29 May 2025
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Akash Dhasade
Divyansh Jhunjhunwala
Milos Vujasinovic
Gauri Joshi
Anne-Marie Kermarrec
MoMe
66
0
0
29 May 2025
Update Your Transformer to the Latest Release: Re-Basin of Task Vectors
Filippo Rinaldi
Giacomo Capitani
Lorenzo Bonicelli
Donato Crisostomi
Federico Bolelli
E. Ficarra
Emanuele Rodolà
Simone Calderara
Angelo Porrello
22
0
0
28 May 2025
PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning
Arnulf Jentzen
Julian Kranz
Adrian Riekert
ODL
65
0
0
28 May 2025
Understanding Adversarial Training with Energy-based Models
Mujtaba Hussain Mirza
Maria Rosaria Briglia
Filippo Bartolucci
Senad Beadini
G. Lisanti
I. Masi
AAML
52
0
0
28 May 2025
FCOS: A Two-Stage Recoverable Model Pruning Framework for Automatic Modulation Recognition
Yao Lu
Tengfei Ma
Zeyu Wang
Z. Chen
Dongwei Xu
Yun Lin
Qi Xuan
Guan Gui
45
1
0
27 May 2025
One-Time Soft Alignment Enables Resilient Learning without Weight Transport
Jeonghwan Cheon
Jaehyuk Bae
Se-Bum Paik
ODL
46
1
0
27 May 2025
Variational Deep Learning via Implicit Regularization
Jonathan Wenger
Beau Coker
Juraj Marusic
John P. Cunningham
OOD, UQCV, BDL
51
0
0
26 May 2025
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
Alexandre Ducorroy
Rachid Riad
MoMe
31
0
0
26 May 2025
Revisiting Feature Interactions from the Perspective of Quadratic Neural Networks for Click-through Rate Prediction
Honghao Li
Yiwen Zhang
Yi Zhang
Lei Sang
Jieming Zhu
129
0
0
23 May 2025
Accelerating Learned Image Compression Through Modeling Neural Training Dynamics
Yichi Zhang
Zhihao Duan
Yuning Huang
Fengqing Zhu
235
0
0
23 May 2025
Bayesian Deep Learning for Discrete Choice
Daniel F. Villarraga
Ricardo A. Daziano
BDL, AI4CE
160
0
0
23 May 2025
Bayesian Optimization for Enhanced Language Models: Optimizing Acquisition Functions
Zishuo Bao
Yibo Liu
Changyutao Qiu
203
0
0
22 May 2025
NAN: A Training-Free Solution to Coefficient Estimation in Model Merging
Chongjie Si
Kangtao Lv
Jingjing Jiang
Yadao Wang
Yongwei Wang
Xiaokang Yang
Wenbo Su
Bo Zheng
Wei Shen
MoMe
49
0
0
22 May 2025
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer
Haiduo Huang
Jiangcheng Song
Yadong Zhang
Pengju Ren
70
0
0
21 May 2025
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
Zihuan Qiu
Yi Xu
Chiyuan He
Fanman Meng
Linfeng Xu
Qi Wu
Hongliang Li
CLL, MoMe
86
0
0
17 May 2025
Variational Visual Question Answering
Tobias Jan Wieczorek
Nathalie Daun
Mohammad Emtiyaz Khan
Marcus Rohrbach
OOD
92
0
0
14 May 2025
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
Wenju Sun
Qingyong Li
Yangli-ao Geng
Boyang Li
MoMe
115
2
0
11 May 2025
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani
Jiaxin Peng
Peiman Mohseni
Abdolah Amirany
Tarek A. El-Ghazawi
MoE
127
0
0
10 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
146
0
0
07 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
Lefei Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAML, VLM
130
2
0
02 May 2025
Investigating Task Arithmetic for Zero-Shot Information Retrieval
Marco Braga
Pranav Kasela
Alessandro Raganato
G. Pasi
RALM
131
0
0
01 May 2025
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
Hang Yu
Jiahao Wen
Zhedong Zheng
94
1
0
26 Apr 2025
A Model Zoo on Phase Transitions in Neural Networks
Konstantin Schurholt
Léo Meynent
Yefan Zhou
Haiquan Lu
Yaoqing Yang
Damian Borth
119
1
0
25 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
139
0
0
23 Apr 2025
Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
Max Kirchner
Alexander C. Jenke
S. Bodenstedt
Fiona Kolbinger
Oliver Saldanha
Jakob N. Kather
M. Wagner
Stefanie Speidel
FedML, MedIm
158
1
0
23 Apr 2025
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Yeoreum Lee
Jinwook Jung
Sungyong Baik
MoMe
157
3
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
81
2
0
19 Apr 2025
Weakly Semi-supervised Whole Slide Image Classification by Two-level Cross Consistency Supervision
Linhao Qu
Shiman Li
Xiaoyuan Luo
Shaolei Liu
Qinhao Guo
Manning Wang
Zhijian Song
80
0
0
16 Apr 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
258
10
0
15 Apr 2025
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Rui Dai
Sile Hu
Xu Shen
Yonggang Zhang
Xinmei Tian
Jieping Ye
MoMe
105
3
0
15 Apr 2025
A Model Zoo of Vision Transformers
Damian Falk
Léo Meynent
Florence Pfammatter
Konstantin Schurholt
Damian Borth
264
1
0
14 Apr 2025
The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report
Bin Ren
Hang Guo
Lei-huan Sun
Zhikai Wu
Radu Timofte
...
Dong-Hyeop Son
Ui-Jin Choi
Tiancheng Shao
Yu Zhang
Mengcheng Ma
SupR
163
13
0
14 Apr 2025