ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
    FedML, MoMe

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 1,040 papers shown
Robust Contrastive Learning With Theory Guarantee
Ngoc N. Tran
Lam C. Tran
Hoang Phan
Anh-Vu Bui
Tung Pham
Toan M. Tran
Dinh Q. Phung
Trung Le
SSL, NoLa
68
0
0
16 Nov 2023
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
Alexandra Chronopoulou
Jonas Pfeiffer
Joshua Maynez
Xinyi Wang
Sebastian Ruder
Priyanka Agrawal
MoMe
87
18
0
15 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
76
3
0
15 Nov 2023
Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks
Kartik Gupta
Akshay Asthana
MQ
36
8
0
09 Nov 2023
Robust Fine-Tuning of Vision-Language Models for Domain Generalization
Kevin Vogt-Lowell
Noah Lee
Theodoros Tsiligkaridis
Marc Vaillant
VLM
79
4
0
03 Nov 2023
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Changdae Oh
Hyesu Lim
Mijoo Kim
Dongyoon Han
Junhyeok Park
Euiseog Jeong
Alexander G. Hauptmann
Zhi-Qi Cheng
Kyungwoo Song
VLM
108
18
0
03 Nov 2023
Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data
Cheng-Hao Tu
Hong-You Chen
Zheda Mai
Shitian Zhao
Vardaan Pahuja
Tanya Berger-Wolf
Song Gao
Charles V. Stewart
Yu-Chuan Su
Wei-Lun Chao
CLL
84
6
0
02 Nov 2023
ATHENA: Mathematical Reasoning with Thought Expansion
JB. Kim
Hazel Kim
Joonghyuk Hahn
Yo-Sub Han
ReLM, LRM, AIMat
116
7
0
02 Nov 2023
On Feynman–Kac training of partial Bayesian neural networks
Zheng Zhao
Sebastian Mair
Thomas B. Schön
Jens Sjölund
74
0
0
30 Oct 2023
Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Yifei Wang
Liangchen Li
Jiansheng Yang
Zhouchen Lin
Yisen Wang
69
15
0
30 Oct 2023
Seeking Flat Minima with Mean Teacher on Semi- and Weakly-Supervised Domain Generalization for Object Detection
Ryosuke Furuta
Yoichi Sato
104
0
0
30 Oct 2023
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Damien Ferbach
Baptiste Goujaud
Gauthier Gidel
Aymeric Dieuleveut
MoMe
129
16
0
29 Oct 2023
Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation
Son Nguyen
Mikel Lainsa
Hung Dao
Daeyoung Kim
Giang Nguyen
44
1
0
27 Oct 2023
Diversifying Spatial-Temporal Perception for Video Domain Generalization
Kun-Yu Lin
Jia-Run Du
Yipeng Gao
Jiaming Zhou
Wei-Shi Zheng
84
16
0
27 Oct 2023
FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning
Zhuo Huang
Li Shen
Jun-chen Yu
Bo Han
Tongliang Liu
FedML
104
23
0
25 Oct 2023
Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization
Zhuo Huang
Muyang Li
Li Shen
Jun-chen Yu
Chen Gong
Bo Han
Tongliang Liu
OOD
118
11
0
25 Oct 2023
Improving generalization in large language models by learning prefix subspaces
Louis Falissard
Vincent Guigue
Laure Soulier
47
1
0
24 Oct 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
92
1
0
22 Oct 2023
Exponential weight averaging as damped harmonic motion
J. Patsenker
Henry Li
Y. Kluger
50
0
0
20 Oct 2023
Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim
Thomas Möllenhoff
Edoardo Ponti
Iryna Gurevych
Mohammad Emtiyaz Khan
MoMe, FedML
98
53
0
19 Oct 2023
Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization
Yaohua Liu
Jiaxin Gao
Xianghao Jiao
Zhu Liu
Xin-Yue Fan
Risheng Liu
AAML
87
0
0
19 Oct 2023
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Ming Zhong
Chenxin An
Weizhu Chen
Jiawei Han
Pengcheng He
98
12
0
17 Oct 2023
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Adam Block
Dylan J. Foster
Akshay Krishnamurthy
Max Simchowitz
Cyril Zhang
77
7
0
17 Oct 2023
Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs
Uri Stern
D. Weinshall
CLL
60
0
0
17 Oct 2023
Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
Gyuseong Lee
Wooseok Jang
Jin Hyeon Kim
Jaewoo Jung
Seungryong Kim
MoE, OOD
67
4
0
17 Oct 2023
Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data
Mouad El Bouchattaoui
Myriam Tami
Benoit Lepetit
P. Cournède
CML, OOD
216
2
0
16 Oct 2023
On the Over-Memorization During Natural, Robust and Catastrophic Overfitting
Runqi Lin
Chaojian Yu
Bo Han
Tongliang Liu
78
9
0
13 Oct 2023
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
Beier Zhu
Kaihua Tang
Qianru Sun
Hanwang Zhang
76
22
0
12 Oct 2023
Entropy-MCMC: Sampling from Flat Basins with Ease
Bolian Li
Ruqi Zhang
76
5
0
09 Oct 2023
Continuous Invariance Learning
Yong Lin
Fan Zhou
Lu Tan
Lintao Ma
Jiameng Liu
...
Yuan Yuan
Yu Liu
James Y. Zhang
Yujiu Yang
Hao Wang
CLL, OOD
80
4
0
09 Oct 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP, VLM
95
22
0
08 Oct 2023
Parameter Efficient Multi-task Model Fusion with Partial Linearization
Anke Tang
Li Shen
Yong Luo
Yibing Zhan
Han Hu
Bo Du
Yixin Chen
Dacheng Tao
MoMe
122
36
0
07 Oct 2023
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
Tom Sherborne
Naomi Saphra
Pradeep Dasigi
Hao Peng
58
5
0
05 Oct 2023
Splitting the Difference on Adversarial Training
Matan Levi
A. Kontorovich
89
4
0
03 Oct 2023
Chunking: Continual Learning is not just about Distribution Shift
Thomas L. Lee
Amos Storkey
76
1
0
03 Oct 2023
It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
Amanda Bertsch
Alex Xie
Graham Neubig
Matthew R. Gormley
82
36
0
02 Oct 2023
Window-based Model Averaging Improves Generalization in Heterogeneous Federated Learning
Debora Caldarola
Barbara Caputo
Marco Ciccone
FedML
77
7
0
02 Oct 2023
Sharingan: A Transformer-based Architecture for Gaze Following
Samy Tafasca
Anshul Gupta
J. Odobez
ViT
79
3
0
01 Oct 2023
On Memorization and Privacy Risks of Sharpness Aware Minimization
Young In Kim
Pratiksha Agrawal
J. Royset
Rajiv Khanna
FedML
86
3
0
30 Sep 2023
GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data
Sascha Marton
Stefan Lüdtke
Christian Bartelt
Heiner Stuckenschmidt
LMTD
60
6
0
29 Sep 2023
Sharpness-Aware Teleportation on Riemannian Manifolds
Kenneth Allen
Hoang Nguyen
Haocheng Luo
Ming-Jun Lai
Mehrtash Harandi
Dinh Q. Phung
T. Le
AAML
95
3
0
29 Sep 2023
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
M. Milling
Andreas Triantafyllopoulos
Iosif Tsangko
Simon Rampp
F. Schlüter
111
3
0
28 Sep 2023
A Primer on Bayesian Neural Networks: Review and Debates
Federico Danieli
Konstantinos Pitas
M. Vladimirova
Vincent Fortuin
BDL, AAML
103
20
0
28 Sep 2023
Deep Model Fusion: A Survey
Weishi Li
Yong Peng
Miao Zhang
Liang Ding
Han Hu
Li Shen
FedML, MoMe
115
62
0
27 Sep 2023
Enhancing Sharpness-Aware Optimization Through Variance Suppression
Bingcong Li
G. Giannakis
AAML
112
23
0
27 Sep 2023
A mirror-Unet architecture for PET/CT lesion segmentation
Yamila Rotstein Habarnau
Mauro Namías
34
0
0
23 Sep 2023
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
Wenzhuo Zhou
Yuhan Li
Ruoqing Zhu
Annie Qu
OffRL
83
5
0
23 Sep 2023
Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES
Yohai-Eliel Berreby
L. Sauvage
AAML
40
2
0
22 Sep 2023
Trading-off Mutual Information on Feature Aggregation for Face Recognition
Mohammad Akyash
Ali Zafari
Nasser M. Nasrabadi
ViT
72
1
0
22 Sep 2023
Weight Averaging Improves Knowledge Distillation under Domain Shift
Valeriy Berezovskiy
Nikita Morozov
MoMe
83
1
0
20 Sep 2023