ResearchTrend.AI
Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML, MoMe

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 1,040 papers shown
Title
A Spectral Perspective towards Understanding and Improving Adversarial Robustness
Binxiao Huang
Rui Lin
Chaofan Tao
Ngai Wong
AAML
78
0
0
25 Jun 2023
Concurrent ischemic lesion age estimation and segmentation of CT brain using a Transformer-based network
A. Marcus
P. Bentley
Daniel Rueckert
MedIm
109
9
0
21 Jun 2023
Traversing Between Modes in Function Space for Fast Ensembling
Eunggu Yun
Hyungi Lee
G. Nam
Juho Lee
UQCV
64
3
0
20 Jun 2023
Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
Annie S. Chen
Yoonho Lee
Amrith Rajagopal Setlur
Sergey Levine
Chelsea Finn
OOD
101
5
0
19 Jun 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning
Hojoon Lee
Hanseul Cho
Hyunseung Kim
Daehoon Gwak
Joonkee Kim
Jaegul Choo
Se-Young Yun
Chulhee Yun
OffRL
157
30
0
19 Jun 2023
Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
A. Jaiswal
Shiwei Liu
Tianlong Chen
Ying Ding
Zhangyang Wang
VLM
115
21
0
18 Jun 2023
A Hierarchical Bayesian Model for Deep Few-Shot Meta Learning
Minyoung Kim
Timothy M. Hospedales
BDL
63
0
0
16 Jun 2023
Collapsed Inference for Bayesian Deep Learning
Zhe Zeng
Guy Van den Broeck
FedML, BDL, UQCV
126
9
0
16 Jun 2023
The Split Matters: Flat Minima Methods for Improving the Performance of GNNs
N. Lell
A. Scherp
72
1
0
15 Jun 2023
MUBen: Benchmarking the Uncertainty of Molecular Representation Models
Yinghao Li
Lingkai Kong
Yuanqi Du
Yue Yu
Yuchen Zhuang
Wenhao Mu
Chao Zhang
101
11
0
14 Jun 2023
Lookaround Optimizer: $k$ steps around, 1 step average
Jiangtao Zhang
Shunyu Liu
Mingli Song
Tongtian Zhu
Zhenxing Xu
Mingli Song
MoMe
109
6
0
13 Jun 2023
Riemannian Laplace approximations for Bayesian neural networks
Federico Bergamin
Pablo Moreno-Muñoz
Søren Hauberg
Georgios Arvanitidis
BDL
81
7
0
12 Jun 2023
Unveiling the Hessian's Connection to the Decision Boundary
Mahalakshmi Sabanayagam
Freya Behrens
Urte Adomaityte
Anna Dawid
54
5
0
12 Jun 2023
Gradient Ascent Post-training Enhances Language Model Generalization
Dongkeun Yoon
Joel Jang
Sungdong Kim
Minjoon Seo
VLM, AI4CE
80
3
0
12 Jun 2023
Push: Concurrent Probabilistic Programming for Bayesian Deep Learning
Daniel Huang
Christian Camaño
Jonathan Tsegaye
Jonathan Austin Gale
AI4CE
78
0
0
10 Jun 2023
Consistent Explanations in the Face of Model Indeterminacy via Ensembling
Dan Ley
Leonard Tang
Matthew Nazari
Hongjin Lin
Suraj Srinivas
Himabindu Lakkaraju
68
2
0
09 Jun 2023
A Boosted Model Ensembling Approach to Ball Action Spotting in Videos: The Runner-Up Solution to CVPR'23 SoccerNet Challenge
Luping Wang
Hao Guo
B. Liu
100
3
0
09 Jun 2023
Differentially Private Sharpness-Aware Training
Jinseong Park
Hoki Kim
Yujin Choi
Jaewook Lee
83
8
0
09 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
124
15
0
07 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
120
157
0
07 Jun 2023
Optimal Transport Model Distributional Robustness
Van-Anh Nguyen
Trung Le
Anh Tuan Bui
Thanh-Toan Do
Dinh Q. Phung
OOD
77
4
0
07 Jun 2023
Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth
Haokun Liu
Colin Raffel
MoMe, MoE
105
54
0
06 Jun 2023
Early Weight Averaging meets High Learning Rates for LLM Pre-training
Sunny Sanyal
A. Neerkaje
Jean Kaddour
Abhishek Kumar
Sujay Sanghavi
MoMe
102
19
0
05 Jun 2023
Information Flow Control in Machine Learning through Modular Model Architecture
Trishita Tiwari
Suchin Gururangan
Chuan Guo
Weizhe Hua
Sanjay Kariyappa
Udit Gupta
Wenjie Xiong
Kiwan Maeng
Hsien-Hsin S. Lee
G. E. Suh
75
6
0
05 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
156
15
0
05 Jun 2023
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
143
318
0
02 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
168
3
0
02 Jun 2023
Universal Test-time Adaptation through Weight Ensembling, Diversity Weighting, and Prior Correction
Robert A. Marsden
Mario Döbler
Bin Yang
TTA
87
38
0
01 Jun 2023
Improving Energy Conserving Descent for Machine Learning: Theory and Practice
G. Luca
Alice Gatti
E. Silverstein
69
1
0
01 Jun 2023
Quantifying Representation Reliability in Self-Supervised Learning Models
Young-Jin Park
Hao Wang
Shervin Ardeshir
Navid Azizan
SSL, UQCV
93
5
0
31 May 2023
Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training
Rie Johnson
Tong Zhang
43
6
0
31 May 2023
A Bayesian Approach To Analysing Training Data Attribution In Deep Learning
Elisa Nguyen
Minjoon Seo
Seong Joon Oh
BDL
610
8
0
31 May 2023
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
Robert-Jan Bruintjes
A. Lengyel
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
Jan van Gemert
65
9
0
31 May 2023
Improved Probabilistic Image-Text Representations
Sanghyuk Chun
VLM
116
31
0
29 May 2023
HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts
Shaokun Zhang
Yiran Wu
Zhonghua Zheng
Qingyun Wu
Chi Wang
OOD
95
8
0
28 May 2023
The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
Lei Wu
Weijie J. Su
MLT
93
23
0
27 May 2023
Improving Neural Additive Models with Bayesian Principles
Kouroche Bouchiat
Alexander Immer
Hugo Yèche
Gunnar Rätsch
Vincent Fortuin
BDL, MedIm
105
6
0
26 May 2023
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
Fabian David Schmidt
Ivan Vulić
Goran Glavaš
75
9
0
26 May 2023
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term
Yun Yue
Jiadi Jiang
Zhiling Ye
Ni Gao
Yongchao Liu
Kecheng Zhang
MLAU, ODL
113
14
0
25 May 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
123
8
0
25 May 2023
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu
Xingxuan Zhang
Renzhe Xu
Jiashuo Liu
Yue He
Peng Cui
OOD
101
8
0
24 May 2023
Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Moonseok Choi
Hyungi Lee
G. Nam
Juho Lee
78
2
0
24 May 2023
AdvFunMatch: When Consistent Teaching Meets Adversarial Robustness
Zihui Wu
Haichang Gao
Bingqian Zhou
Ping Wang
AAML
63
0
0
24 May 2023
Sharpness-Aware Data Poisoning Attack
Pengfei He
Han Xu
Jie Ren
Yingqian Cui
Hui Liu
Charu C. Aggarwal
Jiliang Tang
AAML
156
8
0
24 May 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
127
16
0
22 May 2023
POEM: Polarization of Embeddings for Domain-Invariant Representations
Sang-Yeong Jo
Sung Whan Yoon
68
12
0
22 May 2023
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez
Alessandro Favero
P. Frossard
MoMe
163
125
0
22 May 2023
Loss Spike in Training Neural Networks
Zhongwang Zhang
Z. Xu
72
7
0
20 May 2023
Annealing Self-Distillation Rectification Improves Adversarial Training
Yuehua Wu
Hung-Jui Wang
Shang-Tse Chen
AAML
104
5
0
20 May 2023
PANNA 2.0: Efficient neural network interatomic potentials and new architectures
Franco Pellegrini
Ruggero Lot
Yusuf Shaidu
E. Küçükbenli
27
9
0
19 May 2023