
Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018

Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 1,040 papers shown

A Spectral Perspective towards Understanding and Improving Adversarial Robustness Binxiao Huang Rui Lin Chaofan Tao Ngai Wong AAML 78 0 0 25 Jun 2023
Concurrent ischemic lesion age estimation and segmentation of CT brain using a Transformer-based network A. Marcus P. Bentley Daniel Rueckert MedIm 109 9 0 21 Jun 2023
Traversing Between Modes in Function Space for Fast Ensembling Eunggu Yun Hyungi Lee G. Nam Juho Lee UQCV 64 3 0 20 Jun 2023
Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts Annie S. Chen Yoonho Lee Amrith Rajagopal Setlur Sergey Levine Chelsea Finn OOD 101 5 0 19 Jun 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning Hojoon Lee Hanseul Cho Hyunseung Kim Daehoon Gwak Joonkee Kim Jaegul Choo Se-Young Yun Chulhee Yun OffRL 157 30 0 19 Jun 2023
Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models A. Jaiswal Shiwei Liu Tianlong Chen Ying Ding Zhangyang Wang VLM 115 21 0 18 Jun 2023
A Hierarchical Bayesian Model for Deep Few-Shot Meta Learning Minyoung Kim Timothy M. Hospedales BDL 63 0 0 16 Jun 2023
Collapsed Inference for Bayesian Deep Learning Zhe Zeng Guy Van den Broeck FedML BDL UQCV 126 9 0 16 Jun 2023
The Split Matters: Flat Minima Methods for Improving the Performance of GNNs N. Lell A. Scherp 72 1 0 15 Jun 2023
MUBen: Benchmarking the Uncertainty of Molecular Representation Models Yinghao Li Lingkai Kong Yuanqi Du Yue Yu Yuchen Zhuang Wenhao Mu Chao Zhang 101 11 0 14 Jun 2023
Lookaround Optimizer: $k$ steps around, 1 step average Jiangtao Zhang Shunyu Liu Mingli Song Tongtian Zhu Zhenxing Xu Mingli Song MoMe 109 6 0 13 Jun 2023
Riemannian Laplace approximations for Bayesian neural networks Federico Bergamin Pablo Moreno-Muñoz Søren Hauberg Georgios Arvanitidis BDL 81 7 0 12 Jun 2023
Unveiling the Hessian's Connection to the Decision Boundary Mahalakshmi Sabanayagam Freya Behrens Urte Adomaityte Anna Dawid 54 5 0 12 Jun 2023
Gradient Ascent Post-training Enhances Language Model Generalization Dongkeun Yoon Joel Jang Sungdong Kim Minjoon Seo VLM AI4CE 80 3 0 12 Jun 2023
Push: Concurrent Probabilistic Programming for Bayesian Deep Learning Daniel Huang Christian Camaño Jonathan Tsegaye Jonathan Austin Gale AI4CE 78 0 0 10 Jun 2023
Consistent Explanations in the Face of Model Indeterminacy via Ensembling Dan Ley Leonard Tang Matthew Nazari Hongjin Lin Suraj Srinivas Himabindu Lakkaraju 68 2 0 09 Jun 2023
A Boosted Model Ensembling Approach to Ball Action Spotting in Videos: The Runner-Up Solution to CVPR'23 SoccerNet Challenge Luping Wang Hao Guo B. Liu 100 3 0 09 Jun 2023
Differentially Private Sharpness-Aware Training Jinseong Park Hoki Kim Yujin Choi Jaewook Lee 83 8 0 09 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning Libin Zhu Chaoyue Liu Adityanarayanan Radhakrishnan M. Belkin 124 15 0 07 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards Alexandre Ramé Guillaume Couairon Mustafa Shukor Corentin Dancette Jean-Baptiste Gaya Laure Soulier Matthieu Cord MoMe 120 157 0 07 Jun 2023
Optimal Transport Model Distributional Robustness Van-Anh Nguyen Trung Le Anh Tuan Bui Thanh-Toan Do Dinh Q. Phung OOD 77 4 0 07 Jun 2023
Soft Merging of Experts with Adaptive Routing Mohammed Muqeeth Haokun Liu Colin Raffel MoMe MoE 105 54 0 06 Jun 2023
Early Weight Averaging meets High Learning Rates for LLM Pre-training Sunny Sanyal A. Neerkaje Jean Kaddour Abhishek Kumar Sujay Sanghavi MoMe 102 19 0 05 Jun 2023
Information Flow Control in Machine Learning through Modular Model Architecture Trishita Tiwari Suchin Gururangan Chuan Guo Weizhe Hua Sanjay Kariyappa Udit Gupta Wenjie Xiong Kiwan Maeng Hsien-Hsin S. Lee G. E. Suh 75 6 0 05 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent Tongtian Zhu Fengxiang He Kaixuan Chen Mingli Song Dacheng Tao 156 15 0 05 Jun 2023
TIES-Merging: Resolving Interference When Merging Models Prateek Yadav Derek Tam Leshem Choshen Colin Raffel Joey Tianyi Zhou MoMe 143 318 0 02 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles Md Shamim Hussain Mohammed J Zaki D. Subramanian 168 3 0 02 Jun 2023
Universal Test-time Adaptation through Weight Ensembling, Diversity Weighting, and Prior Correction Robert A. Marsden Mario Döbler Bin Yang TTA 87 38 0 01 Jun 2023
Improving Energy Conserving Descent for Machine Learning: Theory and Practice G. Luca Alice Gatti E. Silverstein 69 1 0 01 Jun 2023
Quantifying Representation Reliability in Self-Supervised Learning Models Young-Jin Park Hao Wang Shervin Ardeshir Navid Azizan SSL UQCV 93 5 0 31 May 2023
Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training Rie Johnson Tong Zhang 43 6 0 31 May 2023
A Bayesian Approach To Analysing Training Data Attribution In Deep Learning Elisa Nguyen Minjoon Seo Seong Joon Oh BDL 610 8 0 31 May 2023
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges Robert-Jan Bruintjes A. Lengyel Marcos Baptista-Rios O. Kayhan Davide Zambrano Nergis Tomen Jan van Gemert 65 9 0 31 May 2023
Improved Probabilistic Image-Text Representations Sanghyuk Chun VLM 116 31 0 29 May 2023
HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts Shaokun Zhang Yiran Wu Zhonghua Zheng Qingyun Wu Chi Wang OOD 95 8 0 28 May 2023
The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent Lei Wu Weijie J. Su MLT 93 23 0 27 May 2023
Improving Neural Additive Models with Bayesian Principles Kouroche Bouchiat Alexander Immer Hugo Yèche Gunnar Rätsch Vincent Fortuin BDL MedIm 105 6 0 26 May 2023
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging Fabian David Schmidt Ivan Vulić Goran Glavaš 75 9 0 26 May 2023
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term Yun Yue Jiadi Jiang Zhiling Ye Ni Gao Yongchao Liu Kecheng Zhang MLAU ODL 113 14 0 25 May 2023
How to escape sharp minima with random perturbations Kwangjun Ahn Ali Jadbabaie S. Sra ODL 123 8 0 25 May 2023
Rethinking the Evaluation Protocol of Domain Generalization Han Yu Xingxuan Zhang Renzhe Xu Jiashuo Liu Yue He Peng Cui OOD 101 8 0 24 May 2023
Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning Moonseok Choi Hyungi Lee G. Nam Juho Lee 78 2 0 24 May 2023
AdvFunMatch: When Consistent Teaching Meets Adversarial Robustness Ziuhi Wu Haichang Gao Bingqian Zhou Ping Wang AAML 63 0 0 24 May 2023
Sharpness-Aware Data Poisoning Attack Pengfei He Han Xu Jie Ren Yingqian Cui Hui Liu Charu C. Aggarwal Jiliang Tang AAML 156 8 0 24 May 2023
Improving Convergence and Generalization Using Parameter Symmetries Bo Zhao Robert Mansel Gower Robin Walters Rose Yu MoMe 127 16 0 22 May 2023
POEM: Polarization of Embeddings for Domain-Invariant Representations Sang-Yeong Jo Sung Whan Yoon 68 12 0 22 May 2023
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models Guillermo Ortiz-Jiménez Alessandro Favero P. Frossard MoMe 163 125 0 22 May 2023
Loss Spike in Training Neural Networks Zhongwang Zhang Z. Xu 72 7 0 20 May 2023
Annealing Self-Distillation Rectification Improves Adversarial Training Yuehua Wu Hung-Jui Wang Shang-Tse Chen AAML 104 5 0 20 May 2023
PANNA 2.0: Efficient neural network interatomic potentials and new architectures Franco Pellegrini Ruggero Lot Yusuf Shaidu E. Küçükbenli 27 9 0 19 May 2023