Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018

Dmitry Vetrov

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 1,040 papers shown

Title
Editing Models with Task Arithmetic Gabriel Ilharco Marco Tulio Ribeiro Mitchell Wortsman Suchin Gururangan Ludwig Schmidt Hannaneh Hajishirzi Ali Farhadi KELM MoMe MU 213 522 0 08 Dec 2022
RainUNet for Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts Jinyoung Park Minseok Son Seungju Cho Inyoung Lee Changick Kim 34 3 0 07 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning Shachar Don-Yehiya Elad Venezian Colin Raffel Noam Slonim Yoav Katz Leshem Choshen MoMe 109 55 0 02 Dec 2022
BARTSmiles: Generative Masked Language Models for Molecular Representations Gayane Chilingaryan Hovhannes Tamoyan Ani Tevosyan N. Babayan L. Khondkaryan Karen Hambardzumyan Zaven Navoyan Hrant Khachatrian Armen Aghajanyan SSL 101 28 0 29 Nov 2022
Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time Huaxiu Yao Caroline Choi Bochuan Cao Yoonho Lee Pang Wei Koh Chelsea Finn OOD 93 79 0 25 Nov 2022
Cross-Domain Ensemble Distillation for Domain Generalization Kyung-Jin Lee Sungyeon Kim Suha Kwak FedML OOD 84 38 0 25 Nov 2022
Learning Feynman Diagrams using Graph Neural Networks Harrison Mitchell Alexander Norcliffe Pietro Lio GNN 114 2 0 25 Nov 2022
Improving Multi-task Learning via Seeking Task-based Flat Regions Hoang Phan Lam C. Tran Ngoc N. Tran Tuan Truong Qi Lei Nhat Ho Dinh Q. Phung Trung Le 209 11 0 24 Nov 2022
Indian Commercial Truck License Plate Detection and Recognition for Weighbridge Automation Siddharth Agrawal Keyur D. Joshi 71 4 0 23 Nov 2022
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization Zifa Wang Nan Ding Tomer Levinboim Xi Chen Radu Soricut AAML 79 6 0 22 Nov 2022
Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras Daniel Gehrig Davide Scaramuzza GNN 65 32 0 22 Nov 2022
Efficient Generalization Improvement Guided by Random Weight Perturbation Tao Li Wei Yan Zehao Lei Yingwen Wu Kun Fang Ming-Hsuan Yang Xiaolin Huang AAML 72 6 0 21 Nov 2022
Non-reversible Parallel Tempering for Deep Posterior Approximation Wei Deng Qian Zhang Qi Feng F. Liang Guang Lin 76 4 0 20 Nov 2022
SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness Gonçalo Mordido Sébastien Henwood Sarath Chandar François Leduc-Primeau AAML 46 0 0 18 Nov 2022
Empirical Study on Optimizer Selection for Out-of-Distribution Generalization Hiroki Naganuma Kartik Ahuja S. Takagi Tetsuya Motokawa Rio Yokota Kohta Ishikawa I. Sato Ioannis Mitliagkas OOD 93 7 0 15 Nov 2022
Mechanistic Mode Connectivity Ekdeep Singh Lubana Eric J. Bigelow Robert P. Dick David M. Krueger Hidenori Tanaka 118 49 0 15 Nov 2022
REPAIR: REnormalizing Permuted Activations for Interpolation Repair Keller Jordan Hanie Sedghi O. Saukh R. Entezari Behnam Neyshabur MoMe 133 101 0 15 Nov 2022
MEAL: Stable and Active Learning for Few-Shot Prompting Abdullatif Köksal Timo Schick Hinrich Schütze 98 26 0 15 Nov 2022
Multi-Head Adapter Routing for Cross-Task Generalization Lucas Caccia Edoardo Ponti Zhan Su Matheus Pereira Nicolas Le Roux Alessandro Sordoni 64 23 0 07 Nov 2022
Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning Zafir Stojanovski Karsten Roth Zeynep Akata 67 17 0 06 Nov 2022
Learning to Annotate Part Segmentation with Gradient Matching Yu Yang Xiaotian Cheng Hakan Bilen Xiangyang Ji GAN 96 7 0 06 Nov 2022
Quantifying Model Uncertainty for Semantic Segmentation using Operators in the RKHS Rishabh Singh José C. Príncipe UQCV 68 3 0 03 Nov 2022
Circling Back to Recurrent Models of Language Gábor Melis 89 0 0 03 Nov 2022
The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training Junhao Dong Seyed-Mohsen Moosavi-Dezfooli Jianhuang Lai Xiaohua Xie AAML 112 29 0 01 Nov 2022
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning Yaqing Wang Sahaj Agarwal Subhabrata Mukherjee Xiaodong Liu Jing Gao Ahmed Hassan Awadallah Jianfeng Gao MoE 109 136 0 31 Oct 2022
Symmetries, flat minima, and the conserved quantities of gradient flow Bo Zhao I. Ganev Robin Walters Rose Yu Nima Dehmamy 109 20 0 31 Oct 2022
Towards Generalized Few-Shot Open-Set Object Detection Binyi Su Qichuan Geng Jingzhi Li Zhongjun Zhou 117 10 0 28 Oct 2022
Facial Action Unit Detection and Intensity Estimation from Self-supervised Representation Bowen Ma Rudong An Wei Zhang Yu-qiong Ding Zeng Zhao Rongsheng Zhang Tangjie Lv Changjie Fan Zhipeng Hu CVBM 103 21 0 28 Oct 2022
Efficient and Effective Augmentation Strategy for Adversarial Training Sravanti Addepalli Samyak Jain R. Venkatesh Babu AAML 129 60 0 27 Oct 2022
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition Steven Vander Eeckt Hugo Van hamme CLL MoMe 113 15 0 27 Oct 2022
History-Based, Bayesian, Closure for Stochastic Parameterization: Application to Lorenz '96 Mohamed Aziz Bhouri Pierre Gentine AI4TS AI4CE 85 6 0 26 Oct 2022
Sufficient Invariant Learning for Distribution Shift Taero Kim Sungjun Lim Kyungwoo Song OOD 81 2 0 24 Oct 2022
On the optimization and pruning for Bayesian deep learning X. Ke Yanan Fan BDL UQCV 79 1 0 24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation Yingbo Gao Christian Herold Zijian Yang Hermann Ney MoMe 139 12 0 21 Oct 2022
lo-fi: distributed fine-tuning without communication Mitchell Wortsman Suchin Gururangan Shen Li Ali Farhadi Ludwig Schmidt Michael G. Rabbat Ari S. Morcos 103 24 0 19 Oct 2022
Scaling Adversarial Training to Large Perturbation Bounds Sravanti Addepalli Samyak Jain Gaurang Sriramanan R. Venkatesh Babu AAML 115 23 0 18 Oct 2022
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models Nikolaos Dimitriadis P. Frossard François Fleuret 90 25 0 18 Oct 2022
Improving Adversarial Robustness by Contrastive Guided Diffusion Process Yidong Ouyang Liyan Xie Guang Cheng 67 8 0 18 Oct 2022
RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging A. Jaiswal Kumar Ashutosh Justin F. Rousseau Yifan Peng Zhangyang Wang Ying Ding 53 10 0 15 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks A. K. Akash Sixu Li Nicolas García Trillos 75 13 0 13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities Brian Bartoldson B. Kailkhura Davis W. Blalock 107 51 0 13 Oct 2022
Deep Combinatorial Aggregation Yuesong Shen Daniel Cremers OOD UQCV 52 4 0 12 Oct 2022
Improving information retention in large scale online continual learning Z. Cai V. Koltun Ozan Sener CLL 28 1 0 12 Oct 2022
Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation Zeyu Qin Yanbo Fan Yi Liu Li Shen Yong Zhang Jue Wang Baoyuan Wu AAML SILM 83 84 0 12 Oct 2022
Meta-Learning with Self-Improving Momentum Target Jihoon Tack Jongjin Park Hankook Lee Jaeho Lee Jinwoo Shin LRM 126 12 0 11 Oct 2022
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling Haw-Shiuan Chang Ruei-Yao Sun Kathryn Ricci Andrew McCallum 108 15 0 10 Oct 2022
Revisiting adapters with adversarial training Sylvestre-Alvise Rebuffi Francesco Croce Sven Gowal AAML 60 17 0 10 Oct 2022
On the Importance of Calibration in Semi-supervised Learning Charlotte Loh Rumen Dangovski Shivchander Sudalairaj Seung-Jun Han Ligong Han Leonid Karlinsky Marin Soljacic Akash Srivastava 57 6 0 10 Oct 2022
Learning Across Domains and Devices: Style-Driven Source-Free Domain Adaptation in Clustered Federated Learning Donald Shenaj Eros Fani Marco Toldo Debora Caldarola A. Tavera Umberto Michieli Marco Ciccone Pietro Zanuttigh Barbara Caputo FedML 92 39 0 05 Oct 2022
Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints Virat Shejwalkar Arun Ganesh Rajiv Mathews Om Thakkar Abhradeep Thakurta 101 8 0 04 Oct 2022