2502.06761
When, Where and Why to Average Weights?
10 February 2025
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
Papers citing
"When, Where and Why to Average Weights?"
14 / 14 papers shown
Title
SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha
Sanghyuk Chun
Kyungjae Lee
Han-Cheol Cho
Seunghyun Park
Yunsung Lee
Sungrae Park
MoMe
259
438
0
17 Feb 2021
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffM
SyDa
226
6,293
0
26 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
157
40,217
0
22 Oct 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
174
3,082
0
16 May 2020
Open Graph Benchmark: Datasets for Machine Learning on Graphs
Weihua Hu
Matthias Fey
Marinka Zitnik
Yuxiao Dong
Hongyu Ren
Bowen Liu
Michele Catasta
J. Leskovec
147
2,687
0
02 May 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta
S. Serrano
D. DeCoste
MoMe
42
58
0
07 Jan 2020
Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov
Dheevatsa Mudigere
Hao-Jun Michael Shi
Jianyu Huang
Narayanan Sundaraman
...
Wenlin Chen
Vijay Rao
Bill Jia
Liang Xiong
M. Smelyanskiy
37
726
0
31 May 2019
fastMRI: An Open Dataset and Benchmarks for Accelerated MRI
Jure Zbontar
Florian Knoll
Anuroop Sriram
Tullie Murrell
Zhengnan Huang
...
Erich Owens
C. L. Zitnick
M. Recht
D. Sodickson
Yvonne W. Lui
OOD
24
836
0
21 Nov 2018
Relational inductive biases, deep learning, and graph networks
Peter W. Battaglia
Jessica B. Hamrick
V. Bapst
Alvaro Sanchez-Gonzalez
V. Zambaldi
...
Pushmeet Kohli
M. Botvinick
Oriol Vinyals
Yujia Li
Razvan Pascanu
AI4CE
NAI
306
3,101
0
04 Jun 2018
Iterate averaging as regularization for stochastic gradient descent
Gergely Neu
Lorenzo Rosasco
MoMe
59
61
0
22 Feb 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
278
129,831
0
12 Jun 2017
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
190
8,030
0
13 Aug 2016
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
348
27,231
0
02 Dec 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
865
76,547
0
18 May 2015