1706.02677
Cited By
v1
v2 (latest)
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing
"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
146
8
0
30 Sep 2024
Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training
J. Chaudhary
Dipak Nidhi
J. Heikkonen
H. Merisaari
R. Kanth
62
0
0
25 Sep 2024
Statewide Visual Geolocalization in the Wild
F. Fervers
Sebastian Bullinger
C. Bodensteiner
Michael Arens
Rainer Stiefelhagen
89
4
0
25 Sep 2024
Single Image, Any Face: Generalisable 3D Face Generation
Wenqing Wang
Haosen Yang
Josef Kittler
Xiatian Zhu
3DH
150
0
0
25 Sep 2024
Efficient Training of Deep Neural Operator Networks via Randomized Sampling
Sharmila Karumuri
Lori Graham-Brady
Somdatta Goswami
82
2
0
20 Sep 2024
Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
Hinata Harada
Hideaki Iiduka
59
1
0
16 Sep 2024
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
Vitor Campagnolo Guizilini
P. Tokmakov
Achal Dave
Rares Andrei Ambrus
DiffM
75
2
0
15 Sep 2024
Spatial Adaptation Layer: Interpretable Domain Adaptation For Biosignal Sensor Array Applications
Joao Pereira
Michael Alummoottil
Dimitrios Halatsis
Dario Farina
79
1
0
12 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
264
15
0
02 Sep 2024
Hybrid Classification-Regression Adaptive Loss for Dense Object Detection
Yanquan Huang
Liu Wei Zhen
Yun Hao
Mengyuan Zhang
Qingyao Wu
Zikun Deng
Xueming Liu
Hong Deng
75
0
0
30 Aug 2024
A survey on secure decentralized optimization and learning
Changxin Liu
Nicola Bastianello
Wei Huo
Yang Shi
Karl H. Johansson
101
4
0
16 Aug 2024
Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices
Shengyuan Ye
Liekang Zeng
Xiaowen Chu
Guoliang Xing
Xu Chen
97
12
0
15 Aug 2024
Sign language recognition based on deep learning and low-cost handcrafted descriptors
Alvaro Leandro Cavalcante Carneiro
Denis Henrique Pinheiro Salvadeo
Lucas de Brito Silva
49
1
0
14 Aug 2024
Optimizing Cox Models with Stochastic Gradient Descent: Theoretical Foundations and Practical Guidances
Lang Zeng
Weijing Tang
Zhao Ren
Ying Ding
64
0
0
05 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
65
1
0
05 Aug 2024
LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba
Yunxiang Fu
Chaoqi Chen
Yizhou Yu
Mamba
118
4
0
05 Aug 2024
Unsupervised Representation Learning by Balanced Self Attention Matching
Daniel Shalam
Simon Korman
SSL
109
0
0
04 Aug 2024
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li
B. Hou
Siteng Ma
Zitong Wu
Xianpeng Guo
Bo Ren
Licheng Jiao
132
13
0
04 Aug 2024
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz
Maximilian Engel
68
0
0
29 Jul 2024
Spring-block theory of feature learning in deep neural networks
Chengzhi Shi
Liming Pan
Ivan Dokmanić
AI4CE
134
1
0
28 Jul 2024
Ordered Momentum for Asynchronous SGD
Chang-Wei Shi
Yi-Rui Yang
Wu-Jun Li
ODL
169
0
0
27 Jul 2024
How Lightweight Can A Vision Transformer Be
Jen Hong Tan
ViT
MoE
91
0
0
25 Jul 2024
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Qing Su
Shihao Ji
90
0
0
24 Jul 2024
Stochastic weight matrix dynamics during learning and Dyson Brownian motion
Gert Aarts
B. Lucini
Chanju Park
83
1
0
23 Jul 2024
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training
Ye Lin Tun
Chu Myaet Thwal
Minh N. H. Nguyen
Choong Seon Hong
87
0
0
22 Jul 2024
Towards Robust Vision Transformer via Masked Adaptive Ensemble
Fudong Lin
Jiadong Lou
Xu Yuan
Nianfeng Tzeng
ViT
AAML
88
2
0
22 Jul 2024
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak
Byeongju Woo
Sunghwan Kim
Dae-Hwan Kim
Hoseong Kim
134
5
0
12 Jul 2024
Analyzing Machine Learning Performance in a Hybrid Quantum Computing and HPC Environment
Samuel T. Bieberich
Michael A. Sandoval
23
0
0
10 Jul 2024
Stepping on the Edge: Curvature Aware Learning Rate Tuners
Vincent Roulet
Atish Agarwala
Jean-Bastien Grill
Grzegorz Swirszcz
Mathieu Blondel
Fabian Pedregosa
99
3
0
08 Jul 2024
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala
Alind Khare
Animesh Agrawal
Igor Fedorov
Hugo Latapie
Myungjin Lee
Alexey Tumanov
CLL
70
0
0
08 Jul 2024
SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention
Yunzhong Si
Huiying Xu
Xinzhong Zhu
Wenhao Zhang
Yao Dong
Yuxing Chen
Hongbo Li
114
36
0
06 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM
ViT
83
8
0
05 Jul 2024
Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection
Yeonghyeon Park
Sungho Kang
Myung Jin Kim
Hyeong Seok Kim
Juneho Yi
AAML
62
0
0
05 Jul 2024
IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale
Wei Gao
Bo Ai
Joel Loo
Vinay
David Hsu
125
1
0
03 Jul 2024
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
Juntao Zhao
Borui Wan
Size Zheng
Haibin Lin
Yibo Zhu
Chuan Wu
73
3
0
02 Jul 2024
Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
Pasquale De Marinis
Nicola Fanelli
Raffaele Scaringi
Emanuele Colonna
Giuseppe Fiameni
G. Vessio
Giovanna Castellano
MLLM
VLM
72
2
0
02 Jul 2024
Are Data Augmentation Methods in Named Entity Recognition Applicable for Uncertainty Estimation?
Wataru Hashimoto
Hidetaka Kamigaito
Taro Watanabe
96
3
0
02 Jul 2024
Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks
Wataru Hashimoto
Hidetaka Kamigaito
Taro Watanabe
143
0
0
02 Jul 2024
Preserving Multilingual Quality While Tuning Query Encoder on English Only
Oleg V. Vasilyev
Randy Sawaya
John Bohannon
234
1
0
01 Jul 2024
Structured and Balanced Multi-Component and Multi-Layer Neural Networks
Shijun Zhang
Hongkai Zhao
Yimin Zhong
Haomin Zhou
97
1
0
30 Jun 2024
On the Trade-off between Flatness and Optimization in Distributed Learning
Ying Cao
Zhaoxian Wu
Kun Yuan
Ali H. Sayed
103
1
0
28 Jun 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
177
26
0
27 Jun 2024
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao
Haoyang Weng
Daohan Lu
Ang Li
Jinyang Li
Aurojit Panda
Saining Xie
3DGS
81
16
0
26 Jun 2024
Banishing LLM Hallucinations Requires Rethinking Generalization
Johnny Li
Saksham Consul
Eda Zhou
James Wong
Naila Farooqui
...
Zhuxiaona Wei
Tian Wu
Ben Echols
Sharon Zhou
Gregory Diamos
LRM
65
13
0
25 Jun 2024
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Byungsoo Jeon
Mengdi Wu
Shiyi Cao
Sunghyun Kim
Sunghyun Park
...
Xupeng Miao
Mohammad Alizadeh
G. R. Ganger
Tianqi Chen
Zhihao Jia
GNN
AI4CE
93
6
0
24 Jun 2024
Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Tanapat Ratchatorn
Masayuki Tanaka
AAML
106
1
0
20 Jun 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
92
1
0
20 Jun 2024
Machine Learning Visualization Tool for Exploring Parameterized Hydrodynamics
C. Jekel
D. Sterbentz
T. M. Stitt
P. Mocz
R. Rieben
D. A. White
Jonathan Belof
AI4CE
73
1
0
20 Jun 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
164
238
0
17 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
58
1
0
17 Jun 2024