1706.02677
Cited By
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu
Jiazheng Li
Siyu An
Meng Zhao
Yulan He
Di Yin
Xing Sun
94
20
0
16 Jun 2024
On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning
Jeongheon Oh
Kibok Lee
SSL
73
1
0
16 Jun 2024
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Qihao Liu
Zhanpeng Zeng
Ju He
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
110
22
0
13 Jun 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
106
17
0
13 Jun 2024
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
Dayal Singh Kalra
M. Barkeshli
125
11
0
13 Jun 2024
Financial Assets Dependency Prediction Utilizing Spatiotemporal Patterns
Haoren Zhu
Pengfei Zhao
Wilfred Siu Hung NG
Dik Lun Lee
35
0
0
13 Jun 2024
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai
Jingfeng Wu
Song Mei
Michael Lindsey
Peter L. Bartlett
91
4
0
12 Jun 2024
Meta-Learning Neural Procedural Biases
Christian Raymond
Qi Chen
Bing Xue
Mengjie Zhang
105
1
0
12 Jun 2024
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang
Dongyoung Kim
Junsu Kim
Jinwoo Shin
Pieter Abbeel
Younggyo Seo
90
3
0
11 Jun 2024
Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
Ray Cao
Sherry Luo
Steve Gan
Sujeeth Jinesh
61
0
0
08 Jun 2024
MeGA: Merging Multiple Independently Trained Neural Networks Based on Genetic Algorithm
Daniel Yun
FedML
MoMe
46
1
0
07 Jun 2024
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
96
8
0
05 Jun 2024
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Kang You
Zekai Xu
Chen Nie
Zhijie Deng
Qinghai Guo
Xiang Wang
Zhezhi He
102
11
0
05 Jun 2024
Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers
Thomas Bouvier
Bogdan Nicolae
Hugo Chaugier
Alexandru Costan
Ian Foster
Gabriel Antoniu
76
1
0
05 Jun 2024
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi
Yatin Dandi
Florent Krzakala
Bruno Loureiro
Luca Pesce
Ludovic Stephan
92
1
0
04 Jun 2024
PETRA: Parallel End-to-end Training with Reversible Architectures
Stéphane Rivaud
Louis Fournier
Thomas Pumir
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon
96
0
0
04 Jun 2024
Autaptic Synaptic Circuit Enhances Spatio-temporal Predictive Learning of Spiking Neural Networks
Lihao Wang
Zhaofei Yu
98
5
0
01 Jun 2024
Contrastive Learning Via Equivariant Representation
Sifan Song
Jinfeng Wang
Qiaochu Zhao
Xiang Li
Dufan Wu
Angelos Stefanidis
Jionglong Su
S. Kevin Zhou
Quanzheng Li
81
1
0
01 Jun 2024
ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Jinyuan Jia
Bo Li
Radha Poovendran
FedML
90
2
0
31 May 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
96
8
0
31 May 2024
Boosting General Trimap-free Matting in the Real-World Image
Leo Shan
104
1
0
28 May 2024
Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data
Xiaolu Wang
Yuchang Sun
Hoi-To Wai
Jun Zhang
80
0
0
27 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
216
3
0
26 May 2024
SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing
Haoxuan Yuan
Zhe Chen
Zheng Lin
Jinbo Peng
Zihan Fang
Yuhang Zhong
Zihang Song
Yue Gao
76
12
0
24 May 2024
Pipeline Parallelism with Controllable Memory
Penghui Qi
Xinyi Wan
Nyamdavaa Amar
Min Lin
72
6
0
24 May 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Shuaipeng Li
Penghao Zhao
Hailin Zhang
Xingwu Sun
Hao Wu
...
Zheng Fang
Jinbao Xue
Yangyu Tao
Tengjiao Wang
Di Wang
92
9
0
23 May 2024
Domain-specific augmentations with resolution agnostic self-attention mechanism improves choroid segmentation in optical coherence tomography images
Jamie Burke
Justin Engelmann
Charlene Hamid
Diana Moukaddem
Dan Pugh
...
Niall C. Strang
Stuart King
Tom J. MacGillivray
Miguel O. Bernabeu
Ian J. C. MacCormick
MedIm
67
0
0
23 May 2024
Does context matter in digital pathology?
Paulina Tomaszewska
Mateusz Sperkowski
Przemysław Biecek
22
0
0
23 May 2024
SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
Swapnil Gandhi
Mark Zhao
Athinagoras Skiadopoulos
Christos Kozyrakis
AI4CE
GNN
64
1
0
22 May 2024
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
Xinyang Li
Zhangyu Lai
Linning Xu
Jianfei Guo
Liujuan Cao
Shengchuan Zhang
Bo Dai
Rongrong Ji
DiffM
89
9
0
16 May 2024
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training
Yulin Wang
Yang Yue
Rui Lu
Yizeng Han
Shiji Song
Gao Huang
VLM
114
12
0
14 May 2024
Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains
Kyungeun Lee
Ye Seul Sim
Hye-Seung Cho
Moonjung Eo
Suhee Yoon
Sanghyu Yoon
Woohyung Lim
SSL
79
9
0
13 May 2024
Custom Gradient Estimators are Straight-Through Estimators in Disguise
Matt Schoenbauer
Daniele Moro
Lukasz Lew
Andrew G. Howard
MQ
81
4
0
08 May 2024
AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning
D. Coquelin
Katherina Flügel
Marie Weiel
Nicholas Kiefer
Muhammed Öz
Charlotte Debus
Achim Streit
Markus Goetz
84
0
0
02 May 2024
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
Seyed Mahmoud Sajjadi Mohammadabadi
Lei Yang
Feng Yan
Junshan Zhang
69
7
0
01 May 2024
"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time
Scott Rome
Tianwen Chen
Raphael Tang
Luwei Zhou
Ferhan Ture
26
3
0
01 May 2024
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala
Jeffrey Pennington
112
4
0
30 Apr 2024
Empirical Analysis of Dialogue Relation Extraction with Large Language Models
Guozheng Li
Zijie Xu
Ziyu Shang
Jiajun Liu
Ke Ji
Yikai Guo
95
2
0
27 Apr 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
83
0
0
25 Apr 2024
MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes
Xiaolong Ma
Feng Yan
Lei Yang
Ian Foster
M. Papka
Zhengchun Liu
R. Kettimuthu
LRM
41
7
0
24 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
113
1
0
18 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
83
0
0
15 Apr 2024
RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion
Guoxuan Chi
Zheng Yang
Chenshu Wu
Jingao Xu
Yuchong Gao
Yunhao Liu
Tony Xiao Han
DiffM
88
35
0
14 Apr 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
83
8
0
13 Apr 2024
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang
Wenqi Shao
Mengzhao Chen
Chengyue Wu
Yong Liu
Taiqiang Wu
Kaipeng Zhang
Songyang Zhang
Kai-xiang Chen
Ping Luo
MLLM
85
4
0
10 Apr 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Victor C. M. Leung
Yanyi Guo
Xiping Hu
GNN
103
8
0
09 Apr 2024
ApproxDARTS: Differentiable Neural Architecture Search with Approximate Multipliers
Michal Pinos
Lukás Sekanina
Vojtěch Mrázek
MQ
66
2
0
08 Apr 2024
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim
Byeongho Heo
Dongyoon Han
85
17
0
28 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
ViT
75
8
0
26 Mar 2024
On permutation-invariant neural networks
Masanari Kimura
Ryotaro Shimizu
Yuki Hirakawa
Ryosuke Goto
Yuki Saito
OOD
AAML
94
12
0
26 Mar 2024