1904.00962
Cited By
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Github (1698★)
Papers citing
"Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
50 / 611 papers shown
Title
Conciseness: An Overlooked Language Task
Felix Stahlberg
Aashish Kumar
Chris Alberti
Shankar Kumar
47
1
0
08 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
112
65
0
07 Nov 2022
Adaptive Compression for Communication-Efficient Distributed Training
Maksim Makarenko
Elnur Gasanov
Rustem Islamov
Abdurakhmon Sadiev
Peter Richtárik
127
16
0
31 Oct 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
110
171
0
24 Oct 2022
TridentSE: Guiding Speech Enhancement with 32 Global Tokens
Dacheng Yin
Zhiyuan Zhao
Chuanxin Tang
Zhiwei Xiong
Chong Luo
97
17
0
24 Oct 2022
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale
Ran Tian
Ankur P. Parikh
ODL
88
6
0
21 Oct 2022
Solving Reasoning Tasks with a Slot Transformer
Ryan Faulkner
Daniel Zoran
LRM
56
1
0
20 Oct 2022
Large-batch Optimization for Dense Visual Predictions
Zeyue Xue
Jianming Liang
Guanglu Song
Zhuofan Zong
Liang Chen
Yu Liu
Ping Luo
VLM
98
9
0
20 Oct 2022
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Michael G. Rabbat
Ari S. Morcos
115
24
0
19 Oct 2022
Perceptual Grouping in Contrastive Vision-Language Models
Kanchana Ranasinghe
Brandon McKinzie
S. S. Ravi
Yinfei Yang
Alexander Toshev
Jonathon Shlens
VLM
142
55
0
18 Oct 2022
Learning image representations for anomaly detection: application to discovery of histological alterations in drug development
I. Zingman
B. Stierstorfer
C. Lempp
Fabian Heinemann
OOD
MedIm
101
12
0
14 Oct 2022
Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
Yanjing Li
Sheng Xu
Baochang Zhang
Xianbin Cao
Penglei Gao
Guodong Guo
MQ
ViT
111
95
0
13 Oct 2022
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
Qinqing Zheng
Mikael Henaff
Brandon Amos
Aditya Grover
OffRL
95
22
0
12 Oct 2022
VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement
Erik Wijmans
Irfan Essa
Dhruv Batra
OffRL
120
14
0
11 Oct 2022
That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation
Abitha Thankaraj
Lerrel Pinto
73
17
0
03 Oct 2022
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training
Tianyu Huang
Bowen Dong
Yunhan Yang
Xiaoshui Huang
Rynson W. H. Lau
Wanli Ouyang
W. Zuo
VLM
3DPC
CLIP
148
150
0
03 Oct 2022
SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
Florian Langer
Gwangbin Bae
Ignas Budvytis
R. Cipolla
3DPC
92
12
0
03 Oct 2022
A patch-based architecture for multi-label classification from single label annotations
Warren Jouanneau
Aurélie Bugeau
Marc Palyart
Nicolas Papadakis
Laurent Vézard
85
0
0
14 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
304
501
0
12 Sep 2022
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Rohan Anil
S. Gadanho
Danya Huang
Nijith Jacob
Zhuoshu Li
...
Cristina Pop
Kevin Regan
G. Shamir
Rakesh Shivanna
Qiqi Yan
3DV
96
42
0
12 Sep 2022
Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models
Jared Lichtarge
Chris Alberti
Shankar Kumar
88
4
0
10 Sep 2022
Blessing of Class Diversity in Pre-training
Yulai Zhao
Jianshu Chen
S. Du
AI4CE
100
3
0
07 Sep 2022
Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting
Espen Haugsdal
Erlend Aune
M. Ruocco
AI4TS
AI4CE
32
17
0
30 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
98
170
0
13 Aug 2022
Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Arjun Ashok
K. J. Joseph
V. Balasubramanian
CLL
61
29
0
07 Aug 2022
LATTE: LAnguage Trajectory TransformEr
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Sai H. Vemprala
Rogerio Bonatti
LM&Ro
146
59
0
04 Aug 2022
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
Mahdi Saleh
Yige Wang
Nassir Navab
Benjamin Busam
F. Tombari
3DPC
69
4
0
31 Jul 2022
Retrieval-Augmented Transformer for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
90
59
0
26 Jul 2022
Analysis and Optimization of GNN-Based Recommender Systems on Persistent Memory
Yuwei Hu
Jiajie Li
Zhongming Yu
Zhiru Zhang
GNN
83
0
0
25 Jul 2022
Dive into Big Model Training
Qinghua Liu
Yuxiang Jiang
MoMe
AI4CE
LRM
41
3
0
25 Jul 2022
Training Transformers Together
Alexander Borzunov
Max Ryabinin
Tim Dettmers
Quentin Lhoest
Lucile Saulnier
Michael Diskin
Yacine Jernite
Thomas Wolf
ViT
63
10
0
07 Jul 2022
Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation
Shahzad Ali
Arif Mahmood
Soon Ki Jung
MedIm
39
8
0
06 Jul 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang
Shaoshuai Shi
Wei Wang
Yue Liu
72
10
0
30 Jun 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Dacheng Yin
Chuanxin Tang
Yanqing Liu
Xiaoqiang Wang
Zhiyuan Zhao
Yucheng Zhao
Zhiwei Xiong
Sheng Zhao
Chong Luo
83
12
0
28 Jun 2022
AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He
Darshil Doshi
Andrey Gromov
68
4
0
27 Jun 2022
Deep Learning Models on CPUs: A Methodology for Efficient Training
Quchen Fu
Ramesh Chukka
Keith Achorn
Thomas Atta-fosu
Deepak R. Canchi
Zhongwei Teng
Jules White
Douglas C. Schmidt
70
2
0
20 Jun 2022
Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
Zhiqi Bu
Yu Wang
Sheng Zha
George Karypis
147
72
0
14 Jun 2022
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale
Gaoyuan Zhang
Songtao Lu
Yihua Zhang
Xiangyi Chen
Pin-Yu Chen
Quanfu Fan
Lee Martie
L. Horesh
Min-Fong Hong
Sijia Liu
OOD
84
12
0
13 Jun 2022
Multi-user Co-inference with Batch Processing Capable Edge Server
Wenqi Shi
Sheng Zhou
Z. Niu
Miao Jiang
Lu Geng
117
26
0
03 Jun 2022
Positive Unlabeled Contrastive Learning
Anish Acharya
Sujay Sanghavi
Li Jing
Bhargav Bhushanam
Dhruv Choudhary
Michael G. Rabbat
Inderjit Dhillon
SSL
65
11
0
01 Jun 2022
Dataset Distillation using Neural Feature Regression
Yongchao Zhou
E. Nezhadarya
Jimmy Ba
DD
FedML
130
161
0
01 Jun 2022
Hopular: Modern Hopfield Networks for Tabular Data
Bernhard Schafl
Lukas Gruber
Angela Bitto-Nemling
Sepp Hochreiter
LMTD
71
29
0
01 Jun 2022
Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
Eduard A. Gorbunov
Samuel Horváth
Peter Richtárik
Gauthier Gidel
AAML
63
0
0
01 Jun 2022
Efficient-Adam: Communication-Efficient Distributed Adam
Congliang Chen
Li Shen
Wei Liu
Zhi-Quan Luo
58
20
0
28 May 2022
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN
Siyuan Li
Di Wu
Fang Wu
Lei Shang
Stan Z. Li
84
50
0
27 May 2022
TransBoost: Improving the Best ImageNet Performance using Deep Transduction
Omer Belhasin
Guy Bar-Shalom
Ran El-Yaniv
ViT
120
3
0
26 May 2022
Trainable Weight Averaging: Accelerating Training and Improving Generalization
Tao Li
Zhehao Huang
Yingwen Wu
Zhengbao He
Qinghua Tao
Xiaolin Huang
Chih-Jen Lin
MoMe
119
3
0
26 May 2022
Amortized Inference for Causal Structure Learning
Lars Lorch
Scott Sussex
Jonas Rothfuss
Andreas Krause
Bernhard Schölkopf
CML
130
65
0
25 May 2022
GBA: A Tuning-free Approach to Switch between Synchronous and Asynchronous Training for Recommendation Model
Wenbo Su
Yuanxing Zhang
Yufeng Cai
Kaixu Ren
Pengjie Wang
...
Jing Chen
Hongbo Deng
Jian Xu
Lin Qu
Bo Zheng
66
5
0
23 May 2022
Life after BERT: What do Other Muppets Understand about Language?
Vladislav Lialin
Kevin Zhao
Namrata Shivagunde
Anna Rumshisky
110
6
0
21 May 2022