1904.00962
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Github (1698★)
Papers citing
"Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
50 / 611 papers shown
Title
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
Novendra Setyawan
Ghufron Wahyu Kurniawan
Chi-Chia Sun
Jun-Wei Hsieh
Hui-Kai Su
W. Kuo
ViT
MoE
94
0
0
22 Mar 2024
PETScML: Second-order solvers for training regression problems in Scientific Machine Learning
Stefano Zampini
Umberto Zerbinati
George Turkiyyah
David E. Keyes
67
5
0
18 Mar 2024
VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation
Weiyao Wang
Yutian Lei
Shiyu Jin
Gregory D. Hager
Liangjun Zhang
92
3
0
18 Mar 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu
Shiyi Zhang
Ziwei Wang
Changliu Liu
Jiwen Lu
Yansong Tang
117
57
0
13 Mar 2024
Intra-video Positive Pairs in Self-Supervised Learning for Ultrasound
Blake Vanberlo
Alexander Wong
Jesse Hoey
R. Arntfield
71
2
0
12 Mar 2024
A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing
Yu Wang
Wen Qu
92
0
0
04 Mar 2024
Never-Ending Behavior-Cloning Agent for Robotic Manipulation
Wenqi Liang
Gan Sun
Qian He
Yu Ren
Jiahua Dong
Yang Cong
LM&Ro
90
1
0
01 Mar 2024
Pre-training Differentially Private Models with Limited Public Data
Zhiqi Bu
Xinwei Zhang
Mingyi Hong
Sheng Zha
George Karypis
121
4
0
28 Feb 2024
Stable LM 2 1.6B Technical Report
Marco Bellagente
J. Tow
Dakota Mahan
Duy Phung
Maksym Zhuravinskyi
...
Paulo Rocha
Harry Saini
H. Teufel
Niccolò Zanichelli
Carlos Riquelme
OSLM
109
58
0
27 Feb 2024
Towards Optimal Learning of Language Models
Yuxian Gu
Li Dong
Y. Hao
Qingxiu Dong
Minlie Huang
Furu Wei
106
7
0
27 Feb 2024
Pfeed: Generating near real-time personalized feeds using precomputed embedding similarities
B. Gebre
Karoliina Ranta
S. V. D. Elzen
Ernst Kuiper
Thijs Baars
Tom Heskes
82
1
0
25 Feb 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang
Yanghua Peng
Yinmin Zhong
Qi Huang
Yangrui Chen
...
Zhe Li
X. Jia
Jia-jun Ye
Xin Jin
Xin Liu
LRM
126
124
0
23 Feb 2024
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao
Sizhe Dang
Haishan Ye
Guang Dai
Yi Qian
Ivor W. Tsang
179
13
0
23 Feb 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace
Aliaksandr Siarohin
Ivan Skorokhodov
Ekaterina Deyneka
Tsai-Shien Chen
...
Yuwei Fang
A. Stoliar
Elisa Ricci
Jian Ren
Sergey Tulyakov
VGen
136
62
0
22 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
118
4
0
19 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
85
6
0
17 Feb 2024
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Siyuan Li
Zicheng Liu
Juanxi Tian
Ge Wang
Zedong Wang
...
Cheng Tan
Tao Lin
Yang Liu
Baigui Sun
Stan Z. Li
66
6
0
14 Feb 2024
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks
Daniel Beaglehole
Ioannis Mitliagkas
Atish Agarwala
MLT
99
2
0
07 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
139
49
0
06 Feb 2024
Breaking MLPerf Training: A Case Study on Optimizing BERT
Yongdeok Kim
Jaehyung Ahn
Myeongwoo Kim
Changin Choi
Heejae Kim
...
Xiongzhan Linghu
Jingkun Ma
Lin Chen
Yuehua Dai
Sungjoo Yoo
65
0
0
04 Feb 2024
ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data
Carmen Martin-Turrero
Maxence Bouvier
Manuel Breitenstein
Pietro Zanuttigh
Vincent Parret
83
4
0
02 Feb 2024
Comparative Study of Large Language Model Architectures on Frontier
Shantia Yarahmadian
A. Bose
Guojing Cong
Richard Yamada
Quentin Anthony
ELM
83
7
0
01 Feb 2024
Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again
Hugo Thimonier
Fabrice Popineau
Arpad Rimmel
Bich-Liên Doan
94
2
0
30 Jan 2024
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Qingpei Guo
Furong Xu
Hanxiao Zhang
Wang Ren
Ziping Ma
Lin Ju
Jian Wang
Jingdong Chen
Ming Yang
VLM
MLLM
68
3
0
29 Jan 2024
TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation
Wei Chen
Nicolás Rojas
98
7
0
24 Jan 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
Branislav Kveton
Volkan Cevher
101
4
0
17 Jan 2024
GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar
AAML
81
3
0
16 Jan 2024
(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum
Anh Dang
Reza Babanezhad
Sharan Vaswani
67
0
0
12 Jan 2024
FourCastNeXt: Optimizing FourCastNet Training for Limited Compute
Edison Guo
Maruf Ahmed
Yue Sun
Rui Yang
Harrison Cook
Tennessee Leeuwenburg
Ben Evans
45
1
0
10 Jan 2024
Robust Calibration For Improved Weather Prediction Under Distributional Shift
Sankalp Gilda
Neel Bhandari
Wendy Mak
Andrea Panizza
UQCV
OOD
38
1
0
08 Jan 2024
Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization
Min-Kook Suh
Seung-Woo Seo
ODL
74
0
0
06 Jan 2024
Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices
A. Menon
Unnikrishnan Menon
Kailash Ahirwar
66
1
0
03 Jan 2024
Noise-free Optimization in Early Training Steps for Image Super-Resolution
MinKyu Lee
Jae-Pil Heo
69
5
0
29 Dec 2023
Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
Zixian Guo
Yuxiang Wei
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Yiwen Guo
Wangmeng Zuo
VLM
104
9
0
26 Dec 2023
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Boyao Wang
Yuxing Liu
Xiaoyu Wang
Tong Zhang
49
5
0
22 Dec 2023
Critic-Guided Decision Transformer for Offline Reinforcement Learning
Yuanfu Wang
Chao Yang
Yinghong Wen
Yu Liu
Yu Qiao
OffRL
104
12
0
21 Dec 2023
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
Alexander Nikulin
Vladislav Kurenkov
Ilya Zisman
Artem Agarkov
Viacheslav Sinii
Sergey Kolesnikov
125
30
0
19 Dec 2023
Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?
Ibrahim Bouabdallaoui
Fatima Guerouate
Samya Bouhaddour
C. Saadi
Mohammed Sbihi
57
0
0
12 Dec 2023
RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations
Kouzhiqiang Yucheng Xie
Jing Wang
Yuheng Jia
Boyu Shi
Xin Geng
61
1
0
11 Dec 2023
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras
M. Aittala
J. Lehtinen
Janne Hellsten
Timo Aila
S. Laine
153
204
0
05 Dec 2023
Industrial Internet of Things Intelligence Empowering Smart Manufacturing: A Literature Review
Yujiao Hu
Qingmin Jia
Yuao Yao
Yong Lee
Mengjie Lee
Chenyi Wang
Xiaomao Zhou
Renchao Xie
Feng Yu
76
44
0
02 Dec 2023
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou
Tianyu Pang
Keqin Liu
Charles H. Martin
Michael W. Mahoney
Yaoqing Yang
148
12
0
01 Dec 2023
Generalisable Agents for Neural Network Optimisation
Kale-ab Tessera
C. Tilbury
Sasha Abramowitz
Ruan de Kock
Omayma Mahjoub
Benjamin Rosman
Sara Hooker
Arnu Pretorius
AI4CE
74
0
0
30 Nov 2023
RETSim: Resilient and Efficient Text Similarity
Marina Zhang
Owen Vallis
Aysegul Bumin
Tanay Vakharia
Elie Bursztein
136
1
0
28 Nov 2023
Recognizing Conditional Causal Relationships about Emotions and Their Corresponding Conditions
Xinhong Chen
Zongxi Li
Yaowei Wang
Haoran Xie
Jianping Wang
Qing Li
49
0
0
28 Nov 2023
Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
Nikola Popovic
Dimitrios Christodoulou
D. Paudel
Xi Wang
Luc Van Gool
94
0
0
20 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
86
3
0
15 Nov 2023
ViR: Towards Efficient Vision Retention Backbones
Ali Hatamizadeh
Michael Ranzinger
Shiyi Lan
Jose M. Alvarez
Sanja Fidler
Jan Kautz
GNN
40
2
0
30 Oct 2023
Deep-learning-based decomposition of overlapping-sparse images: application at the vertex of neutrino interactions
Saúl Alonso-Monsalve
D. Sgalaberna
Xingyu Zhao
Adrien Molines
C. Mcgrew
A. Rubbia
70
0
0
30 Oct 2023
On the accuracy and efficiency of group-wise clipping in differentially private optimization
Zhiqi Bu
Ruixuan Liu
Yu Wang
Sheng Zha
George Karypis
VLM
78
4
0
30 Oct 2023