1904.00962
Cited By
v5 (latest)
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Github (1698★)
Papers citing
"Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
50 / 611 papers shown
Title
Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu
Quan-Wu Xiao
Tayfun Gokmen
H. Tsai
Kaoutar El Maghraoui
Tianyi Chen
74
1
0
19 Oct 2024
LecPrompt: A Prompt-based Approach for Logical Error Correction with CodeBERT
Zhenyu Xu
Victor S. Sheng
KELM
86
0
0
10 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
95
0
0
08 Oct 2024
A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte
Ryan Boustany
Edouard Pauwels
Andrei Purica
ODL
72
0
0
08 Oct 2024
Autoregressive Action Sequence Learning for Robotic Manipulation
Xinyu Zhang
Yuhan Liu
Haonan Chang
Liam Schramm
Abdeslam Boularias
161
17
0
04 Oct 2024
Face Forgery Detection with Elaborate Backbone
Zonghui Guo
Y. Liu
Jie Zhang
Haiyong Zheng
Shiguang Shan
AAML
CVBM
107
1
0
25 Sep 2024
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
Yinpei Dai
Jayjun Lee
Nima Fazeli
Joyce Chai
76
13
0
23 Sep 2024
Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
Hinata Harada
Hideaki Iiduka
65
1
0
16 Sep 2024
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
96
13
0
05 Sep 2024
Demystifying the Communication Characteristics for Distributed Transformer Models
Quentin G. Anthony
Benjamin Michalowicz
Jacob Hatef
Lang Xu
Mustafa Abduljabbar
Hari Subramoni
D. Panda
AI4CE
52
2
0
19 Aug 2024
Narrowing the Focus: Learned Optimizers for Pretrained Models
Gus Kristiansen
Mark Sandler
A. Zhmoginov
Nolan Miller
Anirudh Goyal
Jihwan Lee
Max Vladymyrov
92
1
0
17 Aug 2024
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
AI4CE
125
2
0
01 Aug 2024
Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling
Junn Yong Loo
Michelle Adeline
Arghya Pal
Vishnu Monn Baskaran
Chee-Ming Ting
Raphaël C.-W. Phan
DiffM
83
0
0
21 Jul 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
207
50
0
17 Jul 2024
Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data
Motoshige Sato
Kenichi Tomeoka
Ilya Horiguchi
Kai Arulkumaran
Ryota Kanai
Shuntaro Sasai
134
6
0
10 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
160
74
0
10 Jul 2024
VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation
I-Chun Arthur Liu
Sicheng He
Daniel Seita
Gaurav Sukhatme
LM&Ro
93
13
0
04 Jul 2024
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei
Fanjiang Ye
Ori Yonay
Xingyu Chen
Baixi Sun
Dingwen Tao
Tianbao Yang
VLM
CLIP
139
2
0
01 Jul 2024
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao
Haoyang Weng
Daohan Lu
Ang Li
Jinyang Li
Aurojit Panda
Saining Xie
3DGS
86
16
0
26 Jun 2024
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation
Shengyi Qian
Kaichun Mo
Valts Blukis
David Fouhey
Dieter Fox
Ankit Goyal
85
3
0
26 Jun 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
Ruoyu Sun
141
58
0
24 Jun 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Boyao Wang
Tong Zhang
72
6
0
21 Jun 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
92
1
0
20 Jun 2024
Federating to Grow Transformers with Constrained Resources without Model Sharing
Shikun Shen
Yifei Zou
Yuan Yuan
Yanwei Zheng
Peng Li
Xiuzhen Cheng
Dongxiao Yu
84
0
0
19 Jun 2024
Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation
Teli Ma
Jiaming Zhou
Zifan Wang
Ronghe Qiu
Junwei Liang
90
11
0
14 Jun 2024
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
VGen
79
10
0
12 Jun 2024
Population Transformer: Learning Population-level Representations of Neural Activity
Geeling Chau
Christopher Wang
Sabera Talukder
Vighnesh Subramaniam
Saraswati Soedarmadji
Yisong Yue
Boris Katz
Andrei Barbu
MedIm
164
6
0
05 Jun 2024
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour
Victor Besnier
Vicky Kalogeiton
David Picard
DiffM
141
2
0
30 May 2024
Multi-Modal Generative Embedding Model
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
71
4
0
29 May 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
Jihao Liu
Jinliang Zheng
Boxiao Liu
Yu Liu
Hongsheng Li
CLIP
56
0
0
29 May 2024
Full-Stack Allreduce on Multi-Rail Networks
Enda Yu
Dezun Dong
Xiangke Liao
GNN
77
0
0
28 May 2024
Interpretable Robotic Manipulation from Language
Boyuan Zheng
Jianlong Zhou
Fang Chen
LM&Ro
85
0
0
27 May 2024
Integrating GNN and Neural ODEs for Estimating Two-Body Interactions in Mixed-Species Collective Motion
Masahito Uwamichi
S. Schnyder
Tetsuya J. Kobayashi
Satoshi Sawai
62
0
0
26 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
220
3
0
26 May 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
Vladimir Malinovskii
Denis Mazur
Ivan Ilin
Denis Kuznedelev
Konstantin Burlachenko
Kai Yi
Dan Alistarh
Peter Richtárik
MQ
118
24
0
23 May 2024
Neural Pfaffians: Solving Many Many-Electron Schrödinger Equations
Nicholas Gao
Stephan Günnemann
90
5
0
23 May 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Shuaipeng Li
Penghao Zhao
Hailin Zhang
Xingwu Sun
Hao Wu
...
Zheng Fang
Jinbao Xue
Yangyu Tao
Tengjiao Wang
Di Wang
98
9
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
95
12
0
21 May 2024
Keep It Private: Unsupervised Privatization of Online Text
Calvin Bao
Marine Carpuat
DeLMO
89
3
0
16 May 2024
Random Scaling and Momentum for Non-smooth Non-convex Optimization
Qinzi Zhang
Ashok Cutkosky
67
4
0
16 May 2024
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training
Yulin Wang
Yang Yue
Rui Lu
Yizeng Han
Shiji Song
Gao Huang
VLM
117
12
0
14 May 2024
Reinformer: Max-Return Sequence Modeling for Offline RL
Zifeng Zhuang
Dengyun Peng
Jinxin Liu
Ziqi Zhang
Donglin Wang
OffRL
AI4TS
111
14
0
14 May 2024
Achieving Resolution-Agnostic DNN-based Image Watermarking: A Novel Perspective of Implicit Neural Representation
Yuchen Wang
Xin-Shan Zhu
Guanhui Ye
Shiyao Zhang
Xuetao Wei
AAML
138
0
0
14 May 2024
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang
Dong In Lee
MinHyuk Jang
Jong Wook Kim
Feng Yang
Sangpil Kim
119
15
0
03 May 2024
On Replacing Cryptopuzzles with Useful Computation in Blockchain Proof-of-Work Protocols
Andrea Merlina
Thiago Garrett
Roman Vitenberg
72
0
0
24 Apr 2024
GhostNetV3: Exploring the Training Strategies for Compact Models
Zhenhua Liu
Zhiwei Hao
Kai Han
Yehui Tang
Yunhe Wang
78
17
0
17 Apr 2024
Homography Guided Temporal Fusion for Road Line and Marking Segmentation
Shan Wang
Chuong H. Nguyen
Jiawei Liu
Kaihao Zhang
Wenhan Luo
Yanhao Zhang
Sundaram Muthu
F. A. Maken
Hongdong Li
67
5
0
11 Apr 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Victor C. M. Leung
Yanyi Guo
Xiping Hu
GNN
103
8
0
09 Apr 2024
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Boyao Wang
Xiang Liu
Shizhe Diao
Renjie Pi
Jipeng Zhang
Chi Han
Tong Zhang
106
55
0
26 Mar 2024
All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction
Shuheng Fang
Kangfei Zhao
Yu Rong
Zhixun Li
Jeffrey Xu Yu
103
0
0
26 Mar 2024