Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
    ODL

Papers citing "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"

50 / 611 papers shown
A Spectral Condition for Feature Learning
Greg Yang
James B. Simon
Jeremy Bernstein
126
33
0
26 Oct 2023
Transferring a molecular foundation model for polymer property predictions
Pei Zhang
Logan T. Kearney
D. Bhowmik
Zachary R. Fox
Amit K. Naskar
John P. Gounley
AI4CE
61
7
0
25 Oct 2023
A Unified, Scalable Framework for Neural Population Decoding
Mehdi Azabou
Vinam Arora
Venkataramana Ganesh
Ximeng Mao
Santosh Nachimuthu
Michael J. Mendelson
Blake A. Richards
M. Perich
Guillaume Lajoie
Eva L. Dyer
HAIAI4TS
85
42
0
24 Oct 2023
Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
Maximilian Krahn
Michele Sasdelli
Fengyi Yang
Vladislav Golyanik
Arno Solin
Tat-Jun Chin
Tolga Birdal
MQ
179
2
0
23 Oct 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
99
1
0
22 Oct 2023
Sparse Multi-Object Render-and-Compare
Florian Langer
Ignas Budvytis
Roberto Cipolla
3DPC
66
2
0
17 Oct 2023
MeKB-Rec: Personal Knowledge Graph Learning for Cross-Domain Recommendation
Xin Su
Yao Zhou
Zifei Shan
Qian Chen
65
0
0
17 Oct 2023
Reusing Pretrained Models by Multi-linear Operators for Efficient Training
Yu Pan
Ye Yuan
Yichun Yin
Zenglin Xu
Lifeng Shang
Xin Jiang
Qun Liu
95
17
0
16 Oct 2023
Attention-Map Augmentation for Hypercomplex Breast Cancer Classification
Eleonora Lopez
Filippo Betello
Federico Carmignani
Eleonora Grassucci
Danilo Comminiello
75
13
0
11 Oct 2023
Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing
Minh Ngoc Luu
Minh-Duong Nguyen
E. Bedeer
Van Duc Nguyen
D. Hoang
Diep N. Nguyen
Quoc-Viet Pham
67
3
0
11 Oct 2023
Rethinking Memory and Communication Cost for Efficient Large Language Model Training
Chan Wu
Hanxiao Zhang
Lin Ju
Jinjing Huang
Youshao Xiao
...
Siyuan Li
Fanzhuang Meng
Lei Liang
Xiaolu Zhang
Jun Zhou
57
4
0
09 Oct 2023
Can Pre-trained Networks Detect Familiar Out-of-Distribution Data?
Atsuyuki Miyai
Qing Yu
Go Irie
Kiyoharu Aizawa
OODD
211
6
0
02 Oct 2023
AI ensemble for signal detection of higher order gravitational wave modes of quasi-circular, spinning, non-precessing binary black hole mergers
Minyang Tian
Eliu A. Huerta
Huihuo Zheng
55
0
0
29 Sep 2023
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
Martin Pelikan
Sheikh Shams Azam
Vitaly Feldman
Jan Honza Silovsky
Kunal Talwar
Christopher G. Brinton
Tatiana Likhomanenko
113
8
0
29 Sep 2023
Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks
Yahong Yang
Qipin Chen
Wenrui Hao
72
5
0
26 Sep 2023
Transformer-based classification of user queries for medical consultancy with respect to expert specialization
Dmitry Lyutkin
A. Soloviev
Dmitry V. Zhukov
Denis Pozdnyakov
Muhammad Shahid Iqbal Malik
D. Ignatov
MedIm
71
0
0
26 Sep 2023
Distortion Resilience for Goal-Oriented Semantic Communication
Minh-Duong Nguyen
Quang-Vinh Do
Zhaohui Yang
Quoc-Viet Pham
Won Joo Hwang
53
1
0
26 Sep 2023
Revisiting LARS for Large Batch Training Generalization of Neural Networks
K. Do
Duong Nguyen
Hoa Nguyen
Long Tran-Thanh
Nguyen-Hoang Tran
Quoc-Viet Pham
AI4CEODL
91
1
0
25 Sep 2023
Audio classification with Dilated Convolution with Learnable Spacings
Ismail Khalfaoui-Hassani
T. Masquelier
Thomas Pellegrini
78
1
0
25 Sep 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
61
2
0
24 Sep 2023
Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES
Yohai-Eliel Berreby
L. Sauvage
AAML
45
2
0
22 Sep 2023
Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR
Sheikh Shams Azam
Tatiana Likhomanenko
Martin Pelikan
Jan Honza Silovsky
77
7
0
22 Sep 2023
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
Zhonglin Cao
Simone Sciabola
Ye Wang
84
1
0
20 Sep 2023
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism
Chengcheng Wang
Wei He
Ying Nie
Jianyuan Guo
Chuanjian Liu
Kai Han
Yunhe Wang
ObjD
131
245
0
20 Sep 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang
Shumin Han
Xiaodi Wang
Jing Hao
Xianbin Cao
Baochang Zhang
VLM
74
0
0
18 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
100
27
0
12 Sep 2023
PolyGET: Accelerating Polymer Simulations by Accurate and Generalizable Forcefield with Equivariant Transformer
Rui Feng
Huan Tran
Aubrey Toland
Binghong Chen
Qi Zhu
R. Ramprasad
Chao Zhang
31
1
0
01 Sep 2023
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
Yanjie Ze
Ge Yan
Yueh-hua Wu
Annabella Macaluso
Yuying Ge
Jianglong Ye
Nicklas Hansen
Li Erran Li
Xinyu Wang
DiffMAI4CE
112
86
0
31 Aug 2023
Multi-Objective Decision Transformers for Offline Reinforcement Learning
Abdelghani Ghanem
P. Ciblat
Mounir Ghogho
OffRL
63
1
0
31 Aug 2023
Breaking Boundaries: Distributed Domain Decomposition with Scalable Physics-Informed Neural PDE Solvers
Arthur Feeney
Zitong Li
Ramin Bostanabad
Aparna Chandramowlishwaran
AI4CE
58
1
0
28 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
92
20
0
23 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIPVLM
95
3
0
22 Aug 2023
CoNe: Contrast Your Neighbours for Supervised Image Classification
Mingkai Zheng
Shan You
Lang Huang
Xiu Su
Fei Wang
Chao Qian
Xiaogang Wang
Chang Xu
VLM
65
0
0
21 Aug 2023
Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events
Hoonhee Cho
Hyeonseong Kim
Yujeong Chae
Kuk-Jin Yoon
VLM
69
24
0
18 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
150
4
0
18 Aug 2023
Spanish Pre-trained BERT Model and Evaluation Data
J. Cañete
Gabriel Chaperon
Rodrigo Fuentes
Jou-Hui Ho
Hojin Kang
Jorge Pérez
102
667
0
06 Aug 2023
The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
92
9
0
27 Jul 2023
Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum
Amnon Geifman
Daniel Barzilai
Ronen Basri
Meirav Galun
89
6
0
26 Jul 2023
How to Scale Your EMA
Dan Busbridge
Jason Ramapuram
Pierre Ablin
Tatiana Likhomanenko
Eeshan Gunesh Dhekane
Xavier Suau
Russ Webb
82
19
0
25 Jul 2023
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Cheng Han
Qifan Wang
Yiming Cui
Zhiwen Cao
Wenguan Wang
Siyuan Qi
Dongfang Liu
VPVLMVLM
92
55
0
25 Jul 2023
M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization
Che Liu
Sibo Cheng
Chong Chen
Mengyun Qiao
Weitong Zhang
Anand Shah
Wenjia Bai
Rossella Arcucci
VLM
124
58
0
17 Jul 2023
Image Captions are Natural Prompts for Text-to-Image Models
Shiye Lei
Hao Chen
Senyang Zhang
Bo Zhao
Dacheng Tao
VLM
115
23
0
17 Jul 2023
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
Daiqing Li
Huan Ling
Amlan Kar
David Acuna
Seung Wook Kim
Karsten Kreis
Antonio Torralba
Sanja Fidler
VLMDiffM
85
29
0
14 Jul 2023
Training Physics-Informed Neural Networks via Multi-Task Optimization for Traffic Density Prediction
Bo Wang
A. K. Qin
S. Shafiei
Hussein Dia
Adriana-Simona Mihaita
Hanna Grzybowska
PINNAI4CE
75
2
0
08 Jul 2023
URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates
Michael Kirchhof
Bálint Mucsányi
Seong Joon Oh
Enkelejda Kasneci
UQCV
502
15
0
07 Jul 2023
CAME: Confidence-guided Adaptive Memory Efficient Optimization
Yang Luo
Xiaozhe Ren
Zangwei Zheng
Zhuo Jiang
Xin Jiang
Yang You
ODL
94
22
0
05 Jul 2023
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
Z. Chen
Mingyu Ding
Songlin Yang
Wei Zhan
Masayoshi Tomizuka
Erik Learned-Miller
Chuang Gan
MoE
67
8
0
29 Jun 2023
Training Deep Surrogate Models with Large Scale Online Learning
Lucas Meyer
M. Schouler
R. Caulk
Alejandro Ribés
Bruno Raffin
3DGSAI4CE
91
5
0
28 Jun 2023
RVT: Robotic View Transformer for 3D Object Manipulation
Ankit Goyal
Jie Xu
Yijie Guo
Valts Blukis
Yu-Wei Chao
Dieter Fox
LM&Ro
139
144
0
26 Jun 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
142
8
0
26 Jun 2023