1912.03194
Cited By
Why are Adaptive Methods Good for Attention Models?
6 December 2019
J.N. Zhang
Sai Praneeth Karimireddy
Andreas Veit
Seungyeon Kim
Sashank J. Reddi
Sanjiv Kumar
S. Sra
Papers citing "Why are Adaptive Methods Good for Attention Models?"
18 / 18 papers shown
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
Shancheng Fang
Zhendong Mao
Hongtao Xie
Yuxin Wang
C. Yan
Yongdong Zhang
32
53
0
19 Nov 2022
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
34
31
0
30 Sep 2022
Accelerated Federated Learning with Decoupled Adaptive Optimization
Jiayin Jin
Jiaxiang Ren
Yang Zhou
Lingjuan Lyu
Ji Liu
Dejing Dou
AI4CE
FedML
19
51
0
14 Jul 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
16
2
0
24 Mar 2022
Extending AdamW by Leveraging Its Second Moment and Magnitude
Guoqiang Zhang
Kenta Niwa
W. Kleijn
6
3
0
09 Dec 2021
Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis
Jikai Jin
Bohang Zhang
Haiyang Wang
Liwei Wang
32
48
0
24 Oct 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
33
89
0
25 Jun 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
51
271
0
07 Apr 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
Learning from History for Byzantine Robust Optimization
Sai Praneeth Karimireddy
Lie He
Martin Jaggi
FedML
AAML
30
173
0
18 Dec 2020
Personalized Cross-Silo Federated Learning on Non-IID Data
Yutao Huang
Lingyang Chu
Zirui Zhou
Lanjun Wang
Jiangchuan Liu
J. Pei
Yong Zhang
FedML
20
591
0
07 Jul 2020
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
Umut Simsekli
Ozan Sener
George Deligiannidis
Murat A. Erdogdu
44
55
0
16 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
39
275
0
01 Jun 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
50
436
0
02 Apr 2020
Adaptive Federated Optimization
Sashank J. Reddi
Zachary B. Charles
Manzil Zaheer
Zachary Garrett
Keith Rush
Jakub Konečný
Sanjiv Kumar
H. B. McMahan
FedML
43
1,393
0
29 Feb 2020
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
28
980
0
01 Apr 2019
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien
Mark W. Schmidt
Francis R. Bach
128
259
0
10 Dec 2012