ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1912.03194
  4. Cited By
Why are Adaptive Methods Good for Attention Models?

Why are Adaptive Methods Good for Attention Models?

6 December 2019
J.N. Zhang
Sai Praneeth Karimireddy
Andreas Veit
Seungyeon Kim
Sashank J. Reddi
Surinder Kumar
S. Sra
ArXivPDFHTML

Papers citing "Why are Adaptive Methods Good for Attention Models?"

18 / 18 papers shown
Title
A Survey on Efficient Training of Transformers
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for
  Scene Text Spotting
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
Shancheng Fang
Zhendong Mao
Hongtao Xie
Yuxin Wang
C. Yan
Yongdong Zhang
32
53
0
19 Nov 2022
On the Impossible Safety of Large AI Models
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
34
31
0
30 Sep 2022
Accelerated Federated Learning with Decoupled Adaptive Optimization
Accelerated Federated Learning with Decoupled Adaptive Optimization
Jiayin Jin
Jiaxiang Ren
Yang Zhou
Lingjuan Lyu
Ji Liu
Dejing Dou
AI4CE
FedML
19
51
0
14 Jul 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the
  Adaptive Stepsize Range
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
16
2
0
24 Mar 2022
Extending AdamW by Leveraging Its Second Moment and Magnitude
Extending AdamW by Leveraging Its Second Moment and Magnitude
Guoqiang Zhang
Niwa Kenta
W. Kleijn
6
3
0
09 Dec 2021
Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis
Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis
Jikai Jin
Samir Bhatt
Haiyang Wang
Liwei Wang
32
48
0
24 Oct 2021
Probing Inter-modality: Visual Parsing with Self-Attention for
  Vision-Language Pre-training
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
33
89
0
25 Jun 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language
  Representation Learning
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
51
271
0
07 Apr 2021
On the Validity of Modeling SGD with Stochastic Differential Equations
  (SDEs)
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
Learning from History for Byzantine Robust Optimization
Learning from History for Byzantine Robust Optimization
Sai Praneeth Karimireddy
Lie He
Martin Jaggi
FedML
AAML
30
173
0
18 Dec 2020
Personalized Cross-Silo Federated Learning on Non-IID Data
Personalized Cross-Silo Federated Learning on Non-IID Data
Yutao Huang
Lingyang Chu
Zirui Zhou
Lanjun Wang
Jiangchuan Liu
J. Pei
Yong Zhang
FedML
20
591
0
07 Jul 2020
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
Umut Simsekli
Ozan Sener
George Deligiannidis
Murat A. Erdogdu
44
55
0
16 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
39
275
0
01 Jun 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal
  Transformers
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
50
436
0
02 Apr 2020
Adaptive Federated Optimization
Adaptive Federated Optimization
Sashank J. Reddi
Zachary B. Charles
Manzil Zaheer
Zachary Garrett
Keith Rush
Jakub Konecný
Sanjiv Kumar
H. B. McMahan
FedML
43
1,393
0
29 Feb 2020
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
28
980
0
01 Apr 2019
A simpler approach to obtaining an O(1/t) convergence rate for the
  projected stochastic subgradient method
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien
Mark W. Schmidt
Francis R. Bach
128
259
0
10 Dec 2012
1