Why are Adaptive Methods Good for Attention Models?

6 December 2019

Sai Praneeth Karimireddy

Sashank J. Reddi

Papers citing "Why are Adaptive Methods Good for Attention Models?"

Title | Authors | Tags | Counts | Date
A Survey on Efficient Training of Transformers | Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen | - | 31 47 0 | 02 Feb 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting | Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, C. Yan, Yongdong Zhang | - | 32 53 0 | 19 Nov 2022
On the Impossible Safety of Large AI Models | El-Mahdi El-Mhamdi, Sadegh Farhadkhani, R. Guerraoui, Nirupam Gupta, L. Hoang, Rafael Pinot, Sébastien Rouault, John Stephan | - | 34 31 0 | 30 Sep 2022
Accelerated Federated Learning with Decoupled Adaptive Optimization | Jiayin Jin, Jiaxiang Ren, Yang Zhou, Lingjuan Lyu, Ji Liu, Dejing Dou | AI4CE FedML | 19 51 0 | 14 Jul 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range | Guoqiang Zhang, Kenta Niwa, W. Kleijn | ODL | 16 2 0 | 24 Mar 2022
Extending AdamW by Leveraging Its Second Moment and Magnitude | Guoqiang Zhang, Niwa Kenta, W. Kleijn | - | 6 3 0 | 09 Dec 2021
Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis | Jikai Jin, Samir Bhatt, Haiyang Wang, Liwei Wang | - | 32 48 0 | 24 Oct 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo | - | 33 89 0 | 25 Jun 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu | VLM ViT | 51 271 0 | 07 Apr 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) | Zhiyuan Li, Sadhika Malladi, Sanjeev Arora | - | 44 78 0 | 24 Feb 2021
Learning from History for Byzantine Robust Optimization | Sai Praneeth Karimireddy, Lie He, Martin Jaggi | FedML AAML | 30 173 0 | 18 Dec 2020
Personalized Cross-Silo Federated Learning on Non-IID Data | Yutao Huang, Lingyang Chu, Zirui Zhou, Lanjun Wang, Jiangchuan Liu, J. Pei, Yong Zhang | FedML | 20 591 0 | 07 Jul 2020
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks | Umut Simsekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu | - | 44 55 0 | 16 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning | Z. Yao, A. Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney | ODL | 39 275 0 | 01 Jun 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu | ViT | 50 436 0 | 02 Apr 2020
Adaptive Federated Optimization | Sashank J. Reddi, Zachary B. Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konecný, Sanjiv Kumar, H. B. McMahan | FedML | 43 1,393 0 | 29 Feb 2020
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh | ODL | 28 980 0 | 01 Apr 2019
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method | Simon Lacoste-Julien, Mark W. Schmidt, Francis R. Bach | - | 128 259 0 | 10 Dec 2012