1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
arXiv: 2102.02888
4 February 2021
Hanlin Tang
Shaoduo Gan
A. A. Awan
Samyam Rajbhandari
Conglong Li
Xiangru Lian
Ji Liu
Ce Zhang
Yuxiong He
    AI4CE

Papers citing "1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed"

24 citing papers shown
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
18 Mar 2025
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen
Qiaobo Li
A. Banerjee
FedML
11 Nov 2024
Ordered Momentum for Asynchronous SGD
Chang-Wei Shi
Yi-Rui Yang
Wu-Jun Li
ODL
27 Jul 2024
Investigation of Energy-efficient AI Model Architectures and Compression Techniques for "Green" Fetal Brain Segmentation
Szymon Mazurek
M. Pytlarz
Sylwia Malec
A. Crimi
03 Apr 2024
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
18 Jun 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
07 Apr 2023
Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding
Zefa Hu
Xiuyi Chen
Hao Wu
Minglun Han
Ziyi Ni
Jing Shi
Shuang Xu
Bo Xu
02 Mar 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
27 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
24 Jan 2023
Federated Averaging Langevin Dynamics: Toward a unified theory and new algorithms
Vincent Plassier
Alain Durmus
Eric Moulines
FedML
31 Oct 2022
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Wenhan Xian
Feihu Huang
Heng-Chiao Huang
FedML
14 Oct 2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Jue Wang
Binhang Yuan
Luka Rimanic
Yongjun He
Tri Dao
Beidi Chen
Christopher Ré
Ce Zhang
AI4CE
02 Jun 2022
Communication-Efficient Adaptive Federated Learning
Yujia Wang
Lu Lin
Jinghui Chen
FedML
05 May 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan V. Oseledets
Olivier Beaumont
21 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
12 Feb 2022
Benchmark Assessment for DeepSpeed Optimization Library
G. Liang
I. Alsmadi
12 Feb 2022
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
01 Nov 2021
Order Optimal Bounds for One-Shot Federated Learning over non-Convex Loss Functions
Arsalan Sharifnassab
Saber Salehkaleybar
S. J. Golestani
FedML
19 Aug 2021
BAGUA: Scaling up Distributed Learning with System Relaxations
Shaoduo Gan
Xiangru Lian
Rui Wang
Jianbin Chang
Chengjun Liu
...
Jiawei Jiang
Binhang Yuan
Sen Yang
Ji Liu
Ce Zhang
03 Jul 2021
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
08 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
04 Jun 2021
Linearly Converging Error Compensated SGD
Eduard A. Gorbunov
D. Kovalev
Dmitry Makarenko
Peter Richtárik
23 Oct 2020
A new regret analysis for Adam-type algorithms
Ahmet Alacaoglu
Yura Malitsky
P. Mertikopoulos
V. Cevher
ODL
21 Mar 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
20 Apr 2018