© 2025 ResearchTrend.AI, All rights reserved.

PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning
arXiv:2505.18563 · 24 May 2025
Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher

Papers citing "PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning"

31 papers shown
Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression
Wenchen Han, S. Vargaftik, Michael Mitzenmacher, Brad Karp, Ran Ben-Basat
01 Jul 2024
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen, Lingjuan Lyu
21 Mar 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang, Yanghua Peng, Yinmin Zhong, Qi Huang, Yangrui Chen, ..., Zhe Li, X. Jia, Jia-jun Ye, Xin Jin, Xin Liu
23 Feb 2024
Optimal and Near-Optimal Adaptive Vector Quantization
Ran Ben-Basat, Y. Ben-Itzhak, Michael Mitzenmacher, S. Vargaftik
05 Feb 2024
NetLLM: Adapting Large Language Models for Networking
Duo Wu, Xianda Wang, Yaqi Qiao, Zhi Wang, Junchen Jiang, Shuguang Cui, Fangxin Wang
04 Feb 2024
Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Daniele De Sensi, Tommaso Bonato, D. Saam, Torsten Hoefler
17 Jan 2024
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training
Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, B. Kailkhura, Sijia Liu
03 Oct 2023
Model Sparsity Can Simplify Machine Unlearning
Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu
11 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
27 Feb 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Minghao Li, Ran Ben-Basat, S. Vargaftik, Chon-In Lao, Ke Xu, Michael Mitzenmacher, Minlan Yu
16 Feb 2023
Distributed Pruning Towards Tiny Neural Networks in Federated Learning
Hong Huang, Lan Zhang, Chaoyue Sun, R. Fang, Xiaoyong Yuan, Dapeng Wu
05 Dec 2022
ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity
Xinchi Qiu, Javier Fernandez-Marques, Pedro Gusmão, Yan Gao, Titouan Parcollet, Nicholas D. Lane
04 Aug 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, ..., Blaise Aguera-Arcas, Claire Cui, M. Croak, Ed H. Chi, Quoc Le
20 Jan 2022
Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better
Sameer Bibikar, H. Vikalo, Zhangyang Wang, Xiaohan Chen
18 Dec 2021
When to Prune? A Policy towards Early Structural Pruning
Maying Shen, Pavlo Molchanov, Hongxu Yin, J. Álvarez
22 Oct 2021
EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks
Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye, Yabo Duan
18 Oct 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu
31 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
31 May 2019
Similarity of Neural Network Representations Revisited
Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey E. Hinton
01 May 2019
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin
09 Mar 2018
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017
TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
22 May 2017
Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji, Kenneth Heafield
17 Apr 2017
Pruning Filters for Efficient ConvNets
Hao Li, Asim Kadav, Igor Durdanovic, H. Samet, H. Graf
31 Aug 2016
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
10 Dec 2015
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, S. Divvala, Ross B. Girshick, Ali Farhadi
08 Jun 2015
Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, J. Tran, W. Dally
08 Jun 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
04 Sep 2014
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
01 Sep 2014