TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network

2 June 2025
Authors: Guangxin He, Yuan Cao, Yutong He, Tianyi Bai, Kun Yuan, Binhang Yuan
Community: MQ
Links: arXiv (abs) · PDF · HTML
Length: main 11 pages, 5 figures, 3 tables; bibliography 4 pages; appendix 1 page
Abstract

Decentralized training of large language models offers the opportunity to pool computational resources across geographically distributed participants but faces significant network communication bottlenecks, particularly in pipeline-parallel settings. While pipeline parallelism partitions model layers across devices to handle large-scale models, it necessitates frequent communication of intermediate activations, creating challenges when network bandwidth is limited. Existing activation compression methods, such as AQ-SGD, mitigate quantization-induced errors through error compensation but impose prohibitive memory overhead by requiring storage of previous activations. To address these issues, we introduce TAH-Quant (Tile-wise Adaptive Hadamard Quantization), a novel activation quantization framework designed specifically for pipeline parallelism. Our approach integrates fine-grained tile-wise quantization for precise control, entropy-guided token-level adaptive bit allocation for optimal bit usage, and a Hadamard-based transform with pivot element swapping to effectively suppress quantization outliers. We further provide a theoretical analysis, proving that pipeline-parallel training equipped with TAH-Quant maintains a convergence rate of $\mathcal{O}(1/\sqrt{T})$, matching that of vanilla stochastic gradient descent. Extensive experiments on diverse LLM tasks demonstrate that TAH-Quant achieves aggressive activation quantization ratios (3-4 bits), providing up to 4.3× end-to-end speedup without compromising training convergence; it matches state-of-the-art methods, incurs no extra memory overhead, and generalizes well across different training scenarios.
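Only the abstract is available here, so the sketch below is a loose illustration of the general recipe it describes, not the authors' implementation: rotate each tile of activations with a Hadamard transform so outliers are spread out, choose a per-token bit-width with a simple entropy rule, then quantize each tile uniformly and invert the rotation on dequantization. The tile size (64), the candidate bit-widths {3, 4}, the entropy threshold, and the helper names (hadamard, token_bits, tah_quant_sketch) are all assumptions, and the paper's pivot element swapping is omitted.

# Minimal illustrative sketch of tile-wise Hadamard activation quantization.
# Assumptions (not from the paper): tile size 64, bit-widths {3, 4},
# a simple per-token entropy threshold, and no pivot element swapping.
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal, so H @ H.T = I

def token_bits(x_row: np.ndarray, threshold: float = 2.0) -> int:
    """Illustrative entropy-guided bit allocation: higher-entropy tokens get 4 bits."""
    counts, _ = np.histogram(np.abs(x_row), bins=16)
    p = counts / max(counts.sum(), 1)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return 4 if entropy > threshold else 3

def quantize_tile(tile: np.ndarray, bits: int):
    """Symmetric uniform quantization of one tile."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(tile).max() / qmax + 1e-12
    q = np.clip(np.round(tile / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def tah_quant_sketch(act: np.ndarray, tile: int = 64) -> np.ndarray:
    """Quantize-dequantize activations [tokens, hidden] tile by tile after a Hadamard rotation."""
    tokens, hidden = act.shape
    assert hidden % tile == 0
    H = hadamard(tile)
    out = np.empty_like(act)
    for t in range(tokens):
        bits = token_bits(act[t])                    # per-token adaptive bit-width
        for j in range(0, hidden, tile):
            rotated = act[t, j:j + tile] @ H         # spread outliers across the tile
            q, scale = quantize_tile(rotated, bits)
            out[t, j:j + tile] = (q * scale) @ H.T   # dequantize and undo the rotation
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    act = rng.standard_normal((8, 256)).astype(np.float32)
    act[:, 5] *= 20.0                                # inject an outlier channel
    rec = tah_quant_sketch(act)
    print("relative error:", np.linalg.norm(rec - act) / np.linalg.norm(act))

Because the Hadamard rotation is orthonormal, quantization error introduced in the rotated domain is not amplified when it is inverted, which is the intuition behind using such transforms to tame activation outliers before low-bit quantization.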

@article{he2025_2506.01352,
  title={TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network},
  author={Guangxin He and Yuan Cao and Yutong He and Tianyi Bai and Kun Yuan and Binhang Yuan},
  journal={arXiv preprint arXiv:2506.01352},
  year={2025}
}