Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

15 July 2024

Papers citing "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated"

Title | Authors | Tag | Counts | Date
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity | Guang Yan, Yuhui Zhang, Zimu Guo, Lutan Zhao, Xiaojun Chen, Chen Wang, Wenhao Wang, Dan Meng, Rui Hou | | 74 0 0 | 12 May 2025
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs | Hongyu Wang, Shuming Ma, Furu Wei | MQ | 96 4 0 | 25 Apr 2025
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations | Zican Dong, Han Peng, Peiyu Liu, Wayne Xin Zhao, Dong Wu, Feng Xiao, Ziyi Wang | MoE | 80 2 0 | 09 Apr 2025
MoH: Multi-Head Attention as Mixture-of-Head Attention | Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan | MoE | 101 17 0 | 15 Oct 2024