Title |
---|
![]() Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang Chaojun Xiao Qiujieli Qin Yankai Lin Zhiyuan Zeng Xu Han Zhiyuan Liu Ruobing Xie Maosong Sun Jie Zhou |
![]() FactorLLM: Factorizing Knowledge via Mixture of Experts for Large
Language Models Zhongyu Zhao Menghang Dong Rongyu Zhang Wenzhao Zheng Yunpeng Zhang Huanrui Yang Dalong Du Kurt Keutzer Shanghang Zhang |