Towards More Effective and Economic Sparsely-Activated Model (arXiv:2110.07431)

14 October 2021
Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zi-Han Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao
MoE

Papers citing "Towards More Effective and Economic Sparsely-Activated Model"

13 / 13 papers shown
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
MoE · 02 Apr 2025

Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
06 Mar 2025

Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
Sukwon Yun, Inyoung Choi, Jie Peng, Yangfan Wu, J. Bao, Qiyiwen Zhang, Jiayi Xin, Qi Long, Tianlong Chen
MoE · 10 Oct 2024

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation
Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, ..., Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang
MoMe · MoE · 27 Dec 2023

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen
VLM · 03 Dec 2023

Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang
29 Sep 2023

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang
VLM · 06 Jun 2023

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
Shiwei Liu, Tianlong Chen, Zhenyu (Allen) Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang
03 Mar 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
Tianlong Chen, Zhenyu (Allen) Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang
MoE · 02 Mar 2023

HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System
Xiaonan Nie, Pinxue Zhao, Xupeng Miao, Tong Zhao, Bin Cui
MoE · 28 Mar 2022

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019

Language Models as Knowledge Bases?
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
KELM · AI4MH · 03 Sep 2019