arXiv: 2310.02410
Cited By
Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
Young Jin Kim, Raffy Fahim, Hany Awadalla
3 October 2023
MQ, MoE
Papers citing "Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness" (7 / 7 papers shown)
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, ..., Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai
MoE
16 May 2025
D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang, Qihua Zhou, Zicong Hong, Song Guo
MoE
17 Apr 2025
MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Yun Liang, Xiang Chen
MQ, MoE
27 Mar 2025
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo
04 Mar 2025
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang, Yuegui Huang, Zicong Hong, Yufeng Lyu, Wuhui Chen, Yue Yu, Fan Yu, Zibin Zheng
MoE
09 Feb 2025
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
24 Sep 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
MoE
22 Sep 2021