Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
arXiv:2501.11873 · 21 January 2025
Zihan Qiu, Zeyu Huang, Jian Xu, Kaiyue Wen, Zhaoxiang Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin
Tags: MoE
Papers citing "Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models" (4 of 4 papers shown)
Qwen3 Technical Report
An Yang, A. Li, Baosong Yang, Beichen Zhang, Binyuan Hui, ..., Zekun Wang, Zeyu Cui, Zhenru Zhang, Zhenhong Zhou, Zihan Qiu
Tags: LLMAG, OSLM, LRM
14 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu, Zhaoxiang Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, ..., Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin
Tags: MoE
10 May 2025
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, ..., Zhe Liu, Zhicheng Liu, Zhuowen Tu, Zilin Ding, Zongyuan Zhan
Tags: MoE
07 May 2025
Neural network task specialization via domain constraining
Roman Malashin, Daniil Ilyukhin
28 Apr 2025