HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu
arXiv:2402.12656 · MoE · 20 February 2024
Papers citing "HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts" (17 papers shown)
THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation
Yunlong Liang, Fandong Meng, Jie Zhou · MoE · 20 May 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer · MoE · 10 Mar 2025
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su, Xing Wu, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding · MoE · 21 Oct 2024
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs
Herun Wan, Minnan Luo, Zhixiong Su, Guang Dai, Xiang Zhao · DeLMO · 16 Oct 2024
PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL
Ruilin Luo, Liyuan Wang, Binghuai Lin, Zicheng Lin, Yujiu Yang · LRM · 21 Sep 2024
Mixture of Diverse Size Experts
Manxi Sun, Wei Liu, Jian Luan, Pengzhi Gao, Bin Wang · MoE · 18 Sep 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu · MoE · 13 Aug 2024
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts
Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, ..., Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu · MoE · 13 Jul 2024
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu · MoE · 18 Jun 2024
Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction
Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jiawei Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Zhaofeng He, Jie Fu · OffRL · 06 Feb 2024
Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning
Hao Zhao, Jie Fu, Zhaofeng He · 18 Oct 2023
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Ahmet Üstün, Arianna Bisazza, G. Bouma, Gertjan van Noord, Sebastian Ruder · 24 May 2022
Multilingual Machine Translation with Hyper-Adapters
Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale · 22 May 2022
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon · MoE · 18 Feb 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam · 3DH · 17 Apr 2017
Teaching Machines to Read and Comprehend
Karl Moritz Hermann, Tomás Kociský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom · 10 Jun 2015