Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
24 May 2023
Sheng Shen, Le Hou, Yan-Quan Zhou, Nan Du, Shayne Longpre, Jason W. Wei, Hyung Won Chung, Barret Zoph, W. Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
Tags: ALM, MoE
arXiv: 2305.14705

Papers citing "Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models" (50 of 50 papers shown)

Backdoor Attacks Against Patch-based Mixture of Experts
Cedric Chan, Jona te Lintelo, S. Picek
Tags: AAML, MoE
151 · 0 · 0 · 03 May 2025

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
Tags: MoE, VLM
99 · 1 · 0 · 01 May 2025

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga, Wei Jie, Fan Wu, Vardan K. Voskanyan, Fateme Dinmohammadi, P. Brookes, Jingzhi Gong, Zheng Wang
44 · 0 · 0 · 13 Mar 2025

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, ..., Yu Zhang, Zhenguo Li, Xin Jiang, Qiang Liu, James T. Kwok
Tags: MoE
101 · 9 · 0 · 20 Feb 2025

Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
Ziyu Zhao, Yixiao Zhou, Didi Zhu, Tao Shen, Qing Guo, Jing Su, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng
Tags: MoE
40 · 1 · 0 · 28 Jan 2025

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning
Shuguang Chen, Guang Lin
Tags: LRM
138 · 0 · 0 · 28 Dec 2024

Investigating Mixture of Experts in Dense Retrieval
Effrosyni Sokli, Pranav Kasela, Georgios Peikos, G. Pasi
Tags: MoE
77 · 1 · 0 · 16 Dec 2024

H^3 Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs
Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Zachary Yahn, Ling Liu
Tags: MoMe
89 · 3 · 0 · 26 Nov 2024

TradExpert: Revolutionizing Trading with Mixture of Expert LLMs
Qianggang Ding, Haochen Shi, Jiadong Guo, Bang Liu
Tags: AIFin
43 · 3 · 0 · 16 Oct 2024

Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Shangbin Feng, Zifeng Wang, Yike Wang, Sayna Ebrahimi, Hamid Palangi, ..., Nathalie Rauschmayr, Yejin Choi, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister
Tags: MoMe
35 · 3 · 0 · 15 Oct 2024

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, W. Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu
Tags: RALM, KELM
46 · 5 · 0 · 14 Oct 2024

Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu
28 · 0 · 0 · 06 Oct 2024

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng
41 · 7 · 0 · 28 Sep 2024

Identity-Driven Hierarchical Role-Playing Agents
Libo Sun, Siyuan Wang, Xuanjing Huang, Zhongyu Wei
Tags: LLMAG, AI4CE
44 · 7 · 0 · 28 Jul 2024

Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Jinning Li, Jiachen Li, Sangjae Bae, David Isele
39 · 4 · 0 · 12 Jul 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li
Tags: MoE
56 · 2 · 0 · 28 Jun 2024

Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning
Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu
Tags: MoE, MoMe
39 · 11 · 0 · 24 Jun 2024

SimSMoE: Solving Representational Collapse via Similarity Measure
Giang Do, Hung Le, T. Tran
Tags: MoE
47 · 1 · 0 · 22 Jun 2024

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
Tags: MoE
76 · 5 · 0 · 17 Jun 2024

MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors
Renzhi Wang, Piji Li
Tags: KELM
38 · 3 · 0 · 29 May 2024

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi, Hengyuan Zhang, Yatian Wang, J. Pan, Chen Liu, ..., Qixun Zhang, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Qi-fei Liu
Tags: DiffM, SLR
110 · 5 · 0 · 27 May 2024

Position: Leverage Foundational Models for Black-Box Optimization
Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen
42 · 5 · 0 · 06 May 2024

Towards Incremental Learning in Large Language Models: A Critical Review
M. Jovanovic, Peter Voss
Tags: ELM, CLL, KELM
37 · 5 · 0 · 28 Apr 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda
Tags: MoE
40 · 19 · 0 · 08 Apr 2024

RouterBench: A Benchmark for Multi-LLM Routing System
Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay
44 · 36 · 0 · 18 Mar 2024

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun
Tags: ALM, MoMe, AI4CE
38 · 7 · 0 · 13 Mar 2024

Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach
Zhen Tan, Jie Peng, Tianlong Chen, Huan Liu
31 · 6 · 0 · 08 Mar 2024

LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild
Ziyu Zhao, Leilei Gan, Guoyin Wang, Wangchunshu Zhou, Hongxia Yang, Kun Kuang, Fei Wu
Tags: MoMe
26 · 29 · 0 · 15 Feb 2024

How to Train Data-Efficient LLMs
Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, D. Cheng
29 · 51 · 0 · 15 Feb 2024

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa, Sophia Falk, Shiza Fatimah, Aniket Sen, N. D. Oliveira
30 · 9 · 0 · 30 Jan 2024

Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu
Tags: MoE, ALM
29 · 14 · 0 · 05 Jan 2024

From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
91 · 46 · 0 · 18 Dec 2023

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
Tags: MoE
21 · 14 · 0 · 01 Dec 2023

Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach
Zhenbang Du, Jiayu An, Yunlu Tu, Jiahao Hong, Dongrui Wu
Tags: MoE
25 · 1 · 0 · 01 Nov 2023

Enabling Language Models to Implicitly Learn Self-Improvement
Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji
Tags: ReLM, LRM
16 · 5 · 0 · 02 Oct 2023

ConPET: Continual Parameter-Efficient Tuning for Large Language Models
Chenyan Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Taojiannan Yang
Tags: CLL, KELM
21 · 10 · 0 · 26 Sep 2023

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker
Tags: MoE
35 · 88 · 0 · 11 Sep 2023

Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz
Tags: ALM, ObjD
34 · 135 · 0 · 14 Aug 2023

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
Tags: MLLM, LRM
54 · 556 · 0 · 23 Jun 2023

Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen, Z. Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He
Tags: VLM, MoE
21 · 62 · 0 · 13 Mar 2023

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
Tags: ALM, AAML
227 · 502 · 0 · 28 Sep 2022

Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
Tags: MoE
97 · 110 · 0 · 07 Jun 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
319 · 11,953 · 0 · 04 Mar 2022

Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
Tags: MoE
160 · 327 · 0 · 18 Feb 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM
389 · 8,495 · 0 · 28 Jan 2022

Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
Tags: LRM
213 · 1,657 · 0 · 15 Oct 2021

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
Tags: MoE
119 · 106 · 0 · 24 Sep 2021

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
214 · 180 · 0 · 18 Apr 2021

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
Tags: VPVLM
280 · 3,848 · 0 · 18 Apr 2021

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant
Tags: RALM
250 · 677 · 0 · 06 Jan 2021