arXiv: 2305.02176
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
3 May 2023
Haoran Xu, Maha Elbayad, Kenton W. Murray, Jean Maillard, Vedanuj Goswami
MoE
Papers citing
"Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity"
8 papers shown
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu-Hu Li
AI4CE
40 · 9 · 0
06 Feb 2024
MultiMUC: Multilingual Template Filling on MUC-4
William Gantt, Shabnam Behzad, Hannah YoungEun An, Yunmo Chen, Aaron Steven White, Benjamin Van Durme, M. Yarmohammadi
32 · 3 · 0
29 Jan 2024
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules
Haoran Xu, Weiting Tan, Shuyue Stella Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton W. Murray
11 · 6 · 0
23 May 2023
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE
160 · 327 · 0
18 Feb 2022
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
119 · 106 · 0
24 Sep 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
MoE
96 · 84 · 0
22 Sep 2021
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
237 · 4,469 · 0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
297 · 6,956 · 0
20 Apr 2018