MoEC: Mixture of Expert Clusters
arXiv:2207.09094 · 19 July 2022
Authors: Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei
Tags: MoE
Papers citing "MoEC: Mixture of Expert Clusters" (29 of 29 papers shown)
1. HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
   Bingshen Mu, Kun Wei, Qijie Shao, Yong Xu, Lei Xie · MoE · 30 Sep 2024 · 96 / 2 / 0

2. Task-Specific Expert Pruning for Sparse Mixture-of-Experts
   Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei · MoE · 01 Jun 2022 · 53 / 40 / 0

3. Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers
   R. Liu, Young Jin Kim, Alexandre Muzio, Hany Awadalla · MoE · 28 May 2022 · 65 / 22 / 0

4. Residual Mixture of Experts
   Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, Lu Yuan · MoE · 20 Apr 2022 · 81 / 36 / 0

5. On the Representation Collapse of Sparse Mixture of Experts
   Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, ..., Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei · MoMe, MoE · 20 Apr 2022 · 67 / 104 / 0

6. StableMoE: Stable Routing Strategy for Mixture of Experts
   Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei · MoE · 18 Apr 2022 · 43 / 63 / 0

7. One Student Knows All Experts Know: From Sparse to Dense
   Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You · MoMe, MoE · 26 Jan 2022 · 57 / 20 / 0

8. Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
   K. Kumatani, R. Gmyr, Andres Felipe Cruz Salinas, Linquan Liu, Wei Zuo, Devang Patel, Eric Sun, Yu Shi · MoE · 10 Dec 2021 · 62 / 20 / 0

9. Tricks for Training Sparse Translation Models
   Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan · MoE · 15 Oct 2021 · 177 / 19 / 0

10. Go Wider Instead of Deeper
    Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You · ViT, MoE · 25 Jul 2021 · 49 / 82 / 0

11. BEiT: BERT Pre-Training of Image Transformers
    Hangbo Bao, Li Dong, Songhao Piao, Furu Wei · ViT · 15 Jun 2021 · 265 / 2,824 / 0

12. Scaling Vision with Sparse Mixture of Experts
    C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 10 Jun 2021 · 106 / 601 / 0

13. Hash Layers For Large Sparse Models
    Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston · MoE · 08 Jun 2021 · 181 / 210 / 0

14. BASE Layers: Simplifying Training of Large, Sparse Models
    M. Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer · MoE · 30 Mar 2021 · 191 / 278 / 0

15. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
    W. Fedus, Barret Zoph, Noam M. Shazeer · MoE · 11 Jan 2021 · 85 / 2,181 / 0

16. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
    Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Zhifeng Chen · MoE · 30 Jun 2020 · 92 / 1,162 / 0

17. Language Models are Few-Shot Learners
    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · BDL · 28 May 2020 · 755 / 41,932 / 0

18. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
    M. Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer · AIMat, VLM · 29 Oct 2019 · 246 / 10,819 / 0

19. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu · AIMat · 23 Oct 2019 · 419 / 20,127 / 0

20. Cross-lingual Language Model Pretraining
    Guillaume Lample, Alexis Conneau · 22 Jan 2019 · 75 / 2,744 / 0

21. Neural Network Acceptability Judgments
    Alex Warstadt, Amanpreet Singh, Samuel R. Bowman · 31 May 2018 · 230 / 1,407 / 0

22. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
    Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018 · 1.1K / 7,159 / 0

23. SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
    Daniel Cer, Mona T. Diab, Eneko Agirre, I. Lopez-Gazpio, Lucia Specia · 31 Jul 2017 · 430 / 1,881 / 0

24. Attention Is All You Need
    Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin · 3DV · 12 Jun 2017 · 698 / 131,652 / 0

25. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
    Adina Williams, Nikita Nangia, Samuel R. Bowman · 18 Apr 2017 · 520 / 4,479 / 0

26. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
    Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean · MoE · 23 Jan 2017 · 248 / 2,644 / 0

27. SQuAD: 100,000+ Questions for Machine Comprehension of Text
    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang · RALM · 16 Jun 2016 · 280 / 8,134 / 0

28. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
    Yukun Zhu, Ryan Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler · 22 Jun 2015 · 122 / 2,548 / 0

29. Adam: A Method for Stochastic Optimization
    Diederik P. Kingma, Jimmy Ba · ODL · 22 Dec 2014 · 1.8K / 150,039 / 0