Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

23 May 2024

Tao Lin

Papers citing "Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models"

6 / 56 papers shown

Title
Deeper, Broader and Artier Domain Generalization | Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales | OOD | 124 | 1,451 | 0 | 09 Oct 2017
Deep Hashing Network for Unsupervised Domain Adaptation | Hemanth Venkateswara, José Eusébio, Shayok Chakraborty, S. Panchanathan | OOD | 147 | 2,057 | 0 | 22 Jun 2017
Hard Mixtures of Experts for Large Scale Weakly Supervised Vision | Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam | MoE | 63 | 102 | 0 | 20 Apr 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean | MoE | 253 | 2,686 | 0 | 23 Jan 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Yash Goyal, Tejas Khot, D. Summers-Stay, Dhruv Batra, Devi Parikh | CoGe | 352 | 3,270 | 0 | 02 Dec 2016
Learning Factored Representations in a Deep Mixture of Experts | David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever | MoE | 90 | 377 | 0 | 16 Dec 2013