Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts

2 September 2024

Papers citing "Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts"

6 / 6 papers shown

Title
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL Ghada Sokar J. Obando-Ceron Aaron C. Courville Hugo Larochelle Pablo Samuel Castro MoE 127 2 0 02 Oct 2024
On Least Square Estimation in Softmax Gating Mixture of Experts Huy Nguyen Nhat Ho Alessandro Rinaldo 51 13 0 05 Feb 2024
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters Umberto Cappellazzo Daniele Falavigna A. Brutti MoE 38 2 0 01 Feb 2024
Mixture-of-Experts with Expert Choice Routing Yan-Quan Zhou Tao Lei Han-Chu Liu Nan Du Yanping Huang Vincent Zhao Andrew M. Dai Zhifeng Chen Quoc V. Le James Laudon MoE 160 327 0 18 Feb 2022
What is the State of Neural Network Pruning? Davis W. Blalock Jose Javier Gonzalez Ortiz Jonathan Frankle John Guttag 191 1,027 0 06 Mar 2020
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 249 4,489 0 23 Jan 2020