Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning (arXiv:1406.7362)

28 June 2014
Kyunghyun Cho
Yoshua Bengio
ArXiv (abs) · PDF · HTML
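
The papers below share the cited work's central idea: conditional computation, in which only a small, input-chosen subset of a network's parameters is active per example, so total capacity (all parameters, and the C(N, k) ways of selecting k of N experts) can grow far faster than per-example compute. As a point of reference, here is a minimal PyTorch sketch of a top-k sparsely gated mixture-of-experts layer in that spirit, closest to Shazeer et al. (2017) listed below; it is illustrative only, not code from the 2014 paper, and all names and sizes are assumptions.

```python
# Illustrative sketch of a top-k gated mixture-of-experts layer.
# Assumed names/shapes; not the 2014 paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to k of n_experts.
        top_vals, top_idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)  # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            chosen = top_idx[:, slot]
            for e in chosen.unique().tolist():  # run only the selected experts
                mask = chosen == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Usage sketch: with 64 experts and k = 2, the layer holds ~32x the
# parameters of a 2-expert dense block, but each token pays for only
# two expert evaluations.
layer = TopKMoE(d_model=128, d_hidden=512, n_experts=64, k=2)
y = layer(torch.randn(16, 128))
```

Because parameters scale with the number of experts while per-token compute scales only with k, and the router can realize C(N, k) distinct expert combinations, the capacity-to-computation ratio grows with N at essentially fixed FLOPs per token, which is the ratio the cited title refers to.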

Papers citing "Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning" (32 papers)

MoIN: Mixture of Introvert Experts to Upcycle an LLM
Ajinkya Tejankar, K. Navaneet, Ujjawal Panchal, Kossar Pourahmadi, Hamed Pirsiavash
MoE
13 Oct 2024

Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
DiffM · MoE
16 Jul 2024

Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan
MoE
09 Jul 2024

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
David Raposo, Sam Ritter, Blake A. Richards, Timothy Lillicrap, Peter C. Humphreys, Adam Santoro
MoE
02 Apr 2024

Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
International Conference on Learning Representations (ICLR), 2023
Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho
MoE
25 Sep 2023

ScrollNet: Dynamic Weight Importance for Continual Learning
Fei Yang, Kai Wang, Joost van de Weijer
31 Aug 2023

Brainformers: Trading Simplicity for Efficiency
International Conference on Machine Learning (ICML), 2023
Yan-Quan Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, ..., Zhifeng Chen, Quoc V. Le, Claire Cui, James Laudon, J. Dean
MoE
29 May 2023

GradMDM: Adversarial Attack on Dynamic Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Jing Liu
AAML
01 Apr 2023

Memorization Capacity of Neural Networks with Conditional Computation
International Conference on Learning Representations (ICLR), 2023
Erdem Koyuncu
20 Mar 2023

Spatial Mixture-of-Experts
Neural Information Processing Systems (NeurIPS), 2022
Nikoli Dryden, Torsten Hoefler
MoE
24 Nov 2022

Switchable Representation Learning Framework with Self-compatibility
Computer Vision and Pattern Recognition (CVPR), 2022
Shengsen Wu, Yan Bai, Yihang Lou, Xiongkun Linghu, Jianzhong He, Ling-yu Duan
16 Jun 2022

Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
Neural Information Processing Systems (NeurIPS), 2022
Yang Shu, Zhangjie Cao, Ziyang Zhang, Jianmin Wang, Mingsheng Long
08 Jun 2022

APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction
Neural Information Processing Systems (NeurIPS), 2022
Bencheng Yan, Pengjie Wang, Kai Zhang, Feng Li, Hongbo Deng, Jian Xu, Bo Zheng
30 Mar 2022

Mixture-of-Experts with Expert Choice Routing
Neural Information Processing Systems (NeurIPS), 2022
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE
18 Feb 2022

Efficient Large Scale Language Modeling with Mixtures of Experts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, ..., Jeff Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Ves Stoyanov
MoE
20 Dec 2021

Zoo-Tuning: Adaptive Transfer from a Zoo of Models
International Conference on Machine Learning (ICML), 2021
Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long
29 Jun 2021

Scaling Vision with Sparse Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2021
C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby
MoE
10 Jun 2021

Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Zhixing Tan, Zeyuan Yang, Meng Zhang, Qun Liu, Maosong Sun, Yang Liu
AI4CE
14 May 2021

Dynamic Neural Networks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang
3DH · AI4TS · AI4CE
09 Feb 2021

CNN with large memory layers
R. Karimov, Yury Malkov, Karim Iskakov, Victor Lempitsky
27 Jan 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Journal of Machine Learning Research (JMLR), 2021
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE
11 Jan 2021

Real-time Human Activity Recognition Using Conditionally Parametrized Convolutions on Mobile and Wearable Devices
Xin-Hua Cheng, Guang Dai, Yin Tang, Yue Liu, Hao Wu, Jun He
CVBM · 3DH · HAI
05 Jun 2020

Surprisal-Triggered Conditional Computation with Neural Networks
Loren Lugosch, Derek Nowrouzezahrai, B. Meyer
02 Jun 2020

Entities as Experts: Sparse Memory Access with Entity Supervision
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski
RALM
15 Apr 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE
12 Mar 2020

Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna, N. Arivazhagan, Orhan Firat
17 Feb 2020

Large Memory Layers with Product Keys
Neural Information Processing Systems (NeurIPS), 2019
Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Edouard Grave
MoE
10 Jul 2019

Conditional Computation for Continual Learning
Min Lin, Jie Fu, Yoshua Bengio
CLL
16 Jun 2019

CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam
MedIm · 3DV
10 Apr 2019

MaskConnect: Connectivity Learning by Gradient Descent
Karim Ahmed, Lorenzo Torresani
28 Jul 2018

Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality
Alexandre Salle, Aline Villavicencio
03 Apr 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
International Conference on Learning Representations (ICLR), 2017
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
MoE
23 Jan 2017