Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning (arXiv:1406.7362)

28 June 2014
Kyunghyun Cho
Yoshua Bengio
ArXiv (abs) · PDF · HTML
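
The papers below share the cited work's central idea: conditional computation, in which only a small, input-chosen subset of a network's parameters is active per example, so total capacity (all parameters, and the C(N, k) ways of selecting k of N experts) can grow far faster than per-example compute. As a point of reference, here is a minimal PyTorch sketch of a top-k sparsely gated mixture-of-experts layer in that spirit, closest to Shazeer et al. (2017) listed below; it is illustrative only, not code from the 2014 paper, and all names and sizes are assumptions.

```python
# Illustrative sketch of a top-k gated mixture-of-experts layer.
# Assumed names/shapes; not the 2014 paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to k of n_experts.
        top_vals, top_idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)  # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            chosen = top_idx[:, slot]
            for e in chosen.unique().tolist():  # run only the selected experts
                mask = chosen == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Usage sketch: with 64 experts and k = 2, the layer holds ~32x the
# parameters of a 2-expert dense block, but each token pays for only
# two expert evaluations.
layer = TopKMoE(d_model=128, d_hidden=512, n_experts=64, k=2)
y = layer(torch.randn(16, 128))
```

Because parameters scale with the number of experts while per-token compute scales only with k, and the router can realize C(N, k) distinct expert combinations, the capacity-to-computation ratio grows with N at essentially fixed FLOPs per token, which is the ratio the cited title refers to.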

Papers citing "Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning" (32 papers)

MoIN: Mixture of Introvert Experts to Upcycle an LLM
Ajinkya Tejankar, K. Navaneet, Ujjawal Panchal, Kossar Pourahmadi, Hamed Pirsiavash
MoE
13 Oct 2024

Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
DiffM · MoE
16 Jul 2024

Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan
MoE
09 Jul 2024

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
David Raposo, Sam Ritter, Blake A. Richards, Timothy Lillicrap, Peter C. Humphreys, Adam Santoro
MoE
02 Apr 2024

Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
International Conference on Learning Representations (ICLR), 2023
Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho
MoE
25 Sep 2023

ScrollNet: Dynamic Weight Importance for Continual Learning
Fei Yang, Kai Wang, Joost van de Weijer
31 Aug 2023

Brainformers: Trading Simplicity for Efficiency
International Conference on Machine Learning (ICML), 2023
Yan-Quan Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, ..., Zhifeng Chen, Quoc V. Le, Claire Cui, James Laudon, J. Dean
MoE
29 May 2023

GradMDM: Adversarial Attack on Dynamic Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Jing Liu
AAML
01 Apr 2023

Memorization Capacity of Neural Networks with Conditional Computation
International Conference on Learning Representations (ICLR), 2023
Erdem Koyuncu
20 Mar 2023

Spatial Mixture-of-Experts
Neural Information Processing Systems (NeurIPS), 2022
Nikoli Dryden, Torsten Hoefler
MoE
24 Nov 2022

Switchable Representation Learning Framework with Self-compatibility
Computer Vision and Pattern Recognition (CVPR), 2022
Shengsen Wu, Yan Bai, Yihang Lou, Xiongkun Linghu, Jianzhong He, Ling-yu Duan
16 Jun 2022

Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
Neural Information Processing Systems (NeurIPS), 2022
Yang Shu, Zhangjie Cao, Ziyang Zhang, Jianmin Wang, Mingsheng Long
08 Jun 2022

APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction
Neural Information Processing Systems (NeurIPS), 2022
Bencheng Yan, Pengjie Wang, Kai Zhang, Feng Li, Hongbo Deng, Jian Xu, Bo Zheng
30 Mar 2022

Mixture-of-Experts with Expert Choice Routing
Neural Information Processing Systems (NeurIPS), 2022
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE
18 Feb 2022

Efficient Large Scale Language Modeling with Mixtures of Experts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, ..., Jeff Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Ves Stoyanov
MoE
20 Dec 2021

Zoo-Tuning: Adaptive Transfer from a Zoo of Models
International Conference on Machine Learning (ICML), 2021
Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long
29 Jun 2021

Scaling Vision with Sparse Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2021
C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby
MoE
10 Jun 2021

Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Zhixing Tan, Zeyuan Yang, Meng Zhang, Qun Liu, Maosong Sun, Yang Liu
AI4CE
14 May 2021

Dynamic Neural Networks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang
3DH · AI4TS · AI4CE
09 Feb 2021

CNN with large memory layers
R. Karimov, Yury Malkov, Karim Iskakov, Victor Lempitsky
27 Jan 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Journal of Machine Learning Research (JMLR), 2021
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE
11 Jan 2021

Real-time Human Activity Recognition Using Conditionally Parametrized Convolutions on Mobile and Wearable Devices
Xin-Hua Cheng, Guang Dai, Yin Tang, Yue Liu, Hao Wu, Jun He
CVBM · 3DH · HAI
05 Jun 2020

Surprisal-Triggered Conditional Computation with Neural Networks
Loren Lugosch, Derek Nowrouzezahrai, B. Meyer
02 Jun 2020

Entities as Experts: Sparse Memory Access with Entity Supervision
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski
RALM
15 Apr 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE
12 Mar 2020

Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna, N. Arivazhagan, Orhan Firat
17 Feb 2020

Large Memory Layers with Product Keys
Neural Information Processing Systems (NeurIPS), 2019
Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Edouard Grave
MoE
10 Jul 2019

Conditional Computation for Continual Learning
Min Lin, Jie Fu, Yoshua Bengio
CLL
16 Jun 2019

CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam
MedIm · 3DV
10 Apr 2019

MaskConnect: Connectivity Learning by Gradient Descent
Karim Ahmed, Lorenzo Torresani
28 Jul 2018

Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality
Alexandre Salle, Aline Villavicencio
03 Apr 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
International Conference on Learning Representations (ICLR), 2017
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
MoE
23 Jan 2017