Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning

Kyunghyun Cho, Yoshua Bengio
arXiv:1406.7362 · 28 June 2014

Papers citing "Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning"

14 / 14 papers shown

GradMDM: Adversarial Attack on Dynamic Networks
Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Xiaozhong Liu
AAML · 16 · 6 · 0 · 01 Apr 2023

Memorization Capacity of Neural Networks with Conditional Computation
Erdem Koyuncu
38 · 4 · 0 · 20 Mar 2023

Spatial Mixture-of-Experts
Nikoli Dryden, Torsten Hoefler
MoE · 34 · 9 · 0 · 24 Nov 2022

Switchable Representation Learning Framework with Self-compatibility
Shengsen Wu, Yan Bai, Yihang Lou, Xiongkun Linghu, Jianzhong He, Ling-yu Duan
22 · 1 · 0 · 16 Jun 2022

Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
Yang Shu, Zhangjie Cao, Ziyang Zhang, Jianmin Wang, Mingsheng Long
17 · 4 · 0 · 08 Jun 2022

APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction
Bencheng Yan, Pengjie Wang, Kai Zhang, Feng Li, Hongbo Deng, Jian Xu, Bo Zheng
27 · 20 · 0 · 30 Mar 2022

Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE · 160 · 329 · 0 · 18 Feb 2022

Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, ..., Jeff Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Ves Stoyanov
MoE · 61 · 188 · 0 · 20 Dec 2021

Zoo-Tuning: Adaptive Transfer from a Zoo of Models
Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long
29 · 44 · 0 · 29 Jun 2021

Scaling Vision with Sparse Mixture of Experts
C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby
MoE · 17 · 575 · 0 · 10 Jun 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE · 11 · 2,075 · 0 · 11 Jan 2021

Conditional Computation for Continual Learning
Min-Bin Lin, Jie Fu, Yoshua Bengio
CLL · 31 · 10 · 0 · 16 Jun 2019

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
MoE · 55 · 2,513 · 0 · 23 Jan 2017

Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton, Nitish Srivastava, A. Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
VLM · 266 · 7,638 · 0 · 03 Jul 2012