Spatial Mixture-of-Experts

24 November 2022
Nikoli Dryden
Torsten Hoefler
    MoE

Papers citing "Spatial Mixture-of-Experts"

50 / 71 papers shown
ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts
Saleh Ashkboos
La-mei Huang
Nikoli Dryden
Tal Ben-Nun
P. Dueben
Lukas Gianinazzi
L. Kummer
Torsten Hoefler
36
19
0
29 Jun 2022
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators
Jaideep Pathak
Shashank Subramanian
P. Harrington
S. Raja
Ashesh Chattopadhyay
...
Zong-Yi Li
Kamyar Azizzadenesheli
Pedram Hassanzadeh
K. Kashinath
Anima Anandkumar
AI4Cl
115
674
0
22 Feb 2022
Forecasting Global Weather with Graph Neural Networks
R. Keisler
AI4Cl
60
167
0
15 Feb 2022
Unified Scaling Laws for Routed Language Models
Aidan Clark
Diego de Las Casas
Aurelia Guy
A. Mensch
Michela Paganini
...
Oriol Vinyals
Jack W. Rae
Erich Elsen
Koray Kavukcuoglu
Karen Simonyan
MoE
57
178
0
02 Feb 2022
Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks
Kevin Lee Hunter
Lawrence Spracklen
Subutai Ahmad
39
20
0
27 Dec 2021
Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
...
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
MoE
96
192
0
20 Dec 2021
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
...
Kun Zhang
Quoc V. Le
Yonghui Wu
Zhifeng Chen
Claire Cui
ALM
MoE
114
794
0
13 Dec 2021
Skillful Twelve Hour Precipitation Forecasts using Large Context Neural Networks
L. Espeholt
Shreya Agrawal
C. Sønderby
M. Kumar
Jonathan Heek
Carla Bromberg
Cenk Gazen
Jason Hickey
Aaron Bell
Nal Kalchbrenner
AI4Cl
38
48
0
14 Nov 2021
SwinIR: Image Restoration Using Swin Transformer
Jingyun Liang
Jiezhang Cao
Guolei Sun
Kai Zhang
Luc Van Gool
Radu Timofte
ViT
152
2,862
0
23 Aug 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
71
1,634
0
25 Jun 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
52
587
0
10 Jun 2021
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
M. Bronstein
Joan Bruna
Taco S. Cohen
Petar Veličković
GNN
250
1,137
0
27 Apr 2021
Skillful Precipitation Nowcasting using Deep Generative Models of Radar
Suman V. Ravuri
Karel Lenc
Matthew Willson
D. Kangin
Rémi R. Lam
...
R. Hadsell
Nial H. Robinson
Ellen Clancy
A. Arribas
S. Mohamed
AI4Cl
87
727
0
02 Apr 2021
BASE Layers: Simplifying Training of Large, Sparse Models
M. Lewis
Shruti Bhosale
Tim Dettmers
Naman Goyal
Luke Zettlemoyer
MoE
122
275
0
30 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
203
21,051
0
25 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani
Prajit Ramachandran
A. Srinivas
Niki Parmar
Blake A. Hechtman
Jonathon Shlens
60
398
0
23 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane d'Ascoli
Hugo Touvron
Matthew L. Leavitt
Ari S. Morcos
Giulio Biroli
Levent Sagun
ViT
80
818
0
19 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
402
3,660
0
24 Feb 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler
Dan Alistarh
Tal Ben-Nun
Nikoli Dryden
Alexandra Peste
MQ
214
703
0
31 Jan 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
32
2,136
0
11 Jan 2021
Biased Mixtures Of Experts: Enabling Computer Vision Inference Under Data Transfer Limitations
Alhabib Abbas
Y. Andreopoulos
MoE
75
18
0
21 Aug 2020
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave
Riyadh Baghdadi
Tony Nowatzki
Sasikanth Avancha
Aviral Shrivastava
Baoxin Li
75
82
0
02 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
60
133
0
30 Jun 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhifeng Chen
MoE
66
1,142
0
30 Jun 2020
Dynamic Model Pruning with Feedback
Tao R. Lin
Sebastian U. Stich
Luis Barba
Daniil Dmitriev
Martin Jaggi
70
199
0
12 Jun 2020
Deep Learning for Post-Processing Ensemble Weather Forecasts
Peter Grönquist
Chengyuan Yao
Tal Ben-Nun
Nikoli Dryden
P. Dueben
Shigang Li
Torsten Hoefler
19
165
0
18 May 2020
MetNet: A Neural Weather Model for Precipitation Forecasting
C. Sønderby
L. Espeholt
Jonathan Heek
Mostafa Dehghani
Avital Oliver
Tim Salimans
Shreya Agrawal
Jason Hickey
Nal Kalchbrenner
AI4Cl
257
274
0
24 Mar 2020
Revisiting Spatial Invariance with Low-Rank Local Connectivity
Gamaleldin F. Elsayed
Prajit Ramachandran
Jonathon Shlens
Simon Kornblith
50
44
0
07 Feb 2020
WeatherBench: A benchmark dataset for data-driven weather forecasting
S. Rasp
P. Dueben
S. Scher
Jonathan A. Weyn
Soukayna Mouatadid
Nils Thuerey
AI4Cl
AI4TS
41
434
0
02 Feb 2020
Machine Learning for Precipitation Nowcasting from Radar Images
Shreya Agrawal
Luke Barrington
Carla Bromberg
J. Burge
Cenk Gazen
Jason Hickey
AI4Cl
39
223
0
11 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
106
42,038
0
03 Dec 2019
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
Songyang Zhang
Shipeng Yan
Xuming He
GNN
43
81
0
28 May 2019
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang
Gabriel Bender
Quoc V. Le
Jiquan Ngiam
MedIm
3DV
37
628
0
10 Apr 2019
How Can We Be So Dense? The Benefits of Using Highly Sparse Representations
Subutai Ahmad
Luiz Scheinkman
37
96
0
27 Mar 2019
Attention Branch Network: Learning of Attention Mechanism for Visual Explanation
Hiroshi Fukui
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
XAI
FAtt
43
402
0
25 Dec 2018
Graph-Based Global Reasoning Networks
Yunpeng Chen
Marcus Rohrbach
Zhicheng Yan
Shuicheng Yan
Jiashi Feng
Yannis Kalantidis
GNN
NAI
284
457
0
30 Nov 2018
You Look Twice: GaterNet for Dynamic Filter Selection in CNNs
Zhourong Chen
Yang Li
Samy Bengio
Si Si
42
98
0
27 Nov 2018
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
Roman Novak
Lechao Xiao
Jaehoon Lee
Yasaman Bahri
Greg Yang
Jiri Hron
Daniel A. Abolafia
Jeffrey Pennington
Jascha Narain Sohl-Dickstein
UQCV
BDL
37
308
0
11 Oct 2018
CBAM: Convolutional Block Attention Module
Sanghyun Woo
Jongchan Park
Joon-Young Lee
In So Kweon
149
16,337
0
17 Jul 2018
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution
Rosanne Liu
Joel Lehman
Piero Molino
F. Such
Eric Frank
Alexander Sergeev
J. Yosinski
56
887
0
09 Jul 2018
Neural networks for post-processing ensemble weather forecasts
S. Rasp
Sebastian Lerch
57
343
0
23 May 2018
Learn To Pay Attention
Saumya Jetley
Nicholas A. Lord
Namhoon Lee
Philip Torr
87
437
0
06 Apr 2018
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
170
8,867
0
21 Nov 2017
Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
Clemens Rosenbaum
Tim Klinger
Matthew D Riemer
58
242
0
03 Nov 2017
Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model
Xingjian Shi
Zhihan Gao
Leonard Lausen
Hao Wang
Dit-Yan Yeung
W. Wong
W. Woo
36
794
0
12 Jun 2017
Residual Attention Network for Image Classification
Fei Wang
Mengqing Jiang
Chao Qian
Shuo Yang
Cheng Li
Honggang Zhang
Xiaogang Wang
Xiaoou Tang
88
3,299
0
23 Apr 2017
Hard Mixtures of Experts for Large Scale Weakly Supervised Vision
Sam Gross
Marc'Aurelio Ranzato
Arthur Szlam
MoE
31
102
0
20 Apr 2017
Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
Mason McGill
Pietro Perona
33
102
0
17 Mar 2017
PathNet: Evolution Channels Gradient Descent in Super Neural Networks
Chrisantha Fernando
Dylan Banarse
Charles Blundell
Yori Zwols
David R Ha
Andrei A. Rusu
Alexander Pritzel
Daan Wierstra
22
879
0
30 Jan 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
104
2,582
0
23 Jan 2017