Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
arXiv:2007.00072, 30 June 2020

Papers citing "Data Movement Is All You Need: A Case Study on Optimizing Transformers" (34 of 34 papers shown)

Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
Samuel J. Kaufman, René Just, Rastislav Bodik
03 May 2025. Cited by 0.

Nonlinear Computation with Linear Optics via Source-Position Encoding
N. Richardson, C. Bosch, R. P. Adams
29 Apr 2025. Cited by 0.

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, A. Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali K. Thabet, Jonas Kohler
31 Jan 2025. Cited by 6. Tags: ALM, BDL.

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu
20 Nov 2024. Cited by 0.

BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka, Rachid Karami, A. Eltawil, M. Fouda, Fadi J. Kurdahi
03 Nov 2024. Cited by 1. Tags: MQ.

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Rühle, Saravan Rajmohan
17 May 2024. Cited by 6.

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar, R. Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, ..., Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, K. Olukotun
13 May 2024. Cited by 12. Tags: MoE.

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Wenqi Jiang, Marco Zeller, R. Waleffe, Torsten Hoefler, Gustavo Alonso
15 Oct 2023. Cited by 16.

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
Mikhail Khalilov, Marcin Chrapek, Siyuan Shen, Alessandro Vezzu, Thomas Emanuel Benz, Salvatore Di Girolamo, Timo Schneider, Daniele Di Sensi, Luca Benini, Torsten Hoefler
07 Sep 2023. Cited by 7.

Bridging Control-Centric and Data-Centric Optimization
Tal Ben-Nun, Berke Ates, A. Calotoiu, Torsten Hoefler
01 Jun 2023. Cited by 7.

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann
25 May 2023. Cited by 53.

STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler
15 Apr 2023. Cited by 4.

Operator Fusion in XLA: Analysis and Evaluation
Danielle Snider, Ruofan Liang
30 Jan 2023. Cited by 4.

Myths and Legends in High-Performance Computing
Satoshi Matsuoka, Jens Domke, M. Wahib, Aleksandr Drozd, Torsten Hoefler
06 Jan 2023. Cited by 14.

A Theory of I/O-Efficient Sparse Neural Network Inference
Niels Gleinig, Tal Ben-Nun, Torsten Hoefler
03 Jan 2023. Cited by 0.

Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution
Edgar Liberis, Nicholas D. Lane
30 Nov 2022. Cited by 3.

Spatial Mixture-of-Experts
Nikoli Dryden, Torsten Hoefler
24 Nov 2022. Cited by 9. Tags: MoE.

HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott
03 Sep 2022. Cited by 20. Tags: 3DH, GNN, AI4CE.

Survey: Exploiting Data Redundancy for Optimization of Deep Learning
Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen
29 Aug 2022. Cited by 24.

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, A. A. Awan, Cheng-rong Li, ..., Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He
30 Jun 2022. Cited by 335.

SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani, Hamed Pirsiavash
17 Jun 2022. Cited by 25.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
27 May 2022. Cited by 2,024. Tags: VLM.

C-NMT: A Collaborative Inference Framework for Neural Machine Translation
Yukai Chen, R. Chiaro, Enrico Macii, M. Poncino, Daniele Jahier Pagliari
08 Apr 2022. Cited by 0.

DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators
Sheng-Chun Kao, Xiaoyu Huang, T. Krishna
26 Jan 2022. Cited by 9. Tags: AI4CE.

Lifting C Semantics for Dataflow Optimization
A. Calotoiu, Tal Ben-Nun, Grzegorz Kwaśniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler
22 Dec 2021. Cited by 6.

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna
13 Jul 2021. Cited by 57.

Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang, Yao Zhang, Yanzhang Wang, Yan Wan, Jiao Wang, Zhongyuan Wu, Yuhao Yang, Bowen She
01 Jul 2021. Cited by 94.

Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani, Bahram Zonooz
30 Jun 2021. Cited by 20.

Clairvoyant Prefetching for Distributed Machine Learning I/O
Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler
21 Jan 2021. Cited by 55.

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020. Cited by 4,489.

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019. Cited by 1,821. Tags: MoE.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018. Cited by 6,959. Tags: ELM.

Geometric deep learning: going beyond Euclidean data
M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, P. Vandergheynst
24 Nov 2016. Cited by 3,239. Tags: GNN.

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015. Cited by 7,926.