NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

15 November 2019

Papers citing "NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units"

Title | Authors | Tags | Counts | Date
PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units | Yujeong Choi, Minsoo Rhu | - | 18 128 0 | 06 Sep 2019
Beyond Human-Level Accuracy: Computational Challenges in Deep Learning | Joel Hestness, Newsha Ardalani, G. Diamos | - | 31 67 0 | 03 Sep 2019
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning | Youngeun Kwon, Yunjae Lee, Minsoo Rhu | - | 35 208 0 | 08 Aug 2019
The Architectural Implications of Facebook's DNN-based Personalized Recommendation | Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, ..., Andrey Malevich, Dheevatsa Mudigere, M. Smelyanskiy, Liang Xiong, Xuan Zhang | GNN | 65 290 0 | 06 Jun 2019
Deep Learning Recommendation Model for Personalization and Recommendation Systems | Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, ..., Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, M. Smelyanskiy | - | 60 726 0 | 31 May 2019
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning | Youngeun Kwon, Minsoo Rhu | - | 34 57 0 | 18 Feb 2019
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications | Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, ..., Lin Qiao, Vijay Rao, Nadav Rotem, S. Yoo, M. Smelyanskiy | FedML GNN BDL | 50 187 0 | 24 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM SSL SSeg | 943 93,936 0 | 11 Oct 2018
Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How | A. Delmas, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Andreas Moshovos | - | 39 24 0 | 09 Mar 2018
Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 435 129,831 0 | 12 Jun 2017
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks | A. Parashar, Minsoo Rhu, Anurag Mukkara, A. Puglielli, Rangharajan Venkatesan, Brucek Khailany, J. Emer, S. Keckler, W. Dally | - | 54 1,122 0 | 23 May 2017
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks | Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, S. Keckler | - | 42 176 0 | 03 May 2017
In-Datacenter Performance Analysis of a Tensor Processing Unit | N. Jouppi, C. Young, Nishant Patil, David Patterson, Gaurav Agrawal, ..., Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon | - | 170 4,619 0 | 16 Apr 2017
Bit-pragmatic Deep Neural Network Computing | Jorge Albericio, Patrick Judd, A. Delmas, Sayeh Sharify, Andreas Moshovos | MQ | 54 239 0 | 20 Oct 2016
Accelerating Deep Convolutional Networks using low-precision and sparsity | Ganesh Venkatesh, Eriko Nurvitadhi, Debbie Marr | - | 56 135 0 | 02 Oct 2016
Exploring the Limits of Language Modeling | Rafal Jozefowicz, Oriol Vinyals, M. Schuster, Noam M. Shazeer, Yonghui Wu | - | 118 1,143 0 | 07 Feb 2016
EIE: Efficient Inference Engine on Compressed Deep Neural Network | Song Han, Xingyu Liu, Huizi Mao, Jing Pu, A. Pedram, M. Horowitz, W. Dally | - | 102 2,453 0 | 04 Feb 2016
Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | MedIm | 1.4K 192,638 0 | 10 Dec 2015
Listen, Attend and Spell | William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals | RALM | 126 2,261 0 | 05 Aug 2015
Neural Turing Machines | Alex Graves, Greg Wayne, Ivo Danihelka | - | 64 2,318 0 | 20 Oct 2014
Going Deeper with Convolutions | Christian Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, Andrew Rabinovich | - | 299 43,511 0 | 17 Sep 2014
One weird trick for parallelizing convolutional neural networks | A. Krizhevsky | GNN | 74 1,297 0 | 23 Apr 2014