Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

16 November 2019

Jinjun Xiong

Papers citing "Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs"

MIOpen: An Open Source Library For Deep Learning Primitives. Jehandad Khan, Paul Fultz, Artem Tamazov, Daniel Lowell, Chao-Jung Liu, ..., Vasilii Filippov, Jing Zhang, Jing Zhou, Bragadeesh Natarajan, Mayank Daga. 30 Sep 2019.
NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning. Ameer Haj-Ali, Nesreen Ahmed, Theodore L. Willke, Sophia Shao, Krste Asanović, Ion Stoica. 20 Sep 2019.
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-mei W. Hwu. 19 Aug 2019.
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. Tal Ben-Nun, Maciej Besta, Simon Huber, A. Ziogas, D. Peter, Torsten Hoefler. 29 Jan 2019.
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark. Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, K. Olukotun, Christopher Ré, Matei A. Zaharia. 04 Jun 2018.
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. Tal Ben-Nun, Torsten Hoefler. 26 Feb 2018.
A Survey on Compiler Autotuning using Machine Learning. Amir H. Ashouri, W. Killian, John Cavazos, G. Palermo, Cristina Silvano. 13 Jan 2018.
Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming. Andrew Anderson, David Gregg. 03 Oct 2017.
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun. 04 Jul 2017.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam. 17 Apr 2017.
Understanding Convolution for Semantic Segmentation. Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, G. Cottrell. 27 Feb 2017.
YOLO9000: Better, Faster, Stronger. Joseph Redmon, Ali Farhadi. 25 Dec 2016.
Densely Connected Convolutional Networks. Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger. 25 Aug 2016.
Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution. Emad Barsoum, Cha Zhang, Cristian Canton Ferrer, Zhengyou Zhang. 03 Aug 2016.
Identity Mappings in Deep Residual Networks. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 16 Mar 2016.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. F. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, W. Dally, Kurt Keutzer. 24 Feb 2016.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015.
Rethinking the Inception Architecture for Computer Vision. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna. 02 Dec 2015.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Sergey Ioffe, Christian Szegedy. 11 Feb 2015.
Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, Andrew Rabinovich. 17 Sep 2014.
Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan, Andrew Zisserman. 04 Sep 2014.
Visualizing and Understanding Convolutional Networks. Matthew D. Zeiler, Rob Fergus. 12 Nov 2013.
Rich feature hierarchies for accurate object detection and semantic segmentation. Ross B. Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. 11 Nov 2013.