Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

16 November 2019
Cheng-rong Li
Abdul Dakkak
Jinjun Xiong
Wen-mei W. Hwu

Papers citing "Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs"

23 / 23 papers shown
MIOpen: An Open Source Library For Deep Learning Primitives
Jehandad Khan
Paul Fultz
Artem Tamazov
Daniel Lowell
Chao-Jung Liu
...
Vasilii Filippov
Jing Zhang
Jing Zhou
Bragadeesh Natarajan
Mayank Daga
VLM
MoE
20
38
0
30 Sep 2019
NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning
Ameer Haj-Ali
Nesreen Ahmed
Theodore L. Willke
Sophia Shao
Krste Asanović
Ion Stoica
37
101
0
20 Sep 2019
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
Cheng-rong Li
Abdul Dakkak
Jinjun Xiong
Wei Wei
Lingjie Xu
Wen-mei W. Hwu
21
16
0
19 Aug 2019
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Tal Ben-Nun
Maciej Besta
Simon Huber
A. Ziogas
D. Peter
Torsten Hoefler
ELM
ALM
36
77
0
29 Jan 2019
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark
Cody Coleman
Daniel Kang
Deepak Narayanan
Luigi Nardi
Tian Zhao
Jian Zhang
Peter Bailis
K. Olukotun
Christopher Ré
Matei A. Zaharia
30
117
0
04 Jun 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
47
705
0
26 Feb 2018
A Survey on Compiler Autotuning using Machine Learning
Amir H. Ashouri
W. Killian
John Cavazos
G. Palermo
Cristina Silvano
45
200
0
13 Jan 2018
Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming
Andrew Anderson
David Gregg
36
34
0
03 Oct 2017
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Xiangyu Zhang
Xinyu Zhou
Mengxiao Lin
Jian Sun
AI4TS
108
6,830
0
04 Jul 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.0K
20,692
0
17 Apr 2017
Understanding Convolution for Semantic Segmentation
Panqu Wang
Pengfei Chen
Ye Yuan
Ding Liu
Zehua Huang
Xiaodi Hou
G. Cottrell
SSeg
61
1,682
0
27 Feb 2017
YOLO9000: Better, Faster, Stronger
Joseph Redmon
Ali Farhadi
VLM
ObjD
133
15,535
0
25 Dec 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
600
36,599
0
25 Aug 2016
Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution
Emad Barsoum
Cha Zhang
Cristian Canton Ferrer
Zhengyou Zhang
63
708
0
03 Aug 2016
Identity Mappings in Deep Residual Networks
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
264
10,149
0
16 Mar 2016
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
F. Iandola
Song Han
Matthew W. Moskewicz
Khalid Ashraf
W. Dally
Kurt Keutzer
107
7,448
0
24 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.2K
192,638
0
10 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
401
27,231
0
02 Dec 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
257
43,154
0
11 Feb 2015
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
269
43,511
0
17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
758
99,991
0
04 Sep 2014
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler
Rob Fergus
FAtt
SSL
221
15,825
0
12 Nov 2013
Rich feature hierarchies for accurate object detection and semantic segmentation
Ross B. Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
ObjD
194
26,091
0
11 Nov 2013