1410.0759
Cited By
cuDNN: Efficient Primitives for Deep Learning
3 October 2014
Sharan Chetlur
Cliff Woolley
Philippe Vandermersch
Jonathan M. Cohen
J. Tran
Bryan Catanzaro
Evan Shelhamer
Papers citing
"cuDNN: Efficient Primitives for Deep Learning"
50 / 249 papers shown
Title
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon
Baeseong Park
S. Kwon
Byeongwook Kim
Jeongin Yun
Dongsoo Lee
MQ
33
30
0
20 May 2020
Energy-Aware DNN Graph Optimization
Yu Wang
Rong Ge
Shuang Qiu
GNN
27
2
0
12 May 2020
Reinforcement Learning with Augmented Data
Michael Laskin
Kimin Lee
Adam Stooke
Lerrel Pinto
Pieter Abbeel
A. Srinivas
OffRL
20
648
0
30 Apr 2020
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu
Changxin Gao
Jingbo Wang
Gang Yu
Chunhua Shen
Nong Sang
SSeg
34
1,178
0
05 Apr 2020
FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks
Kai Zhao
Sheng Di
Sihuan Li
Xin Liang
Yujia Zhai
Jieyang Chen
Kaiming Ouyang
Franck Cappello
Zizhong Chen
30
80
0
27 Mar 2020
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson
Vitaliy Chiley
Abhinav Venigalla
Joel Hestness
Urs Koster
35
33
0
25 Mar 2020
Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios
Adam Dziedzic
V. Sathya
M. I. Rochman
M. Ghosh
S. Krishnan
24
19
0
18 Mar 2020
Exploiting Verified Neural Networks via Floating Point Numerical Error
Kai Jia
Martin Rinard
AAML
37
34
0
06 Mar 2020
Advances in Deep Space Exploration via Simulators & Deep Learning
James Bird
Linda R. Petzold
P. Lubin
Dulia Deacon
16
15
0
10 Feb 2020
Deep Learning on Image Denoising: An overview
Chunwei Tian
Lunke Fei
Wenxian Zheng
Yong-mei Xu
W. Zuo
Chia-Wen Lin
37
815
0
31 Dec 2019
Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Lifu Zhang
T. Abdelrahman
24
0
0
29 Dec 2019
Tangent Images for Mitigating Spherical Distortion
Marc Eder
Mykhailo Shvets
John Lim
Jan-Michael Frahm
24
105
0
19 Dec 2019
Array Languages Make Neural Networks Fast
Artjoms Šinkarovs
Hans-Nikolai Vießmann
S. Scholz
25
5
0
11 Dec 2019
A Multigrid Method for Efficiently Training Video Models
Chaoxia Wu
Ross B. Girshick
Kaiming He
Christoph Feichtenhofer
Philipp Krahenbuhl
32
94
0
02 Dec 2019
Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design
Xingyao Zhang
Shuaiwen Leon Song
Chenhao Xie
Jing Wang
Wei-gong Zhang
Xin Fu
34
20
0
07 Nov 2019
MLPerf Inference Benchmark
Vijayarāghava Reḍḍī
C. Cheng
David Kanter
Pete H Mattson
Guenther Schmuelling
...
Bing Yu
George Y. Yuan
Aaron Zhong
P. Zhang
Yuchen Zhou
31
488
0
06 Nov 2019
Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
Yihui He
Jianing Qian
Jianren Wang
Cindy X. Le
Congrui Hetang
Qi Lyu
Wenping Wang
Tianwei Yue
50
11
0
21 Oct 2019
AI Benchmark: All About Deep Learning on Smartphones in 2019
Andrey D. Ignatov
Radu Timofte
Andrei Kulik
Seungsoo Yang
Ke Wang
Felix Baum
Max Wu
Lirong Xu
Luc Van Gool
ELM
23
218
0
15 Oct 2019
EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM
Skanda Koppula
Lois Orosa
A. G. Yaglikçi
Roknoddin Azizi
Taha Shahroodi
Konstantinos Kanellopoulos
O. Mutlu
27
105
0
12 Oct 2019
MLPerf Training Benchmark
Arya D. McCarthy
Christine Cheng
Cody Coleman
Greg Diamos
Paulius Micikevicius
...
Carole-Jean Wu
Lingjie Xu
Masafumi Yamazaki
C. Young
Matei A. Zaharia
47
307
0
02 Oct 2019
MIOpen: An Open Source Library For Deep Learning Primitives
Jehandad Khan
Paul Fultz
Artem Tamazov
Daniel Lowell
Chao-Jung Liu
...
Vasilii Filippov
Jing Zhang
Jing Zhou
Bragadeesh Natarajan
Mayank Daga
VLM
MoE
20
38
0
30 Sep 2019
Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator
Tian Zhao
Yaqi Zhang
K. Olukotun
33
16
0
26 Sep 2019
Exascale Deep Learning for Scientific Inverse Problems
N. Laanait
Josh Romero
Junqi Yin
M. T. Young
Sean Treichler
V. Starchenko
A. Borisevich
Alexander Sergeev
Michael A. Matheson
FedML
BDL
35
29
0
24 Sep 2019
360° Surface Regression with a Hyper-Sphere Loss
Antonis Karakottas
N. Zioulis
Stamatis Samaras
Dimitrios Ataloglou
V. Gkitsas
D. Zarpalas
P. Daras
3DH
22
8
0
16 Sep 2019
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
Adam Stooke
Pieter Abbeel
OffRL
24
96
0
03 Sep 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
36
55
0
30 Jul 2019
DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
Simon Wiedemann
H. Kirchhoffer
Stefan Matlage
Paul Haase
Arturo Marbán
...
Ahmed Osman
D. Marpe
H. Schwarz
Thomas Wiegand
Wojciech Samek
54
93
0
27 Jul 2019
AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation
Hyeongmin Lee
Taeoh Kim
Tae-Young Chung
Daehyun Pak
Yuseok Ban
Sangyoun Lee
30
235
0
24 Jul 2019
Cross-Domain Car Detection Using Unsupervised Image-to-Image Translation: From Day to Night
Vinicius F. Arruda
T. M. Paixão
Rodrigo Berriel
Alberto F. de Souza
C. Badue
N. Sebe
Thiago Oliveira-Santos
ViT
15
103
0
19 Jul 2019
Profiling based Out-of-core Hybrid Method for Large Neural Networks
Yuki Ito
Haruki Imai
Tung D. Le
Yasushi Negishi
K. Kawachiya
R. Matsumiya
Toshio Endo
35
9
0
11 Jul 2019
A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
Leyuan Wang
Zhi Chen
Yizhi Liu
Yao Wang
Lianmin Zheng
Mu Li
Yida Wang
39
30
0
03 Jul 2019
A Winograd-based Integrated Photonics Accelerator for Convolutional Neural Networks
A. Mehrabian
M. Miscuglio
Yousra Alkabani
V. Sorger
T. El-Ghazawi
27
46
0
25 Jun 2019
Parameterized Structured Pruning for Deep Neural Networks
Günther Schindler
Wolfgang Roth
Franz Pernkopf
Holger Froening
26
6
0
12 Jun 2019
DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
Simon Wiedemann
H. Kirchhoffer
Stefan Matlage
Paul Haase
Arturo Marbán
...
Ahmed Osman
D. Marpe
H. Schwarz
Thomas Wiegand
Wojciech Samek
MQ
19
21
0
15 May 2019
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices
Nilay Shrivastava
Astitwa Saxena
Yaman Kumar Singla
Preeti Kaur
Debanjan Mahata
R. Shah
27
3
0
10 May 2019
Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels
John Lawson
M. Goli
Duncan McBain
Daniel Soutar
Louis Sugy
21
7
0
10 Apr 2019
Resource Efficient 3D Convolutional Neural Networks
Okan Kopuklu
Neslihan Köse
Ahmet Gunduz
Gerhard Rigoll
27
186
0
04 Apr 2019
swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight
Jiarui Fang
Liandeng Li
Haohuan Fu
Jinlei Jiang
Wenlai Zhao
Conghui He
Xin You
Guangwen Yang
23
30
0
16 Mar 2019
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden
N. Maruyama
Tom Benson
Tim Moon
M. Snir
B. Van Essen
26
49
0
15 Mar 2019
Accelerating Training of Deep Neural Networks with a Standardization Loss
Jasmine Collins
Johannes Ballé
Jonathon Shlens
21
3
0
03 Mar 2019
Real-world Mapping of Gaze Fixations Using Instance Segmentation for Road Construction Safety Applications
Idris Jeelani
Khashayar Asadi
Hariharan Ramshankar
Kevin K. Han
A. Albert
19
5
0
30 Jan 2019
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Jinrong Guo
Wantao Liu
Wang Wang
Q. Lu
Songlin Hu
Jizhong Han
Ruixuan Li
21
9
0
21 Jan 2019
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
Ahmad Shawahna
S. M. Sait
A. El-Maleh
28
372
0
01 Jan 2019
An Optical Frontend for a Convolutional Neural Network
S. Colburn
Yiren Chu
Eli Shlizerman
A. Majumdar
27
89
0
23 Dec 2018
wav2letter++: The Fastest Open-source Speech Recognition System
Vineel Pratap
Awni Y. Hannun
Qiantong Xu
Jeff Cai
Jacob Kahn
Gabriel Synnaeve
Vitaliy Liptchinsky
R. Collobert
VLM
26
156
0
18 Dec 2018
SIMD-X: Programming and Processing of Graph Algorithms on GPUs
Hang Liu
Howie Huang
GNN
14
53
0
10 Dec 2018
Analyzing Machine Learning Workloads Using a Detailed GPU Simulator
Jonathan Lew
Deval Shah
Suchita Pati
Shaylin Cattell
Mengchi Zhang
...
Christopher Ng
Negar Goli
Matthew D. Sinclair
Timothy G. Rogers
Tor M. Aamodt
29
65
0
18 Nov 2018
Incremental Deep Learning for Robust Object Detection in Unknown Cluttered Environments
Dongwha Shin
M. Ahmed
P. Rhee
ObjD
26
20
0
13 Oct 2018
LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
Zhi-Lin Ke
Hsiang-Yun Cheng
Chia-Lin Yang
22
9
0
10 Oct 2018
Co-Learning Feature Fusion Maps from PET-CT Images of Lung Cancer
Ashnil Kumar
M. Fulham
Dagan Feng
Jinman Kim
MedIm
27
165
0
05 Oct 2018