ResearchTrend.AI
Fast Convolutional Nets With fbfft: A GPU Performance Evaluation


24 December 2014
Nicolas Vasilache
Jeff Johnson
Michaël Mathieu
Soumith Chintala
Serkan Piantino
Yann LeCun

Papers citing "Fast Convolutional Nets With fbfft: A GPU Performance Evaluation"

50 / 53 papers shown
Underwater object detection in sonar imagery with detection transformer and Zero-shot neural architecture search
XiaoTong Gu
Shengyu Tang
Yiming Cao
Changdong Yu
ViT
36
0
0
10 May 2025
AdaOPC: A Self-Adaptive Mask Optimization Framework For Real Design Patterns
Wenqian Zhao
Xufeng Yao
Ziyang Yu
Guojin Chen
Yuzhe Ma
Bei Yu
Martin D. F. Wong
30
17
0
15 Mar 2023
Pruning Very Deep Neural Network Channels for Efficient Inference
Yihui He
35
1
0
14 Nov 2022
Testing predictions of representation cost theory with CNNs
Charles Godfrey
Elise Bishoff
Myles Mckay
Davis Brown
Grayson Jorgenson
Henry Kvinge
E. Byler
34
0
0
03 Oct 2022
Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Nurullah Sevim
Ege Ozan Özyedek
Furkan Şahinuç
Aykut Koç
40
11
0
26 Sep 2022
Learning Convolutional Neural Networks in the Frequency Domain
H. Pan
Yixin Chen
Xin-Yi Niu
Wenbo Zhou
Dongsheng Li
OOD
25
9
0
14 Apr 2022
NUMA-aware FFT-based Convolution on ARMv8 Many-core CPUs
Xiandong Huang
Qinglin Wang
Shuyu Lu
Ruochen Hao
Songzhu Mei
Jie Liu
21
7
0
25 Sep 2021
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Narendra Chaudhary
Sanchit Misra
Dhiraj D. Kalamkar
A. Heinecke
E. Georganas
Barukh Ziv
Menachem Adelman
Bharat Kaul
32
9
0
16 Apr 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani
Prajit Ramachandran
A. Srinivas
Niki Parmar
Blake A. Hechtman
Jonathon Shlens
30
395
0
23 Mar 2021
Deep Networks from the Principle of Rate Reduction
Kwan Ho Ryan Chan
Yaodong Yu
Chong You
Haozhi Qi
John N. Wright
Yi Ma
22
21
0
27 Oct 2020
Exploring Sparsity in Image Super-Resolution for Efficient Inference
Longguang Wang
Xiaoyu Dong
Yingqian Wang
Xinyi Ying
Zaiping Lin
W. An
Yulan Guo
SupR
34
4
0
17 Jun 2020
ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network
David Gschwend
35
64
0
14 May 2020
Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios
Adam Dziedzic
V. Sathya
M. I. Rochman
M. Ghosh
S. Krishnan
24
19
0
18 Mar 2020
Exploiting Verified Neural Networks via Floating Point Numerical Error
Kai Jia
Martin Rinard
AAML
37
34
0
06 Mar 2020
Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
Yihui He
Jianing Qian
Jianren Wang
Cindy X. Le
Congrui Hetang
Qi Lyu
Wenping Wang
Tianwei Yue
50
11
0
21 Oct 2019
Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
Xiaohan Ding
Guiguang Ding
Xiangxin Zhou
Yuchen Guo
Jungong Han
Ji Liu
18
162
0
27 Sep 2019
A Winograd-based Integrated Photonics Accelerator for Convolutional Neural Networks
A. Mehrabian
M. Miscuglio
Yousra Alkabani
V. Sorger
T. El-Ghazawi
27
46
0
25 Jun 2019
Object Detection in 20 Years: A Survey
Zhengxia Zou
Keyan Chen
Zhenwei Shi
Yuhong Guo
Jieping Ye
VLM
ObjD
AI4TS
34
2,290
0
13 May 2019
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices
Nilay Shrivastava
Astitwa Saxena
Yaman Kumar Singla
Preeti Kaur
Debanjan Mahata
R. Shah
27
3
0
10 May 2019
A detailed comparative study of open source deep learning frameworks
Ghadeer Al-Bdour
Raffi Al-Qurran
M. Al-Ayyoub
A. Shatnawi
ELM
21
13
0
25 Feb 2019
Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
Shaohui Lin
Rongrong Ji
Yuchao Li
Cheng Deng
Xuelong Li
35
70
0
23 Jan 2019
Efficient Winograd Convolution via Integer Arithmetic
Lingchuan Meng
J. Brothers
16
29
0
07 Jan 2019
High Performance Zero-Memory Overhead Direct Convolutions
Jiyuan Zhang
F. Franchetti
Tze Meng Low
17
68
0
20 Sep 2018
2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy
Chuhan Min
Aosen Wang
Yiran Chen
Wenyao Xu
Xin Chen
30
41
0
06 Sep 2018
Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure
Hamed Hakkak
OffRL
AI4CE
15
1
0
08 Jul 2018
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Renzo Andri
Lukas Cavigelli
D. Rossi
Luca Benini
MQ
24
19
0
05 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
33
704
0
26 Feb 2018
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Nicolas Vasilache
O. Zinenko
Theodoros Theodoridis
Priya Goyal
Zach DeVito
William S. Moses
Sven Verdoolaege
Andrew Adams
Albert Cohen
40
432
0
13 Feb 2018
AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Yihui He
Ji Lin
Zhijian Liu
Hanrui Wang
Li Li
Song Han
35
1,343
0
10 Feb 2018
High performance ultra-low-precision convolutions on mobile devices
Andrew Tulloch
Yangqing Jia
HAI
MQ
18
27
0
06 Dec 2017
Distributed Training Large-Scale Deep Architectures
Shang-Xuan Zou
Chun-Yen Chen
Jui-Lin Wu
Chun-Nan Chou
Chia-Chin Tsao
Kuan-Chieh Tung
Ting-Wei Lin
Cheng-Lung Sung
Edward Y. Chang
26
22
0
10 Aug 2017
Channel Pruning for Accelerating Very Deep Neural Networks
Yihui He
Xiangyu Zhang
Jian Sun
119
2,508
0
19 Jul 2017
MEC: Memory-efficient Convolution for Deep Neural Network
Minsik Cho
D. Brand
21
86
0
21 Jun 2017
Network Sketching: Exploiting Binary Structure in Deep CNNs
Yiwen Guo
Anbang Yao
Hao Zhao
Yurong Chen
MQ
37
95
0
07 Jun 2017
Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs
Xiaoming Chen
Jianxu Chen
Danny Chen
X. S. Hu
27
10
0
29 May 2017
Deep Learning in the Automotive Industry: Applications and Tools
André Luckow
M. Cook
Nathan Ashcraft
Edwin Weill
Emil Djerekarov
Bennie Vorster
28
116
0
30 Apr 2017
HPTT: A High-Performance Tensor Transposition C++ Library
P. Springer
Tong Su
Paolo Bientinesi
16
49
0
14 Apr 2017
CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
Lukas Cavigelli
Philippe Degen
Luca Benini
BDL
25
51
0
14 Apr 2017
Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale
F. Iandola
3DV
26
18
0
20 Dec 2016
Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Jongsoo Park
Sheng Li
W. Wen
P. T. P. Tang
Hai Helen Li
Yiran Chen
Pradeep Dubey
39
182
0
04 Aug 2016
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
Stefan Hadjis
Ce Zhang
Ioannis Mitliagkas
Dan Iter
Christopher Ré
20
65
0
14 Jun 2016
Optimizing Performance of Recurrent Neural Networks on GPUs
J. Appleyard
Tomás Kociský
Phil Blunsom
25
91
0
07 Apr 2016
TTC: A high-performance Compiler for Tensor Transpositions
P. Springer
J. Hammond
Paolo Bientinesi
25
17
0
07 Mar 2016
Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add
Tyler Highlander
Andres Rodriguez
26
59
0
25 Jan 2016
On the energy landscape of deep networks
Pratik Chaudhari
Stefano Soatto
ODL
43
27
0
20 Nov 2015
Comparative Study of Deep Learning Software Frameworks
S. Bahrampour
Naveen Ramakrishnan
Lukas Schott
Mohak Shah
29
161
0
19 Nov 2015
FireCaffe: near-linear acceleration of deep neural network training on compute clusters
F. Iandola
Khalid Ashraf
Matthew W. Moskewicz
Kurt Keutzer
30
302
0
31 Oct 2015
Compact Convolutional Neural Network Cascade for Face Detection
Ilya Kalinowski
V. Spitsyn
CVBM
24
35
0
06 Aug 2015
Deep Convolutional Networks on Graph-Structured Data
Mikael Henaff
Joan Bruna
Yann LeCun
GNN
21
1,581
0
16 Jun 2015
Compressing Convolutional Neural Networks
Wenlin Chen
James T. Wilson
Stephen Tyree
Kilian Q. Weinberger
Yixin Chen
26
139
0
14 Jun 2015