FireCaffe: near-linear acceleration of deep neural network training on compute clusters

31 October 2015 · F. Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer

Papers citing "FireCaffe: near-linear acceleration of deep neural network training on compute clusters"

50 of 106 citing papers shown. Each entry lists the title, community tags (in brackets, where assigned), authors, and publication date.
MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
  Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar (23 May 2019)
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes [ODL]
  Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh (01 Apr 2019)
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools [GNN]
  R. Mayer, Hans-Arno Jacobsen (27 Mar 2019)
swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight
  Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang (16 Mar 2019)
Inefficiency of K-FAC for Large Batch Size Training
  Linjian Ma, Gabe Montague, Jiayu Ye, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney (14 Mar 2019)
Speeding up Deep Learning with Transient Servers
  Shijian Li, R. Walls, Lijie Xu, Tian Guo (28 Feb 2019)
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes [AI4CE]
  Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen (19 Feb 2019)
Large-Batch Training for LSTM and Beyond
  Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh (24 Jan 2019)
Stanza: Layer Separation for Distributed Training in Deep Learning [MoE]
  Xiaorui Wu, Hongao Xu, Bo Li, Y. Xiong (27 Dec 2018)
Layer-Parallel Training of Deep Residual Neural Networks
  Stefanie Günther, Lars Ruthotto, J. Schroder, E. Cyr, N. Gauger (11 Dec 2018)
Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols [FedML]
  Siddharth Pramod (06 Dec 2018)
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
  Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez (30 Nov 2018)
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training [FedML]
  Youjie Li, Hang Qiu, Songze Li, A. Avestimehr, Nam Sung Kim, Alex Schwing (08 Nov 2018)
GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training [MQ]
  Timo C. Wunderlich, Zhifeng Lin, S. A. Aamir, Andreas Grübl, Youjie Li, David Stöckel, Alex Schwing, M. Annavaram, A. Avestimehr (08 Nov 2018)
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation [GNN]
  A. A. Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, D. Panda (25 Oct 2018)
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD [FedML]
  Jianyu Wang, Gauri Joshi (19 Oct 2018)
Distributed Learning over Unreliable Networks [OOD]
  Chen Yu, Hanlin Tang, Cédric Renggli, S. Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu (17 Oct 2018)
MotherNets: Rapid Deep Ensemble Learning
  Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos (12 Sep 2018)
Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
  Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng (08 Sep 2018)
CosmoFlow: Using Deep Learning to Learn the Universe at Scale [AI4CE]
  Amrita Mathuriya, Deborah Bard, P. Mendygral, Lawrence Meadows, James A. Arnemann, ..., Nalini Kumar, S. Ho, Michael F. Ringenburg, P. Prabhat, Victor W. Lee (14 Aug 2018)
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks [AAML]
  Kang Liu, Brendan Dolan-Gavitt, S. Garg (30 May 2018)
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
  Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy (21 May 2018)
Spark-MPI: Approaching the Fifth Paradigm of Cognitive Applications
  N. Malitsky, R. Castain, Matt Cowan (16 May 2018)
GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent
  J. Daily, Abhinav Vishnu, Charles Siegel, T. Warfel, Vinay C. Amatya (15 Mar 2018)
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
  Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell (08 Mar 2018)
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis [GNN]
  Tal Ben-Nun, Torsten Hoefler (26 Feb 2018)
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
  Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney (22 Feb 2018)
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
  Amith R. Mamidala, Georgios Kollias, C. Ward, F. Artico (11 Jan 2018)
Online Job Scheduling in Distributed Machine Learning Clusters
  Yixin Bao, Size Zheng, Chuan Wu, Zongpeng Li (03 Jan 2018)
SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks
  Sanchari Sen, Shubham Jain, Swagath Venkataramani, A. Raghunathan (07 Nov 2017)
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
  Sameer Kumar, D. Sreedhar, Vaibhav Saxena, Yogish Sabharwal, Ashish Verma (02 Nov 2017)
Keynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures
  F. Iandola, Kurt Keutzer (07 Oct 2017)
Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix
  Sébastien M. R. Arnold, Chunming Wang (15 Sep 2017)
ImageNet Training in Minutes [VLM, LRM]
  Yang You, Zhao-jie Zhang, Cho-Jui Hsieh, J. Demmel, Kurt Keutzer (14 Sep 2017)
What does fault tolerant Deep Learning need from MPI?
  Vinay C. Amatya, Abhinav Vishnu, Charles Siegel, J. Daily (11 Sep 2017)
Distributed Deep Neural Networks over the Cloud, the Edge and End Devices [FedML]
  Surat Teerapittayanon, Bradley McDanel, H. T. Kung (06 Sep 2017)
A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark
  Disha Shrivastava, S. Chaudhury, Jayadeva (19 Aug 2017)
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
  Thorsten Kurth, Jian Zhang, N. Satish, Ioannis Mitliagkas, Evan Racah, ..., J. Deslippe, Mikhail Shiryaev, Srinivas Sridharan, P. Prabhat, Pradeep Dubey (17 Aug 2017)
Distributed Training Large-Scale Deep Architectures
  Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou, Chia-Chin Tsao, Kuan-Chieh Tung, Ting-Wei Lin, Cheng-Lung Sung, Edward Y. Chang (10 Aug 2017)
Scaling Deep Learning on GPU and Knights Landing clusters [GNN]
  Yang You, A. Buluç, J. Demmel (09 Aug 2017)
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? [GNN]
  A. A. Awan, Ching-Hsiang Chu, Hari Subramoni, D. Panda (28 Jul 2017)
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters [GNN]
  Huatian Zhang, Zeyu Zheng, Shizhen Xu, Wei-Ming Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, P. Xie, Eric Xing (11 Jun 2017)
Deep Learning in the Automotive Industry: Applications and Tools
  André Luckow, M. Cook, Nathan Ashcraft, Edwin Weill, Emil Djerekarov, Bennie Vorster (30 Apr 2017)
Scaling Binarized Neural Networks on Reconfigurable Logic [MQ]
  Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip H. W. Leong, Magnus Jahre, K. Vissers (12 Jan 2017)
Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale [3DV]
  F. Iandola (20 Dec 2016)
How to scale distributed deep learning? [3DH]
  Peter H. Jin, Qiaochu Yuan, F. Iandola, Kurt Keutzer (14 Nov 2016)
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding [MQ]
  Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović (07 Oct 2016)
Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems [AI4CE]
  Charles Siegel, J. Daily, Abhinav Vishnu (03 Oct 2016)
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability [GNN]
  J. Keuper, Franz-Josef Pfreundt (22 Sep 2016)
A Convolutional Autoencoder for Multi-Subject fMRI Data Aggregation
  Po-Hsuan Chen, Xia Zhu, Hejia Zhang, Javier S. Turek, Janice Chen, Theodore L. Willke, Uri Hasson, Peter J. Ramadge (17 Aug 2016)