ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1604.06174
  4. Cited By
Training Deep Nets with Sublinear Memory Cost

Training Deep Nets with Sublinear Memory Cost

21 April 2016
Tianqi Chen
Bing Xu
Chiyuan Zhang
Carlos Guestrin
ArXivPDFHTML

Papers citing "Training Deep Nets with Sublinear Memory Cost"

50 / 232 papers shown
Title
EVA: Exploring the Limits of Masked Visual Representation Learning at
  Scale
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
87
679
0
14 Nov 2022
Language models are good pathologists: using attention-based sequence
  reduction and text-pretrained transformers for efficient WSI classification
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
36
3
0
14 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture,
  and Generalization Capabilities
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
37
12
0
10 Nov 2022
Attention-based Neural Cellular Automata
Attention-based Neural Cellular Automata
Mattie Tesfaldet
Derek Nowrouzezahrai
C. Pal
ViT
37
17
0
02 Nov 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the
  Memory Usage of Neural Networks
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
29
8
0
24 Oct 2022
An efficient deep neural network to find small objects in large 3D
  images
An efficient deep neural network to find small objects in large 3D images
Jungkyu Park
Jakub Chlkedowski
Stanislaw Jastrzebski
Jan Witowski
Yan Xu
...
Melanie Wegener
Linda Moy
Laura Heacock
B. Reig
Krzysztof J. Geras
MedIm
21
1
0
16 Oct 2022
An In-depth Study of Stochastic Backpropagation
An In-depth Study of Stochastic Backpropagation
J. Fang
Ming Xu
Hao Chen
Bing Shuai
Z. Tu
Joseph Tighe
BDL
32
1
0
30 Sep 2022
Space-time tradeoffs of lenses and optics via higher category theory
Space-time tradeoffs of lenses and optics via higher category theory
Bruno Gavranović
18
5
0
19 Sep 2022
Mimose: An Input-Aware Checkpointing Planner for Efficient Training on
  GPU
Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU
Jian-He Liao
Mingzhen Li
Qingxiao Sun
Jiwei Hao
F. Yu
...
Ye Tao
Zicheng Zhang
Hailong Yang
Zhongzhi Luan
D. Qian
23
4
0
06 Sep 2022
POET: Training Neural Networks on Tiny Devices with Integrated
  Rematerialization and Paging
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil
Paras Jain
P. Dutta
Ion Stoica
Joseph E. Gonzalez
12
35
0
15 Jul 2022
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle
Guoxia Wang
Xiaomin Fang
Zhihua Wu
Yiqun Liu
Yang Xue
Yingfei Xiang
Dianhai Yu
Fan Wang
Yanjun Ma
33
31
0
12 Jul 2022
Transforming PageRank into an Infinite-Depth Graph Neural Network
Transforming PageRank into an Infinite-Depth Graph Neural Network
Andreas Roth
Thomas Liebig
GNN
34
14
0
01 Jul 2022
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Vitaliy Chiley
Vithursan Thangarasa
Abhay Gupta
Anshul Samar
Joel Hestness
D. DeCoste
50
8
0
28 Jun 2022
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple
  Granularities
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities
Zejiang Shen
Kyle Lo
L. Yu
N. Dahlberg
Margo Schlanger
Doug Downey
ELM
AILaw
37
43
0
22 Jun 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer
  Learning
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
21
236
0
13 Jun 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
78
2,045
0
27 May 2022
ETAD: Training Action Detection End to End on a Laptop
ETAD: Training Action Detection End to End on a Laptop
Shuming Liu
Mengmeng Xu
Chen Zhao
Xu Zhao
Guohao Li
44
6
0
14 May 2022
Reducing Activation Recomputation in Large Transformer Models
Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti
Jared Casper
Sangkug Lym
Lawrence C. McAfee
M. Andersch
M. Shoeybi
Bryan Catanzaro
AI4CE
29
257
0
10 May 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
35
91
0
04 Apr 2022
BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection
BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection
Junjie Huang
Guan Huang
27
333
0
31 Mar 2022
DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
Yu Tang
Chenyu Wang
Yufan Zhang
Yuliang Liu
Xingcheng Zhang
Linbo Qiao
Zhiquan Lai
Dongsheng Li
21
4
0
30 Mar 2022
Static Prediction of Runtime Errors by Learning to Execute Programs with
  External Resource Descriptions
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
David Bieber
Rishab Goel
Daniel Zheng
Hugo Larochelle
Daniel Tarlow
19
15
0
07 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object
  Detection
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
59
1,376
0
07 Mar 2022
R-GCN: The R Could Stand for Random
R-GCN: The R Could Stand for Random
Vic Degraeve
Gilles Vandewiele
F. Ongenae
Sofie Van Hoecke
GNN
29
13
0
04 Mar 2022
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng
Xuanlei Zhao
Guangyang Lu
Bin-Rui Li
Zhongming Yu
Tian Zheng
R. Wu
Xiwen Zhang
Jian Peng
Yang You
AI4CE
24
30
0
02 Mar 2022
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Joya Chen
Kai Xu
Yuhui Wang
Yifei Cheng
Angela Yao
19
7
0
28 Feb 2022
Survey on Large Scale Neural Network Training
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan Oseledets
Olivier Beaumont
22
10
0
21 Feb 2022
Vision Models Are More Robust And Fair When Pretrained On Uncurated
  Images Without Supervision
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal
Quentin Duval
Isaac Seessel
Mathilde Caron
Ishan Misra
Levent Sagun
Armand Joulin
Piotr Bojanowski
VLM
SSL
26
110
0
16 Feb 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive
  DNN Models on Commodity Servers
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
N. Kim
19
19
0
02 Feb 2022
Tutorial on amortized optimization
Tutorial on amortized optimization
Brandon Amos
OffRL
75
43
0
01 Feb 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional
  Vision-Language Generation
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
27
58
0
31 Dec 2021
Unbiased Gradient Estimation in Unrolled Computation Graphs with
  Persistent Evolution Strategies
Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
Paul Vicol
Luke Metz
Jascha Narain Sohl-Dickstein
27
67
0
27 Dec 2021
Ultrasound Speckle Suppression and Denoising using MRI-derived
  Normalizing Flow Priors
Ultrasound Speckle Suppression and Denoising using MRI-derived Normalizing Flow Priors
Vincent van de Schaft
Ruud J. G. van Sloun
OOD
MedIm
18
6
0
24 Dec 2021
Efficient Large Scale Language Modeling with Mixtures of Experts
Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
...
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
MoE
61
188
0
20 Dec 2021
Long Context Question Answering via Supervised Contrastive Learning
Long Context Question Answering via Supervised Contrastive Learning
Avi Caciularu
Ido Dagan
Jacob Goldberger
Arman Cohan
RALM
19
23
0
16 Dec 2021
Self-attention Does Not Need $O(n^2)$ Memory
Self-attention Does Not Need O(n2)O(n^2)O(n2) Memory
M. Rabe
Charles Staats
LRM
26
139
0
10 Dec 2021
Mesa: A Memory-saving Training Framework for Transformers
Mesa: A Memory-saving Training Framework for Transformers
Zizheng Pan
Peng Chen
Haoyu He
Jing Liu
Jianfei Cai
Bohan Zhuang
31
20
0
22 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
70
1,754
0
18 Nov 2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by
  Using Error-Bounded Lossy Compression
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Sian Jin
Chengming Zhang
Xintong Jiang
Yunhe Feng
Hui Guan
Guanpeng Li
Shuaiwen Leon Song
Dingwen Tao
27
23
0
18 Nov 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at
  Scale
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
32
657
0
17 Nov 2021
Gradients are Not All You Need
Gradients are Not All You Need
Luke Metz
C. Freeman
S. Schoenholz
Tal Kachman
30
93
0
10 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
30
615
0
09 Nov 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the
  Edge
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
35
4
0
29 Oct 2021
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel
  Training
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li
Hongxin Liu
Zhengda Bian
Boxiang Wang
Haichen Huang
Fan Cui
Chuan-Qing Wang
Yang You
GNN
30
143
0
28 Oct 2021
Robustness of Graph Neural Networks at Scale
Robustness of Graph Neural Networks at Scale
Simon Geisler
Tobias Schmidt
Hakan cSirin
Daniel Zügner
Aleksandar Bojchevski
Stephan Günnemann
AAML
30
125
0
26 Oct 2021
Neural Flows: Efficient Alternative to Neural ODEs
Neural Flows: Efficient Alternative to Neural ODEs
Marin Bilovs
Johanna Sommer
Syama Sundar Rangapuram
Tim Januschowski
Stephan Günnemann
AI4TS
24
70
0
25 Oct 2021
AxoNN: An asynchronous, message-driven parallel framework for
  extreme-scale deep learning
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
Siddharth Singh
A. Bhatele
GNN
34
14
0
25 Oct 2021
Hydra: A System for Large Multi-Model Deep Learning
Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha
Arun Kumar
MoE
AI4CE
38
5
0
16 Oct 2021
MarkupLM: Pre-training of Text and Markup Language for Visually-rich
  Document Understanding
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Junlong Li
Yiheng Xu
Lei Cui
Furu Wei
VLM
3DGS
31
59
0
16 Oct 2021
Partial Variable Training for Efficient On-Device Federated Learning
Partial Variable Training for Efficient On-Device Federated Learning
Tien-Ju Yang
Dhruv Guliani
F. Beaufays
Giovanni Motta
FedML
24
25
0
11 Oct 2021
Previous
12345
Next