Training Deep Nets with Sublinear Memory Cost
Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
arXiv:1604.06174, 21 April 2016
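The technique introduced by this paper is gradient checkpointing (activation rematerialization): most intermediate activations are discarded during the forward pass and recomputed segment by segment during backpropagation, reducing activation memory from O(n) to O(sqrt(n)) for an n-layer network at the cost of roughly one extra forward pass. A minimal sketch of that trade-off using PyTorch's torch.utils.checkpoint follows; the toy model, block count, and tensor sizes are illustrative assumptions, not taken from the paper or from any of the citing works listed below.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy stack of blocks standing in for a deep network (illustrative only).
blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)]
)

def forward_with_checkpointing(x):
    # Each block is a checkpointed segment: its internal activations are
    # dropped after the forward pass and recomputed during backward, so only
    # the segment boundaries stay resident in memory.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(4, 512, requires_grad=True)
out = forward_with_checkpointing(x)
out.sum().backward()  # activations inside each block are recomputed here
```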

Papers citing "Training Deep Nets with Sublinear Memory Cost"

50 / 232 papers shown
Title | Authors | Tags | Citations | Date
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning | Shaohua Wu, Xudong Zhao, Tong Yu, Rongguo Zhang, C. Shen, ..., Feng Li, Hong Zhu, Jiangang Luo, Liang Xu, Xuanwei Zhang | ALM | 59 | 10 Oct 2021
8-bit Optimizers via Block-wise Quantization | Tim Dettmers, M. Lewis, Sam Shleifer, Luke Zettlemoyer | MQ | 273 | 06 Oct 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management | Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie Zhou, Yang You | VLM | 24 | 12 Aug 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines | Shigang Li, Torsten Hoefler | GNN, AI4CE, LRM | 131 | 14 Jul 2021
A Field Guide to Federated Optimization | Jianyu Wang, Zachary B. Charles, Zheng Xu, Gauri Joshi, H. B. McMahan, ..., Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu | FedML | 412 | 14 Jul 2021
Feature Alignment as a Generative Process | T. S. Farias, Jonas Maziero | DiffM, BDL | 1 | 23 Jun 2021
CPM-2: Large-scale Cost-effective Pre-trained Language Models | Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, ..., Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun | MoE | 86 | 20 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models | J. Lamy-Poirier | MoE | 8 | 04 Jun 2021
SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models | Zaccharie Ramzi, Florian Mannel, Shaojie Bai, Jean-Luc Starck, P. Ciuciu, Thomas Moreau | - | 28 | 01 Jun 2021
Doc2Dict: Information Extraction as Text Generation | Benjamin Townsend, Eamon Ito-Fisher, Lily Zhang, Madison May | - | 7 | 16 May 2021
GSPMD: General and Scalable Parallelization for ML Computation Graphs | Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake A. Hechtman, Yanping Huang, ..., Noam M. Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen | MoE | 128 | 10 May 2021
Poolingformer: Long Document Modeling with Pooling Attention | Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen | - | 0 | 10 May 2021
Long-Span Summarization via Local Attention and Content Selection | Potsawee Manakul, Mark J. F. Gales | - | 42 | 08 May 2021
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers | Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner | RALM | 279 | 07 May 2021
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training | Jianfei Chen, Lianmin Zheng, Z. Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez | MQ | 74 | 29 Apr 2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks | A. Kahira, Truong Thao Nguyen, L. Bautista-Gomez, Ryousei Takano, Rosa M. Badia, M. Wahib | - | 9 | 19 Apr 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He | GNN | 370 | 16 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Deepak Narayanan, M. Shoeybi, Jared Casper, P. LeGresley, M. Patwary, ..., Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia | MoE | 646 | 09 Apr 2021
No frame left behind: Full Video Action Recognition | X. Liu, S. Pintea, F. Karimi Nejadasl, Olaf Booij, Jan van Gemert | - | 40 | 29 Mar 2021
Deep and Statistical Learning in Biomedical Imaging: State of the Art in 3D MRI Brain Tumor Segmentation | K. R. M. Fernando, Cris P Tsokos | - | 53 | 09 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever | CLIP, VLM | 27,772 | 26 Feb 2021
Jacobian Determinant of Normalizing Flows | Huadong Liao, Jiawei He | DRL | 7 | 12 Feb 2021
Enabling Binary Neural Network Training on the Edge | Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, C. Coelho, S. Chatterjee, P. Cheung, George A. Constantinides | MQ | 24 | 08 Feb 2021
Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup | Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan | - | 91 | 18 Jan 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training | Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He | MoE | 416 | 18 Jan 2021
A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression | Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao | AI4CE | 12 | 18 Nov 2020
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering | Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Xin Zhao, Daxiang Dong, Hua Wu, Haifeng Wang | RALM, OffRL | 594 | 16 Oct 2020
Memformer: A Memory-Augmented Transformer for Sequence Modeling | Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu | - | 49 | 14 Oct 2020
SMYRF: Efficient Attention using Asymmetric Clustering | Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis | - | 44 | 11 Oct 2020
Review: Deep Learning in Electron Microscopy | Jeffrey M. Ede | - | 79 | 17 Sep 2020
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA | M. Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka | OODD | 23 | 26 Aug 2020
GANBERT: Generative Adversarial Networks with Bidirectional Encoder Representations from Transformers for MRI to PET synthesis | Hoo-Chang Shin, Alvin Ihsani, Swetha Mandava, Sharath Turuvekere Sreenivas, Christopher Forster, Jiook Cha, Alzheimer's Disease Neuroimaging Initiative | GAN, MedIm | 19 | 10 Aug 2020
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism | Yosuke Oyama, N. Maruyama, Nikoli Dryden, Erin McCarthy, P. Harrington, J. Balewski, Satoshi Matsuoka, Peter Nugent, B. Van Essen | 3DV, AI4CE | 37 | 25 Jul 2020
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models | Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, ..., Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin | - | 232 | 02 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers | A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler | - | 131 | 30 Jun 2020
PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning | Siqi Bao, H. He, Fan Wang, Hua Wu, Haifeng Wang, Wenquan Wu, Zhen Guo, Zhibin Liu, Xinchao Xu | - | 137 | 30 Jun 2020
LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation | Wentao Zhu, Can Zhao, Wenqi Li, H. Roth, Ziyue Xu, Daguang Xu | 3DV | 18 | 22 Jun 2020
Dynamic Tensor Rematerialization | Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, Zachary Tatlock | - | 93 | 17 Jun 2020
Memory-Efficient Pipeline-Parallel DNN Training | Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei A. Zaharia | MoE | 212 | 16 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations | Karan Desai, Justin Johnson | SSL, VLM | 432 | 11 Jun 2020
Linformer: Self-Attention with Linear Complexity | Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma | - | 1,647 | 08 Jun 2020
UFO-BLO: Unbiased First-Order Bilevel Optimization | Valerii Likhosherstov, Xingyou Song, K. Choromanski, Jared Davis, Adrian Weller | - | 7 | 05 Jun 2020
Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes | Xin Yang, Xu Wang, Yi Wang, Haoran Dou, Shengli Li, H. Wen, Yi Lin, Pheng-Ann Heng, Dong Ni | - | 19 | 28 Apr 2020
Longformer: The Long-Document Transformer | Iz Beltagy, Matthew E. Peters, Arman Cohan | RALM, VLM | 3,929 | 10 Apr 2020
TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning | Fernando Pérez-García, Rachel Sparks, Sébastien Ourselin | MedIm, LM&MA | 427 | 09 Mar 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Max Ryabinin, Anton I. Gusev | FedML | 48 | 10 Feb 2020
i-flow: High-dimensional Integration and Sampling with Normalizing Flows | Christina Gao, J. Isaacson, Claudius Krause | AI4CE | 106 | 15 Jan 2020
Efficient Memory Management for Deep Neural Net Inference | Yury Pisarchyk, Juhyun Lee | - | 36 | 10 Jan 2020
Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory | Julien Herrmann, Olivier Beaumont, Lionel Eyraud-Dubois, J. Herrmann, Alexis Joly, Alena Shilova | BDL | 29 | 27 Nov 2019
Streaming convolutional neural networks for end-to-end learning with multi-megapixel images | H. Pinckaers, Bram van Ginneken, G. Litjens | MedIm | 94 | 11 Nov 2019