ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Training Deep Nets with Sublinear Memory Cost

21 April 2016
Tianqi Chen
Bing Xu
Chiyuan Zhang
Carlos Guestrin
ArXiv · PDF · HTML

Papers citing "Training Deep Nets with Sublinear Memory Cost"

50 / 232 papers shown
Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply
Chengting Yu
Fengzhao Zhang
Hanzhi Ma
Aili Wang
Er-ping Li
29
1
0
12 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices
Beibei Zhang
Hongwei Zhu
Feng Gao
Zhihui Yang
Xiaoyang Sean Wang
29
1
0
07 Dec 2023
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Xicheng Lu
37
1
0
01 Dec 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
36
25
0
28 Nov 2023
xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data
Jing Gong
Minsheng Hao
Xingyi Cheng
Xin Zeng
Chiming Liu
Jianzhu Ma
Xuegong Zhang
Taifeng Wang
Leo T. Song
31
18
0
26 Nov 2023
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li
Hao Tan
Kai Zhang
Zexiang Xu
Fujun Luan
Yinghao Xu
Yicong Hong
Kalyan Sunkavalli
Greg Shakhnarovich
Sai Bi
59
254
0
10 Nov 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
56
12
0
28 Oct 2023
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
32
7
0
11 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
79
0
09 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
Xinyue Ma
Suyeon Jeong
Minjia Zhang
Di Wang
Jonghyun Choi
Myeongjae Jeon
CLL
16
13
0
11 Aug 2023
Towards General Text Embeddings with Multi-stage Contrastive Learning
Zehan Li
Xin Zhang
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
59
351
0
07 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
39
1
0
31 Jul 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li
Chunlin Tian
Kahou Tam
Ruirui Ma
Li Li
23
2
0
17 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
47
127
0
16 Jun 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
36
18
0
16 Jun 2023
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Baohao Liao
Shaomu Tan
Christof Monz
KELM
23
29
0
01 Jun 2023
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li
Zhiquan Lai
Yanqi Hao
Weijie Liu
Ke-shi Ge
Xiaoge Deng
Dongsheng Li
KaiCheng Lu
16
10
0
25 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Chenyang Le
Yao Qian
Long Zhou
Shujie Liu
Yanmin Qian
Michael Zeng
Xuedong Huang
24
13
0
24 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
TextDiffuser: Diffusion Models as Text Painters
Jingye Chen
Yupan Huang
Tengchao Lv
Lei Cui
Qifeng Chen
Furu Wei
48
113
0
18 May 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Bin Cui
36
11
0
17 May 2023
TASTY: A Transformer based Approach to Space and Time complexity
K. Moudgalya
Ankit Ramakrishnan
Vamsikrishna Chemudupati
Xinghai Lu
16
3
0
06 May 2023
Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies
Oscar Li
James Harrison
Jascha Narain Sohl-Dickstein
Virginia Smith
Luke Metz
51
5
0
21 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
56
74
0
10 Apr 2023
Training Neural Networks for Execution on Approximate Hardware
Tianmu Li
Shurui Li
Puneet Gupta
27
1
0
08 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
76
789
0
30 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
69
470
0
27 Mar 2023
An Evaluation of Memory Optimization Methods for Training Neural Networks
Xiaoxuan Liu
Siddharth Jha
Alvin Cheung
29
0
0
26 Mar 2023
MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation
Saikat Roy
Gregor Koehler
Constantin Ulrich
Michael Baumgartner
Jens Petersen
Fabian Isensee
Paul F. Jaeger
Klaus Maier-Hein
ViT
MedIm
35
138
0
17 Mar 2023
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Xiaonan Nie
Yi Liu
Fangcheng Fu
Jinbao Xue
Dian Jiao
Xupeng Miao
Yangyu Tao
Bin Cui
MoE
31
16
0
06 Mar 2023
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
Ziqi Pang
Jie Li
P. Tokmakov
Di Chen
Sergey Zagoruyko
Yu-xiong Wang
3DPC
33
47
0
07 Feb 2023
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu
Shenggui Li
Jiarui Fang
Yan Shao
Boyuan Yao
Yang You
OffRL
27
7
0
06 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation
Ziji Shi
Le Jiang
Ang Wang
Jie Zhang
Xianyan Jia
Yong Li
Chencan Wu
Jialin Li
Wei Lin
GNN
44
2
0
01 Feb 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
ExplainableFold: Understanding AlphaFold Prediction with Explainable AI
Juntao Tan
Yongfeng Zhang
28
6
0
27 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
24
25
0
24 Jan 2023
A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs
Fabian Falck
Christopher Williams
D. Danks
George Deligiannidis
C. Yau
Chris Holmes
Arnaud Doucet
M. Willetts
27
8
0
19 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
21
9
0
11 Jan 2023
Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
GNN
VLM
MoE
26
7
0
06 Jan 2023
CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion
Xingwei He
Yeyun Gong
Alex Jin
Hang Zhang
Anlei Dong
Jian Jiao
Siu-Ming Yiu
Nan Duan
RALM
54
3
0
18 Dec 2022
On-device Training: A First Overview on Existing Systems
Shuai Zhu
Thiemo Voigt
Jeonggil Ko
Fatemeh Rahimian
34
14
0
01 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
45
10
0
01 Dec 2022
Towards Practical Few-shot Federated NLP
Dongqi Cai
Yaozong Wu
Haitao Yuan
Shangguang Wang
F. Lin
Mengwei Xu
FedML
39
6
0
01 Dec 2022
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
30
7
0
28 Nov 2022
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa
Shigang Li
Torsten Hoefler
AI4CE
35
24
0
25 Nov 2022