Mesa: A Memory-saving Training Framework for Transformers (arXiv:2111.11124)
22 November 2021
Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang
Papers citing "Mesa: A Memory-saving Training Framework for Transformers" (15 of 15 papers shown)
Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
Yi Hu, Jinhang Zuo, Eddie Zhang, Bob Iannucci, Carlee Joe-Wong
13 Apr 2025 · 37 / 0 / 0
CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster
MQ, VLM · 20 Oct 2024 · 47 / 0 / 0
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang, Yingdong Shi, Cheems Wang, Xiantong Zhen, Yuxuan Shi, Jun Xu
24 Jun 2024 · 37 / 1 / 0
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar, Souvik Kundu, Kai Zheng, P. Beerel
25 Mar 2024 · 37 / 2 / 0
LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms
Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, Cecilia Mascolo
19 Nov 2023 · 38 / 11 / 0
TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo
19 Jul 2023 · 19 / 8 / 0
An Evaluation of Memory Optimization Methods for Training Neural Networks
Xiaoxuan Liu, Siddharth Jha, Alvin Cheung
26 Mar 2023 · 26 / 0 / 0
A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
02 Feb 2023 · 31 / 47 / 0
DIVISION: Memory Efficient Training via Dual Activation Precision
Guanchu Wang, Zirui Liu, Zhimeng Jiang, Ninghao Liu, Nannan Zou, Xia Hu
MQ · 05 Aug 2022 · 26 / 2 / 0
GACT: Activation Compressed Training for Generic Network Architectures
Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, ..., Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael W. Mahoney, Alvin Cheung
VLM, GNN, MQ · 22 Jun 2022 · 17 / 30 / 0
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Joya Chen, Kai Xu, Yuhui Wang, Yifei Cheng, Angela Yao
28 Feb 2022 · 19 / 7 / 0
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
ViT · 24 Feb 2021 · 283 / 3,623 / 0
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
MQ · 31 Dec 2020 · 142 / 221 / 0
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
MQ · 12 Sep 2019 · 233 / 576 / 0
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
SSeg · 18 Aug 2016 · 253 / 1,828 / 0