arXiv: 2303.00980
Learning to Grow Pretrained Models for Efficient Transformer Training
2 March 2023
Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim
Papers citing "Learning to Grow Pretrained Models for Efficient Transformer Training" (15 of 15 papers shown):
A multilevel approach to accelerate the training of Transformers
Guillaume Lauga, Maël Chaumette, Edgar Desainte-Maréville, Étienne Lasalle, Arthur Lebeurrier (AI4CE), 24 Apr 2025
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, J. E. Lenssen, Liwei Wang, F. Tombari, Bernt Schiele, 30 Oct 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao, Fandong Meng, Jie Zhou, 17 Jul 2024
A Multi-Level Framework for Accelerating Training Transformer Models
Longwei Zou, Han Zhang, Yangdong Deng (AI4CE), 07 Apr 2024
Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures
Akash Guna R.T, Arnav Chavan, Deepak Gupta (MDE), 19 Feb 2024
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, ..., Li Du, Bowen Qin, Zheng-Wei Zhang, Aixin Sun, Yequan Wang, 07 Sep 2023
Composable Function-preserving Expansions for Transformer Architectures
Andrea Gesmundo, Kaitlin Maile (AI4CE), 11 Aug 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner, 12 Jul 2023
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant (VPVLM), 18 Apr 2021
Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu, 17 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan (VLM), 11 Feb 2021
On the Transformer Growth for Progressive BERT Training
Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Cheng Chen, Jiawei Han (VLM), 23 Oct 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei, 23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro (MoE), 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (ELM), 20 Apr 2018