Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup
Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
arXiv:2011.13635, 27 November 2020
Papers citing "Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup" (7 papers shown):
Beyond Next Token Prediction: Patch-Level Training for Large Language Models. Chenze Shao, Fandong Meng, Jie Zhou. 17 Jul 2024.
A Multi-Level Framework for Accelerating Training Transformer Models. Longwei Zou, Han Zhang, Yangdong Deng. 07 Apr 2024.
Automated Progressive Learning for Efficient Training of Vision Transformers. Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang. 28 Mar 2022.
bert2BERT: Towards Reusable Pretrained Language Models. Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu. 14 Oct 2021.
How to Train BERT with an Academic Budget. Peter Izsak, Moshe Berchansky, Omer Levy. 15 Apr 2021.
K-BERT: Enabling Language Representation with Knowledge Graph. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang. 17 Sep 2019.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.