How to Train BERT with an Academic Budget
Peter Izsak, Moshe Berchansky, Omer Levy
15 April 2021 · arXiv:2104.07705
Papers citing "How to Train BERT with an Academic Budget" (22 of 72 papers shown)
Effective Pre-Training Objectives for Transformer-based Autoencoders
Luca Di Liello, Matteo Gabburo, Alessandro Moschitti · 25 / 3 / 0 · 24 Oct 2022
Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Laura Aina, Nikos Voskarides, Roi Blanco · 19 / 0 / 0 · 21 Oct 2022
Incorporating Context into Subword Vocabularies
Shaked Yehezkel, Yuval Pinter · 47 / 8 / 0 · 13 Oct 2022
Spontaneous Emerging Preference in Two-tower Language Model
Zhengqi He, Taro Toyoizumi · LRM · 18 / 1 / 0 · 13 Oct 2022
Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang, Linyi Yang, Zhiyang Teng, M. Zhou, Yue Zhang · GNN · 38 / 1 / 0 · 08 Sep 2022
Transformers with Learnable Activation Functions
Haishuo Fang, Ji-Ung Lee, N. Moosavi, Iryna Gurevych · AI4CE · 25 / 7 / 0 · 30 Aug 2022
What Dense Graph Do You Need for Self-Attention?
Yuxing Wang, Chu-Tak Lee, Qipeng Guo, Zhangyue Yin, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu · GNN · 8 / 4 / 0 · 27 May 2022
Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh · 85 / 4 / 0 · 23 May 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora · 92 / 42 / 0 · 20 May 2022
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao · 50 / 32 / 0 · 13 Apr 2022
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Carmelo Scribano, Giorgia Franchini, M. Prato, Marko Bertogna · 18 / 21 / 0 · 02 Mar 2022
Should You Mask 15% in Masked Language Modeling?
Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen · CVBM · 29 / 162 / 0 · 16 Feb 2022
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min, Hayley L. Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth · LM&MA, VLM, AI4CE · 83 / 1,035 / 0 · 01 Nov 2021
Pre-train or Annotate? Domain Adaptation with a Constrained Budget
Fan Bai, Alan Ritter, Wei-ping Xu · 66 / 31 / 0 · 10 Sep 2021
Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens
Itay Itzhak, Omer Levy · 17 / 18 / 0 · 25 Aug 2021
Curriculum learning for language modeling
Daniel Fernando Campos · 16 / 32 / 0 · 04 Aug 2021
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
David Peer, Sebastian Stabinger, Stefan Engl, A. Rodríguez-Sánchez · 11 / 27 / 0 · 31 May 2021
Optimal Subarchitecture Extraction For BERT
Adrian de Wynter, Daniel J. Perry · MQ · 45 / 18 / 0 · 20 Oct 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 264 / 4,489 / 0 · 23 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick, Hinrich Schütze · 258 / 1,589 / 0 · 21 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 245 / 1,826 / 0 · 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 299 / 6,984 / 0 · 20 Apr 2018