Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
arXiv 2404.19319 · 30 April 2024
Minh Duc Bui, Fabian David Schmidt, Goran Glavaš, Katharina von der Wense
Papers citing "Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget" (10 of 10 papers shown)
Probing Across Time: What Does RoBERTa Know and When? (16 Apr 2021) [KELM]
Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith

How to Train BERT with an Academic Budget (15 Apr 2021)
Peter Izsak, Moshe Berchansky, Omer Levy

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (06 Apr 2020) [MQ]
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (25 Feb 2020) [VLM]
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (02 Oct 2019)
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

TinyBERT: Distilling BERT for Natural Language Understanding (23 Sep 2019) [VLM]
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu

Neural Network Acceptability Judgments (31 May 2018)
Alex Warstadt, Amanpreet Singh, Samuel R. Bowman

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference (18 Apr 2017)
Adina Williams, Nikita Nangia, Samuel R. Bowman

SQuAD: 100,000+ Questions for Machine Comprehension of Text (16 Jun 2016) [RALM]
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang