arXiv:2010.01150
Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media
2 October 2020
Xiang Dai
Sarvnaz Karimi
Ben Hachey
Cécile Paris
Papers citing
"Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media"
18 / 18 papers shown
Title
Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain
Eugene Jang
Jian Cui
Dayeon Yim
Youngjin Jin
Jin-Woo Chung
Seung-Eui Shin
Yongjae Lee
65
2
0
15 Mar 2024
Selecting Subsets of Source Data for Transfer Learning with Applications in Metal Additive Manufacturing
Yifan Tang
M. Rahmani Dehaghani
Pouyan Sajadi
G. G. Wang
19
12
0
16 Jan 2024
German FinBERT: A German Pre-trained Language Model
Moritz Scherrmann
35
0
0
15 Nov 2023
Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks
Jiarong Xu
Renhong Huang
Xin Jiang
Yuxuan Cao
Carl Yang
Chunping Wang
Yang Yang
AI4CE
38
14
0
02 Nov 2023
S2F-NER: Exploring Sequence-to-Forest Generation for Complex Entity Recognition
Yongxiu Xu
Heyan Huang
Yue Hu
21
0
0
29 Oct 2023
The Effects of In-domain Corpus Size on pre-training BERT
Chris Sanchez
Zheyu Zhang
AI4CE
16
4
0
15 Dec 2022
Automatic Document Selection for Efficient Encoder Pretraining
Yukun Feng
Patrick Xia
Benjamin Van Durme
João Sedoc
58
7
0
20 Oct 2022
Sort by Structure: Language Model Ranking as Dependency Probing
Max Müller-Eberstein
Rob van der Goot
Barbara Plank
41
3
0
10 Jun 2022
Mixed-effects transformers for hierarchical adaptation
Julia White
Noah D. Goodman
Robert D. Hawkins
24
2
0
03 May 2022
How Universal is Genre in Universal Dependencies?
Max Müller-Eberstein
Rob van der Goot
Barbara Plank
6
6
0
09 Dec 2021
Extraction of Medication Names from Twitter Using Augmentation and an Ensemble of Language Models
I. Kulev
Berkay Köprü
Raul Rodriguez-Esteban
Diego Saldana Miranda
Yi Huang
Alessandro La Torraca
Elif Özkirimli
MedIm
18
4
0
12 Nov 2021
On the Universality of Deep Contextual Language Models
Shaily Bhatt
Poonam Goyal
Sandipan Dandapat
Monojit Choudhury
Sunayana Sitaram
ELM
25
5
0
15 Sep 2021
MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model
Rasmus Kær Jørgensen
Mareike Hartmann
Xiang Dai
Desmond Elliott
AI4CE
24
13
0
14 Sep 2021
Genre as Weak Supervision for Cross-lingual Dependency Parsing
Max Müller-Eberstein
Rob van der Goot
Barbara Plank
176
19
0
10 Sep 2021
Discontinuous Named Entity Recognition as Maximal Clique Discovery
Yucheng Wang
Bowen Yu
Hongsong Zhu
Tingwen Liu
Nan Yu
Limin Sun
BDL
27
47
0
01 Jun 2021
To Share or not to Share: Predicting Sets of Sources for Model Transfer Learning
Lukas Lange
Jannik Strötgen
Heike Adel
Dietrich Klakow
17
12
0
16 Apr 2021
An Analysis of Simple Data Augmentation for Named Entity Recognition
Xiang Dai
Heike Adel
35
194
0
22 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,996
0
20 Apr 2018