ResearchTrend.AI
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

23 April 2020
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
    VLM
    AI4CE
    CLL

Papers citing "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"

50 / 522 papers shown
DiSTRICT: Dialogue State Tracking with Retriever Driven In-Context Tuning
Praveen Venkateswaran
Evelyn Duesterwald
Vatche Isahagian
41
7
0
06 Dec 2022
LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training
Hongwei Han
Jialiang Xu
Mengyuan Zhou
Yijia Shao
Shi Han
Dongmei Zhang
LMTD
29
7
0
06 Dec 2022
Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
Hamish Ivison
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
38
20
0
01 Dec 2022
ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format
Qi Zhu
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Baolin Peng
...
Dazhen Wan
Xiaochen Zhu
Jianfeng Gao
Milica Gašić
Minlie Huang
56
23
0
30 Nov 2022
BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
Joel Niklaus
Daniele Giofré
33
11
0
30 Nov 2022
Rationale-Guided Few-Shot Classification to Detect Abusive Language
Punyajoy Saha
Divyanshu Sheth
Kushal Kedia
Binny Mathew
Animesh Mukherjee
9
3
0
30 Nov 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
41
2
0
27 Nov 2022
Gender Biases Unexpectedly Fluctuate in the Pre-training Stage of Masked Language Models
Kenan Tang
Hanchun Jiang
AI4CE
18
1
0
26 Nov 2022
Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods
Xiang Dai
Sarvnaz Karimi
32
3
0
24 Nov 2022
Using Selective Masking as a Bridge between Pre-training and Fine-tuning
Tanish Lad
Himanshu Maheshwari
Shreyas Kottukkal
R. Mamidi
24
3
0
24 Nov 2022
Continual Learning of Natural Language Processing Tasks: A Survey
Zixuan Ke
Bin Liu
KELM
CLL
VLM
37
69
0
23 Nov 2022
TCBERT: A Technical Report for Chinese Topic Classification BERT
Ting Han
Kunhao Pan
Xinyu Chen
Dingjie Song
Yuchen Fan
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
VLM
25
1
0
21 Nov 2022
An Efficient Active Learning Pipeline for Legal Text Classification
Sepideh Mamooler
R. Lebret
Stéphane Massonnet
Karl Aberer
AILaw
27
4
0
15 Nov 2022
Unsupervised Domain Adaptation for Sparse Retrieval by Filling Vocabulary and Word Frequency Gaps
Hiroki Iida
Naoaki Okazaki
42
4
0
08 Nov 2022
Coarse-to-fine Knowledge Graph Domain Adaptation based on Distantly-supervised Iterative Training
Hongmin Cai
Wenxiong Liao
Zheng Liu
Yiyang Zhang
Xiaoke Huang
...
Lingfei Wu
Ninghao Liu
Quanzheng Li
Tianming Liu
Xiang Li
14
20
0
05 Nov 2022
T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
Chan-Jan Hsu
Ho-Lam Chung
Hung-yi Lee
Yu Tsao
29
6
0
01 Nov 2022
Where to start? Analyzing the potential value of intermediate models
Leshem Choshen
Elad Venezian
Shachar Don-Yehiya
Noam Slonim
Yoav Katz
MoMe
27
27
0
31 Oct 2022
WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain
Raj Sanjay Shah
Kunal Chawla
Dheeraj Eidnani
Agam Shah
Wendi Du
Sudheer Chava
Natraj Raman
Charese Smiley
Jiaao Chen
Diyi Yang
AIFin
37
103
0
31 Oct 2022
Generating Sequences by Learning to Self-Correct
Sean Welleck
Ximing Lu
Peter West
Faeze Brahman
T. Shen
Daniel Khashabi
Yejin Choi
LRM
38
217
0
31 Oct 2022
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Xiaochuang Han
Sachin Kumar
Yulia Tsvetkov
45
79
0
31 Oct 2022
Parameter-Efficient Tuning Makes a Good Classification Head
Zhuoyi Yang
Ming Ding
Yanhui Guo
Qingsong Lv
Jie Tang
VLM
63
14
0
30 Oct 2022
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning
Yue Yu
Chenyan Xiong
Si Sun
Chao Zhang
Arnold Overwijk
VLM
OOD
52
22
0
27 Oct 2022
Learning on Large-scale Text-attributed Graphs via Variational Inference
Jianan Zhao
Meng Qu
Chaozhuo Li
Hao Yan
Qian Liu
Rui Li
Xing Xie
Jian Tang
VLM
37
134
0
26 Oct 2022
Predicting Long-Term Citations from Short-Term Linguistic Influence
Sandeep Soni
David Bamman
Jacob Eisenstein
23
2
0
24 Oct 2022
Knowledge Transfer from Answer Ranking to Answer Generation
Matteo Gabburo
Rik Koncel-Kedziorski
Siddhant Garg
Luca Soldaini
Alessandro Moschitti
33
7
0
23 Oct 2022
Cross-domain Generalization for AMR Parsing
Xuefeng Bai
Sen Yang
Leyang Cui
Linfeng Song
Yue Zhang
49
2
0
22 Oct 2022
NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation
Phillip Howard
Gadi Singer
Vasudev Lal
Yejin Choi
Swabha Swayamdipta
CML
60
25
0
22 Oct 2022
A Survey of Active Learning for Natural Language Processing
Zhisong Zhang
Emma Strubell
Eduard H. Hovy
LM&MA
35
65
0
18 Oct 2022
Using Bottleneck Adapters to Identify Cancer in Clinical Notes under Low-Resource Constraints
Omid Rohanian
Hannah Jauncey
Mohammadmahdi Nouriborji
Vinod Kumar Chauhan
Bronner P. Gonçalves
Christiana Kartsonaki
ISARIC Clinical Characterisation Group
L. Merson
David Clifton
24
7
0
17 Oct 2022
Table-To-Text generation and pre-training with TabT5
Ewa Andrejczuk
Julian Martin Eisenschlos
Francesco Piccinno
Syrine Krichene
Yasemin Altun
LMTD
34
31
0
17 Oct 2022
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter-Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
32
14
0
14 Oct 2022
Self-Adaptive Named Entity Recognition by Retrieving Unstructured Knowledge
Kosuke Nishida
Naoki Yoshinaga
Kyosuke Nishida
37
2
0
14 Oct 2022
Developing a general-purpose clinical language inference model from a large corpus of clinical notes
Madhumita Sushil
Dana Ludwig
A. Butte
V. Rudrapatna
LM&MA
14
12
0
12 Oct 2022
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain
Amir Hadifar
Semere Kiros Bitew
Johannes Deleu
Chris Develder
Thomas Demeester
AI4Ed
41
18
0
12 Oct 2022
MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score
Sunjae Kwon
Zonghai Yao
H. Jordan
David Levy
Brian Corner
Hong-ye Yu
30
18
0
12 Oct 2022
Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks
Charith Peris
Lizhen Tan
Thomas Gueudré
Turan Gojayev
Vivi Wei
Gokmen Oz
30
4
0
10 Oct 2022
Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
Zonghan Yang
Xiaoyuan Yi
Peng Li
Yang Liu
Xing Xie
38
33
0
10 Oct 2022
Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning
Zhuoxuan Jiang
Lingfeng Qiao
Di Yin
Shanshan Feng
Bo Ren
SyDa
30
2
0
10 Oct 2022
KSAT: Knowledge-infused Self Attention Transformer -- Integrating Multiple Domain-Specific Contexts
Kaushik Roy
Yuxin Zi
Vignesh Narayanan
Manas Gaur
Amit P. Sheth
AI4MH
46
12
0
09 Oct 2022
Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection
Omkar Gokhale
Aditya Kane
Shantanu Patankar
Tanmay Chavan
Raviraj Joshi
VLM
35
7
0
09 Oct 2022
On Task-Adaptive Pretraining for Dialogue Response Selection
Tzu-Hsiang Lin
Ta-Chung Chi
Anna Rumshisky
21
1
0
08 Oct 2022
Short Text Pre-training with Extended Token Classification for E-commerce Query Understanding
Haoming Jiang
Tianyu Cao
Zheng Li
Cheng-hsin Luo
Xianfeng Tang
Qingyu Yin
Danqing Zhang
R. Goutam
Bing Yin
RALM
37
11
0
08 Oct 2022
Calibrating Factual Knowledge in Pretrained Language Models
Qingxiu Dong
Damai Dai
Yifan Song
Jingjing Xu
Zhifang Sui
Lei Li
KELM
249
83
0
07 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
129
95
0
06 Oct 2022
SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Jiaxin Pei
Vítor Silva
Maarten W. Bos
Yozen Liu
Leonardo Neves
David Jurgens
Francesco Barbieri
55
28
0
03 Oct 2022
DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing
Yanjun Gao
Dmitriy Dligach
Timothy A. Miller
John R. Caskey
Brihat Sharma
M. Churpek
Majid Afshar
ELM
LRM
34
17
0
29 Sep 2022
Downstream Datasets Make Surprisingly Good Pretraining Corpora
Kundan Krishna
Saurabh Garg
Jeffrey P. Bigham
Zachary Chase Lipton
50
30
0
28 Sep 2022
PePe: Personalized Post-editing Model utilizing User-generated Post-edits
Jihyeon Janel Lee
Taehee Kim
Yunwon Tae
Cheonbok Park
Jaegul Choo
24
0
0
21 Sep 2022
Generating Persuasive Responses to Customer Reviews with Multi-Source Prior Knowledge in E-commerce
Bo Chen
Jiayi Liu
M. Maimaiti
Xing Gao
Ji Zhang
25
3
0
20 Sep 2022
Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes
Sitong Zhou
K. Lybarger
Meliha Yetisgen-Yildiz
Mari Ostendorf
40
2
0
20 Sep 2022