
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
arXiv:2002.06305

15 February 2020
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith

Papers citing "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping"

Showing 37 of 137 citing papers
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
MoE · 67 · 468 · 0 · 08 Jun 2021

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Peter Hase, Harry Xie, Joey Tianyi Zhou
OODD, LRM, FAtt · 23 · 91 · 0 · 01 Jun 2021

Sentiment analysis in tweets: an assessment study from classical to modern text representation models
Sérgio Barreto, Ricardo Moura, Jonnathan Carvalho, A. Paes, A. Plastino
23 · 14 · 0 · 29 May 2021

PTR: Prompt Tuning with Rules for Text Classification
Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, Maosong Sun
VLM · 41 · 513 · 0 · 24 May 2021

On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
Katerina Margatina, Loïc Barrault, Nikolaos Aletras
27 · 36 · 0 · 16 Apr 2021

How Many Data Points is a Prompt Worth?
Teven Le Scao, Alexander M. Rush
VLM · 66 · 296 · 0 · 15 Mar 2021

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michal Pietruszka, Gabriela Pałka
ViT · 36 · 157 · 0 · 18 Feb 2021

I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
MQ · 107 · 341 · 0 · 05 Jan 2021

Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao, Adam Fisch, Danqi Chen
243 · 1,924 · 0 · 31 Dec 2020

A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters
Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, Hinrich Schütze
32 · 64 · 0 · 31 Dec 2020

Infusing Finetuning with Semantic Dependencies
Zhaofeng Wu, Hao Peng, Noah A. Smith
25 · 36 · 0 · 10 Dec 2020

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
35 · 119 · 0 · 30 Nov 2020

Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour, Katherine A. Heller, D. Moldovan, Ben Adlam, B. Alipanahi, ..., Kellie Webster, Steve Yadlowsky, T. Yun, Xiaohua Zhai, D. Sculley
OffRL · 77 · 670 · 0 · 06 Nov 2020

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
Beliz Gunel, Jingfei Du, Alexis Conneau, Ves Stoyanov
18 · 499 · 0 · 03 Nov 2020

On the Transformer Growth for Progressive BERT Training
Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chong Chen, Jiawei Han
VLM · 69 · 51 · 0 · 23 Oct 2020

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
Nicole Peinelt, Marek Rei, Maria Liakata
30 · 2 · 0 · 23 Oct 2020

The RELX Dataset and Matching the Multilingual Blanks for Cross-Lingual Relation Classification
Abdullatif Köksal, Arzucan Özgür
38 · 19 · 0 · 19 Oct 2020

Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
Nandan Thakur, Nils Reimers, Johannes Daxenberger, Iryna Gurevych
208 · 241 · 0 · 16 Oct 2020

How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Shayne Longpre, Yu Wang, Christopher DuBois
ViT · 19 · 83 · 0 · 05 Oct 2020

Data-Efficient Pretraining via Contrastive Self-Supervision
Nils Rethmeier, Isabelle Augenstein
23 · 20 · 0 · 02 Oct 2020

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
44 · 429 · 0 · 22 Sep 2020

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Jonathan Pilault, Amine Elhattami, C. Pal
CLL, MoE · 30 · 89 · 0 · 19 Sep 2020

It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
51 · 955 · 0 · 15 Sep 2020

Differentially Private Language Models Benefit from Public Pre-training
Gavin Kerrigan, Dylan Slack, Jens Tuyls
24 · 56 · 0 · 13 Sep 2020

Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
ALM · 46 · 1,984 · 0 · 02 Sep 2020

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Sang Michael Xie, Tengyu Ma, Percy Liang
35 · 13 · 0 · 29 Jun 2020

Revisiting Few-sample BERT Fine-tuning
Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi
41 · 441 · 0 · 10 Jun 2020

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow
31 · 352 · 0 · 08 Jun 2020

ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them
Dawid Jurkiewicz, Łukasz Borchmann, Izabela Kosmala, Filip Graliński
19 · 40 · 0 · 16 May 2020

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, Anders Søgaard
ReLM, LRM · 62 · 34 · 0 · 04 May 2020

DQI: Measuring Data Quality in NLP
Swaroop Mishra, Anjana Arunkumar, Bhavdeep Singh Sachdeva, Chris Bryan, Chitta Baral
36 · 30 · 0 · 02 May 2020

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren
14 · 24 · 0 · 02 May 2020

MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning
Andriy Mulyar, Bridget T. McInnes
16 · 52 · 0 · 21 Apr 2020

Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer
Siddhant Garg, Rohit Kumar Sharma, Yingyu Liang
36 · 4 · 0 · 10 Apr 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
LM&MA, VLM · 243 · 1,452 · 0 · 18 Mar 2020

FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu
AAML · 232 · 438 · 0 · 25 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 299 · 6,984 · 0 · 20 Apr 2018