ResearchTrend.AI
arXiv:2002.06305
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

15 February 2020
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith

Papers citing "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping"

Showing 50 of 137 citing papers.
Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU
Fenia Christopoulou, Gerasimos Lampouras, Ignacio Iacobacci (22 Oct 2022)

Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Laura Aina, Nikos Voskarides, Roi Blanco (21 Oct 2022)

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos (19 Oct 2022)

Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping
Chenghao Yang, Xuezhe Ma (19 Oct 2022)

Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum (10 Oct 2022)

Efficient Few-Shot Learning Without Prompts
Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg (22 Sep 2022)

Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting
Berend Gort, Xiao-Yang Liu, Xinghang Sun, Jiechao Gao, Shuai Chen, Chris Wang (12 Sep 2022)

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz (31 Aug 2022)

Combating high variance in Data-Scarce Implicit Hate Speech Classification
Debaditya Pal, Kaustubh Chaudhari, Harsh Sharma (29 Aug 2022)

Mere Contrastive Learning for Cross-Domain Sentiment Analysis
Yun Luo, Fang Guo, Zihan Liu, Yue Zhang (18 Aug 2022)

Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI
S. Budennyy, V. Lazarev, N. Zakharenko, A. Korovin, Olga Plosskaya, ..., Ivan Oseledets, I. Barsola, Ilya M. Egorov, A. Kosterina, L. Zhukov (31 Jul 2022)

Zero-shot Cross-lingual Transfer is Under-specified Optimization
Shijie Wu, Benjamin Van Durme, Mark Dredze (12 Jul 2022)

Explanation-based Counterfactual Retraining (XCR): A Calibration Method for Black-box Models
Liu Zhendong, Wenyu Jiang, Yan Zhang, Chongjun Wang (22 Jun 2022)

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He (04 Jun 2022)

Can Foundation Models Help Us Achieve Perfect Secrecy?
Simran Arora, Christopher Ré (27 May 2022)

ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi (24 May 2022)

Few-Shot Natural Language Inference Generation with PDD: Prompt and Dynamic Demonstration
Kaijian Li, Shansan Gong, Kenny Q. Zhu (21 May 2022)

PreQuEL: Quality Estimation of Machine Translation Outputs in Advance
Shachar Don-Yehiya, Leshem Choshen, Omri Abend (18 May 2022)

When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning
Orion Weller, Kevin Seppi, Matt Gardner (17 May 2022)

How to Fine-tune Models with Few Samples: Update, Data Augmentation, and Test-time Augmentation
Yujin Kim, Jaehoon Oh, Sungnyun Kim, Se-Young Yun (13 May 2022)

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis
Sandra Wankmüller (03 May 2022)

Embedding Hallucination for Few-Shot Language Fine-tuning
Yiren Jian, Chongyang Gao, Soroush Vosoughi (03 May 2022)

Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
Navid Rezaei, Marek Reformat (25 Apr 2022)

mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Anastasia Kozlova, Tatiana Shavrina (15 Apr 2022)

Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Christopher Hidey, Fei Liu, Rahul Goel (10 Apr 2022)

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Ves Stoyanov, Majid Yazdani (03 Apr 2022)

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, S. Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, ..., Hongseok Namkoong, Ali Farhadi, Y. Carmon, Simon Kornblith, Ludwig Schmidt (10 Mar 2022)

Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
Guanzheng Chen, Fangyu Liu, Zaiqiao Meng, Shangsong Liang (16 Feb 2022)

A Differential Entropy Estimator for Training Neural Networks
Georg Pichler, Pierre Colombo, Malik Boudiaf, Günther Koliander, Pablo Piantanida (14 Feb 2022)

Adaptive Fine-Tuning of Transformer-Based Language Models for Named Entity Recognition
Felix Stollenwerk (05 Feb 2022)

Diversity Enhanced Active Learning with Strictly Proper Scoring Rules
Wei Tan, Lan Du, Wray L. Buntine (27 Oct 2021)

SkullEngine: A Multi-stage CNN Framework for Collaborative CBCT Image Segmentation and Landmark Detection
Qin Liu, H. Deng, C. Lian, Xiaoyang Chen, Deqiang Xiao, ..., Xu Chen, Tianshu Kuang, J. Gateno, P. Yap, J. Xia (07 Oct 2021)

UoB at SemEval-2021 Task 5: Extending Pre-Trained Language Models to Include Task and Domain-Specific Information for Toxic Span Prediction
E. Yan, Harish Tayyar Madabushi (07 Oct 2021)

KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier
Linyang Li, Demin Song, Ruotian Ma, Xipeng Qiu, Xuanjing Huang (06 Oct 2021)

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort (27 Sep 2021)

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun (24 Sep 2021)

Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic
Zijun Wu, Zi Xuan Zhang, Atharva Naik, Zhijian Mei, Mauajama Firdaus, Lili Mou (18 Sep 2021)

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang (13 Sep 2021)

Subword Mapping and Anchoring across Languages
Giorgos Vernikos, Andrei Popescu-Belis (09 Sep 2021)

Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
Prasetya Ajie Utama, N. Moosavi, Victor Sanh, Iryna Gurevych (09 Sep 2021)

On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
Cheng-Han Chiang, Hung-yi Lee (08 Sep 2021)

Deep Reinforcement Learning at the Edge of the Statistical Precipice
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare (30 Aug 2021)

Rethinking Why Intermediate-Task Fine-Tuning Works
Ting-Yun Chang, Chi-Jen Lu (26 Aug 2021)

Linking Common Vulnerabilities and Exposures to the MITRE ATT&CK Framework: A Self-Distillation Approach
Benjamin Ampel, Sagar Samtani, Steven Ullman, Hsinchun Chen (03 Aug 2021)

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig (28 Jul 2021)

FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
Liang Xu, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, ..., Guoao Wei, X. Pan, Xin Tian, Libo Qin, Hai Hu (15 Jul 2021)

Noise Stability Regularization for Improving BERT Fine-tuning
Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, Jiebo Luo (10 Jul 2021)

The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam, Steve Yadlowsky, Jason W. Wei, Naomi Saphra, Alexander D'Amour, ..., Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick (30 Jun 2021)

A Closer Look at How Fine-tuning Changes BERT
Yichu Zhou, Vivek Srikumar (27 Jun 2021)

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
Robert L Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel (24 Jun 2021)