On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

8 June 2020

Papers citing "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines"

50 / 93 papers shown

Can Frontier LLMs Replace Annotators in Biomedical Text Mining? Analyzing Challenges and Exploring Solutions Yichong Zhao Susumu Goto 60 0 0 05 Mar 2025
Decoding Reading Goals from Eye Movements Omer Shubi Cfir Avraham Hadar Yevgeni Berzak AIMat 49 1 0 28 Oct 2024
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers Adrian Chan Anupam Mijar Mehreen Saeed Chau-Wai Wong Akram Khater 41 0 0 03 Oct 2024
Efficient LLM Context Distillation Rajesh Upadhayayaya Zachary Smith Chritopher Kottmyer Manish Raj Osti 42 1 0 03 Sep 2024
Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models Christopher Schröder Gerhard Heyer VLM 44 0 0 13 Jun 2024
Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study Myrthe Reuver Suzan Verberne Antske Fokkens 37 1 0 05 Apr 2024
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion Zijun Long George Killick Lipeng Zhuang Gerardo Aragon Camarasa Zaiqiao Meng R. McCreadie VLM 47 2 0 22 Feb 2024
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature Jerry Junyang Cheung Yuchen Zhuang Yinghao Li Pranav Shetty Wantian Zhao Sanjeev Grampurohit R. Ramprasad Chao Zhang AI4CE 14 11 0 13 Nov 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics Yupei Du Albert Gatt Dong Nguyen 31 1 0 10 Oct 2023
Prompt to be Consistent is Better than Self-Consistent? Few-Shot and Zero-Shot Fact Verification with Pre-trained Language Models Fengzhu Zeng Wei Gao 17 5 0 05 Jun 2023
Understanding Emotion Valence is a Joint Deep Learning Task Gabriel Roccabruna Seyed Mahed Mousavi Giuseppe Riccardi 21 0 0 27 May 2023
Toward Connecting Speech Acts and Search Actions in Conversational Search Tasks Souvick Ghosh Satanu Ghosh C. Shah 25 2 0 08 May 2023
KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis Antoine Nzeyimana 17 3 0 25 Apr 2023
On the Variance of Neural Network Training with respect to Test Sets and Distributions Keller Jordan OOD 21 10 0 04 Apr 2023
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks Antonis Maronikolakis Abdullatif Köksal Hinrich Schütze 43 0 0 04 Apr 2023
Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers Kamil Bujel Andrew Caines H. Yannakoudakis Marek Rei AI4TS 19 1 0 14 Mar 2023
Multimodal Prompting with Missing Modalities for Visual Recognition Yi-Lun Lee Yi-Hsuan Tsai Wei-Chen Chiu Chen-Yu Lee VPVLM 27 94 0 06 Mar 2023
Measuring the Instability of Fine-Tuning Yupei Du D. Nguyen 25 4 0 15 Feb 2023
Evaluating the Robustness of Discrete Prompts Yoichi Ishibashi Danushka Bollegala Katsuhito Sudoh Satoshi Nakamura 23 18 0 11 Feb 2023
Multi-Tenant Optimization For Few-Shot Task-Oriented FAQ Retrieval Asha Vishwanathan R. Warrier G. V. Suresh Chandrashekhar Kandpal 11 2 0 25 Jan 2023
A Stability Analysis of Fine-Tuning a Pre-Trained Model Z. Fu Anthony Man-Cho So Nigel Collier 23 3 0 24 Jan 2023
NarrowBERT: Accelerating Masked Language Model Pretraining and Inference Haoxin Li Phillip Keung Daniel Cheng Jungo Kasai Noah A. Smith 14 3 0 11 Jan 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers Leonid Boytsov Preksha Patel Vivek Sourabh Riddhi Nisar Sayan Kundu R. Ramanathan Eric Nyberg 23 19 0 08 Jan 2023
Examining Political Rhetoric with Epistemic Stance Detection Ankita Gupta Su Lin Blodgett Justin H. Gross Brendan O'Connor 22 0 0 29 Dec 2022
KL Regularized Normalization Framework for Low Resource Tasks Neeraj Kumar Ankur Narang Brejesh Lall 26 1 0 21 Dec 2022
Task-Specific Embeddings for Ante-Hoc Explainable Text Classification Kishaloy Halder Josip Krapac A. Akbik Anthony Brew Matti Lyra 30 0 0 30 Nov 2022
BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch? Joel Niklaus Daniele Giofré 27 11 0 30 Nov 2022
Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods Xiang Dai Sarvnaz Karimi 24 3 0 24 Nov 2022
MEAL: Stable and Active Learning for Few-Shot Prompting Abdullatif Köksal Timo Schick Hinrich Schütze 19 25 0 15 Nov 2022
An Efficient Active Learning Pipeline for Legal Text Classification Sepideh Mamooler R. Lebret Stéphane Massonnet Karl Aberer AILaw 24 4 0 15 Nov 2022
Probing neural language models for understanding of words of estimative probability Damien Sileo Marie-Francine Moens 19 10 0 07 Nov 2022
Gradient Knowledge Distillation for Pre-trained Language Models Lean Wang Lei Li Xu Sun VLM 23 5 0 02 Nov 2022
We need to talk about random seeds Steven Bethard 31 8 0 24 Oct 2022
Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping Chenghao Yang Xuezhe Ma 35 6 0 19 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning Shuo Xie Jiahao Qiu Ankita Pasad Li Du Qing Qu Hongyuan Mei 35 16 0 18 Oct 2022
AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning Tao Yang Jinghao Deng Xiaojun Quan Qifan Wang Shaoliang Nie 30 3 0 12 Oct 2022
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling Haw-Shiuan Chang Ruei-Yao Sun Kathryn Ricci Andrew McCallum 43 14 0 10 Oct 2022
UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation I. Sarhan P. Mosteiro Marco Spruit 29 2 0 07 Oct 2022
An Empirical Study on Cross-X Transfer for Legal Judgment Prediction Joel Niklaus Matthias Sturmer Ilias Chalkidis ELM AILaw 37 19 0 25 Sep 2022
Drawing Causal Inferences About Performance Effects in NLP Sandra Wankmüller CML 16 1 0 14 Sep 2022
Heuristic-free Optimization of Force-Controlled Robot Search Strategies in Stochastic Environments Bastian Alt Darko Katic Rainer Jäkel Michael Beetz 21 6 0 15 Jul 2022
Zero-shot Cross-lingual Transfer is Under-specified Optimization Shijie Wu Benjamin Van Durme Mark Dredze 25 6 0 12 Jul 2022
Pretrained Models for Multilingual Federated Learning Orion Weller Marc Marone Vladimir Braverman Dawn J Lawrie Benjamin Van Durme VLM FedML AI4CE 33 42 0 06 Jun 2022
Can Foundation Models Help Us Achieve Perfect Secrecy? Simran Arora Christopher Ré FedML 21 6 0 27 May 2022
Linear Connectivity Reveals Generalization Strategies Jeevesh Juneja Rachit Bansal Kyunghyun Cho João Sedoc Naomi Saphra 239 45 0 24 May 2022
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts Akari Asai Mohammadreza Salehi Matthew E. Peters Hannaneh Hajishirzi 127 100 0 24 May 2022
Calibration of Natural Language Understanding Models with Venn-ABERS Predictors Patrizio Giovannotti 38 6 0 21 May 2022
Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction Manikandan Ravikiran Bharathi Raja Chakravarthi 22 3 0 12 May 2022
Few-shot Mining of Naturally Occurring Inputs and Outputs Mandar Joshi Terra Blevins M. Lewis Daniel S. Weld Luke Zettlemoyer 27 1 0 09 May 2022
A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis Sandra Wankmüller 28 2 0 03 May 2022