BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance

7 November 2019

Papers citing "BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance"

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs Oskar van der Wal Pietro Lesci Max Muller-Eberstein Naomi Saphra Hailey Schoelkopf Willem H. Zuidema Stella Biderman LRM 70 2 0 12 Mar 2025
(How) Do Language Models Track State? Belinda Z. Li Zifan Carl Guo Jacob Andreas LRM 59 2 0 04 Mar 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases Michael Y. Hu Jackson Petty Chuan Shi William Merrill Tal Linzen AI4CE 75 1 0 26 Feb 2025
Distributional Scaling for Emergent Capabilities Rosie Zhao Tian Qin David Alvarez-Melis Sham Kakade Naomi Saphra LRM 62 2 0 24 Feb 2025
The Curious Case of Arbitrariness in Machine Learning Prakhar Ganesh Afaf Taik G. Farnadi 79 2 0 28 Jan 2025
Survival of the Fittest Representation: A Case Study with Modular Addition Xiaoman Delores Ding Zifan Carl Guo Eric J. Michaud Ziming Liu Max Tegmark 61 3 0 27 May 2024
Acquiring Linguistic Knowledge from Multimodal Input Theodor Amariucai Alexander Scott Warstadt CLL 53 2 0 27 Feb 2024
Punctuation Restoration Improves Structure Understanding Without Supervision Junghyun Min Minho Lee Woochul Lee Yeonsoo Lee 62 1 0 13 Feb 2024
Position: Key Claims in LLM Research Have a Long Tail of Footnotes Anna Rogers A. Luccioni 89 19 0 14 Aug 2023
On The Impact of Machine Learning Randomness on Group Fairness Prakhar Ganesh Hong Chang Martin Strobel Reza Shokri FaML 41 30 0 09 Jul 2023
Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis Seraphina Goldfarb-Tarrant Bjorn Ross Adam Lopez 59 7 0 22 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures Max Klabunde Tobias Schumacher M. Strohmaier Florian Lemmerich 65 67 0 10 May 2023
Evaluating the Robustness of Machine Reading Comprehension Models to Low Resource Entity Renaming Clemencia Siro T. Ajayi 34 2 0 06 Apr 2023
A Modern Look at the Relationship between Sharpness and Generalization Maksym Andriushchenko Francesco Croce Maximilian Müller Matthias Hein Nicolas Flammarion 3DH 61 56 0 14 Feb 2023
Learning the Effects of Physical Actions in a Multi-modal Environment Gautier Dagan Frank Keller A. Lascarides LM&Ro 47 3 0 27 Jan 2023
Where to start? Analyzing the potential value of intermediate models Leshem Choshen Elad Venezian Shachar Don-Yehiya Noam Slonim Yoav Katz MoMe 40 27 0 31 Oct 2022
Probing with Noise: Unpicking the Warp and Weft of Embeddings Filip Klubicka John D. Kelleher 43 4 0 21 Oct 2022
Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization Daniel LeJeune Jiayu Liu Reinhard Heckel 33 0 0 20 Oct 2022
GULP: a prediction-based metric between representations Enric Boix Adserà Hannah Lawrence George Stepaniants Philippe Rigollet 51 11 0 12 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review Dieuwke Hupkes Mario Giulianelli Verna Dankers Mikel Artetxe Yanai Elazar ... Leila Khalatbari Maria Ryskina Rita Frieske Ryan Cotterell Zhijing Jin 131 95 0 06 Oct 2022
Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings Yile Wang Yue Zhang 31 4 0 20 Aug 2022
Linear Connectivity Reveals Generalization Strategies Jeevesh Juneja Rachit Bansal Kyunghyun Cho João Sedoc Naomi Saphra 252 45 0 24 May 2022
mGPT: Few-Shot Learners Go Multilingual Oleh Shliazhko Alena Fenogenova Maria Tikhonova Vladislav Mikhailov Anastasia Kozlova Tatiana Shavrina 60 150 0 15 Apr 2022
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments Christopher Hidey Fei Liu Rahul Goel 37 4 0 10 Apr 2022
Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT Aparna Elangovan Yuan Li Douglas E. V. Pires Melissa J. Davis Karin Verspoor 36 8 0 06 Jan 2022
Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation Zoey Liu Emily Tucker Prudhommeaux 68 4 0 05 Jan 2022
Building Human-like Communicative Intelligence: A Grounded Perspective M. Dubova 36 12 0 02 Jan 2022
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task Urja Khurana Eric T. Nalisnick Antske Fokkens MoMe 43 6 0 18 Nov 2021
The Grammar-Learning Trajectories of Neural Language Models Leshem Choshen Guy Hacohen D. Weinshall Omri Abend 43 28 0 13 Sep 2021
Debiasing Methods in Natural Language Understanding Make Bias More Accessible Michael J. Mendelson Yonatan Belinkov 49 23 0 09 Sep 2021
Teaching Autoregressive Language Models Complex Tasks By Demonstration Gabriel Recchia 41 22 0 05 Sep 2021
Grounding Representation Similarity with Statistical Testing Frances Ding Jean-Stanislas Denain Jacob Steinhardt 22 30 0 03 Aug 2021
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension Anna Rogers Matt Gardner Isabelle Augenstein 41 163 0 27 Jul 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis Thibault Sellam Steve Yadlowsky Jason W. Wei Naomi Saphra Alexander D'Amour ... Iulia Turc Jacob Eisenstein Dipanjan Das Ian Tenney Ellie Pavlick 47 93 0 30 Jun 2021
On the proper role of linguistically-oriented deep net analysis in linguistic theorizing Marco Baroni 26 51 0 16 Jun 2021
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level Ruiqi Zhong Dhruba Ghosh Dan Klein Jacob Steinhardt 43 35 0 13 May 2021
How Reliable are Model Diagnostics? V. Aribandi Yi Tay Donald Metzler 24 19 0 12 May 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures Sushant Singh A. Mahmood AI4TS 64 94 0 23 Mar 2021
How Many Data Points is a Prompt Worth? Teven Le Scao Alexander M. Rush VLM 71 300 0 15 Mar 2021
WILDS: A Benchmark of in-the-Wild Distribution Shifts Pang Wei Koh Shiori Sagawa Henrik Marklund Sang Michael Xie Marvin Zhang ... A. Kundaje Emma Pierson Sergey Levine Chelsea Finn Percy Liang OOD 113 1,396 0 14 Dec 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning Alexander D'Amour Katherine A. Heller D. Moldovan Ben Adlam B. Alipanahi ... Kellie Webster Steve Yadlowsky T. Yun Xiaohua Zhai D. Sculley OffRL 82 677 0 06 Nov 2020
Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures N. Moosavi M. Boer Prasetya Ajie Utama Iryna Gurevych 34 13 0 23 Oct 2020
Compositional Networks Enable Systematic Generalization for Grounded Language Understanding Yen-Ling Kuo Boris Katz Andrei Barbu 41 22 0 06 Aug 2020
How Can We Accelerate Progress Towards Human-like Linguistic Generalization? Tal Linzen 220 191 0 03 May 2020
When BERT Plays the Lottery, All Tickets Are Winning Sai Prasanna Anna Rogers Anna Rumshisky MILM 30 187 0 01 May 2020
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance Prasetya Ajie Utama N. Moosavi Iryna Gurevych OODD 30 125 0 01 May 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions Xiang Zhou Yixin Nie Hao Tan Joey Tianyi Zhou 49 40 0 28 Apr 2020
Syntactic Data Augmentation Increases Robustness to Inference Heuristics Junghyun Min R. Thomas McCoy Dipanjan Das Emily Pitler Tal Linzen 44 177 0 24 Apr 2020
Adversarial Filters of Dataset Biases Ronan Le Bras Swabha Swayamdipta Chandra Bhagavatula Rowan Zellers Matthew E. Peters Ashish Sabharwal Yejin Choi 41 220 0 10 Feb 2020
The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models Noah Weber L. Shekhar Niranjan Balasubramanian 105 30 0 03 May 2018