arXiv:1911.02969
BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
7 November 2019
R. Thomas McCoy
Junghyun Min
Tal Linzen
Papers citing
"BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance"
50 / 51 papers shown
Title
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal
Pietro Lesci
Max Müller-Eberstein
Naomi Saphra
Hailey Schoelkopf
Willem H. Zuidema
Stella Biderman
LRM
70
2
0
12 Mar 2025
(How) Do Language Models Track State?
Belinda Z. Li
Zifan Carl Guo
Jacob Andreas
LRM
59
2
0
04 Mar 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
75
1
0
26 Feb 2025
Distributional Scaling for Emergent Capabilities
Rosie Zhao
Tian Qin
David Alvarez-Melis
Sham Kakade
Naomi Saphra
LRM
62
2
0
24 Feb 2025
The Curious Case of Arbitrariness in Machine Learning
Prakhar Ganesh
Afaf Taik
G. Farnadi
79
2
0
28 Jan 2025
Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding
Zifan Carl Guo
Eric J. Michaud
Ziming Liu
Max Tegmark
61
3
0
27 May 2024
Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai
Alexander Scott Warstadt
CLL
53
2
0
27 Feb 2024
Punctuation Restoration Improves Structure Understanding Without Supervision
Junghyun Min
Minho Lee
Woochul Lee
Yeonsoo Lee
62
1
0
13 Feb 2024
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
89
19
0
14 Aug 2023
On The Impact of Machine Learning Randomness on Group Fairness
Prakhar Ganesh
Hong Chang
Martin Strobel
Reza Shokri
FaML
41
30
0
09 Jul 2023
Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis
Seraphina Goldfarb-Tarrant
Björn Ross
Adam Lopez
59
7
0
22 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
65
67
0
10 May 2023
Evaluating the Robustness of Machine Reading Comprehension Models to Low Resource Entity Renaming
Clemencia Siro
T. Ajayi
34
2
0
06 Apr 2023
A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko
Francesco Croce
Maximilian Müller
Matthias Hein
Nicolas Flammarion
3DH
61
56
0
14 Feb 2023
Learning the Effects of Physical Actions in a Multi-modal Environment
Gautier Dagan
Frank Keller
A. Lascarides
LM&Ro
47
3
0
27 Jan 2023
Where to start? Analyzing the potential value of intermediate models
Leshem Choshen
Elad Venezian
Shachar Don-Yehiya
Noam Slonim
Yoav Katz
MoMe
40
27
0
31 Oct 2022
Probing with Noise: Unpicking the Warp and Weft of Embeddings
Filip Klubicka
John D. Kelleher
43
4
0
21 Oct 2022
Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization
Daniel LeJeune
Jiayu Liu
Reinhard Heckel
33
0
0
20 Oct 2022
GULP: a prediction-based metric between representations
Enric Boix Adserà
Hannah Lawrence
George Stepaniants
Philippe Rigollet
51
11
0
12 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
131
95
0
06 Oct 2022
Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings
Yile Wang
Yue Zhang
31
4
0
20 Aug 2022
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
252
45
0
24 May 2022
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
60
150
0
15 Apr 2022
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Christopher Hidey
Fei Liu
Rahul Goel
37
4
0
10 Apr 2022
Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
Aparna Elangovan
Yuan Li
Douglas E. V. Pires
Melissa J. Davis
Karin Verspoor
36
8
0
06 Jan 2022
Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation
Zoey Liu
Emily Tucker Prud'hommeaux
68
4
0
05 Jan 2022
Building Human-like Communicative Intelligence: A Grounded Perspective
M. Dubova
36
12
0
02 Jan 2022
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
MoMe
43
6
0
18 Nov 2021
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
43
28
0
13 Sep 2021
Debiasing Methods in Natural Language Understanding Make Bias More Accessible
Michael J. Mendelson
Yonatan Belinkov
49
23
0
09 Sep 2021
Teaching Autoregressive Language Models Complex Tasks By Demonstration
Gabriel Recchia
41
22
0
05 Sep 2021
Grounding Representation Similarity with Statistical Testing
Frances Ding
Jean-Stanislas Denain
Jacob Steinhardt
22
30
0
03 Aug 2021
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
Anna Rogers
Matt Gardner
Isabelle Augenstein
41
163
0
27 Jul 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander D'Amour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
47
93
0
30 Jun 2021
On the proper role of linguistically-oriented deep net analysis in linguistic theorizing
Marco Baroni
26
51
0
16 Jun 2021
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
Ruiqi Zhong
Dhruba Ghosh
Dan Klein
Jacob Steinhardt
43
35
0
13 May 2021
How Reliable are Model Diagnostics?
V. Aribandi
Yi Tay
Donald Metzler
24
19
0
12 May 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh
A. Mahmood
AI4TS
64
94
0
23 Mar 2021
How Many Data Points is a Prompt Worth?
Teven Le Scao
Alexander M. Rush
VLM
71
300
0
15 Mar 2021
WILDS: A Benchmark of in-the-Wild Distribution Shifts
Pang Wei Koh
Shiori Sagawa
Henrik Marklund
Sang Michael Xie
Marvin Zhang
...
A. Kundaje
Emma Pierson
Sergey Levine
Chelsea Finn
Percy Liang
OOD
113
1,396
0
14 Dec 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour
Katherine A. Heller
D. Moldovan
Ben Adlam
B. Alipanahi
...
Kellie Webster
Steve Yadlowsky
T. Yun
Xiaohua Zhai
D. Sculley
OffRL
82
677
0
06 Nov 2020
Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures
N. Moosavi
M. Boer
Prasetya Ajie Utama
Iryna Gurevych
34
13
0
23 Oct 2020
Compositional Networks Enable Systematic Generalization for Grounded Language Understanding
Yen-Ling Kuo
Boris Katz
Andrei Barbu
41
22
0
06 Aug 2020
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
220
191
0
03 May 2020
When BERT Plays the Lottery, All Tickets Are Winning
Sai Prasanna
Anna Rogers
Anna Rumshisky
MILM
30
187
0
01 May 2020
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance
Prasetya Ajie Utama
N. Moosavi
Iryna Gurevych
OODD
30
125
0
01 May 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
Xiang Zhou
Yixin Nie
Hao Tan
Joey Tianyi Zhou
49
40
0
28 Apr 2020
Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Junghyun Min
R. Thomas McCoy
Dipanjan Das
Emily Pitler
Tal Linzen
44
177
0
24 Apr 2020
Adversarial Filters of Dataset Biases
Ronan Le Bras
Swabha Swayamdipta
Chandra Bhagavatula
Rowan Zellers
Matthew E. Peters
Ashish Sabharwal
Yejin Choi
41
220
0
10 Feb 2020
The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models
Noah Weber
L. Shekhar
Niranjan Balasubramanian
105
30
0
03 May 2018