1806.00692
Stress Test Evaluation for Natural Language Inference
2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
ELM
Papers citing
"Stress Test Evaluation for Natural Language Inference"
50 / 149 papers shown
Title
Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants
Jaione Bengoetxea
Itziar Gonzalez-Dios
Rodrigo Agerri
19
0
0
18 Jun 2025
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
Yeonkyoung So
Gyuseong Lee
Sungmok Jung
Joonhak Lee
JiA Kang
Sangho Kim
Jaejin Lee
38
0
0
17 Jun 2025
Exploring Explanations Improves the Robustness of In-Context Learning
Ukyo Honda
Tatsushi Oka
LRM
70
0
0
03 Jun 2025
What Has Been Lost with Synthetic Evaluation?
Alexander Gill
Abhilasha Ravichander
Ana Marasović
ELM
36
0
0
28 May 2025
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
144
0
0
09 May 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
341
0
0
21 Apr 2025
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
Man Ho Lam
Chaozheng Wang
Jen-tse Huang
Michael R. Lyu
LRM
112
1
0
19 Apr 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
90
3
0
14 Mar 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
122
3
0
21 Feb 2025
From Superficial Patterns to Semantic Understanding: Fine-Tuning Language Models on Contrast Sets
Daniel Petrov
50
0
0
05 Jan 2025
Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?
Zhiqiang Pi
Annapurna Vadaparty
Benjamin Bergen
Cameron R. Jones
81
3
0
20 Jun 2024
Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models
Vishruth Veerendranath
Vishwa Shah
Kshitish Ghate
101
0
0
22 Apr 2024
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
233
1
0
13 Mar 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
145
12
0
25 Jan 2024
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
96
0
0
16 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
90
3
0
16 Nov 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
84
5
0
11 Jul 2023
Evaluating Paraphrastic Robustness in Textual Entailment Models
Dhruv Verma
Yash Kumar Lal
Shreyashee Sinha
Benjamin Van Durme
Adam Poliak
91
5
0
29 Jun 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Lifan Yuan
Dehan Kong
...
Longtao Huang
H. Xue
Zhiyuan Liu
Maosong Sun
Heng Ji
AAML
ELM
101
6
0
29 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Saku Sugawara
S. Tsugita
ELM
86
1
0
24 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
129
3
0
23 May 2023
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures
Amanda Baughan
Allison Mercurio
Ariel Liu
Xuezhi Wang
Jilin Chen
Xiao Ma
76
15
0
01 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases
Yanchen Liu
Jing Yang
Yan Chen
Jing Liu
Huaqin Wu
MoE
85
2
0
28 Feb 2023
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
Terry Yue Zhuo
Zhuang Li
Yujin Huang
Fatemeh Shiri
Weiqing Wang
Gholamreza Haffari
Yuan-Fang Li
AAML
107
57
0
30 Jan 2023
DISCO: Distilling Counterfactuals with Large Language Models
Zeming Chen
Qiyue Gao
Antoine Bosselut
Ashish Sabharwal
Kyle Richardson
96
31
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
150
45
0
20 Dec 2022
Feature-Level Debiased Natural Language Understanding
Yougang Lyu
Piji Li
Yechang Yang
Maarten de Rijke
Fajie Yuan
Yukun Zhao
D. Yin
Zhaochun Ren
91
12
0
11 Dec 2022
AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization
Bhargavi Paranjape
Pradeep Dasigi
Vivek Srikumar
Luke Zettlemoyer
Hannaneh Hajishirzi
98
8
0
02 Dec 2022
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen
Yeshuang Zhu
Jinchao Zhang
Jie Zhou
Minlie Huang
CML
AAML
115
9
0
29 Nov 2022
Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of Modulated Cross-Entropy in Natural Language Inference
Frano Rajic
Ivan Stresec
Axel Marmet
Tim Postuvan
46
3
0
23 Nov 2022
Capabilities for Better ML Engineering
Chenyang Yang
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
63
4
0
11 Nov 2022
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference
S. Rajaee
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
73
5
0
07 Nov 2022
Probing neural language models for understanding of words of estimative probability
Damien Sileo
Marie-Francine Moens
51
12
0
07 Nov 2022
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Mandar Sharma
Nikhil Muralidhar
Naren Ramakrishnan
58
6
0
03 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Abhilasha Ravichander
Matt Gardner
Ana Marasović
112
35
0
01 Nov 2022
Lexical Generalization Improves with Larger Models and Longer Training
Elron Bandel
Yoav Goldberg
Yanai Elazar
94
7
0
23 Oct 2022
Enhancing Tabular Reasoning with Pattern Exploiting Training
Abhilash Shankarampeta
Vivek Gupta
Shuo Zhang
LMTD
RALM
ReLM
139
6
0
21 Oct 2022
Measures of Information Reflect Memorization Patterns
Rachit Bansal
Danish Pruthi
Yonatan Belinkov
110
10
0
17 Oct 2022
A Survey of Parameters Associated with the Quality of Benchmarks in NLP
Swaroop Mishra
Anjana Arunkumar
Chris Bryan
Chitta Baral
105
1
0
14 Oct 2022
Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding
Songyang Gao
Shihan Dou
Qi Zhang
Xuanjing Huang
43
8
0
14 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits
Ameya Godbole
Robin Jia
ALM
79
9
0
13 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
206
26
0
10 Oct 2022
InferES: A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
Venelin Kovatchev
Mariona Taulé
73
4
0
06 Oct 2022
Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka
K. Mineshima
93
24
0
09 Aug 2022
Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Amir Feder
Abhilasha Ravichander
Marius Mosbach
Yonatan Belinkov
Hinrich Schütze
Yoav Goldberg
CML
SyDa
MILM
110
55
0
28 Jul 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
109
13
0
04 Jul 2022
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks
Venelin Kovatchev
Trina Chatterjee
Venkata S Govindarajan
Jifan Chen
Eunsol Choi
...
K. Erk
Matthew Lease
Junyi Jessy Li
Yating Wu
Kyle Mahowald
AAML
ELM
89
9
0
29 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
71
14
0
07 Jun 2022
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
333
48
0
24 May 2022
White-box Testing of NLP models with Mask Neuron Coverage
Arshdeep Sekhon
Yangfeng Ji
Matthew B. Dwyer
Yanjun Qi
AAML
52
3
0
10 May 2022