ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1806.00692
  4. Cited By
Stress Test Evaluation for Natural Language Inference
v1v2v3 (latest)

Stress Test Evaluation for Natural Language Inference

2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
    ELM
ArXiv (abs)PDFHTML

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 149 papers shown
Title
Generalized Quantifiers as a Source of Error in Multilingual NLU
  Benchmarks
Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks
Ruixiang Cui
Daniel Hershcovich
Anders Søgaard
87
13
0
22 Apr 2022
When Does Syntax Mediate Neural Language Model Performance? Evidence
  from Dropout Probes
When Does Syntax Mediate Neural Language Model Performance? Evidence from Dropout Probes
Mycal Tucker
Tiwalayo Eisape
Peng Qian
R. Levy
J. Shah
MILM
66
12
0
20 Apr 2022
mGPT: Few-Shot Learners Go Multilingual
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
137
155
0
15 Apr 2022
Generating Data to Mitigate Spurious Correlations in Natural Language
  Inference Datasets
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
95
68
0
24 Mar 2022
An Analysis of Negation in Natural Language Understanding Corpora
An Analysis of Negation in Natural Language Understanding Corpora
Md Mosharaf Hossain
Dhivya Chinnappa
Eduardo Blanco
116
43
0
16 Mar 2022
Generalized but not Robust? Comparing the Effects of Data Modification
  Methods on Out-of-Domain Generalization and Adversarial Robustness
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Tejas Gokhale
Swaroop Mishra
Man Luo
Bhavdeep Singh Sachdeva
Chitta Baral
102
30
0
15 Mar 2022
Investigating Selective Prediction Approaches Across Several Tasks in
  IID, OOD, and Adversarial Settings
Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings
Neeraj Varshney
Swaroop Mishra
Chitta Baral
105
56
0
01 Mar 2022
Predicting Out-of-Distribution Error with the Projection Norm
Predicting Out-of-Distribution Error with the Projection Norm
Yaodong Yu
Zitong Yang
Alexander Wei
Yi-An Ma
Jacob Steinhardt
OODD
81
44
0
11 Feb 2022
Describing Differences between Text Distributions with Natural Language
Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jacob Steinhardt
VLM
198
44
0
28 Jan 2022
Robust Natural Language Processing: Recent Advances, Challenges, and
  Future Directions
Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions
Marwan Omar
Soohyeon Choi
Daehun Nyang
David A. Mohaisen
80
58
0
03 Jan 2022
Measure and Improve Robustness in NLP Models: A Survey
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
300
139
0
15 Dec 2021
Quantifying Adaptability in Pre-trained Language Models with 500 Tasks
Quantifying Adaptability in Pre-trained Language Models with 500 Tasks
Belinda Z. Li
Jane A. Yu
Madian Khabsa
Luke Zettlemoyer
A. Halevy
Jacob Andreas
ELM
89
17
0
06 Dec 2021
NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language
  Evaluation
NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation
David Alfonso-Hermelo
Ahmad Rashid
Abbas Ghaddar
Huawei Noah’s
Mehdi Rezagholizadeh
77
2
0
09 Nov 2021
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
  Language Models
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Wei Ping
Chejian Xu
Shuohang Wang
Zhe Gan
Yu Cheng
Jianfeng Gao
Ahmed Hassan Awadallah
Yangqiu Song
VLMELMAAML
78
227
0
04 Nov 2021
IndoNLI: A Natural Language Inference Dataset for Indonesian
IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra
Alham Fikri Aji
Samuel Louvan
Fahrurrozi Rahman
Clara Vania
70
32
0
27 Oct 2021
Behavioral Experiments for Understanding Catastrophic Forgetting
Behavioral Experiments for Understanding Catastrophic Forgetting
Samuel J. Bell
Neil D. Lawrence
82
4
0
20 Oct 2021
Retrieval-guided Counterfactual Generation for QA
Retrieval-guided Counterfactual Generation for QA
Bhargavi Paranjape
Matthew Lamm
Ian Tenney
94
31
0
14 Oct 2021
Semantically Distributed Robust Optimization for Vision-and-Language
  Inference
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale
A. Chaudhary
Pratyay Banerjee
Chitta Baral
Yezhou Yang
126
17
0
14 Oct 2021
ReaSCAN: Compositional Reasoning in Language Grounding
ReaSCAN: Compositional Reasoning in Language Grounding
Zhengxuan Wu
Elisa Kreiss
Desmond C. Ong
Christopher Potts
CoGeLRM
79
22
0
18 Sep 2021
Does External Knowledge Help Explainable Natural Language Inference?
  Automatic Evaluation vs. Human Ratings
Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
Hendrik Schuff
Hsiu-yu Yang
Heike Adel
Ngoc Thang Vu
ELMReLMLRM
62
13
0
16 Sep 2021
Types of Out-of-Distribution Texts and How to Detect Them
Types of Out-of-Distribution Texts and How to Detect Them
Udit Arora
William Huang
He He
OODD
281
101
0
14 Sep 2021
An Evaluation Dataset and Strategy for Building Robust Multi-turn
  Response Selection Model
An Evaluation Dataset and Strategy for Building Robust Multi-turn Response Selection Model
Kijong Han
Seojin Lee
Wooin Lee
Joosung Lee
Donghun Lee
AAML
38
5
0
10 Sep 2021
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
Prasetya Ajie Utama
N. Moosavi
Victor Sanh
Iryna Gurevych
AAML
128
36
0
09 Sep 2021
Unsupervised Pre-training with Structured Knowledge for Improving
  Natural Language Inference
Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference
Xiaoyu Yang
Xiao-Dan Zhu
Zhan Shi
Tianda Li
SSL
54
1
0
08 Sep 2021
Causal Inference in Natural Language Processing: Estimation, Prediction,
  Interpretation and Beyond
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Amir Feder
Katherine A. Keith
Emaad A. Manzoor
Reid Pryzant
Dhanya Sridhar
...
Roi Reichart
Margaret E. Roberts
Brandon M Stewart
Victor Veitch
Diyi Yang
CML
118
246
0
02 Sep 2021
Grounding Representation Similarity with Statistical Testing
Grounding Representation Similarity with Statistical Testing
Frances Ding
Jean-Stanislas Denain
Jacob Steinhardt
87
30
0
03 Aug 2021
Stress Test Evaluation of Biomedical Word Embeddings
Stress Test Evaluation of Biomedical Word Embeddings
Vladimir Araujo
Andrés Carvallo
Carlos Aspillaga
C. Thorne
Denis Parra
44
8
0
24 Jul 2021
Tailor: Generating and Perturbing Text with Semantic Controls
Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross
Tongshuang Wu
Hao Peng
Matthew E. Peters
Matt Gardner
202
79
0
15 Jul 2021
An Investigation of the (In)effectiveness of Counterfactually Augmented
  Data
An Investigation of the (In)effectiveness of Counterfactually Augmented Data
Nitish Joshi
He He
OODD
86
47
0
01 Jul 2021
Combining Feature and Instance Attribution to Detect Artifacts
Combining Feature and Instance Attribution to Detect Artifacts
Pouya Pezeshkpour
Sarthak Jain
Sameer Singh
Byron C. Wallace
TDI
127
42
0
01 Jul 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis
The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander DÁmour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
111
95
0
30 Jun 2021
Probing Pre-Trained Language Models for Disease Knowledge
Probing Pre-Trained Language Models for Disease Knowledge
Israa Alghanmi
Luis Espinosa-Anke
Steven Schockaert
LM&MAELM
82
13
0
14 Jun 2021
Evaluating Entity Disambiguation and the Role of Popularity in
  Retrieval-Based NLP
Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP
Anthony Chen
Pallavi Gudipati
Shayne Longpre
Xiao Ling
Sameer Singh
73
40
0
12 Jun 2021
Figurative Language in Recognizing Textual Entailment
Figurative Language in Recognizing Textual Entailment
Tuhin Chakrabarty
Debanjan Ghosh
Adam Poliak
Smaranda Muresan
64
38
0
02 Jun 2021
SyGNS: A Systematic Generalization Testbed Based on Natural Language
  Semantics
SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
Hitomi Yanaka
K. Mineshima
Kentaro Inui
NAIAI4CE
117
11
0
02 Jun 2021
Counterfactual Invariance to Spurious Correlations: Why and How to Pass
  Stress Tests
Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests
Victor Veitch
Alexander DÁmour
Steve Yadlowsky
Jacob Eisenstein
OOD
82
93
0
31 May 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic
  Next-Generation Benchmarking
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
115
59
0
21 May 2021
Are Larger Pretrained Language Models Uniformly Better? Comparing
  Performance at the Instance Level
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
Ruiqi Zhong
Dhruba Ghosh
Dan Klein
Jacob Steinhardt
91
36
0
13 May 2021
Understanding by Understanding Not: Modeling Negation in Language Models
Understanding by Understanding Not: Modeling Negation in Language Models
Arian Hosseini
Siva Reddy
Dzmitry Bahdanau
R. Devon Hjelm
Alessandro Sordoni
Rameswar Panda
98
90
0
07 May 2021
Flexible Generation of Natural Language Deductions
Flexible Generation of Natural Language Deductions
Kaj Bostrom
Xinyu Zhao
Swarat Chaudhuri
Greg Durrett
ReLMLRM
317
33
0
18 Apr 2021
Dynabench: Rethinking Benchmarking in NLP
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
218
411
0
07 Apr 2021
What Will it Take to Fix Benchmarking in Natural Language Understanding?
What Will it Take to Fix Benchmarking in Natural Language Understanding?
Samuel R. Bowman
George E. Dahl
ELMALM
78
164
0
05 Apr 2021
Contrastive Explanations for Model Interpretability
Contrastive Explanations for Model Interpretability
Alon Jacovi
Swabha Swayamdipta
Shauli Ravfogel
Yanai Elazar
Yejin Choi
Yoav Goldberg
163
98
0
02 Mar 2021
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
Abhilasha Ravichander
Siddharth Dalmia
Maria Ryskina
Florian Metze
Eduard H. Hovy
A. Black
ELM
59
32
0
16 Feb 2021
Statistically Profiling Biases in Natural Language Reasoning Datasets
  and Models
Statistically Profiling Biases in Natural Language Reasoning Datasets and Models
Shanshan Huang
Kenny Q. Zhu
34
1
0
09 Feb 2021
SICKNL: A Dataset for Dutch Natural Language Inference
SICKNL: A Dataset for Dutch Natural Language Inference
G. Wijnholds
M. Moortgat
119
26
0
14 Jan 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAMLOffRLOOD
199
140
0
13 Jan 2021
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and
  Improving Models
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu
Marco Tulio Ribeiro
Jeffrey Heer
Daniel S. Weld
142
251
0
01 Jan 2021
HateCheck: Functional Tests for Hate Speech Detection Models
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
135
276
0
31 Dec 2020
DynaSent: A Dynamic Benchmark for Sentiment Analysis
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
299
80
0
30 Dec 2020
Previous
123
Next