ResearchTrend.AI
Stress Test Evaluation for Natural Language Inference
v1, v2, v3 (latest)

2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
    ELM
arXiv: 1806.00692 (abs, PDF, HTML)

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 149 papers shown
Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants
Jaione Bengoetxea
Itziar Gonzalez-Dios
Rodrigo Agerri
19
0
0
18 Jun 2025
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
Yeonkyoung So
Gyuseong Lee
Sungmok Jung
Joonhak Lee
JiA Kang
Sangho Kim
Jaejin Lee
38
0
0
17 Jun 2025
Exploring Explanations Improves the Robustness of In-Context Learning
Ukyo Honda
Tatsushi Oka
LRM
70
0
0
03 Jun 2025
What Has Been Lost with Synthetic Evaluation?
Alexander Gill
Abhilasha Ravichander
Ana Marasović
ELM
36
0
0
28 May 2025
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
144
0
0
09 May 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
341
0
0
21 Apr 2025
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
Man Ho Lam
Chaozheng Wang
Jen-tse Huang
Michael R. Lyu
LRM
112
1
0
19 Apr 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
90
3
0
14 Mar 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
122
3
0
21 Feb 2025
From Superficial Patterns to Semantic Understanding: Fine-Tuning Language Models on Contrast Sets
Daniel Petrov
50
0
0
05 Jan 2025
Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?
Zhiqiang Pi
Annapurna Vadaparty
Benjamin Bergen
Cameron R. Jones
81
3
0
20 Jun 2024
Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models
Vishruth Veerendranath
Vishwa Shah
Kshitish Ghate
101
0
0
22 Apr 2024
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
233
1
0
13 Mar 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
145
12
0
25 Jan 2024
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
96
0
0
16 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
90
3
0
16 Nov 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
84
5
0
11 Jul 2023
Evaluating Paraphrastic Robustness in Textual Entailment Models
Dhruv Verma
Yash Kumar Lal
Shreyashee Sinha
Benjamin Van Durme
Adam Poliak
91
5
0
29 Jun 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Lifan Yuan
Dehan Kong
...
Longtao Huang
H. Xue
Zhiyuan Liu
Maosong Sun
Heng Ji
AAML ELM
101
6
0
29 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Saku Sugawara
S. Tsugita
ELM
86
1
0
24 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
129
3
0
23 May 2023
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures
Amanda Baughan
Allison Mercurio
Ariel Liu
Xuezhi Wang
Jilin Chen
Xiao Ma
76
15
0
01 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases
Yanchen Liu
Jing Yang
Yan Chen
Jing Liu
Huaqin Wu
MoE
85
2
0
28 Feb 2023
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
Terry Yue Zhuo
Zhuang Li
Yujin Huang
Fatemeh Shiri
Weiqing Wang
Gholamreza Haffari
Yuan-Fang Li
AAML
107
57
0
30 Jan 2023
DISCO: Distilling Counterfactuals with Large Language Models
Zeming Chen
Qiyue Gao
Antoine Bosselut
Ashish Sabharwal
Kyle Richardson
96
31
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
150
45
0
20 Dec 2022
Feature-Level Debiased Natural Language Understanding
Yougang Lyu
Piji Li
Yechang Yang
Maarten de Rijke
Fajie Yuan
Yukun Zhao
D. Yin
Zhaochun Ren
91
12
0
11 Dec 2022
AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization
Bhargavi Paranjape
Pradeep Dasigi
Vivek Srikumar
Luke Zettlemoyer
Hannaneh Hajishirzi
98
8
0
02 Dec 2022
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen
Yeshuang Zhu
Jinchao Zhang
Jie Zhou
Minlie Huang
CML AAML
115
9
0
29 Nov 2022
Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of Modulated Cross-Entropy in Natural Language Inference
Frano Rajic
Ivan Stresec
Axel Marmet
Tim Postuvan
46
3
0
23 Nov 2022
Capabilities for Better ML Engineering
Chenyang Yang
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
63
4
0
11 Nov 2022
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference
S. Rajaee
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
73
5
0
07 Nov 2022
Probing neural language models for understanding of words of estimative probability
Damien Sileo
Marie-Francine Moens
51
12
0
07 Nov 2022
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Mandar Sharma
Nikhil Muralidhar
Naren Ramakrishnan
58
6
0
03 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Abhilasha Ravichander
Matt Gardner
Ana Marasović
112
35
0
01 Nov 2022
Lexical Generalization Improves with Larger Models and Longer Training
Elron Bandel
Yoav Goldberg
Yanai Elazar
94
7
0
23 Oct 2022
Enhancing Tabular Reasoning with Pattern Exploiting Training
Abhilash Shankarampeta
Vivek Gupta
Shuo Zhang
LMTD RALM ReLM
139
6
0
21 Oct 2022
Measures of Information Reflect Memorization Patterns
Rachit Bansal
Danish Pruthi
Yonatan Belinkov
110
10
0
17 Oct 2022
A Survey of Parameters Associated with the Quality of Benchmarks in NLP
Swaroop Mishra
Anjana Arunkumar
Chris Bryan
Chitta Baral
105
1
0
14 Oct 2022
Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding
Songyang Gao
Shihan Dou
Qi Zhang
Xuanjing Huang
43
8
0
14 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits
Ameya Godbole
Robin Jia
ALM
79
9
0
13 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
206
26
0
10 Oct 2022
InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
Venelin Kovatchev
Mariona Taulé
73
4
0
06 Oct 2022
Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka
K. Mineshima
93
24
0
09 Aug 2022
Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Amir Feder
Abhilasha Ravichander
Marius Mosbach
Yonatan Belinkov
Hinrich Schütze
Yoav Goldberg
CML SyDa MILM
110
55
0
28 Jul 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
109
13
0
04 Jul 2022
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks
Venelin Kovatchev
Trina Chatterjee
Venkata S Govindarajan
Jifan Chen
Eunsol Choi
...
K. Erk
Matthew Lease
Junyi Jessy Li
Yating Wu
Kyle Mahowald
AAML ELM
89
9
0
29 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM MoE
71
14
0
07 Jun 2022
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
333
48
0
24 May 2022
White-box Testing of NLP models with Mask Neuron Coverage
Arshdeep Sekhon
Yangfeng Ji
Matthew B. Dwyer
Yanjun Qi
AAML
52
3
0
10 May 2022