arXiv 1803.02324 (v2, latest)
Annotation Artifacts in Natural Language Inference Data
6 March 2018
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
Papers citing "Annotation Artifacts in Natural Language Inference Data" (50 of 796 shown)
When Does Meaning Backfire? Investigating the Role of AMRs in NLI
Junghyun Min
Xiulin Yang
Shira Wein
LLMSV
39
0
0
17 Jun 2025
Mitigating Negative Interference in Multilingual Sequential Knowledge Editing through Null-Space Constraints
Wei Sun
Tingyu Qu
Mingxiao Li
Jesse Davis
Marie-Francine Moens
KELM
111
0
0
12 Jun 2025
Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization
Yuen Chen
Haozhe Si
Guojun Zhang
Han Zhao
OOD
27
0
0
09 Jun 2025
Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events
J. Michaelov
Reeka Estacio
Zhien Zhang
Benjamin Bergen
ReLM
LRM
26
0
0
07 Jun 2025
Exploring Explanations Improves the Robustness of In-Context Learning
Ukyo Honda
Tatsushi Oka
LRM
65
0
0
03 Jun 2025
Recover Experimental Data with Selection Bias using Counterfactual Logic
Jingyang He
Shuai Wang
Ang Li
CML
25
0
0
31 May 2025
What Has Been Lost with Synthetic Evaluation?
Alexander Gill
Abhilasha Ravichander
Ana Marasović
ELM
29
0
0
28 May 2025
Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate
Ashim Gupta
Maitrey Mehta
Zhichao Xu
Vivek Srikumar
49
0
0
28 May 2025
Research Community Perspectives on "Intelligence" and Large Language Models
Bertram Højer
Terne Sasha Thorn Jakobsen
Anna Rogers
Stefan Heinrich
44
0
0
27 May 2025
How to Improve the Robustness of Closed-Source Models on NLI
Joe Stacey
Lisa Alazraki
Aran Ubhi
Beyza Ermis
Aaron Mueller
Marek Rei
34
0
0
26 May 2025
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing
Zhouhao Sun
Zhiyuan Kan
Xiao Ding
Li Du
Yang Zhao
Bing Qin
Ting Liu
104
0
0
22 May 2025
Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
Chenlu Wang
Weimin Lyu
Ritwik Banerjee
68
0
0
17 May 2025
HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection
Deanna Emery
Michael Goitia
Freddie Vargus
Iulia Neagu
HILM
VLM
139
0
0
01 May 2025
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
227
0
0
25 Apr 2025
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Jabez Magomere
Elena Kochkina
Samuel Mensah
Simerjot Kaur
Charese Smiley
105
1
0
22 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM
LRM
88
0
0
17 Apr 2025
A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models
Yuantao Zhang
Zhankui Yang
AAML
76
0
0
05 Apr 2025
Negation: A Pink Elephant in the Large Language Models' Room?
Tereza Vrabcová
Marek Kadlcík
Petr Sojka
Michal Štefánik
Michal Spiegel
132
0
0
28 Mar 2025
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
Paul K. Mandal
AAML
86
0
0
24 Mar 2025
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
Varsha Embar
Ritvik Shrivastava
Vinay Damodaran
Travis Mehlinger
Yu-Chung Hsiao
Karthik Raghunathan
66
0
0
24 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
90
3
0
14 Mar 2025
Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment
Luyang Fang
Ehsan Latif
Haoran Lu
Yimiao Zhou
Ping Ma
Xiaoming Zhai
MoMe
114
0
0
12 Mar 2025
ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships
Johan R. Portela
Nicolás Perez
Rubén Manrique
71
0
0
11 Mar 2025
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
Rui Qiao
Zhaoxuan Wu
Jingtan Wang
Pang Wei Koh
Bryan Kian Hsiang Low
OOD
105
2
0
10 Mar 2025
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference
Grace Proebsting
Adam Poliak
101
0
0
06 Mar 2025
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
126
0
0
26 Feb 2025
DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models
Zihao Li
Ruixiang Tang
Lu Cheng
Shuaiqiang Wang
Dawei Yin
Jundong Li
144
0
0
25 Feb 2025
Neuro-Symbolic Contrastive Learning for Cross-domain Inference
Mingyue Liu
Ryo Ueda
Zhen Wan
Katsumi Inoue
Chris G. Willcocks
NAI
183
0
0
13 Feb 2025
In-Context Learning (and Unlearning) of Length Biases
S. Schoch
Yangfeng Ji
167
0
0
10 Feb 2025
Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
KELM
160
5
0
04 Feb 2025
Beyond Benchmarks: On The False Promise of AI Regulation
Gabriel Stanovsky
Renana Keydar
Gadi Perl
Eliya Habba
90
2
0
28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen
Pascale Sébillot
128
5
0
23 Jan 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen
Cyrielle Mallart
Guillaume Gravier
Pascale Sébillot
131
0
0
22 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
123
2
0
21 Jan 2025
Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts
Andrew Halterman
Katherine A. Keith
91
0
0
10 Jan 2025
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
162
4
0
18 Dec 2024
On Crowdsourcing Task Design for Discourse Relation Annotation
Frances Yung
Vera Demberg
169
2
0
16 Dec 2024
Unpacking the Resilience of SNLI Contradiction Examples to Attacks
Chetan Verma
Archit Agarwal
AAML
96
0
0
15 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
ELM
105
3
0
02 Dec 2024
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
131
3
0
01 Dec 2024
AMREx: AMR for Explainable Fact Verification
Chathuri Jayaweera
Sangpil Youm
Bonnie J. Dorr
60
1
0
02 Nov 2024
Benchmark Data Repositories for Better Benchmarking
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
85
2
0
31 Oct 2024
DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers
Rakesh R Menon
Shashank Srivastava
46
2
0
29 Oct 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
303
2
0
26 Oct 2024
Task Calibration: Calibrating Large Language Models on Inference Tasks
Yingjie Li
Yun Luo
Xiaotian Xie
Yue Zhang
LRM
57
0
0
24 Oct 2024
Optimizing importance weighting in the presence of sub-population shifts
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
88
0
0
18 Oct 2024
Attuned to Change: Causal Fine-Tuning under Latent-Confounded Shifts
Jialin Yu
Yuxiang Zhou
Yulan He
Nevin L. Zhang
Ricardo Silva
Philip Torr
Ricardo M. A. Silva
93
0
0
18 Oct 2024
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
Lorenzo Pacchiardi
Marko Tesic
Lucy G. Cheke
José Hernández-Orallo
89
3
0
15 Oct 2024
Eliciting Textual Descriptions from Representations of Continuous Prompts
Dana Ramati
Daniela Gottesman
Mor Geva
101
0
0
15 Oct 2024
RepMatch: Quantifying Cross-Instance Similarities in Representation Space
Mohammad Reza Modarres
Sina Abbasi
Mohammad Taher Pilehvar
75
0
0
12 Oct 2024