1803.02324
Annotation Artifacts in Natural Language Inference Data
6 March 2018
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
Papers citing
"Annotation Artifacts in Natural Language Inference Data"
50 / 783 papers shown
Title
Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
Chenlu Wang
Weimin Lyu
Ritwik Banerjee
2
0
0
17 May 2025
HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection
Deanna Emery
Michael Goitia
Freddie Vargus
Iulia Neagu
HILM
VLM
61
0
0
01 May 2025
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
44
0
0
25 Apr 2025
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Jabez Magomere
Elena Kochkina
Samuel Mensah
Simerjot Kaur
Charese Smiley
30
1
0
22 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM
LRM
57
0
0
17 Apr 2025
A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models
Yuantao Zhang
Zhankui Yang
AAML
38
0
0
05 Apr 2025
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
Paul K. Mandal
AAML
73
0
0
24 Mar 2025
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
Varsha Embar
Ritvik Shrivastava
Vinay Damodaran
Travis Mehlinger
Yu-Chung Hsiao
Karthik Raghunathan
39
0
0
24 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
46
2
0
14 Mar 2025
Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment
Luyang Fang
Ehsan Latif
Haoran Lu
Yue Zhou
Ping Ma
Xiaoming Zhai
MoMe
83
0
0
12 Mar 2025
ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships
Johan R. Portela
Nicolás Perez
Rubén Manrique
46
0
0
11 Mar 2025
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
Rui Qiao
Zhaoxuan Wu
Jingtan Wang
Pang Wei Koh
Bryan Kian Hsiang Low
OOD
53
2
0
10 Mar 2025
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference
Grace Proebsting
Adam Poliak
55
0
0
06 Mar 2025
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
62
0
0
26 Feb 2025
DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models
Zihao Li
Ruixiang Tang
Lu Cheng
S. Wang
Dawei Yin
Jundong Li
75
0
0
25 Feb 2025
Neuro-Symbolic Contrastive Learning for Cross-domain Inference
Mingyue Liu
Ryo Ueda
Zhen Wan
Katsumi Inoue
Chris G. Willcocks
NAI
72
0
0
13 Feb 2025
In-Context Learning (and Unlearning) of Length Biases
S. Schoch
Yangfeng Ji
100
0
0
10 Feb 2025
Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
KELM
93
3
0
04 Feb 2025
Beyond Benchmarks: On The False Promise of AI Regulation
Gabriel Stanovsky
Renana Keydar
Gadi Perl
Eliya Habba
41
1
0
28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen
Pascale Sébillot
52
5
0
23 Jan 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen
Cyrielle Mallart
Guillaume Gravier
Pascale Sébillot
60
0
0
22 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts
Andrew Halterman
Katherine A. Keith
50
2
0
10 Jan 2025
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
95
3
0
18 Dec 2024
On Crowdsourcing Task Design for Discourse Relation Annotation
Frances Yung
Vera Demberg
81
2
0
16 Dec 2024
Unpacking the Resilience of SNLI Contradiction Examples to Attacks
Chetan Verma
Archit Agarwal
AAML
74
0
0
15 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
ELM
73
2
0
02 Dec 2024
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
80
2
0
01 Dec 2024
AMREx: AMR for Explainable Fact Verification
Chathuri Jayaweera
Sangpil Youm
Bonnie J. Dorr
26
1
0
02 Nov 2024
Benchmark Data Repositories for Better Benchmarking
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
46
0
0
31 Oct 2024
DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers
Rakesh R Menon
Shashank Srivastava
26
1
0
29 Oct 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
91
1
0
26 Oct 2024
Task Calibration: Calibrating Large Language Models on Inference Tasks
Yingjie Li
Yun Luo
Xiaotian Xie
Yue Zhang
LRM
21
0
0
24 Oct 2024
Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning
Jialin Yu
Yuxiang Zhou
Yulan He
Nevin L. Zhang
Ricardo Silva
33
0
0
18 Oct 2024
Optimizing importance weighting in the presence of sub-population shifts
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
31
0
0
18 Oct 2024
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
Lorenzo Pacchiardi
Marko Tesic
Lucy G. Cheke
José Hernández-Orallo
36
3
0
15 Oct 2024
Eliciting Textual Descriptions from Representations of Continuous Prompts
Dana Ramati
Daniela Gottesman
Mor Geva
37
0
0
15 Oct 2024
RepMatch: Quantifying Cross-Instance Similarities in Representation Space
Mohammad Reza Modarres
Sina Abbasi
Mohammad Taher Pilehvar
29
0
0
12 Oct 2024
ALVIN: Active Learning Via INterpolation
Michalis Korakakis
Andreas Vlachos
Adrian Weller
33
0
0
11 Oct 2024
Explanation sensitivity to the randomness of large language models: the case of journalistic text classification
Jérémie Bogaert
Marie-Catherine de Marneffe
Antonin Descampe
Louis Escouflaire
Cedrick Fairon
François-Xavier Standaert
24
1
0
07 Oct 2024
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
Shramay Palta
Nishant Balepur
Peter Rankel
Sarah Wiegreffe
Marine Carpuat
Rachel Rudinger
ELM
36
4
0
06 Oct 2024
How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics
Adrian Cosma
Stefan Ruseti
Mihai Dascălu
Cornelia Caragea
21
2
0
04 Oct 2024
In-context Learning in Presence of Spurious Correlations
Hrayr Harutyunyan
R. Darbinyan
Samvel Karapetyan
Hrant Khachatrian
LRM
51
1
0
04 Oct 2024
The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath
Cheng-Yu Hsieh
Kai-Wei Chang
Ranjay Krishna
CLIP
CoGe
VLM
32
5
0
26 Sep 2024
AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs
Madhusudan Ghosh
Shrimon Mukherjee
Asmit Ganguly
Partha Basuchowdhuri
S. Naskar
Debasis Ganguly
36
7
0
15 Sep 2024
Enhancing adversarial robustness in Natural Language Inference using explanations
Alexandros Koulakos
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
SILM
AAML
43
0
0
11 Sep 2024
Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers?
Neeladri Bhuiya
Viktor Schlegel
Stefan Winkler
LRM
37
5
0
08 Sep 2024
Investigating a Benchmark for Training-set free Evaluation of Linguistic Capabilities in Machine Reading Comprehension
Viktor Schlegel
Goran Nenadic
R. Batista-Navarro
ELM
32
0
0
09 Aug 2024
DisTrack: a new Tool for Semi-automatic Misinformation Tracking in Online Social Networks
Francesco Di Salvo
Álvaro Huertas-García
Sebastian Doerrich
Javier Huertas-Tato
Christian Ledig
36
0
0
01 Aug 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
69
36
0
20 Jul 2024