Annotation Artifacts in Natural Language Inference Data

6 March 2018
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith

Papers citing "Annotation Artifacts in Natural Language Inference Data"

50 / 796 papers shown
When Does Meaning Backfire? Investigating the Role of AMRs in NLI
Junghyun Min
Xiulin Yang
Shira Wein
LLMSV
39
0
0
17 Jun 2025
Mitigating Negative Interference in Multilingual Sequential Knowledge Editing through Null-Space Constraints
Wei Sun
Tingyu Qu
Mingxiao Li
Jesse Davis
Marie-Francine Moens
KELM
111
0
0
12 Jun 2025
Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization
Yuen Chen
Haozhe Si
Guojun Zhang
Han Zhao
OOD
27
0
0
09 Jun 2025
Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events
J. Michaelov
Reeka Estacio
Zhien Zhang
Benjamin Bergen
ReLM
LRM
26
0
0
07 Jun 2025
Exploring Explanations Improves the Robustness of In-Context Learning
Ukyo Honda
Tatsushi Oka
LRM
65
0
0
03 Jun 2025
Recover Experimental Data with Selection Bias using Counterfactual Logic
Jingyang He
Shuai Wang
Ang Li
CML
25
0
0
31 May 2025
What Has Been Lost with Synthetic Evaluation?
Alexander Gill
Abhilasha Ravichander
Ana Marasović
ELM
29
0
0
28 May 2025
Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate
Ashim Gupta
Maitrey Mehta
Zhichao Xu
Vivek Srikumar
49
0
0
28 May 2025
Research Community Perspectives on "Intelligence" and Large Language Models
Bertram Højer
Terne Sasha Thorn Jakobsen
Anna Rogers
Stefan Heinrich
44
0
0
27 May 2025
How to Improve the Robustness of Closed-Source Models on NLI
Joe Stacey
Lisa Alazraki
Aran Ubhi
Beyza Ermis
Aaron Mueller
Marek Rei
34
0
0
26 May 2025
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing
Zhouhao Sun
Zhiyuan Kan
Xiao Ding
Li Du
Yang Zhao
Bing Qin
Ting Liu
104
0
0
22 May 2025
Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
Chenlu Wang
Weimin Lyu
Ritwik Banerjee
68
0
0
17 May 2025
HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection
Deanna Emery
Michael Goitia
Freddie Vargus
Iulia Neagu
HILM
VLM
139
0
0
01 May 2025
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
227
0
0
25 Apr 2025
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Jabez Magomere
Elena Kochkina
Samuel Mensah
Simerjot Kaur
Charese Smiley
105
1
0
22 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM
LRM
88
0
0
17 Apr 2025
A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models
Yuantao Zhang
Zhankui Yang
AAML
76
0
0
05 Apr 2025
Negation: A Pink Elephant in the Large Language Models' Room?
Tereza Vrabcová
Marek Kadlcík
Petr Sojka
Michal Štefánik
Michal Spiegel
132
0
0
28 Mar 2025
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
Paul K. Mandal
AAML
86
0
0
24 Mar 2025
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
Varsha Embar
Ritvik Shrivastava
Vinay Damodaran
Travis Mehlinger
Yu-Chung Hsiao
Karthik Raghunathan
66
0
0
24 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
90
3
0
14 Mar 2025
Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment
Luyang Fang
Ehsan Latif
Haoran Lu
Yimiao Zhou
Ping Ma
Xiaoming Zhai
MoMe
114
0
0
12 Mar 2025
ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships
Johan R. Portela
Nicolás Perez
Rubén Manrique
71
0
0
11 Mar 2025
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
Rui Qiao
Zhaoxuan Wu
Jingtan Wang
Pang Wei Koh
Bryan Kian Hsiang Low
OOD
105
2
0
10 Mar 2025
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference
Grace Proebsting
Adam Poliak
101
0
0
06 Mar 2025
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
126
0
0
26 Feb 2025
DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models
Zihao Li
Ruixiang Tang
Lu Cheng
Shuaiqiang Wang
Dawei Yin
Jundong Li
144
0
0
25 Feb 2025
Neuro-Symbolic Contrastive Learning for Cross-domain Inference
Mingyue Liu
Ryo Ueda
Zhen Wan
Katsumi Inoue
Chris G. Willcocks
NAI
183
0
0
13 Feb 2025
In-Context Learning (and Unlearning) of Length Biases
S. Schoch
Yangfeng Ji
167
0
0
10 Feb 2025
Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
KELM
160
5
0
04 Feb 2025
Beyond Benchmarks: On The False Promise of AI Regulation
Gabriel Stanovsky
Renana Keydar
Gadi Perl
Eliya Habba
90
2
0
28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen
Pascale Sébillot
128
5
0
23 Jan 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen
Cyrielle Mallart
Guillaume Gravier
Pascale Sébillot
131
0
0
22 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
123
2
0
21 Jan 2025
Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts
Andrew Halterman
Katherine A. Keith
91
0
0
10 Jan 2025
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
162
4
0
18 Dec 2024
On Crowdsourcing Task Design for Discourse Relation Annotation
Frances Yung
Vera Demberg
169
2
0
16 Dec 2024
Unpacking the Resilience of SNLI Contradiction Examples to Attacks
Chetan Verma
Archit Agarwal
AAML
96
0
0
15 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
ELM
105
3
0
02 Dec 2024
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
131
3
0
01 Dec 2024
AMREx: AMR for Explainable Fact Verification
Chathuri Jayaweera
Sangpil Youm
Bonnie J. Dorr
60
1
0
02 Nov 2024
Benchmark Data Repositories for Better Benchmarking
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
85
2
0
31 Oct 2024
DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers
Rakesh R Menon
Shashank Srivastava
46
2
0
29 Oct 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
303
2
0
26 Oct 2024
Task Calibration: Calibrating Large Language Models on Inference Tasks
Yingjie Li
Yun Luo
Xiaotian Xie
Yue Zhang
LRM
57
0
0
24 Oct 2024
Optimizing importance weighting in the presence of sub-population shifts
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
88
0
0
18 Oct 2024
Attuned to Change: Causal Fine-Tuning under Latent-Confounded Shifts
Jialin Yu
Yuxiang Zhou
Yulan He
Nevin L. Zhang
Ricardo Silva
Philip Torr
Ricardo M. A. Silva
93
0
0
18 Oct 2024
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
Lorenzo Pacchiardi
Marko Tesic
Lucy G. Cheke
José Hernández-Orallo
89
3
0
15 Oct 2024
Eliciting Textual Descriptions from Representations of Continuous Prompts
Dana Ramati
Daniela Gottesman
Mor Geva
101
0
0
15 Oct 2024
RepMatch: Quantifying Cross-Instance Similarities in Representation Space
Mohammad Reza Modarres
Sina Abbasi
Mohammad Taher Pilehvar
75
0
0
12 Oct 2024