Beyond Accuracy: Behavioral Testing of NLP models with CheckList

8 May 2020
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
    ELM
arXiv:2005.04118

Papers citing "Beyond Accuracy: Behavioral Testing of NLP models with CheckList"

50 / 664 papers shown
Title
Is Self-Supervised Learning More Robust Than Supervised Learning?
Yuanyi Zhong
Haoran Tang
Jun-Kun Chen
Jian-wei Peng
Yu-xiong Wang
SSL
OOD
27
24
0
10 Jun 2022
Abstraction not Memory: BERT and the English Article System
Harish Tayyar Madabushi
Dagmar Divjak
P. Milin
7
4
0
08 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
21
36
0
08 Jun 2022
Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL
Kin-Ho Lam
Delyar Tabatabai
Jed Irvine
Donald Bertucci
Anita Ruangrotsakun
Minsuk Kahng
Alan Fern
OffRL
23
1
0
04 Jun 2022
Order-sensitive Shapley Values for Evaluating Conceptual Soundness of NLP Models
Kaiji Lu
Anupam Datta
21
0
0
01 Jun 2022
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na
Sanket Vaibhav Mehta
Emma Strubell
64
19
0
25 May 2022
ER-Test: Evaluating Explanation Regularization Methods for Language Models
Brihi Joshi
Aaron Chan
Ziyi Liu
Shaoliang Nie
Maziar Sanjabi
Hamed Firooz
Xiang Ren
AAML
38
6
0
25 May 2022
FLUTE: Figurative Language Understanding through Textual Explanations
Tuhin Chakrabarty
Arkadiy Saakyan
Debanjan Ghosh
Smaranda Muresan
54
67
0
24 May 2022
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating
Laura Zeidler
Juri Opitz
Anette Frank
22
6
0
24 May 2022
Attributing AUC-ROC to Analyze Binary Classifier Performance
Arya Tafvizi
Besim Avci
Mukund Sundararajan
19
7
0
24 May 2022
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang
Yaozong Shen
Shu-ping Peng
Shuai Zhang
Xinyan Xiao
Hao Liu
Hongxuan Tang
Ying-Cong Chen
Hua Wu
Haifeng Wang
ELM
19
21
0
23 May 2022
A Domain-adaptive Pre-training Approach for Language Bias Detection in News
Jan-David Krieger
Timo Spinde
Terry Ruas
Juhi Kulshrestha
Bela Gipp
AI4CE
34
21
0
22 May 2022
SALTED: A Framework for SAlient Long-Tail Translation Error Detection
Vikas Raunak
Matt Post
Arul Menezes
33
25
0
20 May 2022
Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression
Zhiling Zhang
Siyuan Chen
Mengyue Wu
Ke Zhu
39
25
0
19 May 2022
AEON: A Method for Automatic Evaluation of NLP Test Cases
Jen-tse Huang
Jianping Zhang
Wenxuan Wang
Pinjia He
Yuxin Su
Michael R. Lyu
40
23
0
13 May 2022
Evaluation Gaps in Machine Learning Practice
Ben Hutchinson
Negar Rostamzadeh
Christina Greer
Katherine A. Heller
Vinodkumar Prabhakaran
ELM
36
56
0
11 May 2022
Sibylvariant Transformations for Robust Text Classification
Fabrice Harel-Canada
Muhammad Ali Gulzar
Nanyun Peng
Miryung Kim
AAML
VLM
11
4
0
10 May 2022
White-box Testing of NLP models with Mask Neuron Coverage
Arshdeep Sekhon
Yangfeng Ji
Matthew B. Dwyer
Yanjun Qi
AAML
19
3
0
10 May 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
26
17
0
09 May 2022
Improving negation detection with negation-focused pre-training
Thinh Hung Truong
Timothy Baldwin
Trevor Cohn
Karin Verspoor
27
21
0
09 May 2022
Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection
Esma Balkir
I. Nejadgholi
Kathleen C. Fraser
S. Kiritchenko
FAtt
41
27
0
06 May 2022
Implicit N-grams Induced by Recurrence
Xiaobing Sun
Wei Lu
27
3
0
05 May 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Mithun Das
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
36
17
0
30 Apr 2022
What do we Really Know about State of the Art NER?
Sowmya Vajjala
Ramya Balasubramaniam
27
15
0
29 Apr 2022
Russian Texts Detoxification with Levenshtein Editing
I. Gusev
13
1
0
28 Apr 2022
Counterfactual Explanations for Natural Language Interfaces
George Tolkachev
Stephen Mell
Steve Zdancewic
Osbert Bastani
LRM
AAML
19
4
0
27 Apr 2022
Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective
Edoardo Manino
Julia Rozanova
Danilo S. Carvalho
André Freitas
Lucas C. Cordeiro
30
7
0
26 Apr 2022
LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models
Mor Geva
Avi Caciularu
Guy Dar
Paul Roit
Shoval Sadde
Micah Shlain
Bar Tamir
Yoav Goldberg
KELM
35
27
0
26 Apr 2022
Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks
Ruixiang Cui
Daniel Hershcovich
Anders Søgaard
25
13
0
22 Apr 2022
Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias
Anoop Kadan
Manjary P. Gangan
Deepak P
L. Lajish V.
AI4CE
40
10
0
21 Apr 2022
A Corpus for Understanding and Generating Moral Stories
Jian Guan
Ziqi Liu
Minlie Huang
32
9
0
20 Apr 2022
Probing for the Usage of Grammatical Number
Karim Lasri
Tiago Pimentel
Alessandro Lenci
Thierry Poibeau
Ryan Cotterell
38
55
0
19 Apr 2022
Ingredient Extraction from Text in the Recipe Domain
Arkin Dharawat
Chris H Doan
19
1
0
18 Apr 2022
Fast Few-shot Debugging for NLU Test Suites
Christopher Malon
Kai Li
E. Kruus
30
4
0
13 Apr 2022
Experimental Standards for Deep Learning in Natural Language Processing Research
Dennis Ulmer
Elisa Bassignana
Max Müller-Eberstein
Daniel Varab
Mike Zhang
Rob van der Goot
Christian Hardmeier
Barbara Plank
19
10
0
13 Apr 2022
Can Question Rewriting Help Conversational Question Answering?
Etsuko Ishii
Yan Xu
Samuel Cahyawijaya
Bryan Wilie
33
9
0
13 Apr 2022
ProtoTEx: Explaining Model Decisions with Prototype Tensors
Anubrata Das
Chitrank Gupta
Venelin Kovatchev
Matthew Lease
Junjie Li
31
26
0
11 Apr 2022
KOBEST: Korean Balanced Evaluation of Significant Tasks
Dohyeong Kim
Myeongjun Jang
D. Kwon
Eric Davis
ALM
16
23
0
09 Apr 2022
Informativeness and Invariance: Two Perspectives on Spurious Correlations in Natural Language
Jacob Eisenstein
CML
35
25
0
09 Apr 2022
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
Pedro Henrique Luz de Araujo
Benjamin Roth
22
2
0
08 Apr 2022
Lifelong Self-Adaptation: Self-Adaptation Meets Lifelong Machine Learning
Omid Gheibi
Danny Weyns
CLL
28
21
0
04 Apr 2022
Learning Disentangled Representations of Negation and Uncertainty
J. Vasilakes
Chrysoula Zerva
Makoto Miwa
Sophia Ananiadou
SSL
OOD
UD
CoGe
DRL
25
16
0
01 Apr 2022
Probing Speech Emotion Recognition Transformers for Linguistic Knowledge
Andreas Triantafyllopoulos
Johannes Wagner
H. Wierstorf
Maximilian Schmitt
U. Reichel
F. Eyben
Felix Burkhardt
Björn W. Schuller
21
25
0
01 Apr 2022
The Inefficiency of Language Models in Scholarly Retrieval: An Experimental Walk-through
Shruti Singh
Mayank Singh
40
1
0
29 Mar 2022
UKP-SQUARE: An Online Platform for Question Answering Research
Tim Baumgärtner
Kexin Wang
Rachneet Sachdeva
Max Eichler
Gregor Geigle
...
Leonardo F. R. Ribeiro
Jonas Pfeiffer
Nils Reimers
Gözde Gül Sahin
Iryna Gurevych
25
7
0
25 Mar 2022
NPC: Neuron Path Coverage via Characterizing Decision Logic of Deep Neural Networks
Xiaofei Xie
Tianlin Li
Jian-Xun Wang
Lei Ma
Qing Guo
Felix Juefei Xu
Yang Liu
AAML
21
51
0
24 Mar 2022
Multilingual CheckList: Generation and Evaluation
Karthikeyan K
Shaily Bhatt
Pankaj Singh
Somak Aditya
Sandipan Dandapat
Sunayana Sitaram
Monojit Choudhury
ELM
24
1
0
24 Mar 2022
Factual Consistency of Multilingual Pretrained Language Models
Constanza Fierro
Anders Søgaard
HILM
27
15
0
22 Mar 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
AAML
ELM
30
20
0
21 Mar 2022
Probing Factually Grounded Content Transfer with Factual Ablation
Peter West
Chris Quirk
Michel Galley
Yejin Choi
HILM
30
9
0
18 Mar 2022