arXiv: 2005.04118
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
8 May 2020
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
Papers citing
"Beyond Accuracy: Behavioral Testing of NLP models with CheckList"
50 / 664 papers shown
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
37
18
0
14 Oct 2022
Predicting Fine-Tuning Performance with Probing
Zining Zhu
Soroosh Shahtalebi
Frank Rudzicz
30
9
0
13 Oct 2022
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models
Jimin Sun
Patrick Fernandes
Xinyi Wang
Graham Neubig
35
9
0
13 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits
Ameya Godbole
Robin Jia
ALM
32
9
0
13 Oct 2022
SEAL : Interactive Tool for Systematic Error Analysis and Labeling
Nazneen Rajani
Weixin Liang
Lingjiao Chen
Margaret Mitchell
James Zou
48
16
0
11 Oct 2022
Checks and Strategies for Enabling Code-Switched Machine Translation
Thamme Gowda
Mozhdeh Gheini
Jonathan May
30
3
0
11 Oct 2022
REV: Information-Theoretic Evaluation of Free-Text Rationales
Hanjie Chen
Faeze Brahman
Xiang Ren
Yangfeng Ji
Yejin Choi
Swabha Swayamdipta
92
23
0
10 Oct 2022
Montague semantics and modifier consistency measurement in neural language models
Danilo S. Carvalho
Edoardo Manino
Julia Rozanova
Lucas C. Cordeiro
André Freitas
24
0
0
10 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
146
24
0
10 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
51
33
0
09 Oct 2022
Artificial Intelligence and Natural Language Processing and Understanding in Space: A Methodological Framework and Four ESA Case Studies
José Manuél Gómez-Pérez
Andrés García-Silva
R. Leone
M. Albani
Moritz Fontaine
C. Poncet
L. Summerer
A. Donati
Ilaria Roma
Stefano Scaglioni
18
1
0
07 Oct 2022
Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems
Parikshit Bansal
Yashoteja Prabhu
Emre Kıcıman
Amit Sharma
CML
OOD
33
0
0
07 Oct 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
Thinh Hung Truong
Yulia Otmakhova
Tim Baldwin
Trevor Cohn
Jey Han Lau
Karin Verspoor
65
21
0
06 Oct 2022
InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
Venelin Kovatchev
Mariona Taulé
33
4
0
06 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
129
95
0
06 Oct 2022
Are Synonym Substitution Attacks Really Synonym Substitution Attacks?
Cheng-Han Chiang
Hunghuei Lee
AAML
33
5
0
06 Oct 2022
Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes
Ke Shen
Mayank Kejriwal
27
4
0
03 Oct 2022
Unpacking Large Language Models with Conceptual Consistency
Pritish Sahu
Michael Cogswell
Yunye Gong
Ajay Divakaran
LRM
87
16
0
29 Sep 2022
Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts
Timo Spinde
Manuel Plank
Jan-David Krieger
Terry Ruas
Bela Gipp
Akiko Aizawa
27
68
0
29 Sep 2022
An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper
Kostiantyn Kucher
N. Sultanum
Angel Daza
Vasiliki Simaki
Maria Skeppstedt
Barbara Plank
Jean-Daniel Fekete
Narges Mahyar
25
4
0
23 Sep 2022
Automatic Error Analysis for Document-level Information Extraction
Aliva Das
Xinya Du
Barry Wang
Kejian Shi
J. Gu
Thomas Porter
Claire Cardie
26
10
0
15 Sep 2022
The Role of Explanatory Value in Natural Language Processing
Kees van Deemter
XAI
18
0
0
13 Sep 2022
On Faithfulness and Coherence of Language Explanations for Recommendation Systems
Zhouhang Xie
Julian McAuley
Bodhisattwa Prasad Majumder
LRM
35
1
0
12 Sep 2022
DECK: Behavioral Tests to Improve Interpretability and Generalizability of BERT Models Detecting Depression from Text
Jekaterina Novikova
Ksenia Shkaruta
AI4MH
35
4
0
12 Sep 2022
Increasing Adverse Drug Events extraction robustness on social media: case study on negation and speculation
Simone Scaboro
Beatrice Portelli
Emmanuele Chersoni
Enrico Santus
G. Serra
32
5
0
06 Sep 2022
A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
Xanh Ho
Johannes Mario Meissner
Saku Sugawara
Akiko Aizawa
OffRL
35
4
0
05 Sep 2022
Generating Intermediate Steps for NLI with Next-Step Supervision
Deepanway Ghosal
Somak Aditya
Monojit Choudhury
LRM
35
1
0
31 Aug 2022
Shortcut Learning of Large Language Models in Natural Language Understanding
Mengnan Du
Fengxiang He
Na Zou
Dacheng Tao
Xia Hu
KELM
OffRL
42
84
0
25 Aug 2022
PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling
Guanting Dong
Daichi Guo
Liwen Wang
Xuefeng Li
Zechen Wang
...
Hao Lei
Xinyue Cui
Yi Huang
Junlan Feng
Weiran Xu
21
12
0
24 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
447
0
23 Aug 2022
KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models
Haris Widjaja
Kiril Gashteovski
Wiem Ben-Rim
Pengfei Liu
Christopher Malon
Daniel Ruffinelli
Carolin (Haas) Lawrence
Graham Neubig
25
5
0
23 Aug 2022
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA
Rachneet Sachdeva
Haritz Puerto
Tim Baumgärtner
Sewin Tariverdian
Hao Zhang
Kexin Wang
H. Saad
Leonardo F. R. Ribeiro
Iryna Gurevych
AAML
18
2
0
19 Aug 2022
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning
Olivia Wiles
Isabela Albuquerque
Sven Gowal
VLM
43
47
0
18 Aug 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference
Yanran Chen
Steffen Eger
32
16
0
15 Aug 2022
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM
KELM
32
167
0
10 Aug 2022
Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework
Jian Guan
Zhenyu Yang
Rongsheng Zhang
Zhipeng Hu
Minlie Huang
26
9
0
08 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
22
211
0
05 Aug 2022
ACE: Adaptive Constraint-aware Early Stopping in Hyperparameter Optimization
Yi-Wei Chen
Chi Wang
A. Saied
Rui Zhuang
19
2
0
04 Aug 2022
Unit Testing for Concepts in Neural Networks
Charles Lovering
Ellie Pavlick
25
28
0
28 Jul 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying-Cong Chen
Xinyan Xiao
Jing Liu
Hua Wu
37
4
0
28 Jul 2022
A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog
Stefan Larson
Kevin Leach
41
20
0
26 Jul 2022
Human-Centric Research for NLP: Towards a Definition and Guiding Questions
Bhushan Kotnis
Kiril Gashteovski
J. Gastinger
G. Serra
Francesco Alesiani
T. Sztyler
Ammar Shaker
Na Gong
Carolin (Haas) Lawrence
Zhao Xu
25
9
0
10 Jul 2022
Probing Classifiers are Unreliable for Concept Removal and Detection
Abhinav Kumar
Chenhao Tan
Amit Sharma
AAML
34
21
0
08 Jul 2022
The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage
Benjamin Charles Germain Lee
VLM
16
7
0
06 Jul 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
47
91
0
01 Jul 2022
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks
Venelin Kovatchev
Trina Chatterjee
Venkata S Govindarajan
Jifan Chen
Eunsol Choi
...
K. Erk
Matthew Lease
Junyi Jessy Li
Yating Wu
Kyle Mahowald
AAML
ELM
19
10
0
29 Jun 2022
Plug and Play Counterfactual Text Generation for Model Robustness
Nishtha Madaan
Srikanta J. Bedathur
Diptikalyan Saha
31
4
0
21 Jun 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
30
65
0
20 Jun 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
33
49
0
16 Jun 2022
"Understanding Robustness Lottery": A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches
Zhimin Li
Shusen Liu
Xin Yu
Kailkhura Bhavya
Jie Cao
Diffenderfer James Daniel
P. Bremer
Valerio Pascucci
AAML
29
1
0
16 Jun 2022