Beyond Accuracy: Behavioral Testing of NLP models with CheckList

8 May 2020
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
    ELM

Papers citing "Beyond Accuracy: Behavioral Testing of NLP models with CheckList"

50 / 664 papers shown
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
37
18
0
14 Oct 2022
Predicting Fine-Tuning Performance with Probing
Zining Zhu
Soroosh Shahtalebi
Frank Rudzicz
30
9
0
13 Oct 2022
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models
Jimin Sun
Patrick Fernandes
Xinyi Wang
Graham Neubig
35
9
0
13 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits
Ameya Godbole
Robin Jia
ALM
32
9
0
13 Oct 2022
SEAL: Interactive Tool for Systematic Error Analysis and Labeling
Nazneen Rajani
Weixin Liang
Lingjiao Chen
Margaret Mitchell
James Zou
48
16
0
11 Oct 2022
Checks and Strategies for Enabling Code-Switched Machine Translation
Thamme Gowda
Mozhdeh Gheini
Jonathan May
30
3
0
11 Oct 2022
REV: Information-Theoretic Evaluation of Free-Text Rationales
Hanjie Chen
Faeze Brahman
Xiang Ren
Yangfeng Ji
Yejin Choi
Swabha Swayamdipta
92
23
0
10 Oct 2022
Montague semantics and modifier consistency measurement in neural language models
Danilo S. Carvalho
Edoardo Manino
Julia Rozanova
Lucas C. Cordeiro
André Freitas
24
0
0
10 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
146
24
0
10 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
51
33
0
09 Oct 2022
Artificial Intelligence and Natural Language Processing and Understanding in Space: A Methodological Framework and Four ESA Case Studies
José Manuél Gómez-Pérez
Andrés García-Silva
R. Leone
M. Albani
Moritz Fontaine
C. Poncet
L. Summerer
A. Donati
Ilaria Roma
Stefano Scaglioni
18
1
0
07 Oct 2022
Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems
Parikshit Bansal
Yashoteja Prabhu
Emre Kıcıman
Amit Sharma
CML
OOD
33
0
0
07 Oct 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
Thinh Hung Truong
Yulia Otmakhova
Tim Baldwin
Trevor Cohn
Jey Han Lau
Karin Verspoor
65
21
0
06 Oct 2022
InferES: A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
Venelin Kovatchev
Mariona Taulé
33
4
0
06 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
129
95
0
06 Oct 2022
Are Synonym Substitution Attacks Really Synonym Substitution Attacks?
Cheng-Han Chiang
Hunghuei Lee
AAML
33
5
0
06 Oct 2022
Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes
Ke Shen
Mayank Kejriwal
27
4
0
03 Oct 2022
Unpacking Large Language Models with Conceptual Consistency
Pritish Sahu
Michael Cogswell
Yunye Gong
Ajay Divakaran
LRM
87
16
0
29 Sep 2022
Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts
Timo Spinde
Manuel Plank
Jan-David Krieger
Terry Ruas
Bela Gipp
Akiko Aizawa
27
68
0
29 Sep 2022
An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper
Kostiantyn Kucher
N. Sultanum
Angel Daza
Vasiliki Simaki
Maria Skeppstedt
Barbara Plank
Jean-Daniel Fekete
Narges Mahyar
25
4
0
23 Sep 2022
Automatic Error Analysis for Document-level Information Extraction
Aliva Das
Xinya Du
Barry Wang
Kejian Shi
J. Gu
Thomas Porter
Claire Cardie
26
10
0
15 Sep 2022
The Role of Explanatory Value in Natural Language Processing
Kees van Deemter
XAI
18
0
0
13 Sep 2022
On Faithfulness and Coherence of Language Explanations for Recommendation Systems
Zhouhang Xie
Julian McAuley
Bodhisattwa Prasad Majumder
LRM
35
1
0
12 Sep 2022
DECK: Behavioral Tests to Improve Interpretability and Generalizability of BERT Models Detecting Depression from Text
Jekaterina Novikova
Ksenia Shkaruta
AI4MH
35
4
0
12 Sep 2022
Increasing Adverse Drug Events extraction robustness on social media: case study on negation and speculation
Simone Scaboro
Beatrice Portelli
Emmanuele Chersoni
Enrico Santus
G. Serra
32
5
0
06 Sep 2022
A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
Xanh Ho
Johannes Mario Meissner
Saku Sugawara
Akiko Aizawa
OffRL
35
4
0
05 Sep 2022
Generating Intermediate Steps for NLI with Next-Step Supervision
Deepanway Ghosal
Somak Aditya
Monojit Choudhury
LRM
35
1
0
31 Aug 2022
Shortcut Learning of Large Language Models in Natural Language Understanding
Mengnan Du
Fengxiang He
Na Zou
Dacheng Tao
Xia Hu
KELM
OffRL
42
84
0
25 Aug 2022
PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling
Guanting Dong
Daichi Guo
Liwen Wang
Xuefeng Li
Zechen Wang
...
Hao Lei
Xinyue Cui
Yi Huang
Junlan Feng
Weiran Xu
21
12
0
24 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
447
0
23 Aug 2022
KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models
Haris Widjaja
Kiril Gashteovski
Wiem Ben-Rim
Pengfei Liu
Christopher Malon
Daniel Ruffinelli
Carolin (Haas) Lawrence
Graham Neubig
25
5
0
23 Aug 2022
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA
Rachneet Sachdeva
Haritz Puerto
Tim Baumgärtner
Sewin Tariverdian
Hao Zhang
Kexin Wang
H. Saad
Leonardo F. R. Ribeiro
Iryna Gurevych
AAML
18
2
0
19 Aug 2022
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning
Olivia Wiles
Isabela Albuquerque
Sven Gowal
VLM
43
47
0
18 Aug 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference
Yanran Chen
Steffen Eger
32
16
0
15 Aug 2022
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM
KELM
32
167
0
10 Aug 2022
Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework
Jian Guan
Zhenyu Yang
Rongsheng Zhang
Zhipeng Hu
Minlie Huang
26
9
0
08 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
22
211
0
05 Aug 2022
ACE: Adaptive Constraint-aware Early Stopping in Hyperparameter Optimization
Yi-Wei Chen
Chi Wang
A. Saied
Rui Zhuang
19
2
0
04 Aug 2022
Unit Testing for Concepts in Neural Networks
Charles Lovering
Ellie Pavlick
25
28
0
28 Jul 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying-Cong Chen
Xinyan Xiao
Jing Liu
Hua Wu
37
4
0
28 Jul 2022
A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog
Stefan Larson
Kevin Leach
41
20
0
26 Jul 2022
Human-Centric Research for NLP: Towards a Definition and Guiding Questions
Bhushan Kotnis
Kiril Gashteovski
J. Gastinger
G. Serra
Francesco Alesiani
T. Sztyler
Ammar Shaker
Na Gong
Carolin (Haas) Lawrence
Zhao Xu
25
9
0
10 Jul 2022
Probing Classifiers are Unreliable for Concept Removal and Detection
Abhinav Kumar
Chenhao Tan
Amit Sharma
AAML
34
21
0
08 Jul 2022
The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage
Benjamin Charles Germain Lee
VLM
16
7
0
06 Jul 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
47
91
0
01 Jul 2022
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks
Venelin Kovatchev
Trina Chatterjee
Venkata S Govindarajan
Jifan Chen
Eunsol Choi
...
K. Erk
Matthew Lease
Junyi Jessy Li
Yating Wu
Kyle Mahowald
AAML
ELM
19
10
0
29 Jun 2022
Plug and Play Counterfactual Text Generation for Model Robustness
Nishtha Madaan
Srikanta J. Bedathur
Diptikalyan Saha
31
4
0
21 Jun 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger
Haitham Seelawi
Debora Nozza
Zeerak Talat
Bertie Vidgen
30
65
0
20 Jun 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
33
49
0
16 Jun 2022
"Understanding Robustness Lottery": A Geometric Visual Comparative
  Analysis of Neural Network Pruning Approaches
Zhimin Li
Shusen Liu
Xin Yu
Bhavya Kailkhura
Jie Cao
James Daniel Diffenderfer
P. Bremer
Valerio Pascucci
AAML
29
1
0
16 Jun 2022