ResearchTrend.AI
Beyond Accuracy: Behavioral Testing of NLP models with CheckList

8 May 2020
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
    ELM

Papers citing "Beyond Accuracy: Behavioral Testing of NLP models with CheckList"

50 / 664 papers shown
Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization
Zhenyu Lu
20
1
0
16 Dec 2022
Azimuth: Systematic Error Analysis for Text Classification
Gabrielle Gauthier Melançon
Orlando Marquez Ayala
Lindsay D. Brin
Chris Tyler
Frederic Branchaud-Charron
Joseph Marinier
Karine Grande
Dieu-Thu Le
23
3
0
16 Dec 2022
Tensions Between the Proxies of Human Values in AI
Teresa Datta
D. Nissani
Max Cembalest
Akash Khanna
Haley Massa
John P. Dickerson
34
2
0
14 Dec 2022
On Text-based Personality Computing: Challenges and Future Directions
Qixiang Fang
Anastasia Giachanou
A. Bagheri
L. Boeschoten
E. V. Kesteren
Mahdi Shafiee Kamalabad
Daniel L. Oberski
26
6
0
13 Dec 2022
Robustness of Learning from Task Instructions
Jiasheng Gu
Hongyu Zhao
Hanzi Xu
Liang Nie
Hongyuan Mei
Wenpeng Yin
OOD
20
32
0
07 Dec 2022
Adaptive Testing of Computer Vision Models
Irena Gao
Gabriel Ilharco
Scott M. Lundberg
Marco Tulio Ribeiro
VLM
17
42
0
06 Dec 2022
Human-in-the-Loop Hate Speech Classification in a Multilingual Context
Ana Kotarcic
Dominik Hangartner
Fabrizio Gilardi
Selina Kurer
K. Donnay
24
2
0
05 Dec 2022
Event knowledge in large language models: the gap between the impossible and the unlikely
Carina Kauf
Anna A. Ivanova
Giulia Rambelli
Emmanuele Chersoni
Jingyuan Selena She
Zawad Chowdhury
Evelina Fedorenko
Alessandro Lenci
37
67
0
02 Dec 2022
Rank-One Editing of Encoder-Decoder Models
Vikas Raunak
Arul Menezes
KELM
26
10
0
23 Nov 2022
Validating Large Language Models with ReLM
Michael Kuchnik
Virginia Smith
George Amvrosiadis
36
27
0
21 Nov 2022
Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models
Vikas Raunak
Matt Post
Arul Menezes
EGVM
37
0
0
19 Nov 2022
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
Linyi Yang
Shuibai Zhang
Libo Qin
Yafu Li
Yidong Wang
Hanmeng Liu
Jindong Wang
Xing Xie
Yue Zhang
ELM
54
79
0
15 Nov 2022
Capabilities for Better ML Engineering
Chenyang Yang
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kästner
Tongshuang Wu
29
3
0
11 Nov 2022
Understanding Text Classification Data and Models Using Aggregated Input Salience
Sebastian Ebert
Alice Shoshana Jakobovits
Katja Filippova
FAtt
27
3
0
10 Nov 2022
Towards Human-Centred Explainability Benchmarks For Text Classification
Viktor Schlegel
Erick Mendez Guzman
Riza Batista-Navarro
28
5
0
10 Nov 2022
DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems
Nabeel Seedat
F. Imrie
M. Schaar
32
12
0
09 Nov 2022
Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing
Wenyue Hua
Lifeng Jin
Linfeng Song
Haitao Mi
Yongfeng Zhang
Dong Yu
32
1
0
08 Nov 2022
Fixing Model Bugs with Natural Language Patches
Shikhar Murty
Christopher D. Manning
Scott M. Lundberg
Marco Tulio Ribeiro
KELM
32
37
0
07 Nov 2022
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions
Gaurav Verma
Vishwa Vinay
Ryan A. Rossi
Srijan Kumar
VLM
AAML
13
8
0
04 Nov 2022
Dealing with Drift of Adaptation Spaces in Learning-based Self-Adaptive Systems using Lifelong Self-Adaptation
Omid Gheibi
Danny Weyns
23
3
0
04 Nov 2022
LMentry: A Language Model Benchmark of Elementary Language Tasks
Avia Efrat
Or Honovich
Omer Levy
34
20
0
03 Nov 2022
Characterizing Intrinsic Compositionality in Transformers with Tree Projections
Shikhar Murty
Pratyusha Sharma
Jacob Andreas
Christopher D. Manning
19
39
0
02 Nov 2022
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality
Anuj Diwan
Layne Berry
Eunsol Choi
David Harwath
Kyle Mahowald
CoGe
111
41
0
01 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Abhilasha Ravichander
Matt Gardner
Ana Marasović
33
34
0
01 Nov 2022
Lila: A Unified Benchmark for Mathematical Reasoning
Swaroop Mishra
Matthew Finlayson
Pan Lu
Leonard Tang
Sean Welleck
...
Tanmay Rajpurohit
Oyvind Tafjord
Ashish Sabharwal
Peter Clark
Ashwin Kalyan
ELM
AIMat
ReLM
LRM
36
0
0
31 Oct 2022
Emergent Linguistic Structures in Neural Networks are Fragile
Emanuele La Malfa
Matthew Wicker
Marta Kwiatkowska
22
1
0
31 Oct 2022
Truncation Sampling as Language Model Desmoothing
John Hewitt
Christopher D. Manning
Percy Liang
BDL
44
76
0
27 Oct 2022
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding
Md Mosharaf Hossain
Eduardo Blanco
50
4
0
26 Oct 2022
IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension
Rifki Afina Putri
Alice Oh
38
9
0
25 Oct 2022
DEMETR: Diagnosing Evaluation Metrics for Translation
Marzena Karpinska
N. Raj
Katherine Thai
Yixiao Song
Ankita Gupta
Mohit Iyyer
31
38
0
25 Oct 2022
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence
Hung-Ting Chen
Michael J.Q. Zhang
Eunsol Choi
RALM
HILM
52
92
0
25 Oct 2022
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models
Chaitanya Malaviya
Sudeep Bhatia
Mark Yatskar
32
4
0
24 Oct 2022
The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative
Leonie Weissweiler
Valentin Hofmann
Abdullatif Köksal
Hinrich Schütze
37
33
0
24 Oct 2022
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
29
10
0
24 Oct 2022
Lexical Generalization Improves with Larger Models and Longer Training
Elron Bandel
Yoav Goldberg
Yanai Elazar
64
6
0
23 Oct 2022
Exploring The Landscape of Distributional Robustness for Question Answering Models
Anas Awadalla
Mitchell Wortsman
Gabriel Ilharco
Sewon Min
Ian H. Magnusson
Hannaneh Hajishirzi
Ludwig Schmidt
ELM
OOD
KELM
72
19
0
22 Oct 2022
NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation
Phillip Howard
Gadi Singer
Vasudev Lal
Yejin Choi
Swabha Swayamdipta
CML
60
25
0
22 Oct 2022
Enhancing Tabular Reasoning with Pattern Exploiting Training
Abhilash Shankarampeta
Vivek Gupta
Shuo Zhang
LMTD
RALM
ReLM
68
6
0
21 Oct 2022
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
Alessandro Stolfo
Zhijing Jin
Kumar Shridhar
Bernhard Schölkopf
Mrinmaya Sachan
ELM
OOD
LRM
35
62
0
21 Oct 2022
AugCSE: Contrastive Sentence Embedding with Diverse Augmentations
Zilu Tang
Muhammed Yusuf Kocyigit
Derry Wijaya
37
9
0
20 Oct 2022
Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP
Yangyi Chen
Hongcheng Gao
Yuchen Zhang
Fanchao Qi
Longtao Huang
Zhiyuan Liu
Maosong Sun
SILM
27
45
0
19 Oct 2022
Controllable Fake Document Infilling for Cyber Deception
Yibo Hu
Yu Lin
Eric Parolin
Latif Khan
Kevin W. Hamlen
37
8
0
18 Oct 2022
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models
Lan Jiang
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
R. Jiang
AAML
39
8
0
18 Oct 2022
Prompting GPT-3 To Be Reliable
Chenglei Si
Zhe Gan
Zhengyuan Yang
Shuohang Wang
Jianfeng Wang
Jordan L. Boyd-Graber
Lijuan Wang
KELM
LRM
60
283
0
17 Oct 2022
Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations
Julia El Zini
M. Awad
AAML
26
2
0
17 Oct 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang
Mirazul Haque
Qiaochu Song
Wei Yang
Xueqing Liu
ELM
34
0
0
14 Oct 2022
Efficiently Controlling Multiple Risks with Pareto Testing
Bracha Laufer-Goldshtein
Adam Fisch
Regina Barzilay
Tommi Jaakkola
38
16
0
14 Oct 2022
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
81
86
0
14 Oct 2022
Pretrained Transformers Do not Always Improve Robustness
Swaroop Mishra
Bhavdeep Singh Sachdeva
Chitta Baral
VLM
33
2
0
14 Oct 2022
Can Language Representation Models Think in Bets?
Zhi-Bin Tang
Mayank Kejriwal
15
6
0
14 Oct 2022