Beyond Accuracy: Behavioral Testing of NLP models with CheckList

8 May 2020

Marco Tulio Ribeiro

Tongshuang Wu

Carlos Guestrin

Papers citing "Beyond Accuracy: Behavioral Testing of NLP models with CheckList"

14 / 664 papers shown

Title | Authors | Tags | Counts | Date
Which *BERT? A Survey Organizing Contextualized Encoders | Patrick Xia, Shijie Wu, Benjamin Van Durme | - | 26 50 0 | 02 Oct 2020
TaxiNLI: Taking a Ride up the NLU Hill | Pratik M. Joshi, Somak Aditya, Aalok Sathe, Monojit Choudhury | - | 25 36 0 | 30 Sep 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches | Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martín-Fernández, Ismael Faro, IBM Quantum | - | 22 30 0 | 16 Sep 2020
A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research | Cody Watson, Nathan Cooper, David Nader-Palacio, Kevin Moran, Denys Poshyvanyk | - | 26 111 0 | 14 Sep 2020
On Robustness and Bias Analysis of BERT-based Relation Extraction | Luoqiu Li, Xiang Chen, Hongbin Ye, Zhen Bi, Shumin Deng, Ningyu Zhang, Huajun Chen | - | 32 18 0 | 14 Sep 2020
Visually Analyzing Contextualized Embeddings | M. Berger | - | 19 13 0 | 05 Sep 2020
The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models | Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, ..., Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan | VLM | 38 191 0 | 12 Aug 2020
Technology Readiness Levels for AI & ML | Alexander Lavin, Ajay Sharma | VLM | 19 105 0 | 21 Jun 2020
Benchmarking Robustness of Machine Reading Comprehension Models | Chenglei Si, Ziqing Yang, Yiming Cui, Wentao Ma, Ting Liu, Shijin Wang | ELM, AAML | 19 42 0 | 29 Apr 2020
A Primer in BERTology: What we know about how BERT works | Anna Rogers, Olga Kovaleva, Anna Rumshisky | OffRL | 35 1,456 0 | 27 Feb 2020
Undersensitivity in Neural Reading Comprehension | Johannes Welbl, Pasquale Minervini, Max Bartolo, Pontus Stenetorp, Sebastian Riedel | AAML | 11 18 0 | 15 Feb 2020
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets | Mor Geva, Yoav Goldberg, Jonathan Berant | - | 242 320 0 | 21 Aug 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 297 6,959 0 | 20 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks | Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer | AAML, GAN | 205 711 0 | 17 Apr 2018