Beyond Accuracy: Behavioral Testing of NLP models with CheckList

8 May 2020

Tongshuang Wu

Papers citing "Beyond Accuracy: Behavioral Testing of NLP models with CheckList"

Showing 50 of 664 citing papers.

Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization. Zhenyu Lu. 16 Dec 2022.
Azimuth: Systematic Error Analysis for Text Classification. Gabrielle Gauthier Melançon, Orlando Marquez Ayala, Lindsay D. Brin, Chris Tyler, Frederic Branchaud-Charron, Joseph Marinier, Karine Grande, Dieu-Thu Le. 16 Dec 2022.
Tensions Between the Proxies of Human Values in AI. Teresa Datta, D. Nissani, Max Cembalest, Akash Khanna, Haley Massa, John P. Dickerson. 14 Dec 2022.
On Text-based Personality Computing: Challenges and Future Directions. Qixiang Fang, Anastasia Giachanou, A. Bagheri, L. Boeschoten, E. V. Kesteren, Mahdi Shafiee Kamalabad, Daniel L. Oberski. 13 Dec 2022.
Robustness of Learning from Task Instructions. Jiasheng Gu, Hongyu Zhao, Hanzi Xu, Liang Nie, Hongyuan Mei, Wenpeng Yin. [OOD] 07 Dec 2022.
Adaptive Testing of Computer Vision Models. Irena Gao, Gabriel Ilharco, Scott M. Lundberg, Marco Tulio Ribeiro. [VLM] 06 Dec 2022.
Human-in-the-Loop Hate Speech Classification in a Multilingual Context. Ana Kotarcic, Dominik Hangartner, Fabrizio Gilardi, Selina Kurer, K. Donnay. 05 Dec 2022.
Event knowledge in large language models: the gap between the impossible and the unlikely. Carina Kauf, Anna A. Ivanova, Giulia Rambelli, Emmanuele Chersoni, Jingyuan Selena She, Zawad Chowdhury, Evelina Fedorenko, Alessandro Lenci. 02 Dec 2022.
Rank-One Editing of Encoder-Decoder Models. Vikas Raunak, Arul Menezes. [KELM] 23 Nov 2022.
Validating Large Language Models with ReLM. Michael Kuchnik, Virginia Smith, George Amvrosiadis. 21 Nov 2022.
Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models. Vikas Raunak, Matt Post, Arul Menezes. [EGVM] 19 Nov 2022.
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective. Linyi Yang, Shuibai Zhang, Libo Qin, Yafu Li, Yidong Wang, Hanmeng Liu, Jindong Wang, Xingxu Xie, Yue Zhang. [ELM] 15 Nov 2022.
Capabilities for Better ML Engineering. Chenyang Yang, Rachel A. Brower-Sinning, Grace A. Lewis, Christian Kästner, Tongshuang Wu. 11 Nov 2022.
Understanding Text Classification Data and Models Using Aggregated Input Salience. Sebastian Ebert, Alice Shoshana Jakobovits, Katja Filippova. [FAtt] 10 Nov 2022.
Towards Human-Centred Explainability Benchmarks For Text Classification. Viktor Schlegel, Erick Mendez Guzman, Riza Batista-Navarro. 10 Nov 2022.
DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems. Nabeel Seedat, F. Imrie, M. Schaar. 09 Nov 2022.
Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing. Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu. 08 Nov 2022.
Fixing Model Bugs with Natural Language Patches. Shikhar Murty, Christopher D. Manning, Scott M. Lundberg, Marco Tulio Ribeiro. [KELM] 07 Nov 2022.
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions. Gaurav Verma, Vishwa Vinay, Ryan A. Rossi, Srijan Kumar. [VLM, AAML] 04 Nov 2022.
Dealing with Drift of Adaptation Spaces in Learning-based Self-Adaptive Systems using Lifelong Self-Adaptation. Omid Gheibi, Danny Weyns. 04 Nov 2022.
LMentry: A Language Model Benchmark of Elementary Language Tasks. Avia Efrat, Or Honovich, Omer Levy. 03 Nov 2022.
Characterizing Intrinsic Compositionality in Transformers with Tree Projections. Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning. 02 Nov 2022.
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality. Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald. [CoGe] 01 Nov 2022.
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation. Abhilasha Ravichander, Matt Gardner, Ana Marasović. 01 Nov 2022.
Lila: A Unified Benchmark for Mathematical Reasoning. Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, ..., Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan. [ELM, AIMat, ReLM, LRM] 31 Oct 2022.
Emergent Linguistic Structures in Neural Networks are Fragile. Emanuele La Malfa, Matthew Wicker, Marta Kwiatkowska. 31 Oct 2022.
Truncation Sampling as Language Model Desmoothing. John Hewitt, Christopher D. Manning, Percy Liang. [BDL] 27 Oct 2022.
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding. Md Mosharaf Hossain, Eduardo Blanco. 26 Oct 2022.
IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension. Rifki Afina Putri, Alice Oh. 25 Oct 2022.
DEMETR: Diagnosing Evaluation Metrics for Translation. Marzena Karpinska, N. Raj, Katherine Thai, Yixiao Song, Ankita Gupta, Mohit Iyyer. 25 Oct 2022.
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence. Hung-Ting Chen, Michael J.Q. Zhang, Eunsol Choi. [RALM, HILM] 25 Oct 2022.
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models. Chaitanya Malaviya, Sudeep Bhatia, Mark Yatskar. 24 Oct 2022.
The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative. Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze. 24 Oct 2022.
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models. Syrielle Montariol, Arij Riabi, Djamé Seddah. 24 Oct 2022.
Lexical Generalization Improves with Larger Models and Longer Training. Elron Bandel, Yoav Goldberg, Yanai Elazar. 23 Oct 2022.
Exploring The Landscape of Distributional Robustness for Question Answering Models. Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian H. Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt. [ELM, OOD, KELM] 22 Oct 2022.
NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation. Phillip Howard, Gadi Singer, Vasudev Lal, Yejin Choi, Swabha Swayamdipta. [CML] 22 Oct 2022.
Enhancing Tabular Reasoning with Pattern Exploiting Training. Abhilash Shankarampeta, Vivek Gupta, Shuo Zhang. [LMTD, RALM, ReLM] 21 Oct 2022.
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models. Alessandro Stolfo, Zhijing Jin, Kumar Shridhar, Bernhard Schölkopf, Mrinmaya Sachan. [ELM, OOD, LRM] 21 Oct 2022.
AugCSE: Contrastive Sentence Embedding with Diverse Augmentations. Zilu Tang, Muhammed Yusuf Kocyigit, Derry Wijaya. 20 Oct 2022.
Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP. Yangyi Chen, Hongcheng Gao, Yuchen Zhang, Fanchao Qi, Longtao Huang, Zhiyuan Liu, Maosong Sun. [SILM] 19 Oct 2022.
Controllable Fake Document Infilling for Cyber Deception. Yibo Hu, Yu Lin, Eric Parolin, Latif Khan, Kevin W. Hamlen. 18 Oct 2022.
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models. Lan Jiang, Hao Zhou, Yankai Lin, Peng Li, Jie Zhou, R. Jiang. [AAML] 18 Oct 2022.
Prompting GPT-3 To Be Reliable. Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan L. Boyd-Graber, Lijuan Wang. [KELM, LRM] 17 Oct 2022.
Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations. Julia El Zini, M. Awad. [AAML] 17 Oct 2022.
TestAug: A Framework for Augmenting Capability-based NLP Tests. Guanqun Yang, Mirazul Haque, Qiaochu Song, Wei Yang, Xueqing Liu. [ELM] 14 Oct 2022.
Efficiently Controlling Multiple Risks with Pareto Testing. Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola. 14 Oct 2022.
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey. Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov. [ELM] 14 Oct 2022.
Pretrained Transformers Do not Always Improve Robustness. Swaroop Mishra, Bhavdeep Singh Sachdeva, Chitta Baral. [VLM] 14 Oct 2022.
Can Language Representation Models Think in Bets? Zhi-Bin Tang, Mayank Kejriwal. 14 Oct 2022.