Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models

14 October 2021

Diyi Yang

Papers citing "Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models"

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models Haoyang Li Xiaogeng Liu SILM 55 5 0 30 Oct 2024
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification Tom A. Lamb Adam Davies Alasdair Paren Philip Torr Francesco Pinto 77 0 0 30 Oct 2024
Specification Overfitting in Artificial Intelligence Benjamin Roth Pedro Henrique Luz de Araujo Yuxi Xia Saskia Kaltenbrunner Christoph Korab 95 1 0 13 Mar 2024
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization Jiaao Chen Dinghan Shen Weizhu Chen Diyi Yang BDL 29 47 0 31 May 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU Models Mengnan Du Varun Manjunatha R. Jain Ruchi Deshpande Franck Dernoncourt Jiuxiang Gu Tong Sun Xia Hu 66 107 0 11 Mar 2021
Contrastive Explanations for Model Interpretability Alon Jacovi Swabha Swayamdipta Shauli Ravfogel Yanai Elazar Yejin Choi Yoav Goldberg 76 96 0 02 Mar 2021
Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals Zhao Wang A. Culotta CML OOD 30 100 0 18 Dec 2020
Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately Fereshte Khani Percy Liang FaML 26 65 0 07 Dec 2020
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles Christopher Clark Mark Yatskar Luke Zettlemoyer 33 62 0 07 Nov 2020
Identifying Spurious Correlations for Robust Text Classification Zhao Wang A. Culotta OOD 23 76 0 06 Oct 2020
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models Lifu Tu Garima Lalwani Spandana Gella He He LRM 48 185 0 14 Jul 2020
Robustness to Spurious Correlations via Human Annotations Megha Srivastava Tatsunori Hashimoto Percy Liang CML OOD 23 89 0 13 Jul 2020
Towards Robustifying NLI Models Against Lexical Dataset Biases Xiang Zhou Joey Tianyi Zhou 35 58 0 10 May 2020
An Investigation of Why Overparameterization Exacerbates Spurious Correlations Shiori Sagawa Aditi Raghunathan Pang Wei Koh Percy Liang 167 375 0 09 May 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList Marco Tulio Ribeiro Tongshuang Wu Carlos Guestrin Sameer Singh ELM 98 1,089 0 08 May 2020
Evaluating Robustness to Input Perturbations for Neural Machine Translation Xing Niu Prashant Mathur Georgiana Dinu Yaser Al-Onaizan AAML 30 64 0 01 May 2020
WT5?! Training Text-to-Text Models to Explain their Predictions Sharan Narang Colin Raffel Katherine Lee Adam Roberts Noah Fiedel Karishma Malkan 32 199 0 30 Apr 2020
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer Y. Hao Li Dong Furu Wei Ke Xu ViT 41 219 0 23 Apr 2020
Shortcut Learning in Deep Neural Networks Robert Geirhos J. Jacobsen Claudio Michaelis R. Zemel Wieland Brendel Matthias Bethge Felix Wichmann 126 2,023 0 16 Apr 2020
Pretrained Transformers Improve Out-of-Distribution Robustness Dan Hendrycks Xiaoyuan Liu Eric Wallace Adam Dziedzic R. Krishnan D. Song OOD 98 430 0 13 Apr 2020
Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Hanjie Chen Guangtao Zheng Yangfeng Ji FAtt 51 92 0 04 Apr 2020
Automatic Shortcut Removal for Self-Supervised Representation Learning Matthias Minderer Olivier Bachem N. Houlsby Michael Tschannen SSL 39 73 0 20 Feb 2020
Adversarial Filters of Dataset Biases Ronan Le Bras Swabha Swayamdipta Chandra Bhagavatula Rowan Zellers Matthew E. Peters Ashish Sabharwal Yejin Choi 55 221 0 10 Feb 2020
Learning to Deceive with Attention-Based Explanations Danish Pruthi Mansi Gupta Bhuwan Dhingra Graham Neubig Zachary Chase Lipton 36 193 0 17 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases Christopher Clark Mark Yatskar Luke Zettlemoyer OOD 45 463 0 09 Sep 2019
Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual He He Sheng Zha Haohan Wang 33 198 0 28 Aug 2019
Adversarial Domain Adaptation for Machine Reading Comprehension Huazheng Wang Zhe Gan Xiaodong Liu Jingjing Liu Jianfeng Gao Hongning Wang 41 64 0 24 Aug 2019
Revealing the Dark Secrets of BERT Olga Kovaleva Alexey Romanov Anna Rogers Anna Rumshisky 22 551 0 21 Aug 2019
What Does BERT Look At? An Analysis of BERT's Attention Kevin Clark Urvashi Khandelwal Omer Levy Christopher D. Manning MILM 170 1,586 0 11 Jun 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference R. Thomas McCoy Ellie Pavlick Tal Linzen 94 1,226 0 04 Feb 2019
Fairwashing: the risk of rationalization Ulrich Aïvodji Hiromi Arai O. Fortineau Sébastien Gambs Satoshi Hara Alain Tapp FaML 28 146 0 28 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 751 93,936 0 11 Oct 2018
Robustness May Be at Odds with Accuracy Dimitris Tsipras Shibani Santurkar Logan Engstrom Alexander Turner Aleksander Madry AAML 54 1,772 0 30 May 2018
Generating Natural Language Adversarial Examples M. Alzantot Yash Sharma Ahmed Elgohary Bo-Jhang Ho Mani B. Srivastava Kai-Wei Chang AAML 327 921 0 21 Apr 2018
Adversarial Examples for Evaluating Reading Comprehension Systems Robin Jia Percy Liang AAML ELM 171 1,594 0 23 Jul 2017
Axiomatic Attribution for Deep Networks Mukund Sundararajan Ankur Taly Qiqi Yan OOD FAtt 65 5,920 0 04 Mar 2017
Yelp Dataset Challenge: Review Rating Prediction Nabiha Asghar 24 167 0 17 May 2016
Counter-fitting Word Vectors to Linguistic Constraints N. Mrksic Diarmuid Ó Séaghdha Blaise Thomson Milica Gasic L. Rojas-Barahona Pei-hao Su David Vandyke Tsung-Hsien Wen S. Young 49 483 0 02 Mar 2016
"Why Should I Trust You?": Explaining the Predictions of Any Classifier Marco Tulio Ribeiro Sameer Singh Carlos Guestrin FAtt FaML 338 16,765 0 16 Feb 2016
Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering Ruining He Julian McAuley 52 2,048 0 04 Feb 2016