NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers

1 July 2024

Papers citing "NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers"


Title | Authors | Tags | (unlabeled counts) | Date
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection | Indira Sen, Mattia Samory, Claudia Wagner, Isabelle Augenstein | - | 56 17 0 | 09 May 2022
Simple data balancing achieves competitive worst-group-accuracy | Badr Youbi Idrissi, Martín Arjovsky, Mohammad Pezeshki, David Lopez-Paz | - | 106 181 0 | 27 Oct 2021
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework | Matan Halevy, Camille Harris, A. Bruckman, Diyi Yang, A. Howard | - | 83 36 0 | 27 Sep 2021
A Diagnostic Study of Explainability Techniques for Text Classification | Pepa Atanasova, J. Simonsen, Christina Lioma, Isabelle Augenstein | XAI, FAtt | 71 224 0 | 25 Sep 2020
Why Fairness Cannot Be Automated: Bridging the Gap Between EU Non-Discrimination Law and AI | Sandra Wachter, Brent Mittelstadt, Chris Russell | FaML | 56 280 0 | 12 May 2020
An Investigation of Why Overparameterization Exacerbates Spurious Correlations | Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang | - | 182 379 0 | 09 May 2020
Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting | Guanhua Zhang, Bing Bai, Junqi Zhang, Kun Bai, Conghui Zhu, Tiejun Zhao | - | 63 71 0 | 29 Apr 2020
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection | Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg | - | 122 381 0 | 16 Apr 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | AIMat | 399 20,114 0 | 23 Oct 2019
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack | Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston | - | 73 244 0 | 17 Aug 2019
Mitigating Gender Bias in Natural Language Processing: Literature Review | Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai Elsherief, Jieyu Zhao, Diba Mirza, E. Belding-Royer, Kai-Wei Chang, William Yang Wang | AI4CE | 105 556 0 | 21 Jun 2019
Racial Bias in Hate Speech and Abusive Language Detection Datasets | Thomas Davidson, Debasmita Bhattacharya, Ingmar Weber | - | 96 457 0 | 29 May 2019
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting | Maria De-Arteaga, Alexey Romanov, Hanna M. Wallach, J. Chayes, C. Borgs, Alexandra Chouldechova, S. Geyik, K. Kenthapadi, Adam Tauman Kalai | - | 171 455 0 | 27 Jan 2019
Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie, Jerome Chang | - | 49 336 0 | 27 Mar 2018
A Survey Of Methods For Explaining Black Box Models | Riccardo Guidotti, A. Monreale, Salvatore Ruggieri, Franco Turini, D. Pedreschi, F. Giannotti | XAI | 124 3,954 0 | 06 Feb 2018
A Unified Approach to Interpreting Model Predictions | Scott M. Lundberg, Su-In Lee | FAtt | 1.1K 21,864 0 | 22 May 2017
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization | Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra | FAtt | 272 19,981 0 | 07 Oct 2016
European Union regulations on algorithmic decision-making and a "right to explanation" | B. Goodman, Seth Flaxman | FaML, AILaw | 63 1,899 0 | 28 Jun 2016
Explaining Predictions of Non-Linear Classifiers in NLP | L. Arras, F. Horn, G. Montavon, K. Müller, Wojciech Samek | FAtt | 74 117 0 | 23 Jun 2016
"Why Should I Trust You?": Explaining the Predictions of Any Classifier | Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin | FAtt, FaML | 1.2K 16,954 0 | 16 Feb 2016
Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 1.7K 150,006 0 | 22 Dec 2014