Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

16 April 2020

Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"

50 / 260 papers shown

Counterfactually Probing Language Identity in Multilingual Models Anirudh Srinivasan Venkata S Govindarajan Kyle Mahowald 34 1 0 29 Oct 2023
Probing LLMs for Joint Encoding of Linguistic Categories Giulio Starace Konstantinos Papakostas Rochelle Choenni Apostolos Panagiotopoulos Matteo Rosati Alina Leidinger Ekaterina Shutova 27 5 0 28 Oct 2023
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint Junghyun Lee Hanseul Cho Se-Young Yun Chulhee Yun 38 5 0 28 Oct 2023
Do Not Harm Protected Groups in Debiasing Language Representation Models Chloe Qinyu Zhu Rickard Stureborg Brandon Fain 29 0 0 27 Oct 2023
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training Max Müller-Eberstein Rob van der Goot Barbara Plank Ivan Titov 8 8 0 25 Oct 2023
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number Sophie Hao Tal Linzen 19 5 0 23 Oct 2023
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification Pierre Colombo Nathan Noiry Guillaume Staerman Pablo Piantanida FaML DRL 38 1 0 21 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model Abhijith Chintam Rahel Beloch Willem H. Zuidema Michael Hanna Oskar van der Wal 28 16 0 19 Oct 2023
Fast Model Debias with Machine Unlearning Ruizhe Chen Jianfei Yang Huimin Xiong Jianhong Bai Tianxiang Hu Jinxiang Hao Yang Feng Qiufeng Wang Jian Wu Zuo-Qiang Liu MU 37 60 0 19 Oct 2023
Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning Xiangjue Dong Ziwei Zhu Zhuoer Wang Maria Teleki James Caverlee 47 11 0 19 Oct 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation Floris Holstege Bram Wouters Noud van Giersbergen C. Diks 34 1 0 18 Oct 2023
Emptying the Ocean with a Spoon: Should We Edit Models? Yuval Pinter Michael Elhadad KELM 27 26 0 18 Oct 2023
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models Aviv Slobodkin Omer Goldman Avi Caciularu Ido Dagan Shauli Ravfogel HILM LRM 54 24 0 18 Oct 2023
Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques Manon Reusens Philipp Borchert Margot Mieskes Jochen De Weerdt Bart Baesens 40 8 0 16 Oct 2023
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks Vaidehi Patil Peter Hase Joey Tianyi Zhou KELM AAML 31 100 0 29 Sep 2023
Boosting Fair Classifier Generalization through Adaptive Priority Reweighing Zhihao Hu Yiran Xu Mengnan Du Jindong Gu Xinmei Tian Fengxiang He 41 1 0 15 Sep 2023
Bias and Fairness in Large Language Models: A Survey Isabel O. Gallegos Ryan A. Rossi Joe Barrow Md Mehrab Tanjim Sungchul Kim Franck Dernoncourt Tong Yu Ruiyi Zhang Nesreen Ahmed AILaw 40 498 0 02 Sep 2023
A Survey on Fairness in Large Language Models Yingji Li Mengnan Du Rui Song Xin Wang Ying Wang ALM 54 60 0 20 Aug 2023
Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities Munib Mesinovic Peter Watkinson Ting Zhu FaML 24 3 0 16 Aug 2023
Arithmetic with Language Models: from Memorization to Computation Davide Maltoni Matteo Ferrara KELM LRM 47 5 0 02 Aug 2023
A Geometric Notion of Causal Probing Clément Guerner Anej Svete Tianyu Liu Alex Warstadt Ryan Cotterell LLMSV 41 12 0 27 Jul 2023
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models Somayeh Ghanbarzadeh Yan-ping Huang Hamid Palangi R. C. Moreno Hamed Khanpour 42 12 0 20 Jul 2023
Building Socio-culturally Inclusive Stereotype Resources with Community Engagement Sunipa Dev J. Goyal Dinesh Tewari Shachi Dave Vinodkumar Prabhakaran 30 22 0 20 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations Yanda Chen Ruiqi Zhong Narutatsu Ri Chen Zhao He He Jacob Steinhardt Zhou Yu Kathleen McKeown LRM 34 47 0 17 Jul 2023
Evaluating Biased Attitude Associations of Language Models in an Intersectional Context Shiva Omrani Sabbaghi Robert Wolfe Aylin Caliskan 26 22 0 07 Jul 2023
Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers I. Nejadgholi S. Kiritchenko Kathleen C. Fraser Esma Balkir 26 0 0 04 Jul 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases Yingji Li Mengnan Du Xin Wang Ying Wang 53 27 0 04 Jul 2023
Operationalising Representation in Natural Language Processing J. Harding 33 12 0 14 Jun 2023
Sociodemographic Bias in Language Models: A Survey and Forward Path Vipul Gupta Pranav Narayanan Venkit Shomir Wilson R. Passonneau 47 22 0 13 Jun 2023
Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions Himanshu Thakur Atishay Jain Praneetha Vaddamanu Paul Pu Liang Louis-Philippe Morency 47 31 0 07 Jun 2023
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models Zhongbin Xie Thomas Lukasiewicz 33 12 0 06 Jun 2023
LEACE: Perfect linear concept erasure in closed form Nora Belrose David Schneider-Joseph Shauli Ravfogel Ryan Cotterell Edward Raff Stella Biderman KELM MU 41 103 0 06 Jun 2023
Controlling Learned Effects to Reduce Spurious Correlations in Text Classifiers Parikshit Bansal Amit Sharma CML 26 5 0 26 May 2023
Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias Venkata S Govindarajan Kyle Mahowald David Beaver Junjie Li 28 2 0 25 May 2023
Trade-Offs Between Fairness and Privacy in Language Modeling Cleo Matzken Steffen Eger Ivan Habernal SILM 41 6 0 24 May 2023
Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation Minwoo Lee Hyukhun Koh Kang-il Lee Dongdong Zhang Minsu Kim Kyomin Jung 35 9 0 23 May 2023
Word Embeddings Are Steers for Language Models Chi Han Jialiang Xu Manling Li Yi R. Fung Chenkai Sun Nan Jiang Tarek Abdelzaher Heng Ji LLMSV 32 29 0 22 May 2023
Transferring Fairness using Multi-Task Learning with Limited Demographic Information Carlos Alejandro Aguirre Mark Dredze 35 0 0 22 May 2023
Fair Without Leveling Down: A New Intersectional Fairness Definition Gaurav Maheshwari A. Bellet Pascal Denis Mikaela Keller FaML 39 2 0 21 May 2023
UP5: Unbiased Foundation Model for Fairness-aware Recommendation Wenyue Hua Yingqiang Ge Shuyuan Xu Jianchao Ji Yongfeng Zhang 31 51 0 20 May 2023
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models Akshita Jha Aida Mostafazadeh Davani Chandan K. Reddy Shachi Dave Vinodkumar Prabhakaran Sunipa Dev 34 42 0 19 May 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection Shadi Iskander Kira Radinsky Yonatan Belinkov 46 17 0 17 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca Zhengxuan Wu Atticus Geiger Thomas Icard Christopher Potts Noah D. Goodman MILM 44 82 0 15 May 2023
Interventional Probing in High Dimensions: An NLI Case Study Julia Rozanova Marco Valentino Lucas C. Cordeiro André Freitas 16 7 0 20 Apr 2023
Effectiveness of Debiasing Techniques: An Indigenous Qualitative Analysis Vithya Yogarajan Gillian Dobbie Henry Gouk 19 3 0 17 Apr 2023
Evaluation of Social Biases in Recent Large Pre-Trained Models Swapnil Sharma Nikita Anand V. KranthiKiranG. Alind Jain 26 0 0 13 Apr 2023
Inspecting and Editing Knowledge Representations in Language Models Evan Hernandez Belinda Z. Li Jacob Andreas KELM 27 80 0 03 Apr 2023
DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision Sungwon Han Seungeon Lee Fangzhao Wu Sundong Kim Chuhan Wu Xiting Wang Xing Xie M. Cha FaML 28 6 0 15 Mar 2023
Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning Hongyin Luo James R. Glass NAI 29 7 0 10 Mar 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations Atticus Geiger Zhengxuan Wu Christopher Potts Thomas Icard Noah D. Goodman CML 75 101 0 05 Mar 2023