Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

16 April 2020

Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"

Showing 50 of 260 citing papers

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation Bar Iluz Yanai Elazar Asaf Yehudai Gabriel Stanovsky 43 1 0 02 Jun 2024
A Scoping Review of Earth Observation and Machine Learning for Causal Inference: Implications for the Geography of Poverty Kazuki Sakamoto Connor Jerzak Adel Daoud 43 3 0 30 May 2024
Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure Gaurav Maheshwari A. Bellet Pascal Denis Mikaela Keller 57 1 0 23 May 2024
Spectral Editing of Activations for Large Language Model Alignment Yifu Qiu Zheng Zhao Yftah Ziser Anna Korhonen Edoardo Ponti Shay B. Cohen KELM LLMSV 28 16 0 15 May 2024
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing Ruizhe Chen Yichen Li Zikai Xiao Zuo-Qiang Liu KELM 40 13 0 15 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward Raphael Milliere Cameron Buckner LRM 66 14 0 06 May 2024
The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification Minh Duc Bui K. Wense 36 0 0 03 May 2024
Mechanistic Interpretability for AI Safety -- A Review Leonard Bereska E. Gavves AI4CE 45 118 0 22 Apr 2024
Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression Dilyara Bareeva Maximilian Dreyer Frederik Pahde Wojciech Samek Sebastian Lapuschkin KELM 67 1 0 15 Apr 2024
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods Alberto Blanco-Justicia N. Jebreel Benet Manzanares-Salor David Sánchez Josep Domingo-Ferrer Guillem Collell Kuan Eeik Tan KELM MU 60 17 0 02 Apr 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias Yuemei Xu Ling Hu Jiayi Zhao Zihan Qiu Yuqi Ye Hanwen Gu LRM 32 37 0 01 Apr 2024
Fairness in Large Language Models: A Taxonomic Survey Zhibo Chu Zichong Wang Wenbin Zhang AILaw 48 33 0 31 Mar 2024
Addressing Both Statistical and Causal Gender Fairness in NLP Models Hannah Chen Yangfeng Ji David Evans 31 2 0 30 Mar 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models Samuel Marks Can Rager Eric J. Michaud Yonatan Belinkov David Bau Aaron Mueller 46 121 0 28 Mar 2024
Debiasing Sentence Embedders through Contrastive Word Pairs Philip Kenneweg Sarah Schröder Alexander Schulz Barbara Hammer 49 0 0 27 Mar 2024
Can Large Language Models (or Humans) Disentangle Text? Nicolas Audinet de Pieuchon Adel Daoud Connor Jerzak Moa Johansson Richard Johansson 50 0 0 25 Mar 2024
What Happens to a Dataset Transformed by a Projection-based Concept Removal Method? Richard Johansson 34 0 0 24 Mar 2024
FairSTG: Countering performance heterogeneity via collaborative sample-level optimization Gengyu Lin Zhen-Qiang Zhou Qihe Huang Kuo Yang Shifen Cheng Yang Wang AI4TS 32 1 0 19 Mar 2024
Investigating grammatical abstraction in language models using few-shot learning of novel noun gender Priyanka Sukumaran Conor Houghton N. Kazanina 46 0 0 15 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction Ziyang Xu Keqin Peng Liang Ding Dacheng Tao Xiliang Lu 34 10 0 15 Mar 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information Shadi Iskander Kira Radinsky Yonatan Belinkov 56 4 0 14 Mar 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space Lei Gao Yue Niu Tingting Tang A. Avestimehr Murali Annavaram MU 40 10 0 13 Mar 2024
AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs Sana Ebrahimi Kaiwen Chen Abolfazl Asudeh Gautam Das Nick Koudas 27 4 0 01 Mar 2024
On the Scaling Laws of Geographical Representation in Language Models Nathan Godey Eric Villemonte de la Clergerie Benoît Sagot 51 6 0 29 Feb 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps Giuseppe Attanasio Beatrice Savoldi Dennis Fucci Dirk Hovy 39 4 0 28 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Jing-ling Huang Zhengxuan Wu Christopher Potts Mor Geva Atticus Geiger 62 28 0 27 Feb 2024
MultiContrievers: Analysis of Dense Retrieval Representations Seraphina Goldfarb-Tarrant Pedro Rodriguez Jane Dwivedi-Yu Patrick Lewis 38 1 0 24 Feb 2024
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models Ashutosh Sathe Prachi Jain Sunayana Sitaram 60 1 0 21 Feb 2024
From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings Aishik Rakshit Smriti Singh Shuvam Keshari Arijit Ghosh Chowdhury Vinija Jain Aman Chadha 37 1 0 18 Feb 2024
Representation Surgery: Theory and Practice of Affine Steering Shashwat Singh Shauli Ravfogel Jonathan Herzig Roee Aharoni Ryan Cotterell Ponnurangam Kumaraguru LLMSV 35 13 0 15 Feb 2024
A survey of recent methods for addressing AI fairness and bias in biomedicine Yifan Yang Mingquan Lin Han Zhao Yifan Peng Furong Huang Zhiyong Lu 37 15 0 13 Feb 2024
MAFIA: Multi-Adapter Fused Inclusive LanguAge Models Prachi Jain Ashutosh Sathe Varun Gumma Kabir Ahuja Sunayana Sitaram 28 1 0 12 Feb 2024
Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways Angelina Wang Xuechunzi Bai Solon Barocas Su Lin Blodgett FaML 52 5 0 06 Feb 2024
Enhancing Robustness in Biomedical NLI Models: A Probing Approach for Clinical Trials Ata Mustafa AAML 26 0 0 04 Feb 2024
Explaining Text Classifiers with Counterfactual Representations Pirmin Lemberger Antoine Saillenfest 44 0 0 01 Feb 2024
Effective Controllable Bias Mitigation for Classification and Retrieval using Gate Adapters Shahed Masoudian Cornelia Volaucnik Markus Schedl Navid Rekabsaz 21 5 0 29 Jan 2024
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments Zhengxuan Wu Atticus Geiger Jing-ling Huang Aryaman Arora Thomas Icard Christopher Potts Noah D. Goodman 36 6 0 23 Jan 2024
From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models Wolfgang Messner Tatum Greene Josephine Matalone 35 4 0 21 Dec 2023
Taxonomy-based CheckList for Large Language Model Evaluation Damin Zhang 27 0 0 15 Dec 2023
Understanding the Effect of Model Compression on Social Bias in Large Language Models Gustavo Gonçalves Emma Strubell 23 10 0 09 Dec 2023
Emergence and Function of Abstract Representations in Self-Supervised Transformers Quentin RV. Ferry Joshua Ching Takashi Kawai 32 2 0 08 Dec 2023
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies Vithya Yogarajan Gillian Dobbie Te Taka Keegan R. Neuwirth ALM 54 11 0 03 Dec 2023
PEFTDebias : Capturing debiasing information using PEFTs Sumit Agarwal Aditya Srikanth Veerubhotla Srijan Bansal 22 3 0 01 Dec 2023
Robust Concept Erasure via Kernelized Rate-Distortion Maximization Somnath Basu Roy Chowdhury Nicholas Monath Kumar Avinava Dubey Amr Ahmed Snigdha Chaturvedi 34 4 0 30 Nov 2023
Fair Text Classification with Wasserstein Independence Thibaud Leteno Antoine Gourru Charlotte Laclau Rémi Emonet Christophe Gravier FaML 32 2 0 21 Nov 2023
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion Kerem Zaman Leshem Choshen Shashank Srivastava KELM MoMe 30 10 0 13 Nov 2023
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions Sachin Kumar Chan Young Park Yulia Tsvetkov VLM 30 2 0 13 Nov 2023
All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation Pragyan Banerjee Abhinav Java Surgan Jandial Simra Shahid Shaz Furniturewala Balaji Krishnamurthy S. Bhatia 33 3 0 09 Nov 2023
Large Human Language Models: A Need and the Challenges Nikita Soni H. Andrew Schwartz João Sedoc Niranjan Balasubramanian ALM AI4CE 30 11 0 09 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing Michael A. Lepori Thomas Serre Ellie Pavlick 78 7 0 07 Nov 2023