Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation

20 September 2020

Papers citing "Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation"

Do Large Language Models know who did what to whom? Joseph M. Denning Xiaohan Bryor Snefjella Idan A. Blank 62 1 0 23 Apr 2025
Robustly identifying concepts introduced during chat fine-tuning using crosscoders Julian Minder Clement Dumas Caden Juang Bilal Chughtai Neel Nanda 29 0 0 03 Apr 2025
Controllable Context Sensitivity and the Knob Behind It Julian Minder Kevin Du Niklas Stoehr Giovanni Monea Chris Wendler Robert West Ryan Cotterell KELM 58 3 0 11 Nov 2024
Gumbel Counterfactual Generation From Language Models Shauli Ravfogel Anej Svete Vésteinn Snæbjarnarson Ryan Cotterell LRM CML 33 0 0 11 Nov 2024
Towards a theory of model distillation Enric Boix-Adserà FedML VLM 44 6 0 14 Mar 2024
This Reads Like That: Deep Learning for Interpretable Natural Language Processing Claudio Fanconi Moritz Vandenhirtz Severin Husmann Julia E. Vogt FAtt 14 2 0 25 Oct 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation Floris Holstege Bram Wouters Noud van Giersbergen C. Diks 34 1 0 18 Oct 2023
Competence-Based Analysis of Language Models Adam Davies Jize Jiang Chengxiang Zhai ELM 29 4 0 01 Mar 2023
Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology Valentin Hofmann J. Pierrehumbert Hinrich Schütze 22 0 0 14 Dec 2022
Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection P. Haghighatkhah Antske Fokkens Pia Sommerauer Bettina Speckmann Kevin Verbeek 32 10 0 08 Dec 2022
Debiasing Methods for Fairer Neural Models in Vision and Language Research: A Survey Otávio Parraga Martin D. Móre C. M. Oliveira Nathan Gavenski L. S. Kupssinskü Adilson Medronha L. V. Moura Gabriel S. Simões Rodrigo C. Barros 45 18 0 10 Nov 2022
Kernelized Concept Erasure Shauli Ravfogel Francisco Vargas Yoav Goldberg Ryan Cotterell 24 32 0 28 Jan 2022
Linear Adversarial Concept Erasure Shauli Ravfogel Michael Twiton Yoav Goldberg Ryan Cotterell KELM 81 57 0 28 Jan 2022
A Word on Machine Ethics: A Response to Jiang et al. (2021) Zeerak Talat Hagen Blix Josef Valvoda M. I. Ganesh Ryan Cotterell Adina Williams SyDa FaML 96 38 0 07 Nov 2021
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias Ting-Rui Chiang 32 3 0 11 Oct 2021
Assessing the Reliability of Word Embedding Gender Bias Measures Yupei Du Qixiang Fang D. Nguyen 46 21 0 10 Sep 2021
The Low-Dimensional Linear Geometry of Contextualized Word Representations Evan Hernandez Jacob Andreas MILM 25 40 0 15 May 2021
WordBias: An Interactive Visual Tool for Discovering Intersectional Biases Encoded in Word Embeddings Bhavya Ghai Md. Naimul Hoque Klaus Mueller 29 26 0 05 Mar 2021