2004.07667
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
16 April 2020
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"
50 / 260 papers shown
Counterfactually Probing Language Identity in Multilingual Models
Anirudh Srinivasan
Venkata S Govindarajan
Kyle Mahowald
34
1
0
29 Oct 2023
Probing LLMs for Joint Encoding of Linguistic Categories
Giulio Starace
Konstantinos Papakostas
Rochelle Choenni
Apostolos Panagiotopoulos
Matteo Rosati
Alina Leidinger
Ekaterina Shutova
27
5
0
28 Oct 2023
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
Junghyun Lee
Hanseul Cho
Se-Young Yun
Chulhee Yun
38
5
0
28 Oct 2023
Do Not Harm Protected Groups in Debiasing Language Representation Models
Chloe Qinyu Zhu
Rickard Stureborg
Brandon Fain
29
0
0
27 Oct 2023
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training
Max Müller-Eberstein
Rob van der Goot
Barbara Plank
Ivan Titov
8
8
0
25 Oct 2023
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Sophie Hao
Tal Linzen
19
5
0
23 Oct 2023
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification
Pierre Colombo
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
FaML
DRL
38
1
0
21 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
28
16
0
19 Oct 2023
Fast Model Debias with Machine Unlearning
Ruizhe Chen
Jianfei Yang
Huimin Xiong
Jianhong Bai
Tianxiang Hu
Jinxiang Hao
Yang Feng
Qiufeng Wang
Jian Wu
Zuo-Qiang Liu
MU
37
60
0
19 Oct 2023
Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
Xiangjue Dong
Ziwei Zhu
Zhuoer Wang
Maria Teleki
James Caverlee
47
11
0
19 Oct 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
34
1
0
18 Oct 2023
Emptying the Ocean with a Spoon: Should We Edit Models?
Yuval Pinter
Michael Elhadad
KELM
27
26
0
18 Oct 2023
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
Aviv Slobodkin
Omer Goldman
Avi Caciularu
Ido Dagan
Shauli Ravfogel
HILM
LRM
54
24
0
18 Oct 2023
Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques
Manon Reusens
Philipp Borchert
Margot Mieskes
Jochen De Weerdt
Bart Baesens
40
8
0
16 Oct 2023
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Vaidehi Patil
Peter Hase
Joey Tianyi Zhou
KELM
AAML
31
100
0
29 Sep 2023
Boosting Fair Classifier Generalization through Adaptive Priority Reweighing
Zhihao Hu
Yiran Xu
Mengnan Du
Jindong Gu
Xinmei Tian
Fengxiang He
41
1
0
15 Sep 2023
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan A. Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
40
498
0
02 Sep 2023
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
54
60
0
20 Aug 2023
Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities
Munib Mesinovic
Peter Watkinson
Ting Zhu
FaML
24
3
0
16 Aug 2023
Arithmetic with Language Models: from Memorization to Computation
Davide Maltoni
Matteo Ferrara
KELM
LRM
47
5
0
02 Aug 2023
A Geometric Notion of Causal Probing
Clément Guerner
Anej Svete
Tianyu Liu
Alex Warstadt
Ryan Cotterell
LLMSV
41
12
0
27 Jul 2023
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models
Somayeh Ghanbarzadeh
Yan-ping Huang
Hamid Palangi
R. C. Moreno
Hamed Khanpour
42
12
0
20 Jul 2023
Building Socio-culturally Inclusive Stereotype Resources with Community Engagement
Sunipa Dev
J. Goyal
Dinesh Tewari
Shachi Dave
Vinodkumar Prabhakaran
30
22
0
20 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen
Ruiqi Zhong
Narutatsu Ri
Chen Zhao
He He
Jacob Steinhardt
Zhou Yu
Kathleen McKeown
LRM
34
47
0
17 Jul 2023
Evaluating Biased Attitude Associations of Language Models in an Intersectional Context
Shiva Omrani Sabbaghi
Robert Wolfe
Aylin Caliskan
26
22
0
07 Jul 2023
Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers
I. Nejadgholi
S. Kiritchenko
Kathleen C. Fraser
Esma Balkir
26
0
0
04 Jul 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li
Mengnan Du
Xin Wang
Ying Wang
53
27
0
04 Jul 2023
Operationalising Representation in Natural Language Processing
J. Harding
33
12
0
14 Jun 2023
Sociodemographic Bias in Language Models: A Survey and Forward Path
Vipul Gupta
Pranav Narayanan Venkit
Shomir Wilson
R. Passonneau
47
22
0
13 Jun 2023
Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions
Himanshu Thakur
Atishay Jain
Praneetha Vaddamanu
Paul Pu Liang
Louis-Philippe Morency
47
31
0
07 Jun 2023
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
Zhongbin Xie
Thomas Lukasiewicz
33
12
0
06 Jun 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
41
103
0
06 Jun 2023
Controlling Learned Effects to Reduce Spurious Correlations in Text Classifiers
Parikshit Bansal
Amit Sharma
CML
26
5
0
26 May 2023
Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias
Venkata S Govindarajan
Kyle Mahowald
David Beaver
Junjie Li
28
2
0
25 May 2023
Trade-Offs Between Fairness and Privacy in Language Modeling
Cleo Matzken
Steffen Eger
Ivan Habernal
SILM
41
6
0
24 May 2023
Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation
Minwoo Lee
Hyukhun Koh
Kang-il Lee
Dongdong Zhang
Minsu Kim
Kyomin Jung
35
9
0
23 May 2023
Word Embeddings Are Steers for Language Models
Chi Han
Jialiang Xu
Manling Li
Yi R. Fung
Chenkai Sun
Nan Jiang
Tarek Abdelzaher
Heng Ji
LLMSV
32
29
0
22 May 2023
Transferring Fairness using Multi-Task Learning with Limited Demographic Information
Carlos Alejandro Aguirre
Mark Dredze
35
0
0
22 May 2023
Fair Without Leveling Down: A New Intersectional Fairness Definition
Gaurav Maheshwari
A. Bellet
Pascal Denis
Mikaela Keller
FaML
39
2
0
21 May 2023
UP5: Unbiased Foundation Model for Fairness-aware Recommendation
Wenyue Hua
Yingqiang Ge
Shuyuan Xu
Jianchao Ji
Yongfeng Zhang
31
51
0
20 May 2023
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models
Akshita Jha
Aida Mostafazadeh Davani
Chandan K. Reddy
Shachi Dave
Vinodkumar Prabhakaran
Sunipa Dev
34
42
0
19 May 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
46
17
0
17 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
44
82
0
15 May 2023
Interventional Probing in High Dimensions: An NLI Case Study
Julia Rozanova
Marco Valentino
Lucas C. Cordeiro
André Freitas
16
7
0
20 Apr 2023
Effectiveness of Debiasing Techniques: An Indigenous Qualitative Analysis
Vithya Yogarajan
Gillian Dobbie
Henry Gouk
19
3
0
17 Apr 2023
Evaluation of Social Biases in Recent Large Pre-Trained Models
Swapnil Sharma
Nikita Anand
V. KranthiKiranG.
Alind Jain
26
0
0
13 Apr 2023
Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez
Belinda Z. Li
Jacob Andreas
KELM
27
80
0
03 Apr 2023
DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision
Sungwon Han
Seungeon Lee
Fangzhao Wu
Sundong Kim
Chuhan Wu
Xiting Wang
Xing Xie
M. Cha
FaML
28
6
0
15 Mar 2023
Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning
Hongyin Luo
James R. Glass
NAI
29
7
0
10 Mar 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas Icard
Noah D. Goodman
CML
75
101
0
05 Mar 2023