arXiv:2004.07667
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
16 April 2020
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"
50 of 260 citing papers shown
Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
Bar Iluz
Yanai Elazar
Asaf Yehudai
Gabriel Stanovsky
43
1
0
02 Jun 2024
A Scoping Review of Earth Observation and Machine Learning for Causal Inference: Implications for the Geography of Poverty
Kazuki Sakamoto
Connor Jerzak
Adel Daoud
43
3
0
30 May 2024
Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure
Gaurav Maheshwari
A. Bellet
Pascal Denis
Mikaela Keller
57
1
0
23 May 2024
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
Edoardo Ponti
Shay B. Cohen
KELM
LLMSV
28
16
0
15 May 2024
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Ruizhe Chen
Yichen Li
Zikai Xiao
Zuo-Qiang Liu
KELM
40
13
0
15 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
66
14
0
06 May 2024
The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification
Minh Duc Bui
K. Wense
36
0
0
03 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
45
118
0
22 Apr 2024
Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
Dilyara Bareeva
Maximilian Dreyer
Frederik Pahde
Wojciech Samek
Sebastian Lapuschkin
KELM
67
1
0
15 Apr 2024
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia
N. Jebreel
Benet Manzanares-Salor
David Sánchez
Josep Domingo-Ferrer
Guillem Collell
Kuan Eeik Tan
KELM
MU
60
17
0
02 Apr 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Yuemei Xu
Ling Hu
Jiayi Zhao
Zihan Qiu
Yuqi Ye
Hanwen Gu
LRM
32
37
0
01 Apr 2024
Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu
Zichong Wang
Wenbin Zhang
AILaw
48
33
0
31 Mar 2024
Addressing Both Statistical and Causal Gender Fairness in NLP Models
Hannah Chen
Yangfeng Ji
David Evans
31
2
0
30 Mar 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
121
0
28 Mar 2024
Debiasing Sentence Embedders through Contrastive Word Pairs
Philip Kenneweg
Sarah Schröder
Alexander Schulz
Barbara Hammer
49
0
0
27 Mar 2024
Can Large Language Models (or Humans) Disentangle Text?
Nicolas Audinet de Pieuchon
Adel Daoud
Connor Jerzak
Moa Johansson
Richard Johansson
50
0
0
25 Mar 2024
What Happens to a Dataset Transformed by a Projection-based Concept Removal Method?
Richard Johansson
34
0
0
24 Mar 2024
FairSTG: Countering performance heterogeneity via collaborative sample-level optimization
Gengyu Lin
Zhen-Qiang Zhou
Qihe Huang
Kuo Yang
Shifen Cheng
Yang Wang
AI4TS
32
1
0
19 Mar 2024
Investigating grammatical abstraction in language models using few-shot learning of novel noun gender
Priyanka Sukumaran
Conor Houghton
N. Kazanina
46
0
0
15 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
34
10
0
15 Mar 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
56
4
0
14 Mar 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
Lei Gao
Yue Niu
Tingting Tang
A. Avestimehr
Murali Annavaram
MU
40
10
0
13 Mar 2024
AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
Sana Ebrahimi
Kaiwen Chen
Abolfazl Asudeh
Gautam Das
Nick Koudas
27
4
0
01 Mar 2024
On the Scaling Laws of Geographical Representation in Language Models
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
51
6
0
29 Feb 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
39
4
0
28 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing-ling Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
62
28
0
27 Feb 2024
MultiContrievers: Analysis of Dense Retrieval Representations
Seraphina Goldfarb-Tarrant
Pedro Rodriguez
Jane Dwivedi-Yu
Patrick Lewis
38
1
0
24 Feb 2024
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models
Ashutosh Sathe
Prachi Jain
Sunayana Sitaram
60
1
0
21 Feb 2024
From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings
Aishik Rakshit
Smriti Singh
Shuvam Keshari
Arijit Ghosh Chowdhury
Vinija Jain
Aman Chadha
37
1
0
18 Feb 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Ryan Cotterell
Ponnurangam Kumaraguru
LLMSV
35
13
0
15 Feb 2024
A survey of recent methods for addressing AI fairness and bias in biomedicine
Yifan Yang
Mingquan Lin
Han Zhao
Yifan Peng
Furong Huang
Zhiyong Lu
37
15
0
13 Feb 2024
MAFIA: Multi-Adapter Fused Inclusive LanguAge Models
Prachi Jain
Ashutosh Sathe
Varun Gumma
Kabir Ahuja
Sunayana Sitaram
28
1
0
12 Feb 2024
Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways
Angelina Wang
Xuechunzi Bai
Solon Barocas
Su Lin Blodgett
FaML
52
5
0
06 Feb 2024
Enhancing Robustness in Biomedical NLI Models: A Probing Approach for Clinical Trials
Ata Mustafa
AAML
26
0
0
04 Feb 2024
Explaining Text Classifiers with Counterfactual Representations
Pirmin Lemberger
Antoine Saillenfest
44
0
0
01 Feb 2024
Effective Controllable Bias Mitigation for Classification and Retrieval using Gate Adapters
Shahed Masoudian
Cornelia Volaucnik
Markus Schedl
Navid Rekabsaz
21
5
0
29 Jan 2024
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
Zhengxuan Wu
Atticus Geiger
Jing-ling Huang
Aryaman Arora
Thomas Icard
Christopher Potts
Noah D. Goodman
36
6
0
23 Jan 2024
From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models
Wolfgang Messner
Tatum Greene
Josephine Matalone
35
4
0
21 Dec 2023
Taxonomy-based CheckList for Large Language Model Evaluation
Damin Zhang
27
0
0
15 Dec 2023
Understanding the Effect of Model Compression on Social Bias in Large Language Models
Gustavo Gonçalves
Emma Strubell
23
10
0
09 Dec 2023
Emergence and Function of Abstract Representations in Self-Supervised Transformers
Quentin RV. Ferry
Joshua Ching
Takashi Kawai
32
2
0
08 Dec 2023
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
54
11
0
03 Dec 2023
PEFTDebias : Capturing debiasing information using PEFTs
Sumit Agarwal
Aditya Srikanth Veerubhotla
Srijan Bansal
22
3
0
01 Dec 2023
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Somnath Basu Roy Chowdhury
Nicholas Monath
Kumar Avinava Dubey
Amr Ahmed
Snigdha Chaturvedi
34
4
0
30 Nov 2023
Fair Text Classification with Wasserstein Independence
Thibaud Leteno
Antoine Gourru
Charlotte Laclau
Rémi Emonet
Christophe Gravier
FaML
32
2
0
21 Nov 2023
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
Kerem Zaman
Leshem Choshen
Shashank Srivastava
KELM
MoMe
30
10
0
13 Nov 2023
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
VLM
30
2
0
13 Nov 2023
All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation
Pragyan Banerjee
Abhinav Java
Surgan Jandial
Simra Shahid
Shaz Furniturewala
Balaji Krishnamurthy
S. Bhatia
33
3
0
09 Nov 2023
Large Human Language Models: A Need and the Challenges
Nikita Soni
H. Andrew Schwartz
João Sedoc
Niranjan Balasubramanian
ALM
AI4CE
30
11
0
09 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
78
7
0
07 Nov 2023