Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

16 April 2020
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg

Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"

50 / 260 papers shown
Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
Bar Iluz
Yanai Elazar
Asaf Yehudai
Gabriel Stanovsky
43
1
0
02 Jun 2024
A Scoping Review of Earth Observation and Machine Learning for Causal Inference: Implications for the Geography of Poverty
Kazuki Sakamoto
Connor Jerzak
Adel Daoud
43
3
0
30 May 2024
Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure
Gaurav Maheshwari
A. Bellet
Pascal Denis
Mikaela Keller
57
1
0
23 May 2024
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
Edoardo Ponti
Shay B. Cohen
KELM
LLMSV
28
16
0
15 May 2024
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Ruizhe Chen
Yichen Li
Zikai Xiao
Zuo-Qiang Liu
KELM
40
13
0
15 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
66
14
0
06 May 2024
The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification
Minh Duc Bui
K. Wense
36
0
0
03 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
45
118
0
22 Apr 2024
Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
Dilyara Bareeva
Maximilian Dreyer
Frederik Pahde
Wojciech Samek
Sebastian Lapuschkin
KELM
67
1
0
15 Apr 2024
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia
N. Jebreel
Benet Manzanares-Salor
David Sánchez
Josep Domingo-Ferrer
Guillem Collell
Kuan Eeik Tan
KELM
MU
60
17
0
02 Apr 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Yuemei Xu
Ling Hu
Jiayi Zhao
Zihan Qiu
Yuqi Ye
Hanwen Gu
LRM
32
37
0
01 Apr 2024
Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu
Zichong Wang
Wenbin Zhang
AILaw
48
33
0
31 Mar 2024
Addressing Both Statistical and Causal Gender Fairness in NLP Models
Hannah Chen
Yangfeng Ji
David Evans
31
2
0
30 Mar 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
121
0
28 Mar 2024
Debiasing Sentence Embedders through Contrastive Word Pairs
Philip Kenneweg
Sarah Schröder
Alexander Schulz
Barbara Hammer
49
0
0
27 Mar 2024
Can Large Language Models (or Humans) Disentangle Text?
Nicolas Audinet de Pieuchon
Adel Daoud
Connor Jerzak
Moa Johansson
Richard Johansson
50
0
0
25 Mar 2024
What Happens to a Dataset Transformed by a Projection-based Concept Removal Method?
Richard Johansson
34
0
0
24 Mar 2024
FairSTG: Countering performance heterogeneity via collaborative sample-level optimization
Gengyu Lin
Zhen-Qiang Zhou
Qihe Huang
Kuo Yang
Shifen Cheng
Yang Wang
AI4TS
32
1
0
19 Mar 2024
Investigating grammatical abstraction in language models using few-shot learning of novel noun gender
Priyanka Sukumaran
Conor Houghton
N. Kazanina
46
0
0
15 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
34
10
0
15 Mar 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
56
4
0
14 Mar 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
Lei Gao
Yue Niu
Tingting Tang
A. Avestimehr
Murali Annavaram
MU
40
10
0
13 Mar 2024
AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
Sana Ebrahimi
Kaiwen Chen
Abolfazl Asudeh
Gautam Das
Nick Koudas
27
4
0
01 Mar 2024
On the Scaling Laws of Geographical Representation in Language Models
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
51
6
0
29 Feb 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
39
4
0
28 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing-ling Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
62
28
0
27 Feb 2024
MultiContrievers: Analysis of Dense Retrieval Representations
Seraphina Goldfarb-Tarrant
Pedro Rodriguez
Jane Dwivedi-Yu
Patrick Lewis
38
1
0
24 Feb 2024
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models
Ashutosh Sathe
Prachi Jain
Sunayana Sitaram
60
1
0
21 Feb 2024
From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings
Aishik Rakshit
Smriti Singh
Shuvam Keshari
Arijit Ghosh Chowdhury
Vinija Jain
Aman Chadha
37
1
0
18 Feb 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Ryan Cotterell
Ponnurangam Kumaraguru
LLMSV
35
13
0
15 Feb 2024
A survey of recent methods for addressing AI fairness and bias in biomedicine
Yifan Yang
Mingquan Lin
Han Zhao
Yifan Peng
Furong Huang
Zhiyong Lu
37
15
0
13 Feb 2024
MAFIA: Multi-Adapter Fused Inclusive LanguAge Models
Prachi Jain
Ashutosh Sathe
Varun Gumma
Kabir Ahuja
Sunayana Sitaram
28
1
0
12 Feb 2024
Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways
Angelina Wang
Xuechunzi Bai
Solon Barocas
Su Lin Blodgett
FaML
52
5
0
06 Feb 2024
Enhancing Robustness in Biomedical NLI Models: A Probing Approach for Clinical Trials
Ata Mustafa
AAML
26
0
0
04 Feb 2024
Explaining Text Classifiers with Counterfactual Representations
Pirmin Lemberger
Antoine Saillenfest
44
0
0
01 Feb 2024
Effective Controllable Bias Mitigation for Classification and Retrieval using Gate Adapters
Shahed Masoudian
Cornelia Volaucnik
Markus Schedl
Navid Rekabsaz
21
5
0
29 Jan 2024
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
Zhengxuan Wu
Atticus Geiger
Jing-ling Huang
Aryaman Arora
Thomas Icard
Christopher Potts
Noah D. Goodman
36
6
0
23 Jan 2024
From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models
Wolfgang Messner
Tatum Greene
Josephine Matalone
35
4
0
21 Dec 2023
Taxonomy-based CheckList for Large Language Model Evaluation
Damin Zhang
27
0
0
15 Dec 2023
Understanding the Effect of Model Compression on Social Bias in Large Language Models
Gustavo Gonçalves
Emma Strubell
23
10
0
09 Dec 2023
Emergence and Function of Abstract Representations in Self-Supervised Transformers
Quentin RV. Ferry
Joshua Ching
Takashi Kawai
32
2
0
08 Dec 2023
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
54
11
0
03 Dec 2023
PEFTDebias : Capturing debiasing information using PEFTs
Sumit Agarwal
Aditya Srikanth Veerubhotla
Srijan Bansal
22
3
0
01 Dec 2023
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Somnath Basu Roy Chowdhury
Nicholas Monath
Kumar Avinava Dubey
Amr Ahmed
Snigdha Chaturvedi
34
4
0
30 Nov 2023
Fair Text Classification with Wasserstein Independence
Thibaud Leteno
Antoine Gourru
Charlotte Laclau
Rémi Emonet
Christophe Gravier
FaML
32
2
0
21 Nov 2023
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
Kerem Zaman
Leshem Choshen
Shashank Srivastava
KELM
MoMe
30
10
0
13 Nov 2023
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
VLM
30
2
0
13 Nov 2023
All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation
Pragyan Banerjee
Abhinav Java
Surgan Jandial
Simra Shahid
Shaz Furniturewala
Balaji Krishnamurthy
S. Bhatia
33
3
0
09 Nov 2023
Large Human Language Models: A Need and the Challenges
Nikita Soni
H. Andrew Schwartz
João Sedoc
Niranjan Balasubramanian
ALM
AI4CE
30
11
0
09 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
78
7
0
07 Nov 2023