Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

16 April 2020
Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg

Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"

50 / 260 papers shown
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen, Zitong Yu, Jin Song Dong, Sen Su, L. Shen, Shutao Xia, Xiaochun Cao · FedML, VLM · 03 May 2025

RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin, Hagai Taitelbaum, Yonatan Bitton, Brian Gordon, Michal Sokolik, ..., Almog Gueta, Royi Rassin, Itay Laish, Dani Lischinski, Idan Szpektor · EGVM, VGen · 24 Apr 2025

FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Heng Chang, Zhiting Fan, Ruizhe Chen, Xiaotang Gai, Luqi Gong, Yan Zhang, Zuozhu Liu · LLMSV · 20 Apr 2025

On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah A. Smith, Sarah Wiegreffe, Yanai Elazar · 16 Apr 2025

Bias Beyond English: Evaluating Social Bias and Debiasing Methods in a Low-Resource Setting
Ej Zhou, Weiming Lu · 15 Apr 2025
Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks
Enze Shi, Linglong Kong, Bei Jiang · FaML, FedML · 08 Apr 2025

GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction
Haozhan Tang, Tianyi Zhang, Oliver Kroemer, Matthew Johnson-Roberson, Weiming Zhi · 3DPC · 04 Apr 2025

Fair Sufficient Representation Learning
Xueyu Zhou, Chun Yin IP, Jian Huang · FaML · 29 Mar 2025

Fundamental Limits of Perfect Concept Erasure
Somnath Basu Roy Chowdhury, Avinava Dubey, Ahmad Beirami, Rahul Kidambi, Nicholas Monath, Amr Ahmed, Snigdha Chaturvedi · 25 Mar 2025

BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Xin Xu, Wei Xu, N. Zhang, Julian McAuley · KELM · 11 Mar 2025
Fair Text Classification via Transferable Representations
Thibaud Leteno, Michael Perrot, Charlotte Laclau, Antoine Gourru, Christophe Gravier · FaML · 10 Mar 2025

Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations
Eren Erogullari, Sebastian Lapuschkin, Wojciech Samek, Frederik Pahde · LLMSV, CoGe · 07 Mar 2025

LUNAR: LLM Unlearning via Neural Activation Redirection
William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane · MU · 11 Feb 2025

Representation in large language models
Cameron C. Yetman · 03 Jan 2025

Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes, Marco Christiani, David Shriver, Marissa Connor · KELM · 17 Dec 2024
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach
Daiki Shirafuji, Makoto Takenaka, Shinya Taguchi · LLMAG · 16 Dec 2024

Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang, Yangyang Guo, Mohan S. Kankanhalli · VLM · 19 Nov 2024

Bias in Large Language Models: Origin, Evaluation, and Mitigation
Yufei Guo, Muzhe Guo, Juntao Su, Zhou Yang, Mengqiu Zhu, Hongfei Li, Mengyang Qiu, Shuo Shuo Liu · AILaw · 16 Nov 2024

Gumbel Counterfactual Generation From Language Models
Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Ryan Cotterell · LRM, CML · 11 Nov 2024

Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto · 30 Oct 2024
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
Rishabh Adiga, Besmira Nushi, Varun Chandrasekaran · 29 Oct 2024

Does Differential Privacy Impact Bias in Pretrained NLP Models?
Md. Khairul Islam, Andrew Wang, Tianhao Wang, Yangfeng Ji, Judy Fox, Jieyu Zhao · AI4CE · 24 Oct 2024

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini · KELM, LLMSV · 21 Oct 2024

Seeing Through VisualBERT: A Causal Adventure on Memetic Landscapes
Dibyanayan Bandyopadhyay, Mohammed Hasanuzzaman, Asif Ekbal · AAML · 17 Oct 2024

FairGLVQ: Fairness in Partition-Based Classification
Felix Störck, Fabian Hinder, Johannes Brinkrolf, Benjamin Paassen, Valerie Vaquet, Barbara Hammer · VLM, FaML · 16 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi · LLMSV · 15 Oct 2024

Collapsed Language Models Promote Fairness
Jingxuan Xu, Wuyang Chen, Linyi Li, Yao Zhao, Yunchao Wei · 06 Oct 2024

OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang, Peter Just, Krishna Narayanan, Chao Tian · 06 Oct 2024

Robust LLM safeguarding via refusal feature adversarial training
L. Yu, Virginie Do, Karen Hambardzumyan, Nicola Cancedda · AAML · 30 Sep 2024

Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization
Shahed Masoudian, Markus Frohmann, Navid Rekabsaz, Markus Schedl · 29 Sep 2024
The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification
Andreas Waldis, Joel Birrer, Anne Lauscher, Iryna Gurevych · 26 Sep 2024

NPAT Null-Space Projected Adversarial Training Towards Zero Deterioration
Hanyi Hu, Qiao Han, Kui Chen, Yao Yang · AAML · 18 Sep 2024

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger · GAN · 20 Aug 2024

MABR: Multilayer Adversarial Bias Removal Without Prior Bias Knowledge
Maxwell J. Yin, Boyu Wang, Charles Ling · 10 Aug 2024

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, ..., Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov · CML · 02 Aug 2024
Fairness in Large Language Models in Three Hours
Thang Doan Viet, Zichong Wang, Minh Nhat Nguyen, Wenbin Zhang · 02 Aug 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart · 27 Jul 2024

Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation
Huimin Lu, Masaru Isonuma, Junichiro Mori, Ichiro Sakata · MU · 24 Jul 2024

Balancing the Scales: Reinforcement Learning for Fair Classification
Leon Eshuijs, Shihan Wang, Antske Fokkens · FaML · 15 Jul 2024

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller · CML · 05 Jul 2024

NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers
Salvatore Greco, Ke Zhou, L. Capra, Tania Cerquitelli, Daniele Quercia · 01 Jul 2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda · 17 Jun 2024

Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi · 17 Jun 2024

On the Encoding of Gender in Transformer-based ASR Representations
Aravind Krishnan, Badr M. Abdullah, Dietrich Klakow · 14 Jun 2024

Interpreting the Weight Space of Customized Diffusion Models
Amil Dravid, Yossi Gandelsman, Kuan-Chieh Jackson Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman · 13 Jun 2024

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, ..., Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang · AILaw · 08 Jun 2024
Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
Guangliang Liu, Milad Afshari, Xitong Zhang, Zhiyu Xue, Avrajit Ghosh, Bidhan Bashyal, Rongrong Wang, K. Johnson · 06 Jun 2024

Probing the Category of Verbal Aspect in Transformer Language Models
Anisia Katinskaia, R. Yangarber · 04 Jun 2024

The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher A. Brooks, René F. Kizilcec · AI4Ed · 03 Jun 2024

Large Language Models as Recommender Systems: A Study of Popularity Bias
Jan Malte Lichtenberg, Alexander K. Buchholz, Pola Schwöbel · 03 Jun 2024