arXiv:2203.07893
Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information
15 March 2022
Shun Shao
Yftah Ziser
Shay B. Cohen
AAML
Papers citing "Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information"
8 / 8 papers shown
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
64
18
0
15 Oct 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
78
7
0
07 Nov 2023
A Joint Matrix Factorization Analysis of Multilingual Representations
Zheng Zhao
Yftah Ziser
Bonnie Webber
Shay B. Cohen
32
2
0
24 Oct 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
41
102
0
06 Jun 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
44
17
0
17 May 2023
Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection
P. Haghighatkhah
Antske Fokkens
Pia Sommerauer
Bettina Speckmann
Kevin Verbeek
32
10
0
08 Dec 2022
Log-linear Guardedness and its Implications
Shauli Ravfogel
Yoav Goldberg
Ryan Cotterell
28
2
0
18 Oct 2022
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
84
57
0
28 Jan 2022