Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2201.12191
Cited By
Kernelized Concept Erasure
28 January 2022
Shauli Ravfogel
Francisco Vargas
Yoav Goldberg
Ryan Cotterell
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Kernelized Concept Erasure"
31 / 31 papers shown
Title
Fundamental Limits of Perfect Concept Erasure
Somnath Basu Roy Chowdhury
Avinava Dubey
Ahmad Beirami
Rahul Kidambi
Nicholas Monath
Amr Ahmed
Snigdha Chaturvedi
61
0
0
25 Mar 2025
Gumbel Counterfactual Generation From Language Models
Shauli Ravfogel
Anej Svete
Vésteinn Snæbjarnarson
Ryan Cotterell
LRM
CML
33
1
0
11 Nov 2024
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Philip H. S. Torr
Francesco Pinto
47
0
0
30 Oct 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
52
18
0
02 Aug 2024
Machine Unlearning Fails to Remove Data Poisoning Attacks
Martin Pawelczyk
Jimmy Z. Di
Yiwei Lu
Gautam Kamath
Ayush Sekhari
Seth Neel
AAML
MU
60
8
0
25 Jun 2024
Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
Dohyun Lee
Daniel Rim
Minseok Choi
Jaegul Choo
PILM
MU
62
4
0
20 Jun 2024
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini
Somnath Basu Roy Chowdhury
Snigdha Chaturvedi
51
6
0
17 Jun 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
115
0
28 Mar 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
45
4
0
14 Mar 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Ryan Cotterell
Ponnurangam Kumaraguru
LLMSV
27
13
0
15 Feb 2024
Explaining Text Classifiers with Counterfactual Representations
Pirmin Lemberger
Antoine Saillenfest
39
0
0
01 Feb 2024
The Ethics of Automating Legal Actors
Josef Valvoda
Alec Thompson
Ryan Cotterell
Simone Teufel
AILaw
ELM
24
1
0
01 Dec 2023
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Somnath Basu Roy Chowdhury
Nicholas Monath
Kumar Avinava Dubey
Amr Ahmed
Snigdha Chaturvedi
32
4
0
30 Nov 2023
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
VLM
30
2
0
13 Nov 2023
Counterfactually Probing Language Identity in Multilingual Models
Anirudh Srinivasan
Venkata S Govindarajan
Kyle Mahowald
23
1
0
29 Oct 2023
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
Junghyun Lee
Hanseul Cho
Se-Young Yun
Chulhee Yun
35
5
0
28 Oct 2023
How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation
Marco Gaido
Dennis Fucci
Matteo Negri
L. Bentivogli
37
2
0
23 Oct 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
34
1
0
18 Oct 2023
In-Context Unlearning: Language Models as Few Shot Unlearners
Martin Pawelczyk
Seth Neel
Himabindu Lakkaraju
MU
28
100
0
11 Oct 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
41
102
0
06 Jun 2023
Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias
Venkata S Govindarajan
Kyle Mahowald
David Beaver
J. Li
15
2
0
25 May 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
30
17
0
17 May 2023
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
29
116
0
21 Apr 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
29
4
0
01 Mar 2023
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models
Peter Henderson
E. Mitchell
Christopher D. Manning
Dan Jurafsky
Chelsea Finn
23
47
0
27 Nov 2022
Probing Classifiers are Unreliable for Concept Removal and Detection
Abhinav Kumar
Chenhao Tan
Amit Sharma
AAML
31
20
0
08 Jul 2022
Naturalistic Causal Probing for Morpho-Syntax
Afra Amini
Tiago Pimentel
Clara Meister
Ryan Cotterell
MILM
106
18
0
14 May 2022
Probing for the Usage of Grammatical Number
Karim Lasri
Tiago Pimentel
Alessandro Lenci
Thierry Poibeau
Ryan Cotterell
35
55
0
19 Apr 2022
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
81
57
0
28 Jan 2022
On the Global Optima of Kernelized Adversarial Representation Learning
Bashir Sadeghi
Runyi Yu
Vishnu Naresh Boddeti
AAML
67
31
0
16 Oct 2019
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
275
31,267
0
16 Jan 2013
1