ResearchTrend.AI
Kernelized Concept Erasure

28 January 2022
Shauli Ravfogel
Francisco Vargas
Yoav Goldberg
Ryan Cotterell

Papers citing "Kernelized Concept Erasure"

31 / 31 papers shown
Fundamental Limits of Perfect Concept Erasure
Somnath Basu Roy Chowdhury
Avinava Dubey
Ahmad Beirami
Rahul Kidambi
Nicholas Monath
Amr Ahmed
Snigdha Chaturvedi
66
0
0
25 Mar 2025
Gumbel Counterfactual Generation From Language Models
Shauli Ravfogel
Anej Svete
Vésteinn Snæbjarnarson
Ryan Cotterell
LRM
CML
33
0
0
11 Nov 2024
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Philip Torr
Francesco Pinto
52
0
0
30 Oct 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
52
18
0
02 Aug 2024
Machine Unlearning Fails to Remove Data Poisoning Attacks
Martin Pawelczyk
Jimmy Z. Di
Yiwei Lu
Gautam Kamath
Ayush Sekhari
Seth Neel
AAML
MU
62
8
0
25 Jun 2024
Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
Dohyun Lee
Daniel Rim
Minseok Choi
Jaegul Choo
PILM
MU
65
4
0
20 Jun 2024
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini
Somnath Basu Roy Chowdhury
Snigdha Chaturvedi
53
6
0
17 Jun 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
115
0
28 Mar 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
47
4
0
14 Mar 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Ryan Cotterell
Ponnurangam Kumaraguru
LLMSV
32
13
0
15 Feb 2024
Explaining Text Classifiers with Counterfactual Representations
Pirmin Lemberger
Antoine Saillenfest
39
0
0
01 Feb 2024
The Ethics of Automating Legal Actors
Josef Valvoda
Alec Thompson
Ryan Cotterell
Simone Teufel
AILaw
ELM
24
1
0
01 Dec 2023
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Somnath Basu Roy Chowdhury
Nicholas Monath
Kumar Avinava Dubey
Amr Ahmed
Snigdha Chaturvedi
32
4
0
30 Nov 2023
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
VLM
30
2
0
13 Nov 2023
Counterfactually Probing Language Identity in Multilingual Models
Anirudh Srinivasan
Venkata S Govindarajan
Kyle Mahowald
26
1
0
29 Oct 2023
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
Junghyun Lee
Hanseul Cho
Se-Young Yun
Chulhee Yun
38
5
0
28 Oct 2023
How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation
Marco Gaido
Dennis Fucci
Matteo Negri
L. Bentivogli
37
2
0
23 Oct 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
34
1
0
18 Oct 2023
In-Context Unlearning: Language Models as Few Shot Unlearners
Martin Pawelczyk
Seth Neel
Himabindu Lakkaraju
MU
28
101
0
11 Oct 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
41
102
0
06 Jun 2023
Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias
Venkata S Govindarajan
Kyle Mahowald
David Beaver
J. Li
17
2
0
25 May 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
38
17
0
17 May 2023
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raff
29
116
0
21 Apr 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
29
4
0
01 Mar 2023
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models
Peter Henderson
E. Mitchell
Christopher D. Manning
Dan Jurafsky
Chelsea Finn
23
47
0
27 Nov 2022
Probing Classifiers are Unreliable for Concept Removal and Detection
Abhinav Kumar
Chenhao Tan
Amit Sharma
AAML
31
21
0
08 Jul 2022
Naturalistic Causal Probing for Morpho-Syntax
Afra Amini
Tiago Pimentel
Clara Meister
Ryan Cotterell
MILM
108
18
0
14 May 2022
Probing for the Usage of Grammatical Number
Karim Lasri
Tiago Pimentel
Alessandro Lenci
Thierry Poibeau
Ryan Cotterell
35
55
0
19 Apr 2022
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
81
57
0
28 Jan 2022
On the Global Optima of Kernelized Adversarial Representation Learning
Bashir Sadeghi
Runyi Yu
Vishnu Naresh Boddeti
AAML
67
31
0
16 Oct 2019
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
281
31,267
0
16 Jan 2013