ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.12091
  4. Cited By
Linear Adversarial Concept Erasure

Linear Adversarial Concept Erasure

28 January 2022
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
    KELM
ArXivPDFHTML

Papers citing "Linear Adversarial Concept Erasure"

20 / 20 papers shown
Title
Controllable Context Sensitivity and the Knob Behind It
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
44
3
0
11 Nov 2024
Collapsed Language Models Promote Fairness
Collapsed Language Models Promote Fairness
Jingxuan Xu
Wuyang Chen
Linyi Li
Yao Zhao
Yunchao Wei
39
0
0
06 Oct 2024
Machine Unlearning Fails to Remove Data Poisoning Attacks
Machine Unlearning Fails to Remove Data Poisoning Attacks
Martin Pawelczyk
Jimmy Z. Di
Yiwei Lu
Gautam Kamath
Ayush Sekhari
Seth Neel
AAML
MU
54
8
0
25 Jun 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
44
111
0
28 Mar 2024
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Somnath Basu Roy Chowdhury
Nicholas Monath
Kumar Avinava Dubey
Amr Ahmed
Snigdha Chaturvedi
19
4
0
30 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
70
7
0
07 Nov 2023
Sparse Autoencoders Find Highly Interpretable Features in Language
  Models
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Hoagy Cunningham
Aidan Ewart
Logan Riggs
R. Huben
Lee Sharkey
MILM
23
332
0
15 Sep 2023
A Survey on Fairness in Large Language Models
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
41
59
0
20 Aug 2023
A Geometric Notion of Causal Probing
A Geometric Notion of Causal Probing
Clément Guerner
Anej Svete
Tianyu Liu
Alex Warstadt
Ryan Cotterell
LLMSV
36
12
0
27 Jul 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A
  Two-Stage Approach to Mitigate Social Biases
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li
Mengnan Du
Xin Wang
Ying Wang
45
26
0
04 Jul 2023
LEACE: Perfect linear concept erasure in closed form
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
41
102
0
06 Jun 2023
DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue
DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue
William B. Held
Christopher Hidey
Fei Liu
Eric Zhu
Rahul Goel
Diyi Yang
Rushin Shah
21
0
0
15 Dec 2022
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of
  Foundation Models
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models
Peter Henderson
E. Mitchell
Christopher D. Manning
Dan Jurafsky
Chelsea Finn
16
47
0
27 Nov 2022
Log-linear Guardedness and its Implications
Log-linear Guardedness and its Implications
Shauli Ravfogel
Yoav Goldberg
Ryan Cotterell
28
2
0
18 Oct 2022
Visual Comparison of Language Model Adaptation
Visual Comparison of Language Model Adaptation
R. Sevastjanova
E. Cakmak
Shauli Ravfogel
Ryan Cotterell
Mennatallah El-Assady
VLM
33
16
0
17 Aug 2022
Analyzing Gender Representation in Multilingual Models
Analyzing Gender Representation in Multilingual Models
Hila Gonen
Shauli Ravfogel
Yoav Goldberg
15
11
0
20 Apr 2022
Debiasing Pre-trained Contextualised Embeddings
Debiasing Pre-trained Contextualised Embeddings
Masahiro Kaneko
Danushka Bollegala
210
138
0
23 Jan 2021
On the Global Optima of Kernelized Adversarial Representation Learning
On the Global Optima of Kernelized Adversarial Representation Learning
Bashir Sadeghi
Runyi Yu
Vishnu Naresh Boddeti
AAML
59
31
0
16 Oct 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
264
10,348
0
12 Dec 2018
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment
  Classification
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
Xilun Chen
Yu Sun
Ben Athiwaratkun
Claire Cardie
Kilian Q. Weinberger
217
315
0
06 Jun 2016
1