Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.04153
Cited By
Probing Classifiers are Unreliable for Concept Removal and Detection
8 July 2022
Abhinav Kumar
Chenhao Tan
Amit Sharma
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Probing Classifiers are Unreliable for Concept Removal and Detection"
12 / 12 papers shown
Title
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
102
0
0
24 Feb 2025
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
40
10
0
27 Jul 2024
A Geometric Notion of Causal Probing
Clément Guerner
Anej Svete
Tianyu Liu
Alex Warstadt
Ryan Cotterell
LLMSV
41
12
0
27 Jul 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
41
102
0
06 Jun 2023
Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation
Minwoo Lee
Hyukhun Koh
Kang-il Lee
Dongdong Zhang
Minsu Kim
Kyomin Jung
35
9
0
23 May 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
38
17
0
17 May 2023
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
40
3
0
19 Oct 2022
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
81
57
0
28 Jan 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
405
0
24 Feb 2021
An Investigation of Why Overparameterization Exacerbates Spurious Correlations
Shiori Sagawa
Aditi Raghunathan
Pang Wei Koh
Percy Liang
152
371
0
09 May 2020
A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi
Fred Morstatter
N. Saxena
Kristina Lerman
Aram Galstyan
SyDa
FaML
326
4,223
0
23 Aug 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
201
882
0
03 May 2018
1