Inferring Sensitive Attributes from Model Explanations

21 August 2022

Papers citing "Inferring Sensitive Attributes from Model Explanations"

11 / 11 papers shown

Title | Authors | Tags | | Date
Counterfactual Explanations Can Be Manipulated | Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh | | 62 137 0 | 04 Jun 2021
Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs | Mohammad Malekzadeh, Anastasia Borovykh, Deniz Gündüz | MIACV | 59 42 0 | 25 May 2021
Measuring Data Leakage in Machine-Learning Models with Fisher Information | Awni Y. Hannun, Chuan Guo, Laurens van der Maaten | FedML, MIACV | 49 56 0 | 23 Feb 2021
Robust and Stable Black Box Explanations | Himabindu Lakkaraju, Nino Arsov, Osbert Bastani | AAML, FAtt | 53 84 0 | 12 Nov 2020
Model extraction from counterfactual explanations | Ulrich Aïvodji, Alexandre Bolot, Sébastien Gambs | MIACV, MLAU | 58 51 0 | 03 Sep 2020
"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations | Himabindu Lakkaraju, Osbert Bastani | | 56 254 0 | 15 Nov 2019
Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods | Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju | FAtt, AAML, MLAU | 66 817 0 | 06 Nov 2019
Fooling Neural Network Interpretations via Adversarial Model Manipulation | Juyeon Heo, Sunghwan Joo, Taesup Moon | AAML, FAtt | 88 202 0 | 06 Feb 2019
Exploiting Unintended Feature Leakage in Collaborative Learning | Luca Melis, Congzheng Song, Emiliano De Cristofaro, Vitaly Shmatikov | FedML | 138 1,471 0 | 10 May 2018
Learning Important Features Through Propagating Activation Differences | Avanti Shrikumar, Peyton Greenside, A. Kundaje | FAtt | 180 3,865 0 | 10 Apr 2017
Axiomatic Attribution for Deep Networks | Mukund Sundararajan, Ankur Taly, Qiqi Yan | OOD, FAtt | 175 5,968 0 | 04 Mar 2017