Feature Attributions and Counterfactual Explanations Can Be Manipulated
Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
FAtt
23 June 2021

Papers citing "Feature Attributions and Counterfactual Explanations Can Be Manipulated"

10 / 10 papers shown

Interpretable Counterfactual Explanations Guided by Prototypes
A. V. Looveren, Janis Klaise
FAtt
03 Jul 2019 · 83 / 387 / 0

Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, M. Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
FAtt, AAML, XAI
08 Oct 2018 · 152 / 1,972 / 0

This Looks Like That: Deep Learning for Interpretable Image Recognition
Chaofan Chen, Oscar Li, Chaofan Tao, A. Barnett, Jonathan Su, Cynthia Rudin
27 Jun 2018 · 260 / 1,187 / 0

On the Robustness of Interpretability Methods
David Alvarez-Melis, Tommi Jaakkola
21 Jun 2018 · 100 / 528 / 0

Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Pai-Shun Ting, Karthikeyan Shanmugam, Payel Das
FAtt
21 Feb 2018 · 129 / 592 / 0

Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
Sandra Wachter, Brent Mittelstadt, Chris Russell
MLAU
01 Nov 2017 · 138 / 2,371 / 0

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
OOD, FAtt
04 Mar 2017 · 193 / 6,027 / 0

"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
FAtt, FaML
16 Feb 2016 · 1.2K / 17,071 / 0

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, D. Madigan
FAtt
05 Nov 2015 · 72 / 745 / 0

Certifying and removing disparate impact
Michael Feldman, Sorelle A. Friedler, John Moeller, C. Scheidegger, Suresh Venkatasubramanian
FaML
11 Dec 2014 · 212 / 1,996 / 0