ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.


Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

18 August 2017
Yinpeng Dong
Hang Su
Jun Zhu
Fan Bao
    AAML

Papers citing "Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples"

50 / 70 papers shown
IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution
Yue Zhuo
Zhiqiang Ge
26
7
0
16 Jun 2024
Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling
Chunlin Qiu
Yiheng Duan
Lingchen Zhao
Qian Wang
AAML
40
2
0
25 May 2024
Robust Explainable Recommendation
Sairamvinay Vijayaraghavan
Prasant Mohapatra
AAML
38
0
0
03 May 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
36
78
0
25 Jan 2024
Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights
Ryoya Nara
Yusuke Matsui
AAML
29
0
0
27 Nov 2023
Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities
Genevera I. Allen
Luqin Gan
Lili Zheng
38
9
0
02 Aug 2023
Feature Chirality in Deep Learning Models
Shipeng Ji
Yang Li
Ruizhi Fu
Jiabao Wang
Zhuang Miao
SSL
19
0
0
06 May 2023
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen
Yichi Zhang
Yinpeng Dong
Xiao Yang
Hang Su
Junyi Zhu
AAML
28
56
0
16 Mar 2023
It is not "accuracy vs. explainability" -- we need both for trustworthy AI systems
D. Petkovic
30
22
0
16 Dec 2022
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
Stephen Casper
K. Hariharan
Dylan Hadfield-Menell
AAML
26
11
0
18 Nov 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Räuker
A. Ho
Stephen Casper
Dylan Hadfield-Menell
AAML
AI4CE
23
124
0
27 Jul 2022
Multi-concept adversarial attacks
Vibha Belavadi
Yan Zhou
Murat Kantarcioglu
B. Thuraisingham
AAML
33
0
0
19 Oct 2021
DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks
Yixiang Wang
Jiqiang Liu
Xiaolin Chang
Jianhua Wang
Ricardo J. Rodríguez
AAML
27
28
0
14 Oct 2021
Robust Feature-Level Adversaries are Interpretability Tools
Stephen Casper
Max Nadeau
Dylan Hadfield-Menell
Gabriel Kreiman
AAML
48
27
0
07 Oct 2021
Prediction of Hereditary Cancers Using Neural Networks
Zoe Guan
Giovanni Parmigiani
D. Braun
L. Trippa
MedIm
12
0
0
25 Jun 2021
Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks
Yutian Pang
Sheng Cheng
Jueming Hu
Yongming Liu
AAML
20
12
0
17 Jun 2021
Pay attention to your loss: understanding misconceptions about 1-Lipschitz neural networks
Louis Bethune
Thibaut Boissin
M. Serrurier
Franck Mamalet
Corentin Friedrich
Alberto González Sanz
38
21
0
11 Apr 2021
Zero-shot Adversarial Quantization
Yuang Liu
Wei Zhang
Jun Wang
MQ
19
78
0
29 Mar 2021
Noise Modulation: Let Your Model Interpret Itself
Haoyang Li
Xinggang Wang
FAtt
AAML
14
0
0
19 Mar 2021
EX-RAY: Distinguishing Injected Backdoor from Natural Features in Neural Networks by Examining Differential Feature Symmetry
Yingqi Liu
Guangyu Shen
Guanhong Tao
Zhenting Wang
Shiqing Ma
Xinming Zhang
AAML
30
8
0
16 Mar 2021
A Unified Game-Theoretic Interpretation of Adversarial Robustness
Jie Ren
Die Zhang
Yisen Wang
Lu Chen
Zhanpeng Zhou
...
Xu Cheng
Xin Wang
Meng Zhou
Jie Shi
Quanshi Zhang
AAML
72
22
0
12 Mar 2021
Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks
Ginevra Carbone
G. Sanguinetti
Luca Bortolussi
FAtt
AAML
21
4
0
22 Feb 2021
Theory-guided hard constraint projection (HCP): a knowledge-based data-driven scientific machine learning method
Yuntian Chen
Dou Huang
Dongxiao Zhang
Junsheng Zeng
Nanzhe Wang
Haoran Zhang
Jinyue Yan
PINN
42
107
0
11 Dec 2020
Learning Black-Box Attackers with Transferable Priors and Query Feedback
Jiancheng Yang
Yangzhou Jiang
Xiaoyang Huang
Bingbing Ni
Chenglong Zhao
AAML
18
81
0
21 Oct 2020
Explaining Neural Matrix Factorization with Gradient Rollback
Carolin (Haas) Lawrence
T. Sztyler
Mathias Niepert
19
12
0
12 Oct 2020
The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples
Timo Freiesleben
GAN
41
62
0
11 Sep 2020
DNN2LR: Interpretation-inspired Feature Crossing for Real-world Tabular Data
Zhaocheng Liu
Qiang Liu
Haoli Zhang
Yuntian Chen
16
12
0
22 Aug 2020
Deep Active Learning by Model Interpretability
Qiang Liu
Zhaocheng Liu
Xiaofang Zhu
Yeliang Xiu
24
4
0
23 Jul 2020
Sequential Interpretability: Methods, Applications, and Future Direction for Understanding Deep Learning Models in the Context of Sequential Data
B. Shickel
Parisa Rashidi
AI4TS
30
17
0
27 Apr 2020
Adversarial Attacks and Defenses: An Interpretation Perspective
Ninghao Liu
Mengnan Du
Ruocheng Guo
Huan Liu
Xia Hu
AAML
26
8
0
23 Apr 2020
Adversarial Robustness on In- and Out-Distribution Improves Explainability
Maximilian Augustin
Alexander Meinke
Matthias Hein
OOD
75
99
0
20 Mar 2020
Heat and Blur: An Effective and Fast Defense Against Adversarial Examples
Haya Brama
Tal Grinshpoun
AAML
19
6
0
17 Mar 2020
Adversarial Ranking Attack and Defense
Mo Zhou
Zhenxing Niu
Le Wang
Qilin Zhang
G. Hua
36
38
0
26 Feb 2020
Category-wise Attack: Transferable Adversarial Examples for Anchor Free Object Detection
Quanyu Liao
Xin Wang
Bin Kong
Siwei Lyu
Youbing Yin
Qi Song
Xi Wu
AAML
20
8
0
10 Feb 2020
Explaining with Counter Visual Attributes and Examples
Sadaf Gulshad
A. Smeulders
XAI
FAtt
AAML
30
15
0
27 Jan 2020
A Framework for Explainable Text Classification in Legal Document Review
Christian J. Mahoney
Jianping Zhang
Nathaniel Huber-Fliflet
Peter Gronvall
Haozhen Zhao
AILaw
19
32
0
19 Dec 2019
An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness
Adam Noack
Isaac Ahern
Dejing Dou
Boyang Albert Li
OOD
AAML
18
10
0
07 Dec 2019
FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things
Xiaying Wang
Michele Magno
Lukas Cavigelli
Luca Benini
19
116
0
08 Nov 2019
Understanding Misclassifications by Attributes
Sadaf Gulshad
Zeynep Akata
J. H. Metzen
A. Smeulders
AAML
38
0
0
15 Oct 2019
Adversarial Learning with Margin-based Triplet Embedding Regularization
Yaoyao Zhong
Weihong Deng
AAML
28
50
0
20 Sep 2019
Interpreting and Improving Adversarial Robustness of Deep Neural Networks with Neuron Sensitivity
Chongzhi Zhang
Aishan Liu
Xianglong Liu
Yitao Xu
Hang Yu
Yuqing Ma
Tianlin Li
AAML
27
19
0
16 Sep 2019
I-MAD: Interpretable Malware Detector Using Galaxy Transformer
Miles Q. Li
Benjamin C. M. Fung
P. Charland
Steven H. H. Ding
38
31
0
15 Sep 2019
FDA: Feature Disruptive Attack
Aditya Ganeshan
Vivek B. S.
R. Venkatesh Babu
AAML
31
100
0
10 Sep 2019
Interpretable Few-Shot Learning via Linear Distillation
Arip Asadulaev
Igor Kuznetsov
Andrey Filchenkov
FedML
FAtt
11
1
0
13 Jun 2019
Interpreting Adversarially Trained Convolutional Neural Networks
Tianyuan Zhang
Zhanxing Zhu
AAML
GAN
FAtt
28
158
0
23 May 2019
Testing DNN Image Classifiers for Confusion & Bias Errors
Yuchi Tian
Ziyuan Zhong
Vicente Ordonez
Gail E. Kaiser
Baishakhi Ray
24
52
0
20 May 2019
Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications
Pouya Pezeshkpour
Yifan Tian
Sameer Singh
KELM
AAML
4
73
0
02 May 2019
Interpreting Adversarial Examples with Attributes
Sadaf Gulshad
J. H. Metzen
A. Smeulders
Zeynep Akata
FAtt
AAML
33
6
0
17 Apr 2019
Interpreting Adversarial Examples by Activation Promotion and Suppression
Kaidi Xu
Sijia Liu
Gaoyuan Zhang
Mengshu Sun
Pu Zhao
Quanfu Fan
Chuang Gan
X. Lin
AAML
FAtt
24
43
0
03 Apr 2019
Data-Free Learning of Student Networks
Hanting Chen
Yunhe Wang
Chang Xu
Zhaohui Yang
Chuanjian Liu
Boxin Shi
Chunjing Xu
Chao Xu
Qi Tian
FedML
11
365
0
02 Apr 2019