TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift
Dipesh Tharu Mahato, Rohan Poudel, Pramod Dhungana
arXiv 2506.14217 · 17 June 2025 · AAML
Links: arXiv (abs) · PDF · HTML
Cited By
Papers citing "TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift" (14 of 14 papers shown)
Title | Authors | Tags | Stats | Date
Sanity Simulations for Saliency Methods | Joon Sik Kim, Gregory Plumb, Ameet Talwalkar | FAtt | 73 / 18 / 0 | 13 May 2021
Transformer Interpretability Beyond Attention Visualization | Hila Chefer, Shir Gur, Lior Wolf | - | 137 / 671 / 0 | 17 Dec 2020
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, ..., Samyak Parajuli, Mike Guo, Basel Alomair, Jacob Steinhardt, Justin Gilmer | OOD | 347 / 1,751 / 0 | 29 Jun 2020
Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Francesco Croce, Matthias Hein | AAML | 224 / 1,855 / 0 | 03 Mar 2020
Towards Stable and Efficient Training of Verifiably Robust Neural Networks | Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Yue Liu, Duane S. Boning, Cho-Jui Hsieh | AAML | 80 / 349 / 0 | 14 Jun 2019
A Benchmark for Interpretability Methods in Deep Neural Networks | Sara Hooker, D. Erhan, Pieter-Jan Kindermans, Been Kim | FAtt, UQCV | 114 / 682 / 0 | 28 Jun 2018
On the Robustness of Interpretability Methods | David Alvarez-Melis, Tommi Jaakkola | - | 82 / 528 / 0 | 21 Jun 2018
Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients | A. Ross, Finale Doshi-Velez | AAML | 152 / 683 / 0 | 26 Nov 2017
Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models | Wojciech Samek, Thomas Wiegand, K. Müller | XAI, VLM | 75 / 1,195 / 0 | 28 Aug 2017
Towards Deep Learning Models Resistant to Adversarial Attacks | Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu | SILM, OOD | 310 / 12,117 / 0 | 19 Jun 2017
Interpretable Explanations of Black Boxes by Meaningful Perturbation | Ruth C. Fong, Andrea Vedaldi | FAtt, AAML | 76 / 1,525 / 0 | 11 Apr 2017
Axiomatic Attribution for Deep Networks | Mukund Sundararajan, Ankur Taly, Qiqi Yan | OOD, FAtt | 191 / 6,015 / 0 | 04 Mar 2017
Explaining and Harnessing Adversarial Examples | Ian Goodfellow, Jonathon Shlens, Christian Szegedy | AAML, GAN | 280 / 19,107 / 0 | 20 Dec 2014
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps | Karen Simonyan, Andrea Vedaldi, Andrew Zisserman | FAtt | 312 / 7,308 / 0 | 20 Dec 2013