TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift
Dipesh Tharu Mahato, Rohan Poudel, Pramod Dhungana
arXiv 2506.14217 · 17 June 2025 · AAML
Links: arXiv (abs) · PDF · HTML
Cited By
Papers citing "TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift" (14 of 14 papers shown)
Title | Authors | Tags | Stats | Date
Sanity Simulations for Saliency Methods | Joon Sik Kim, Gregory Plumb, Ameet Talwalkar | FAtt | 73 / 18 / 0 | 13 May 2021
Transformer Interpretability Beyond Attention Visualization | Hila Chefer, Shir Gur, Lior Wolf | - | 137 / 671 / 0 | 17 Dec 2020
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, ..., Samyak Parajuli, Mike Guo, Basel Alomair, Jacob Steinhardt, Justin Gilmer | OOD | 347 / 1,751 / 0 | 29 Jun 2020
Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks | Francesco Croce, Matthias Hein | AAML | 224 / 1,855 / 0 | 03 Mar 2020
Towards Stable and Efficient Training of Verifiably Robust Neural Networks | Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Yue Liu, Duane S. Boning, Cho-Jui Hsieh | AAML | 80 / 349 / 0 | 14 Jun 2019
A Benchmark for Interpretability Methods in Deep Neural Networks | Sara Hooker, D. Erhan, Pieter-Jan Kindermans, Been Kim | FAtt, UQCV | 114 / 682 / 0 | 28 Jun 2018
On the Robustness of Interpretability Methods | David Alvarez-Melis, Tommi Jaakkola | - | 82 / 528 / 0 | 21 Jun 2018
Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients | A. Ross, Finale Doshi-Velez | AAML | 152 / 683 / 0 | 26 Nov 2017
Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models | Wojciech Samek, Thomas Wiegand, K. Müller | XAI, VLM | 75 / 1,195 / 0 | 28 Aug 2017
Towards Deep Learning Models Resistant to Adversarial Attacks | Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu | SILM, OOD | 310 / 12,117 / 0 | 19 Jun 2017
Interpretable Explanations of Black Boxes by Meaningful Perturbation | Ruth C. Fong, Andrea Vedaldi | FAtt, AAML | 76 / 1,525 / 0 | 11 Apr 2017
Axiomatic Attribution for Deep Networks | Mukund Sundararajan, Ankur Taly, Qiqi Yan | OOD, FAtt | 191 / 6,015 / 0 | 04 Mar 2017
Explaining and Harnessing Adversarial Examples | Ian Goodfellow, Jonathon Shlens, Christian Szegedy | AAML, GAN | 280 / 19,107 / 0 | 20 Dec 2014
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps | Karen Simonyan, Andrea Vedaldi, Andrew Zisserman | FAtt | 312 / 7,308 / 0 | 20 Dec 2013