TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift

17 June 2025
Dipesh Tharu Mahato, Rohan Poudel, Pramod Dhungana
AAML
arXiv: 2506.14217

Papers citing "TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift"

14 / 14 papers shown

Sanity Simulations for Saliency Methods
Joon Sik Kim, Gregory Plumb, Ameet Talwalkar
FAtt · 13 May 2021

Transformer Interpretability Beyond Attention Visualization
Hila Chefer, Shir Gur, Lior Wolf
17 Dec 2020

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, ..., Samyak Parajuli, Mike Guo, Basel Alomair, Jacob Steinhardt, Justin Gilmer
OOD · 29 Jun 2020

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks
Francesco Croce, Matthias Hein
AAML · 03 Mar 2020

Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Yue Liu, Duane S. Boning, Cho-Jui Hsieh
AAML · 14 Jun 2019

A Benchmark for Interpretability Methods in Deep Neural Networks
Sara Hooker, D. Erhan, Pieter-Jan Kindermans, Been Kim
FAtt, UQCV · 28 Jun 2018

On the Robustness of Interpretability Methods
David Alvarez-Melis, Tommi Jaakkola
21 Jun 2018

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients
A. Ross, Finale Doshi-Velez
AAML · 26 Nov 2017

Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models
Wojciech Samek, Thomas Wiegand, K. Müller
XAI, VLM · 28 Aug 2017

Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
SILM, OOD · 19 Jun 2017

Interpretable Explanations of Black Boxes by Meaningful Perturbation
Ruth C. Fong, Andrea Vedaldi
FAtt, AAML · 11 Apr 2017

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
OOD, FAtt · 04 Mar 2017

Explaining and Harnessing Adversarial Examples
Ian Goodfellow, Jonathon Shlens, Christian Szegedy
AAML, GAN · 20 Dec 2014

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
FAtt · 20 Dec 2013