ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness

Suraj Srinivas, Sebastian Bordt, Hima Lakkaraju
arXiv:2305.19101 (v2, latest) · 30 May 2023 · AAML
Links: arXiv abs · PDF · HTML · GitHub (2★)

Papers citing "Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness" (17 papers shown)

  1. Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
     Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki · AAML · 20 May 2025 · cited by 0
  2. Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance
     Bahjat Kawar, Roy Ganz, Michael Elad · DiffM · 18 Aug 2022 · cited by 39
  3. Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations
     Tessa Han, Suraj Srinivas, Himabindu Lakkaraju · FAtt · 02 Jun 2022 · cited by 88
  4. Elucidating the Design Space of Diffusion-Based Generative Models
     Tero Karras, M. Aittala, Timo Aila, S. Laine · DiffM · 01 Jun 2022 · cited by 2,033
  5. Towards Understanding the Generative Capability of Adversarially Robust Classifiers
     Yao Zhu, Jiacheng Ma, Jiacheng Sun, Zewei Chen, Rongxin Jiang, Zhenguo Li · AAML · 20 Aug 2021 · cited by 24
  6. Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations
     N. Jethani, Mukund Sudarshan, Yindalon Aphinyanagphongs, Rajesh Ranganath · FAtt · 02 Mar 2021 · cited by 71
  7. RobustBench: a standardized adversarial robustness benchmark
     Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, M. Chiang, Prateek Mittal, Matthias Hein · VLM · 19 Oct 2020 · cited by 704
  8. Fairwashing Explanations with Off-Manifold Detergent
     Christopher J. Anders, Plamen Pasliev, Ann-Kathrin Dombrowski, K. Müller, Pan Kessel · FAtt, FaML · 20 Jul 2020 · cited by 97
  9. Do Adversarially Robust ImageNet Models Transfer Better?
     Hadi Salman, Andrew Ilyas, Logan Engstrom, Ashish Kapoor, Aleksander Madry · 16 Jul 2020 · cited by 426
  10. Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
     Will Grathwohl, Kuan-Chieh Wang, J. Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky · VLM · 06 Dec 2019 · cited by 546
  11. Certified Adversarial Robustness via Randomized Smoothing
     Jeremy M. Cohen, Elan Rosenfeld, J. Zico Kolter · AAML · 08 Feb 2019 · cited by 2,052
  12. Sanity Checks for Saliency Maps
     Julius Adebayo, Justin Gilmer, M. Muelly, Ian Goodfellow, Moritz Hardt, Been Kim · FAtt, AAML, XAI · 08 Oct 2018 · cited by 1,970
  13. Robustness May Be at Odds with Accuracy
     Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry · AAML · 30 May 2018 · cited by 1,784
  14. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
     Richard Y. Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang · EGVM · 11 Jan 2018 · cited by 11,920
  15. Towards Deep Learning Models Resistant to Adversarial Attacks
     Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu · SILM, OOD · 19 Jun 2017 · cited by 12,138
  16. Axiomatic Attribution for Deep Networks
     Mukund Sundararajan, Ankur Taly, Qiqi Yan · OOD, FAtt · 04 Mar 2017 · cited by 6,024
  17. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
     Karen Simonyan, Andrea Vedaldi, Andrew Zisserman · FAtt · 20 Dec 2013 · cited by 7,321