Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.03605
Cited By
Robust Feature-Level Adversaries are Interpretability Tools
7 October 2021
Stephen Casper
Max Nadeau
Dylan Hadfield-Menell
Gabriel Kreiman
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Robust Feature-Level Adversaries are Interpretability Tools"
24 / 24 papers shown
Title
A Survey of Adversarial Defenses in Vision-based Systems: Categorization, Methods and Challenges
Nandish Chattopadhyay
Abdul Basit
B. Ouni
Muhammad Shafique
AAML
31
0
0
01 Mar 2025
MERIT: Multi-view evidential learning for reliable and interpretable liver fibrosis staging
Yuanye Liu
Zheyao Gao
Nannan Shi
Fuping Wu
Yuxin Shi
Qingchao Chen
Xiahai Zhuang
27
2
0
05 May 2024
The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability
Stephen Casper
Jieun Yun
Joonhyuk Baek
Yeseong Jung
Minhwan Kim
...
A. Nicolson
Arush Tagade
Jessica Rumbelow
Hieu Minh Nguyen
Dylan Hadfield-Menell
19
2
0
03 Apr 2024
How to Train your Antivirus: RL-based Hardening through the Problem-Space
Jacopo Cortellazzi
Ilias Tsingenopoulos
B. Bosanský
Simone Aonzo
Davy Preuveneers
Wouter Joosen
Fabio Pierazzi
Lorenzo Cavallaro
21
2
0
29 Feb 2024
Exploring higher-order neural network node interactions with total correlation
Thomas Kerby
Teresa White
Kevin Moon
22
0
0
06 Feb 2024
Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics
Shuai Li
Xiaoyu Jiang
Xiaoguang Ma
AAML
21
0
0
05 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
34
78
0
25 Jan 2024
DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations
Maximilian Augustin
Yannic Neuhaus
Matthias Hein
DiffM
37
4
0
29 Nov 2023
Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights
Ryoya Nara
Yusuke Matsui
AAML
29
0
0
27 Nov 2023
Corrupting Neuron Explanations of Deep Visual Features
Divyansh Srivastava
Tuomas P. Oikarinen
Tsui-Wei Weng
FAtt
AAML
17
2
0
25 Oct 2023
Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE
Marius Arvinte
Cory Cornelius
Jason Martin
N. Himayat
DiffM
46
3
0
10 Oct 2023
Physical Adversarial Attacks For Camera-based Smart Systems: Current Trends, Categorization, Applications, Research Challenges, and Future Outlook
Amira Guesmi
Muhammad Abdullah Hanif
B. Ouni
Muhammed Shafique
AAML
23
21
0
11 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
47
472
0
27 Jul 2023
Red Teaming Deep Neural Networks with Feature Synthesis Tools
Stephen Casper
Yuxiao Li
Jiawei Li
Tong Bu
Ke Zhang
K. Hariharan
Dylan Hadfield-Menell
AAML
32
15
0
08 Feb 2023
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
Stephen Casper
K. Hariharan
Dylan Hadfield-Menell
AAML
18
11
0
18 Nov 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey
Hui Wei
Hao Tang
Xuemei Jia
Zhixiang Wang
Han-Bing Yu
Zhubo Li
Shiníchi Satoh
Luc Van Gool
Zheng Wang
AAML
29
43
0
30 Sep 2022
A Survey on Physical Adversarial Attack in Computer Vision
Donghua Wang
Wen Yao
Tingsong Jiang
Guijian Tang
Xiaoqian Chen
AAML
56
38
0
28 Sep 2022
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning
Olivia Wiles
Isabela Albuquerque
Sven Gowal
VLM
35
47
0
18 Aug 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Raukur
A. Ho
Stephen Casper
Dylan Hadfield-Menell
AAML
AI4CE
23
124
0
27 Jul 2022
Adversarial Training for High-Stakes Reliability
Daniel M. Ziegler
Seraphina Nix
Lawrence Chan
Tim Bauman
Peter Schmidt-Nielsen
...
Noa Nabeshima
Benjamin Weinstein-Raun
D. Haas
Buck Shlegeris
Nate Thomas
AAML
30
59
0
03 May 2022
Adversarial Neon Beam: A Light-based Physical Attack to DNNs
Chen-Hao Hu
Weiwen Shi
Wen Li
AAML
35
8
0
02 Apr 2022
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
201
117
0
26 Jan 2022
Constructing Unrestricted Adversarial Examples with Generative Models
Yang Song
Rui Shu
Nate Kushman
Stefano Ermon
GAN
AAML
185
302
0
21 May 2018
Adversarial examples in the physical world
Alexey Kurakin
Ian Goodfellow
Samy Bengio
SILM
AAML
287
5,837
0
08 Jul 2016
1