Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.04610
Cited By
Red-Teaming the Stable Diffusion Safety Filter
3 October 2022
Javier Rando
Daniel Paleka
David Lindner
Lennard Heim
Florian Tramèr
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Red-Teaming the Stable Diffusion Safety Filter"
35 / 35 papers shown
Title
Towards SFW sampling for diffusion models via external conditioning
Camilo Carvajal Reyes
J. Fontbona
Felipe A. Tobar
DiffM
34
0
0
12 May 2025
Erased but Not Forgotten: How Backdoors Compromise Concept Erasure
Jonas Henry Grebe
Tobias Braun
Marcus Rohrbach
Anna Rohrbach
AAML
85
0
0
29 Apr 2025
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand
Peiran Yu
Shangqian Gao
Gowthami Somepalli
Tom Goldstein
Heng-Chiao Huang
113
1
0
13 Mar 2025
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models
Zhihua Tian
Sirun Nan
Ming Xu
Shengfang Zhai
Wenjie Qu
Jian Liu
Kui Ren
Ruoxi Jia
Jiaheng Zhang
DiffM
96
1
0
12 Mar 2025
Beyond Release: Access Considerations for Generative AI Systems
Irene Solaiman
Rishi Bommasani
Dan Hendrycks
Ariel Herbert-Voss
Yacine Jernite
Aviya Skowron
Andrew Trask
62
1
0
23 Feb 2025
Robust Concept Erasure Using Task Vectors
Minh Pham
Kelly O. Marshall
Chinmay Hegde
Niv Cohen
123
17
0
21 Feb 2025
DiffGuard: Text-Based Safety Checker for Diffusion Models
Massine El Khader
Elias Al Bouzidi
Abdellah Oumida
Mohammed Sbaihi
Eliott Binard
Jean-Philippe Poli
Wassila Ouerdane
Boussad Addad
Katarzyna Kapusta
DiffM
114
0
0
20 Feb 2025
The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
Peter Hall
Olivia Mundahl
Sunoo Park
76
0
0
30 Jan 2025
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiñski
Kamil Deja
DiffM
63
6
0
29 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
76
13
0
17 Jan 2025
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang
Shuming Hu
Shiyu Zhao
Xiaowen Lin
F. Xu
...
Nan Jiang
Lingjuan Lyu
Shiqing Ma
Dimitris N. Metaxas
Ankit Jain
164
2
0
31 Dec 2024
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models
Zhi-Yi Chin
Kuan-Chen Mu
Mario Fritz
Pin-Yu Chen
DiffM
87
0
0
25 Nov 2024
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction
J. Vice
Naveed Akhtar
Richard I. Hartley
Ajmal Saeed Mian
Ajmal Mian
DiffM
89
0
0
21 Nov 2024
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon
M. Lee
Sangdon Park
Dongwoo Kim
44
1
0
08 Oct 2024
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
Tianqi Chen
Shujian Zhang
Mingyuan Zhou
DiffM
83
3
0
17 Sep 2024
On Synthetic Texture Datasets: Challenges, Creation, and Curation
Blaine Hoak
Patrick McDaniel
EGVM
71
0
0
16 Sep 2024
Perception-guided Jailbreak against Text-to-Image Models
Yihao Huang
Le Liang
Tianlin Li
Xiaojun Jia
Run Wang
Weikai Miao
G. Pu
Yang Liu
41
7
0
20 Aug 2024
DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization
Pucheng Dang
Xing Hu
Dong Li
Rui Zhang
Qi Guo
Kaidi Xu
DiffM
36
5
0
18 Aug 2024
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
Xuanyu Su
Yansong Li
Diana Inkpen
Nathalie Japkowicz
VLM
81
2
0
11 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
56
16
0
06 Aug 2024
Jailbreaking Text-to-Image Models with LLM-Based Agents
Yingkai Dong
Zheng Li
Xiangtao Meng
Ning Yu
Shanqing Guo
LLMAG
45
13
0
01 Aug 2024
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
Xiao Liu
Xiaoliu Guan
Yu Wu
Jiaxu Miao
42
7
0
22 Jul 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
Yibo Miao
Yifan Zhu
Yinpeng Dong
Lijia Yu
Jun Zhu
Xiao-Shan Gao
EGVM
43
12
0
08 Jul 2024
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Jie Ren
Kangrui Chen
Yingqian Cui
Shenglai Zeng
Hui Liu
Yue Xing
Jiliang Tang
Lingjuan Lyu
53
1
0
21 Jun 2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye
Alicia Parrish
Oana Inel
Charvi Rastogi
Hannah Rose Kirk
...
Nathan Clement
Rafael Mosquera
Juan Ciro
Vijay Janapa Reddi
Lora Aroyo
31
7
0
14 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
34
78
0
25 Jan 2024
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
Seunghoo Hong
Juhun Lee
Simon S. Woo
19
18
0
20 Dec 2023
IMMA: Immunizing text-to-image Models against Malicious Adaptation
Yijia Zheng
Raymond A. Yeh
47
8
0
30 Nov 2023
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Yixin Wu
Ning Yu
Michael Backes
Yun Shen
Yang Zhang
DiffM
53
8
0
25 Oct 2023
Understanding and Mitigating Copying in Diffusion Models
Gowthami Somepalli
Vasu Singla
Micah Goldblum
Jonas Geiping
Tom Goldstein
DiffM
20
125
0
31 May 2023
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Y. Qu
Xinyue Shen
Xinlei He
Michael Backes
Savvas Zannettou
Yang Zhang
21
106
0
23 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
16
7
0
18 May 2023
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
31
182
0
20 Feb 2023
The Gradient of Generative AI Release: Methods and Considerations
Irene Solaiman
33
98
0
05 Feb 2023
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models
Peter Henderson
E. Mitchell
Christopher D. Manning
Dan Jurafsky
Chelsea Finn
23
47
0
27 Nov 2022
1