Image Hijacks: Adversarial Images can Control Generative Models at Runtime

1 September 2023
Luke Bailey, Euan Ong, Stuart J. Russell, Scott Emmons
VLM, MLLM
arXiv:2309.00236 (abs · PDF · HTML)
Papers citing "Image Hijacks: Adversarial Images can Control Generative Models at Runtime"

21 / 71 papers shown

Query-Based Adversarial Prompt Generation
Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr
AAML, SILM
19 Feb 2024

A StrongREJECT for Empty Jailbreaks
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, ..., Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
15 Feb 2024

Test-Time Backdoor Attacks on Multimodal Large Language Models
Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
AAML
13 Feb 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
LLMAG, LM&Ro
13 Feb 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, ..., Nathaniel Li, Steven Basart, Bo Li, David A. Forsyth, Dan Hendrycks
AAML
06 Feb 2024

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang
AAML
05 Feb 2024

Jailbreaking Attack against Multimodal Large Language Model
Zhenxing Niu, Haoxuan Ji, Xinbo Gao, Gang Hua, Rong Jin
04 Feb 2024

Safety of Multimodal Large Language Models on Images and Texts
Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao
01 Feb 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
AAML
29 Jan 2024

Visibility into AI Agents
Alan Chan, Carson Ezell, Max Kaufmann, K. Wei, Lewis Hammond, ..., Nitarshan Rajkumar, David M. Krueger, Noam Kolt, Lennart Heim, Markus Anderljung
23 Jan 2024

Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Yue Liu, Min Lin
MLLM
22 Jan 2024

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao
LLMAG
22 Jan 2024

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang
MLLM
04 Dec 2023

Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts
Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun
15 Nov 2023

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
MLLM
09 Nov 2023

Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner
ALM
06 Nov 2023

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
AAML
16 Oct 2023

Can Language Models be Instructed to Protect Personal Information?
Yang Chen, Ethan Mendes, Sauvik Das, Wei Xu, Alan Ritter
PILM
03 Oct 2023

How Robust is Google's Bard to Adversarial Image Attacks?
Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiaohu Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu
AAML
21 Sep 2023

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Rodrigo Pedro, Daniel Castro, Paulo Carreira, Nuno Santos
SILM, AAML
03 Aug 2023

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani, Yue Dong, Nael B. Abu-Ghazaleh
26 Jul 2023