Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations
arXiv: 2411.10414
15 November 2024
Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Michael Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti
MLLM, 3DH

Papers citing "Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations" (17 of 17 papers shown)

CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement
Gauri Kholkar, Ratinder Ahuja
SILM
18 May 2025

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He, Peipei Li, Shuhan Xia, Xing Cui, Huaibo Huang, Xi Yang, Ran He
EGVM, AAML
17 May 2025

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Yong-Jin Liu, Shengfang Zhai, Mingzhe Du, Yulin Chen, Tri Cao, ..., Xuzhao Li, Kun Wang, Junfeng Fang, Jiaheng Zhang, Bryan Hooi
OffRL, LRM
16 May 2025

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chong Chen, Kaiyu Huang
LRM
10 May 2025

LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka, Jimmy Dani, Ausmit Mondal, Nitesh Saxena
AAML
08 May 2025

Safety in Large Reasoning Models: A Survey
Cheng Wang, Yong-Jin Liu, Yangqiu Song, Duzhen Zhang, Zechao Li, Junfeng Fang, Bryan Hooi
LRM
24 Apr 2025

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ke Xu, Mingli Zhu, Jiarong Ou, R. J. Chen, Xin Tao, Pengfei Wan, Baoyuan Wu
DiffM, AAML, VGen
23 Apr 2025

The Structural Safety Generalization Problem
Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine
AAML
13 Apr 2025

Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models
Yalan Qin, Xiuying Chen, Rui Pan, Han Zhu, C. Zhang, ..., Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yiran Yang
OffRL
22 Mar 2025

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Yiwei Chen, Yuguang Yao, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu
AAML, MU
14 Mar 2025

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning
Borong Zhang, Yuhao Zhang, Yalan Qin, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang
05 Mar 2025

FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin, Ilia Kopanichuk, Iaroslav Bespalov, Nikita Radchenko, V. Shaposhnikov, Dmitry V. Dylov, Ivan Oseledets
13 Feb 2025

Universal Adversarial Attack on Aligned Multimodal LLMs
Temurbek Rahmatullaev, Polina Druzhinina, Matvey Mikhalchuk, Andrey Kuznetsov, Anton Razzhigaev
AAML
11 Feb 2025

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando, Jie Zhang, Nicholas Carlini, F. Tramèr
AAML, ELM
04 Feb 2025

Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang, Yixin Wu, Rui Wen, Michael Backes, Yang Zhang
03 Feb 2025

GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yuxiao Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi
AI4TS, LRM
30 Jan 2025

Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi
AAML
23 Oct 2024