Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
31 October 2023 · arXiv: 2311.00172
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
AAML

Papers citing "Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield" (6 of 6 papers shown)

JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Yash Datta, Sharath Rajasekar
09 Jun 2025

Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigne-Lefebvre, Mikolaj Kniejski, Filip Sondej, Matthieu David, J. Hoelscher-Obermaier, Christian Schroeder de Witt, Esben Kran
26 Feb 2025

Prompt Inject Detection with Generative Explanation as an Investigative Tool
Jonathan Pan, Swee Liang Wong, Yidi Yuan, Xin Wei Chia
SILM
16 Feb 2025

CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
Zhihao Liu, Chenhui Hu
ALM, ELM
29 Oct 2024

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML
05 Sep 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024