Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield

31 October 2023
Jinhwa Kim
Ali Derakhshan
Ian G. Harris
    AAML
arXiv: 2311.00172

Papers citing "Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield"

6 / 6 papers shown
JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Yash Datta
Sharath Rajasekar
39
0
0
09 Jun 2025
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigne-Lefebvre
Mikolaj Kniejski
Filip Sondej
Matthieu David
J. Hoelscher-Obermaier
Christian Schroeder de Witt
Esben Kran
127
7
0
26 Feb 2025
Prompt Inject Detection with Generative Explanation as an Investigative Tool
Jonathan Pan
Swee Liang Wong
Yidi Yuan
Xin Wei Chia
SILM
132
0
0
16 Feb 2025
CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
Zhihao Liu
Chenhui Hu
ALM, ELM
75
1
0
29 Oct 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM, AAML
143
2
0
05 Sep 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
129
15
0
20 Jul 2024