Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
arXiv:2311.00172 · 31 October 2023
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
AAML

Papers citing "Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield"

2 / 2 papers shown

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML · 05 Sep 2024

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022