ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.15043
  4. Cited By
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
v1v2 (latest)

Universal and Transferable Adversarial Attacks on Aligned Language Models

27 July 2023
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
ArXiv (abs)PDFHTMLGithub (3937★)

Papers citing "Universal and Transferable Adversarial Attacks on Aligned Language Models"

1 / 1,101 papers shown
Title
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting
  Jailbreaks
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao
S. Vashistha
Atharva Naik
Somak Aditya
Monojit Choudhury
120
24
0
24 May 2023
Previous
123...212223