ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.08487
  4. Cited By
Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output
  Robustness of Large Language Models
v1v2v3 (latest)

Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models

17 July 2023
Huachuan Qiu
Shuai Zhang
Anqi Li
Hongliang He
Zhenzhong Lan
    ALM
ArXiv (abs)PDFHTMLGithub (38★)

Papers citing "Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models"

15 / 15 papers shown
Title
Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration
Tatia Tsmindashvili
Ana Kolkhidashvili
Dachi Kurtskhalia
Nino Maghlakelidze
Elene Mekvabishvili
Guram Dentoshvili
Orkhan Shamilov
Zaal Gachechiladze
Steven Saporta
David Dachi Choladze
167
0
0
18 May 2025
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
Han Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Hongxin Wei
Bing Wang
Haifeng Bai
Lei Yang
AILawELM
122
2
0
24 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
148
1
0
09 Oct 2024
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu
Peiran Li
Edward Suh
Yevgeniy Vorobeychik
Zhuoqing Mao
Somesh Jha
Patrick McDaniel
Huan Sun
Bo Li
Chaowei Xiao
103
29
0
03 Oct 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
141
50
0
31 May 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
176
12
0
25 Mar 2024
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT
  Models
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Wei Ping
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yuk-Kit Cheng
Sanmi Koyejo
Basel Alomair
Yue Liu
95
416
0
20 Jun 2023
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
LM&MALLMAG
67
365
0
11 Apr 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated
  Applications with Indirect Prompt Injection
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
126
488
0
23 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard
  Security Attacks
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
92
253
0
11 Feb 2023
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDaMoMe
196
1,634
0
15 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
386
2,388
0
09 Nov 2022
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDLLRM
346
1,094
0
05 Oct 2022
PaLM: Scaling Language Modeling with Pathways
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILMLRM
495
6,240
0
05 Apr 2022
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
659
24,464
0
26 Jul 2019
1