ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.05044
  4. Cited By
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large
  Language Models

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

7 February 2024
Lijun Li
Bowen Dong
Ruohui Wang
Xuhao Hu
Wangmeng Zuo
Dahua Lin
Yu Qiao
Jing Shao
    ELM
ArXivPDFHTML

Papers citing "SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models"

28 / 28 papers shown
Title
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
Bohan Jin
Shuhan Qi
Kehai Chen
Xinyi Guo
Xuan Wang
37
0
0
22 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LLMSV
129
0
0
18 May 2025
CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
Chenlu Guo
Nuo Xu
Yi-Ju Chang
Yuan Wu
AI4MH
LM&MA
80
2
0
24 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
156
3
0
07 Feb 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
230
23
0
30 Jan 2025
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou
Jingdong Yang
Yue Huang
Kehan Guo
Zoe Emory
...
Tian Gao
Werner Geyer
Nuno Moniz
Nitesh Chawla
Xiangliang Zhang
88
0
0
18 Oct 2024
On Calibration of LLM-based Guard Models for Reliable Content Moderation
On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu
Hengguan Huang
Hao Wang
Xiangming Gu
Ye Wang
113
4
0
14 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
139
1
0
09 Oct 2024
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Yi Ding
Bolian Li
Ruqi Zhang
MLLM
101
13
0
09 Oct 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie
Xiangyu Qi
Yi Zeng
Yangsibo Huang
Udari Madhushani Sehwag
...
Bo Li
Kai Li
Danqi Chen
Peter Henderson
Prateek Mittal
ALM
ELM
130
71
0
20 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
123
31
0
17 Jun 2024
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
Avital Shafran
R. Schuster
Vitaly Shmatikov
89
35
0
09 Jun 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
115
10
0
08 Jun 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
129
50
0
31 May 2024
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan
Kartikeya Upasani
Jianfeng Chi
Rashi Rungta
Krithika Iyer
...
Michael Tontchev
Qing Hu
Brian Fuller
Davide Testuggine
Madian Khabsa
AI4MH
150
435
0
07 Dec 2023
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Alexander Robey
Eric Wong
Hamed Hassani
George J. Pappas
AAML
109
247
0
05 Oct 2023
Qwen Technical Report
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
231
1,792
0
28 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
163
340
0
19 Sep 2023
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
Bochuan Cao
Yu Cao
Lu Lin
Jinghui Chen
AAML
55
148
0
18 Sep 2023
Certifying LLM Safety against Adversarial Prompting
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar
Chirag Agarwal
Suraj Srinivas
Aaron Jiaxun Li
Soheil Feizi
Himabindu Lakkaraju
AAML
74
191
0
06 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
287
1,449
0
27 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
314
4,288
0
09 Jun 2023
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and
  Implicit Hate Speech Detection
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
70
374
0
17 Mar 2022
A New Generation of Perspective API: Efficient Multilingual
  Character-level Transformers
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Alyssa Lees
Vinh Q. Tran
Yi Tay
Jeffrey Scott Sorensen
Jai Gupta
Donald Metzler
Lucy Vasserman
78
190
0
22 Feb 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
227
4,392
0
27 Oct 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
  Models
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
133
1,194
0
24 Sep 2020
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.2K
12,181
0
27 Aug 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense
  Knowledge
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
140
1,725
0
02 Nov 2018
1