arXiv 2410.21965
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
29 October 2024
Yutao Mou, Shikun Zhang, Wei Ye
ELM
Papers citing "SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types"
FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
Christina Q. Knight, Kaustubh Deshpande, Ved Sirdeshmukh, Meher Mankikar, Scale Red Team, SEAL Research Team, Julian Michael
AAML
ELM
17 Jun 2025
Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
Yukai Zhou, Sibei Yang, Wenjie Wang
AAML
09 Jun 2025
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee
26 May 2025
Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
Yihe Fan, Wenqi Zhang, Xudong Pan, Min Yang
23 May 2025
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao, Xin Wang, Yang Yao, Yan Teng, Xingjun Ma, Yingchun Wang, Yu-Gang Jiang
17 May 2025
TeleEval-OS: Performance evaluations of large language models for operations scheduling
Yanyan Wang, Yingying Wang, Junli Liang, Yin Xu, Yunlong Liu, ..., Fei Li, Long Zhao, Kuang Xu, Qi Song, Xiangyang Li
AI4TS
06 May 2025
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
Yutao Mou, Yuxiao Luo, Shikun Zhang, Wei Ye
LLMSV
LRM
13 Apr 2025
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, ..., Yejie Wang, Zhuoma Gongque, Jianing Yu, Qiuna Tan, Weiran Xu
ELM
12 Jun 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
ELM
KELM
08 Apr 2024