ResearchTrend.AI
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types

29 October 2024
Yutao Mou
Shikun Zhang
Wei Ye
    ELM

Papers citing "SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types"

9 / 9 papers shown
FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
Christina Q. Knight
Kaustubh Deshpande
Ved Sirdeshmukh
Meher Mankikar
Scale Red Team
SEAL Research Team
Julian Michael
AAML, ELM
51
0
0
17 Jun 2025
Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
Yukai Zhou
Sibei Yang
Wenjie Wang
AAML
21
0
0
09 Jun 2025
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Ranjan Sapkota
Konstantinos I. Roumeliotis
Manoj Karkee
117
1
0
26 May 2025
Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
Yihe Fan
Wenqi Zhang
Xudong Pan
Min Yang
89
0
0
23 May 2025
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang
Jiaxin Song
Yifeng Gao
Xin Wang
Yang Yao
Yan Teng
Xingjun Ma
Yingchun Wang
Yu-Gang Jiang
139
0
0
17 May 2025
TeleEval-OS: Performance evaluations of large language models for operations scheduling
Yanyan Wang
Yingying Wang
Junli Liang
Yin Xu
Yunlong Liu
...
Fei Li
Long Zhao
Kuang Xu
Qi Song
Xiangyang Li
AI4TS
32
0
0
06 May 2025
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
Yutao Mou
Yuxiao Luo
Shikun Zhang
Wei Ye
LLMSV, LRM
63
2
0
13 Apr 2025
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
183
15
0
12 Jun 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM, KELM
169
39
0
08 Apr 2024