arXiv: 2402.16444
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
26 February 2024
Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang
Papers citing "ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors" (5 of 5 shown):
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David A. Wagner
AAML · 28 Apr 2025

Alleviating the Fear of Losing Alignment in LLM Fine-tuning
Kang Yang, Guanhong Tao, X. Chen, Jun Xu
13 Apr 2025

Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Rui Li, Peiyi Wang, Jingyuan Ma, Di Zhang, Lei Sha, Zhifang Sui
LLMAG · 22 Feb 2025

PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
Shufan Li, Harkanwar Singh, Aditya Grover
EGVM · 28 Jun 2024

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
OffRL · 12 Apr 2024