Legilimens: Practical and Unified Content Moderation for Large Language Model Services

28 August 2024

Papers citing "Legilimens: Practical and Unified Content Moderation for Large Language Model Services"


Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation
Ning Wang, Zihan Yan, W. Li, Chuan Ma, H. Chen, Tao Xiang
AAML; 35 0 0; 22 Apr 2025

EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces
G. Mujtaba, Sunder Ali Khowaja, K. Dev
38 0 0; 28 Feb 2025

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
84 1 0; 09 Oct 2024

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xing-ming Guo, Fangxu Yu, Huan Zhang, Lianhui Qin, Bin Hu
AAML; 117 69 0; 13 Feb 2024