XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs

30 April 2025

Papers citing "XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs"

Title: Atla Selene Mini: A General Purpose Evaluation Model
Authors: Andrei Alexandru, Antonia Calvi, Henry Broomfield, Jackson Golden, Kyle Dai, ..., Max Bartolo, Roman Engeler, Sashank Pisupati, Toby Drane, Young Sun Park
Tags: ALM, ELM, AILaw, LM&MA, LRM
Counts: 101, 6, 0
Date: 27 Jan 2025

Title: FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
Authors: Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz
Tags: AAML, FedML
Counts: 176, 12, 0
Date: 24 Oct 2023

Title: A Unified Approach to Interpreting Model Predictions
Authors: Scott M. Lundberg, Su-In Lee
Tags: FAtt
Counts: 1.2K, 22,295, 0
Date: 22 May 2017