
Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Papers citing "Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks"
17 / 17 papers shown
Title |
---|
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, ..., Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa |