Refusal Behavior in Large Language Models: A Nonlinear Perspective

14 January 2025

Papers citing "Refusal Behavior in Large Language Models: A Nonlinear Perspective"


Title / Authors / Date

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety [LM&MA, AI4CE]
Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau
05 Jun 2025

From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
John Mavi, Diana Teodora Găitan, Sergio Coronado
05 Jun 2025

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Stanley Yu, Vaidehi Bulusu, Oscar Yasunaga, Clayton Lau, Cole Blondin, Sean O'Brien, Kevin Zhu, Vasu Sharma
27 May 2025