arXiv: 2501.08145
Refusal Behavior in Large Language Models: A Nonlinear Perspective
14 January 2025
Fabian Hildebrandt, Andreas K. Maier, Patrick Krauss, A. Schilling

Papers citing "Refusal Behavior in Large Language Models: A Nonlinear Perspective" (3 papers)

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau
Communities: LM&MA, AI4CE
05 Jun 2025

From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
John Mavi, Diana Teodora Găitan, Sergio Coronado
05 Jun 2025

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Stanley Yu, Vaidehi Bulusu, Oscar Yasunaga, Clayton Lau, Cole Blondin, Sean O'Brien, Kevin Zhu, Vasu Sharma
27 May 2025