2410.01174
Cited By
Towards Inference-time Category-wise Safety Steering for Large Language Models
2 October 2024
Amrita Bhattacharjee
Shaona Ghosh
Traian Rebedea
Christopher Parisien
LLMSV
Papers citing
"Towards Inference-time Category-wise Safety Steering for Large Language Models"
2 / 2 papers shown
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch
Constantin Weisser
Severin Field
Helen Yannakoudakis
Stephen Casper
39
2
0
02 Nov 2024
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Philip H. S. Torr
Francesco Pinto
47
0
0
30 Oct 2024