ResearchTrend.AI
Towards Inference-time Category-wise Safety Steering for Large Language Models

2 October 2024
Amrita Bhattacharjee
Shaona Ghosh
Traian Rebedea
Christopher Parisien
    LLMSV
Papers citing "Towards Inference-time Category-wise Safety Steering for Large Language Models"

2 papers shown
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch, Constantin Weisser, Severin Field, Helen Yannakoudakis, Stephen Casper
02 Nov 2024
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb, Adam Davies, Alasdair Paren, Philip H. S. Torr, Francesco Pinto
30 Oct 2024