ResearchTrend.AI

arXiv: 2502.11356
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models

17 February 2025
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Mengnan Du
    LLMSV

Papers citing "SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models"

1 / 1 papers shown
Title
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
Dong Shu
Xuansheng Wu
Haiyan Zhao
Mengnan Du
Ninghao Liu
LLMSV
40
0
0
12 May 2025