ResearchTrend.AI
Investigating task-specific prompts and sparse autoencoders for activation monitoring

28 April 2025
Henk Tillman
Dan Mossing
    LLMSV

Papers citing "Investigating task-specific prompts and sparse autoencoders for activation monitoring"

5 / 5 papers shown
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie
Urja Pawar
Phil Blandfort
William Bankes
David M. Krueger
Ekdeep Singh Lubana
Dmitrii Krasheninnikov
169
0
0
12 Jun 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
152
17
0
23 Feb 2025
Sparse Autoencoder Features for Classifications and Transferability
Jack Gallifant
Shan Chen
Kuleen Sasse
Hugo J. W. L. Aerts
Thomas Hartvigsen
Danielle S. Bitterman
103
6
0
17 Feb 2025
Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill
Bilal Chughtai
Stefan Heimersheim
Marius Hobbhahn
LLMSV
132
10
0
05 Feb 2025
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad
Michael Toker
Zorik Gekhman
Roi Reichart
Idan Szpektor
Hadas Kotek
Yonatan Belinkov
HILM, AIFin
134
45
0
03 Oct 2024