Robustly identifying concepts introduced during chat fine-tuning using crosscoders

3 April 2025

Papers citing "Robustly identifying concepts introduced during chat fine-tuning using crosscoders"

Title
Response Uncertainty and Probe Modeling: Two Sides of the Same Coin in LLM Interpretability?
Yongjie Wang, Yibo Wang, Xin Zhou, Zhiqi Shen
5 0 0
24 May 2025