2406.17969
Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective
25 June 2024
Hanqi Yan, Yanzheng Xiang, Guangyi Chen, Yifei Wang, Lin Gui, Yulan He
Papers citing "Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective"
7 / 7 papers shown
FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya, Yeseong Kim
45 | 0 | 0 | 01 May 2025
Multi-Faceted Multimodal Monosemanticity
Hanqi Yan, Xiangxiang Cui, Lu Yin, Paul Pu Liang, Yulan He, Yifei Wang
44 | 0 | 0 | 16 Feb 2025
Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu
39 | 20 | 0 | 19 Feb 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea
77 | 96 | 0 | 03 Jan 2024
Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction
Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, David Wadden, Khendra G. Lucas, Adam S. Miner, Theresa Nguyen, Tim Althoff
73 | 71 | 0 | 04 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
MILM | 160 | 188 | 0 | 02 May 2023
On Feature Decorrelation in Self-Supervised Learning
Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao
SSL, OOD | 133 | 187 | 0 | 02 May 2021