Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors

29 March 2021
Zeyu Yun, Yubei Chen, Bruno A. Olshausen, Yann LeCun
arXiv:2103.15949

Papers citing "Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors"

22 / 22 papers shown

1. Are Sparse Autoencoders Useful for Java Function Bug Detection?
   Rui Melo, Claudia Mamede, Andre Catarino, Rui Abreu, Henrique Lopes Cardoso · 15 May 2025
2. UNet with Axial Transformer: A Neural Weather Model for Precipitation Nowcasting
   Maitreya Sonawane, Sumit Mamtani · 28 Apr 2025
3. Understanding the Repeat Curse in Large Language Models from a Feature Perspective
   Junchi Yao, Shu Yang, Jianhua Xu, Lijie Hu, Mengdi Li, Di Wang · 19 Apr 2025
4. The Complexity of Learning Sparse Superposed Features with Feedback
   Akash Kumar · 08 Feb 2025
5. Out-of-distribution generalization via composition: a lens through induction heads in Transformers
   Jiajun Song, Zhuoyan Xu, Yiqiao Zhong · 31 Dec 2024
6. Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning
   John Wu, David Wu, Jimeng Sun · 31 Oct 2024
7. Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
   Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto · 30 Oct 2024
8. Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
   Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini · KELM, LLMSV · 21 Oct 2024
9. The Geometry of Concepts: Sparse Autoencoder Feature Structure
   Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark · 10 Oct 2024
10. Residual Stream Analysis with Multi-Layer SAEs
    Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison · 06 Sep 2024
11. Understanding Generative AI Content with Embedding Models
    Max Vargas, Reilly Cannon, A. Engel, Anand D. Sarwate, Tony Chiang · 19 Aug 2024
12. A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
    Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao · 02 Jul 2024
13. Codebook Features: Sparse and Discrete Interpretability for Neural Networks
    Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman · 26 Oct 2023
14. Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
    Fred Zhang, Neel Nanda · LLMSV · 27 Sep 2023
15. Sparse Autoencoders Find Highly Interpretable Features in Language Models
    Hoagy Cunningham, Aidan Ewart, Logan Riggs, R. Huben, Lee Sharkey · MILM · 15 Sep 2023
16. Explaining black box text modules in natural language with language models
    Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin-Xia Yu, Jianfeng Gao · MILM · 17 May 2023
17. Minimalistic Unsupervised Learning with the Sparse Manifold Transform
    Yubei Chen, Zeyu Yun, Yi Ma, Bruno A. Olshausen, Yann LeCun · 30 Sep 2022
18. Interpreting Embedding Spaces by Conceptualization
    Adi Simhi, Shaul Markovitch · 22 Aug 2022
19. How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
    Timothee Mickus, Denis Paperno, Mathieu Constant · 07 Jun 2022
20. Explainable Patterns for Distinction and Prediction of Moral Judgement on Reddit
    Ion Stagkos Efstathiadis, Guilherme Paulino-Passos, Francesca Toni · 26 Jan 2022
21. Translation Error Detection as Rationale Extraction
    M. Fomicheva, Lucia Specia, Nikolaos Aletras · 27 Aug 2021
22. Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
    Hila Chefer, Shir Gur, Lior Wolf · ViT · 29 Mar 2021