Engineering Monosemanticity in Toy Models

16 November 2022
Adam Jermyn
Nicholas Schiefer
Evan Hubinger
    MILM
arXiv: 2211.09169

Papers citing "Engineering Monosemanticity in Toy Models" (8 papers)

Mixture of Experts Made Intrinsically Interpretable
Xingyi Yang
Constantin Venhoff
Ashkan Khakzar
Christian Schroeder de Witt
P. Dokania
Adel Bibi
Philip Torr
MoE
57
0
0
05 Mar 2025
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis
Ge Lei
Samuel J. Cooper
KELM
51
0
0
15 Feb 2025
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
Luke Marks
Alasdair Paren
David M. Krueger
Fazl Barez
AAML
27
4
0
02 Nov 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
45
118
0
22 Apr 2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
165
190
0
02 May 2023
Disentangling Neuron Representations with Concept Vectors
Laura O'Mahony
Vincent Andrearczyk
Henning Müller
Mara Graziani
MILM
34
14
0
19 Apr 2023
Polysemanticity and Capacity in Neural Networks
Adam Scherlis
Kshitij Sachan
Adam Jermyn
Joe Benton
Buck Shlegeris
MILM
135
25
0
04 Oct 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
133
326
0
21 Sep 2022