ResearchTrend.AI — Papers — 2408.05147 — Cited By
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

9 August 2024
Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca Dragan, Rohin Shah, Neel Nanda

Papers citing "Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2"

Showing 17 of 67 citing papers.
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon, Manish Shrivastava, David M. Krueger, Ekdeep Singh Lubana
15 Oct 2024

Mechanistic Permutability: Match Features Across Layers
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
10 Oct 2024

The Geometry of Concepts: Sparse Autoencoder Feature Structure
Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark
10 Oct 2024

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders
Constantin Venhoff, Anisoara Calinescu, Philip Torr, Christian Schroeder de Witt
09 Oct 2024

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
Michael Lan, Philip Torr, Austin Meek, Ashkan Khakzar, David M. Krueger, Fazl Barez
09 Oct 2024

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Can Demircan, Tankred Saanum, Akshay K. Jagadish, Marcel Binz, Eric Schulz
02 Oct 2024

Towards Inference-time Category-wise Safety Steering for Large Language Models
Amrita Bhattacharjee, Shaona Ghosh, Traian Rebedea, Christopher Parisien
02 Oct 2024

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Joseph Bloom
22 Sep 2024

Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison
06 Sep 2024

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
Maheep Chaudhary, Atticus Geiger
05 Sep 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Smith, C. M. Verdun, David Bau, Samuel Marks
31 Jul 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
21 Sep 2022

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
24 Feb 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean
16 Jan 2013