2408.10920
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
20 August 2024

Cited By
Papers citing
"Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations"
10 / 10 papers shown
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, ..., Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov
17 Apr 2025
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah A. Smith, Sarah Wiegreffe, Yanai Elazar
16 Apr 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun, Jing Huang, Sidharth Baskaran, Karel D'Oosterlinck, Christopher Potts, Michael Sklar, Atticus Geiger
13 Mar 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller
10 Jan 2025
ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg, Hidenori Tanaka
29 Dec 2024
Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels, Logan Riggs, Max Tegmark
18 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi
15 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger
27 Feb 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022