Mechanistic evaluation of Transformers and state space models

21 May 2025

Papers citing "Mechanistic evaluation of Transformers and state space models"

Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism. Aviv Bick, Eric P. Xing, Albert Gu. 22 Apr 2025.
(How) Do Language Models Track State? Belinda Z. Li, Zifan Carl Guo, Jacob Andreas. 04 Mar 2025.
Which Attention Heads Matter for In-Context Learning? Kayo Yin, Jacob Steinhardt. 19 Feb 2025.
Associative Recurrent Memory Transformer. Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Andrey Kravchenko. 17 Feb 2025.
An Empirical Study of Mamba-based Language Models. R. Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, V. Korthikanti, ..., Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro. 12 Jun 2024.
gzip Predicts Data-dependent Scaling Laws. Rohan Pandey. 26 May 2024.
The mechanistic basis of data dependence and abrupt learning in an in-context classification task. Gautam Reddy. 03 Dec 2023.
Physics of Language Models: Part 1, Learning Hierarchical Language Structures. Zeyuan Allen-Zhu, Yuanzhi Li. 23 May 2023.
A Theory of Emergent In-Context Learning as Implicit Structure Induction. Michael Hahn, Navin Goyal. 14 Mar 2023.
Examining the Inductive Bias of Neural Language Models with Artificial Languages. Jennifer C. White, Ryan Cotterell. 02 Jun 2021.
Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017.