ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.15105
  4. Cited By
Mechanistic evaluation of Transformers and state space models

Mechanistic evaluation of Transformers and state space models

21 May 2025
Aryaman Arora
Neil Rathi
Nikil Roashan Selvam
Róbert Csordás
Dan Jurafsky
Christopher Potts
ArXivPDFHTML

Papers citing "Mechanistic evaluation of Transformers and state space models"

11 / 11 papers shown
Title
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
108
1
0
22 Apr 2025
(How) Do Language Models Track State?
Belinda Z. Li
Zifan Carl Guo
Jacob Andreas
LRM
71
2
0
04 Mar 2025
Which Attention Heads Matter for In-Context Learning?
Which Attention Heads Matter for In-Context Learning?
Kayo Yin
Jacob Steinhardt
53
10
0
19 Feb 2025
Associative Recurrent Memory Transformer
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Andrey Kravchenko
102
3
0
17 Feb 2025
An Empirical Study of Mamba-based Language Models
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
Mohammad Shoeybi
Bryan Catanzaro
94
72
0
12 Jun 2024
gzip Predicts Data-dependent Scaling Laws
gzip Predicts Data-dependent Scaling Laws
Rohan Pandey
59
11
0
26 May 2024
The mechanistic basis of data dependence and abrupt learning in an
  in-context classification task
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
67
59
0
03 Dec 2023
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Zeyuan Allen-Zhu
Yuanzhi Li
88
18
0
23 May 2023
A Theory of Emergent In-Context Learning as Implicit Structure Induction
A Theory of Emergent In-Context Learning as Implicit Structure Induction
Michael Hahn
Navin Goyal
LRM
45
79
0
14 Mar 2023
Examining the Inductive Bias of Neural Language Models with Artificial
  Languages
Examining the Inductive Bias of Neural Language Models with Artificial Languages
Jennifer C. White
Ryan Cotterell
52
44
0
02 Jun 2021
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
526
129,831
0
12 Jun 2017
1