Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

6 February 2024
Ashok Vardhan Makkuva
Marco Bondaschi
Adway Girish
Alliot Nagle
Martin Jaggi
Hyeji Kim
Michael C. Gastpar
    OffRL

Papers citing "Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains"

23 / 23 papers shown
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
30
0
0
06 Apr 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
Xiaojian Li
Yongkang Leng
Ruiqing Ding
Hangjie Mo
Shanlin Yang
LRM
52
0
0
15 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
Alireza Behtash
Marijan Fofonjka
Ethan Baird
Tyler Mauer
Hossein Moghimifam
David Stout
Joel Dennison
MQ
66
1
0
06 Mar 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin
Zhaoran Wang
LRM
ReLM
226
0
0
27 Jan 2025
Transformers learn variable-order Markov chains in-context
Ruida Zhou
C. Tian
Suhas Diggavi
28
0
0
07 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li
Meng Wang
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
LRM
35
5
0
03 Oct 2024
Large Language Models as Markov Chains
Oussama Zekri
Ambroise Odonnat
Abdelhakim Benechehab
Linus Bleistein
Nicolas Boullé
I. Redko
53
10
0
03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
30
0
0
03 Oct 2024
Analysis of Unstructured High-Density Crowded Scenes for Crowd Monitoring
Alexandre Matov
31
1
0
06 Aug 2024
Transformers on Markov Data: Constant Depth Suffices
Nived Rajaraman
Marco Bondaschi
Kannan Ramchandran
Michael C. Gastpar
Ashok Vardhan Makkuva
51
4
0
25 Jul 2024
On the Power of Convolution Augmented Transformer
Mingchen Li
Xuechen Zhang
Yixiao Huang
Samet Oymak
40
1
0
08 Jul 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
47
6
0
26 Jun 2024
Understanding and Mitigating Tokenization Bias in Language Models
Buu Phan
Marton Havasi
Matthew Muckley
Karen Ullrich
54
3
0
24 Jun 2024
Training LLMs over Neurally Compressed Text
Brian Lester
Jaehoon Lee
A. Alemi
Jeffrey Pennington
Adam Roberts
Jascha Narain Sohl-Dickstein
Noah Constant
45
6
0
04 Apr 2024
Mechanics of Next Token Prediction with Self-Attention
Yingcong Li
Yixiao Huang
M. E. Ildiz
A. S. Rawat
Samet Oymak
42
27
0
12 Mar 2024
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
M. E. Ildiz
Yixiao Huang
Yingcong Li
A. S. Rawat
Samet Oymak
38
17
0
21 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
29
43
0
16 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
39
15
0
08 Feb 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective
Gianluigi Lopardo
F. Precioso
Damien Garreau
21
4
0
05 Feb 2024
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
211
270
0
28 Apr 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
120
61
0
07 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
215
507
0
01 Nov 2022
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
95
142
0
24 Oct 2020