Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

6 February 2024

Ashok Vardhan Makkuva

Papers citing "Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains"

23 / 23 papers shown

Title
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning Yingcong Li Davoud Ataee Tarzanagh A. S. Rawat Maryam Fazel Samet Oymak 30 0 0 06 Apr 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms Xiaojian Li Yongkang Leng Ruiqing Ding Hangjie Mo Shanlin Yang LRM 52 0 0 15 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size Alireza Behtash Marijan Fofonjka Ethan Baird Tyler Mauer Hossein Moghimifam David Stout Joel Dennison MQ 66 1 0 06 Mar 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? Yutong Yin Zhaoran Wang LRM ReLM 226 0 0 27 Jan 2025
Transformers learn variable-order Markov chains in-context Ruida Zhou C. Tian Suhas Diggavi 28 0 0 07 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis Hongkang Li Meng Wang Songtao Lu Xiaodong Cui Pin-Yu Chen LRM 35 5 0 03 Oct 2024
Large Language Models as Markov Chains Oussama Zekri Ambroise Odonnat Abdelhakim Benechehab Linus Bleistein Nicolas Boullé I. Redko 53 10 0 03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities Wanpeng Zhang Zilong Xie Yicheng Feng Yijiang Li Xingrun Xing Sipeng Zheng Zongqing Lu MLLM 30 0 0 03 Oct 2024
Analysis of Unstructured High-Density Crowded Scenes for Crowd Monitoring Alexandre Matov 31 1 0 06 Aug 2024
Transformers on Markov Data: Constant Depth Suffices Nived Rajaraman Marco Bondaschi Kannan Ramchandran Michael C. Gastpar Ashok Vardhan Makkuva 51 4 0 25 Jul 2024
On the Power of Convolution Augmented Transformer Mingchen Li Xuechen Zhang Yixiao Huang Samet Oymak 40 1 0 08 Jul 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers Yibo Jiang Goutham Rajendran Pradeep Ravikumar Bryon Aragam CLL KELM 47 6 0 26 Jun 2024
Understanding and Mitigating Tokenization Bias in Language Models Buu Phan Marton Havasi Matthew Muckley Karen Ullrich 54 3 0 24 Jun 2024
Training LLMs over Neurally Compressed Text Brian Lester Jaehoon Lee A. Alemi Jeffrey Pennington Adam Roberts Jascha Narain Sohl-Dickstein Noah Constant 45 6 0 04 Apr 2024
Mechanics of Next Token Prediction with Self-Attention Yingcong Li Yixiao Huang M. E. Ildiz A. S. Rawat Samet Oymak 42 27 0 12 Mar 2024
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers M. E. Ildiz Yixiao Huang Yingcong Li A. S. Rawat Samet Oymak 38 17 0 21 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains Benjamin L. Edelman Ezra Edelman Surbhi Goel Eran Malach Nikolaos Tsilivis BDL 29 43 0 16 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention Bhavya Vasudeva Puneesh Deora Christos Thrampoulidis 39 15 0 08 Feb 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective Gianluigi Lopardo F. Precioso Damien Garreau 21 4 0 05 Feb 2024
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 211 270 0 28 Apr 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Yuchen Li Yuan-Fang Li Andrej Risteski 120 61 0 07 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 215 507 0 01 Nov 2022
Rethinking embedding coupling in pre-trained language models Hyung Won Chung Thibault Févry Henry Tsai Melvin Johnson Sebastian Ruder 95 142 0 24 Oct 2020