Large Language Models as Markov Chains

3 October 2024
Oussama Zekri
Ambroise Odonnat
Abdelhakim Benechehab
Linus Bleistein
Nicolas Boullé
Ievgen Redko
Abstract

Large language models (LLMs) are remarkably efficient across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the LLMs' generalization capabilities remains elusive. In our paper, we approach this task by drawing an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space. This allows us to study the multi-step inference mechanism of LLMs from first principles. We relate the obtained results to the pathological behavior observed with LLMs such as repetitions and incoherent replies with high temperature. Finally, we leverage the proposed formalization to derive pre-training and in-context learning generalization bounds for LLMs under realistic data and model assumptions. Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.
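The paper's central construction, viewing an autoregressive model over a finite vocabulary with a bounded context window as a Markov chain on a finite state space, can be illustrated with a toy sketch. The names and the stand-in "model" below are illustrative, not the authors' code: states are token sequences of length at most the context window, and each state's next-token distribution fills one row of the transition matrix.

```python
import itertools
import numpy as np

def toy_lm_logits(context, vocab_size):
    # Stand-in for an autoregressive LM: deterministic pseudo-logits
    # derived from the context (hypothetical, for illustration only).
    seed = hash(context) % (2**32)
    return np.random.default_rng(seed).normal(size=vocab_size)

def build_markov_chain(vocab_size=3, window=2):
    """Enumerate all contexts of length 1..window as Markov states and
    fill the transition matrix from the model's next-token distributions."""
    states = [s for k in range(1, window + 1)
              for s in itertools.product(range(vocab_size), repeat=k)]
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        logits = toy_lm_logits(s, vocab_size)
        probs = np.exp(logits - logits.max())   # softmax over the vocabulary
        probs /= probs.sum()
        for tok, p in enumerate(probs):
            # Emitting a token appends it to the context, truncated
            # to the last `window` tokens (the sliding context).
            nxt = (s + (tok,))[-window:]
            P[index[s], index[nxt]] += p
    return states, P

states, P = build_markov_chain()
assert np.allclose(P.sum(axis=1), 1.0)  # every row is a probability distribution
```

With vocabulary size 3 and window 2 this yields 3 + 9 = 12 states; the chain's finite-state structure is what lets properties such as stationary behavior (e.g. repetition loops) be analyzed from first principles.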

@article{zekri2025_2410.02724,
  title={Large Language Models as Markov Chains},
  author={Oussama Zekri and Ambroise Odonnat and Abdelhakim Benechehab and Linus Bleistein and Nicolas Boullé and Ievgen Redko},
  journal={arXiv preprint arXiv:2410.02724},
  year={2025}
}