Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models

23 May 2025
Yukin Zhang
Qi Dong

Papers citing "Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models"

25 / 25 papers shown
Can Large Language Models Understand Context?
Yilun Zhu
Joel Ruben Antony Moniz
Shruti Bhargava
Jiarui Lu
Dhivya Piraviperumal
Site Li
Yuan-kang Zhang
Hong-ye Yu
Bo-Hsiang Tseng
80
25
0
01 Feb 2024
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
419
12,076
0
18 Jul 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
318
345
0
26 Apr 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa, MoMe
214
1,646
0
15 Dec 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM, ReLM, LRM
295
2,521
0
15 Jun 2022
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
251
1,389
0
10 Feb 2022
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud
A. Mensch
Jordan Hoffmann
Trevor Cai
Eliza Rutherford
...
Simon Osindero
Karen Simonyan
Jack W. Rae
Erich Elsen
Laurent Sifre
KELM, RALM
254
1,100
0
08 Dec 2021
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
185
848
0
29 Dec 2020
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard
Edouard Grave
RALM
147
1,184
0
02 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
904
42,520
0
28 May 2020
Quantifying Attention Flow in Transformers
Samira Abnar
Willem H. Zuidema
169
803
0
02 May 2020
A Primer in BERTology: What we know about how BERT works
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
106
1,503
0
27 Feb 2020
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
151
979
0
04 Dec 2019
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita
Rico Sennrich
Ivan Titov
294
187
0
03 Sep 2019
Revealing the Dark Secrets of BERT
Olga Kovaleva
Alexey Romanov
Anna Rogers
Anna Rumshisky
53
554
0
21 Aug 2019
A Multiscale Visualization of Attention in the Transformer Model
Jesse Vig
ViT
79
582
0
12 Jun 2019
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
235
1,605
0
11 Jun 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
119
1,149
0
23 May 2019
What do you learn from context? Probing for sentence structure in contextualized word representations
Ian Tenney
Patrick Xia
Berlin Chen
Alex Jinpeng Wang
Adam Poliak
...
Najoung Kim
Benjamin Van Durme
Samuel R. Bowman
Dipanjan Das
Ellie Pavlick
189
865
0
15 May 2019
BERT Rediscovers the Classical NLP Pipeline
Ian Tenney
Dipanjan Das
Ellie Pavlick
MILM, SSeg
145
1,482
0
15 May 2019
Linguistic Knowledge and Transferability of Contextual Representations
Nelson F. Liu
Matt Gardner
Yonatan Belinkov
Matthew E. Peters
Noah A. Smith
137
735
0
21 Mar 2019
Analysis Methods in Neural Language Processing: A Survey
Yonatan Belinkov
James R. Glass
98
558
0
21 Dec 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
811
132,725
0
12 Jun 2017
Emergence of Invariance and Disentanglement in Deep Representations
Alessandro Achille
Stefano Soatto
OOD, DRL
115
477
0
05 Jun 2017
Deep Learning and the Information Bottleneck Principle
Naftali Tishby
Noga Zaslavsky
DRL
220
1,595
0
09 Mar 2015