Birth of a Transformer: A Memory Viewpoint
v1 · v2 (latest)

1 June 2023
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
arXiv:2306.00802 · abs · PDF · HTML

Papers citing "Birth of a Transformer: A Memory Viewpoint"

19 / 69 papers shown
Learning Associative Memories with Gradient Descent
Vivien A. Cabannes
Berfin Simsek
A. Bietti
109
8
0
28 Feb 2024
Prospector Heads: Generalized Feature Attribution for Large Models & Data
Gautam Machiraju
Alexander Derry
Arjun D Desai
Neel Guha
Amir-Hossein Karimi
James Zou
Russ Altman
Christopher Ré
Parag Mallick
AI4TS · MedIm
132
0
0
18 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
99
56
0
16 Feb 2024
Understanding In-Context Learning with a Pelican Soup Framework
Ting-Rui Chiang
Dani Yogatama
65
3
0
16 Feb 2024
Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
S. Motwani
Mikhail Baranchuk
Martin Strohmeier
Vijay Bolina
Philip Torr
Lewis Hammond
Christian Schroeder de Witt
196
4
0
12 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
119
21
0
08 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
MLT
100
16
0
06 Feb 2024
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
Ashok Vardhan Makkuva
Marco Bondaschi
Adway Girish
Alliot Nagle
Martin Jaggi
Hyeji Kim
Michael C. Gastpar
OffRL
85
26
0
06 Feb 2024
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Simone Bombari
Marco Mondelli
109
5
0
05 Feb 2024
Self-attention Networks Localize When QK-eigenspectrum Concentrates
Han Bao
Ryuichiro Hataya
Ryo Karakida
60
6
0
03 Feb 2024
In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
T. Ullman
119
5
0
26 Oct 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
MLT
100
35
0
19 Oct 2023
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
Tianyu Guo
Wei Hu
Song Mei
Huan Wang
Caiming Xiong
Silvio Savarese
Yu Bai
114
60
0
16 Oct 2023
Scaling Laws for Associative Memories
Vivien A. Cabannes
Elvis Dohmatob
A. Bietti
150
21
0
04 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian
Yiping Wang
Zhenyu Zhang
Beidi Chen
Simon Shaolei Du
105
41
0
01 Oct 2023
Breaking through the learning plateaus of in-context learning in Transformer
Jingwen Fu
Tao Yang
Yuwang Wang
Yan Lu
Nanning Zheng
92
3
0
12 Sep 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit
Rohan Pandey
Aryaman Arora
Paul Pu Liang
88
23
0
27 Aug 2023
Bidirectional Attention as a Mixture of Continuous Word Experts
Kevin Christian Wibisono
Yixin Wang
MoE
44
0
0
08 Jul 2023
A Survey on In-context Learning
Qingxiu Dong
Lei Li
Damai Dai
Ce Zheng
Jingyuan Ma
...
Zhiyong Wu
Baobao Chang
Xu Sun
Lei Li
Zhifang Sui
ReLM · AIMat
174
547
0
31 Dec 2022