Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou
arXiv:2306.00802 · 1 June 2023
Papers citing "Birth of a Transformer: A Memory Viewpoint" (19 of 69 shown)
Learning Associative Memories with Gradient Descent
  Vivien A. Cabannes, Berfin Simsek, A. Bietti · 28 Feb 2024

Prospector Heads: Generalized Feature Attribution for Large Models & Data
  Gautam Machiraju, Alexander Derry, Arjun D Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ Altman, Christopher Ré, Parag Mallick · AI4TS, MedIm · 18 Feb 2024

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
  Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis · BDL · 16 Feb 2024

Understanding In-Context Learning with a Pelican Soup Framework
  Ting-Rui Chiang, Dani Yogatama · 16 Feb 2024

Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
  S. Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip Torr, Lewis Hammond, Christian Schroeder de Witt · 12 Feb 2024

Implicit Bias and Fast Convergence Rates for Self-attention
  Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis · 08 Feb 2024

A phase transition between positional and semantic learning in a solvable model of dot-product attention
  Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová · MLT · 06 Feb 2024

Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
  Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael C. Gastpar · OffRL · 06 Feb 2024

Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
  Simone Bombari, Marco Mondelli · 05 Feb 2024

Self-attention Networks Localize When QK-eigenspectrum Concentrates
  Han Bao, Ryuichiro Hataya, Ryo Karakida · 03 Feb 2024

In-Context Learning Dynamics with Random Binary Sequences
  Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman · 26 Oct 2023

On the Optimization and Generalization of Multi-head Attention
  Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis · MLT · 19 Oct 2023

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
  Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai · 16 Oct 2023

Scaling Laws for Associative Memories
  Vivien A. Cabannes, Elvis Dohmatob, A. Bietti · 04 Oct 2023

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du · 01 Oct 2023

Breaking through the learning plateaus of in-context learning in Transformer
  Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng · 12 Sep 2023

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
  Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang · 27 Aug 2023

Bidirectional Attention as a Mixture of Continuous Word Experts
  Kevin Christian Wibisono, Yixin Wang · MoE · 08 Jul 2023

A Survey on In-context Learning
  Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, ..., Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui · ReLM, AIMat · 31 Dec 2022