Memorization Capacity of Multi-Head Attention in Transformers

3 June 2023
Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis
ArXiv · PDF · HTML

Papers citing "Memorization Capacity of Multi-Head Attention in Transformers"

16 papers shown
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach
MoE
65 · 4 · 0 · 24 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty, Suzan Verberne, Fatih Turkmen
ELM, PILM
139 · 7 · 0 · 03 Oct 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
51 · 19 · 0 · 08 Feb 2024
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, Ming Wang, Sijia Liu, Pin-Yu Chen
ViT, MLT
87 · 62 · 0 · 12 Feb 2023
Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
MLT
91 · 487 · 0 · 15 Dec 2022
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo
70 · 10 · 0 · 21 Oct 2022
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
116 · 504 · 0 · 01 Aug 2022
Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
Simone Bombari, Mohammad Hossein Amani, Marco Mondelli
53 · 26 · 0 · 20 May 2022
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
93 · 385 · 0 · 05 Mar 2021
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
KELM
130 · 820 · 0 · 29 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
MLAU, SILM
436 · 1,906 · 0 · 14 Dec 2020
Provable Memorization via Deep Neural Networks using Sub-linear Parameters
Sejun Park, Jaeho Lee, Chulhee Yun, Jinwoo Shin
FedML, MDE
45 · 37 · 0 · 26 Oct 2020
Hopfield Networks is All You Need
Hubert Ramsauer, Bernhard Schafl, Johannes Lehner, Philipp Seidl, Michael Widrich, ..., David P. Kreil, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
83 · 429 · 0 · 16 Jul 2020
Network size and weights size for memorization with two-layers neural networks
Sébastien Bubeck, Ronen Eldan, Y. Lee, Dan Mikulincer
56 · 33 · 0 · 04 Jun 2020
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
55 · 95 · 0 · 17 Feb 2020
Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
HAI
324 · 4,624 · 0 · 10 Nov 2016