Cited By
Infinite attention: NNGP and NTK for deep attention networks (arXiv:2006.10540)
18 June 2020
Jiri Hron, Yasaman Bahri, Jascha Narain Sohl-Dickstein, Roman Novak
Papers citing "Infinite attention: NNGP and NTK for deep attention networks" (32 of 32 papers shown)
Issues with Neural Tangent Kernel Approach to Neural Networks
Haoran Liu, Anthony S. Tai, David J. Crandall, Chunfeng Huang (19 Jan 2025)

Attention layers provably solve single-location regression
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer (02 Oct 2024)

Advancing Hybrid Defense for Byzantine Attacks in Federated Learning
Kai Yue, Richeng Jin, Chau-Wai Wong, H. Dai (10 Sep 2024) [AAML]

Variational Search Distributions
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla (10 Sep 2024)

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann (29 May 2024)

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky (24 May 2024)

Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan (24 May 2024) [AI4CE]

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Licong Lin, Yu Bai, Song Mei (12 Oct 2023) [OffRL]

Controlled Descent Training
Viktor Andersson, B. Varga, Vincent Szolnoky, Andreas Syrén, Rebecka Jörnsten, Balázs Kulcsár (16 Mar 2023)

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, Ming Wang, Sijia Liu, Pin-Yu Chen (12 Feb 2023) [ViT, MLT]

Width and Depth Limits Commute in Residual Networks
Soufiane Hayou, Greg Yang (01 Feb 2023)

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang (30 Dec 2022)

Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
Dennis Ulmer, J. Frellsen, Christian Hardmeier (20 Oct 2022)

A connection between probability, physics and neural networks
Sascha Ranftl (26 Sep 2022) [PINN]

Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization
Xingyou Song, Sagi Perel, Chansoo Lee, Greg Kochanski, Daniel Golovin (27 Jul 2022)

AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He, Darshil Doshi, Andrey Gromov (27 Jun 2022)

Fast Finite Width Neural Tangent Kernel
Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz (17 Jun 2022) [AAML]

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
Jiri Hron, Roman Novak, Jeffrey Pennington, Jascha Narain Sohl-Dickstein (15 Jun 2022) [UQCV, BDL]

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
Libin Zhu, Chaoyue Liu, M. Belkin (24 May 2022) [GNN, AI4CE]

Generalization Through The Lens Of Leave-One-Out Error
Gregor Bachmann, Thomas Hofmann, Aurelien Lucchi (07 Mar 2022)

How Do Vision Transformers Work?
Namuk Park, Songkuk Kim (14 Feb 2022) [ViT]

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning
Wei Huang, Chunrui Liu, Yilan Chen, Tianyu Liu, R. Xu (04 Feb 2022) [BDL, MLT]

Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang (19 Oct 2021)

Dataset Distillation with Infinitely Wide Convolutional Networks
Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee (27 Jul 2021) [DD]

Precise characterization of the prior predictive distribution of deep ReLU networks
Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann (11 Jun 2021) [BDL, UQCV]

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
Geoff Pleiss, John P. Cunningham (11 Jun 2021)

A Neural Tangent Kernel Perspective of GANs
Jean-Yves Franceschi, Emmanuel de Bézenac, Ibrahim Ayed, Mickaël Chen, Sylvain Lamprier, Patrick Gallinari (10 Jun 2021)

A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs
Gadi Naveh, Zohar Ringel (08 Jun 2021) [SSL, MLT]

Priors in Bayesian Deep Learning: A Review
Vincent Fortuin (14 May 2021) [UQCV, BDL]

Tensor Programs III: Neural Matrix Laws
Greg Yang (22 Sep 2020)

Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang (25 Jun 2020)

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal (16 Jun 2020)