Infinite attention: NNGP and NTK for deep attention networks

18 June 2020
Jiri Hron, Yasaman Bahri, Jascha Narain Sohl-Dickstein, Roman Novak
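
The infinite-width attention limit derived in this paper is implemented in the authors' neural-tangents library as stax.GlobalSelfAttention. The sketch below shows how the closed-form NNGP and NTK kernels of a small attention model can be computed; it is a minimal example assuming a recent neural-tangents release, and keyword names such as n_chan_out, n_chan_key, n_chan_val and n_heads may vary between versions.

    from jax import random
    from neural_tangents import stax

    # Dense layers around one multi-head self-attention block, all taken
    # to their infinite-width limit (a sketch; kwargs may vary by version).
    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512),
        stax.Relu(),
        stax.GlobalSelfAttention(
            n_chan_out=512,  # output channels
            n_chan_key=512,  # key/query channels
            n_chan_val=512,  # value channels
            n_heads=8,
        ),
        stax.Flatten(),
        stax.Dense(1),
    )

    # Two batches of token sequences, shaped (batch, tokens, features).
    k1, k2 = random.split(random.PRNGKey(0))
    x1 = random.normal(k1, (3, 16, 32))
    x2 = random.normal(k2, (5, 16, 32))

    # Analytic kernels between the batches; no parameters are sampled.
    kernels = kernel_fn(x1, x2, ('nngp', 'ntk'))
    print(kernels.nngp.shape, kernels.ntk.shape)  # (3, 5) (3, 5)

kernel_fn evaluates the kernel recursions directly, so exact Bayesian (NNGP) or linearized gradient-descent (NTK) predictions can be obtained without ever instantiating a finite network.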

Papers citing "Infinite attention: NNGP and NTK for deep attention networks"

32 papers shown

Issues with Neural Tangent Kernel Approach to Neural Networks
Haoran Liu, Anthony S. Tai, David J. Crandall, Chunfeng Huang
19 Jan 2025

Attention layers provably solve single-location regression
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

Advancing Hybrid Defense for Byzantine Attacks in Federated Learning
Kai Yue, Richeng Jin, Chau-Wai Wong, H. Dai
AAML
10 Sep 2024

Variational Search Distributions
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla
10 Sep 2024

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann
29 May 2024

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky
24 May 2024

Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan
AI4CE
24 May 2024

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Licong Lin, Yu Bai, Song Mei
OffRL
12 Oct 2023

Controlled Descent Training
Viktor Andersson, B. Varga, Vincent Szolnoky, Andreas Syrén, Rebecka Jörnsten, Balázs Kulcsár
16 Mar 2023

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, Ming Wang, Sijia Liu, Pin-Yu Chen
ViT, MLT
12 Feb 2023

Width and Depth Limits Commute in Residual Networks
Soufiane Hayou, Greg Yang
01 Feb 2023

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang
30 Dec 2022

Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
Dennis Ulmer, J. Frellsen, Christian Hardmeier
20 Oct 2022

A connection between probability, physics and neural networks
Sascha Ranftl
PINN
26 Sep 2022

Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization
Xingyou Song, Sagi Perel, Chansoo Lee, Greg Kochanski, Daniel Golovin
27 Jul 2022

AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He, Darshil Doshi, Andrey Gromov
27 Jun 2022

Fast Finite Width Neural Tangent Kernel
Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz
AAML
17 Jun 2022

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
Jiri Hron, Roman Novak, Jeffrey Pennington, Jascha Narain Sohl-Dickstein
UQCV, BDL
15 Jun 2022

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
Libin Zhu, Chaoyue Liu, M. Belkin
GNN, AI4CE
24 May 2022

Generalization Through The Lens Of Leave-One-Out Error
Gregor Bachmann, Thomas Hofmann, Aurelien Lucchi
07 Mar 2022

How Do Vision Transformers Work?
Namuk Park, Songkuk Kim
ViT
14 Feb 2022

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning
Wei Huang, Chunrui Liu, Yilan Chen, Tianyu Liu, R. Xu
BDL, MLT
04 Feb 2022

Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
19 Oct 2021

Dataset Distillation with Infinitely Wide Convolutional Networks
Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee
DD
27 Jul 2021

Precise characterization of the prior predictive distribution of deep ReLU networks
Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann
BDL, UQCV
11 Jun 2021

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
Geoff Pleiss, John P. Cunningham
11 Jun 2021

A Neural Tangent Kernel Perspective of GANs
Jean-Yves Franceschi, Emmanuel de Bézenac, Ibrahim Ayed, Mickaël Chen, Sylvain Lamprier, Patrick Gallinari
10 Jun 2021

A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs
Gadi Naveh, Zohar Ringel
SSL, MLT
08 Jun 2021

Priors in Bayesian Deep Learning: A Review
Vincent Fortuin
UQCV, BDL
14 May 2021

Tensor Programs III: Neural Matrix Laws
Greg Yang
22 Sep 2020

Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
25 Jun 2020

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
16 Jun 2020