arXiv:2010.09697
Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
19 October 2020
William Merrill
Vivek Ramanujan
Yoav Goldberg
Roy Schwartz
Noah A. Smith
AI4CE
Papers citing
"Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent"
7 / 7 papers shown
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
53
0
0
02 May 2025
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić
Federico Soldá
Aurelio Sulser
Davide Scaramuzza
LRM
BDL
52
5
0
26 May 2024
Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers
Andy Yang
David Chiang
41
8
0
05 Apr 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
37
13
0
08 Feb 2024
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
40
84
0
25 Sep 2023
Overcoming a Theoretical Limitation of Self-Attention
David Chiang
Peter A. Cholak
36
78
0
24 Feb 2022
Extracting Finite Automata from RNNs Using State Merging
William Merrill
Nikolaos Tsilivis
22
14
0
28 Jan 2022