Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

arXiv:2010.09697, 19 October 2020

William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith

Papers citing "Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent"

7 of 7 citing papers shown.

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
02 May 2025

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić, Federico Soldá, Aurelio Sulser, Davide Scaramuzza
26 May 2024

Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers
Andy Yang, David Chiang
05 Apr 2024

Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
08 Feb 2024

Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, A. Alemi, ..., Jascha Narain Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
25 Sep 2023

Overcoming a Theoretical Limitation of Self-Attention
David Chiang, Peter A. Cholak
24 Feb 2022

Extracting Finite Automata from RNNs Using State Merging
William Merrill, Nikolaos Tsilivis
28 Jan 2022