Effective Theory of Transformers at Initialization

Papers citing "Effective Theory of Transformers at Initialization"

45 / 45 papers shown
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
104
738
0
20 Aug 2016
Layer Normalization
437
10,548
0
21 Jul 2016