Spike No More: Stabilizing the Pre-training of Large Language Models

Papers citing "Spike No More: Stabilizing the Pre-training of Large Language Models"

8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
90
286
0
06 Oct 2021
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
46
731
0
20 Aug 2016
