Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

Papers citing "Large Scale Language Modeling: Converging on 40GB of Text in Four Hours"

16 / 16 papers shown
Compressing Gradient Optimizers via Count-Sketches
Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava
01 Feb 2019
