Scalable Training of Language Models using JAX pjit and TPUv4
arXiv:2204.06514
13 April 2022
Joanna Yoo, Kuba Perlin, Siddhartha Rao Kamalakara, J. Araújo

Papers cited by "Scalable Training of Language Models using JAX pjit and TPUv4"

GSPMD: General and Scalable Parallelization for ML Computation Graphs
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake A. Hechtman, Yanping Huang, ..., Noam M. Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen
10 May 2021

Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Harvey Richemond, ..., M. G. Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
13 Jun 2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
23 Oct 2019

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Y. Wang, Gu-Yeon Wei, David Brooks
24 Jul 2019

Ray: A Distributed Framework for Emerging AI Applications
Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, ..., Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica
16 Dec 2017

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
23 Jan 2017

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
22 Dec 2014