ResearchTrend.AI
Efficient softmax approximation for GPUs

14 September 2016
Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou
arXiv (abs) · PDF · HTML · GitHub (394★)
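The paper's adaptive softmax replaces one large softmax over the full vocabulary with a small "head" softmax over frequent words plus gated tail clusters for rare words, so most predictions never touch the long tail. A minimal pure-Python sketch of the two-level factorization is below; the function names and toy scores are illustrative, not the paper's or any library's API (PyTorch ships a full implementation as `nn.AdaptiveLogSoftmaxWithLoss`).

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def adaptive_log_prob(head_scores, cluster_scores, target):
    """log P(target) under a head-plus-one-tail-cluster softmax.

    head_scores: scores for the frequent ("head") words, with the
        LAST entry acting as the gate for the tail cluster.
    cluster_scores: scores for the rare ("tail") words; in a real
        model these are only computed when a tail word is needed.
    target: ("head", i) for head word i, or ("tail", j) for tail word j.
    """
    head_lp = log_softmax(head_scores)
    kind, idx = target
    if kind == "head":
        return head_lp[idx]
    # Rare word: P(word) = P(tail cluster) * P(word | tail cluster).
    return head_lp[-1] + log_softmax(cluster_scores)[idx]
```

Because the tail cluster's softmax is only evaluated for rare targets, the expected cost per prediction drops from O(V) to roughly the head size, which is the source of the GPU speedup the paper measures.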

Papers citing "Efficient softmax approximation for GPUs"

16 / 16 papers shown
| Title | Authors | Tags | Citations | Date |
|---|---|---|---|---|
| Large Vocabulary Size Improves Large Language Models | Sho Takase, Ryokan Ri, Shun Kiyono, Takuya Kato | | 4 | 24 Jun 2024 |
| Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation | Yacine Jernite, A. Choromańska, David Sontag | | 36 | 14 Oct 2016 |
| Exploring the Limits of Language Modeling | Rafal Jozefowicz, Oriol Vinyals, M. Schuster, Noam M. Shazeer, Yonghui Wu | | 1,145 | 07 Feb 2016 |
| Strategies for Training Large Vocabulary Neural Language Models | Welin Chen, David Grangier, Michael Auli | VLM | 139 | 15 Dec 2015 |
| BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies | Shihao Ji, S.V.N. Vishwanathan, N. Satish, Michael J. Anderson, Pradeep Dubey | | 77 | 21 Nov 2015 |
| Learning Visual Features from Large Weakly Supervised Data | Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache | SSL | 408 | 06 Nov 2015 |
| A Simple Way to Initialize Recurrent Networks of Rectified Linear Units | Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton | ODL | 721 | 03 Apr 2015 |
| Learning Longer Memory in Recurrent Neural Networks | Tomas Mikolov, Armand Joulin, S. Chopra, Michaël Mathieu, Marc'Aurelio Ranzato | | 259 | 24 Dec 2014 |
| Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets | Pascal Vincent, A. de Brébisson, Xavier Bouthillier | | 49 | 22 Dec 2014 |
| Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | Junyoung Chung, Çağlar Gülçehre, Kyunghyun Cho, Yoshua Bengio | | 12,734 | 11 Dec 2014 |
| On Using Very Large Target Vocabulary for Neural Machine Translation | Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio | | 1,011 | 05 Dec 2014 |
| Sequence to Sequence Learning with Neural Networks | Ilya Sutskever, Oriol Vinyals, Quoc V. Le | AIMat | 20,584 | 10 Sep 2014 |
| One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | Ciprian Chelba, Tomas Mikolov, M. Schuster, Qi Ge, T. Brants, P. Koehn, T. Robinson | | 1,109 | 11 Dec 2013 |
| Speech Recognition with Deep Recurrent Neural Networks | Alex Graves, Abdel-rahman Mohamed, Geoffrey E. Hinton | | 8,523 | 22 Mar 2013 |
| Efficient Estimation of Word Representations in Vector Space | Tomas Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 31,538 | 16 Jan 2013 |
| A Fast and Simple Algorithm for Training Neural Probabilistic Language Models | A. Mnih, Yee Whye Teh | | 578 | 27 Jun 2012 |