Papers citing "Language Models are Few-Shot Learners"

9 / 1,609 papers shown

Title | Authors | Tag | Counts | Date
Learning to learn by gradient descent by gradient descent | Marcin Andrychowicz, Misha Denil, Sergio Gomez Colmenarejo, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas | - | 108 2,006 0 | 14 Jun 2016
Matching Networks for One Shot Learning | Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra | VLM | 370 7,321 0 | 13 Jun 2016
Adaptive Computation Time for Recurrent Neural Networks | Alex Graves | - | 112 547 0 | 29 Mar 2016
Exploring the Limits of Language Modeling | Rafal Jozefowicz, Oriol Vinyals, M. Schuster, Noam M. Shazeer, Yonghui Wu | - | 191 1,145 0 | 07 Feb 2016
Improving Neural Machine Translation Models with Monolingual Data | Rico Sennrich, Barry Haddow, Alexandra Birch | - | 248 2,717 0 | 20 Nov 2015
Semi-supervised Sequence Learning | Andrew M. Dai, Quoc V. Le | SSL | 128 1,233 0 | 04 Nov 2015
Distilling the Knowledge in a Neural Network | Geoffrey E. Hinton, Oriol Vinyals, J. Dean | FedML | 362 19,643 0 | 09 Mar 2015
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation | Yoshua Bengio, Nicholas Léonard, Aaron Courville | - | 381 3,135 0 | 15 Aug 2013
Efficient Estimation of Word Representations in Vector Space | Tomas Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 677 31,512 0 | 16 Jan 2013