Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1903.07785
Cited By
Cloze-driven Pretraining of Self-attention Networks
19 March 2019
Alexei Baevski
Sergey Edunov
Yinhan Liu
Luke Zettlemoyer
Michael Auli
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Cloze-driven Pretraining of Self-attention Networks"
26 / 26 papers shown
Title
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
76
3,141
0
01 Apr 2019
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
Jason Phang
Thibault Févry
Samuel R. Bowman
67
467
0
02 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
882
93,936
0
11 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
88
389
0
28 Sep 2018
Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis
Kelly W. Zhang
Samuel R. Bowman
55
72
0
26 Sep 2018
Scaling Neural Machine Translation
Myle Ott
Sergey Edunov
David Grangier
Michael Auli
AIMat
149
611
0
01 Jun 2018
Constituency Parsing with a Self-Attentive Encoder
Nikita Kitaev
Dan Klein
48
537
0
02 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
594
7,080
0
20 Apr 2018
Learning Word Vectors for 157 Languages
Edouard Grave
Piotr Bojanowski
Prakhar Gupta
Armand Joulin
Tomas Mikolov
SSL
FaML
80
1,421
0
19 Feb 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
99
11,520
0
15 Feb 2018
Learned in Translation: Contextualized Word Vectors
Bryan McCann
James Bradbury
Caiming Xiong
R. Socher
95
907
0
01 Aug 2017
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
I. Lopez-Gazpio
Lucia Specia
174
1,870
0
31 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
427
129,831
0
12 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
383
4,444
0
18 Apr 2017
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
Hakan Inan
Khashayar Khosravi
R. Socher
80
384
0
04 Nov 2016
Efficient softmax approximation for GPUs
Edouard Grave
Armand Joulin
Moustapha Cissé
David Grangier
Hervé Jégou
59
271
0
14 Sep 2016
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
51
731
0
20 Aug 2016
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
219
8,030
0
13 Aug 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
142
8,067
0
16 Jun 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.3K
192,638
0
10 Dec 2015
Semi-supervised Sequence Learning
Andrew M. Dai
Quoc V. Le
SSL
96
1,232
0
04 Nov 2015
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
151
7,683
0
31 Aug 2015
Character-Aware Neural Language Models
Yoon Kim
Yacine Jernite
David Sontag
Alexander M. Rush
64
1,665
0
26 Aug 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu
Ryan Kiros
R. Zemel
Ruslan Salakhutdinov
R. Urtasun
Antonio Torralba
Sanja Fidler
94
2,529
0
22 Jun 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
736
149,474
0
22 Dec 2014
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu
Tomas Mikolov
Yoshua Bengio
ODL
114
5,318
0
21 Nov 2012
1