1706.03762
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing "Attention Is All You Need"
43 / 2,193 papers shown
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
Greg Yang
123
201
0
28 Oct 2019
DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition
Zhao You
Dan Su
Jie Chen
Chao Weng
Dong Yu
71
13
0
28 Oct 2019
Thieves on Sesame Street! Model Extraction of BERT-based APIs
Kalpesh Krishna
Gaurav Singh Tomar
Ankur P. Parikh
Nicolas Papernot
Mohit Iyyer
MIACV
MLAU
112
201
0
27 Oct 2019
Fast Structured Decoding for Sequence Models
Zhiqing Sun
Zhuohan Li
Haoqing Wang
Zi Lin
Di He
Zhihong Deng
85
122
0
25 Oct 2019
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
M. Moradshahi
Hamid Palangi
M. Lam
P. Smolensky
Jianfeng Gao
106
16
0
25 Oct 2019
Towards Online End-to-end Transformer Automatic Speech Recognition
E. Tsunoo
Yosuke Kashiwagi
Toshiyuki Kumakura
Shinji Watanabe
65
32
0
25 Oct 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
150
374
0
25 Oct 2019
Conversational Emotion Analysis via Attention Mechanisms
Zheng Lian
J. Tao
Bin Liu
Jian Huang
51
27
0
24 Oct 2019
U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging
Mathias Perslev
M. Jensen
S. Darkner
P. Jennum
Christian Igel
AI4TS
134
251
0
24 Oct 2019
Controlling the Output Length of Neural Machine Translation
Surafel Melaku Lakew
Mattia Antonino Di Gangi
Marcello Federico
113
69
0
23 Oct 2019
A Transformer with Interleaved Self-attention and Convolution for Hybrid Acoustic Models
Liang Lu
79
4
0
23 Oct 2019
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra
Chunxi Liu
Frank Zhang
Xiaohui Zhang
Yongqiang Wang
Gabriel Synnaeve
Satoshi Nakamura
Geoffrey Zweig
ViT
69
46
0
23 Oct 2019
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Paul Soulos
R. Thomas McCoy
Tal Linzen
P. Smolensky
CoGe
89
44
0
21 Oct 2019
A Deep Reinforced Model for Abstractive Summarization
Romain Paulus
Caiming Xiong
R. Socher
AI4TS
206
1,559
0
11 May 2017
Convolutional Sequence to Sequence Learning
Jonas Gehring
Michael Auli
David Grangier
Denis Yarats
Yann N. Dauphin
AIMat
171
3,289
0
08 May 2017
Factorization tricks for LSTM networks
Oleksii Kuchaiev
Boris Ginsburg
61
113
0
31 Mar 2017
Massive Exploration of Neural Machine Translation Architectures
D. Britz
Anna Goldie
Minh-Thang Luong
Quoc V. Le
63
519
0
11 Mar 2017
A Structured Self-attentive Sentence Embedding
Zhouhan Lin
Minwei Feng
Cicero Nogueira dos Santos
Mo Yu
Bing Xiang
Bowen Zhou
Yoshua Bengio
115
2,141
0
09 Mar 2017
Structured Attention Networks
Yoon Kim
Carl Denton
Luong Hoang
Alexander M. Rush
116
463
0
03 Feb 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
253
2,686
0
23 Jan 2017
Neural Machine Translation in Linear Time
Nal Kalchbrenner
L. Espeholt
Karen Simonyan
Aaron van den Oord
Alex Graves
Koray Kavukcuoglu
AIMat
115
553
0
31 Oct 2016
Can Active Memory Replace Attention?
Lukasz Kaiser
Samy Bengio
67
59
0
27 Oct 2016
Xception: Deep Learning with Depthwise Separable Convolutions
François Chollet
MDE
BDL
PINN
1.4K
14,608
0
07 Oct 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhifeng Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
911
6,796
0
26 Sep 2016
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
89
736
0
20 Aug 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
423
10,531
0
21 Jul 2016
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
Jie Zhou
Ying Cao
Xuguang Wang
Peng Li
Wenyuan Xu
AIMat
64
217
0
14 Jun 2016
Recurrent Neural Network Grammars
Chris Dyer
A. Kuncoro
Miguel Ballesteros
Noah A. Smith
GNN
91
527
0
25 Feb 2016
Exploring the Limits of Language Modeling
Rafal Jozefowicz
Oriol Vinyals
M. Schuster
Noam M. Shazeer
Yonghui Wu
201
1,145
0
07 Feb 2016
Long Short-Term Memory-Networks for Machine Reading
Jianpeng Cheng
Li Dong
Mirella Lapata
AIMat
RALM
109
1,123
0
25 Jan 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
886
27,416
0
02 Dec 2015
Neural GPUs Learn Algorithms
Lukasz Kaiser
Ilya Sutskever
84
370
0
25 Nov 2015
Multi-task Sequence to Sequence Learning
Minh-Thang Luong
Quoc V. Le
Ilya Sutskever
Oriol Vinyals
Lukasz Kaiser
AIMat
116
808
0
19 Nov 2015
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
228
7,757
0
31 Aug 2015
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
407
7,969
0
17 Aug 2015
Grammar as a Foreign Language
Oriol Vinyals
Lukasz Kaiser
Terry Koo
Slav Petrov
Ilya Sutskever
Geoffrey E. Hinton
121
932
0
23 Dec 2014
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.0K
150,312
0
22 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
601
12,741
0
11 Dec 2014
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
443
20,590
0
10 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
578
27,327
0
01 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
1.1K
23,388
0
03 Jun 2014
Generating Sequences With Recurrent Neural Networks
Alex Graves
GAN
164
4,039
0
04 Aug 2013