Attention Is All You Need

Versions: v1, v2, v3, v4, v5, v6, v7 (latest)

12 June 2017

Noam M. Shazeer

Jakob Uszkoreit

Illia Polosukhin

Papers citing "Attention Is All You Need"

Showing 17 of 27,017 citing papers.

Dual Supervised Learning. Yingce Xia, Tao Qin, Wei-neng Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu. Tags: SSL. 140, 143, 0. 03 Jul 2017
VAIN: Attentional Multi-agent Predictive Modeling. Yedid Hoshen. Tags: GNN. 103, 240, 0. 19 Jun 2017
One Model To Learn Them All. Lukasz Kaiser, Aidan Gomez, Noam M. Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit. Tags: VLM, ViT. 82, 334, 0. 16 Jun 2017
Depthwise Separable Convolutions for Neural Machine Translation. Lukasz Kaiser, Aidan Gomez, François Chollet. 74, 279, 0. 09 Jun 2017
Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs. Jean Maillard, S. Clark, Dani Yogatama. 77, 89, 0. 25 May 2017
Recurrent Additive Networks. Kenton Lee, Omer Levy, Luke Zettlemoyer. Tags: GNN, AI4CE. 90, 38, 0. 21 May 2017
Reinforced Mnemonic Reader for Machine Reading Comprehension. Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, Ming Zhou. Tags: RALM, AIMat. 97, 69, 0. 08 May 2017
Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU. Jacob Devlin. 73, 36, 0. 04 May 2017
Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets. Zhen-Le Yang, Wei Chen, Feng Wang, Bo Xu. Tags: GAN, AI4CE. 89, 170, 0. 15 Mar 2017
Structured Attention Networks. Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush. 146, 463, 0. 03 Feb 2017
Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey. L. Ferrone, Fabio Massimo Zanzotto. 43, 38, 0. 02 Feb 2017
Deep Reinforcement Learning: An Overview. Yuxi Li. Tags: OffRL, VLM. 307, 1,548, 0. 25 Jan 2017
Boosting Neural Machine Translation. Dakun Zhang, Jungi Kim, Josep Crego, Jean Senellart. Tags: AI4CE. 73, 26, 0. 19 Dec 2016
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model. Marcella Cornia, Lorenzo Baraldi, G. Serra, Rita Cucchiara. 122, 551, 0. 29 Nov 2016
One Sentence One Model for Neural Machine Translation. Xiaoqing Li, Jiajun Zhang, Chengqing Zong. Tags: AI4CE. 165, 62, 0. 21 Sep 2016
Quantifying the probable approximation error of probabilistic inference programs. Marco F. Cusumano-Towner, Vikash K. Mansinghka. 100, 7, 0. 31 May 2016
Impact of Power System Partitioning on the Efficiency of Distributed Multi-Step Optimization. Dongliang Chen, A. Bucchiarone, Zhihan Lv. 42, 12, 0. 31 May 2016