Attention Is All You Need

12 June 2017

Papers citing "Attention Is All You Need"

43 / 2,193 papers shown

Title
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes Greg Yang 123 201 0 28 Oct 2019
DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition Zhao You Dan Su Jie Chen Chao Weng Dong Yu 71 13 0 28 Oct 2019
Thieves on Sesame Street! Model Extraction of BERT-based APIs Kalpesh Krishna Gaurav Singh Tomar Ankur P. Parikh Nicolas Papernot Mohit Iyyer MIACV MLAU 112 201 0 27 Oct 2019
Fast Structured Decoding for Sequence Models Zhiqing Sun Zhuohan Li Haoqing Wang Zi Lin Di He Zhihong Deng 85 122 0 25 Oct 2019
HUBERT Untangles BERT to Improve Transfer across NLP Tasks M. Moradshahi Hamid Palangi M. Lam P. Smolensky Jianfeng Gao 106 16 0 25 Oct 2019
Towards Online End-to-end Transformer Automatic Speech Recognition E. Tsunoo Yosuke Kashiwagi Toshiyuki Kumakura Shinji Watanabe 65 32 0 25 Oct 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders Andy T. Liu Shu-Wen Yang Po-Han Chi Po-Chun Hsu Hung-yi Lee SSL 150 374 0 25 Oct 2019
Conversational Emotion Analysis via Attention Mechanisms Zheng Lian J. Tao Bin Liu Jian Huang 51 27 0 24 Oct 2019
U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging Mathias Perslev M. Jensen S. Darkner P. Jennum Christian Igel AI4TS 134 251 0 24 Oct 2019
Controlling the Output Length of Neural Machine Translation Surafel Melaku Lakew Mattia Antonino Di Gangi Marcello Federico 113 69 0 23 Oct 2019
A Transformer with Interleaved Self-attention and Convolution for Hybrid Acoustic Models Liang Lu 79 4 0 23 Oct 2019
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks Andros Tjandra Chunxi Liu Frank Zhang Xiaohui Zhang Yongqiang Wang Gabriel Synnaeve Satoshi Nakamura Geoffrey Zweig ViT 69 46 0 23 Oct 2019
Discovering the Compositional Structure of Vector Representations with Role Learning Networks Paul Soulos R. Thomas McCoy Tal Linzen P. Smolensky CoGe 89 44 0 21 Oct 2019
A Deep Reinforced Model for Abstractive Summarization Romain Paulus Caiming Xiong R. Socher AI4TS 206 1,559 0 11 May 2017
Convolutional Sequence to Sequence Learning Jonas Gehring Michael Auli David Grangier Denis Yarats Yann N. Dauphin AIMat 171 3,289 0 08 May 2017
Factorization tricks for LSTM networks Oleksii Kuchaiev Boris Ginsburg 61 113 0 31 Mar 2017
Massive Exploration of Neural Machine Translation Architectures D. Britz Anna Goldie Minh-Thang Luong Quoc V. Le 63 519 0 11 Mar 2017
A Structured Self-attentive Sentence Embedding Zhouhan Lin Minwei Feng Cicero Nogueira dos Santos Mo Yu Bing Xiang Bowen Zhou Yoshua Bengio 115 2,141 0 09 Mar 2017
Structured Attention Networks Yoon Kim Carl Denton Luong Hoang Alexander M. Rush 116 463 0 03 Feb 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer Noam M. Shazeer Azalia Mirhoseini Krzysztof Maziarz Andy Davis Quoc V. Le Geoffrey E. Hinton J. Dean MoE 253 2,686 0 23 Jan 2017
Neural Machine Translation in Linear Time Nal Kalchbrenner L. Espeholt Karen Simonyan Aaron van den Oord Alex Graves Koray Kavukcuoglu AIMat 115 553 0 31 Oct 2016
Can Active Memory Replace Attention? Lukasz Kaiser Samy Bengio 67 59 0 27 Oct 2016
Xception: Deep Learning with Depthwise Separable Convolutions François Chollet MDE BDL PINN 1.4K 14,608 0 07 Oct 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Zhiwen Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 911 6,796 0 26 Sep 2016
Using the Output Embedding to Improve Language Models Ofir Press Lior Wolf 89 736 0 20 Aug 2016
Layer Normalization Jimmy Lei Ba J. Kiros Geoffrey E. Hinton 423 10,531 0 21 Jul 2016
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation Jie Zhou Ying Cao Xuguang Wang Peng Li Wenyuan Xu AIMat 64 217 0 14 Jun 2016
Recurrent Neural Network Grammars Chris Dyer A. Kuncoro Miguel Ballesteros Noah A. Smith GNN 91 527 0 25 Feb 2016
Exploring the Limits of Language Modeling Rafal Jozefowicz Oriol Vinyals M. Schuster Noam M. Shazeer Yonghui Wu 201 1,145 0 07 Feb 2016
Long Short-Term Memory-Networks for Machine Reading Jianpeng Cheng Li Dong Mirella Lapata AIMat RALM 109 1,123 0 25 Jan 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,426 0 10 Dec 2015
Rethinking the Inception Architecture for Computer Vision Christian Szegedy Vincent Vanhoucke Sergey Ioffe Jonathon Shlens Z. Wojna 3DV BDL 886 27,416 0 02 Dec 2015
Neural GPUs Learn Algorithms Lukasz Kaiser Ilya Sutskever 84 370 0 25 Nov 2015
Multi-task Sequence to Sequence Learning Minh-Thang Luong Quoc V. Le Ilya Sutskever Oriol Vinyals Lukasz Kaiser AIMat 116 808 0 19 Nov 2015
Neural Machine Translation of Rare Words with Subword Units Rico Sennrich Barry Haddow Alexandra Birch 228 7,757 0 31 Aug 2015
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 407 7,969 0 17 Aug 2015
Grammar as a Foreign Language Oriol Vinyals Lukasz Kaiser Terry Koo Slav Petrov Ilya Sutskever Geoffrey E. Hinton 121 932 0 23 Dec 2014
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 2.0K 150,312 0 22 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Junyoung Chung Çağlar Gülçehre Kyunghyun Cho Yoshua Bengio 601 12,741 0 11 Dec 2014
Sequence to Sequence Learning with Neural Networks Ilya Sutskever Oriol Vinyals Quoc V. Le AIMat 443 20,590 0 10 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 578 27,327 0 01 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation Kyunghyun Cho B. V. Merrienboer Çağlar Gülçehre Dzmitry Bahdanau Fethi Bougares Holger Schwenk Yoshua Bengio AIMat 1.1K 23,388 0 03 Jun 2014
Generating Sequences With Recurrent Neural Networks Alex Graves GAN 164 4,039 0 04 Aug 2013