ResearchTrend.AI
Attention Is All You Need

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
    3DV

Papers citing "Attention Is All You Need"

50 / 27,337 papers shown
code2vec: Learning Distributed Representations of Code
Uri Alon
Meital Zilberstein
Omer Levy
Eran Yahav
90
1,188
0
26 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
68
827
0
23 Mar 2018
Attention, Learn to Solve Routing Problems!
W. Kool
H. V. Hoof
Max Welling
137
1,236
0
22 Mar 2018
AllenNLP: A Deep Semantic Natural Language Processing Platform
Matt Gardner
Joel Grus
Mark Neumann
Oyvind Tafjord
Pradeep Dasigi
Nelson F. Liu
Matthew E. Peters
Michael Schmitz
Luke Zettlemoyer
VLM
102
1,284
0
20 Mar 2018
GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs
Jiani Zhang
Xingjian Shi
Junyuan Xie
Hao Ma
Irwin King
Dit-Yan Yeung
GNN
122
573
0
20 Mar 2018
Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation
Felix Stahlberg
Danielle Saunders
Gonzalo Iglesias
Bill Byrne
63
11
0
20 Mar 2018
English-Catalan Neural Machine Translation in the Biomedical Domain through the cascade approach
Marta R. Costa-jussá
Noe Casas
Maite Melero
35
5
0
19 Mar 2018
Learning Region Features for Object Detection
Jiayuan Gu
Han Hu
Liwei Wang
Yichen Wei
Jifeng Dai
ObjD
90
79
0
19 Mar 2018
Towards Explanation of DNN-based Prediction with Guided Feature Inversion
Mengnan Du
Ninghao Liu
Qingquan Song
Helen Zhou
FAtt
97
127
0
19 Mar 2018
Tensor2Tensor for Neural Machine Translation
Ashish Vaswani
Samy Bengio
E. Brevdo
François Chollet
Aidan Gomez
...
Nal Kalchbrenner
Niki Parmar
Ryan Sepassi
Noam M. Shazeer
Jakob Uszkoreit
98
530
0
16 Mar 2018
TBD: Benchmarking and Analyzing Deep Neural Network Training
Hongyu Zhu
Mohamed Akrout
Bojian Zheng
Andrew Pelegris
Amar Phanishayee
Bianca Schroeder
Gennady Pekhimenko
90
80
0
16 Mar 2018
Achieving Human Parity on Automatic Chinese to English News Translation
Hany Hassan
Anthony Aue
Chang Chen
Vishal Chowdhary
Jonathan Clark
...
Shuangzhi Wu
Yingce Xia
Dongdong Zhang
Zhirui Zhang
Ming Zhou
96
607
0
15 Mar 2018
LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Kai Xu
Dawei Li
N. Cassimatis
Xiaolong Wang
68
97
0
13 Mar 2018
Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection
Andy Brown
Aaron Tuor
Brian Hutchinson
Nicole Nichols
42
173
0
13 Mar 2018
The Importance of Being Recurrent for Modeling Hierarchical Structure
Ke M. Tran
Arianna Bisazza
Christof Monz
89
150
0
09 Mar 2018
Fast Decoding in Sequence Models using Discrete Latent Variables
Łukasz Kaiser
Aurko Roy
Ashish Vaswani
Niki Parmar
Samy Bengio
Jakob Uszkoreit
Noam M. Shazeer
83
232
0
09 Mar 2018
Compositional Attention Networks for Machine Reasoning
Drew A. Hudson
Christopher D. Manning
BDL, OOD, LRM
203
578
0
08 Mar 2018
Generating Contradictory, Neutral, and Entailing Sentences
Songlin Yang
Shawn Tan
Chin-Wei Huang
Aaron Courville
35
3
0
07 Mar 2018
Self-Attention with Relative Position Representations
Peter Shaw
Jakob Uszkoreit
Ashish Vaswani
203
2,314
0
06 Mar 2018
Norm matters: efficient and accurate normalization schemes in deep networks
Elad Hoffer
Ron Banner
Itay Golan
Daniel Soudry
OffRL
90
179
0
05 Mar 2018
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
Minhao Cheng
Jinfeng Yi
Pin-Yu Chen
Huan Zhang
Cho-Jui Hsieh
SILM, AAML
116
245
0
03 Mar 2018
XNMT: The eXtensible Neural Machine Translation Toolkit
Graham Neubig
Matthias Sperber
Xinyi Wang
Matthieu Felix
Austin Matthews
...
Philip Arthur
Pierre Godard
John Hewitt
Rachid Riad
Liming Wang
77
67
0
01 Mar 2018
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Trieu H. Trinh
Andrew M. Dai
Thang Luong
Quoc V. Le
98
181
0
01 Mar 2018
Analyzing Uncertainty in Neural Machine Translation
Myle Ott
Michael Auli
David Grangier
Marc'Aurelio Ranzato
UQLM
182
275
0
28 Feb 2018
Pop Music Highlighter: Marking the Emotion Keypoints
Yu-Siang Huang
Szu-Yu Chou
Yi-Hsuan Yang
36
17
0
28 Feb 2018
Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta
Tomer Koren
Y. Singer
ODL
115
226
0
26 Feb 2018
Can Neural Networks Understand Logical Entailment?
Richard Evans
D. Saxton
David Amos
Pushmeet Kohli
Edward Grefenstette
NAI
196
128
0
23 Feb 2018
Efficient Neural Audio Synthesis
Nal Kalchbrenner
Erich Elsen
Karen Simonyan
Seb Noury
Norman Casagrande
Edward Lockhart
Florian Stimberg
Aaron van den Oord
Sander Dieleman
Koray Kavukcuoglu
99
872
0
23 Feb 2018
Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms
Ashok Vardhan Makkuva
Sewoong Oh
Sreeram Kannan
Pramod Viswanath
MoE
47
19
0
21 Feb 2018
Attentive Tensor Product Learning
Qiuyuan Huang
Li Deng
D. Wu
Chang Liu
Xiaodong He
82
23
0
20 Feb 2018
Fitting New Speakers Based on a Short Untranscribed Sample
Eliya Nachmani
Adam Polyak
Yaniv Taigman
Lior Wolf
53
84
0
20 Feb 2018
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
Jason D. Lee
Elman Mansimov
Kyunghyun Cho
DiffM, BDL
97
456
0
19 Feb 2018
Global Pose Estimation with an Attention-based Recurrent Network
Emilio Parisotto
Devendra Singh Chaplot
Jian Zhang
Ruslan Salakhutdinov
58
70
0
19 Feb 2018
Building a Word Segmenter for Sanskrit Overnight
V. Reddy
Amrith Krishna
V. Sharma
Prateek Gupta
Vineeth M. R.
Pawan Goyal
43
18
0
17 Feb 2018
Image Transformer
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
ViT
161
1,691
0
15 Feb 2018
Model compression via distillation and quantization
A. Polino
Razvan Pascanu
Dan Alistarh
MQ
103
734
0
15 Feb 2018
Universal Neural Machine Translation for Extremely Low Resource Languages
Jiatao Gu
Hany Hassan
Jacob Devlin
Victor O.K. Li
101
278
0
15 Feb 2018
Multimodal Generative Models for Scalable Weakly-Supervised Learning
Mike Wu
Noah D. Goodman
DRL
106
383
0
14 Feb 2018
Neural Voice Cloning with a Few Samples
Sercan O. Arik
Jitong Chen
Kainan Peng
Ming-Yu Liu
Yanqi Zhou
82
388
0
14 Feb 2018
$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
Qi Meng
Shuxin Zheng
Huishuai Zhang
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
133
39
0
11 Feb 2018
Tree-to-tree Neural Networks for Program Translation
Xinyun Chen
Chang-rui Liu
Basel Alomair
107
279
0
11 Feb 2018
On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks
Yukun Ding
Jinglan Liu
Jinjun Xiong
Yiyu Shi
MQ
120
21
0
10 Feb 2018
Online Learning for Effort Reduction in Interactive Neural Machine Translation
Álvaro Peris
F. Casacuberta
72
49
0
10 Feb 2018
Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning
Myeongjun Jang
Seungwan Seo
Pilsung Kang
DRL
88
57
0
09 Feb 2018
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
Yun Chen
Yang Liu
Victor O.K. Li
149
48
0
09 Feb 2018
Question-Answer Selection in User to User Marketplace Conversations
Girish Kumar
Matthew Henderson
Shannon Chan
Hoang-Diep Nguyen
L. Ngoo
50
8
0
06 Feb 2018
Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Tao Shen
Dinesh Manocha
Guodong Long
Jing Jiang
Sen Wang
Chengqi Zhang
AI4TS
142
144
0
31 Jan 2018
Generating Wikipedia by Summarizing Long Sequences
Peter J. Liu
Mohammad Saleh
Etienne Pot
Ben Goodrich
Ryan Sepassi
Lukasz Kaiser
Noam M. Shazeer
CVBM
229
801
0
30 Jan 2018
Discrete Autoencoders for Sequence Models
Lukasz Kaiser
Samy Bengio
BDL
94
50
0
29 Jan 2018
Multi-Pointer Co-Attention Networks for Recommendation
Yi Tay
Anh Tuan Luu
S. Hui
3DV
191
290
0
28 Jan 2018