arXiv:1706.03762
Attention Is All You Need

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
    3DV

Papers citing "Attention Is All You Need"

50 / 19,017 papers shown
Title
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Trieu H. Trinh
Andrew M. Dai
Thang Luong
Quoc V. Le
41
179
0
01 Mar 2018
Analyzing Uncertainty in Neural Machine Translation
Myle Ott
Michael Auli
David Grangier
Marc'Aurelio Ranzato
UQLM
43
271
0
28 Feb 2018
Pop Music Highlighter: Marking the Emotion Keypoints
Yu-Siang Huang
Szu-Yu Chou
Yi-Hsuan Yang
23
17
0
28 Feb 2018
Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta
Tomer Koren
Y. Singer
ODL
34
201
0
26 Feb 2018
Efficient Neural Audio Synthesis
Nal Kalchbrenner
Erich Elsen
Karen Simonyan
Seb Noury
Norman Casagrande
Edward Lockhart
Florian Stimberg
Aaron van den Oord
Sander Dieleman
Koray Kavukcuoglu
50
864
0
23 Feb 2018
Attentive Tensor Product Learning
Qiuyuan Huang
Li Deng
D. Wu
Chang Liu
Xiaodong He
27
23
0
20 Feb 2018
Fitting New Speakers Based on a Short Untranscribed Sample
Eliya Nachmani
Adam Polyak
Yaniv Taigman
Lior Wolf
24
84
0
20 Feb 2018
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
Jason Lee
Elman Mansimov
Kyunghyun Cho
DiffM
BDL
42
455
0
19 Feb 2018
Global Pose Estimation with an Attention-based Recurrent Network
Emilio Parisotto
Devendra Singh Chaplot
Jian Zhang
Ruslan Salakhutdinov
26
70
0
19 Feb 2018
Universal Neural Machine Translation for Extremely Low Resource Languages
Jiatao Gu
Hany Hassan
Jacob Devlin
V. Li
35
275
0
15 Feb 2018
Multimodal Generative Models for Scalable Weakly-Supervised Learning
Mike Wu
Noah D. Goodman
DRL
39
378
0
14 Feb 2018
$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
Qi Meng
Shuxin Zheng
Huishuai Zhang
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
35
38
0
11 Feb 2018
Tree-to-tree Neural Networks for Program Translation
Xinyun Chen
Chang-rui Liu
D. Song
18
275
0
11 Feb 2018
On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks
Yukun Ding
Jinglan Liu
Jinjun Xiong
Yiyu Shi
MQ
37
21
0
10 Feb 2018
Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning
Myeongjun Jang
Seungwan Seo
Pilsung Kang
DRL
51
55
0
09 Feb 2018
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
Yun Chen
Yang Liu
V. Li
41
47
0
09 Feb 2018
Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Tao Shen
Tianyi Zhou
Guodong Long
Jing Jiang
Sen Wang
Chengqi Zhang
AI4TS
45
144
0
31 Jan 2018
Multi-Pointer Co-Attention Networks for Recommendation
Yi Tay
Anh Tuan Luu
S. Hui
3DV
29
287
0
28 Jan 2018
Context Models for OOV Word Translation in Low-Resource Languages
Angli Liu
Katrin Kirchhoff
29
9
0
26 Jan 2018
MaskGAN: Better Text Generation via Filling in the______
W. Fedus
Ian Goodfellow
Andrew M. Dai
24
468
0
23 Jan 2018
Fix your classifier: the marginal value of training the last weight layer
Elad Hoffer
Itay Hubara
Daniel Soudry
35
101
0
14 Jan 2018
Distance-based Self-Attention Network for Natural Language Inference
Jinbae Im
Sungzoon Cho
43
76
0
06 Dec 2017
Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks
Salman Mohammed
Peng Shi
Jimmy J. Lin
31
105
0
05 Dec 2017
Improving the Performance of Online Neural Transducer Models
Tara N. Sainath
Chung-Cheng Chiu
Rohit Prabhavalkar
Anjuli Kannan
Yonghui Wu
Patrick Nguyen
Zhehuai Chen
AI4TS
41
49
0
05 Dec 2017
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
...
Katya Gonina
Navdeep Jaitly
Bo Li
J. Chorowski
M. Bacchiani
AI4TS
19
1,149
0
05 Dec 2017
Deep Semantic Role Labeling with Self-Attention
Zhixing Tan
Mingxuan Wang
Jun Xie
Yidong Chen
X. Shi
33
308
0
05 Dec 2017
SkipNet: Learning Dynamic Routing in Convolutional Networks
Xin Wang
Feng Yu
Zi-Yi Dou
Trevor Darrell
Joseph E. Gonzalez
39
626
0
26 Nov 2017
Convolutional Image Captioning
J. Aneja
Aditya Deshpande
Alex Schwing
VLM
37
360
0
24 Nov 2017
Speech recognition for medical conversations
Chung-Cheng Chiu
Anshuman Tripathi
Katherine Chou
Chris Co
Navdeep Jaitly
...
Ananth Sankar
Justin Tansuwan
Nathan Wan
Yonghui Wu
Xuedong Zhang
LM&MA
40
84
0
20 Nov 2017
ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation
Chang Zhou
Jinze Bai
Junshuai Song
Xiaofei Liu
Zhengchao Zhao
Xiusi Chen
Jun Gao
HAI
41
306
0
17 Nov 2017
Image Matters: Visually modeling user behaviors using Advanced Model Server
T. Ge
Liqin Zhao
Guorui Zhou
Keyu Chen
Shuying Liu
...
Sui Huang
Qing Cui
Xiaoqiang Zhu
Yu Zhang
Kun Gai
32
41
0
17 Nov 2017
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension
Hsin-Yuan Huang
Chenguang Zhu
Yelong Shen
Weizhu Chen
FedML
38
183
0
16 Nov 2017
Classical Structured Prediction Losses for Sequence to Sequence Learning
Sergey Edunov
Myle Ott
Michael Auli
David Grangier
Marc'Aurelio Ranzato
AIMat
56
185
0
14 Nov 2017
QuickEdit: Editing Text & Translations by Crossing Words Out
David Grangier
Michael Auli
KELM
31
10
0
13 Nov 2017
Few-Shot Learning with Graph Neural Networks
Victor Garcia Satorras
Joan Bruna
GNN
54
1,230
0
10 Nov 2017
Attend and Diagnose: Clinical Time Series Analysis using Attention Models
Huan-Zhi Song
Deepta Rajan
Jayaraman J. Thiagarajan
A. Spanias
MLAU
52
447
0
10 Nov 2017
Attentional Pooling for Action Recognition
Rohit Girdhar
Deva Ramanan
24
319
0
04 Nov 2017
Fixing a Broken ELBO
Alexander A. Alemi
Ben Poole
Ian S. Fischer
Joshua V. Dillon
Rif A. Saurous
Kevin Patrick Murphy
DRL
BDL
39
80
0
01 Nov 2017
Paraphrase Generation with Deep Reinforcement Learning
Zichao Li
Xin Jiang
Lifeng Shang
Hang Li
OffRL
21
213
0
01 Nov 2017
Phase Conductor on Multi-layered Attentions for Machine Comprehension
R. Liu
Wei Wei
Weiguang Mao
M. Chikina
40
22
0
28 Oct 2017
Social Attention: Modeling Attention in Human Crowds
Anirudh Vemula
Katharina Muelling
Jean Oh
HAI
48
632
0
12 Oct 2017
Improving Lexical Choice in Neural Machine Translation
Toan Q. Nguyen
David Chiang
29
86
0
03 Oct 2017
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
Wenpeng Yin
Hinrich Schütze
35
41
0
02 Oct 2017
Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
L. T. Anh
M. Y. Arkhipov
M. Burtsev
18
37
0
27 Sep 2017
Generating Sentences by Editing Prototypes
Kelvin Guu
Tatsunori B. Hashimoto
Yonatan Oren
Percy Liang
30
316
0
26 Sep 2017
DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
Tao Shen
Tianyi Zhou
Guodong Long
Jing Jiang
Shirui Pan
Chengqi Zhang
16
749
0
14 Sep 2017
Natural Language Inference over Interaction Space
Yichen Gong
Heng Luo
Jian Zhang
26
264
0
13 Sep 2017
Deep Learning Techniques for Music Generation -- A Survey
Jean-Pierre Briot
Gaëtan Hadjeres
F. Pachet
MGen
39
298
0
05 Sep 2017
Squeeze-and-Excitation Networks
Jie Hu
Li Shen
Samuel Albanie
Gang Sun
Enhua Wu
123
26,077
0
05 Sep 2017
Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification
Yunlong Bian
Chuang Gan
Xiao-Chang Liu
Fu Li
Xiang Long
Yandong Li
Heng Qi
Jie Zhou
Shilei Wen
Yuanqing Lin
18
48
0
12 Aug 2017