Attention Is All You Need

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
    3DV

Papers citing "Attention Is All You Need"

50 / 18,803 papers shown
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
89
895
0
16 Aug 2019
Survey on Deep Neural Networks in Speech and Vision Systems
M. Alam
Manar D. Samad
Lasitha Vidyaratne
Alexander M. Glandon
Khan M. Iftekharuddin
3DV
VLM
AI4TS
34
205
0
16 Aug 2019
Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding
Jean-Baptiste Remy
A. Tixier
Michalis Vazirgiannis
32
5
0
16 Aug 2019
Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning
Zhijiang Guo
Yan Zhang
Zhiyang Teng
Wei Lu
GNN
24
130
0
16 Aug 2019
Incorporating Word and Subword Units in Unsupervised Machine Translation Using Language Model Rescoring
Zihan Liu
Yan Xu
Genta Indra Winata
Pascale Fung
23
22
0
16 Aug 2019
Visualizing and Understanding the Effectiveness of BERT
Y. Hao
Li Dong
Furu Wei
Ke Xu
27
181
0
15 Aug 2019
A Multi-Turn Emotionally Engaging Dialog Model
Yubo Xie
Ekaterina Svikhnushina
P. Pu
16
15
0
15 Aug 2019
A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning
Minghao Hu
Yuxing Peng
Zhen Huang
Dongsheng Li
AIMat
LRM
32
90
0
15 Aug 2019
A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning
Pengfei Wang
Chengquan Zhang
Fei Qi
Zuming Huang
Mengyi En
Junyu Han
Jingtuo Liu
Errui Ding
Guangming Shi
32
59
0
15 Aug 2019
Adaptive Regularization of Labels
Qianggang Ding
Sifan Wu
Hao Sun
Jiadong Guo
Shutao Xia
ODL
24
29
0
15 Aug 2019
Temporal Collaborative Ranking Via Personalized Transformer
Liwei Wu
Shuqing Li
Cho-Jui Hsieh
James Sharpnack
AI4TS
24
4
0
15 Aug 2019
Towards Making the Most of BERT in Neural Machine Translation
Jiacheng Yang
Mingxuan Wang
Hao Zhou
Chengqi Zhao
Yong Yu
Weinan Zhang
Lei Li
CLL
21
156
0
15 Aug 2019
Multi-Task Self-Supervised Learning for Disfluency Detection
Shaolei Wang
Wanxiang Che
Qi Liu
Pengda Qin
Ting Liu
William Yang Wang
SSL
22
56
0
15 Aug 2019
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
Jiangfan Han
Xiaoyi Dong
Ruimao Zhang
Dongdong Chen
Weiming Zhang
Nenghai Yu
Ping Luo
Xiaogang Wang
AAML
24
28
0
14 Aug 2019
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
Oren Barkan
Noam Razin
Itzik Malkiel
Ori Katz
Avi Caciularu
Noam Koenigstein
FedML
25
37
0
14 Aug 2019
SG-Net: Syntax-Guided Machine Reading Comprehension
ZhuoSheng Zhang
Yuwei Wu
Junru Zhou
Sufeng Duan
Hai Zhao
Rui Wang
33
187
0
14 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
17
173
0
14 Aug 2019
Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Yu Chen
Lingfei Wu
Mohammed J Zaki
GNN
19
154
0
14 Aug 2019
Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
Yufang Hou
10
0
0
13 Aug 2019
Complicated Table Structure Recognition
Zewen Chi
Heyan Huang
Heng-Da Xu
Houjin Yu
Wanxuan Yin
Xian-Ling Mao
LMTD
14
107
0
13 Aug 2019
On Identifiability in Transformers
Gino Brunner
Yang Liu
Damian Pascual
Oliver Richter
Massimiliano Ciaramita
Roger Wattenhofer
ViT
30
186
0
12 Aug 2019
LIP: Local Importance-based Pooling
Ziteng Gao
Limin Wang
Gangshan Wu
FAtt
39
94
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
A Review of Cooperative Multi-Agent Deep Reinforcement Learning
Afshin Oroojlooyjadid
Davood Hajinezhad
53
413
0
11 Aug 2019
Exploiting Temporal Relationships in Video Moment Localization with Natural Language
Songyang Zhang
Jinsong Su
Jiebo Luo
12
74
0
11 Aug 2019
SCAR: Spatial-/Channel-wise Attention Regression Networks for Crowd Counting
Junyu Gao
Qi Wang
Yuan Yuan
25
191
0
10 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
25
82
0
10 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
82
1,919
0
09 Aug 2019
Biologically-inspired Salience Affected Artificial Neural Network (SANN)
Leendert A. Remmelzwaal
George F. R. Ellis
J. Tapson
Amit K Mishra
18
3
0
09 Aug 2019
GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing
Xiaohong Liu
Yongrui Ma
Zhihao Shi
Jun Chen
58
741
0
08 Aug 2019
Advocacy Learning: Learning through Competition and Class-Conditional Representations
Ian Fox
Jenna Wiens
SSL
25
2
0
07 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
111
3,634
0
06 Aug 2019
Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification
Chih-Ting Liu
Chih-Wei Wu
Y. Wang
Shao-Yi Chien
3DPC
32
82
0
05 Aug 2019
Predicting Actions to Help Predict Translations
Zixiu "Alex" Wu
Julia Ive
Josiah Wang
Pranava Madhyastha
Lucia Specia
17
7
0
05 Aug 2019
Theme-Aware Aesthetic Distribution Prediction With Full-Resolution Photographs
Gengyun Jia
Peipei Li
Ran He
27
12
0
04 Aug 2019
Attentive Normalization
Xilai Li
Wei Sun
Tianfu Wu
OOD
ViT
28
31
0
04 Aug 2019
ABD-Net: Attentive but Diverse Person Re-Identification
Tianlong Chen
Shaojin Ding
Jingyi Xie
Ye Yuan
Wuyang Chen
Yang Yang
Zhou Ren
Zhangyang Wang
27
478
0
03 Aug 2019
Tree-Transformer: A Transformer-Based Method for Correction of Tree-Structured Data
Jacob A. Harer
Christopher P. Reale
Peter Chin
25
44
0
01 Aug 2019
Dolphin: A Spoken Language Proficiency Assessment System for Elementary Education
Wenbiao Ding
Guowei Xu
Tianqiao Liu
Weiping Fu
Y. Song
Chaoyou Guo
Cong Kong
Songfan Yang
Gale Yan Huang
Zitao Liu
17
21
0
01 Aug 2019
MSnet: A BERT-based Network for Gendered Pronoun Resolution
Zili Wang
21
4
0
01 Aug 2019
Expectation-Maximization Attention Networks for Semantic Segmentation
Xia Li
Zhisheng Zhong
Jianlong Wu
Yibo Yang
Zhouchen Lin
Hong Liu
3DV
3DPC
14
553
0
31 Jul 2019
English-Czech Systems in WMT19: Document-Level Transformer
Martin Popel
Dominik Macháček
Michal Auersperger
Ondřej Bojar
Pavel Pecina
11
22
0
30 Jul 2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
S. Rothe
Shashi Narayan
Aliaksei Severyn
SILM
71
433
0
29 Jul 2019
DLGNet: A Transformer-based Model for Dialogue Response Generation
O. Olabiyi
Erik T. Mueller
16
12
0
26 Jul 2019
Investigating Self-Attention Network for Chinese Word Segmentation
Leilei Gan
Yue Zhang
21
11
0
26 Jul 2019
Expressive Graph Informer Networks
Jaak Simm
Adam Arany
E. Brouwer
Yves Moreau
GNN
28
2
0
25 Jul 2019
DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
Zehui Lin
Pengfei Liu
Luyao Huang
Junkun Chen
Xipeng Qiu
Xuanjing Huang
3DPC
16
44
0
25 Jul 2019
AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks
Jingyuan Wang
Yang Zhang
Ke Tang
Junjie Wu
Zhang Xiong
AIFin
24
119
0
24 Jul 2019
Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation
KyungHyun Kim
Donghyun Kwak
Hanock Kwak
Young-Jin Park
Sangkwon Sim
Jae-Han Cho
Minkyu Kim
Jihun Kwon
Nako Sung
Jung-Woo Ha
13
19
0
24 Jul 2019
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
Aditi Chaudhary
Elizabeth Salesky
G. Bhat
David R. Mortensen
J. Carbonell
Yulia Tsvetkov
24
4
0
23 Jul 2019