ResearchTrend.AI
Attention Is All You Need

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
    3DV

Papers citing "Attention Is All You Need"

50 / 26,904 papers shown
Towards artificial general intelligence via a multimodal foundation model
Nanyi Fei
Zhiwu Lu
Yizhao Gao
Guoxing Yang
Yuqi Huo
...
Ruihua Song
Xin Gao
Tao Xiang
Haoran Sun
Jiling Wen
AI4CE, LRM
90
230
0
27 Oct 2021
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat
Idan Schwartz
Alex Schwing
86
31
0
27 Oct 2021
MixSeq: Connecting Macroscopic Time Series Forecasting with Microscopic Time Series Data
Zhibo Zhu
Ziqi Liu
Ge Jin
Qing Cui
Lei Chen
Jun Zhou
Jianyong Zhou
AI4TS
70
15
0
27 Oct 2021
SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning
Mattia Atzeni
Jasmina Bogojeska
Andreas Loukas
ReLM, LRM
73
15
0
27 Oct 2021
Pay attention to emoji: Feature Fusion Network with EmoGraph2vec Model for Sentiment Analysis
Xiaowei Yuan
Jingyuan Hu
Xiaodan Zhang
Honglei Lv
GNN
22
4
0
27 Oct 2021
Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects
Jeffrey Ichnowski
Yahav Avigal
Justin Kerr
Ken Goldberg
124
172
0
27 Oct 2021
Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
Silvia Corbara
Alejandro Moreo
Fabrizio Sebastiani
103
8
0
27 Oct 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
392
4,604
0
27 Oct 2021
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
A. Moudgil
Arjun Majumdar
Harsh Agrawal
Stefan Lee
Dhruv Batra
LM&Ro
84
61
0
27 Oct 2021
CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration
Hao Yu
Fu Li
Mahdi Saleh
Benjamin Busam
Slobodan Ilic
3DPC, 3DV
88
218
0
26 Oct 2021
NeuroBack: Improving CDCL SAT Solving using Graph Neural Networks
Wenxi Wang
Yang Hu
Mohit Tiwari
S. Khurshid
K. McMillan
Risto Miikkulainen
GNN, NAI
55
8
0
26 Oct 2021
Neural Program Generation Modulo Static Analysis
Rohan Mukherjee
Yeming Wen
Dipak Chaudhari
Thomas W. Reps
Swarat Chaudhuri
C. Jermaine
76
25
0
26 Oct 2021
Leveraging Local Temporal Information for Multimodal Scene Classification
Saurabh Sahu
Palash Goyal
ViT
29
0
0
26 Oct 2021
Revisiting Batch Norm Initialization
Jim Davis
Logan Frank
66
4
0
26 Oct 2021
Learning Collaborative Policies to Solve NP-hard Routing Problems
Minsu Kim
Jinkyoo Park
Joungho Kim
74
115
0
26 Oct 2021
Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
Albert Gu
Isys Johnson
Karan Goel
Khaled Kamal Saab
Tri Dao
Atri Rudra
Christopher Ré
132
614
0
26 Oct 2021
Adversarial Attacks and Defenses for Social Network Text Processing Applications: Techniques, Challenges and Future Research Directions
I. Alsmadi
Kashif Ahmad
Mahmoud Nazzal
Firoj Alam
Ala I. Al-Fuqaha
Abdallah Khreishah
A. Algosaibi
AAML
57
16
0
26 Oct 2021
NeRV: Neural Representations for Videos
Hao Chen
Bo He
Hanyu Wang
Yixuan Ren
Ser-Nam Lim
Abhinav Shrivastava
57
256
0
26 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
290
1,911
0
26 Oct 2021
Deep Explicit Duration Switching Models for Time Series
Abdul Fatir Ansari
Konstantinos Benidis
Richard Kurle
Ali Caner Turkmen
Harold Soh
Alex Smola
Yuyang Wang
Tim Januschowski
BDL
87
20
0
26 Oct 2021
Assessing Evaluation Metrics for Speech-to-Speech Translation
Elizabeth Salesky
Julian Mäder
Severin Klinger
74
15
0
26 Oct 2021
SE(3) Equivariant Graph Neural Networks with Complete Local Frames
Weitao Du
He Zhang
Yuanqi Du
Qi Meng
Wei Chen
Jia Zhang
Tie-Yan Liu
122
84
0
26 Oct 2021
Geometric Transformer for End-to-End Molecule Properties Prediction
Yoni Choukroun
Lior Wolf
AI4CE, ViT
75
16
0
26 Oct 2021
HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information
Wentao Xu
Weiqing Liu
Lewen Wang
Yingce Xia
Jiang Bian
Jian Yin
Tie-Yan Liu
AI4TS, AIFin
90
49
0
26 Oct 2021
Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot
Szymon Tworkowski
Michał Tyrolski
Lukasz Kaiser
Yuhuai Wu
Christian Szegedy
Henryk Michalewski
89
69
0
26 Oct 2021
BioIE: Biomedical Information Extraction with Multi-head Attention Enhanced Graph Convolutional Network
Jialun Wu
Yang Liu
Zeyu Gao
Tieliang Gong
Chunbao Wang
Chen Li
46
16
0
26 Oct 2021
s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning
Hangbo Bao
Li Dong
Wenhui Wang
Nan Yang
Furu Wei
61
11
0
26 Oct 2021
Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs
Nurendra Choudhary
Nikhil S. Rao
S. Katariya
Karthik Subbian
Chandan K. Reddy
81
37
0
26 Oct 2021
CLAUSEREC: A Clause Recommendation Framework for AI-aided Contract Authoring
V. Aggarwal
Aparna Garimella
Balaji Vasan Srinivasan
N. Anandhavelu
R. Jain
AILaw
88
11
0
26 Oct 2021
TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
Viet-Anh Nguyen
Anh H. T. Nguyen
Andy W. H. Khong
60
22
0
26 Oct 2021
Simultaneous Neural Machine Translation with Constituent Label Prediction
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
44
3
0
26 Oct 2021
Decomposing Complex Questions Makes Multi-Hop QA Easier and More Interpretable
Ruiliu Fu
Han Wang
Xuejun Zhang
Jun Zhou
Yonghong Yan
ReLM
67
33
0
26 Oct 2021
Towards More Generalizable One-shot Visual Imitation Learning
Zhao Mandi
Fangchen Liu
Kimin Lee
Pieter Abbeel
80
61
0
26 Oct 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
67
8
0
26 Oct 2021
IIP-Transformer: Intra-Inter-Part Transformer for Skeleton-Based Action Recognition
Qingtian Wang
Jianlin Peng
Shuze Shi
Tingxi Liu
Jiabin He
Renliang Weng
ViT
68
37
0
26 Oct 2021
History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Cordelia Schmid
Ivan Laptev
LM&Ro
84
236
0
25 Oct 2021
Distributionally Robust Recurrent Decoders with Random Network Distillation
Antonio Valerio Miceli Barone
Alexandra Birch
Rico Sennrich
125
1
0
25 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
147
207
0
25 Oct 2021
Parameter Prediction for Unseen Deep Architectures
Boris Knyazev
M. Drozdzal
Graham W. Taylor
Adriana Romero Soriano
OOD
111
83
0
25 Oct 2021
Gophormer: Ego-Graph Transformer for Node Classification
Jianan Zhao
Chaozhuo Li
Qian Wen
Yiqi Wang
Yuming Liu
Hao Sun
Xing Xie
Yanfang Ye
69
83
0
25 Oct 2021
MVT: Multi-view Vision Transformer for 3D Object Recognition
Shuo Chen
Tan Yu
Ping Li
ViT
69
45
0
25 Oct 2021
Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models
Liam H. Fowl
Jonas Geiping
W. Czaja
Micah Goldblum
Tom Goldstein
FedML
130
148
0
25 Oct 2021
Applications and Techniques for Fast Machine Learning in Science
A. Deiana
Nhan Tran
Joshua C. Agar
Michaela Blott
G. D. Guglielmo
...
Ashish Sharma
S. Summers
Pietro Vischia
J. Vlimant
Olivia Weng
82
72
0
25 Oct 2021
TAPL: Dynamic Part-based Visual Tracking via Attention-guided Part Localization
Wei Han
Hantao Huang
Xiaoxi Yu
31
1
0
25 Oct 2021
Generating artificial texts as substitution or complement of training data
Vincent Claveau
Antoine Chaffin
Ewa Kijak
75
10
0
25 Oct 2021
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
Siddharth Singh
A. Bhatele
GNN
92
15
0
25 Oct 2021
Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning
Kibeom Kim
Min Whoo Lee
Yoonsung Kim
Je-hwan Ryu
Minsu Lee
Byoung-Tak Zhang
62
8
0
25 Oct 2021
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
Hao Feng
Yuechen Wang
Wen-gang Zhou
Jiajun Deng
Houqiang Li
ViT
102
60
0
25 Oct 2021
The Efficiency Misnomer
Mostafa Dehghani
Anurag Arnab
Lucas Beyer
Ashish Vaswani
Yi Tay
107
103
0
25 Oct 2021
Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience
Weiyi Lu
Meng-Hsuan Wu
Yuh-ming Chiu
Li Su
48
0
0
25 Oct 2021