ResearchTrend.AI
Attention Is All You Need

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
    3DV

Papers citing "Attention Is All You Need"

50 / 27,337 papers shown
Title
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM, SSL
127
11
0
24 Sep 2021
LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models
William Won
Saeed Rashidi
Sudarshan Srinivasan
T. Krishna
AI4CE
77
9
0
24 Sep 2021
DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
Cristobal Eyzaguirre
Felipe del-Rio
Vladimir Araujo
Alvaro Soto
61
7
0
24 Sep 2021
Adversarial Neural Trip Recommendation
Linlang Jiang
Jingbo Zhou
Tong Xu
Yanyan Li
Hechang Chen
Jizhou Huang
Hui Xiong
19
3
0
24 Sep 2021
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
Qiantong Xu
Alexei Baevski
Michael Auli
VLM
145
91
0
23 Sep 2021
Modeling Dynamic Attributes for Next Basket Recommendation
Yong-Guang Chen
Jia Li
Chenghao Liu
Chenxi Li
M. Anderle
Julian McAuley
Caiming Xiong
93
18
0
23 Sep 2021
LGD: Label-guided Self-distillation for Object Detection
Peizhen Zhang
Zijian Kang
Tong Yang
Xinming Zhang
N. Zheng
Jian Sun
ObjD
181
30
0
23 Sep 2021
Named Entity Recognition and Classification on Historical Documents: A Survey
Maud Ehrmann
Ahmed Hamdi
Elvys Linhares Pontes
Matteo Romanello
A. Doucet
122
115
0
23 Sep 2021
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
Tobias Norlund
Lovisa Hagström
Richard Johansson
72
25
0
23 Sep 2021
Dynamic Knowledge Distillation for Pre-trained Language Models
Lei Li
Yankai Lin
Shuhuai Ren
Peng Li
Jie Zhou
Xu Sun
91
49
0
23 Sep 2021
End-to-End Dense Video Grounding via Parallel Regression
Fengyuan Shi
Weilin Huang
Limin Wang
113
10
0
23 Sep 2021
The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
Lihua Qian
Yi Zhou
Zaixiang Zheng
Yaoming Zhu
Zehui Lin
Jiangtao Feng
Shanbo Cheng
Lei Li
Mingxuan Wang
Hao Zhou
89
18
0
23 Sep 2021
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark
Xun Gao
Yin Zhao
Jie Zhang
Longjun Cai
53
6
0
23 Sep 2021
Dependency Structure for News Document Summarization
Congbo Ma
Wei Emma Zhang
Hu Wang
Shubham Gupta
Mingyu Guo
70
2
0
23 Sep 2021
CSAGN: Conversational Structure Aware Graph Network for Conversational Semantic Role Labeling
Han Wu
Kun Xu
Linqi Song
GNN
81
8
0
23 Sep 2021
Towards Universal Dense Retrieval for Open-domain Question Answering
Christopher Sciavolino
RALM
36
1
0
23 Sep 2021
Controlled Evaluation of Grammatical Knowledge in Mandarin Chinese Language Models
Yiwen Wang
Jennifer Hu
R. Levy
Peng Qian
55
3
0
22 Sep 2021
Cross-Modal Coherence for Text-to-Image Retrieval
Malihe Alikhani
Fangda Han
Hareesh Ravi
Mubbasir Kapadia
Vladimir Pavlovic
Matthew Stone
61
9
0
22 Sep 2021
Alzheimers Dementia Detection using Acoustic & Linguistic features and Pre-Trained BERT
Akshay Valsaraj
Ithihas Madala
Nikhil Garg
V. Baths
44
10
0
22 Sep 2021
Recursively Summarizing Books with Human Feedback
Jeff Wu
Long Ouyang
Daniel M. Ziegler
Nissan Stiennon
Ryan J. Lowe
Jan Leike
Paul Christiano
ALM
221
303
0
22 Sep 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM, ViT, VLM
298
351
0
22 Sep 2021
Small-Bench NLP: Benchmark for small single GPU trained models in Natural Language Processing
K. Kanakarajan
Bhuvana Kundumani
Malaikannan Sankarasubbu
ALM, MoE
59
5
0
22 Sep 2021
A Workflow for Offline Model-Free Robotic Reinforcement Learning
Aviral Kumar
Anika Singh
Stephen Tian
Chelsea Finn
Sergey Levine
OffRL
215
87
0
22 Sep 2021
Simulated Annealing for Emotional Dialogue Systems
Chengzhang Dong
Chenyang Huang
Osmar Zaïane
Lili Mou
63
5
0
22 Sep 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay
Mostafa Dehghani
J. Rao
W. Fedus
Samira Abnar
Hyung Won Chung
Sharan Narang
Dani Yogatama
Ashish Vaswani
Donald Metzler
283
115
0
22 Sep 2021
MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
Xinnuo Xu
Ondrej Dusek
Shashi Narayan
Verena Rieser
Ioannis Konstas
HILM
55
6
0
22 Sep 2021
Enriching and Controlling Global Semantics for Text Summarization
Thong Nguyen
Anh Tuan Luu
Truc Lu
Tho Quan
48
35
0
22 Sep 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
Yuanxun Lu
Jinxiang Chai
Xun Cao
88
89
0
22 Sep 2021
Improving 360 Monocular Depth Estimation via Non-local Dense Prediction Transformer and Joint Supervised and Self-supervised Learning
I. Yun
Hyuk-Jae Lee
Chae-Eun Rhee
ViT, MDE
77
28
0
22 Sep 2021
Hierarchical Multimodal Transformer to Summarize Videos
Bin Zhao
Maoguo Gong
Xuelong Li
ViT
60
57
0
22 Sep 2021
K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering
Fu Sun
Feng-Lin Li
Ruize Wang
Qianglong Chen
Xingyi Cheng
Ji Zhang
VLM, KELM
69
4
0
22 Sep 2021
Awakening Latent Grounding from Pretrained Language Models for Semantic Parsing
Qian Liu
Dejian Yang
Jiahui Zhang
Jiaqi Guo
Bin Zhou
Jian-Guang Lou
113
42
0
22 Sep 2021
Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages
Tejas I. Dhamecha
V. Rudramurthy
Samarth Bharadwaj
Karthik Sankaranarayanan
P. Bhattacharyya
95
26
0
22 Sep 2021
DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder
Zhenyu Zhang
Tao Guo
Meng Chen
SSL
104
23
0
22 Sep 2021
Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules
Steve Kommrusch
Monperrus Martin
L. Pouchet
67
9
0
22 Sep 2021
Rapid detection and recognition of whole brain activity in a freely behaving Caenorhabditis elegans
Yuxiang Wu
Shan Wu
Xin Eric Wang
Chengtian Lang
Quanshi Zhang
Quan Wen
Tianqi Xu
58
8
0
22 Sep 2021
The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation
Ambareesh Revanur
Zhihua Li
U. Ciftci
L. Yin
László A. Jeni
114
38
0
22 Sep 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
148
85
0
22 Sep 2021
Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents
Biao Zhang
Ankur Bapna
Melvin Johnson
A. Dabirmoghaddam
N. Arivazhagan
Orhan Firat
78
14
0
21 Sep 2021
Homography augumented momentum constrastive learning for SAR image retrieval
Seonho Park
M. Rysz
Kathleen M. Dipple
P. Pardalos
60
1
0
21 Sep 2021
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li
Tengchao Lv
Jingye Chen
Lei Cui
Yijuan Lu
D. Florêncio
Cha Zhang
Zhoujun Li
Furu Wei
ViT
246
375
0
21 Sep 2021
One Source, Two Targets: Challenges and Rewards of Dual Decoding
Jitao Xu
François Yvon
71
6
0
21 Sep 2021
TranslateLocally: Blazing-fast translation running on the local CPU
Nikolay Bogoychev
Jelmer Van der Linde
Kenneth Heafield
45
3
0
21 Sep 2021
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar
Nithin Anchuri
Mehdi Rezagholizadeh
Abbas Ghaddar
Philippe Langlais
Pascal Poupart
111
22
0
21 Sep 2021
Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd Ideation
Yunlong Wang
Priyadarshini Venkatesh
Brian Y. Lim
137
21
0
21 Sep 2021
Are Transformers a Modern Version of ELIZA? Observations on French Object Verb Agreement
Bingzhi Li
Guillaume Wisniewski
Benoît Crabbé
90
6
0
21 Sep 2021
ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
Ivan Vulić
Pei-hao Su
Sam Coope
D. Gerz
Paweł Budzianowski
I. Casanueva
Nikola Mrkšić
Tsung-Hsien Wen
100
37
0
21 Sep 2021
wsGAT: Weighted and Signed Graph Attention Networks for Link Prediction
Marco Grassia
G. Mangioni
73
13
0
21 Sep 2021
PDFNet: Pointwise Dense Flow Network for Urban-Scene Segmentation
Venkata Satya Sai Ajay Daliparthi
3DPC
55
0
0
21 Sep 2021
NADE: A Benchmark for Robust Adverse Drug Events Extraction in Face of Negations
Simone Scaboro
Beatrice Portelli
Emmanuele Chersoni
Enrico Santus
G. Serra
70
9
0
21 Sep 2021