ResearchTrend.AI
Self-Attention with Relative Position Representations
Peter Shaw, Jakob Uszkoreit, Ashish Vaswani (6 March 2018, arXiv:1803.02155)

Papers citing "Self-Attention with Relative Position Representations"
50 of 411 citing papers shown

Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens (23 Mar 2021)

API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations
Ramin Shahbazi, Rishab Sharma, Fatemeh H. Fard (19 Mar 2021)

An End-to-End Network for Emotion-Cause Pair Extraction
Aaditya Singh, Shreeshail Hingane, Saim Wani, Ashutosh Modi (02 Mar 2021)

Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction
Benfeng Xu, Quan Wang, Yajuan Lyu, Yong Zhu, Zhendong Mao (20 Feb 2021)

LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello (17 Feb 2021)

Revisiting Language Encoding in Learning Multilingual Representations
Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, Liwei Wang, Tie-Yan Liu (16 Feb 2021)

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
Yi Ding, Shiyu Chang, Zhangyang Wang (14 Feb 2021) [ViT]

Transformer Language Models with LSTM-based Cross-utterance Information Representation
G. Sun, C. Zhang, P. Woodland (12 Feb 2021)

Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho, Jie Lei, Hao Tan, Joey Tianyi Zhou (04 Feb 2021) [MLLM]

Bottleneck Transformers for Visual Recognition
A. Srinivas, Nayeon Lee, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani (27 Jan 2021) [SLR]

Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah (04 Jan 2021) [ViT]

Code Generation from Natural Language with Less Prior and More Monolingual Data
Sajad Norouzi, Keyi Tang, Yanshuai Cao (01 Jan 2021)

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis (31 Dec 2020)

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang (31 Dec 2020)

Optimizing Deeper Transformers on Small Datasets
Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie C.K. Cheung, S. Prince, Yanshuai Cao (30 Dec 2020) [AI4CE]

Code Summarization with Structure-induced Transformer
Hongqiu Wu, Hai Zhao, Min Zhang (29 Dec 2020)

Portfolio Optimization with 2D Relative-Attentional Gated Transformer
Tae Wan Kim, Matloob Khushi (27 Dec 2020) [AI4TS]

Learning Light-Weight Translation Models from Deep Transformer
Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu (27 Dec 2020) [VLM]

Learning to Represent Programs with Heterogeneous Graphs
Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, Zhi Jin (08 Dec 2020) [GNN]

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction
Anzhu Yu, Wenyue Guo, Bing Liu, Xin Chen, Xin Wang, Xuefeng Cao, Bingchuan Jiang (25 Nov 2020) [3DV]

Persuasive Dialogue Understanding: the Baselines and Negative Results
Hui Chen, Deepanway Ghosal, Navonil Majumder, Amir Hussain, Soujanya Poria (19 Nov 2020)

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang, Huaiping Ming, Lei He, Frank Soong (17 Nov 2020)

Blind Deinterleaving of Signals in Time Series with Self-attention Based Soft Min-cost Flow Learning
Ougul Can, Y. Z. Gürbüz, B. Yildirim, A. Aydin Alatan (24 Oct 2020) [AI4TS]

A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code
Nadezhda Chirkova, Sergey Troshin (23 Oct 2020)

SmBoP: Semi-autoregressive Bottom-up Semantic Parsing
Ohad Rubin, Jonathan Berant (23 Oct 2020)

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen, Yu-Huan Wu, Zhenghao Wang, Shujie Liu, Jinyu Li (22 Oct 2020)

DuoRAT: Towards Simpler Text-to-SQL Models
Torsten Scholak, Raymond Li, Dzmitry Bahdanau, H. D. Vries, C. Pal (21 Oct 2020) [AI4TS]

Predicting Chemical Properties using Self-Attention Multi-task Learning based on SMILES Representation
Sangrak Lim, Yong Oh Lee (19 Oct 2020)

MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response
Jiancheng Yang, Jiajun Chen, Kaiming Kuang, Tiancheng Lin, Junjun He, Bingbing Ni (08 Oct 2020)

SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling
Di Wu, Liang Ding, Fan Lu, Jian Xie (06 Oct 2020) [VLM, BDL]

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization
Yue Yu, Kexin Huang, Chao Zhang, Lucas Glass, Jimeng Sun, Cao Xiao (04 Oct 2020)

Improve Transformer Models with Better Relative Position Embeddings
Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang (28 Sep 2020) [ViT]

Temporally Guided Music-to-Body-Movement Generation
Hsuan-Kai Kao, Li Su (17 Sep 2020)

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition
Wenyong Huang, Wenchao Hu, Y. Yeung, Xiao Chen (13 Aug 2020)

Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention
Wasi Uddin Ahmad, Xiaoyu Bai, Soomin Lee, Kai-Wei Chang (04 Aug 2020)

Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang (28 Jul 2020)

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition
Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang (15 Jul 2020)

Rewiring the Transformer with Depth-Wise LSTMs
Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong (13 Jul 2020)

Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes
Xianchao Wu, Chengyuan Wang, Qinying Lei (11 Jul 2020)

Hybrid Models for Learning to Branch
Prateek Gupta, Maxime Gasse, Elias Boutros Khalil, M. P. Kumar, Andrea Lodi, Yoshua Bengio (26 Jun 2020) [GNN]

SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks
F. Fuchs, Daniel E. Worrall, Volker Fischer, Max Welling (18 Jun 2020) [3DPC]

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal (16 Jun 2020)

Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting
Giorgos Bouritsas, Fabrizio Frasca, S. Zafeiriou, M. Bronstein (16 Jun 2020)

DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen (05 Jun 2020) [AAML]

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon (22 May 2020)

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon, Baeseong Park, S. Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee (20 May 2020) [MQ]

How Does Selective Mechanism Improve Self-Attention Networks?
Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu (03 May 2020) [AAML]

Hard-Coded Gaussian Attention for Neural Machine Translation
Weiqiu You, Simeng Sun, Mohit Iyyer (02 May 2020)

A Transformer-based Approach for Source Code Summarization
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang (01 May 2020) [ViT]

Capsule-Transformer for Neural Machine Translation
Sufeng Duan, Juncheng Cao, Hai Zhao (30 Apr 2020) [MedIm]