ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Stand-Alone Self-Attention in Vision Models (arXiv: 1906.05909)

13 June 2019
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
    VLM, SLR, ViT

Papers citing "Stand-Alone Self-Attention in Vision Models"

50 / 588 papers shown
Attention-based Domain Adaptation for Single Stage Detectors
Vidit Vidit
Mathieu Salzmann
ObjD
90
13
0
14 Jun 2021
Structure-Regularized Attention for Deformable Object Representation
Shenao Zhang
Li Shen
Zhifeng Li
Wei Liu
41
1
0
12 Jun 2021
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
Liangqiong Qu
Yuyin Zhou
Paul Pu Liang
Yingda Xia
Feifei Wang
Ehsan Adeli
L. Fei-Fei
D. Rubin
FedML, AI4CE
114
186
0
10 Jun 2021
Transformed CNNs: recasting pre-trained convolutional layers with self-attention
Stéphane d'Ascoli
Levent Sagun
Giulio Biroli
Ari S. Morcos
ViT
56
6
0
10 Jun 2021
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
Shao-Wei Liu
Hanwen Jiang
Jiarui Xu
Sifei Liu
Xiaolong Wang
3DH
130
165
0
09 Jun 2021
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
Ho Kei Cheng
Yu-Wing Tai
Chi-Keung Tang
VOS
98
287
0
09 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
151
1,222
0
09 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
Qi Han
Zejia Fan
Qi Dai
Lei-huan Sun
Ming-Ming Cheng
Jiaying Liu
Jingdong Wang
ViT
123
112
0
08 Jun 2021
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang
Youcheng Ben
Guozhong Luo
Pei Cheng
Gang Yu
Bin-Bin Fu
ViT
111
185
0
07 Jun 2021
Oriented Object Detection with Transformer
Teli Ma
Mingyuan Mao
Honghui Zheng
Peng Gao
Xiaodi Wang
Shumin Han
Errui Ding
Baochang Zhang
David Doermann
ViT
54
44
0
06 Jun 2021
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding
Yang Li
Si Si
Gang Li
Cho-Jui Hsieh
Samy Bengio
99
96
0
05 Jun 2021
RegionViT: Regional-to-Local Attention for Vision Transformers
Chun-Fu Chen
Yikang Shen
Quanfu Fan
ViT
148
200
0
04 Jun 2021
Glance-and-Gaze Vision Transformer
Qihang Yu
Yingda Xia
Yutong Bai
Yongyi Lu
Alan Yuille
Wei Shen
ViT
85
76
0
04 Jun 2021
Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation
Yechao Bai
Ziyuan Huang
Lyuyu Shen
Hongliang Guo
Marcelo H. Ang Jr
Daniela Rus
SSeg
18
4
0
03 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright
Joseph E. Gonzalez
84
23
0
02 Jun 2021
DLA-Net: Learning Dual Local Attention Features for Semantic Segmentation of Large-Scale Building Facade Point Clouds
Yanfei Su
Weiquan Liu
Zhimin Yuan
Ming Cheng
Zhihong Zhang
Xuelun Shen
Cheng-Yu Wang
3DPC
108
40
0
01 Jun 2021
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
Jiemin Fang
Lingxi Xie
Xinggang Wang
Xiaopeng Zhang
Wenyu Liu
Qi Tian
ViT
73
77
0
31 May 2021
Less is More: Pay Less Attention in Vision Transformers
Zizheng Pan
Bohan Zhuang
Haoyu He
Jing Liu
Jianfei Cai
ViT
141
87
0
29 May 2021
An Attention Free Transformer
Shuangfei Zhai
Walter A. Talbott
Nitish Srivastava
Chen Huang
Hanlin Goh
Ruixiang Zhang
J. Susskind
ViT
94
132
0
28 May 2021
ResT: An Efficient Transformer for Visual Recognition
Qing-Long Zhang
Yubin Yang
ViT
97
235
0
28 May 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
Zizhao Zhang
Han Zhang
Long Zhao
Ting Chen
Sercan O. Arik
Tomas Pfister
ViT
102
174
0
26 May 2021
Interpretable UAV Collision Avoidance using Deep Reinforcement Learning
Deep Thomas
Daniil Olshanskyi
Karter Krueger
Tichakorn Wongpiromsarn
Ali Jannesari
81
5
0
25 May 2021
Video-based Person Re-identification without Bells and Whistles
Chih-Ting Liu
Jun-Cheng Chen
Chu-Song Chen
Shao-Yi Chien
112
15
0
22 May 2021
Intriguing Properties of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
ViT
348
654
0
21 May 2021
A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation
Yinglin Zhang
Risa Higashita
Huazhu Fu
Yanwu Xu
Yang Zhang
Haofeng Liu
Jian Zhang
Jiang-Dong Liu
ViT, MedIm
76
52
0
21 May 2021
DCAP: Deep Cross Attentional Product Network for User Response Prediction
Zekai Chen
Fangtian Zhong
Zhumin Chen
Xiao Zhang
Robert Pless
Xiuzhen Cheng
76
11
0
18 May 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo
Zheng-Ning Liu
Tai-Jiang Mu
Shimin Hu
74
504
0
05 May 2021
Attention-based Stylisation for Exemplar Image Colourisation
Marc Górriz Blanch
Issa Khalifeh
Alan F. Smeaton
Noel E. O'Connor
M. Mrak
98
4
0
04 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
521
2,716
0
04 May 2021
Dual-Cross Central Difference Network for Face Anti-Spoofing
Zitong Yu
Yunxiao Qin
Hengshuang Zhao
Xiaobai Li
Guoying Zhao
CVBM
70
107
0
04 May 2021
AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy
Yunxiang Li
G. Zeng
Yifan Zhang
Jun Wang
Qianni Zhang
...
Neng Xia
Ruizi Peng
Kai Tang
Yaqi Wang
Shuai Wang
MedIm, AI4CE
166
29
0
02 May 2021
Exploring Relational Context for Multi-Task Dense Prediction
David Brüggemann
Menelaos Kanakis
Anton Obukhov
Stamatios Georgoulis
Luc Van Gool
128
77
0
28 Apr 2021
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Xiangxiang Chu
Zhi Tian
Yuqing Wang
Bo Zhang
Haibing Ren
Xiaolin K. Wei
Huaxia Xia
Chunhua Shen
ViT
110
1,032
0
28 Apr 2021
ConTNet: Why not use convolution and transformer at the same time?
Haotian Yan
Zhe Li
Weijian Li
Changhu Wang
Ming Wu
Chuang Zhang
ViT
90
77
0
27 Apr 2021
MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions
Zhenpei Yang
Zhile Ren
Qi Shan
Qi-Xing Huang
3DV
111
52
0
27 Apr 2021
CAGAN: Text-To-Image Generation with Combined Attention GANs
Henning Schulze
Dogucan Yaman
Alexander Waibel
GAN
42
3
0
26 Apr 2021
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
213
223
0
26 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
137
1,272
0
22 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
368
594
0
22 Apr 2021
Variational Relational Point Completion Network
Liang Pan
Xinyi Chen
Zhongang Cai
Junzhe Zhang
Haiyu Zhao
Shuai Yi
Ziwei Liu
3DPC
301
179
0
20 Apr 2021
Augmenting Deep Classifiers with Polynomial Neural Networks
Grigorios G. Chrysos
Markos Georgopoulos
Jiankang Deng
Jean Kossaifi
Yannis Panagakis
Anima Anandkumar
57
20
0
16 Apr 2021
Higher-Order Attribute-Enhancing Heterogeneous Graph Neural Networks
Jianxin Li
Hao Peng
Dongyuan Li
Yingtong Dou
Hekai Zhang
Philip S. Yu
Lifang He
90
81
0
16 Apr 2021
Simultaneous Face Hallucination and Translation for Thermal to Visible Face Verification using Axial-GAN
Rakhil Immidisetti
Shuowen Hu
Vishal M. Patel
CVBM, PICV
68
19
0
13 Apr 2021
Co-Scale Conv-Attentional Image Transformers
Weijian Xu
Yifan Xu
Tyler A. Chang
Zhuowen Tu
ViT
61
377
0
13 Apr 2021
Fibro-CoSANet: Pulmonary Fibrosis Prognosis Prediction using a Convolutional Self Attention Network
Zabir Al Nazi
Fazla Rabbi Mashrur
Md. Amirul Islam
S. Saha
34
21
0
13 Apr 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
154
464
0
12 Apr 2021
GAttANet: Global attention agreement for convolutional neural networks
R. V. Rullen
A. Alamia
ViT
50
2
0
12 Apr 2021
Context-self contrastive pretraining for crop type semantic segmentation
Michail Tarasiou
R. Güler
Stefanos Zafeiriou
SSL
63
17
0
09 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
Niko Moritz
Takaaki Hori
Jonathan Le Roux
32
7
0
07 Apr 2021
Fourier Image Transformer
T. Buchholz
Florian Jug
ViT
43
19
0
06 Apr 2021