ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Stand-Alone Self-Attention in Vision Models (arXiv:1906.05909)

13 June 2019
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
Tags: VLM · SLR · ViT
Links: arXiv (abs) · PDF · HTML

Papers citing "Stand-Alone Self-Attention in Vision Models"

50 / 588 papers shown
Searching for Efficient Multi-Stage Vision Transformers
  Yi-Lun Liao, S. Karaman, Vivienne Sze · ViT · 01 Sep 2021 · 78 / 19 / 0

Pulmonary Disease Classification Using Globally Correlated Maximum Likelihood: an Auxiliary Attention mechanism for Convolutional Neural Networks
  E. Verenich, Tobias Martin, Alvaro Velasquez, Nazar Khan, Faraz Hussain · 01 Sep 2021 · 70 / 3 / 0

Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition
  Maxime Burchi, Valentin Vielzeuf · 31 Aug 2021 · 73 / 88 / 0

Learning Inner-Group Relations on Point Clouds
  Haoxi Ran, Wei Zhuo, Jing Liu, Li Lu · 3DPC · 27 Aug 2021 · 111 / 61 / 0

Shifted Chunk Transformer for Spatio-Temporal Representational Learning
  Xuefan Zha, Wentao Zhu, Tingxun Lv, Sen Yang, Ji Liu · AI4TS, ViT · 26 Aug 2021 · 88 / 27 / 0

Memory-Augmented Non-Local Attention for Video Super-Resolution
  Ji-yang Yu, Jingen Liu, Liefeng Bo, Tao Mei · SupR · 25 Aug 2021 · 100 / 32 / 0

SwinIR: Image Restoration Using Swin Transformer
  Christos Sakaridis, Jie Cao, Guolei Sun, Peng Sun, Luc Van Gool, Radu Timofte · ViT · 23 Aug 2021 · 198 / 2,982 / 0

Relational Embedding for Few-Shot Classification
  Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho · 22 Aug 2021 · 103 / 187 / 0

StarVQA: Space-Time Attention for Video Quality Assessment
  Fengchuang Xing, Yuan-Gen Wang, Hanpin Wang, Leida Li, Guopu Zhu · ViT · 22 Aug 2021 · 24 / 22 / 0

Construction material classification on imbalanced datasets using Vision Transformer (ViT) architecture
  Maryam Soleymani, Mahdi Bonyani, Hadi Mahami, F. Nasirzadeh · 21 Aug 2021 · 35 / 1 / 0

Discriminative Region-based Multi-Label Zero-Shot Learning
  Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, M. Shah · VLM · 20 Aug 2021 · 117 / 47 / 0

Group-based Distinctive Image Captioning with Memory Attention
  Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan · 20 Aug 2021 · 100 / 18 / 0

Do Vision Transformers See Like Convolutional Neural Networks?
  M. Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy · ViT · 19 Aug 2021 · 145 / 969 / 0

Spatially-Adaptive Image Restoration using Distortion-Guided Networks
  Kuldeep Purohit, Maitreya Suin, A. N. Rajagopalan, Vishnu Boddeti · 19 Aug 2021 · 296 / 118 / 0

PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds
  Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui · ViT · 14 Aug 2021 · 96 / 80 / 0

PVT: Point-Voxel Transformer for Point Cloud Learning
  Cheng Zhang, Haocheng Wan, Xinyi Shen, Zizhao Wu · 3DPC, ViT · 13 Aug 2021 · 102 / 86 / 0

Learning Fair Face Representation With Progressive Cross Transformer
  Yong Li, Yufei Sun, Zhen Cui, Shiguang Shan, Jian Yang · 11 Aug 2021 · 77 / 11 / 0

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
  Yuki Tatsunami, Masato Taki · 09 Aug 2021 · 84 / 12 / 0

An Intelligent Recommendation-cum-Reminder System
  Rohan Saxena, Maheep Chaudhary, Chandresh Kumar Maurya, Shitala Prasad · 09 Aug 2021 · 15 / 0 / 0

Understanding the computational demands underlying visual reasoning
  Mohit Vaishnav, Rémi Cadène, A. Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre · GNN, CoGe · 08 Aug 2021 · 77 / 17 / 0

Global Self-Attention as a Replacement for Graph Convolution
  Md Shamim Hussain, Mohammed J Zaki, D. Subramanian · ViT · 07 Aug 2021 · 101 / 127 / 0

Fast Convergence of DETR with Spatially Modulated Co-Attention
  Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li · ViT · 05 Aug 2021 · 91 / 307 / 0

Vision Transformer with Progressive Sampling
  Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin · ViT · 03 Aug 2021 · 89 / 85 / 0

Knowing When to Quit: Selective Cascaded Regression with Patch Attention for Real-Time Face Alignment
  Gil Shapira, Noga Levy, Ishay Goldin, R. Jevnisek · CVBM · 01 Aug 2021 · 42 / 3 / 0

Rethinking and Improving Relative Position Encoding for Vision Transformer
  Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao · ViT · 29 Jul 2021 · 118 / 339 / 0

Contextual Transformer Networks for Visual Recognition
  Yehao Li, Ting Yao, Yingwei Pan, Tao Mei · ViT · 26 Jul 2021 · 108 / 490 / 0

Log-Polar Space Convolution for Convolutional Neural Networks
  Fuchun Sun, Ji-Rong Wen · 26 Jul 2021 · 47 / 2 / 0

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
  Zhenhai Zhu, Radu Soricut · 25 Jul 2021 · 167 / 42 / 0

A3GC-IP: Attention-Oriented Adjacency Adaptive Recurrent Graph Convolutions for Human Pose Estimation from Sparse Inertial Measurements
  Patrik Puchert, Timo Ropinski · 3DH · 23 Jul 2021 · 75 / 3 / 0

Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark
  Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyuan Gao, Bin Zhao, Rui Zhang, Jun Hou · 19 Jul 2021 · 144 / 16 / 0

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers
  K. Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamás Sarlós, Adrian Weller, Thomas Weingarten · 16 Jul 2021 · 115 / 34 / 0

Visual Parser: Representing Part-whole Hierarchies with Transformers
  Shuyang Sun, Xiaoyu Yue, S. Bai, Philip Torr · 13 Jul 2021 · 128 / 27 / 0

Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms
  Chenglin Yang, Siyuan Qiao, Adam Kortylewski, Alan Yuille · 12 Jul 2021 · 144 / 4 / 0

ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data
  K. Cheuk, Dorien Herremans, Li Su · 11 Jul 2021 · 204 / 34 / 0

U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina
  Shuyun Tang, Z. Qi, Jacob Granley, M. Beyeler · 09 Jul 2021 · 55 / 10 / 0

ViTGAN: Training GANs with Vision Transformers
  Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu · ViT · 09 Jul 2021 · 99 / 186 / 0

Poly-NL: Linear Complexity Non-local Layers with Polynomials
  F. Babiloni, Ioannis Marras, Filippos Kokkinos, Jiankang Deng, Grigorios G. Chrysos, Stefanos Zafeiriou · 06 Jul 2021 · 61 / 6 / 0

Test-Time Personalization with a Transformer for Human Pose Estimation
  Yizhuo Li, Miao Hao, Zonglin Di, N. B. Gundavarapu, Xiaolong Wang · ViT · 05 Jul 2021 · 89 / 48 / 0

Polarized Self-Attention: Towards High-quality Pixel-wise Regression
  Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang · 02 Jul 2021 · 135 / 220 / 0

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
  Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, B. Guo · ViT · 01 Jul 2021 · 187 / 993 / 0

AutoFormer: Searching Transformers for Visual Recognition
  Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling · ViT · 01 Jul 2021 · 104 / 268 / 0

Focal Self-attention for Local-Global Interactions in Vision Transformers
  Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao · ViT · 01 Jul 2021 · 84 / 436 / 0

Early Convolutions Help Transformers See Better
  Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross B. Girshick · 28 Jun 2021 · 82 / 778 / 0

VOLO: Vision Outlooker for Visual Recognition
  Li-xin Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan · ViT · 24 Jun 2021 · 135 / 328 / 0

Bootstrap Representation Learning for Segmentation on Medical Volumes and Sequences
  Zejian Chen, Wei Zhuo, Tianfu Wang, Wufeng Xue, Dong Ni · 23 Jun 2021 · 101 / 6 / 0

Probabilistic Attention for Interactive Segmentation
  Prasad Gabbur, Manjot Bilkhu, J. Movellan · 23 Jun 2021 · 103 / 13 / 0

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
  Michael S. Ryoo, A. Piergiovanni, Anurag Arnab, Mostafa Dehghani, A. Angelova · ViT · 21 Jun 2021 · 149 / 129 / 0

Visual Correspondence Hallucination
  Hugo Germain, Vincent Lepetit, Guillaume Bourmaud · 17 Jun 2021 · 81 / 11 / 0

Multi-head or Single-head? An Empirical Comparison for Transformer Training
  Liyuan Liu, Jialu Liu, Jiawei Han · 17 Jun 2021 · 71 / 33 / 0

Scene Transformer: A unified architecture for predicting multiple agent trajectories
  Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, H. Chiang, ..., Ashish Venugopal, David J. Weiss, Benjamin Sapp, Zhifeng Chen, Jonathon Shlens · 15 Jun 2021 · 127 / 168 / 0