Stand-Alone Self-Attention in Vision Models
arXiv 1906.05909 · 13 June 2019
Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
VLM, SLR, ViT
Papers citing "Stand-Alone Self-Attention in Vision Models" (50 of 588 shown)
Learning to Estimate Hidden Motions with Global Motion Aggregation · Shihao Jiang, Dylan Campbell, Yao Lu, Hongdong Li, Leonid Sigal · 3DPC · 145 / 297 / 0 · 06 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers · Xinlei Chen, Saining Xie, Kaiming He · ViT · 183 / 1,874 / 0 · 05 Apr 2021
Group-Free 3D Object Detection via Transformers · Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong · ViT, 3DPC · 101 / 315 / 0 · 01 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval · Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman · VGen · 215 / 1,193 / 0 · 01 Apr 2021
Motion Guided Attention Fusion to Recognize Interactions from Videos · Tae Soo Kim, Jonathan D. Jones, Gregory Hager · 39 / 15 / 0 · 01 Apr 2021
Going deeper with Image Transformers · Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou · ViT · 193 / 1,025 / 0 · 31 Mar 2021
Rethinking Spatial Dimensions of Vision Transformers · Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh · ViT · 562 / 585 / 0 · 30 Mar 2021
ViViT: A Video Vision Transformer · Anurag Arnab, Mostafa Dehghani, G. Heigold, Chen Sun, Mario Lucic, Cordelia Schmid · ViT · 240 / 2,175 / 0 · 29 Mar 2021
BA^2M: A Batch Aware Attention Module for Image Classification · Qishang Cheng, Hongliang Li, Qi Wu, K. Ngan · 39 / 6 / 0 · 28 Mar 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification · Chun-Fu Chen, Quanfu Fan, Yikang Shen · ViT · 73 / 1,500 / 0 · 27 Mar 2021
TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization · Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, QiXiang Ye · ViT, WSOL · 91 / 203 / 0 · 27 Mar 2021
Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers · Tianyu Zhu, Markus Hiller, Mahsa Ehsanpour, Rongkai Ma, Tom Drummond, Ian Reid, Hamid Rezatofighi · VOT · 87 / 36 / 0 · 27 Mar 2021
Understanding Robustness of Transformers for Image Classification · Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit · ViT · 116 / 392 / 0 · 26 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows · Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, B. Guo · ViT · 517 / 21,773 / 0 · 25 Mar 2021
An Image is Worth 16x16 Words, What is a Video Worth? · Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor · ViT · 100 / 125 / 0 · 25 Mar 2021
Vision Transformers for Dense Prediction · René Ranftl, Alexey Bochkovskiy, V. Koltun · ViT, MDE · 158 / 1,752 / 0 · 24 Mar 2021
SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing · Brevin Tilmon, S. Koppal · MDE · 54 / 5 / 0 · 24 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones · Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens · 109 / 404 / 0 · 23 Mar 2021
Instance-level Image Retrieval using Reranking Transformers · Fuwen Tan, Jiangbo Yuan, Vicente Ordonez · ViT · 165 / 93 / 0 · 22 Mar 2021
DeepViT: Towards Deeper Vision Transformer · Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng · ViT · 130 / 525 / 0 · 22 Mar 2021
Incorporating Convolution Designs into Visual Transformers · Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, F. Yu, Wei Wu · ViT · 115 / 484 / 0 · 22 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases · Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun · ViT · 143 / 835 / 0 · 19 Mar 2021
Scalable Vision Transformers with Hierarchical Pooling · Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai · ViT · 95 / 130 / 0 · 19 Mar 2021
UNETR: Transformers for 3D Medical Image Segmentation · Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, H. Roth, Daguang Xu · ViT, MedIm · 198 / 1,634 / 0 · 18 Mar 2021
Revisiting ResNets: Improved Training and Scaling Strategies · Irwan Bello, W. Fedus, Xianzhi Du, E. D. Cubuk, A. Srinivas, Nayeon Lee, Jonathon Shlens, Barret Zoph · 98 / 302 / 0 · 13 Mar 2021
Involution: Inverting the Inherence of Convolution for Visual Recognition · Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen · BDL · 77 / 304 / 0 · 10 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth · Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 163 / 388 / 0 · 05 Mar 2021
Perceiver: General Perception with Iterative Attention · Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira · VLM, ViT, MDE · 214 / 1,029 / 0 · 04 Mar 2021
Generative Adversarial Transformers · Drew A. Hudson, C. L. Zitnick · ViT · 131 / 182 / 0 · 01 Mar 2021
DR-TANet: Dynamic Receptive Temporal Attention Network for Street Scene Change Detection · Shuo Chen, Kailun Yang, Rainer Stiefelhagen · 66 / 39 / 0 · 01 Mar 2021
Representation Learning for Event-based Visuomotor Policies · Sai H. Vemprala, Sami Mian, Ashish Kapoor · 55 / 23 / 0 · 01 Mar 2021
Nested-block self-attention for robust radiotherapy planning segmentation · Harini Veeraraghavan, Jue Jiang, Elguindi Sharif, S. Berry, Ifeanyirochukwu Onochie, Aditya P. Apte, L. Cerviño, Joseph O. Deasy · 93 / 3 / 0 · 26 Feb 2021
Iterative SE(3)-Transformers · F. Fuchs, E. Wagstaff, Justas Dauparas, Ingmar Posner · 64 / 17 / 0 · 26 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions · Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao · ViT · 583 / 3,754 / 0 · 24 Feb 2021
Conditional Positional Encodings for Vision Transformers · Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen · ViT · 150 / 625 / 0 · 22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer · Ronghang Hu, Amanpreet Singh · ViT · 106 / 301 / 0 · 22 Feb 2021
Evolving Attention with Residual Convolutions · Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jiahao Yu, Ce Zhang, Gao Huang, Yunhai Tong · ViT · 112 / 34 / 0 · 20 Feb 2021
Hard-Attention for Scalable Image Classification · Athanasios Papadopoulos, Pawel Korus, N. Memon · 114 / 25 / 0 · 20 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention · Irwan Bello · 359 / 181 / 0 · 17 Feb 2021
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition · Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho · TTA · 79 / 41 / 0 · 14 Feb 2021
Training Vision Transformers for Image Retrieval · Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou · ViT · 139 / 159 / 0 · 10 Feb 2021
Is Space-Time Attention All You Need for Video Understanding? · Gedas Bertasius, Heng Wang, Lorenzo Torresani · ViT · 420 / 2,075 / 0 · 09 Feb 2021
CKConv: Continuous Kernel Convolution For Sequential Data · David W. Romero, Anna Kuzina, Erik J. Bekkers, Jakub M. Tomczak, Mark Hoogendoorn · 75 / 126 / 0 · 04 Feb 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation · Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu · ViT · 156 / 182 / 0 · 03 Feb 2021
Self-Attention Meta-Learner for Continual Learning · Ghada Sokar, Decebal Constantin Mocanu, Mykola Pechenizkiy · CLL · 53 / 11 / 0 · 28 Jan 2021
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet · Li-xin Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan · ViT · 192 / 1,953 / 0 · 28 Jan 2021
CNN with large memory layers · R. Karimov, Yury Malkov, Karim Iskakov, Victor Lempitsky · 44 / 0 / 0 · 27 Jan 2021
Bottleneck Transformers for Visual Recognition · A. Srinivas, Nayeon Lee, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani · SLR · 373 / 997 / 0 · 27 Jan 2021
Channelized Axial Attention -- Considering Channel Relation within Spatial Attention for Semantic Segmentation · Ye Huang, Di Kang, W. Jia, Xiangjian He, Liu Liu · 169 / 36 / 0 · 19 Jan 2021
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification · Ardhendu Behera, Zachary Wharton, Pradeep Ruwan Padmasiri Galbokka Hewage, Asish Bera · 118 / 110 / 0 · 17 Jan 2021