Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.16289
Cited By
Lightweight Structure-Aware Attention for Visual Understanding
29 November 2022
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Lightweight Structure-Aware Attention for Visual Understanding"
43 / 43 papers shown
Title
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
488
0
0
06 May 2025
cosFormer: Rethinking Softmax in Attention
Zhen Qin
Weixuan Sun
Huicai Deng
Dongxu Li
Yunshen Wei
Baohong Lv
Junjie Yan
Lingpeng Kong
Yiran Zhong
76
222
0
17 Feb 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
123
249
0
12 Jan 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
153
693
0
02 Dec 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
69
29
0
02 Nov 2021
PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Peng-Jen Chen
67
21
0
06 Sep 2021
Early Convolutions Help Transformers See Better
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
55
771
0
28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
122
1,676
0
25 Jun 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
70
50
0
23 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
126
1,208
0
09 Jun 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
Zizhao Zhang
Han Zhang
Long Zhao
Ting Chen
Sercan O. Arik
Tomas Pfister
ViT
61
173
0
26 May 2021
Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus
Ondřej Cífka
Shih-Lun Wu
Umut Simsekli
Yi-Hsuan Yang
Gaël Richard
65
48
0
18 May 2021
Segmenter: Transformer for Semantic Segmentation
Robin Strudel
Ricardo Garcia Pinel
Ivan Laptev
Cordelia Schmid
ViT
215
1,470
0
12 May 2021
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Xiangxiang Chu
Zhi Tian
Yuqing Wang
Bo Zhang
Haibing Ren
Xiaolin K. Wei
Huaxia Xia
Chunhua Shen
ViT
84
1,026
0
28 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
135
1,265
0
22 Apr 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
154
1,917
0
29 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
225
2,167
0
29 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani
Prajit Ramachandran
A. Srinivas
Niki Parmar
Blake A. Hechtman
Jonathon Shlens
92
403
0
23 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane dÁscoli
Hugo Touvron
Matthew L. Leavitt
Ari S. Morcos
Giulio Biroli
Levent Sagun
ViT
131
833
0
19 Mar 2021
Involution: Inverting the Inherence of Convolution for Visual Recognition
Duo Li
Jie Hu
Changhu Wang
Xiangtai Li
Qi She
Lei Zhu
Tong Zhang
Qifeng Chen
BDL
72
304
0
10 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
533
3,734
0
24 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
338
181
0
17 Feb 2021
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
186
1,600
0
30 Sep 2020
WeightNet: Revisiting the Design Space of Weight Networks
Ningning Ma
Xinming Zhang
Jiawei Huang
Jian Sun
56
108
0
23 Jul 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
Franccois Fleuret
201
1,786
0
29 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
216
1,713
0
08 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
434
13,108
0
26 May 2020
Exploring Self-attention for Image Recognition
Hengshuang Zhao
Jiaya Jia
V. Koltun
SSL
95
786
0
28 Apr 2020
Designing Network Design Spaces
Ilija Radosavovic
Raj Prateek Kosaraju
Ross B. Girshick
Kaiming He
Piotr Dollár
GNN
102
1,692
0
30 Mar 2020
Dynamic Convolution: Attention over Convolution Kernels
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Dongdong Chen
Lu Yuan
Zicheng Liu
104
895
0
07 Dec 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
253
3,502
0
30 Sep 2019
MMDetection: Open MMLab Detection Toolbox and Benchmark
Kai-xiang Chen
Jiaqi Wang
Jiangmiao Pang
Yuhang Cao
Yu Xiong
...
Jingdong Wang
Jianping Shi
Wanli Ouyang
Chen Change Loy
Dahua Lin
VOS
172
2,878
0
17 Jun 2019
Video Modeling with Correlation Networks
Heng Wang
Du Tran
Lorenzo Torresani
Matt Feiszli
75
129
0
07 Jun 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
622
4,802
0
13 May 2019
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
289
9,803
0
25 Oct 2017
Random Erasing Data Augmentation
Zhun Zhong
Liang Zheng
Guoliang Kang
Shaozi Li
Yi Yang
98
3,645
0
16 Aug 2017
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
98
1,542
0
13 Jun 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.2K
20,892
0
17 Apr 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
360
27,244
0
20 Mar 2017
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
120
3,840
0
02 Aug 2016
Deep Networks with Stochastic Depth
Gao Huang
Yu Sun
Zhuang Liu
Daniel Sedra
Kilian Q. Weinberger
215
2,361
0
30 Mar 2016
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
886
27,416
0
02 Dec 2015
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
1.7K
39,595
0
01 Sep 2014
1