2107.00641
Focal Self-attention for Local-Global Interactions in Vision Transformers
1 July 2021
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
Papers citing
"Focal Self-attention for Local-Global Interactions in Vision Transformers"
50 / 252 papers shown
Title
Multi-Human Mesh Recovery with Transformers
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
3DH
57
1
0
26 Feb 2024
ToDo: Token Downsampling for Efficient Generation of High-Resolution Images
Ethan Smith
Nayan Saxena
Aninda Saha
DiffM
51
6
0
21 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
158
35
0
05 Feb 2024
Do deep neural networks utilize the weight space efficiently?
Onur Can Koyun
B. U. Toreyin
59
0
0
26 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
125
817
0
17 Jan 2024
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
Mingxin Huang
Dezhi Peng
Hongliang Li
Zhenghao Peng
Chongyu Liu
Dahua Lin
Yuliang Liu
Xiang Bai
Lianwen Jin
196
1
0
15 Jan 2024
Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection
Lian Huang
Chi-Man Pun
66
5
0
11 Jan 2024
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
81
0
0
24 Dec 2023
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han
Tianzhu Ye
Yizeng Han
Zhuofan Xia
Siyuan Pan
Pengfei Wan
Shiji Song
Gao Huang
103
88
0
14 Dec 2023
The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need
Tianjin Huang
Tianlong Chen
Zhangyang Wang
Shiwei Liu
80
1
0
09 Dec 2023
DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization
Risab Biswas
Swalpa Kumar Roy
Ning Wang
Umapada Pal
Guang-Bin Huang
ViT
29
1
0
06 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
177
0
0
01 Dec 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
51
1
0
29 Nov 2023
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
ViT
93
98
0
28 Nov 2023
Advancing Vision Transformers with Group-Mix Attention
Chongjian Ge
Xiaohan Ding
Zhan Tong
Lichao Sun
Jiangliu Wang
Yibing Song
Ping Luo
180
18
0
26 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
141
175
0
10 Nov 2023
Vision Big Bird: Random Sparsification for Full Attention
Zhemin Zhang
Xun Gong
ViT
70
1
0
10 Nov 2023
Scattering Vision Transformer: Spectral Mixing Matters
Badri N. Patro
Vijay Srinivas Agneeswaran
121
15
0
02 Nov 2023
VST++: Efficient and Stronger Visual Saliency Transformer
Nian Liu
Ziyang Luo
Ni Zhang
Junwei Han
ViT
73
20
0
18 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
103
4
0
10 Oct 2023
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
Shiyue Cao
Yueqin Yin
Lianghua Huang
Yu Liu
Xin Zhao
Deli Zhao
Kaiqi Huang
ViT
100
19
0
09 Oct 2023
Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn
Sanhita Pathak
V. Kaushik
Brejesh Lall
DiffM
85
2
0
08 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
233
3
0
08 Oct 2023
Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer
Xiaofeng Liu
Fangxu Xing
Maureen Stone
Jiachen Zhuo
S. Fels
Jerry L. Prince
Xiaofeng Liu
Jonghye Woo
MedIm
64
3
0
26 Sep 2023
CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation
Xiaoheng Jiang
Kaiyi Guo
Yang Lu
Feng Yan
Hao Liu
Jiale Cao
Mingliang Xu
Dacheng Tao
MedIm
ViT
UQCV
64
1
0
22 Sep 2023
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
170
91
0
20 Sep 2023
Mask-Attention-Free Transformer for 3D Instance Segmentation
Xin Lai
Yuhui Yuan
Ruihang Chu
Yukang Chen
Han Hu
Jiaya Jia
MedIm
ISeg
3DPC
102
31
0
04 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Qi Han
Yuxuan Cai
Xiangyu Zhang
123
8
0
02 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
141
21
0
27 Aug 2023
Vision Transformer Adapters for Generalizable Multitask Learning
Deblina Bhattacharjee
Sabine Süsstrunk
Mathieu Salzmann
ViT
91
8
0
23 Aug 2023
SG-Former: Self-guided Transformer with Evolving Token Reallocation
Sucheng Ren
Xingyi Yang
Songhua Liu
Xinchao Wang
ViT
88
43
0
23 Aug 2023
SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Dong Hwan Kim
MoE
70
9
0
22 Aug 2023
Revisiting Vision Transformer from the View of Path Ensemble
Shuning Chang
Pichao Wang
Haowen Luo
Fan Wang
Mike Zheng Shou
ViT
66
3
0
12 Aug 2023
DiT: Efficient Vision Transformers with Dynamic Token Routing
Yuchen Ma
Zhengcong Fei
Junshi Huang
ViT
67
2
0
07 Aug 2023
PVG: Progressive Vision Graph for Vision Recognition
Jiafu Wu
Jian Li
Jiangning Zhang
Boshen Zhang
M. Chi
Yabiao Wang
Chengjie Wang
ViT
124
15
0
01 Aug 2023
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review
Maxime Fontana
Michael W. Spratling
Miaojing Shi
89
7
0
25 Jul 2023
Scale-Aware Modulation Meet Transformer
Wei-Shiang Lin
Ziheng Wu
Jiayu Chen
Jun Huang
Lianwen Jin
MoE
ViT
117
78
0
17 Jul 2023
Adaptive Window Pruning for Efficient Local Motion Deblurring
Haoying Li
Jixin Zhao
Shangchen Zhou
H. Feng
Chongyi Li
Chen Change Loy
ViT
82
5
0
25 Jun 2023
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding
Hanrong Ye
Dan Xu
ViT
113
13
0
08 Jun 2023
Collect-and-Distribute Transformer for 3D Point Cloud Analysis
Haibo Qiu
Baosheng Yu
Dacheng Tao
3DPC
ViT
104
7
0
02 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
170
29
0
01 Jun 2023
Dual Path Transformer with Partition Attention
Zhengkai Jiang
Liang Liu
Jiangning Zhang
Yabiao Wang
Mingang Chen
Chengjie Wang
ViT
100
2
0
24 May 2023
OctFormer: Octree-based Transformers for 3D Point Clouds
Peng-Shuai Wang
ViT
3DPC
83
88
0
04 May 2023
AxWin Transformer: A Context-Aware Vision Transformer Backbone with Axial Windows
Fangjian Lin
Yizhe Ma
Sitong Wu
Long Yu
Sheng Tian
ViT
44
5
0
02 May 2023
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Fangjian Lin
Jianlong Yuan
Sitong Wu
Fan Wang
Zhibin Wang
ViT
85
14
0
26 Apr 2023
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer
QiHao Zhao
Yangyu Huang
Wei Hu
Fan Zhang
Jing Liu
ViT
75
16
0
24 Apr 2023
MMDR: A Result Feature Fusion Object Detection Approach for Autonomous System
Wendong Zhang
53
0
0
19 Apr 2023
Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding
Yu-Qi Yang
Yu-Xiao Guo
Jiangfeng Xiong
Yang Liu
Hao Pan
Peng-Shuai Wang
Xin Tong
B. Guo
ViT
108
88
0
14 Apr 2023
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Badri N. Patro
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
ViT
94
49
0
13 Apr 2023
RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows
Zhemin Zhang
Xun Gong
ViT
44
1
0
13 Apr 2023