2107.00641
Focal Self-attention for Local-Global Interactions in Vision Transformers
1 July 2021
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
Papers citing
"Focal Self-attention for Local-Global Interactions in Vision Transformers"
50 / 252 papers shown
PlantDet: A benchmark for Plant Detection in the Three-Rivers-Source Region
Huanhuan Li
Xuechao Zou
Yu-an Zhang
Jiangcai Zhaba
Guomei Li
Lamao Yongga
67
0
0
11 Apr 2023
MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision
Zhimin Zhu
Jianguo Zhao
Tong Mu
Yuliang Yang
Mengyu Zhu
74
0
0
08 Apr 2023
Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal
Tao Gao
Yuanbo Wen
Kaihao Zhang
Peng Cheng
Ting Chen
ViT
99
5
0
06 Apr 2023
Vision Transformers with Mixed-Resolution Tokenization
Tomer Ronen
Omer Levy
A. Golbert
ViT
105
21
0
01 Apr 2023
Rethinking Local Perception in Lightweight Vision Transformer
Qi Fan
Huaibo Huang
Jiyang Guan
Ran He
ViT
86
31
0
31 Mar 2023
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu
Pan Zhou
Shuicheng Yan
Xinchao Wang
193
142
0
29 Mar 2023
Spherical Transformer for LiDAR-based 3D Recognition
Xin Lai
Yukang Chen
Fanbin Lu
Jianhui Liu
Jiaya Jia
3DPC
119
136
0
22 Mar 2023
Robustifying Token Attention for Vision Transformers
Yong Guo
David Stutz
Bernt Schiele
ViT
121
25
0
20 Mar 2023
Making Vision Transformers Efficient from A Token Sparsification View
Shuning Chang
Pichao Wang
Ming Lin
Fan Wang
David Junhao Zhang
Rong Jin
Mike Zheng Shou
ViT
102
26
0
15 Mar 2023
Pretrained ViTs Yield Versatile Representations For Medical Images
Christos Matsoukas
Johan Fredin Haslum
Magnus P Soderberg
Kevin Smith
MedIm
ViT
69
14
0
13 Mar 2023
TransMatting: Tri-token Equipped Transformer Model for Image Matting
Huanqia Cai
Fanglei Xue
Lele Xu
Lili Guo
ViT
71
3
0
11 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
187
216
0
20 Feb 2023
Stitchable Neural Networks
Zizheng Pan
Jianfei Cai
Bohan Zhuang
108
25
0
13 Feb 2023
CEDNet: A Cascade Encoder-Decoder Network for Dense Prediction
Gang Zhang
Zi-Hua Li
Chufeng Tang
Jianmin Li
Xiaolin Hu
106
20
0
13 Feb 2023
Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection
Hang Zhou
Junqing Yu
Wei Yang
68
83
0
10 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
98
156
0
03 Feb 2023
Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang
Chengyin Li
Prashant Khanduri
D. Zhu
ViT
136
9
0
31 Jan 2023
A Survey of Advanced Computer Vision Techniques for Sports
Tiago Mendes-Neves
Luís Meireles
João Mendes-Moreira
91
4
0
18 Jan 2023
FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection
Shuailei Ma
Yuefeng Wang
Shanze Wang
Ying-yu Wei
83
35
0
08 Jan 2023
Unsupervised 4D LiDAR Moving Object Segmentation in Stationary Settings with Multivariate Occupancy Time Series
T. Kreutz
M. Mühlhäuser
Alejandro Sánchez Guinea
87
15
0
30 Dec 2022
A Close Look at Spatial Modeling: From Attention to Convolution
Xu Ma
Huan Wang
Can Qin
Kunpeng Li
Xing Zhao
Jie Fu
Yun Fu
ViT
3DPC
71
12
0
23 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
93
14
0
21 Dec 2022
Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation
Mohammadreza Naderi
Mohammad H. Givkashi
F. Piri
N. Karimi
S. Samavi
ViT
MedIm
108
14
0
19 Dec 2022
Most Important Person-guided Dual-branch Cross-Patch Attention for Group Affect Recognition
Hongxia Xie
Ming-Xian Lee
Tzu-Jui Chen
Hung-Jen Chen
Hou-I Liu
Hong-Han Shuai
Wen-Huang Cheng
CVBM
77
8
0
14 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xinyu Wang
ViT
109
22
0
13 Dec 2022
Mitigation of Spatial Nonstationarity with Vision Transformers
Lei Liu
Javier E. Santos
Maša Prodanović
Michael J. Pyrcz
55
4
0
09 Dec 2022
Asymmetric Cross-Scale Alignment for Text-Based Person Search
Zhong Ji
Junhua Hu
Deyin Liu
Yuan Wu
Ye Zhao
108
46
0
26 Nov 2022
Cross Aggregation Transformer for Image Restoration
Zheng Chen
Yulun Zhang
Jinjin Gu
Yongbing Zhang
Lingyu Kong
X. Yuan
ViT
120
158
0
24 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
98
7
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
132
141
0
22 Nov 2022
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Haram Choi
Jeong-Sik Lee
Jihoon Yang
ViT
88
84
0
21 Nov 2022
Vision Transformer with Super Token Sampling
Huaibo Huang
Xiaoqiang Zhou
Jie Cao
Ran He
Tieniu Tan
ViT
90
59
0
21 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
56
0
0
11 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
184
700
0
10 Nov 2022
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass
Shang Wu
Huihong Shi
Chaojian Li
Zhifan Ye
Zhongfeng Wang
Yingyan Lin
80
57
0
09 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
130
0
0
01 Nov 2022
ViT-LSLA: Vision Transformer with Light Self-Limited-Attention
Zhenzhe Hechen
Wei Huang
Yixin Zhao
ViT
59
6
0
31 Oct 2022
Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images
Yan Zhang
Xiyuan Gao
Qingyan Duan
Jiaxu Leng
Xiao Pu
Xinbo Gao
ViT
59
1
0
28 Oct 2022
Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets
Xiangyu Chen
Ying Qin
Wenju Xu
A. Bur
Cuncong Zhong
Guanghui Wang
ViT
88
3
0
25 Oct 2022
Context-Enhanced Stereo Transformer
Weiyu Guo
Zhaoshuo Li
Yongkui Yang
Ziyi Wang
Russell H. Taylor
Mathias Unberath
Alan Yuille
Yingwei Li
70
41
0
21 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
107
62
0
12 Oct 2022
Curved Representation Space of Vision Transformers
Juyeop Kim
Junha Park
Songkuk Kim
Jongseok Lee
ViT
81
7
0
11 Oct 2022
Hierarchical Graph Transformer with Adaptive Node Sampling
Zaixin Zhang
Qi Liu
Qingyong Hu
Cheekong Lee
162
95
0
08 Oct 2022
FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images
Chengyin Li
Yao Qiang
Vikram Goddla
H. Bagher-Ebadian
Prashant Khanduri
I. Chetty
D. Zhu
ViT
MedIm
77
9
0
06 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
128
66
0
04 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
120
31
0
03 Oct 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
157
94
0
30 Sep 2022
Graph Reasoning Transformer for Image Parsing
Dong Zhang
Jinhui Tang
Kwang-Ting Cheng
ViT
66
18
0
20 Sep 2022
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Zhemin Zhang
Xun Gong
ViT
59
1
0
19 Sep 2022
SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
Meng-Hao Guo
Chenggang Lu
Qibin Hou
Zheng Liu
Ming-Ming Cheng
Shiyong Hu
SSeg
ViT
VLM
104
671
0
18 Sep 2022