2103.15808
Cited By
CvT: Introducing Convolutions to Vision Transformers
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
Papers citing
"CvT: Introducing Convolutions to Vision Transformers"
50 / 819 papers shown
Title
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
37
14
0
21 Dec 2022
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
32
2
0
20 Dec 2022
Convolution-enhanced Evolving Attention Networks
Yujing Wang
Yaming Yang
Zhuowan Li
Jiangang Bai
Mingliang Zhang
Xiangtai Li
Jiahao Yu
Ce Zhang
Gao Huang
Yu Tong
ViT
27
6
0
16 Dec 2022
Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
27
10
0
15 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xinyu Wang
ViT
43
21
0
13 Dec 2022
OAMixer: Object-aware Mixing Layer for Vision Transformers
H. Kang
Sangwoo Mo
Jinwoo Shin
VLM
39
4
0
13 Dec 2022
Masked autoencoders are effective solution to transformer data-hungry
Jia-ju Mao
Honggu Zhou
Xuesong Yin
Binling Nie
MedIm
37
6
0
12 Dec 2022
Position Embedding Needs an Independent Layer Normalization
Runyi Yu
Zhennan Wang
Yinhuai Wang
Kehan Li
Yian Zhao
Jian Zhang
Guoli Song
Jie Chen
31
1
0
10 Dec 2022
CamoFormer: Masked Separable Attention for Camouflaged Object Detection
Bo Yin
Xuying Zhang
Qibin Hou
Bo Sun
Deng-Ping Fan
Luc Van Gool
28
51
0
10 Dec 2022
FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer
Shibo Jie
Zhi-Hong Deng
31
127
0
06 Dec 2022
Generalizable Person Re-Identification via Viewpoint Alignment and Fusion
Bingliang Jiao
Lingqiao Liu
Liying Gao
Guosheng Lin
Ruiqi Wu
Shizhou Zhang
Peng Wang
Yanning Zhang
24
2
0
05 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
32
33
0
01 Dec 2022
From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets
Hubert Leterme
K. Polisano
V. Perrier
Karteek Alahari
39
1
0
01 Dec 2022
Part-based Face Recognition with Vision Transformers
Zhonglin Sun
Georgios Tzimiropoulos
ViT
28
15
0
30 Nov 2022
Pattern Attention Transformer with Doughnut Kernel
Wenyuan Sheng
ViT
16
0
0
30 Nov 2022
From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-Resolution
Jie Liu
Chaoqian Chen
Jie Tang
Gangshan Wu
SupR
25
12
0
30 Nov 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Karteek Alahari
31
2
0
29 Nov 2022
ExpNet: A unified network for Expert-Level Classification
Junde Wu
Huihui Fang
Yehui Yang
Yu Zhang
Haoyi Xiong
Huazhu Fu
Yanwu Xu
30
0
0
29 Nov 2022
Survey on Self-Supervised Multimodal Representation Learning and Foundation Models
Sushil Thapa
AI4TS
SSL
20
1
0
29 Nov 2022
Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation
Florinel-Alin Croitoru
Nicolae-Cătălin Ristea
D. Dascalescu
Radu Tudor Ionescu
Fahad Shahbaz Khan
M. Shah
43
2
0
28 Nov 2022
FsaNet: Frequency Self-attention for Semantic Segmentation
Fengyu Zhang
Ashkan Panahi
Guangjun Gao
AI4TS
32
28
0
28 Nov 2022
Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification
Yuteng Ye
Hang Zhou
Jiale Cai
Chenxing Gao
Youjia Zhang
Junle Wang
Qiang Hu
Junqing Yu
Wei Yang
31
6
0
27 Nov 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
46
5
0
25 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
39
6
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
34
129
0
22 Nov 2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang
27
31
0
21 Nov 2022
Vision Transformer with Super Token Sampling
Huaibo Huang
Xiaoqiang Zhou
Jie Cao
Ran He
Tieniu Tan
ViT
23
56
0
21 Nov 2022
PIDray: A Large-scale X-ray Benchmark for Real-World Prohibited Item Detection
Libo Zhang
Lutao Jiang
Ruyi Ji
Hengrui Fan
24
23
0
19 Nov 2022
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
Haoran You
Yunyang Xiong
Xiaoliang Dai
Bichen Wu
Peizhao Zhang
Haoqi Fan
Peter Vajda
Yingyan Lin
37
32
0
18 Nov 2022
Vision Transformers in Medical Imaging: A Review
Emerald U. Henry
Onyeka Emebob
C. Omonhinmin
ViT
MedIm
40
34
0
18 Nov 2022
MIMT: Multi-Illuminant Color Constancy via Multi-Task Local Surface and Light Color Learning
Shuwei Li
Ji-kai Wang
Michael S. Brown
R. Tan
22
5
0
16 Nov 2022
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Peiyan Dong
Mengshu Sun
Alec Lu
Yanyue Xie
Li-Yu Daisy Liu
...
Xin Meng
Zechao Li
Xue Lin
Zhenman Fang
Yanzhi Wang
ViT
36
62
0
15 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
32
6
0
14 Nov 2022
Efficient Speech Quality Assessment using Self-supervised Framewise Embeddings
Karl El Hajal
Zihan Wu
Neil Scheidwasser
Gasser Elbanna
Milos Cernak
23
9
0
12 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
34
0
0
11 Nov 2022
A Comprehensive Survey of Transformers for Computer Vision
Sonain Jamil
Md. Jalil Piran
Oh-Jin Kwon
ViT
30
47
0
11 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
44
660
0
10 Nov 2022
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Saghar Irandoust
Thibaut Durand
Yunduz Rakhmangulova
Wenjie Zi
Hossein Hajimirsadeghi
ViT
41
6
0
09 Nov 2022
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass
Shang Wu
Huihong Shi
Chaojian Li
Zhifan Ye
Zhongfeng Wang
Yingyan Lin
20
53
0
09 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
31
55
0
07 Nov 2022
SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers
Alessandro Arezzo
Stefano Berretti
ViT
35
15
0
04 Nov 2022
Pixel-Wise Contrastive Distillation
Junqiang Huang
Zichao Guo
44
4
0
01 Nov 2022
ViT-LSLA: Vision Transformer with Light Self-Limited-Attention
Zhenzhe Hechen
Wei Huang
Yixin Zhao
ViT
38
6
0
31 Oct 2022
Relative Attention-based One-Class Adversarial Autoencoder for Continuous Authentication of Smartphone Users
Mingming Hu
Kun Zhang
Ruibang You
Bibo Tu
AAML
27
1
0
30 Oct 2022
Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets
Xiangyu Chen
Ying Qin
Wenju Xu
A. Bur
Cuncong Zhong
Guanghui Wang
ViT
49
3
0
25 Oct 2022
LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers
Zhuo Huang
Zhiyou Zhao
Banghuai Li
Jungong Han
3DPC
ViT
37
55
0
23 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
34
24
0
22 Oct 2022
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
Xiangyu Chen
Qinghao Hu
Kaidong Li
Cuncong Zhong
Guanghui Wang
ViT
38
11
0
22 Oct 2022
Face Pyramid Vision Transformer
Khawar Islam
M. Zaheer
Arif Mahmood
ViT
CVBM
24
4
0
21 Oct 2022
Boosting vision transformers for image retrieval
Chull Hwan Song
Jooyoung Yoon
Shunghyun Choi
Yannis Avrithis
ViT
34
32
0
21 Oct 2022