ResearchTrend.AI
CvT: Introducing Convolutions to Vision Transformers


29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 819 papers shown
Title
Sequence and Circle: Exploring the Relationship Between Patches
Zhengyang Yu
Jochen Triesch
ViT
31
0
0
18 Oct 2022
On effects of Knowledge Distillation on Transfer Learning
Sushil Thapa
24
1
0
18 Oct 2022
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You
Zhanyi Sun
Huihong Shi
Zhongzhi Yu
Yang Katie Zhao
Yongan Zhang
Chaojian Li
Baopu Li
Yingyan Lin
ViT
25
81
0
18 Oct 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
35
4
0
15 Oct 2022
Optimizing Vision Transformers for Medical Image Segmentation
Qianying Liu
Chaitanya Kaul
Jun Wang
Christos Anagnostopoulos
Roderick Murray-Smith
F. Deligianni
ViT
MedIm
24
19
0
14 Oct 2022
MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images
Weiming Li
Lihui Xue
Xueqian Wang
Gang Li
ViT
16
12
0
14 Oct 2022
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Hyeong Kyu Choi
Joonmyung Choi
Hyunwoo J. Kim
ViT
33
35
0
14 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
20
51
0
13 Oct 2022
FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers
Yitian Liu
Zheng Lian
43
13
0
12 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
28
57
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
27
17
0
11 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning
Shichao Kan
Yixiong Liang
Min Li
Yigang Cen
Jianxin Wang
Z. He
34
3
0
09 Oct 2022
Flexible Alignment Super-Resolution Network for Multi-Contrast MRI
Yiming Liu
Mengxi Zhang
Weiqin Zhang
Bo Jiang
Bo Hou
Dan Liu
Jie Chen
Heqing Lian
MedIm
33
1
0
07 Oct 2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Yen-Cheng Liu
Chih-Yao Ma
Junjiao Tian
Zijian He
Z. Kira
128
47
0
07 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
39
59
0
04 Oct 2022
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
Yunsung Lee
Gyuseong Lee
Kwang-seok Ryoo
Hyojun Go
Jihye Park
Seung Wook Kim
32
5
0
04 Oct 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Erkang Chen
ViT
34
15
0
03 Oct 2022
Attention Distillation: self-supervised vision transformer students need more guidance
Kai Wang
Fei Yang
Joost van de Weijer
ViT
30
16
0
03 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
106
88
0
30 Sep 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
26
5
0
29 Sep 2022
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
Xiangcheng Liu
Tianyi Wu
Guodong Guo
ViT
48
26
0
28 Sep 2022
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection
Neelu Madan
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Kamal Nasrollahi
Fahad Shahbaz Khan
T. Moeslund
M. Shah
ViT
MedIm
264
62
0
25 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
39
21
0
21 Sep 2022
Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions
Dong-Ming Zhang
Yi-Mou Lin
Hao Chen
Zhuotao Tian
Xin Yang
Jinhui Tang
Kwang-Ting Cheng
VLM
35
11
0
21 Sep 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme
K. Polisano
V. Perrier
Karteek Alahari
FAtt
38
2
0
19 Sep 2022
A Mosquito is Worth 16x16 Larvae: Evaluation of Deep Learning Architectures for Mosquito Larvae Classification
Aswin Surya
David B. Peral
Austin VanLoon
A. Rajesh
MedIm
8
2
0
16 Sep 2022
Transformer based Fingerprint Feature Extraction
Saraansh Tandon
A. Namboodiri
ViT
39
8
0
08 Sep 2022
Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
Vadim Tschernezki
Iro Laina
Diane Larlus
Andrea Vedaldi
189
185
0
07 Sep 2022
Fusion of Satellite Images and Weather Data with Transformer Networks for Downy Mildew Disease Detection
William Maillet
Maryam Ouhami
A. Hafiane
ViT
MedIm
22
6
0
06 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
27
2
0
06 Sep 2022
ELMformer: Efficient Raw Image Restoration with a Locally Multiplicative Transformer
Jiaqi Ma
Shengyuan Yan
Lefei Zhang
Guoli Wang
Qian Zhang
41
8
0
31 Aug 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
31
9
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
28
2
0
30 Aug 2022
Adaptive Perception Transformer for Temporal Action Localization
Yizheng Ouyang
Tianjin Zhang
Weibo Gu
Hongfa Wang
21
3
0
25 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Mocho Go
Hideyuki Tachibana
ViT
37
9
0
24 Aug 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
29
1
0
23 Aug 2022
FocusFormer: Focusing on What We Need via Architecture Sampler
Jing Liu
Jianfei Cai
Bohan Zhuang
40
7
0
23 Aug 2022
DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
Jingyu Lin
Jie Jiang
Y. Yan
Chunchao Guo
Hongfa Wang
Wei Liu
Hanzi Wang
ViT
31
3
0
21 Aug 2022
Improved Image Classification with Token Fusion
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
24
0
0
19 Aug 2022
Learning Spatial-Frequency Transformer for Visual Object Tracking
Chuanming Tang
Tianlin Li
Yuanchao Bai
Zhe Wu
Jianlin Zhang
Yongmei Huang
ViT
37
43
0
18 Aug 2022
Conviformers: Convolutionally guided Vision Transformer
Mohit Vaishnav
Thomas Fel
I. F. Rodriguez
Thomas Serre
ViT
38
1
0
17 Aug 2022
Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model
Xiulong Yang
Sheng-Min Shih
Yinlin Fu
Xiaoting Zhao
Shihao Ji
DiffM
33
56
0
16 Aug 2022
Flow-Guided Transformer for Video Inpainting
Kaiwen Zhang
Jingjing Fu
Dong Liu
ViT
35
68
0
14 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
17
5
0
12 Aug 2022
Deep is a Luxury We Don't Have
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT
MedIm
26
2
0
11 Aug 2022
DropKey
Bonan Li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
20
11
0
04 Aug 2022
Maintaining Performance with Less Data
Dominic Sanderson
Tatiana Kalganova
33
1
0
03 Aug 2022
Global-Local Self-Distillation for Visual Representation Learning
Tim Lebailly
Tinne Tuytelaars
SSL
30
6
0
29 Jul 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
22
251
0
28 Jul 2022