CvT: Introducing Convolutions to Vision Transformers


29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 818 papers shown
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
39
88
0
26 Mar 2024
Exploring Dynamic Transformer for Efficient Object Tracking
Jiawen Zhu
Xin Chen
Haiwen Diao
Shuai Li
Jun-Yan He
Chenyang Li
Bin Luo
Dong Wang
Huchuan Lu
43
2
0
26 Mar 2024
CFAT: Unleashing TriangularWindows for Image Super-resolution
Abhisek Ray
Gaurav Kumar
M. Kolekar
SupR
35
8
0
24 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
59
50
0
22 Mar 2024
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
Novendra Setyawan
Ghufron Wahyu Kurniawan
Chi-Chia Sun
Jun-Wei Hsieh
Hui-Kai Su
W. Kuo
ViT
MoE
42
0
0
22 Mar 2024
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri
Matthew Walmer
Kamal Gupta
Abhinav Shrivastava
41
4
0
21 Mar 2024
Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers
Yuyang Shu
Michael E. Bain
ViT
MedIm
MDE
37
0
0
20 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
31
15
0
18 Mar 2024
D-Net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation
Jin Yang
Peijie Qiu
Yichi Zhang
Daniel S. Marcus
Aristeidis Sotiras
MedIm
49
9
0
15 Mar 2024
Group-Mix SAM: Lightweight Solution for Industrial Assembly Line Applications
Wu Liang
X.-G. Ma
36
0
0
15 Mar 2024
Activating Wider Areas in Image Super-Resolution
Cheng Cheng
Hang Wang
Hongbin Sun
37
10
0
13 Mar 2024
Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution
Haochen Sun
Yan Yuan
Lijuan Su
Hao-Yu Shao
41
1
0
12 Mar 2024
LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation
Jinhong Wang
Jintai Chen
Danny Chen
Jian Wu
Mamba
51
20
0
12 Mar 2024
Explainable Transformer Prototypes for Medical Diagnoses
Ugur Demir
Debesh Jha
Zheyu Zhang
Elif Keles
Bradley Allen
Aggelos K. Katsaggelos
Ulas Bagci
MedIm
16
3
0
11 Mar 2024
GRITv2: Efficient and Light-weight Social Relation Recognition
Sagar Reddy
Neeraj Kasera
Avinash Thakur
ViT
25
0
0
11 Mar 2024
Not just Birds and Cars: Generic, Scalable and Explainable Models for Professional Visual Recognition
Junde Wu
Jiayuan Zhu
Min Xu
Yueming Jin
32
0
0
08 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
40
12
0
06 Mar 2024
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang
Pengyu Zheng
Wanquan Yan
Chengyu Fang
Shing Shin Cheng
MedIm
37
7
0
05 Mar 2024
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang
Qixiang Zhang
Yi Li
Xiaomeng Li
43
16
0
04 Mar 2024
ViTaL: An Advanced Framework for Automated Plant Disease Identification in Leaf Images Using Vision Transformers and Linear Projection For Feature Reduction
Abhishek Sebastian
A. AnnisFathima
R. Pragna
S. MadhanKumar
G. YaswanthKannan
Vinay Murali
MedIm
32
4
0
27 Feb 2024
A Comparison of Deep Learning Models for Proton Background Rejection with the AMS Electromagnetic Calorimeter
R. K. Hashmani
Emre Akbas
M. Demirköz
39
1
0
26 Feb 2024
Zero-shot generalization across architectures for visual classification
Evan Gerritz
Luciano Dyballa
Steven W. Zucker
31
1
0
21 Feb 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
57
4
0
17 Feb 2024
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
Guanxiong Sun
Yang Hua
Guosheng Hu
N. Robertson
ViT
27
1
0
14 Feb 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Haruna Yunusa
Shiyin Qin
Abdulrahman Hamman Adama Chukkol
Abdulganiyu Abdu Yusuf
Isah Bello
A. Lawan
ViT
43
13
0
05 Feb 2024
Spatio-temporal Prompting Network for Robust Video Feature Extraction
Guanxiong Sun
Chi Wang
Zhaoyu Zhang
Jiankang Deng
S. Zafeiriou
Yang Hua
ViT
17
4
0
04 Feb 2024
ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation
Zihan Li
Yuan Zheng
Dandan Shan
Shuzhou Yang
Qingde Li
Beizhan Wang
Yuan-ting Zhang
Qingqi Hong
Dinggang Shen
ViT
MedIm
32
39
0
03 Feb 2024
HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
Han Liu
Zhi Xu
Xiaotong Zhang
Feng Zhang
Fenglong Ma
Hongyang Chen
Hong Yu
Xianchao Zhang
AAML
22
7
0
02 Feb 2024
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
Zihan Zhong
Zhiqiang Tang
Tong He
Haoyang Fang
Chun Yuan
46
41
0
31 Jan 2024
Local and Global Contexts for Conversation
Zuoquan Lin
Xinyi Shen
18
1
0
31 Jan 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun
Youngmin Ro
ViT
44
29
0
29 Jan 2024
Do deep neural networks utilize the weight space efficiently?
Onur Can Koyun
B. U. Toreyin
18
0
0
26 Jan 2024
Convolutional Initialization for Data-Efficient Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
43
2
0
23 Jan 2024
Anisotropy Is Inherent to Self-Attention in Transformers
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
18
16
0
22 Jan 2024
Colorectal Polyp Segmentation in the Deep Learning Era: A Comprehensive Survey
Zhenyu Wu
Fengmao Lv
Chenglizhao Chen
Aimin Hao
Shuo Li
ELM
33
10
0
22 Jan 2024
Cloud-based XAI Services for Assessing Open Repository Models Under Adversarial Attacks
Zerui Wang
Yan Liu
AAML
25
1
0
22 Jan 2024
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal
Minh N. H. Nguyen
Ye Lin Tun
Seongjin Kim
My T. Thai
Choong Seon Hong
61
5
0
22 Jan 2024
Unifying Visual and Vision-Language Tracking via Contrastive Learning
Yinchao Ma
Yuyang Tang
Wenfei Yang
Tianzhu Zhang
Jinpeng Zhang
Mengxue Kang
ObjD
20
12
0
20 Jan 2024
Learning Position-Aware Implicit Neural Network for Real-World Face Inpainting
Bo Zhao
Huan Yang
Jianlong Fu
CVBM
40
0
0
19 Jan 2024
Efficient generative adversarial networks using linear additive-attention Transformers
Emilio Morales-Juarez
Gibran Fuentes Pineda
42
3
0
17 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
50
710
0
17 Jan 2024
Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization
Qi Bi
Wei Ji
Jingjun Yi
Haolan Zhan
Gui-Song Xia
30
0
0
16 Jan 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
31
0
0
11 Jan 2024
Evaluating Data Augmentation Techniques for Coffee Leaf Disease Classification
Adrian Gheorghiu
Iulian-Marius Taiatu
Dumitru-Clementin Cercel
Iuliana Marin
Florin-Catalin Pop
54
2
0
11 Jan 2024
Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach
Gang Wu
Junjun Jiang
Junpeng Jiang
Xianming Liu
SupR
43
7
0
11 Jan 2024
LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
Youbing Hu
Yun Cheng
Anqi Lu
Zhiqiang Cao
Dawei Wei
Jie Liu
Zhijun Li
ViT
24
6
0
08 Jan 2024
SeTformer is What You Need for Vision and Language
Pourya Shamsolmoali
Masoumeh Zareapoor
Eric Granger
Michael Felsberg
40
4
0
07 Jan 2024
SPFormer: Enhancing Vision Transformer with Superpixel Representation
Jieru Mei
Liang-Chieh Chen
Alan L. Yuille
Cihang Xie
ViT
MDE
21
4
0
05 Jan 2024
A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE
Ikumi Okubo
Keisuke Sugiura
Hiroki Matsutani
36
2
0
05 Jan 2024