2103.15808
CvT: Introducing Convolutions to Vision Transformers
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
Papers citing "CvT: Introducing Convolutions to Vision Transformers"
50 / 818 papers shown
Title
A 2D Semantic-Aware Position Encoding for Vision Transformers
Xi Chen
Shiyang Zhou
Muqi Huang
Jiaxu Feng
Yun Xiong
...
Yuyao Zhang
Huishuai Bao
Sijia Peng
Chong Li
Feng Shi
ViT
31
0
0
14 May 2025
FAD: Frequency Adaptation and Diversion for Cross-domain Few-shot Learning
Ruixiao Shi
Fu Feng
Yucheng Xie
Jing Wang
Xin Geng
29
0
0
13 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
Sainath Dey
Mitul Goswami
Jashika Sethi
Prasant Kumar Pattnaik
ViT
30
0
0
07 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
148
0
0
06 May 2025
AI Assisted Cervical Cancer Screening for Cytology Samples in Developing Countries
Love Panta
Suraj Prasai
Karishma Malla Vaidya
Shyam Shrestha
Suresh Manandhar
44
0
0
29 Apr 2025
Group Downsampling with Equivariant Anti-aliasing
Md Ashiqur Rahman
Raymond A. Yeh
67
1
0
24 Apr 2025
ECViT: Efficient Convolutional Vision Transformer with Local-Attention and Multi-scale Stages
Zhoujie Qian
ViT
29
0
0
21 Apr 2025
HMPE:HeatMap Embedding for Efficient Transformer-Based Small Object Detection
YangChen Zeng
ViT
31
0
0
18 Apr 2025
Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization
Aman Agarwal
James Gearon
Raksha Rank
Etienne Chenevert
31
0
0
18 Apr 2025
Graph Network for Sign Language Tasks
Shiwei Gan
Yafeng Yin
Zhiwei Jiang
Hongkai Wen
Lei Xie
Sanglu Lu
SLR
49
0
0
16 Apr 2025
GFT: Gradient Focal Transformer
Boris Kriuk
Simranjit Kaur Gill
Shoaib Aslam
Amir Fakhrutdinov
31
0
0
14 Apr 2025
Multi-modal and Multi-view Fundus Image Fusion for Retinopathy Diagnosis via Multi-scale Cross-attention and Shifted Window Self-attention
Yonghao Huang
Leiting Chen
Chuan Zhou
19
0
0
12 Apr 2025
A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Medical Image Classification
K. Djoumessi
Samuel Ofosu Mensah
Philipp Berens
ViT
MedIm
34
0
0
11 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
82
0
0
03 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
48
1
0
01 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
42
0
0
31 Mar 2025
Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving
Miao Fan
Xuxu Kong
Shengtong Xu
Haoyi Xiong
Xiangzeng Liu
ViT
46
0
0
31 Mar 2025
Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition
Koki Hirooka
Abu Saleh Musa Miah
Tatsuya Murakami
Yuto Akiba
Yong Seok Hwang
Jungpil Shin
SLR
54
0
0
21 Mar 2025
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li
Tanzhe Li
Xiaobin Hu
Donghao Luo
Taisong Jin
66
0
0
19 Mar 2025
Unlocking Open-Set Language Accessibility in Vision Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
55
0
0
14 Mar 2025
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Shen Zhang
Yaning Tan
Siyuan Liang
Zhaowei Chen
Linze Li
...
Shuheng Li
Zhenyu Zhao
Caihua Chen
Jiajun Liang
Yao Tang
51
0
0
06 Mar 2025
Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
K. A. Kinfu
René Vidal
ViT
26
0
0
28 Feb 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García
Miguel Cazorla
Jorge Pomares
54
0
0
25 Feb 2025
VesselSAM: Leveraging SAM for Aortic Vessel Segmentation with LoRA and Atrous Attention
Adnan Iltaf
Rayan Merghani Ahmed
Bin Li
Bin Li
Shoujun Zhou
55
0
0
25 Feb 2025
MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification
Zhuoqin Yang
Jiansong Zhang
Xiaoling Luo
Zheng Lu
Linlin Shen
MedIm
68
2
0
25 Feb 2025
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer
Wenxi Li
Yuchen Guo
Jilai Zheng
Haozhe Lin
Chao Ma
Lu Fang
Xiaokang Yang
ViT
62
1
0
11 Feb 2025
MicroViT: A Vision Transformer with Low Complexity Self Attention for Edge Device
Novendra Setyawan
Chi-Chia Sun
Mao-Hsiu Hsu
W. Kuo
Jun-Wei Hsieh
ViT
49
2
0
09 Feb 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
156
0
0
25 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Liang Feng
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
152
0
0
21 Jan 2025
Confident Pseudo-labeled Diffusion Augmentation for Canine Cardiomegaly Detection
Shiman Zhang
Lakshmikar R. Polamreddy
Youshan Zhang
MedIm
DiffM
42
0
0
13 Jan 2025
Image Classification with Deep Reinforcement Active Learning
Mingyuan Jiu
Xuguang Song
H. Sahbi
Shupan Li
Yan Chen
Wei Guo
Lihua Guo
Mingliang Xu
VLM
29
0
0
31 Dec 2024
Towards Simple and Provable Parameter-Free Adaptive Gradient Methods
Yuanzhe Tao
Huizhuo Yuan
Xun Zhou
Yuan Cao
Q. Gu
ODL
39
0
0
27 Dec 2024
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
Yuhao Wang
Pingping Zhang
Xuehu Liu
Zhengzheng Tu
Huchuan Lu
42
3
0
23 Dec 2024
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Xinzhu Ma
Nanqing Dong
W. Ouyang
83
2
0
18 Dec 2024
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
Ali Mollaahmadi Dehaghi
Reza Razavi
Mohammad Moshirpour
77
1
0
12 Dec 2024
Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images
Xiangyong Lu
Masanori Suganuma
Takayuki Okatani
74
0
0
03 Dec 2024
Multi-Token Enhancing for Vision Representation Learning
Zhong-Yu Li
Yu-Song Hu
Bo Yin
Ming-Ming Cheng
66
1
0
24 Nov 2024
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation
Trong-Thang Pham
Ngoc-Vuong Ho
Nhat-Tan Bui
T. Phan
Patel Brijesh
...
Gianfranco Doretto
Anh Nguyen
Carol C. Wu
Hien Nguyen
Ngan Le
92
2
0
23 Nov 2024
ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation
Xiaoman Zhang
Hong-Yu Zhou
Xiaoli Yang
Oishi Banerjee
J. N. Acosta
Josh Miller
Ouwen Huang
Pranav Rajpurkar
LM&MA
72
3
0
22 Nov 2024
D-Cube: Exploiting Hyper-Features of Diffusion Model for Robust Medical Classification
Minhee Jang
Juheon Son
Thanaporn Viriyasaranon
Junho Kim
Jang-Hwan Choi
MedIm
34
0
0
17 Nov 2024
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Shravan Venkatraman
Jaskaran Singh Walia
J. Raheja
ViT
33
0
0
14 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
45
1
0
12 Nov 2024
CFPNet: Improving Lightweight ToF Depth Completion via Cross-zone Feature Propagation
Laiyan Ding
Hualie Jiang
Rui Xu
Rui Huang
31
1
0
07 Nov 2024
Reducing catastrophic forgetting of incremental learning in the absence of rehearsal memory with task-specific token
Young Jo Choi
Min Kyoon Yoo
Yu Rang Park
24
0
0
06 Nov 2024
HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation
Zhoujie Xu
ViT
3DH
36
2
0
29 Oct 2024
NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking
Yu Liu
Arif Mahmood
Muhammad Haris Khan
21
2
0
27 Oct 2024
UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration
Runshi Zhang
Hao Mo
Junchen Wang
Bimeng Jie
Yang He
Nenghao Jin
Liang Zhu
ViT
MedIm
33
3
0
27 Oct 2024
TEAM: Topological Evolution-aware Framework for Traffic Forecasting--Extended Version
Duc Kieu
Tung Kieu
Peng Han
Bin Yang
Christian S. Jensen
Bac Le
AI4TS
32
1
0
24 Oct 2024
FIPER: Generalizable Factorized Features for Robust Low-Level Vision Models
Yang-Che Sun
Cheng Yu Yeo
Ernie Chu
Jun-Cheng Chen
Yu-Lun Liu
SupR
30
0
0
23 Oct 2024