arXiv:2103.15808
CvT: Introducing Convolutions to Vision Transformers
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
Papers citing
"CvT: Introducing Convolutions to Vision Transformers"
50 / 818 papers shown
Title
TCFormer: Visual Recognition via Token Clustering Transformer
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chao Qian
Wanli Ouyang
Ping Luo
Xiaogang Wang
33
3
0
16 Jul 2024
Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction
Yumin Kim
Gayoon Choi
Seong Jae Hwang
39
0
0
10 Jul 2024
HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation
Guoan Xu
Wenjing Jia
Tao Wu
Ligeng Chen
Guangwei Gao
ViT
38
9
0
10 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
46
4
0
10 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila Dahdul
Anuj Karpatne
41
4
0
10 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
35
0
0
06 Jul 2024
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies
Ivan Drokin
53
19
0
01 Jul 2024
Query-Efficient Hard-Label Black-Box Attack against Vision Transformers
Chao Zhou
Xiaowen Shi
Yuan-Gen Wang
ViT
AAML
29
0
0
29 Jun 2024
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kummerle
Srijan Das
A. Dutta
43
1
0
27 Jun 2024
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
Qi Ma
Danda Pani Paudel
E. Konukoglu
Luc Van Gool
40
6
0
25 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
29
13
0
19 Jun 2024
Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis
Bowen Zhang
Ying Chen
Long Bai
Yan Zhao
Yuxiang Sun
Yixuan Yuan
Jianhua Zhang
Hongliang Ren
40
4
0
15 Jun 2024
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
Yitao Xu
Tong Zhang
Sabine Süsstrunk
ViT
47
0
0
12 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
59
3
0
12 Jun 2024
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang
Hanpeng Liu
Stephen Lin
Kun He
53
5
0
01 Jun 2024
Automatic Channel Pruning for Multi-Head Attention
Eunho Lee
Youngbae Hwang
ViT
40
1
0
31 May 2024
Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform
Viviane Potocnik
Luca Colagrande
Tim Fischer
L. Bertaccini
Daniele Jahier Pagliari
Alessio Burrello
Luca Benini
23
3
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
57
4
0
28 May 2024
XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng
Hang Zhang
Jian Yang
Xiang Li
Weixiao Zhou
...
Fei Liu
Wei Zhang
Tao Sun
Tongliang Li
Zhoujun Li
52
2
0
27 May 2024
ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking
Xudong Han
Nobuyuki Oishi
Yueying Tian
Elif Ucurum
R. Young
C. Chatwin
Philip Birch
40
3
0
24 May 2024
YOLOv10: Real-Time End-to-End Object Detection
Ao Wang
Hui Chen
Lihao Liu
Kai Chen
Zijia Lin
Jungong Han
Guiguang Ding
3DH
43
916
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
42
0
23 May 2024
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son
Jaehun Park
Kwangsu Kim
AI4TS
ViT
39
8
0
20 May 2024
Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction
Aryan Garg
Raghav Mallampali
Akshat Joshi
Shrisudhan Govindarajan
Kaushik Mitra
39
0
0
20 May 2024
GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
ViT
44
2
0
18 May 2024
All in One Framework for Multimodal Re-identification in the Wild
He Li
Mang Ye
Ming Zhang
Bo Du
35
9
0
08 May 2024
Examining Changes in Internal Representations of Continual Learning Models Through Tensor Decomposition
Nishant Suresh Aswani
Amira Guesmi
Muhammad Abdullah Hanif
Muhammad Shafique
CLL
30
1
0
06 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
37
3
0
02 May 2024
Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Fareed Qararyah
M. Azhar
Mohammad Ali Maleki
Pedro Trancoso
29
1
0
30 Apr 2024
ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal
Zhuohao Li
Guoyang Xie
Guannan Jiang
Zhichao Lu
36
3
0
29 Apr 2024
GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation
Z. A. Yazici
Ilkay Oksuz
H. K. Ekenel
MedIm
38
7
0
27 Apr 2024
PromptCIR: Blind Compressed Image Restoration with Prompt Learning
Bingchen Li
Xin Li
Yiting Lu
Ruoyu Feng
Mengxi Guo
Shijie Zhao
Li Zhang
Zhibo Chen
39
13
0
26 Apr 2024
MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition
Felix M. Schmitt-Koopmann
Elaine M. Huang
Hans-Peter Hutter
Thilo Stadelmann
Alireza Darvishy
32
4
0
21 Apr 2024
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Yuang Liu
Zhiheng Qiu
Xiaokai Qin
ViT
31
0
0
20 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
47
1
0
18 Apr 2024
Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution
Cansu Korkmaz
A. Murat Tekalp
ViT
44
6
0
17 Apr 2024
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore
Mariana-Iuliana Georgescu
J. A. Justo
T. Johansen
Andreea-Iuliana Ionescu
Radu Tudor Ionescu
36
0
0
14 Apr 2024
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele
Mohamed Ragab
Zhenghua Chen
Min-man Wu
Xiaoli Li
AI4TS
AIFin
36
37
0
12 Apr 2024
Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models
Zhaohui Chen
Elyas Asadi Shamsabadi
Sheng Jiang
Luming Shen
Daniel Dias-da-Costa
29
2
0
09 Apr 2024
Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures
Ching-Kai Lin
Di-Chun Wei
Yun-Chien Cheng
37
0
0
09 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
58
48
0
08 Apr 2024
HSViT: Horizontally Scalable Vision Transformer
Chenhao Xu
Chang-Tsun Li
Chee Peng Lim
Douglas Creighton
ViT
34
2
0
08 Apr 2024
GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets
Dongjing Shan
Guiqiang Chen
ViT
45
0
0
07 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
40
7
0
05 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
36
24
0
02 Apr 2024
Structured Initialization for Attention in Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
ViT
26
1
0
01 Apr 2024
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon
Jinhyun Jang
Jin-Hwa Kim
Kwonyoung Kim
Kwanghoon Sohn
43
1
0
01 Apr 2024
IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions
Zhijun Tu
Kunpeng Du
Hanting Chen
Hai-lin Wang
Wei Li
Jie Hu
Yunhe Wang
ViT
44
4
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
51
7
0
28 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
43
2
0
26 Mar 2024