CvT: Introducing Convolutions to Vision Transformers


29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 819 papers shown
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
118
17
0
30 May 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
26
20
0
28 May 2022
Future Transformer for Long-term Action Anticipation
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
16
61
0
27 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
35
68
0
26 May 2022
Fast Vision Transformers with HiLo Attention
Zizheng Pan
Jianfei Cai
Bohan Zhuang
28
152
0
26 May 2022
Concurrent Neural Tree and Data Preprocessing AutoML for Image Classification
Anish Thite
Mohan Dodda
Pulak Agarwal
Jason Zutty
38
3
0
25 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
37
187
0
25 May 2022
MoCoViT: Mobile Convolutional Vision Transformer
Hailong Ma
Xin Xia
Xing Wang
Xuefeng Xiao
Jiashi Li
Min Zheng
ViT
37
18
0
25 May 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
O. R. Terrades
VLM
51
30
0
24 May 2022
Transformer based Generative Adversarial Network for Liver Segmentation
Ugur Demir
Zheyu Zhang
Bin Wang
M. Antalek
Elif Keles
Debesh Jha
Amir Borhani
Daniela Ladner
Ulas Bagci
ViT
MedIm
44
11
0
21 May 2022
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer
Zheng Liu
Zhili Zhang
Wei Wu
32
46
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
119
73
0
20 May 2022
TRT-ViT: TensorRT-oriented Vision Transformer
Xin Xia
Jiashi Li
Jie Wu
Xing Wang
Xuefeng Xiao
Min Zheng
Rui Wang
ViT
23
27
0
19 May 2022
Learning Rate Curriculum
Florinel-Alin Croitoru
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
N. Sebe
17
9
0
18 May 2022
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
45
543
0
17 May 2022
POViT: Vision Transformer for Multi-objective Design and Characterization of Nanophotonic Devices
Xinyu Chen
Renjie Li
Yueyao Yu
Yuanwen Shen
Wenye Li
Zhaoyu Zhang
Yin Zhang
ViT
28
1
0
17 May 2022
ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks
Haoran You
Baopu Li
Huihong Shi
Y. Fu
Yingyan Lin
49
17
0
17 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
30
160
0
16 May 2022
Transformers in 3D Point Clouds: A Survey
Dening Lu
Qian Xie
Mingqiang Wei
Kyle Gao
Linlin Xu
Jonathan Li
3DPC
ViT
32
49
0
16 May 2022
Activating More Pixels in Image Super-Resolution Transformer
Xiangyu Chen
Xintao Wang
Jiantao Zhou
Yu Qiao
Chao Dong
ViT
79
602
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
181
0
06 May 2022
Symmetric Transformer-based Network for Unsupervised Image Registration
Mingrui Ma
Lei Song
Yuanbo Xu
Gui-Xian Liu
ViT
MedIm
27
36
0
28 Apr 2022
Self-Supervised Learning of Object Parts for Semantic Segmentation
A. Ziegler
Yuki M. Asano
SSL
OCL
26
101
0
27 Apr 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
40
76
0
27 Apr 2022
Adaptive Split-Fusion Transformer
Zixuan Su
Hao Zhang
Jingjing Chen
Lei Pang
Chong-Wah Ngo
Yu-Gang Jiang
ViT
27
7
0
26 Apr 2022
TranSiam: Fusing Multimodal Visual Features Using Transformer for Medical Image Segmentation
Xia Li
Shiqiang Ma
Jijun Tang
Fei Guo
ViT
MedIm
22
9
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
24
4
0
26 Apr 2022
High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation
Ming Lu
Fangdong Chen
Shiliang Pu
Zhan Ma
39
44
0
25 Apr 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
22
36
0
20 Apr 2022
DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo
Yadong Xi
Jing Ma
Zhiwei Yang
Xiaoxi Mao
Changjie Fan
Rongsheng Zhang
19
3
0
19 Apr 2022
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Wang Zeng
Sheng Jin
Wentao Liu
Chao Qian
Ping Luo
Ouyang Wanli
Xiaogang Wang
ViT
23
120
0
19 Apr 2022
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training
Hao Liu
Xinghua Jiang
Xin Li
Antai Guo
Deqiang Jiang
Bo Ren
32
37
0
18 Apr 2022
ResT V2: Simpler, Faster and Stronger
Qing-Long Zhang
Yubin Yang
ViT
35
25
0
15 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
28
124
0
14 Apr 2022
Neighborhood Attention Transformer
Ali Hassani
Steven Walton
Jiacheng Li
Shengjia Li
Humphrey Shi
ViT
AI4TS
36
254
0
14 Apr 2022
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
48
391
0
14 Apr 2022
SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection
Zhengyi Liu
Yacheng Tan
Qian He
Yun Xiao
ViT
25
225
0
12 Apr 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang
Zilong Huang
Guozhong Luo
Tao Chen
Xinggang Wang
Wenyu Liu
Gang Yu
Chunhua Shen
ViT
27
199
0
12 Apr 2022
Linear Complexity Randomized Self-attention Mechanism
Lin Zheng
Chong-Jun Wang
Lingpeng Kong
22
31
0
10 Apr 2022
Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution
Mariana-Iuliana Georgescu
Radu Tudor Ionescu
A. Miron
O. Savencu
Nicolae-Cătălin Ristea
N. Verga
Fahad Shahbaz Khan
SupR
26
46
0
08 Apr 2022
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
51
242
0
07 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM
SSL
51
221
0
07 Apr 2022
MixFormer: Mixing Features across Windows and Dimensions
Qiang Chen
Qiman Wu
Jian Wang
Qinghao Hu
T. Hu
Errui Ding
Jian Cheng
Jingdong Wang
MDE
ViT
31
103
0
06 Apr 2022
SE(3)-Equivariant Attention Networks for Shape Reconstruction in Function Space
Evangelos Chatzipantazis
Stefanos Pertigkiozoglou
Yan Sun
Kostas Daniilidis
3DPC
39
30
0
05 Apr 2022
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
62
636
0
04 Apr 2022
Matching Feature Sets for Few-Shot Image Classification
Arman Afrasiyabi
Hugo Larochelle
Jean-François Lalonde
Christian Gagné
VLM
33
72
0
02 Apr 2022
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
Yanan Zhang
Jiaxin Chen
Di Huang
ViT
3DPC
37
59
0
01 Apr 2022
Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load
Gasser Elbanna
A. Biryukov
Neil Scheidwasser
Lara Orlandic
Pablo Mainar
M. Kegler
P. Beckmann
Milos Cernak
25
11
0
30 Mar 2022
ITTR: Unpaired Image-to-Image Translation with Transformers
Wanfeng Zheng
Qiang Li
Guoxin Zhang
Pengfei Wan
Zhong-ming Wang
ViT
48
17
0
30 Mar 2022