ResearchTrend.AI
CvT: Introducing Convolutions to Vision Transformers

29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 818 papers shown
Title
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng-Tao Xu
Nick Barnes
F. Khan
Salman Khan
Deng-Ping Fan
46
4
0
22 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
41
4
0
18 Oct 2024
Improving Vision Transformers by Overlapping Heads in Multi-Head Self-Attention
Tianxiao Zhang
Bo Luo
G. Wang
ViT
21
1
0
18 Oct 2024
On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
Hariprasath Govindarajan
Per Sidén
Jacob Roll
Fredrik Lindsten
37
2
0
17 Oct 2024
CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction
Chunlei Meng
Jiacheng Yang
Wei Lin
Bowen Liu
Hongda Zhang
Chun Ouyang
Zhongxue Gan
ViT
30
2
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
70
4
0
14 Oct 2024
BA-Net: Bridge Attention in Deep Neural Networks
Ronghui Zhang
Runzong Zou
Yue Zhao
Zirui Zhang
Junzhou Chen
Yue Cao
Chuan Hu
Houbing Song
36
0
0
10 Oct 2024
Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading
Fang Gao
XueTao Li
Jiabao Wang
Shengheng Ma
Jun Yu
28
0
0
08 Oct 2024
ResTNet: Defense against Adversarial Policies via Transformer in Computer Go
Tai-Lin Wu
Ti-Rong Wu
Chung-Chin Shih
Yan-Ru Ju
I-Chen Wu
AAML
28
0
0
07 Oct 2024
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations
Nikolaos Giakoumoglou
Tania Stathaki
SSL
46
2
0
03 Oct 2024
Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities
Chengkun Sun
Jinqian Pan
Juoli Jin
Russell Stevens Terry
Jiang Bian
Jie Xu
22
0
0
20 Sep 2024
Sparks of Artificial General Intelligence(AGI) in Semiconductor Material Science: Early Explorations into the Next Frontier of Generative AI-Assisted Electron Micrograph Analysis
Sakhinana Sagar Srinivas
Geethan Sannidhi
Sreeja Gangasani
Chidaksh Ravuru
Venkataramana Runkana
33
0
0
17 Sep 2024
GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection
Yanguang Sun
Hanyu Xuan
Jian Yang
Lei Luo
ObjD
45
2
0
15 Sep 2024
Domain-Invariant Representation Learning of Bird Sounds
Ilyass Moummad
Romain Serizel
Emmanouil Benetos
Nicolas Farrugia
SSL
40
2
0
13 Sep 2024
SDformer: Efficient End-to-End Transformer for Depth Completion
Jian Qian
Miao Sun
Ashley Lee
Jie Li
Shenglong Zhuo
Patrick Chiang
ViT
MDE
39
2
0
12 Sep 2024
ASSNet: Adaptive Semantic Segmentation Network for Microtumors and Multi-Organ Segmentation
Fuchen Zheng
Xinyi Chen
Xuhang Chen
Haolun Li
Xiaojiao Guo
Guoheng Huang
Chi-Man Pun
Shoujun Zhou
ViT
MedIm
27
0
0
12 Sep 2024
Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy
Bojian Li
Bo Liu
Jinghua Yue
F. Zhou
Fugen Zhou
MedIm
MDE
53
2
0
12 Sep 2024
PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
Ruocheng Wu
ZiEn Zhang
ShangQi Deng
YuLe Duan
LiangJian Deng
45
0
0
11 Sep 2024
Brain-Inspired Stepwise Patch Merging for Vision Transformers
Yonghao Yu
Dongcheng Zhao
Guobin Shen
Yiting Dong
Yi Zeng
58
0
0
11 Sep 2024
Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild
Xiongkuo Min
Yixuan Gao
Yuqin Cao
Guangtao Zhai
Wenjun Zhang
Huifang Sun
C. Chen
20
10
0
09 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
34
4
0
06 Sep 2024
MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
ViT
38
1
0
05 Sep 2024
TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation
Shahzaib Iqbal
Tariq M. Khan
Syed S. Naqvi
Asim Naveed
Erik H. W. Meijering
MedIm
53
6
0
05 Sep 2024
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
Yanguang Sun
Chunyan Xu
Jian Yang
Hanyu Xuan
Lei Luo
39
13
0
03 Sep 2024
Dreaming is All You Need
Mingze Ni
Wei Liu
38
0
0
03 Sep 2024
A Hybrid Transformer-Mamba Network for Single Image Deraining
Shangquan Sun
Wenqi Ren
Juxiang Zhou
Jianhou Gan
Rui Wang
Xiaochun Cao
Mamba
54
5
0
31 Aug 2024
SMAFormer: Synergistic Multi-Attention Transformer for Medical Image Segmentation
Fuchen Zheng
Xuhang Chen
Weihuang Liu
Haolun Li
Yingtie Lei
Jiahui He
Chi-Man Pun
Shoujun Zhou
MedIm
29
12
0
31 Aug 2024
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis
Sakhinana Sagar Srinivas
Chidaksh Ravuru
Geethan Sannidhi
Venkataramana Runkana
43
0
0
27 Aug 2024
Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
38
1
0
27 Aug 2024
Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
35
0
0
24 Aug 2024
Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Sreeja Gangasani
Chidaksh Ravuru
Venkataramana Runkana
32
0
0
24 Aug 2024
Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption
Sakhinana Sagar Srinivas
Chidaksh Ravuru
Geethan Sannidhi
Venkataramana Runkana
41
0
0
23 Aug 2024
Vision HgNN: An Electron-Micrograph is Worth Hypergraph of Hypernodes
Sakhinana Sagar Srinivas
Rajat Kumar Sarkar
Sreeja Gangasani
Venkataramana Runkana
38
2
0
21 Aug 2024
sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting
Jiaheng Yin
Zhengxin Shi
Jianshen Zhang
Xiaomin Lin
Yulin Huang
Yongzhi Qi
Wei Qi
AI4TS
32
0
0
19 Aug 2024
MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation
Beoungwoo Kang
Seunghun Moon
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
ViT
MedIm
29
8
0
14 Aug 2024
Advanced Vision Transformers and Open-Set Learning for Robust Mosquito Classification: A Novel Approach to Entomological Studies
Ahmed Akib Jawad Karim
Muhammad Zawad Mahmud
Riasat Khan
18
0
0
12 Aug 2024
HcNet: Image Modeling with Heat Conduction Equation
Zhemin Zhang
Xun Gong
DiffM
3DV
43
0
0
12 Aug 2024
MacFormer: Semantic Segmentation with Fine Object Boundaries
Guoan Xu
Wenfeng Huang
Tao Wu
Ligeng Chen
Wenjing Jia
Guangwei Gao
Xiatian Zhu
Stuart W. Perry
40
0
0
11 Aug 2024
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
Tianfang Zhang
Lei Li
Yang Zhou
Wentao Liu
Chen Qian
Xiangyang Ji
ViT
30
12
0
07 Aug 2024
Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning
Xin Zuo
Yu Sheng
Jifeng Shen
Yongwei Shan
17
0
0
01 Aug 2024
Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
Tianxiao Zhang
Wenju Xu
Bo Luo
Guanghui Wang
ViT
MDE
40
7
0
28 Jul 2024
A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention
João D. Nunes
D. Montezuma
Domingos Oliveira
Tania Pereira
Jaime S. Cardoso
49
1
0
26 Jul 2024
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi
Minjing Dong
Mingjia Li
Chang Xu
Mamba
33
3
0
26 Jul 2024
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Zhengang Li
Alec Lu
Yanyue Xie
Zhenglun Kong
Mengshu Sun
...
Peiyan Dong
Caiwen Ding
Yanzhi Wang
Xue Lin
Zhenman Fang
34
5
0
25 Jul 2024
How Lightweight Can A Vision Transformer Be
Jen Hong Tan
ViT
MoE
62
0
0
25 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
30
3
0
24 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
45
5
0
21 Jul 2024
DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention
Xiaoya Tang
Bodong Zhang
Beatrice S. Knudsen
Tolga Tasdizen
ViT
MedIm
50
1
0
18 Jul 2024
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He
Henghui Ding
Xudong Jiang
Bihan Wen
3DV
MLLM
3DPC
48
18
0
18 Jul 2024
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng
Zeyi Xu
Fanghui Xue
Biao Yang
Jiancheng Lyu
Shuai Zhang
Y. Qi
Jack Xin
53
0
0
16 Jul 2024