Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.07392
Cited By
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
12 March 2024
Chunlong Xia
Xinliang Wang
Feng Lv
Xin Hao
Yifeng Shi
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions"
30 / 30 papers shown
Title
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
Sainath Dey
Mitul Goswami
Jashika Sethi
Prasant Kumar Pattnaik
ViT
30
0
0
07 May 2025
Pets: General Pattern Assisted Architecture For Time Series Analysis
Xiangkai Ma
Xiaobin Hong
Wenzhong Li
Sanglu Lu
AI4TS
32
0
0
19 Apr 2025
Collaborative Perception Datasets for Autonomous Driving: A Review
N. Wang
Deyong Shang
Yan Gong
X. S. Hu
Ziying Song
Lei Yang
Y. Huang
Xiaoyu Wang
J. Lu
39
0
0
17 Apr 2025
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
64
1
0
24 Mar 2025
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Subhadeep Koley
Tapas Kumar Dutta
Aneeshan Sain
Pinaki Nath Chowdhury
A. Bhunia
Yi-Zhe Song
VLM
66
0
0
18 Mar 2025
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu
Chen Jia
Fan Shi
Xu Cheng
Shengyong Chen
Mamba
47
0
0
03 Mar 2025
Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing
Sicen Guo
Tianyou Wen
Chuang-Wei Liu
Qijun Chen
Rui Fan
57
0
0
10 Feb 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
91
2
0
30 Jan 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Z. Li
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
56
0
0
29 Jan 2025
Prion-ViT: Prions-Inspired Vision Transformers for Temperature prediction with Specklegrams
Abhishek Sebastian
Pragna R
Sonaa Rajagopal
Muralikrishnan Mani
58
0
0
28 Jan 2025
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
Xiangyu Gao
Yu Dai
Benliu Qiu
Hongliang Li
Heqian Qiu
Hongliang Li
ObjD
VLM
145
0
0
28 Jan 2025
PARF-Net: integrating pixel-wise adaptive receptive fields into hybrid Transformer-CNN network for medical image segmentation
Xu Ma
Mengsheng Chen
Junhui Zhang
Lijuan Song
Fang Du
Zhenhua Yu
ViT
MedIm
31
0
0
06 Jan 2025
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu
Meng Lou
Yizhou Yu
115
1
0
16 Dec 2024
RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision
Shuo Wang
Chunlong Xia
Feng Lv
Yifeng Shi
PINN
ViT
MU
35
3
0
13 Sep 2024
Token Turing Machines are Efficient Vision Models
Purvish Jajal
Nick Eliopoulos
Benjamin Shiue-Hal Chou
George K. Thiravathukal
James C. Davis
Yung-Hsiang Lu
93
0
0
11 Sep 2024
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
Hayeon Jo
Hyesong Choi
Minhee Cho
Dongbo Min
36
1
0
04 Sep 2024
MacFormer: Semantic Segmentation with Fine Object Boundaries
Guoan Xu
Wenfeng Huang
Tao Wu
Ligeng Chen
Wenjing Jia
Guangwei Gao
Xiatian Zhu
Stuart W. Perry
40
0
0
11 Aug 2024
DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention
Xiaoya Tang
Bodong Zhang
Beatrice S. Knudsen
Tolga Tasdizen
ViT
MedIm
47
1
0
18 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Y. Wang
Huchuan Lu
Ming Yang
VOS
53
4
0
10 Jul 2024
Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration
M. Khaniki
Alireza Golkarieh
Mohammad Manthouri
MedIm
26
4
0
25 Jun 2024
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization
Mingshu Zhao
Yi Luo
Yong Ouyang
37
2
0
23 Jun 2024
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation
Qingfeng Liu
Mostafa El-Khamy
Kee-Bong Song
42
0
0
08 Jun 2024
Parameter-Inverted Image Pyramid Networks
Xizhou Zhu
Xue Yang
Zhaokai Wang
Hao Li
Wenhan Dou
Junqi Ge
Lewei Lu
Yu Qiao
Jifeng Dai
47
0
0
06 Jun 2024
UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning
Maoxun Yuan
Bo Cui
Tianyi Zhao
Xingxing Wei
Shan Fu
Xue Yang
Xingxing Wei
43
0
0
26 Apr 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
44
31
0
29 Mar 2024
C
2
\mathbf{C}^2
C
2
Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection
Maoxun Yuan
Xingxing Wei
ViT
23
38
0
28 Jun 2023
CMT: Convolutional Neural Networks Meet Vision Transformers
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Chunjing Xu
Yunhe Wang
Chang Xu
ViT
349
500
0
13 Jul 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
277
3,622
0
24 Feb 2021
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,827
0
18 Aug 2016
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
289
36,320
0
08 Jun 2015
1