Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.16527
Cited By
Exploring Plain Vision Transformer Backbones for Object Detection
30 March 2022
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Exploring Plain Vision Transformer Backbones for Object Detection"
50 / 131 papers shown
Title
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Jingyao Wang
Jianqi Zhang
Wenwen Qiang
Changwen Zheng
VLM
37
0
0
10 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
111
0
0
04 May 2025
CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion
Boyuan Meng
X. Zhang
Peilin Li
Zhe Wu
Yiming Li
Wenkai Zhao
B. Yu
Hui-Liang Shen
ViT
72
0
0
02 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Fanghua Yu
Jinjin Gu
Jinfan Hu
Zheyuan Li
Chao Dong
DiffM
50
0
0
21 Mar 2025
STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans
Shashikant Verma
Harish Katti
Soumyaratna Debnath
Yamuna Swamy
S. Raman
148
0
0
17 Mar 2025
Spiking Transformer:Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo
Xiaode Liu
Y. Chen
Weihang Peng
Yuhan Zhang
Zhe Ma
MQ
43
0
0
28 Feb 2025
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
94
4
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
65
8
0
24 Feb 2025
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Linyi Jiang
Silvery Fu
Yifei Zhu
Bo Li
ViT
147
0
0
14 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
S. Xiang
Chunhong Pan
72
1
0
04 Feb 2025
Enhancing SAR Object Detection with Self-Supervised Pre-training on Masked Auto-Encoders
Xinyang Pu
Feng Xu
34
0
0
20 Jan 2025
Kolmogorov-Arnold Network for Remote Sensing Image Semantic Segmentation
Xianping Ma
Ziyao Wang
Yin Hu
Xiaokang Zhang
Man-On Pun
46
0
0
13 Jan 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
42
1
0
12 Jan 2025
UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping
Yanjie Li
Wenxuan Zhang
K. Liang
Bin Xiao
AAML
61
1
0
10 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
96
48
0
03 Jan 2025
IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks
Yaming Zhang
Chenqiang Gao
Fangcen Liu
Junjie Guo
Lan Wang
Xinggan Peng
Deyu Meng
102
0
0
21 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
W. Liu
X. Wang
3DGS
ViT
108
5
0
17 Dec 2024
Transmission Line Defect Detection Based on UAV Patrol Images and Vision-language Pretraining
Ke Zhang
Zhaoye Zheng
Yurong Guo
Jiacun Wang
Jiyuan Yang
Yangjie Xiao
VLM
79
0
0
18 Nov 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo
Shaobin Zhuang
Kunchang Li
Yu Qiao
Yali Wang
VLM
CLIP
24
0
0
16 Oct 2024
Fractal Calibration for long-tailed object detection
Konstantinos Panagiotis Alexandridis
Ismail Elezi
Jiankang Deng
Anh H. Nguyen
Shan Luo
108
0
0
15 Oct 2024
GlobalMamba: Global Image Serialization for Vision Mamba
Chengkun Wang
Wenzhao Zheng
Jie Zhou
Jiwen Lu
Mamba
35
0
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild
Rolandos Alexandros Potamias
Jinglei Zhang
Jiankang Deng
S. Zafeiriou
3DH
28
10
0
18 Sep 2024
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
Xi Chen
Haosen Yang
Sheng Jin
Xiatian Zhu
H. Yao
VLM
29
3
0
05 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
M. Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
45
3
0
30 Aug 2024
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
Yuxin Zhu
Huiyu Duan
Kaiwei Zhang
Yucheng Zhu
Xilei Zhu
Long Teng
Xiongkuo Min
Guangtao Zhai
69
2
0
10 Aug 2024
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu
Ruoyu Feng
Yunpeng Qi
Qiuyu Chen
Zhibo Chen
Wenjun Zeng
Xin Jin
41
2
0
16 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Y. Wang
Huchuan Lu
Ming Yang
VOS
51
4
0
10 Jul 2024
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
Khanh-Binh Nguyen
Chae Jung Park
VLM
VOS
34
1
0
02 Jul 2024
Robot Instance Segmentation with Few Annotations for Grasping
Moshe Kimhi
David Vainshtein
Chaim Baskin
Dotan Di Castro
51
2
0
01 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
53
24
0
28 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Bo Du
Dacheng Tao
Liangpei Zhang
59
25
0
17 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
Masked Image Modelling for retinal OCT understanding
Theodoros Pissas
Pablo Márquez-Neila
Sebastian Wolf
M. Zinkernagel
Raphael Sznitman
22
0
0
23 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
43
2
0
22 May 2024
Replication Study and Benchmarking of Real-Time Object Detection Models
Pierre-Luc Asselin
Vincent Coulombe
William Guimont-Martin
William Larrivée-Hardy
38
0
0
11 May 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM
CLIP
36
6
0
24 Apr 2024
MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training
Jiayang Li
Junjun Jiang
Pengwei Liang
Jiayi Ma
Liqiang Nie
42
1
0
17 Apr 2024
Cross-domain Multi-modal Few-shot Object Detection via Rich Text
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
47
1
0
24 Mar 2024
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Justin Kay
T. Haucke
Suzanne Stathatos
Siqi Deng
Erik Young
Pietro Perona
Sara Beery
Grant Van Horn
51
4
0
18 Mar 2024
Part-aware Personalized Segment Anything Model for Patient-Specific Segmentation
Chenhui Zhao
Liyue Shen
VLM
29
3
0
08 Mar 2024
Denoising Autoregressive Representation Learning
Yazhe Li
J. Bornschein
Ting Chen
DiffM
30
3
0
08 Mar 2024
GOOD: Towards Domain Generalized Orientated Object Detection
Qi Bi
Beichen Zhou
Jingjun Yi
Wei Ji
Haolan Zhan
Gui-Song Xia
ObjD
OOD
74
2
0
20 Feb 2024
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil
Chan Hee Song
Boyuan Zheng
Xiang Deng
Yu-Chuan Su
Wei-Lun Chao
EgoV
22
12
0
06 Feb 2024
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang
Trevor Darrell
Sai Saketh Rambhatla
Rohit Girdhar
Ishan Misra
VLM
DiffM
32
84
0
05 Feb 2024
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Xinlei Chen
Zhuang Liu
Saining Xie
Kaiming He
DiffM
30
53
0
25 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
26
14
0
25 Jan 2024
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke
Bert De Brabandere
DiffM
32
11
0
18 Jan 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
21
2
0
11 Jan 2024
1
2
3
Next