2112.09106
RegionCLIP: Region-based Language-Image Pretraining
16 December 2021
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
Liunian Harold Li
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
Papers citing
"RegionCLIP: Region-based Language-Image Pretraining"
50 / 105 papers shown
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
38
0
0
13 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
53
0
0
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions
Yifei Dong
Fengyi Wu
Sanjian Zhang
Guangyu Chen
Yuzhi Hu
...
Jingdong Sun
Siyu Huang
Feng Liu
Qi Dai
Zhi-Qi Cheng
44
0
0
16 Apr 2025
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
Sebastián Barbas Laina
Simon Boche
Sotiris Papatheodorou
Simon Schaefer
Jaehyung Jung
Stefan Leutenegger
52
0
0
11 Apr 2025
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
Bingyang Wang
Kaer Huang
Bin Li
Yiqiang Yan
L. Zhang
Huchuan Lu
You He
VLM
37
0
0
07 Apr 2025
Post-processing for Fair Regression via Explainable SVD
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
155
0
0
04 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
45
0
0
03 Apr 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
89
2
0
27 Mar 2025
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
Zhichao Sun
Huazhang Hu
Yidong Ma
Gang Liu
Nemo Chen
Xu Tang
Yao Hu
Yongchao Xu
ObjD
47
0
0
24 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
168
0
0
20 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
164
0
0
14 Mar 2025
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
Shuchang Zhou
Jiwei Wei
Shiyuan He
Yuyang Zhou
Chaoning Zhang
Jie Zou
Ning Xie
Yang Yang
VLM
VPVLM
81
0
0
27 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
56
0
0
23 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
H. Zhang
X. Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Y. Yang
Zhe Gan
CLIP
VLM
68
7
0
20 Feb 2025
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi
Chen Li
Tieliang Gong
Yefeng Zheng
Huazhu Fu
VLM
65
7
0
12 Feb 2025
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
Xiangyu Gao
Yu Dai
Benliu Qiu
Hongliang Li
Heqian Qiu
Hongliang Li
ObjD
VLM
151
0
0
28 Jan 2025
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
Yuanze Li
Haolin Wang
Shihao Yuan
Ming-Yu Liu
Debin Zhao
Yiwen Guo
Chen Xu
Guangming Shi
Wangmeng Zuo
81
29
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
105
18
0
17 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
99
48
0
03 Jan 2025
Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality
Qitong Wang
Tang Li
Kien X. Nguyen
Xi Peng
85
0
0
17 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
98
1
0
04 Dec 2024
Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features
Makram Chahine
Alex Quach
Alaa Maalouf
T. Wang
Daniela Rus
23
0
0
16 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
67
4
0
14 Oct 2024
DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection
Hao Li
Rui Zhang
Hantao Yao
X. Zhang
Yifan Hao
Xinkai Song
Xiaqing Li
Yongwei Zhao
Ling Li
Yunji Chen
ObjD
VLM
31
3
0
11 Oct 2024
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal
Max van Spengler
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Fabio Galasso
Pascal Mettes
CoGe
48
5
0
09 Oct 2024
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
43
1
0
10 Sep 2024
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
Youjun Zhao
Jiaying Lin
Shuquan Ye
Qianshi Pang
Rynson W. H. Lau
64
1
0
20 Aug 2024
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan
Yanxing Liu
Yuqian Fu
Muyuan Ma
Jiaohao Li
D. Paudel
Luc Van Gool
Xiaomeng Huang
ObjD
69
7
0
17 Aug 2024
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao
Zhuotao Tian
Hang Zhao
Jingyong Su
VLM
36
15
0
11 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta
Aditya Arora
Sanath Narayan
Salman Khan
F. Khan
Graham W. Taylor
38
3
0
21 Jun 2024
Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
Shuyang Lin
Tong Jia
Hao Wang
Bowen Ma
Mingyuan Li
Dongyue Chen
VLM
ObjD
41
0
0
16 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
F. Khan
VLM
ISeg
80
6
0
04 Jun 2024
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios
Zeyu Wang
Yuanchun Shi
Yuntao Wang
Yuchen Yao
Kun Yan
Yuhan Wang
Lei Ji
Xuhai Xu
Chun Yu
40
7
0
13 May 2024
Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation
Shengyuan Liu
Bo Wang
Ye Ma
Te Yang
Xipeng Cao
Quan Chen
Han Li
Di Dong
Peng Jiang
EGVM
44
2
0
11 May 2024
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Kim Hoang Tran
Phuc Vuong Do
Ngoc Quoc Ly
Ngan Le
36
4
0
15 Apr 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie
Julong Wei
Zhengjun Wang
Ke Wu
Shansuai Yuan
Kaizhao Zhang
Jie Jia
Jieru Zhao
Zhongxue Gan
Wenchao Ding
40
7
0
10 Apr 2024
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang
Aofan Jiang
Jinghao Feng
Ya-Qin Zhang
Xinchao Wang
Yanfeng Wang
MedIm
45
25
0
19 Mar 2024
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano
Federico Magistri
Lucas Nunes
Jens Behley
C. Stachniss
VLM
42
8
0
12 Mar 2024
DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
Feng Hou
Jin Yuan
Ying Yang
Yang Liu
Yang Zhang
Cheng Zhong
Zhongchao Shi
Jianping Fan
Yong Rui
Zhiqiang He
VLM
51
1
0
05 Mar 2024
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Ming-hui Li
Shuai Li
Xindong Zhang
Lei Zhang
VOS
44
16
0
28 Feb 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
34
12
0
29 Jan 2024
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonín Vobecký
Oriane Siméoni
David Hurych
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
40
33
0
17 Jan 2024
Domain Adaptation for Large-Vocabulary Object Detectors
Kai Jiang
Jiaxing Huang
Weiying Xie
Jie Lei
Yunsong Li
Ling Shao
Shijian Lu
ObjD
VLM
37
2
0
13 Jan 2024
MmAP: Multi-modal Alignment Prompt for Cross-domain Multi-task Learning
Yi Xin
Junlong Du
Qiang Wang
Ke Yan
Shouhong Ding
VLM
41
45
0
14 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
48
83
0
06 Dec 2023
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen
Jinlin Wu
Zhen Lei
Zhaoxiang Zhang
Changwen Chen
25
11
0
18 Nov 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
26
33
0
20 Oct 2023