Class-agnostic Object Detection with Multi-modal Transformer

22 November 2021
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
Rao Muhammad Anwer
Ming-Hsuan Yang

Papers citing "Class-agnostic Object Detection with Multi-modal Transformer"

50 / 66 papers shown
Object Retrieval for Visual Question Answering with Outside Knowledge
Shichao Kan
Yuhai Deng
Yixiong Liang
Lihui Cen
Zhe Qu
Linna Zhang
Zhihai He
Yigang Cen
90
0
0
01 Jul 2025
Open World Object Detection: A Survey
Yiming Li
Yi Wang
Wenqian Wang
Dan Lin
Bingbing Li
Kim-Hui Yap
ObjD
94
1
0
01 Jul 2025
Object-level Self-Distillation for Vision Pretraining
Çağlar Hızlı
Çağatay Yıldız
Pekka Marttinen
OCL, VLM
50
0
0
04 Jun 2025
S2AFormer: Strip Self-Attention for Efficient Vision Transformer
Guoan Xu
Wenfeng Huang
Wenjing Jia
Jiamao Li
Guangwei Gao
Guo-Jun Qi
71
0
0
28 May 2025
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui
Yonggang Zhang
Xuan Wang
Xinmei Tian
Jun Yu
AAML
136
1
0
06 May 2025
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task
Ahmad Khalil
Mahmoud Khalil
A. Ngom
VLM
118
1
0
20 Apr 2025
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
ObjD, VLM
87
0
0
13 Mar 2025
Space Rotation with Basis Transformation for Training-free Test-Time Adaptation
Chenhao Ding
Xinyuan Gao
Songlin Dong
Yuhang He
Qiang Wang
Xiang Song
Alex C. Kot
Yihong Gong
TTA, VLM
159
0
0
27 Feb 2025
YOLO-UniOW: Efficient Universal Open-World Object Detection
Lihao Liu
Juexiao Feng
Hui Chen
Ao Wang
Lin Song
Jiawei Han
Guiguang Ding
ObjD, VLM
134
2
0
31 Dec 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD, VLM
187
1
0
27 Nov 2024
3D Audio-Visual Segmentation
Artem Sokolov
Swapnil Bhosale
Xiatian Zhu
VOS
81
0
0
04 Nov 2024
LOBG:Less Overfitting for Better Generalization in Vision-Language Model
Chenhao Ding
Xinyuan Gao
Songlin Dong
Yuhang He
Qiang Wang
Alex C. Kot
Yihong Gong
VLM
69
1
0
14 Oct 2024
O1O: Grouping of Known Classes to Identify Unknown Objects as Odd-One-Out
Mısra Yavuz
Fatma Guney
74
0
0
10 Oct 2024
CatFree3D: Category-agnostic 3D Object Detection with Diffusion
Wenjing Bian
Zirui Wang
Andrea Vedaldi
96
1
0
22 Aug 2024
Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection
Tamás Matuszka
Peter Hajas
Dávid Szeghy
75
0
0
22 Aug 2024
Advancing Prompt Learning through an External Layer
Fangming Cui
Xun Yang
Chao Wu
Liang Xiao
Xinmei Tian
VLM
144
4
0
29 Jul 2024
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
Hyolim Kang
Jeongseok Hyun
Joungbin An
Youngjae Yu
Seon Joo Kim
61
0
0
17 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM, MQ
94
5
0
15 Jul 2024
XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images
Elisabeta-Iulia Dima
Pablo Gómez
Sandor Kruk
Peter Kretschmar
Simon Rosen
Călin-Adrian Popa
79
0
0
25 Jun 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM, ObjD
177
1
0
21 Jun 2024
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
Fangyi Chen
Han Zhang
Zhantao Yang
Hao Chen
Kai Hu
Marios Savvides
ObjD, VLM
86
5
0
30 May 2024
Multimodal Object Detection via Probabilistic a priori Information Integration
Hafsa El Hafyani
Bastien Pasdeloup
Camille Yver
Pierre Romenteau
76
0
0
24 May 2024
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Muller
Georgios Kaissis
Daniel Rueckert
88
5
0
24 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
97
7
0
28 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
78
6
0
21 Mar 2024
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
Anjun Hu
Jindong Gu
Francesco Pinto
Konstantinos Kamnitsas
Philip Torr
AAML, SILM
86
5
0
19 Mar 2024
Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
Jieren Deng
Haojian Zhang
Kun Ding
Jianhua Hu
Xingxuan Zhang
Yunkuan Wang
VLM, ObjD
179
7
0
04 Mar 2024
APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning
Guiming Cao
Kaize Shi
Hong Fu
Huaiwen Zhang
Guandong Xu
VLM
75
2
0
12 Jan 2024
YOLO-Former: YOLO Shakes Hand With ViT
J. Khoramdel
A. Moori
Y. Borhani
A. Ghanbarzadeh
Esmaeil Najafi
ViT
64
3
0
11 Jan 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan
Xiangtai Li
Chong Zhou
Yining Li
Kai Chen
Chen Change Loy
VLM
118
51
0
05 Jan 2024
COMMA: Co-Articulated Multi-Modal Learning
Lianyu Hu
Liqing Gao
Zekang Liu
Chi-Man Pun
Wei Feng
VLM
74
2
0
30 Dec 2023
Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model
Shuailei Ma
Chen-Wei Xie
Ying-yu Wei
Siyang Sun
Jiaqi Fan
Xiaoyi Bao
Yuxin Guo
Yun Zheng
VLM, VPVLM
68
2
0
18 Dec 2023
MobileSAMv2: Faster Segment Anything to Everything
Chaoning Zhang
Dongshen Han
Sheng Zheng
J. Choi
Tae-Ho Kim
Choong Seon Hong
VLM
92
27
0
15 Dec 2023
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
Joonhyun Jeong
Geondo Park
Jayeon Yoo
Hyungsik Jung
Heesu Kim
VLM, ObjD
92
11
0
12 Dec 2023
Open World Object Detection in the Era of Foundation Models
O. Zohar
Alejandro Lozano
Shelly Goel
Serena Yeung
Kuan-Chieh Wang
VLM
89
11
0
10 Dec 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
74
11
0
04 Dec 2023
Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector
Xuanyi Liu
Zhongqi Yue
Xian-Sheng Hua
113
0
0
04 Nov 2023
Towards Open World Active Learning for 3D Object Detection
Zhuoxiao Chen
Yadan Luo
Zixin Wang
Zijian Wang
Xin Yu
Zi Huang
68
0
0
16 Oct 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
66
5
0
13 Sep 2023
Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art
Aref Miri Rekavandi
Shima Rashidi
F. Boussaïd
Stephen Hoefs
Emre Akbas
Bennamoun
ViT
113
27
0
10 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, VLM
120
27
0
02 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
84
5
0
30 Aug 2023
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence
Xia Huang
Kai Fong Ernest Chong
73
3
0
19 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Chaoyang Zhu
Long Chen
ObjD, VLM
146
40
0
18 Jul 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
Muhammad Uzair Khattak
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
93
189
0
13 Jul 2023
Understanding Prompt Tuning for V-L Models Through the Lens of Neural Collapse
Didi Zhu
Zexi Li
Min Zhang
Junkun Yuan
Yunfeng Shao
Jiashuo Liu
Kun Kuang
Yinchuan Li
Chao Wu
VLM
71
2
0
28 Jun 2023
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD, VLM
154
151
0
28 Jun 2023
Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection
T. Doan
Xin Li
Sima Behpour
Wenbin He
Liangke Gou
Liu Ren
101
7
0
25 Jun 2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
MLLM
162
662
0
08 Jun 2023
USD: Unknown Sensitive Detector Empowered by Decoupled Objectness and Segment Anything Model
Yulin He
Wei Chen
Yusong Tan
Siqi Wang
95
9
0
04 Jun 2023