Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.14610
Cited By
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
21 March 2024
Qing Jiang
Feng Li
Zhaoyang Zeng
Tianhe Ren
Shilong Liu
Lei Zhang
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy"
33 / 33 papers shown
Title
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
122
3
0
07 May 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
390
0
0
11 Mar 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
175
4
0
31 Dec 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
141
8
0
27 Nov 2024
Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection
Pengfei Lyu
Pak-Hei Yeung
Xiufei Cheng
Xiaosheng Yu
Chengdong Wu
Jagath C. Rajapakse
64
0
0
06 Nov 2024
CountGD: Multi-Modal Open-World Counting
Niki Amini-Naieni
Tengda Han
Andrew Zisserman
ObjD
110
12
0
05 Jul 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
119
1
0
21 Jun 2024
T-Rex: Counting by Visual Prompting
Qing Jiang
Feng Li
Tianhe Ren
Shilong Liu
Zhaoyang Zeng
Kent Yu
Lei Zhang
70
13
0
22 Nov 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
144
158
0
20 Sep 2022
Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting
Min Shi
Hao Lu
Chen Feng
Chengxin Liu
Zhiguo Cao
47
91
0
16 Mar 2022
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy
Yuanhan Zhang
Qi Sun
Yichun Zhou
Zexin He
Zhen-fei Yin
Kunze Wang
Lu Sheng
Yu Qiao
Jing Shao
Ziwei Liu
ObjD
VLM
53
19
0
15 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
153
1,428
0
07 Mar 2022
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Feng Li
Hao Zhang
Shi-guang Liu
Jian Guo
L. Ni
Lei Zhang
ViT
120
671
0
02 Mar 2022
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Shilong Liu
Feng Li
Hao Zhang
Xiaohu Yang
Xianbiao Qi
Hang Su
Jun Zhu
Lei Zhang
ViT
271
750
0
28 Jan 2022
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIP
VLM
95
612
0
07 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
104
378
0
22 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
126
574
0
16 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
87
1,058
0
07 Dec 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
204
1,422
0
03 Nov 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
163
879
0
26 Apr 2021
Learning To Count Everything
Viresh Ranjan
U. Sharma
Thua Nguyen
Minh Hoai
59
148
0
16 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
396
21,281
0
25 Mar 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
191
5,046
0
08 Oct 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
349
12,966
0
26 May 2020
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
100
1,365
0
08 Aug 2019
Objects as Points
Xingyi Zhou
Dequan Wang
Philipp Krahenbuhl
3DPC
101
3,249
0
16 Apr 2019
FCOS: Fully Convolutional One-Stage Object Detection
Zhi Tian
Chunhua Shen
Hao Chen
Tong He
ObjD
114
4,997
0
02 Apr 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Deyuan Li
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
145
4,143
0
25 Feb 2019
CrowdHuman: A Benchmark for Detecting Human in a Crowd
Shuai Shao
Zijian Zhao
Boxun Li
Tete Xiao
Gang Yu
Xiangyu Zhang
Jian Sun
265
684
0
30 Apr 2018
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
639
36,801
0
08 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
461
62,122
0
04 Jun 2015
Fast R-CNN
Ross B. Girshick
ObjD
290
25,008
0
30 Apr 2015
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
367
43,524
0
01 May 2014
1