Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2209.13959
Cited By
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
28 September 2022
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding"
15 / 15 papers shown
Title
PathVG: A New Benchmark and Dataset for Pathology Visual Grounding
Chunlin Zhong
Shuang Hao
Junhua Wu
Xiaona Chang
Jiwei Jiang
Xiu Nie
He Tang
Xiang Bai
47
0
0
28 Feb 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
J. Liu
Peng Wang
Guoqing Wang
Y. Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
51
0
0
24 Feb 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
42
1
0
12 Jan 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
62
2
0
03 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
To Predict or Not To Predict? Proportionally Masked Autoencoders for Tabular Data Imputation
Jungkyu Kim
Kibok Lee
Taeyoung Park
44
0
0
26 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
149
0
0
01 Dec 2024
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding
Minghong Xie
M. Wang
Huafeng Li
Yafei Zhang
Dapeng Tao
Z. Yu
ObjD
40
1
0
31 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
29
5
0
10 Oct 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
27
9
0
26 Sep 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
34
8
0
20 Apr 2024
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network
Yiming Lin
Xiao-Bo Jin
Qiufeng Wang
Kaizhu Huang
24
3
0
25 Oct 2023
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
30
3
0
21 Nov 2022
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Yue Liao
Si Liu
Guanbin Li
Fei-Yue Wang
Yanjie Chen
Chao Qian
Bo-wen Li
ObjD
62
174
0
16 Sep 2019
1