2206.06619
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
14 June 2022
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
Papers citing
"TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer"
26 / 26 papers shown
Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers
Chengyi Du
Keyan Jin
32
0
0
14 Apr 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
Xiaozhong Liu
Peng Wang
Guoqing Wang
Yi Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
51
0
0
24 Feb 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
65
2
0
03 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
To Predict or Not To Predict? Proportionally Masked Autoencoders for Tabular Data Imputation
Jungkyu Kim
Kibok Lee
Taeyoung Park
44
1
0
26 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
152
0
0
01 Dec 2024
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding
Minghong Xie
Hao Wu
Huafeng Li
Yafei Zhang
Dapeng Tao
Z. Yu
ObjD
40
1
0
31 Oct 2024
Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
Haomeng Zhang
Chiao-An Yang
Raymond A. Yeh
41
1
0
29 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
32
5
0
10 Oct 2024
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
33
0
0
05 Sep 2024
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Ye Li
Chen Tang
Yuan Meng
Jiajun Fan
Zenghao Chai
Xinzhu Ma
Zhi Wang
Wenwu Zhu
31
1
0
06 Jul 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
47
7
0
26 Jun 2024
Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection
Haiming Yao
Yunkang Cao
Wei Luo
Weihang Zhang
Wenyong Yu
Weiming Shen
35
7
0
17 Jun 2024
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Muller
Georgios Kaissis
Daniel Rueckert
35
5
0
24 Apr 2024
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention
Feng Xiao
Hongbin Xu
Qiuxia Wu
Wenxiong Kang
34
2
0
13 Mar 2024
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions
Wenxuan Wang
Yisi Zhang
Xingjian He
Yichen Yan
Zijia Zhao
Xinlong Wang
Jing Liu
LM&Ro
27
4
0
17 Feb 2024
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
42
7
0
23 Dec 2023
I²MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation
Yunyao Mao
Jiajun Deng
Wen-gang Zhou
Zhenbo Lu
Wanli Ouyang
Houqiang Li
VLM
30
1
0
24 Oct 2023
Grounded Image Text Matching with Mismatched Relation Reasoning
Yu Wu
Yan-Tao Wei
Haozhe Jasper Wang
Yongfei Liu
Sibei Yang
Xuming He
34
6
0
02 Aug 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
30
23
0
28 Sep 2022
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
Yunyao Mao
Wen-gang Zhou
Zhenbo Lu
Jiajun Deng
Houqiang Li
30
38
0
26 Aug 2022
Training Vision-Language Transformers from Captions
Liangke Gui
Yingshan Chang
Qiuyuan Huang
Subhojit Som
Alexander G. Hauptmann
Jianfeng Gao
Yonatan Bisk
VLM
ViT
174
11
0
19 May 2022
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
244
344
0
22 Sep 2021
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Yue Liao
Si Liu
Guanbin Li
Fei-Yue Wang
Yanjie Chen
Chao Qian
Bo-wen Li
ObjD
64
174
0
16 Sep 2019