Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.03354
Cited By
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
6 November 2023
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding"
12 / 12 papers shown
Title
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
42
0
0
10 Apr 2025
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang
Yue Song
Georgia Gkioxari
Pietro Perona
VLM
58
0
0
10 Mar 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
J. Liu
Peng Wang
Guoqing Wang
Y. Yang
H. Shen
ObjD
81
0
0
27 Feb 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Chao Wang
Yang Zhou
Yan Peng
LRM
ReLM
60
0
0
02 Feb 2025
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
72
4
0
08 Dec 2024
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
Shengcao Cao
Liang-Yan Gui
Yu-Xiong Wang
44
3
0
10 Oct 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
54
3
0
23 Sep 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
34
5
0
18 Jul 2024
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding
Junjie Fei
Mahmoud Ahmed
Jian Ding
Eslam Mohamed Bakr
Mohamed Elhoseiny
31
3
0
29 May 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
34
89
0
14 Mar 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
51
25
0
20 Feb 2024
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
71
4
0
10 Nov 2023
1