2502.13923
Cited By
Qwen2.5-VL Technical Report
20 February 2025
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
Sibo Song
K. Dang
P. Wang
S. Wang
J. Tang
Humen Zhong
Yuanzhi Zhu
Mingkun Yang
Zhaohai Li
Jianqiang Wan
P. Wang
Wei Ding
Zheren Fu
Yiheng Xu
Jiabo Ye
Xi Zhang
Tianbao Xie
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
Papers citing "Qwen2.5-VL Technical Report"
10 / 210 papers shown
Title
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Jake Poznanski
Jon Borchardt
Jason Dunkelberger
Regan Huff
Daniel Lin
Aman Rangapur
Christopher Wilhelm
Kyle Lo
Luca Soldaini
91
0
0
25 Feb 2025
AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation
Rui Li
Xiaowei Zhao
71
0
0
23 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
DiffM
VGen
52
5
0
16 Feb 2025
Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
A. H. Tan
Angus Fung
Haitong Wang
G. Nejat
93
2
0
31 Jan 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
77
25
0
31 Dec 2024
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
LRM
90
4
0
25 Nov 2024
Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments
Sangmim Song
S. Kodagoda
A. Gunatilake
Marc G. Carmichael
Karthick Thiyagarajan
Jodi Martin
LM&Ro
30
1
0
28 Oct 2024
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon
M. Lee
Sangdon Park
Dongwoo Kim
44
1
0
08 Oct 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
39
115
0
16 Jul 2024
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023