arXiv:2406.08394
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
3 January 2025
Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Zhe Chen, Wenhai Wang, X. Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai
MLLM
VLM
LRM
Papers citing "VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks"
3 / 53 papers shown
Visual Saliency Transformer
Nian Liu, Ni Zhang, Kaiyuan Wan, Ling Shao, Junwei Han
ViT
253
352
0
25 Apr 2021
UniPose: Unified Human Pose Estimation in Single Images and Videos
Bruno Artacho, Andreas E. Savakis
133
135
0
22 Jan 2020
CrowdHuman: A Benchmark for Detecting Human in a Crowd
Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun
222
675
0
30 Apr 2018