ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.08394
  4. Cited By
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

3 January 2025
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
Zhe Chen
Wenhai Wang
X. Zhu
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
    MLLM
    VLM
    LRM
ArXivPDFHTML

Papers citing "VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks"

3 / 53 papers shown
Title
Visual Saliency Transformer
Visual Saliency Transformer
Nian Liu
Ni Zhang
Kaiyuan Wan
Ling Shao
Junwei Han
ViT
253
352
0
25 Apr 2021
UniPose: Unified Human Pose Estimation in Single Images and Videos
UniPose: Unified Human Pose Estimation in Single Images and Videos
Bruno Artacho
Andreas E. Savakis
133
135
0
22 Jan 2020
CrowdHuman: A Benchmark for Detecting Human in a Crowd
CrowdHuman: A Benchmark for Detecting Human in a Crowd
Shuai Shao
Zijian Zhao
Boxun Li
Tete Xiao
Gang Yu
Xiangyu Zhang
Jian Sun
222
675
0
30 Apr 2018
Previous
12