ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.08394
  4. Cited By
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
v1v2v3 (latest)

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

3 January 2025
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
Zhe Chen
Wenhai Wang
X. Zhu
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
    MLLMVLMLRM
ArXiv (abs)PDFHTML

Papers citing "VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks"

50 / 224 papers shown
Title
Mini-Gemini: Mining the Potential of Multi-modality Vision Language
  Models
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li
Yuechen Zhang
Chengyao Wang
Zhisheng Zhong
Yixin Chen
Ruihang Chu
Shaoteng Liu
Jiaya Jia
VLMMLLMMoE
121
238
0
27 Mar 2024
InternLM2 Technical Report
InternLM2 Technical Report
Zheng Cai
Maosong Cao
Haojiong Chen
Kai-xiang Chen
Keyu Chen
...
Jingming Zhuo
Yi-Ling Zou
Xipeng Qiu
Yu Qiao
Dahua Lin
ALM
72
208
0
26 Mar 2024
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
Zheng Zhang
Yeyao Ma
Enming Zhang
Xiang Bai
VLMMLLM
124
47
0
21 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language
  Interface
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Hongsheng Li
Bernt Schiele
Liwei Wang
VLM
99
13
0
14 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLMEGVM
172
10
0
13 Mar 2024
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale
  SAR Object Detection
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
Yuxuan Li
Xiang Li
Wei-Jang Li
Qibin Hou
Li Liu
Ming-Ming Cheng
Jian Yang
102
38
0
11 Mar 2024
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Jun-Yan He
Yifan Wang
Lijun Wang
Huchuan Lu
Jun-Yan He
Jinpeng Lan
Bin Luo
Xuansong Xie
MLLMVLM
91
22
0
05 Mar 2024
RegionGPT: Towards Region Understanding Vision Language Model
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo
Shalini De Mello
Hongxu Yin
Wonmin Byeon
Ka Chun Cheung
Yizhou Yu
Ping Luo
Sifei Liu
VLM
97
37
0
04 Mar 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the
  Open World
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
129
53
0
29 Feb 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLMVLM
133
47
0
26 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
95
136
0
19 Feb 2024
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language
  Models
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Guiming Hardy Chen
Shunian Chen
Ruifei Zhang
Junying Chen
Xiangbo Wu
Zhiyi Zhang
Zhihong Chen
Jianquan Li
Xiang Wan
Benyou Wang
VLMSyDa
129
139
0
18 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&RoLRM
94
114
0
12 Feb 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo Zhang
Chunhua Shen
VLMMLLM
80
109
0
06 Feb 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI Xiao Bi
:
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRMALM
203
381
0
05 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
205
103
0
04 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLMMLLM
266
1,216
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLMLRM
155
291
0
20 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLMMLLM
164
36
0
19 Dec 2023
Osprey: Pixel Understanding with Visual Instruction Tuning
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan
Wentong Li
Jian Liu
Dongqi Tang
Xinjie Luo
Chi Qin
Lei Zhang
Jianke Zhu
MLLMVLM
121
87
0
15 Dec 2023
Pixel Aligned Language Models
Pixel Aligned Language Models
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLMVLM
132
15
0
14 Dec 2023
General Object Foundation Model for Images and Videos at Scale
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOSVLM
108
41
0
14 Dec 2023
SmartEdit: Exploring Complex Instruction-based Image Editing with
  Multimodal Large Language Models
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang
Liangbin Xie
Xintao Wang
Ziyang Yuan
Xiaodong Cun
...
Jiantao Zhou
Chao Dong
Rui Huang
Ruimao Zhang
Ying Shan
DiffM
74
77
0
11 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Yizhou Wang
YiXuan Wu
Shixiang Tang
Weizhen He
Xun Guo
...
Lei Bai
Rui Zhao
Jian Wu
Tong He
Wanli Ouyang
VLM
202
14
0
04 Dec 2023
PixelLM: Pixel Reasoning with Large Multimodal Model
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren
Zhicheng Huang
Yunchao Wei
Yao-Min Zhao
Dongmei Fu
Jiashi Feng
Xiaojie Jin
VLMMLLMLRM
116
109
0
04 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of
  Low-rank Experts
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
73
16
0
01 Dec 2023
LLMGA: Multimodal Large Language Model based Generation Assistant
LLMGA: Multimodal Large Language Model based Generation Assistant
Bin Xia
Shiyin Wang
Yingfan Tao
Yitong Wang
Jiaya Jia
MLLM
89
12
0
27 Nov 2023
Visual In-Context Prompting
Visual In-Context Prompting
Feng Li
Qing Jiang
Hao Zhang
Tianhe Ren
Shilong Liu
...
Hongyang Li
Chun-yue Li
Jianwei Yang
Lei Zhang
Jianfeng Gao
VLMLRMMLLM
86
36
0
22 Nov 2023
T-Rex: Counting by Visual Prompting
T-Rex: Counting by Visual Prompting
Qing Jiang
Feng Li
Tianhe Ren
Shilong Liu
Zhaoyang Zeng
Kent Yu
Lei Zhang
95
14
0
22 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
MLLMVLM
200
683
0
21 Nov 2023
SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20
  Million masks
SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
Jin Ye
Junlong Cheng
Jianpin Chen
Zhongying Deng
Tian-Xin Li
...
Hui Sun
Min Zhu
Shaoting Zhang
Junjun He
Yu Qiao
VLMMedIm
79
40
0
20 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLMVLM
101
126
0
09 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
GLaMM: Pixel Grounding Large Multimodal Model
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Erix Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLMVLM
151
239
0
06 Nov 2023
ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object
  Detection
ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
Youwei Pang
Xiaoqi Zhao
Tian-Zhu Xiang
Lihe Zhang
Huchuan Lu
109
33
0
31 Oct 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
198
31
0
26 Oct 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
249
473
0
14 Oct 2023
AutoVP: An Automated Visual Prompting Framework and Benchmark
AutoVP: An Automated Visual Prompting Framework and Benchmark
Hsi-Ai Tsao
Lei Hsiung
Pin-Yu Chen
Sijia Liu
Tsung-Yi Ho
VLM
82
22
0
12 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjDMLLMVLM
118
328
0
11 Oct 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLMMLLM
219
2,830
0
05 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language
  Models
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
152
68
0
04 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
Making LLaMA SEE and Draw with SEED Tokenizer
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
80
137
0
02 Oct 2023
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision
  Generalists
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
Yulu Gan
Sungwoo Park
Alexander Schubert
Anthony Philippakis
Ahmed Alaa
VLM
109
25
0
30 Sep 2023
Guiding Instruction-based Image Editing via Multimodal Large Language
  Models
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Johannes Frey
Wenze Hu
Xianzhi Du
William Yang Wang
Yinfei Yang
Zhe Gan
114
98
0
29 Sep 2023
Qwen Technical Report
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
273
1,919
0
28 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
104
199
0
20 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffMVLM
118
107
0
07 Sep 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction
  Tuning
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
98
142
0
05 Sep 2023
Group Pose: A Simple Baseline for End-to-End Multi-person Pose
  Estimation
Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation
Huan Liu
Qiang Chen
Zichang Tan
Jiangjiang Liu
Jian Wang
...
Kun Yao
Junyu Han
Errui Ding
Yao-Min Zhao
Jingdong Wang
ViT
80
28
0
14 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
135
720
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and
  Understanding of the Open World
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRMMLLM
130
88
0
03 Aug 2023
Previous
12345
Next