LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents


9 November 2023
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
Tianhe Ren
Xueyan Zou
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM, VLM

Papers citing "LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents"

41 / 91 papers shown
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Zijia Zhao
Haoyu Lu
Yuqi Huo
Yifan Du
Tongtian Yue
Longteng Guo
Bingning Wang
Weipeng Chen
Jing Liu
84
5
0
13 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
119
10
0
05 Jun 2024
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Ling-Hao Chen
Shunlin Lu
Ailing Zeng
Hao Zhang
Benyou Wang
Ruimao Zhang
Lei Zhang
120
38
0
30 May 2024
Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models
Hao-Ran Cheng
Erjia Xiao
Jiahang Cao
Le Yang
Kaidi Xu
Jindong Gu
Renjing Xu
AAML
133
10
0
30 May 2024
A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models
Chengxing Xie
Difan Zou
LRM, LLMAG
73
5
0
28 May 2024
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen
85
6
0
24 May 2024
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
Diji Yang
Jinmeng Rao
Kezhen Chen
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yi Zhang
LRM, RALM
113
20
0
15 May 2024
VS-Assistant: Versatile Surgery Assistant on the Demand of Surgeons
Zhen Chen
Xingjian Luo
Jinlin Wu
Danny Tat Ming Chan
Zhen Lei
Jinqiao Wang
Sebastien Ourselin
Hongbin Liu
84
4
0
14 May 2024
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS
Qingyang Li
Yihang Zhang
Zhidong Jia
Yannan Hu
Lei Zhang
Jianrong Zhang
Yongming Xu
Yong Cui
Xinggong Zhang
Xinggong Zhang
AI4CE
72
8
0
13 May 2024
BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis
Shuhang Lin
Wenyue Hua
Lingyao Li
Che-Jui Chang
Lizhou Fan
Jianchao Ji
Hang Hua
Mingyu Jin
Jiebo Luo
Yongfeng Zhang
LM&Ro, LLMAG
108
12
0
23 Apr 2024
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation
Han Xue
Qianru Sun
Li Song
Wenjun Zhang
Zhiwu Huang
MLLM
72
0
0
15 Apr 2024
OmniFusion Technical Report
Elizaveta Goncharova
Anton Razzhigaev
Matvey Mikhalchuk
Maxim Kurkin
Irina Abdullaeva
Matvey Skripkin
Ivan Oseledets
Denis Dimitrov
Andrey Kuznetsov
72
4
0
09 Apr 2024
Visually Descriptive Language Model for Vector Graphics Reasoning
Zhenhailong Wang
Joy Hsu
Xingyao Wang
Kuan-Hao Huang
Manling Li
Jiajun Wu
Heng Ji
MLLM, VLM, LRM
55
4
0
09 Apr 2024
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Gwanghyun Kim
Hayeon Kim
H. Seo
Dong un Kang
Se Young Chun
68
4
0
06 Apr 2024
Towards Responsible and Reliable Traffic Flow Prediction with Large Language Models
Xusen Guo
Qiming Zhang
Junyue Jiang
Mingxing Peng
Hao
Hao Yang
Meixin Zhu
AI4TS
66
14
0
03 Apr 2024
Empowering Segmentation Ability to Multi-modal Large Language Models
Yuqi Yang
Peng-Tao Jiang
Jing Wang
Hao Zhang
Kai Zhao
Jinwei Chen
Yue Liu
LRM, VLM
81
4
0
21 Mar 2024
Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration
Shu Zhao
Xiaohan Zou
Tan Yu
Huijuan Xu
83
1
0
17 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
123
208
0
14 Mar 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro, LLMAG
92
44
0
23 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
117
33
0
20 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM, VLM
131
63
0
19 Feb 2024
LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
Keyang Xuan
Li Yi
Fan Yang
Ruochen Wu
Yi R. Fung
Chenhui Xu
111
15
0
19 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min Lin
LLMAG, LM&Ro
52
63
0
13 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM, LM&MA, ELM
246
425
0
09 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
111
51
0
05 Feb 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu
Wenyuan Xue
Yifei Chen
Dapeng Chen
Xiutian Zhao
Ke Wang
Liping Hou
Rong-Zhi Li
Wei Peng
LRM, MLLM
85
137
0
01 Feb 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
138
129
0
29 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL, LRM
164
216
0
24 Jan 2024
Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination
Syeda Nahida Akter
Aman Madaan
Sangwu Lee
Yiming Yang
Eric Nyberg
ReLM, VLM, LRM
65
2
0
16 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
212
103
0
04 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM, VLM
92
10
0
03 Jan 2024
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
130
44
0
28 Dec 2023
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAG, VLM
123
25
0
18 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Totti Nguyen
Cor-Paul Bezemer
MLLM, VLM, LRM
121
11
0
08 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
183
76
0
05 Dec 2023
Mitigating Hallucination in Visual Language Models with Visual Supervision
Zhiyang Chen
Yousong Zhu
Yufei Zhan
Zhaowen Li
Chaoyang Zhao
Jinqiao Wang
Ming Tang
VLM, MLLM
111
33
0
27 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shangwen Wang
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
116
7
0
10 Nov 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG, LRM
90
6
0
12 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
128
7
0
23 Sep 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM, VLM
166
238
0
07 Jul 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang
Fachrina Dewi Puspitasari
Sheng Zheng
Chenghao Li
Yu Qiao
...
Caiyan Qin
François Rameau
Lik-Hang Lee
Sung-Ho Bae
Choong Seon Hong
VLM
159
67
0
12 May 2023