2311.00571
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
1 November 2023
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chunyuan Li
MLLM
VLM
Papers citing
"LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing"
34 / 34 papers shown
Title
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
Fan Yang
Yousong Zhu
Xin Li
Yufei Zhan
Hongyin Zhao
Shurong Zheng
Yaowei Wang
Ming Tang
Jinqiao Wang
MLLM
VLM
52
0
0
20 Jun 2025
ZeroVO: Visual Odometry with Minimal Assumptions
Lei Lai
Zekai Yin
Eshed Ohn-Bar
VGen
30
0
0
09 Jun 2025
A Preliminary Study for GPT-4o on Image Restoration
Hao Yang
Yiran Yang
Ruikun Zhang
Liyuan Pan
111
1
0
08 May 2025
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Jinjie Wei
Ziyun Qian
Jiawei Chen
Jinjie Wei
Yiheng Jiang
Qingyao Xu
Li Zhang
DiffM
494
0
0
05 May 2025
Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration
Juhan Park
Kyungjae Lee
Hyung Jin Chang
Jungchan Cho
VLM
118
0
0
28 Apr 2025
Building LLM Agents by Incorporating Insights from Computer Systems
Yapeng Mi
Zhi Gao
Xiaojian Ma
Qing Li
LLMAG
129
0
0
06 Apr 2025
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
Shuai Li
Jian Xu
Xiao-Hui Li
Chao Deng
Lin-Lin Huang
MQ
85
1
0
01 Apr 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM
VLM
Presented at ResearchTrend Connect | VLM on 21 May 2025
232
0
0
13 Mar 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Ziyang Chen
Mingxiao Li
Zhongfu Chen
Nan Du
Xiaolong Li
Yuexian Zou
150
1
0
19 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
201
25
0
07 Jan 2025
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
173
7
0
28 Nov 2024
Generative Timelines for Instructed Visual Assembly
Alejandro Pardo
Jui-hsien Wang
Guohao Li
Josef Sivic
Bryan C. Russell
Fabian Caba Heilbron
VGen
108
0
0
19 Nov 2024
Mitigating Object Hallucination via Concentric Causal Attention
Yun Xing
Yiheng Li
Ivan Laptev
Shijian Lu
110
23
0
21 Oct 2024
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Ziqiang Liu
Shiwei Li
...
Yiming Lei
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
183
21
0
16 Oct 2024
Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video
Chunggi Lee
Tica Lin
Hanspeter Pfister
Chen Zhu-Tian
81
2
0
09 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFin
LRM
108
5
0
05 Aug 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
81
2
0
12 Jul 2024
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
Binxu Li
Tiankai Yan
Yuanting Pan
Zhe Xu
Jie Luo
Ruiyang Ji
Shilong Liu
Haoyu Dong
Zihao Lin
Yixin Wang
LM&MA
91
35
0
02 Jul 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
141
63
0
27 Jun 2024
Holistic Evaluation for Interleaved Text-and-Image Generation
Minqian Liu
Zhiyang Xu
Zihao Lin
Trevor Ashby
Joy Rimchala
Jiaxin Zhang
Lifu Huang
EGVM
124
11
0
20 Jun 2024
Temporal Grounding of Activities using Multimodal Large Language Models
Young Chol Song
93
0
0
30 May 2024
Empowering Segmentation Ability to Multi-modal Large Language Models
Yuqi Yang
Peng-Tao Jiang
Jing Wang
Hao Zhang
Kai Zhao
Jinwei Chen
Yue Liu
LRM
VLM
90
4
0
21 Mar 2024
ReGround: Improving Textual and Spatial Grounding at No Cost
Yuseung Lee
Minhyuk Sung
DiffM
76
2
0
20 Mar 2024
Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration
Shu Zhao
Xiaohan Zou
Tan Yu
Huijuan Xu
85
1
0
17 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLM
EGVM
179
10
0
13 Mar 2024
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models
Lizhou Fan
Wenyue Hua
Xiang Li
Kaijie Zhu
Mingyu Jin
...
Haoyang Ling
Jinkui Chi
Jindong Wang
Xin Ma
Yongfeng Zhang
LRM
88
14
0
04 Mar 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
100
44
0
23 Feb 2024
DeiSAM: Segment Anything with Deictic Prompting
Hikaru Shindo
Manuel Brack
Gopika Sudhakaran
Devendra Singh Dhami
P. Schramowski
Kristian Kersting
VLM
87
3
0
21 Feb 2024
The Essential Role of Causality in Foundation World Models for Embodied AI
Tarun Gupta
Wenbo Gong
Chao Ma
Nick Pawlowski
Agrin Hilmkil
...
Jianfeng Gao
Stefan Bauer
Danica Kragic
Bernhard Schölkopf
Cheng Zhang
92
17
0
06 Feb 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Tengjiao Wang
CoGe
DiffM
131
137
0
22 Jan 2024
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
92
23
0
27 Dec 2023
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai
Xiaoliang Dai
Lawrence Chen
Guan Pang
James M. Rehg
Miao Liu
112
17
0
06 Dec 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shangwen Wang
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
119
7
0
10 Nov 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
145
18
0
14 Sep 2023