Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Instruction Tuning"
50 / 3,278 papers shown
Title
ChartAdapter: Large Vision-Language Model for Chart Summarization
Peixin Xu
Yujuan Ding
Wenqi Fan
32
2
0
31 Dec 2024
Multi-Agent Planning Using Visual Language Models
Michele Brienza
F. Argenziano
Vincenzo Suriani
D. Bloisi
Daniele Nardi
LM&Ro
LLMAG
72
4
0
31 Dec 2024
M
3
^3
3
oralBench: A MultiModal Moral Benchmark for LVLMs
Bei Yan
Jie M. Zhang
Zhiyuan Chen
Shiguang Shan
Xilin Chen
ELM
56
1
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
67
4
0
31 Dec 2024
Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
Zhifang Zhang
Shuo He
Bingquan Shen
Lei Feng
Lei Feng
AAML
60
0
0
29 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Zhaoxiang Zhang
249
7
0
24 Dec 2024
Personalized Large Vision-Language Models
Chau Pham
Hoang Phan
David Doermann
Yunjie Tian
VLM
57
3
0
23 Dec 2024
Diving into Self-Evolving Training for Multimodal Reasoning
Wei Liu
Junlong Li
Xiwen Zhang
Fan Zhou
Yu Cheng
Junxian He
ReLM
LRM
49
11
0
23 Dec 2024
Multimodal Preference Data Synthetic Alignment with Reward Model
Robert Wijaya
Ngoc-Bao Nguyen
Ngai-man Cheung
MLLM
SyDa
64
3
0
23 Dec 2024
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park
Yeonju Kim
Hyeongseop Rha
Bella Godiva
Y. Ro
51
1
0
23 Dec 2024
GraphAgent: Agentic Graph Language Assistant
Yuhao Yang
J. Tang
Lianghao Xia
Xingchen Zou
Keli Zhang
Chao Huang
LM&Ro
98
1
0
22 Dec 2024
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
Hao Fei
118
9
0
22 Dec 2024
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham
Hoang-Nam Le
Phu-Vinh Nguyen
Chris Ngo
Truong-Son Hy
AuLLM
LRM
99
1
0
21 Dec 2024
Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning
Hang Yin
Zhifeng Lin
Xin Liu
Bin Sun
Kan Li
LRM
92
1
0
21 Dec 2024
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
Xizhe Xue
Guoting Wei
Hao Chen
Han Zhang
Feng Lin
Chunhua Shen
Xiao Xiang Zhu
108
3
0
21 Dec 2024
Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
82
0
0
21 Dec 2024
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation
Gautier Evennou
Antoine Chaffin
Vivien Chappelier
Ewa Kijak
DiffM
89
0
0
20 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
115
2
0
20 Dec 2024
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data
Zhiqiang Tang
Zihan Zhong
Tong He
Gerald Friedland
94
0
0
19 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
142
9
0
19 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
135
58
0
18 Dec 2024
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Xinghang Li
Peiyan Li
Minghuan Liu
Dong Wang
Jirong Liu
Bingyi Kang
Xiao Ma
Tao Kong
Hanbo Zhang
Huaping Liu
LM&Ro
104
20
0
18 Dec 2024
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Jing Liu
N. Shah
Ping Chen
104
3
0
18 Dec 2024
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei
Yujie Zhong
Haoxian Tan
Yingsen Zeng
Yong Liu
Zheng Zhao
Yujiu Yang
MLLM
VLM
VOS
116
2
0
18 Dec 2024
Unlocking the Potential of Weakly Labeled Data: A Co-Evolutionary Learning Framework for Abnormality Detection and Report Generation
Jinghan Sun
Dong-mei Wei
Zhe Xu
Donghuan Lu
Hong Liu
Hong Wang
Sotirios A. Tsaftaris
Jingyu Sun
Yefeng Zheng
Liansheng Wang
MedIm
134
0
0
18 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
148
3
0
18 Dec 2024
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis
Ahmet Serdar Karadeniz
Sebastian Cavada
Danila Rukhovich
Niki Maria Foteinopoulou
K. Cherenkova
Anis Kacem
Djamila Aouada
84
3
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
Maosong Sun
MLLM
VLM
92
0
0
18 Dec 2024
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
Chen Bao
Jiarui Xu
Xiaolong Wang
Abhinav Gupta
Homanga Bharadhwaj
90
3
0
17 Dec 2024
LLMs are Also Effective Embedding Models: An In-depth Overview
Chongyang Tao
Tao Shen
Shen Gao
Junshuo Zhang
Zhen Li
Zhengwei Tao
Shuai Ma
95
8
0
17 Dec 2024
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh
Hwanhee Jung
Jong Wook Kim
Seanie Lee
Innfarn Yoo
Andreas Lugmayr
Seunggeun Chi
K. Ramani
Sangpil Kim
3DGS
97
2
0
17 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
99
8
0
16 Dec 2024
OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference
Wei Chen
Zhiyuan Li
Shuo Xin
VLM
MLLM
91
3
0
16 Dec 2024
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang
Xiaoting Qin
Jingyang Zhang
Jing Yu
Gaopeng Gou
Gang Xiong
Qingwei Ling
Saravan Rajmohan
Dongmei Zhang
Qi Wu
LRM
68
1
0
15 Dec 2024
Empowering LLMs to Understand and Generate Complex Vector Graphics
Ximing Xing
Juncheng Hu
Guotao Liang
Jing Zhang
Dong Xu
Qian Yu
103
7
0
15 Dec 2024
CATALOG: A Camera Trap Language-guided Contrastive Learning Model
Julian D. Santamaria
Claudia Isaza
Jhony H. Giraldo
88
0
0
14 Dec 2024
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
Umar Khalid
Hasan Iqbal
Azib Farooq
Nazanin Rahnavard
Jing Hua
...
H. Iqbal
Azib Farooq
Nazanin Rahnavard
Jing Hua
Chen Chen
80
0
0
13 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
89
12
0
12 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu Liu
Jiaya Jia
AuLLM
111
7
0
12 Dec 2024
Falcon-UI: Understanding GUI Before Following User Instructions
Huawen Shen
Chang-Shu Liu
Gengluo Li
Xinlong Wang
Yu Zhou
Can Ma
Xiangyang Ji
LLMAG
98
5
0
12 Dec 2024
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering
Sai Bhargav Rongali
M. Cui
Ankit Jha
Neha Bhargava
Saurabh Prasad
Biplab Banerjee
89
0
0
12 Dec 2024
Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning
Shihao Xu
Yiyang Luo
Wei Shi
LRM
ReLM
90
2
0
12 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
VLM
ObjD
302
0
0
12 Dec 2024
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
Shengxuming Zhang
Weihan Li
Tianhong Gao
Jiacong Hu
Haoming Luo
Xiuming Zhang
Jing Zhang
Mingli Song
Zunlei Feng
LM&MA
108
0
0
12 Dec 2024
LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information
Ke Wang
Hong Xuan
VLM
75
2
0
11 Dec 2024
ChatDyn: Language-Driven Multi-Actor Dynamics Generation in Street Scenes
Yuxi Wei
Jingbo Wang
Yuwen Du
Dingju Wang
Liang Pan
Chenxin Xu
Yao Feng
Bo Dai
Siheng Chen
AI4CE
91
1
0
11 Dec 2024
RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation
Feng Yan
Fanfan Liu
Liming Zheng
Yufeng Zhong
Yiyang Huang
Zechao Guan
Chengjian Feng
Lin Ma
92
2
0
10 Dec 2024
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
Mingjie Xu
Mengyang Wu
Yuzhi Zhao
Jason Chun Lok Li
Weifeng Ou
LRM
SyDa
VLM
78
3
0
09 Dec 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu
Peng Xia
Yun Li
Hongtu Zhu
Sheng Wang
Huaxiu Yao
111
1
0
09 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
98
5
0
08 Dec 2024
Previous
1
2
3
...
18
19
20
...
64
65
66
Next