2306.14895
Large Multimodal Models: Notes on CVPR 2023 Tutorial
26 June 2023
Chunyuan Li
MLLM
VLM
Papers citing "Large Multimodal Models: Notes on CVPR 2023 Tutorial"
18 / 18 papers shown
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
Baining Zhao
Jianjie Fang
Zichao Dai
Ziyi Wang
Jirong Zha
...
Chen Gao
Yijiao Wang
Jinqiang Cui
Xinlei Chen
Yongqian Li
99
5
0
08 Mar 2025
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
Ruiyi Zhang
Yufan Zhou
Jian Chen
Jiuxiang Gu
Changyou Chen
Tongfei Sun
VLM
52
6
0
27 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
112
2
0
18 Jul 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
82
6
0
10 Jun 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi-An Ma
Sergey Levine
LLMAG
LRM
137
80
0
16 May 2024
VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework
Chris Kelly
Luhui Hu
Bang Yang
Yu Tian
Deshun Yang
Cindy Yang
Zaoshan Huang
Zihao Li
Jiayin Hu
Yuexian Zou
85
10
0
14 Mar 2024
Multimodal Rationales for Explainable Visual Question Answering
Kun Li
G. Vosselman
Michael Ying Yang
130
2
0
06 Feb 2024
Explaining latent representations of generative models with large multimodal models
Mengdan Zhu
Zhenke Liu
Bo Pan
Abhinav Angirekula
Liang Zhao
60
2
0
02 Feb 2024
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
Enze Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
207
85
0
17 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Totti Nguyen
Cor-Paul Bezemer
MLLM
VLM
LRM
128
11
0
08 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
80
16
0
01 Dec 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao
Max F. Burg
Samuel Albanie
Matthias Bethge
VLM
67
9
0
12 Oct 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu
Chunyuan Li
Haotian Liu
Jianwei Yang
Jianfeng Gao
Yelong Shen
MLLM
164
31
0
18 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
188
39
0
24 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM
VLM
96
175
0
11 Aug 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
103
238
0
29 Jun 2023
Concept-Oriented Deep Learning with Large Language Models
Daniel T. Chang
54
1
0
29 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
132
209
0
12 Jun 2023