Large Multimodal Models: Notes on CVPR 2023 Tutorial

26 June 2023
Chunyuan Li
    MLLM
    VLM
arXiv: 2306.14895

Papers citing "Large Multimodal Models: Notes on CVPR 2023 Tutorial"

23 / 23 papers shown
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
Baining Zhao
Jianjie Fang
Zichao Dai
Zhilin Wang
Jirong Zha
...
Chen Gao
Y. Wang
Jinqiang Cui
Xinlei Chen
Yongqian Li
51
2
0
08 Mar 2025
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
Ruiyi Zhang
Yufan Zhou
Jian Chen
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
41
6
0
27 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
59
2
0
18 Jul 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
31
6
0
10 Jun 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi Ma
Sergey Levine
LLMAG
LRM
47
59
0
16 May 2024
VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework
Chris Kelly
Luhui Hu
Bang Yang
Yu Tian
Deshun Yang
Cindy Yang
Zaoshan Huang
Zihao Li
Jiayin Hu
Yuexian Zou
37
9
0
14 Mar 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
44
1
0
06 Feb 2024
Explaining latent representations of generative models with large multimodal models
Mengdan Zhu
Zhenke Liu
Bo Pan
Abhinav Angirekula
Liang Zhao
37
2
0
02 Feb 2024
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
E. Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
27
76
0
17 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
30
9
0
08 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
19
14
0
01 Dec 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao
Max F. Burg
Samuel Albanie
Matthias Bethge
VLM
34
9
0
12 Oct 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu
Chunyuan Li
Haotian Liu
Jianwei Yang
Jianfeng Gao
Yelong Shen
MLLM
105
31
0
18 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
31
38
0
24 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM
VLM
22
155
0
11 Aug 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tong Sun
VLM
MLLM
25
219
0
29 Jun 2023
Concept-Oriented Deep Learning with Large Language Models
Daniel T. Chang
14
1
0
29 Jun 2023
Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT
Zhe Xiao
Yuzhong Chen
Lu Zhang
Jun Yao
Zihao Wu
...
Yixuan Yuan
Dinggang Shen
Dajiang Zhu
Tianming Liu
Xi Jiang
VLM
MLLM
63
17
0
29 Apr 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
208
900
0
27 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
159
579
0
06 Apr 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,106
0
20 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
373
8,495
0
28 Jan 2022