ResearchTrend.AI
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

21 November 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
    MLLM
    VLM

Papers citing "ShareGPT4V: Improving Large Multi-Modal Models with Better Captions"

50 / 471 papers shown
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
Wenqiao Zhang
Tianwei Lin
Jiang Liu
Fangxun Shu
Haoyuan Li
...
Zheqi Lv
Hao Jiang
Juncheng Li
Siliang Tang
Yueting Zhuang
VLM
MLLM
41
4
0
20 Mar 2024
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
Dongjae Shin
Hyunseok Lim
Inho Won
Changsu Choi
Minjun Kim
Seungwoo Song
Hangyeol Yoo
Sangmin Kim
Kyungtae Lim
23
5
0
18 Mar 2024
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun
Can Qin
Jiamian Wang
Zeyuan Chen
Ran Xu
Zhiqiang Tao
MLLM
VLM
LRM
37
9
0
17 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
43
189
0
14 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
46
42
0
14 Mar 2024
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi
Tianyang Han
Wei Xiong
Jipeng Zhang
Runtao Liu
Rui Pan
Tong Zhang
MLLM
50
34
0
13 Mar 2024
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Byung-Kwan Lee
Beomchan Park
Chae Won Kim
Yonghyun Ro
MLLM
VLM
48
20
0
12 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yueping Jiang
MLLM
50
18
0
12 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
41
304
0
08 Mar 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen
Chongjian Ge
Enze Xie
Yue Wu
Lewei Yao
Xiaozhe Ren
Zhongdao Wang
Ping Luo
Huchuan Lu
Zhenguo Li
141
90
0
07 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
512
0
07 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Guoxing Yang
Zhiwu Lu
58
3
0
07 Mar 2024
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Weizhi Wang
Khalil Mrini
Linjie Yang
Sateesh Kumar
Yu Tian
Xifeng Yan
Heng Wang
46
16
0
05 Mar 2024
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
Xijia Tao
Shuai Zhong
Lei Li
Qi Liu
Lingpeng Kong
47
25
0
05 Mar 2024
Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation
Maksim Kuprashevich
Grigorii Alekseenko
Irina Tolstykh
ELM
66
4
0
04 Mar 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
39
51
0
01 Mar 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
55
48
0
29 Feb 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM
ELM
36
3
0
28 Feb 2024
Towards Open-ended Visual Quality Comparison
Haoning Wu
Hanwei Zhu
Zicheng Zhang
Erli Zhang
Chaofeng Chen
...
Qiong Yan
Xiaohong Liu
Guangtao Zhai
Shiqi Wang
Weisi Lin
AAML
67
49
0
26 Feb 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu
Junting Chen
Qinglong Zhang
Shoufa Chen
Qiaojun Yu
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Mingyu Ding
Ping Luo
46
22
0
25 Feb 2024
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
Chaoya Jiang
Wei Ye
Mengfan Dong
Hongrui Jia
Haiyang Xu
Mingshi Yan
Ji Zhang
Shikun Zhang
VLM
MLLM
43
15
0
24 Feb 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
Yuhang Cao
Pan Zhang
Xiao-wen Dong
Dahua Lin
Jiaqi Wang
45
11
0
22 Feb 2024
Visual Hallucinations of Multi-modal Large Language Models
Wen Huang
Hongbin Liu
Minxin Guo
Neil Zhenqiang Gong
MLLM
VLM
32
24
0
22 Feb 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
60
7
0
22 Feb 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Ziyue Wang
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
48
5
0
21 Feb 2024
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu
Gaurav Datta
Huang Huang
Will Panitch
Jaimyn Drake
Joseph Ortiz
Mustafa Mukadam
Mike Lambeta
Roberto Calandra
Ken Goldberg
VLM
40
34
0
20 Feb 2024
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Yusu Qian
Haotian Zhang
Yinfei Yang
Zhe Gan
100
26
0
20 Feb 2024
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Guiming Hardy Chen
Shunian Chen
Ruifei Zhang
Junying Chen
Xiangbo Wu
Zhiyi Zhang
Zhihong Chen
Jianquan Li
Xiang Wan
Benyou Wang
VLM
SyDa
41
129
0
18 Feb 2024
Cobra Effect in Reference-Free Image Captioning Metrics
Zheng Ma
Changxin Wang
Yawen Ouyang
Fei Zhao
Jianbing Zhang
Shujian Huang
Jiajun Chen
38
2
0
18 Feb 2024
Efficient Multimodal Learning from Data-centric Perspective
Muyang He
Yexin Liu
Boya Wu
Jianhao Yuan
Yueze Wang
Tiejun Huang
Bo Zhao
MLLM
38
85
0
18 Feb 2024
CoLLaVO: Crayon Large Language and Vision mOdel
Byung-Kwan Lee
Beomchan Park
Chae Won Kim
Yonghyun Ro
VLM
MLLM
44
16
0
17 Feb 2024
Multi-modal preference alignment remedies regression of visual instruction tuning on language model
Shengzhi Li
Rongyu Lin
Shichao Pei
40
22
0
16 Feb 2024
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Shangyu Xing
Fei Zhao
Zhen Wu
Tuo An
Weihao Chen
Chunhui Li
Jianbing Zhang
Xinyu Dai
MLLM
MU
47
5
0
15 Feb 2024
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
Zhaoqing Wang
Xiaobo Xia
Ziye Chen
Xiao He
Yandong Guo
Biwei Huang
Tongliang Liu
VLM
29
11
0
14 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
39
64
0
13 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
110
0
08 Feb 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo Zhang
Chunhua Shen
VLM
MLLM
33
100
0
06 Feb 2024
Instruction Makes a Difference
Tosin Adewumi
Nudrat Habib
Lama Alkhaled
Elisa Barney
VLM
MLLM
24
1
0
01 Feb 2024
MouSi: Poly-Visual-Expert Vision-Language Models
Xiaoran Fan
Tao Ji
Changhao Jiang
Shuo Li
Senjie Jin
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yunchun Jiang
VLM
32
16
0
30 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
89
245
0
29 Jan 2024
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen
Zequn Jie
Lin Ma
MoE
50
47
0
29 Jan 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM
MLLM
MoE
48
154
0
29 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
56
183
0
24 Jan 2024
Red Teaming Visual Language Models
Mukai Li
Lei Li
Yuwei Yin
Masood Ahmed
Zhenguang Liu
Qi Liu
VLM
51
30
0
23 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min Lin
MLLM
40
14
0
22 Jan 2024
COCO is "ALL" You Need for Visual Instruction Fine-tuning
Xiaotian Han
Yiqi Wang
Bohan Zhai
Quanzeng You
Hongxia Yang
VLM
MLLM
35
2
0
17 Jan 2024
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
Yipo Huang
Quan Yuan
Xiangfei Sheng
Zhichao Yang
Haoning Wu
Pengfei Chen
Yuzhe Yang
Leida Li
Weisi Lin
VLM
24
38
0
16 Jan 2024
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Yuhao Wang
Yusheng Liao
Heyang Liu
Hongcheng Liu
Yu Wang
Yanfeng Wang
LRM
VLM
30
13
0
15 Jan 2024
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
44
35
0
28 Dec 2023
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou
Zhili Liu
Kai Chen
Lanqing Hong
Hang Xu
Aoxue Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MoE
MLLM
VLM
49
63
0
19 Dec 2023