ResearchTrend.AI
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

21 November 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
    MLLM
    VLM

Papers citing "ShareGPT4V: Improving Large Multi-Modal Models with Better Captions"

50 / 471 papers shown
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
126
7
0
25 Feb 2025
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing
Yi-Kai Zhang
De-Chuan Zhan
Han-Jia Ye
ALM
ELM
LRM
44
2
0
24 Feb 2025
Contrastive Visual Data Augmentation
Yu Zhou
B. Li
Mohan Tang
Xiaomeng Jin
Te-Lin Wu
Kuan-Hao Huang
Heng Ji
Kai-Wei Chang
Nanyun Peng
64
0
0
24 Feb 2025
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps
Yen-Che Hsiao
Abhishek Dutta
LRM
ReLM
ELM
71
0
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
130
1
0
21 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zehan Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
51
0
0
19 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou
Jian Kang
George Kesidis
Lu Lin
257
1
0
18 Feb 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Xinlong Chen
Yang Zhang
Chongling Rao
Yushuo Guan
Qingbin Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
23
0
0
18 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
Zikang Liu
K. Zhou
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Zhicheng Dou
MLLM
VLM
LRM
94
0
0
17 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
69
4
0
17 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begüm Demir
Ioannis Papoutsis
VLM
88
0
0
13 Feb 2025
Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails
Yijun Yang
L. Wang
Xiao Yang
Lanqing Hong
Jun Zhu
AAML
66
0
0
09 Feb 2025
Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
Chia-Wen Kuo
Sijie Zhu
Fan Chen
Xiaohui Shen
Longyin Wen
VLM
65
1
0
04 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi Ma
OffRL
72
67
0
28 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Xin Wu
AuLLM
83
14
0
28 Jan 2025
StreamingRAG: Real-time Contextual Retrieval and Generation Framework
Murugan Sankaradas
Ravi K. Rajendran
Srimat T. Chakradhar
47
1
0
23 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
162
2
0
14 Jan 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Xuanle Zhao
Xianzhen Luo
Qi Shi
Chong Chen
Shuo Wang
Wanxiang Che
Zhiyuan Liu
Maosong Sun
MLLM
54
4
0
11 Jan 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi
Besmira Nushi
Vidhisha Balachandran
Varun Chandrasekaran
Vibhav Vineet
Neel Joshi
Baharan Mirzasoleiman
MLLM
VLM
49
0
0
07 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
91
12
0
06 Jan 2025
Efficient Architectures for High Resolution Vision-Language Models
Miguel Carvalho
Bruno Martins
MLLM
VLM
50
0
0
05 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
104
48
0
03 Jan 2025
Altogether: Image Captioning via Re-aligning Alt-text
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
...
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
DiffM
49
7
0
31 Dec 2024
Multimodal Preference Data Synthetic Alignment with Reward Model
Robert Wijaya
Ngoc-Bao Nguyen
Ngai-man Cheung
MLLM
SyDa
62
2
0
23 Dec 2024
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye
Honglin Lin
Leyan Ou
Dairong Chen
Zihao Wang
Zeang Sheng
Weijia Li
81
0
0
22 Dec 2024
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
Shijie Zhou
R. Zhang
Yufan Zhou
Changyou Chen
VLM
77
1
0
20 Dec 2024
I0T: Embedding Standardization Method Towards Zero Modality Gap
Na Min An
Eunki Kim
James Thorne
Hyunjung Shim
VLM
77
1
0
18 Dec 2024
CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing
Xiaole Xian
Xilin He
Zenghao Niu
Junliang Zhang
Weicheng Xie
Siyang Song
Zitong Yu
Linlin Shen
DiffM
81
0
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VLM
92
0
0
18 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
K. Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
86
12
0
12 Dec 2024
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Haozhao Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
84
1
0
12 Dec 2024
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
120
8
0
11 Dec 2024
RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation
Feng Yan
Fanfan Liu
Liming Zheng
Yufeng Zhong
Yiyang Huang
Zechao Guan
Chengjian Feng
Lin Ma
92
2
0
10 Dec 2024
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
Mingjie Xu
Mengyang Wu
Yuzhi Zhao
Jason Chun Lok Li
Weifeng Ou
LRM
SyDa
VLM
73
2
0
09 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
90
5
0
08 Dec 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
62
5
0
05 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
84
2
0
04 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng
Wenshuo Li
Tong Lin
Xinghao Chen
VLM
77
0
0
02 Dec 2024
PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control
Ruichen Wang
Junliang Zhang
Qingsong Xie
Chen Chen
H. Lu
DiffM
95
1
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
119
2
0
02 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
96
2
0
02 Dec 2024
Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
Haoze Sun
W. J. Li
Qingbin Liu
Kaiwen Zhou
Yongqiang Chen
Yong Guo
Yunshui Li
Renjing Pei
Long Peng
Yue Yang
DiffM
83
1
0
01 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLM
CLIP
88
4
0
30 Nov 2024
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Zhihao Sun
Haoran Jiang
Haoran Chen
Yixin Cao
Xipeng Qiu
Zuxuan Wu
Yu-Gang Jiang
78
2
0
29 Nov 2024
On Domain-Specific Post-Training for Multimodal Large Language Models
Daixuan Cheng
Shaohan Huang
Ziyu Zhu
Xintong Zhang
Wayne Xin Zhao
Zhongzhi Luan
Bo Dai
Zhenliang Zhang
VLM
102
2
0
29 Nov 2024
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin
Yunyang Ge
Xinhua Cheng
Zongjian Li
Bin Zhu
...
Zhang Pan
Xing Zhou
Shaoling Dong
Yonghong Tian
Li-xin Yuan
VLM
VGen
121
60
0
28 Nov 2024
Detailed Object Description with Controllable Dimensions
Xinran Wang
Han Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Zejun Ma
Jun Guo
81
1
0
28 Nov 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
DiffM
VGen
VLM
137
6
0
28 Nov 2024