Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.16821
Cited By
v1
v2 (latest)
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
25 April 2024
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
Erfei Cui
Wenwen Tong
Kongzhi Hu
Jiapeng Luo
Zheng Ma
Ji Ma
Jiaqi Wang
Xiao-wen Dong
Hang Yan
Hewei Guo
Conghui He
Botian Shi
Zhenjiang Jin
Chaochao Xu
Bin Wang
Xingjian Wei
Wei Li
Wenjian Zhang
Bo Zhang
Pinlong Cai
Licheng Wen
Xiangchao Yan
Min Dou
Lewei Lu
Xizhou Zhu
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (8213★)
Papers citing
"How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites"
21 / 471 papers shown
Title
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Hao Liu
Xiang Bai
Can Huang
Xiang Bai
Can Huang
187
28
0
20 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
87
20
0
13 May 2024
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
Yuliang Liu
Mingxin Huang
Hao Yan
Linger Deng
Weijia Wu
Hao Lu
Chunhua Shen
Lianwen Jin
Xiang Bai
86
0
0
30 Apr 2024
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
Bin Wang
Zhuangcheng Gu
Chaochao Xu
Bo Zhang
Botian Shi
Conghui He
OffRL
91
13
0
23 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM
ALM
206
1,275
0
22 Apr 2024
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
Kai Chen
Yanze Li
Wenhua Zhang
Yanxin Liu
Pengxiang Li
...
Xinhai Zhao
Zhenguo Li
Dit-Yan Yeung
Huchuan Lu
Xu Jia
ELM
MLLM
116
37
0
16 Apr 2024
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification
Kai Sun
Yushi Bai
Ji Qi
Lei Hou
Juanzi Li
LRM
74
23
0
07 Apr 2024
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
Haz Sameen Shahgir
Khondker Salman Sayeed
Abhik Bhattacharjee
Wasi Uddin Ahmad
Yue Dong
Rifat Shahriyar
VLM
MLLM
99
14
0
23 Mar 2024
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models
Junho Kim
Yeonju Kim
Yonghyun Ro
LRM
MLLM
68
5
0
20 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Shafiq Joty
Shih-Fu Chang
Chenhui Xu
AI4TS
123
32
0
18 Mar 2024
Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
Xiaohao Xu
Yunkang Cao
Huaxin Zhang
Nong Sang
Xiaonan Huang
VLM
133
11
0
17 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
135
3
0
15 Mar 2024
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
317
576
0
07 Mar 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
125
75
0
05 Mar 2024
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Yusu Qian
Haotian Zhang
Yinfei Yang
Zhe Gan
198
30
0
20 Feb 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Renqiu Xia
Bo Zhang
Hancheng Ye
Xiangchao Yan
Qi Liu
...
Min Dou
Botian Shi
Junchi Yan
Junchi Yan
Yu Qiao
LRM
182
68
0
19 Feb 2024
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang
Runyu Ding
Ellis L Brown
Xiaojuan Qi
Saining Xie
LM&Ro
119
22
0
05 Feb 2024
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
334
755
0
19 Sep 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
168
238
0
07 Jul 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
138
613
0
23 Jun 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM
MLLM
144
96
0
13 May 2023
Previous
1
2
3
...
10
8
9