An Early Evaluation of GPT-4V(ision)


25 October 2023
Yang Wu
Shilong Wang
Hao Yang
Tian Zheng
Hongbo Zhang
Yanyan Zhao
Bing Qin
    MLLM
    ELM

Papers citing "An Early Evaluation of GPT-4V(ision)"

27 / 27 papers shown
Evaluating Compositional Scene Understanding in Multimodal Generative Models
Shuhao Fu
Andrew Jun Lee
Anna Wang
Ida Momennejad
Trevor Bihl
Hongjing Lu
Taylor Webb
CoGe
OCL
114
1
0
29 Mar 2025
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
Dingning Liu
Cheng Wang
Peng Gao
Renrui Zhang
Xinzhu Ma
Yuan Meng
Zhihui Wang
LRM
49
0
0
17 Mar 2025
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
Yanshu Li
49
0
0
05 Mar 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
LRM
89
0
0
24 Feb 2025
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
42
5
0
02 Oct 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
LRM
54
10
0
28 Jun 2024
GPT-4V Explorations: Mining Autonomous Driving
Zixuan Li
47
1
0
24 Jun 2024
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Ling-Hao Chen
Shunlin Lu
Ailing Zeng
Hao Zhang
Benyou Wang
Ruimao Zhang
Lei Zhang
63
33
0
30 May 2024
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Haoyu Zhao
Wenhang Ge
Ying-cong Chen
ObjD
MLLM
VLM
37
4
0
27 May 2024
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
Gyeong-Geon Lee
Xiaoming Zhai
43
6
0
12 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
66
14
0
06 May 2024
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
82
35
0
29 Apr 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
51
1
0
29 Mar 2024
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
Xueliang Zhao
Xinting Huang
Tingchen Fu
Qintong Li
Shansan Gong
Lemao Liu
Wei Bi
Lingpeng Kong
LRM
42
1
0
21 Feb 2024
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM
LRM
43
32
0
19 Feb 2024
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li
Zhihang Hu
Yixuan Wang
Lei Li
Yimin Fan
Irwin King
Le Song
Yu Li
AI4CE
48
9
0
06 Feb 2024
Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering
Qing Li
Lei Li
Yu Li
LM&MA
AI4MH
51
6
0
15 Jan 2024
DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content
Wentao Wang
Xuanyao Huang
Tianyang Wang
Swalpa Kumar Roy
EGVM
50
0
0
16 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Totti Nguyen
Cor-Paul Bezemer
MLLM
VLM
LRM
48
10
0
08 Dec 2023
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
Jonathan Roberts
Timo Lüddecke
Rehan Sheikh
Kai Han
Samuel Albanie
MLLM
26
26
0
24 Nov 2023
NERIF: GPT-4V for Automatic Scoring of Drawn Models
Gyeong-Geon Lee
Xiaoming Zhai
26
10
0
21 Nov 2023
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
Junyang Wang
Yuhang Wang
Guohai Xu
Jing Zhang
Yukai Gu
...
Jiaqi Wang
Haiyang Xu
Ming Yan
Ji Zhang
Jitao Sang
MLLM
VLM
30
104
0
13 Nov 2023
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection
Jiangning Zhang
Haoyang He
Xuhai Chen
Zhucun Xue
Yabiao Wang
Chengjie Wang
Lei Xie
Yong Liu
MLLM
43
22
0
05 Nov 2023
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
Yingshu Li
Yunyi Liu
Zhanyu Wang
Xinyu Liang
Lei Wang
Lingqiao Liu
Leyang Cui
Zhaopeng Tu
Longyue Wang
Luping Zhou
ELM
LM&MA
45
38
0
31 Oct 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
42
770
0
23 Jun 2023
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
124
80
0
11 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,134
0
20 Sep 2022