Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.17421
Cited By
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
29 September 2023
Zhengyuan Yang
Linjie Li
Kevin Qinghong Lin
Jianfeng Wang
Chung-Ching Lin
Nasim Shakouri Mahmoudabadi
Lijuan Wang
LM&MA
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)"
43 / 43 papers shown
Title
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning
Cheng Peng
Kai Zhang
Mengxian Lyu
Hongfang Liu
Lichao Sun
Yonghui Wu
LM&MA
MedIm
VLM
137
0
0
23 May 2025
Superplatforms Have to Attack AI Agents
Jianghao Lin
Jiachen Zhu
Zheli Zhou
Yunjia Xi
Weiwen Liu
Yong Yu
Weinan Zhang
AAML
34
0
0
23 May 2025
Robotic Visual Instruction
Yuchen Li
Ziyang Gong
Haoyang Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
101
0
0
01 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
121
0
0
22 Apr 2025
UFO2: The Desktop AgentOS
Chaoyun Zhang
He Huang
Chiming Ni
J. Mu
Si Qin
...
Minghua Ma
Jian-Guang Lou
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
89
3
0
20 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
LRM
VLM
120
12
0
10 Apr 2025
LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation
Hyunsik Jeon
Satoshi Koide
Yu Wang
Zhankui He
Julian McAuley
VLM
114
0
0
30 Mar 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
Dongping Li
Tielong Cai
Tianci Tang
Wenhao Chai
Katherine Rose Driggs-Campbell
Gaoang Wang
LM&Ro
138
0
0
11 Mar 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
118
10
0
23 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
96
14
0
17 Jan 2025
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
107
3
0
03 Jan 2025
ChemDFM-X: Towards Large Multimodal Model for Chemistry
Zihan Zhao
B. Chen
Jingpiao Li
Lu Chen
Liyang Wen
...
Ziping Wan
Yansi Li
Zhongyang Dai
Xin Chen
Kai Yu
AI4CE
126
3
0
03 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
85
24
0
31 Dec 2024
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
107
40
0
31 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
156
2
0
20 Dec 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
...
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
129
7
0
27 Nov 2024
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Sina Rismanchian
Yasaman Razeghi
Sameer Singh
Shayan Doroudi
75
1
0
31 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
84
26
0
10 Oct 2024
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
Yangtao Chen
Zixuan Chen
Junhui Yin
Jing Huo
Pinzhuo Tian
Jieqi Shi
Yang Gao
LM&Ro
81
3
0
30 Sep 2024
FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation
Litao Liu
Wentao Wang
Yifan Han
Zhuoli Xie
Pengfei Yi
Junyan Li
Yi Qin
Wenzhao Lian
50
2
0
29 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
77
4
0
16 Sep 2024
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
Jinfa Huang
Jinsheng Pan
Zhongwei Wan
Hanjia Lyu
Jiebo Luo
72
5
0
30 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
78
7
0
22 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
66
2
0
17 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
98
23
0
02 Jul 2024
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Yuxuan Wan
Chaozheng Wang
Yi Dong
Wenxuan Wang
Shuqing Li
Yintong Huo
Michael R. Lyu
3DV
83
10
0
24 Jun 2024
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
Siyu Yuan
Kaitao Song
Jiangjie Chen
Xu Tan
Dongsheng Li
Deqing Yang
LLMAG
81
17
0
20 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
86
3
0
17 Jun 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRM
MLLM
101
14
0
27 May 2024
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Hengtao Shen
MLLM
65
10
0
24 May 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Yanjie Wang
Yuliang Liu
Hao Liu
Xiang Bai
Can Huang
104
28
0
20 May 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
52
12
0
25 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
59
24
0
09 Apr 2024
Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
Xiaohao Xu
Yunkang Cao
Huaxin Zhang
Nong Sang
Xiaonan Huang
VLM
84
10
0
17 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
134
6
0
03 Mar 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM
ELM
63
3
0
28 Feb 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Renqiu Xia
Bo Zhang
Hancheng Ye
Xiangchao Yan
Qi Liu
...
Min Dou
Botian Shi
Junchi Yan
Junchi Yan
Yu Qiao
LRM
88
61
0
19 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
151
112
0
08 Feb 2024
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
Maan Qraitem
Nazia Tasnim
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
VLM
46
11
0
01 Feb 2024
Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications
Yuhang Zhou
Paiheng Xu
Xiyao Wang
Xuan Lu
Ge Gao
Wei Ai
74
5
0
22 Jan 2024
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
72
28
0
03 Jan 2024
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
Chuan Fang
Yuan Dong
Kunming Luo
Xiaotao Hu
Rakesh Shrestha
Ping Tan
DiffM
102
35
0
05 Oct 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
120
231
0
07 Jul 2023
1