2406.11633
Cited By
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
17 June 2024
Renqiu Xia
Song Mao
Xiangchao Yan
Hongbin Zhou
Bo Zhang
Haoyang Peng
Jiahao Pi
Daocheng Fu
Wenjie Wu
Hancheng Ye
Shiyang Feng
Bin Wang
Chao Xu
Conghui He
Pinlong Cai
Min Dou
Botian Shi
Sheng Zhou
Yongwei Wang
Bin Wang
Junchi Yan
Fei Wu
Yu Qiao
Papers citing
"DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models"
13 / 13 papers shown
Title
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
68
1
0
04 Mar 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
108
9
0
16 Dec 2024
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Zeang Sheng
123
6
0
10 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
124
5
0
08 Dec 2024
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Zirui Wang
Mengzhou Xia
Luxi He
Howard Chen
Yitao Liu
...
Haotian Liu
Sadhika Malladi
Alexis Chevalier
Sanjeev Arora
Danqi Chen
28
52
0
26 Jun 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
56
53
0
01 Mar 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Renqiu Xia
Bo Zhang
Hancheng Ye
Xiangchao Yan
Qi Liu
...
Min Dou
Botian Shi
Junchi Yan
Yu Qiao
LRM
86
61
0
19 Feb 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
60
55
0
31 Dec 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
120
231
0
07 Jul 2023
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
68
40
0
02 Jun 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
276
3,458
0
29 Apr 2022
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
91
690
0
01 Jul 2020
Representation Learning: A Review and New Perspectives
Yoshua Bengio
Aaron Courville
Pascal Vincent
OOD
SSL
160
12,384
0
24 Jun 2012