OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

10 December 2024
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
Qunshu Lin
Bin Wang
Zhiyuan Zhao
Man Jiang
Xiaomeng Zhao
Jin Shi
Fan Wu
Pei Chu
Minghao Liu
Zhenxiang Li
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Zeang Sheng

Papers citing "OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations"

42 / 42 papers shown
Title
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
Zhang Li
Yuliang Liu
Qiang Liu
Zhiyin Ma
Ziyang Zhang
Shuo Zhang
Zidun Guo
Jiarui Zhang
Xinyu Wang
Xiang Bai
114
0
0
05 Jun 2025
XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
Shuai Liu
Youmeng Li
Jizeng Wei
73
1
0
14 Apr 2025
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen
Yifeng Gao
Weijia Li
Conghui He
Linfeng Zhang
LRM
186
6
0
17 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
367
7
0
12 Feb 2025
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Zhiyuan Zhao
Hengrui Kang
Bin Wang
Zeang Sheng
72
17
0
16 Oct 2024
MinerU: An Open-Source Solution for Precise Document Content Extraction
Bin Wang
Chao Xu
Xiaomeng Zhao
Linke Ouyang
Fan Wu
...
Wei Li
Botian Shi
Yu Qiao
Dahua Lin
Conghui He
62
47
0
27 Sep 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Zichao Li
Aizier Abulaiti
Yaojie Lu
Xuanang Chen
Jia Zheng
Hongyu Lin
Xianpei Han
Le Sun
77
5
0
08 Sep 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
Anwen Hu
Haiyang Xu
Liang Zhang
Jiabo Ye
Ming Yan
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
109
37
0
05 Sep 2024
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Haoran Wei
Chenglong Liu
Jinyue Chen
Jia Wang
Lingyu Kong
...
Liang Zhao
Jianjian Sun
Yuang Peng
Chunrui Han
Xiangyu Zhang
VLM
110
55
0
03 Sep 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Renqiu Xia
Song Mao
Xiangchao Yan
Hongbin Zhou
Bo Zhang
...
Yongwei Wang
Bin Wang
Junchi Yan
Fei Wu
Yu Qiao
115
12
0
17 Jun 2024
Focus Anywhere for Fine-grained Multi-page Document Understanding
Chenglong Liu
Haoran Wei
Jinyue Chen
Lingyu Kong
Zheng Ge
Zining Zhu
Liang Zhao
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
85
25
0
23 May 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Anwen Hu
Haiyang Xu
Jiabo Ye
Mingshi Yan
Liang Zhang
...
Chen Li
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
117
125
0
19 Mar 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Renqiu Xia
Bo Zhang
Hancheng Ye
Xiangchao Yan
Qi Liu
...
Min Dou
Botian Shi
Junchi Yan
Yu Qiao
LRM
193
68
0
19 Feb 2024
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
128
2
0
17 Feb 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM, MLLM
324
1,217
0
21 Dec 2023
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV, RALM
347
1,846
1
18 Dec 2023
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
Cong Yao
91
6
0
19 Oct 2023
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher
Guillem Cucurull
Thomas Scialom
Robert Stojnic
ViT
66
120
0
25 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM, VLM, ObjD
217
945
0
24 Aug 2023
M$^{6}$Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
Hiuyi Cheng
Pei-yu Zhang
Sihang Wu
Jiaxin Zhang
Qi Zhu
Zecheng Xie
Jing Li
Kai Ding
Lianwen Jin
113
32
0
15 May 2023
SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
ViT
90
16
0
08 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.7K
14,870
0
15 Mar 2023
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
Yongshuai Huang
Ning Lu
Dapeng Chen
Yibo Li
Zecheng Xie
Shenggao Zhu
Liangcai Gao
Wei Peng
111
29
0
13 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.8K
13,560
0
27 Feb 2023
PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System
Chenxia Li
Weiwei Liu
Ruoyu Guo
Xiaoyue Yin
Kaitao Jiang
...
Lingfeng Zhu
Baohua Lai
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
135
114
0
07 Jun 2022
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
B. Pfitzmann
Christoph Auer
Michele Dolfi
A. Nassar
Peter W. J. Staar
87
91
0
02 Jun 2022
Unified Pretraining Framework for Document Understanding
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
Nikolaos Barmpalios
R. Jain
A. Nenkova
Tong Sun
105
98
0
22 Apr 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
174
464
0
18 Apr 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mingxin Huang
Yuliang Liu
Zhenghao Peng
Chongyu Liu
Dahua Lin
Shenggao Zhu
N. Yuan
Kai Ding
Lianwen Jin
ViT
85
103
0
19 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT, VLM
130
170
0
04 Mar 2022
TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables
Harsh Desai
Pratik Kayal
M. Singh
LMTD
55
17
0
12 May 2021
PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network
Pengfei Wang
Chengquan Zhang
Fei Qi
Shanshan Liu
Xiaoqiang Zhang
Pengyuan Lyu
Junyu Han
Jingtuo Liu
Errui Ding
Guangming Shi
101
83
0
12 Apr 2021
TabTransformer: Tabular Data Modeling Using Contextual Embeddings
Xin Huang
A. Khetan
Milan Cvitkovic
Zohar Karnin
ViT, LMTD
227
462
0
11 Dec 2020
Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning
Subhojeet Pramanik
Shashank Mujumdar
Hima Patel
138
32
0
30 Sep 2020
Improving Attention-Based Handwritten Mathematical Expression Recognition with Scale Augmentation and Drop Attention
Zhe Li
Lianwen Jin
Songxuan Lai
Yecheng Zhu
82
46
0
20 Jul 2020
Spatial Dependency Parsing for Semi-Structured Document Information Extraction
Wonseok Hwang
Jinyeong Yim
Seunghyun Park
Sohee Yang
Minjoon Seo
115
97
0
01 May 2020
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Yuliang Liu
Hao Chen
Chunhua Shen
Tong He
Lianwen Jin
Liangwei Wang
153
336
0
24 Feb 2020
Image-based table recognition: data, model, and evaluation
Xu Zhong
Elaheh Shafieibavani
Antonio Jimeno Yepes
LMTD
152
223
0
25 Nov 2019
PubLayNet: largest dataset ever for document layout analysis
Xu Zhong
Jianbin Tang
Antonio Jimeno Yepes
54
465
0
16 Aug 2019
TableBank: A Benchmark Dataset for Table Detection and Recognition
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
M. Zhou
Zhoujun Li
LMTD
88
176
0
05 Mar 2019
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
Jianshu Zhang
Jun Du
Lirong Dai
98
126
0
05 Jan 2018
Image-to-Markup Generation with Coarse-to-Fine Attention
Yuntian Deng
Anssi Kanervisto
Jeffrey Ling
Alexander M. Rush
68
230
0
16 Sep 2016