2204.08387
v3 (latest)
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
18 April 2022
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
Papers citing "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking"
50 / 277 papers shown
Title
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
Dong Nguyen Tien
Dung D. Le
AAML
19
0
0
19 Jun 2025
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
Zhang Li
Yuliang Liu
Qiang Liu
Zhiyin Ma
Ziyang Zhang
Shuo Zhang
Zidun Guo
Jiarui Zhang
Xinyu Wang
Xiang Bai
109
0
0
05 Jun 2025
CRAWLDoc: A Dataset for Robust Ranking of Bibliographic Documents
Fabian Karl
A. Scherp
63
0
0
04 Jun 2025
GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation
Yilin Xiao
Junnan Dong
Chuang Zhou
Su Dong
Qianwen Zhang
Di Yin
Xing Sun
Xiao Huang
LRM
62
0
0
03 Jun 2025
VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding
Yihao Ding
S. Han
Yan Li
Josiah Poon
58
1
0
02 Jun 2025
Predicting the Past: Estimating Historical Appraisals with OCR and Machine Learning
Mihir Bhaskar
Jun Tao Luo
Zihan Geng
Asmita Hajra
Junia Howell
Matthew R. Gormley
33
0
0
30 May 2025
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition
Yu Li
Jin Jiang
J. Zhu
Shuai Peng
Baole Wei
Yuxuan Zhou
Liangcai Gao
55
0
0
29 May 2025
Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports
Hayato Aida
Kosuke Takahashi
Takahiro Omi
LMTD
37
0
0
23 May 2025
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering
Kuicai Dong
Yujing Chang
Shijie Huang
Yasheng Wang
Ruiming Tang
Yong Liu
70
1
0
22 May 2025
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
Amit Agarwal
Srikant Panda
Kulbhushan Pachauri
69
4
0
22 May 2025
SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation
Yuyang Dong
Nobuhiro Ueda
Krisztián Boros
Daiki Ito
Takuya Sera
Masafumi Oyamada
VLM
116
0
0
20 May 2025
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting
Hao Feng
Shu Wei
Xiang Fei
Wei Shi
Yingdong Han
...
Qi Liu
Chunhui Lin
Jingqun Tang
Hao Liu
Can Huang
137
3
0
20 May 2025
Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments
Aniket Bhattacharyya
Anurag Tripathi
Ujjal Das
Archan Karmakar
Amit Pathak
Maneesh Gupta
66
0
0
18 May 2025
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval
Alexander Buschmann Most
Joseph Winjum
Ayan Biswas
Shawn Jones
Nishath Rajiv Ranasinghe
Dan O’Malley
Manish Bhattarai
78
0
0
08 May 2025
DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral
Qiang Sun
Sirui Li
Tingting Bi
D. Huynh
Mark Reynolds
Yuanyi Luo
Wei Liu
82
0
0
06 May 2025
Beyond Text: Characterizing Domain Expert Needs in Document Research
Sireesh Gururaja
Nupoor Gandhi
Jeremiah Milbauer
Emma Strubell
133
0
0
16 Apr 2025
XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
Shuai Liu
Youmeng Li
Jizeng Wei
73
1
0
14 Apr 2025
Relation-Rich Visual Document Generator for Visual Information Extraction
Zi-Han Jiang
Chien-Wei Lin
Wei-Hua Li
Hsuan-Tung Liu
Yi-Ren Yeh
Chu-Song Chen
75
0
0
14 Apr 2025
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding
Aniket Pal
Sanket Biswas
Alloy Das
Ayush Lodh
Priyanka Banerjee
Soumitri Chattopadhyay
Dimosthenis Karatzas
Josep Lladós
C. V. Jawahar
VLM
69
0
0
12 Apr 2025
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li
Fei Yin
Cheng-Lin Liu
85
1
0
05 Apr 2025
VISTA-OCR: Towards generative and interactive end to end OCR models
Laziz Hamdi
Amine Tamasna
Pascal Boisson
Thierry Paquet
93
1
0
04 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
75
0
0
03 Apr 2025
Improving Applicability of Deep Learning based Token Classification models during Training
Anket Mehra
Malte Prieß
Marian Himstedt
101
0
0
28 Mar 2025
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
Haolong Yan
Kaijun Tan
Yeqing Shen
Xin Huang
Zheng Ge
Xiangyu Zhang
Si Li
Daxin Jiang
VLM
76
0
0
27 Mar 2025
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction
Jan Kohút
Martin Dočekal
Michal Hradiš
Marek Vaško
87
0
0
25 Mar 2025
SFDLA: Source-Free Document Layout Analysis
Sebastian Tewes
Yufan Chen
Omar Moured
Jiaming Zhang
Rainer Stiefelhagen
86
0
0
24 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
100
1
0
24 Mar 2025
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You
Bryan Hooi
Yiwei Wang
Yansen Wang
Zong Ke
Ming Yang
Zi Huang
Yujun Cai
AAML
100
0
0
24 Mar 2025
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction
Ting Sun
Cheng Cui
Yuning Du
Yi Liu
104
1
0
21 Mar 2025
Joint Extraction Matters: Prompt-Based Visual Question Answering for Multi-Field Document Information Extraction
Mengsay Loem
Taiju Hosaka
85
0
0
21 Mar 2025
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Qiang Huo
117
0
0
20 Mar 2025
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
Lucas Morin
Valéry Weber
A. Nassar
Gerhard Ingmar Meijer
Luc Van Gool
Yawei Li
Peter W. J. Staar
89
2
0
20 Mar 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation
Martin Kostelník
Karel Beneš
Michal Hradiš
70
0
0
20 Mar 2025
M3: 3D-Spatial MultiModal Memory
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
106
0
0
20 Mar 2025
An Efficient Deep Learning-Based Approach to Automating Invoice Document Validation
Aziz Amari
Mariem Makni
Wissal Fnaich
Akram Lahmar
Fedi Koubaa
Oumayma Charrad
Mohamed Ali Zormati
Rabaa Youssef Douss
75
0
0
15 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
102
6
0
14 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
118
0
0
07 Mar 2025
Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition
Bin Chen
Yu Zhang
Hongfei Ye
Ziyi Huang
Hongyang Chen
108
1
0
06 Mar 2025
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs
T. Zhang
Zhuoxuan Jiang
Haotian Zhang
Lin Lin
Shaohua Zhang
LRM
112
0
0
06 Mar 2025
Towards Statistical Factuality Guarantee for Large Vision-Language Models
Zechao Li
Chao Yan
Nicholas J. Jackson
Wendi Cui
B. Li
Jiaxin Zhang
Bradley Malin
143
0
0
27 Feb 2025
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Benjamin Gutteridge
Matthew Thomas Jackson
Toni Kukurin
Xiaowen Dong
80
0
0
27 Feb 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
89
1
0
26 Feb 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
87
1
0
25 Feb 2025
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models
Jonathan Bourne
200
0
0
24 Feb 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
141
0
0
23 Feb 2025
EDocNet: Efficient Datasheet Layout Analysis Based on Focus and Global Knowledge Distillation
Hong Cai Chen
Longchang Wu
Yang Zhang
72
0
0
23 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
113
5
0
22 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
Tanveer Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
193
2
0
14 Feb 2025
Enhancing Document Key Information Localization Through Data Augmentation
Yue Dai
115
0
0
10 Feb 2025
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents
Ilia Karmanov
A. Deshmukh
Lukas Voegtle
Philipp Fischer
Kateryna Chumachenko
...
Jarno Seppänen
Jupinder Parmar
Pritam Gundecha
Andrew Tao
Karan Sapra
137
1
0
06 Feb 2025