Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.08387
Cited By
v1
v2
v3 (latest)
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
18 April 2022
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking"
50 / 277 papers shown
Title
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
143
1
0
28 Jan 2025
Code and Pixels: Multi-Modal Contrastive Pre-training for Enhanced Tabular Data Analysis
Kankana Roy
Lars Krämer
Sebastian Domaschke
Malik Haris
Roland Aydin
Hyunjin Park
Martin Held
119
0
0
13 Jan 2025
Clinical Insights: A Comprehensive Review of Language Models in Medicine
Nikita Neveditsin
Pawan Lingras
V. Mago
LM&MA
129
5
0
08 Jan 2025
SAIL: Sample-Centric In-Context Learning for Document Information Extraction
Jinyu Zhang
Zhiyuan You
Jize Wang
Xinyi Le
132
1
0
22 Dec 2024
Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain
Benno Uthayasooriyar
A. Ly
Franck Vermet
Caio Corro
101
0
0
12 Dec 2024
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Zeang Sheng
187
11
0
10 Dec 2024
Text Change Detection in Multilingual Documents Using Image Comparison
Doyoung Park
Naresh Reddy Yarram
Sunjin Kim
Minkyu Kim
Seongho Cho
Taehee Lee
87
0
0
05 Dec 2024
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Ser-Nam Lim
R. Ramnath
131
1
0
29 Nov 2024
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts
Junwen He
Yifan Wang
Lijun Wang
Huchuan Lu
Jun-Yan He
Chong Li
Hanyuan Chen
Jin-Peng Lan
Bin Luo
Yifeng Geng
119
1
0
18 Nov 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Joey Tianyi Zhou
VLM
98
16
0
07 Nov 2024
AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool
Zhongliang Tang
Mengchen Tan
Fei Xia
Qingrong Cheng
Hao Jiang
Yize Zhang
58
0
0
06 Nov 2024
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
Jian Chen
Ruiyi Zhang
Yufan Zhou
Tong Yu
Franck Dernoncourt
J. Gu
Ryan Rossi
Changyou Chen
Tong Sun
86
0
0
02 Nov 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
217
2
0
31 Oct 2024
DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Manan Suri
Puneet Mathur
Franck Dernoncourt
R. Jain
Vlad I. Morariu
Ramit Sawhney
Preslav Nakov
Dinesh Manocha
122
3
0
21 Oct 2024
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Zhiyuan Zhao
Hengrui Kang
Bin Wang
Zeang Sheng
69
17
0
16 Oct 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
48
2
0
14 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
69
0
0
04 Oct 2024
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights
Yihao Ding
S. Han
Zechuan Li
Hyunsuk Chung
70
2
0
02 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang
Yi Tu
Yixi Zhao
Chenshu Yuan
Huan Chen
...
Mingxu Chai
Ya Guo
Huijia Zhu
Qi Zhang
Tao Gui
62
2
0
29 Sep 2024
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
60
0
0
29 Sep 2024
MinerU: An Open-Source Solution for Precise Document Content Extraction
Bin Wang
Chao Xu
Xiaomeng Zhao
Linke Ouyang
Fan Wu
...
Wei Li
Botian Shi
Yu Qiao
Dahua Lin
Conghui He
60
47
0
27 Sep 2024
A comprehensive study of on-device NLP applications -- VQA, automated Form filling, Smart Replies for Linguistic Codeswitching
Naman Goyal
53
0
0
23 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
127
1
0
18 Sep 2024
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
Marcel Lamott
Muhammad Armaghan Shakir
70
0
0
17 Sep 2024
RexUniNLU: Recursive Method with Explicit Schema Instructor for Universal NLU
Chengyuan Liu
Shihang Wang
Fubang Zhao
Kun Kuang
Yangyang Kang
Weiming Lu
Changlong Sun
Fei Wu
89
0
0
09 Sep 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Zichao Li
Aizier Abulaiti
Yaojie Lu
Xuanang Chen
Jia Zheng
Hongyu Lin
Xianpei Han
Le Sun
75
5
0
08 Sep 2024
ViRED: Prediction of Visual Relations in Engineering Drawings
Chao Gu
Ke Lin
Yiyang Luo
Jiahui Hou
Xiang-Yang Li
80
0
0
02 Sep 2024
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
I. de Rodrigo
A. Sanchez-Cuadrado
J. Boal
A. J. Lopez-Lopez
VLM
95
1
0
31 Aug 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
68
1
0
28 Aug 2024
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding
Chuanghao Ding
Xuejing Liu
Wei Tang
Juan Li
Xiaoliang Wang
Rui Zhao
Cam-Tu Nguyen
Fei Tan
93
0
0
27 Aug 2024
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao
Jiapeng Wang
Hongliang Li
Chengyu Wang
Jun Huang
Lianwen Jin
178
0
0
27 Aug 2024
Large Language Models for Page Stream Segmentation
H. Heidenreich
Ratish Dalvi
Rohith Mukku
Nikhil Verma
Neven Pičuljan
72
1
0
21 Aug 2024
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
Christos Constantinou
Georgios Ioannides
Aman Chadha
Aaron Elkins
Edwin Simpson
OODD
77
1
0
20 Aug 2024
Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models
Mingxin Huang
Yuliang Liu
Dingkang Liang
Lianwen Jin
Xiang Bai
111
14
0
04 Aug 2024
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu
Chong Zhang
Ya Guo
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
67
3
0
02 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
115
6
0
02 Aug 2024
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
Shohei Tanaka
Hao Wang
Yoshitaka Ushiku
59
2
0
29 Jul 2024
Harmonizing Visual Text Comprehension and Generation
Zhen Zhao
Jingqun Tang
Binghong Wu
Chunhui Lin
Shubo Wei
Hao Liu
Xin Tan
Zhizhong Zhang
Can Huang
Yuan Xie
VLM
107
26
0
23 Jul 2024
CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling
Qi Zhang
Yonghong Song
Pengcheng Guo
Yangyang Hui
83
0
0
19 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
109
3
0
17 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
139
2
0
17 Jul 2024
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents
Thomas Constum
Pierrick Tranouez
Thierry Paquet
56
5
0
12 Jul 2024
Extracting Training Data from Document-Based VQA Models
Francesco Pinto
N. Rauschmayr
F. Tramèr
Philip Torr
Federico Tombari
92
6
0
11 Jul 2024
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
Thanh-Dat Nguyen
Tung Do-Viet
Hung Nguyen-Duy
Tuan-Hai Luu
Hung Le
Bach Le
Patanamon
Thongtanunam
SyDa
46
1
0
09 Jul 2024
Large Language Models Understand Layout
Weiming Li
Manni Duan
Dong An
Yan Shao
86
3
0
08 Jul 2024
MobileFlow: A Multimodal LLM For Mobile GUI Agent
Songqin Nong
Jiali Zhu
Rui Wu
Jiongchao Jin
Shuo Shan
Xiutian Huang
Wenhao Xu
67
11
0
05 Jul 2024
DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification
S. Saifullah
S. Agne
Andreas Dengel
Sheraz Ahmed
136
0
0
04 Jul 2024
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
Lei Chen
Feng Yan
Yujie Zhong
Shaoxiang Chen
Zequn Jie
Lin Ma
125
4
0
03 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
178
23
0
02 Jul 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
...
Liangming Pan
Yu-Gang Jiang
Jiaqi Wang
Yixin Cao
Aixin Sun
ELM
RALM
VLM
111
33
0
01 Jul 2024
Previous
1
2
3
4
5
6
Next