© 2025 ResearchTrend.AI, All rights reserved.

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

18 April 2022
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

Papers citing "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking"

50 / 277 papers shown
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao
VLM
31 Jan 2024
Document Structure in Long Document Transformers
Jan Buchmann, Max Eichler, Jan-Micha Bodensohn, Ilia Kuznetsov, Iryna Gurevych
31 Jan 2024
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents
Ahmed Masry, Amir Hajian
26 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka, Taichi Iki, Kyosuke Nishida, Kuniko Saito, Jun Suzuki
VLM
24 Jan 2024
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei-huan Sun, Qiang Huo
22 Jan 2024
Dynamic Relation Transformer for Contextual Text Block Detection
Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei-huan Sun, Qiang Huo
17 Jan 2024
Watermark Text Pattern Spotting in Document Images
Mateusz Krubiński, Stefan Matcovici, Diana Grigore, Daniel Voinea, A. Popa
WaLM
10 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin, Jiapeng Wang, Teng Li, Wenhui Liao, Dayi Huang, Longfei Xiong, Lianwen Jin
07 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Elad Ben Avraham, Aviad Aberdam, Shahar Tsiper, Ron Litman
07 Jan 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah
05 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu
VLM
31 Dec 2023
ESGReveal: An LLM-based approach for extracting structured data from ESG reports
Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, Zongxiong Lei, Zihan Zeng, Shiming Yang, Hongxiang Tong, Lei Xiao, Wenwen Zhou
25 Dec 2023
TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement
Yang Fan, Xiangping Wu, Qingcai Chen, Heng Li, Yan Huang, Zhixiang Cai, Qitian Wu
LMTD
18 Dec 2023
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber, Carlo Siebenschuh, Rory Butler, Anton Alexandrov, Valdemar Thanner, ..., Haris Jabbar, Ian Foster, Yue Liu, Rick L. Stevens, Ce Zhang
15 Dec 2023
Privacy-Aware Document Visual Question Answering
Rubèn Pérez Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, ..., Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
15 Dec 2023
ESG Accountability Made Easy: DocQA at Your Service
Lokesh Mishra, Cesar Berrospi, K. Dinkla, Diego Antognini, Francesco Fusco, ..., Panagiotis Vagenas, Lucas Morin, Christoph Auer, Michele Dolfi, Peter W. J. Staar
30 Nov 2023
EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
A. Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, D. P. Shetty, Ganesh Ramakrishnan
23 Nov 2023
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding
Mahmoud Limam, M. Dhiaf, Yousri Kessentini
20 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng, Qi Liu, Hao Liu, Wen-gang Zhou, Houqiang Li, Can Huang
VLM
20 Nov 2023
PixT3: Pixel-based Table-To-Text Generation
Iñigo Alonso, Eneko Agirre, Mirella Lapata
LMTD
16 Nov 2023
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
16 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan
15 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen, H. Dai, Bo Dai, Aidong Zhang, Wei Wei
01 Nov 2023
DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding
Anran Wu, Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Zisong Zhuang, Nian Xie, Cheng Jin, Liang He
29 Oct 2023
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
Shangbang Long, Siyang Qin, Yasuhisa Fujii, Alessandro Bissacco, Michalis Raptis
25 Oct 2023
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi, Dezhi Peng, Wenhui Liao, Zening Lin, Xinhong Chen, Chongyu Liu, Yuyi Zhang, Lianwen Jin
MLLM
25 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali, Partha Pratim Roy
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas
VLM
25 Oct 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng
SyDa
24 Oct 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Hao Wang, Qingxuan Wang, Yue Li, Changqing Wang, Chenhui Chu, Rui Wang
VGen
23 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang, Xiahua Chen, Rui Wang, Chenhui Chu
23 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui
3DV
17 Oct 2023
Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection
Davide Napolitano, Lorenzo Vaiani, Luca Cagliero
13 Oct 2023
DSG: An End-to-End Document Structure Generator
Johannes Rausch, Gentiana Rashiti, Maxim Gusev, Ce Zhang, Stefan Feuerriegel
13 Oct 2023
TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining
Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Wong, Simon See
08 Oct 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Mingshi Yan, ..., Ji Zhang, Qin Jin, Liang He, Xin Lin, Feiyan Huang
VLM, MLLM
08 Oct 2023
PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction
S. Saifullah, S. Agne, Andreas Dengel, Sheraz Ahmed
05 Oct 2023
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks
Ritesh Kumar, Saurabh Goyal, Ashish Verma, Vatche Isahagian
03 Oct 2023
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance
Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya
02 Oct 2023
GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction
Pengyuan Lyu, Weihong Ma, Hongyi Wang, Yu Yu, Chengquan Zhang, Kun Yao, Yang Xue, Jingdong Wang
LMTD
26 Sep 2023
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Nidhi Hegde, S. Paul, Gagan Madan, Gaurav Aggarwal
25 Sep 2023
Document Understanding for Healthcare Referrals
Jimit Mistry, N. Arzeno
MedIm
22 Sep 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim, Yoon Kim, Donghyun Kim, Yumin Lim, Geewook Kim, Taeho Kil
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, ..., Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
VLM, MLLM
20 Sep 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, ..., Zifeng Wang, Jiaqi Mu, Hao Zhang, Chen-Yu Lee, Nan Hua
19 Sep 2023
Long-Range Transformer Architectures for Document Understanding
Thibault Douzon, S. Duffner, Christophe Garcia, Jérémy Espinas
VLM
11 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
H. Cao, Changcun Bao, Chaohu Liu, Huang-wei Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun
03 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
Sotirios Kastanas, Shaomu Tan, Yijiang He
29 Aug 2023
Vision Grid Transformer for Document Layout Analysis
Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao
ViT
29 Aug 2023
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic
ViT
25 Aug 2023