ResearchTrend.AI



LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

18 April 2022
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei

Papers citing "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking"

27 / 277 papers shown
DocILE Benchmark for Document Information Localization and Extraction
Štěpán Šimsa
Milan Šulc
Michal Uřičář
Yash J. Patel
Ahmed Hamdi
...
Matyáš Skalický
Jiří Matas
Antoine Doucet
Mickael Coustaty
Dimosthenis Karatzas
67
36
0
11 Feb 2023
Layout-aware Webpage Quality Assessment
Anfeng Cheng
Yiding Liu
Weibin Li
Qian Dong
Shuaiqiang Wang
Zhengjie Huang
Shikun Feng
Zhicong Cheng
D. Yin
3DV
71
4
0
28 Jan 2023
An Augmentation Strategy for Visually Rich Documents
Jing Xie
James Bradley Wendt
Yichao Zhou
Seth Ebner
Sandeep Tata
73
0
0
20 Dec 2022
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Fangyu Liu
Francesco Piccinno
Syrine Krichene
Chenxi Pang
Kenton Lee
Mandar Joshi
Yasemin Altun
Nigel Collier
Julian Martin Eisenschlos
VLM, LRM
61
102
0
19 Dec 2022
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
77
13
0
19 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP, VLM
104
49
0
15 Dec 2022
Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches
Sven Najem-Meyer
Matteo Romanello
63
6
0
12 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
94
61
0
07 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Joey Tianyi Zhou
VLM
131
115
0
05 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
83
2
0
27 Nov 2022
Semantic Table Detection with LayoutLMv3
Ivan Silajev
Niels Victor
Phillip Mortimer
48
1
0
25 Nov 2022
VRDU: A Benchmark for Visually-rich Document Understanding
Zilong Wang
Yichao Zhou
Wei Wei
Chen-Yu Lee
Sandeep Tata
58
17
0
15 Nov 2022
Unimodal and Multimodal Representation Training for Relation Extraction
Ciaran Cooney
Rachel Heyburn
Liam Maddigan
Mairead O'Cuinn
Chloe Thompson
Joana Cavadas
55
2
0
11 Nov 2022
DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop
Neelesh K Shukla
Msp Raja
Raghu Katikeri
Amit Vaid
36
1
0
09 Nov 2022
On Web-based Visual Corpus Construction for Visual Document Understanding
Donghyun Kim
Teakgyu Hong
Moonbin Yim
Yoonsik Kim
Geewook Kim
95
4
0
07 Nov 2022
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild
Weiyao Wang
Byung-Hak Kim
Varun Ganapathi
SSL, LMTD
63
1
0
02 Nov 2022
Key Information Extraction in Purchase Documents using Deep Learning and Rule-based Corrections
R. Arroyo
J. Yebes
E. Martínez
Hector Corrales
Javier Lorenzo
75
1
0
07 Oct 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP, VLM
302
280
0
07 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
95
14
0
06 Oct 2022
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
Abhinav Java
Shripad Deshmukh
Milan Aggarwal
Surgan Jandial
Mausoom Sarkar
Balaji Krishnamurthy
56
3
0
12 Sep 2022
TaCo: Textual Attribute Recognition via Contrastive Learning
Chang Nie
Yiqing Hu
Yanqiu Qu
Hao Liu
Deqiang Jiang
Bo Ren
90
0
0
22 Aug 2022
Understanding Long Documents with Different Position-Aware Attentions
Hai Pham
Guoxin Wang
Yijuan Lu
D. Florêncio
Changrong Zhang
67
9
0
17 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding
Song Tao
Zijian Wang
Tiantian Fan
Canjie Luo
Can Huang
SSL
80
2
0
28 Jul 2022
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
Chuwei Luo
Guozhi Tang
Qi Zheng
Cong Yao
Lianwen Jin
Chenliang Li
Yang Xue
Luo Si
91
18
0
27 Jun 2022
Test-Time Adaptation for Visual Document Understanding
Sayna Ebrahimi
Sercan O. Arik
Tomas Pfister
OOD
78
6
0
15 Jun 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
O. R. Terrades
VLM
99
31
0
24 May 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT, VLM
128
170
0
04 Mar 2022