LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

18 April 2022

Papers citing "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking"

27 / 277 papers shown

DocILE Benchmark for Document Information Localization and Extraction Štěpán Šimsa Milan Šulc Michal Uřičář Yash J. Patel Ahmed Hamdi ... Matyáš Skalický Jiří Matas Antoine Doucet Mickael Coustaty Dimosthenis Karatzas 67 36 0 11 Feb 2023
Layout-aware Webpage Quality Assessment Anfeng Cheng Yiding Liu Weibin Li Qian Dong Shuaiqiang Wang Zhengjie Huang Shikun Feng Zhicong Cheng D. Yin 3DV 71 4 0 28 Jan 2023
An Augmentation Strategy for Visually Rich Documents Jing Xie James Bradley Wendt Yichao Zhou Seth Ebner Sandeep Tata 73 0 0 20 Dec 2022
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering Fangyu Liu Francesco Piccinno Syrine Krichene Chenxi Pang Kenton Lee Mandar Joshi Yasemin Altun Nigel Collier Julian Martin Eisenschlos VLM LRM 61 102 0 19 Dec 2022
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding Haoli Bai Zhiguang Liu Xiaojun Meng Wentao Li Shuangning Liu ... Liangwei Wang Lu Hou Jiansheng Wei Xin Jiang Qun Liu ViT 77 13 0 19 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only Michael Tschannen Basil Mustafa N. Houlsby CLIP VLM 104 49 0 15 Dec 2022
Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches Sven Najem-Meyer Matteo Romanello 63 6 0 12 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA Rubèn Pérez Tito Dimosthenis Karatzas Ernest Valveny 94 61 0 07 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing Zineng Tang Ziyi Yang Guoxin Wang Yuwei Fang Yang Liu Chenguang Zhu Michael Zeng Chao-Yue Zhang Joey Tianyi Zhou VLM 131 115 0 05 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models Lei Wang Jian He Xingdong Xu Ning Liu Hui-juan Liu 83 2 0 27 Nov 2022
Semantic Table Detection with LayoutLMv3 Ivan Silajev Niels Victor Phillip Mortimer 48 1 0 25 Nov 2022
VRDU: A Benchmark for Visually-rich Document Understanding Zilong Wang Yichao Zhou Wei Wei Chen-Yu Lee Sandeep Tata 58 17 0 15 Nov 2022
Unimodal and Multimodal Representation Training for Relation Extraction Ciaran Cooney Rachel Heyburn Liam Maddigan Mairead O'Cuinn Chloe Thompson Joana Cavadas 55 2 0 11 Nov 2022
DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop Neelesh K Shukla Msp Raja Raghu Katikeri Amit Vaid 36 1 0 09 Nov 2022
On Web-based Visual Corpus Construction for Visual Document Understanding Donghyun Kim Teakgyu Hong Moonbin Yim Yoonsik Kim Geewook Kim 95 4 0 07 Nov 2022
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild Weiyao Wang Byung-Hak Kim Varun Ganapathi SSL LMTD 63 1 0 02 Nov 2022
Key Information Extraction in Purchase Documents using Deep Learning and Rule-based Corrections R. Arroyo J. Yebes E. Martínez Hector Corrales Javier Lorenzo 75 1 0 07 Oct 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding Kenton Lee Mandar Joshi Iulia Turc Hexiang Hu Fangyu Liu Julian Martin Eisenschlos Urvashi Khandelwal Peter Shaw Ming-Wei Chang Kristina Toutanova CLIP VLM 302 280 0 07 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding Jingye Chen Tengchao Lv Lei Cui Changrong Zhang Furu Wei 95 14 0 06 Oct 2022
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text Abhinav Java Shripad Deshmukh Milan Aggarwal Surgan Jandial Mausoom Sarkar Balaji Krishnamurthy 56 3 0 12 Sep 2022
TaCo: Textual Attribute Recognition via Contrastive Learning Chang Nie Yiqing Hu Yanqiu Qu Hao Liu Deqiang Jiang Bo Ren 90 0 0 22 Aug 2022
Understanding Long Documents with Different Position-Aware Attentions Hai Pham Guoxin Wang Yijuan Lu D. Florêncio Changrong Zhang 67 9 0 17 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding Song Tao Zijian Wang Tiantian Fan Canjie Luo Can Huang SSL 80 2 0 28 Jul 2022
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding Chuwei Luo Guozhi Tang Qi Zheng Cong Yao Lianwen Jin Chenliang Li Yang Xue Luo Si 91 18 0 27 Jun 2022
Test-Time Adaptation for Visual Document Understanding Sayna Ebrahimi Sercan O. Arik Tomas Pfister OOD 78 6 0 15 Jun 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Souhail Bakkali Zuheng Ming Mickael Coustaty Marçal Rusiñol O. R. Terrades VLM 99 31 0 24 May 2022
DiT: Self-supervised Pre-training for Document Image Transformer Junlong Li Yiheng Xu Tengchao Lv Lei Cui Chaoxi Zhang Furu Wei ViT VLM 128 170 0 04 Mar 2022