Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning

30 September 2020

Papers citing "Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning"

Title
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Linke Ouyang Yuan Qu Hongbin Zhou Jiawei Zhu Rui Zhang ... Chao Xu Bo Zhang Botian Shi Zhongying Tu Conghui He 101 5 0 10 Dec 2024
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding Wenhui Liao Jiapeng Wang Hongliang Li Chengyu Wang Jun Huang Lianwen Jin 38 0 0 27 Aug 2024
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents Ahmed Masry Amir Hajian 22 2 0 26 Jan 2024
Beyond Document Page Classification: Design, Datasets, and Challenges Jordy Van Landeghem Sanket Biswas Matthew B. Blaschko Marie-Francine Moens 37 6 0 24 Aug 2023
On Evaluation of Document Classification using RVL-CDIP Stefan Larson Gordon Lim Kevin Leach 26 3 0 21 Jun 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction Chen-Yu Lee Chun-Liang Li Hao Zhang Timothy Dozat Vincent Perot ... Shangbang Long Siyang Qin Yasuhisa Fujii Nan Hua Tomas Pfister SSL 48 17 0 04 May 2023
DocILE Benchmark for Document Information Localization and Extraction Štěpán Šimsa Milan Šulc Michal Uřičář Yash J. Patel Ahmed Hamdi ... Matyáš Skalický Jiří Matas Antoine Doucet Mickael Coustaty Dimosthenis Karatzas 24 33 0 11 Feb 2023
Unifying Vision, Text, and Layout for Universal Document Processing Zineng Tang Ziyi Yang Guoxin Wang Yuwei Fang Yang Liu Chenguang Zhu Michael Zeng Chao-Yue Zhang Mohit Bansal VLM 30 105 0 05 Dec 2022
Unimodal and Multimodal Representation Training for Relation Extraction Ciaran Cooney Rachel Heyburn Liam Maddigan Mairead O'Cuinn Chloe Thompson Joana Cavadas 28 2 0 11 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers Stefan Larson Gordon Lim Yutong Ai David Kuang Kevin Leach OODD OOD 34 18 0 14 Oct 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Yupan Huang Tengchao Lv Lei Cui Yutong Lu Furu Wei 25 432 0 18 Apr 2022
DiT: Self-supervised Pre-training for Document Image Transformer Junlong Li Yiheng Xu Tengchao Lv Lei Cui Chaoxi Zhang Furu Wei ViT VLM 35 159 0 04 Mar 2022
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding Junlong Li Yiheng Xu Lei Cui Furu Wei VLM 3DGS 23 59 0 16 Oct 2021
Skim-Attention: Learning to Focus via Document Layout Laura Nguyen Thomas Scialom Jacopo Staiano Benjamin Piwowarski 13 9 0 02 Sep 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers Yulin Li Yuxi Qian Yuchen Yu Xiameng Qin Chengquan Zhang Yan Liu Kun Yao Junyu Han Jingtuo Liu Errui Ding 27 113 0 06 Aug 2021
SelfDoc: Self-Supervised Document Representation Learning Peizhao Li Jiuxiang Gu Jason Kuen Vlad I. Morariu Handong Zhao R. Jain Varun Manjunatha Hongfu Liu ViT SSL 20 159 0 07 Jun 2021
End-to-End Hierarchical Relation Extraction for Generic Form Understanding Tuan-Anh Dang Nguyen Duc Thanh Hoang Q. Tran Chih-Wei Pan T. Nguyen 11 10 0 02 Jun 2021
ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents Weihong Lin Qifang Gao Lei-huan Sun Zhuoyao Zhong Kaiqin Hu Qin Ren Qiang Huo 23 37 0 25 May 2021
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding Te-Lin Wu Cheng-rong Li Mingyang Zhang Tao Chen Spurthi Amba Hombaiah Michael Bendersky 13 14 0 16 Apr 2021
A Survey of Deep Learning Approaches for OCR and Document Understanding Nishant Subramani Alexandre Matton Malcolm Greaves Adrian Lam 11 48 0 27 Nov 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents Guillaume Jaume H. K. Ekenel Jean-Philippe Thiran 134 355 0 27 May 2019