Unified Pretraining Framework for Document Understanding

22 April 2022

Jiuxiang Gu

Papers citing "Unified Pretraining Framework for Document Understanding"

26 / 26 papers shown

Title
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Linke Ouyang Yuan Qu Hongbin Zhou Jiawei Zhu Rui Zhang ... Chao Xu Bo Zhang Botian Shi Zhongying Tu Zeang Sheng 104 5 0 10 Dec 2024
DocMamba: Efficient Document Pre-training with State Space Model Pengfei Hu Zhenrong Zhang Jiefeng Ma Shuhang Liu Jun Du Jianshu Zhang Mamba 42 1 0 18 Sep 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data Yufan Shen Chuwei Luo Zhaoqing Zhu Yang Chen Qi Zheng Zhi Yu Jiajun Bu Cong Yao 48 2 0 17 Jul 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications Jordy Van Landeghem Subhajit Maity Ayan Banerjee Matthew Blaschko Marie-Francine Moens Josep Lladós Sanket Biswas 50 2 0 12 Jun 2024
A Hybrid Approach for Document Layout Analysis in Document images Tahira Shehzadi Didier Stricker Muhammad Zeshan Afzal 37 5 0 27 Apr 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering Yihao Ding Kaixuan Ren Jiabin Huang Siwen Luo S. Han 43 1 0 19 Apr 2024
Noise-Aware Training of Layout-Aware Language Models Ritesh Sarkhel Xiaoqi Ren Lauro Beltrao Costa Guolong Su Vincent Perot Yanan Xie Emmanouil Koukoumidis Arnab Nandi VLM 49 0 0 30 Mar 2024
DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering Alex Nguyen Zilong Wang Jingbo Shang Dheeraj Mekala 41 1 0 30 Mar 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing Ran Zmigrod Zhiqiang Ma Armineh Nourbakhsh Sameena Shah 24 4 0 07 Feb 2024
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap Daehee Kim Yoon Kim Donghyun Kim Yumin Lim Geewook Kim Taeho Kil 34 3 0 21 Sep 2023
A Graphical Approach to Document Layout Analysis Jilin Wang Michael Krumdick Baojia Tong Hamima Halim M. Sokolov Vadym Barda Delphine Vendryes Christy Tanner 21 8 0 03 Aug 2023
Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images Tahira Shehzadi K. Hashmi D. Stricker Marcus Liwicki Muhammad Zeshan Afzal 29 7 0 23 Jun 2023
On Evaluation of Document Classification using RVL-CDIP Stefan Larson Gordon Lim Kevin Leach 36 3 0 21 Jun 2023
Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path Zilong Wang Jingbo Shang 49 0 0 23 May 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild Zhibo Yang Rujiao Long Pengfei Wang Sibo Song Humen Zhong Wenqing Cheng X. Bai Cong Yao 34 21 0 23 Mar 2023
Unifying Vision, Text, and Layout for Universal Document Processing Zineng Tang Ziyi Yang Guoxin Wang Yuwei Fang Yang Liu Chenguang Zhu Michael Zeng Chao-Yue Zhang Joey Tianyi Zhou VLM 32 106 0 05 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models Lei Wang Jian He Xingdong Xu Ning Liu Hui-juan Liu 41 2 0 27 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers Stefan Larson Gordon Lim Yutong Ai David Kuang Kevin Leach OODD OOD 37 18 0 14 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding Jingye Chen Tengchao Lv Lei Cui Changrong Zhang Furu Wei 50 13 0 06 Oct 2022
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis Siwen Luo Yi Ding Siqu Long Josiah Poon S. Han GNN 45 16 0 22 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding Song Tao Zijian Wang Tiantian Fan Canjie Luo Can Huang SSL 40 2 0 28 Jul 2022
Test-Time Adaptation for Visual Document Understanding Sayna Ebrahimi Sercan Ö. Arik Tomas Pfister OOD 33 6 0 15 Jun 2022
Relational Representation Learning in Visually-Rich Documents Xin Li Yan Zheng Yiqing Hu H. Cao Yunfei Wu Deqiang Jiang Yinsong Liu Bo Ren 20 12 0 05 May 2022
OCR-IDL: OCR Annotations for Industry Document Library Dataset Ali Furkan Biten Rubèn Pérez Tito Lluís Gómez Ernest Valveny Dimosthenis Karatzas 25 26 0 25 Feb 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei ... D. Florêncio Cha Zhang Wanxiang Che Min Zhang Lidong Zhou ViT MLLM 153 501 0 29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents Guillaume Jaume H. K. Ekenel Jean-Philippe Thiran 143 356 0 27 May 2019