DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents

24 April 2023

Papers citing "DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents"


PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models M. Dhouib Davide Buscaldi Sonia Vanier A. Shabou VLM 36 0 0 11 Apr 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models Wenwen Yu Zhibo Yang Jianqiang Wan Sibo Song J. Tang Wenqing Cheng Y. Liu Xiang Bai 46 1 0 22 Feb 2025
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction Rujiao Long Pengfei Wang Zhibo Yang Cong Yao 39 0 0 02 Nov 2024
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts I. de Rodrigo A. Sanchez-Cuadrado J. Boal A. J. Lopez-Lopez VLM 21 1 0 31 Aug 2024
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding Chuanghao Ding Xuejing Liu Wei Tang Juan Li Xiaoliang Wang Rui Zhao Cam-Tu Nguyen Fei Tan 23 0 0 27 Aug 2024
ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema Fei Wang Yuewen Zheng Qin Li Jingyi Wu Pengfei Li Lu Zhang 43 0 0 26 Jul 2024
Reconstructing training data from document understanding models Jérémie Dentan Arnaud Paran A. Shabou AAML SyDa 38 1 0 05 Jun 2024
RealKIE: Five Novel Datasets for Enterprise Key Information Extraction Benjamin Townsend Madison May Christopher Wells SyDa 35 0 0 29 Mar 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition Jianqiang Wan Sibo Song Wenwen Yu Yuliang Liu Wenqing Cheng Fei Huang Xiang Bai Cong Yao Zhibo Yang 43 26 0 28 Mar 2024
The future of document indexing: GPT and Donut revolutionize table of content processing Degaga Wolde Feyisa Haylemicheal Berihun Amanuel Zewdu Mahsa Najimoghadam Marzieh Zare 27 0 0 12 Mar 2024
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering Wenjin Wang Yunhao Li Yixin Ou Yin Zhang VLM 21 24 0 01 Jun 2023
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei ... D. Florêncio Cha Zhang Wanxiang Che Min Zhang Lidong Zhou ViT MLLM 145 498 0 29 Dec 2020
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand M. Andreetto Hartwig Adam 3DH 950 20,561 0 17 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 297 10,216 0 16 Nov 2016