Title | Authors | Tags | | Date
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Andrea Gemelli, Sanket Biswas, Enrico Civitelli, Josep Lladós, S. Marinai | | 13 15 0 | 23 Aug 2022
TaCo: Textual Attribute Recognition via Contrastive Learning | Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren | | 27 0 0 | 22 Aug 2022
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis | Siwen Luo, Yi Ding, Siqu Long, Josiah Poon, S. Han | GNN | 20 16 0 | 22 Aug 2022
Information Extraction from Scanned Invoice Images using Text Analysis and Layout Features | H. Ha, Ales Horak | | 23 14 0 | 08 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Song Tao, Zijian Wang, Tiantian Fan, Canjie Luo, Can Huang | SSL | 32 2 0 | 28 Jul 2022
Towards Complex Document Understanding By Discrete Reasoning | Fengbin Zhu, Wenqiang Lei, Fuli Feng, Chao Wang, Haozhou Zhang, Tat-Seng Chua | | 31 42 0 | 25 Jul 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge | Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar | LM&Ro | 46 348 0 | 17 Jun 2022
Test-Time Adaptation for Visual Document Understanding | Sayna Ebrahimi, Sercan Ö. Arik, Tomas Pfister | OOD | 33 6 0 | 15 Jun 2022
Multimodal Learning with Transformers: A Survey | P. Xu, Xiatian Zhu, David A. Clifton | ViT | 54 527 0 | 13 Jun 2022
V-Doc : Visual questions answers with Documents | Yihao Ding, Zhe Huang, Runlin Wang, Yanhang Zhang, Xianru Chen, Yuzhong Ma, Hyunsuk Chung, S. Han | | 25 15 0 | 27 May 2022
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI | Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu | LLMAG | 20 50 0 | 23 May 2022
LayoutXLM vs. GNN: An Empirical Evaluation of Relation Extraction for Documents | Hervé Déjean, S. Clinchant, Jean-Luc Meunier | | 22 4 0 | 09 May 2022
Relational Representation Learning in Visually-Rich Documents | Xin Li, Yan Zheng, Yiqing Hu, H. Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren | | 18 12 0 | 05 May 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors | Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao | VLM | 41 24 0 | 29 Apr 2022
Digitizing Historical Balance Sheet Data: A Practitioner's Guide | Sergio Correia, Stephan Luck | | 26 10 0 | 31 Mar 2022
End-to-end Document Recognition and Understanding with Dessurt | Brian L. Davis, B. Morse, Brian L. Price, Chris Tensmeyer, Curtis Wigington, Vlad I. Morariu | VLM ViT | 24 73 0 | 30 Mar 2022
DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition | Denis Coquenet, Clément Chatelain, Thierry Paquet | | 30 57 0 | 23 Mar 2022
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang | | 31 76 0 | 14 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer | Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Chaoxi Zhang, Furu Wei | ViT VLM | 35 159 0 | 04 Mar 2022
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Jiapeng Wang, Lianwen Jin, Kai Ding | VLM | 30 138 0 | 28 Feb 2022
WebFormer: The Web-page Transformer for Structure Information Extraction | Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu | ViT | 141 65 0 | 01 Feb 2022
DocEnTr: An End-to-End Document Image Enhancement Transformer | Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal | ViT | 55 45 0 | 25 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA | Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha | ViT | 29 100 0 | 23 Dec 2021
Value Retrieval with Arbitrary Queries for Form-like Documents | M. Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong | | 15 6 0 | 15 Dec 2021
OCR-free Document Understanding Transformer | Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park | ViT | 50 262 0 | 30 Nov 2021
Document AI: Benchmarks, Models and Applications | Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei | VLM | 21 69 0 | 16 Nov 2021
ICDAR 2021 Competition on Document VisualQuestion Answering | Rubèn Pérez Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas | | 35 23 0 | 10 Nov 2021
Information Extraction from Visually Rich Documents with Font Style Embeddings | Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles | | 19 1 0 | 07 Nov 2021
Entity Relation Extraction as Dependency Parsing in Visually Rich Documents | Yue Zhang, Bo-Wen Zhang, Rui Wang, Junjie Cao, Chen Li, Zuyi Bao | | 40 32 0 | 19 Oct 2021
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Junlong Li, Yiheng Xu, Lei Cui, Furu Wei | VLM 3DGS | 25 59 0 | 16 Oct 2021
OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis | Sumit Shekhar, Bhanu Prakash Reddy Guda, Ashutosh Chaubey, Ishan Jindal, Avanish Jain | | 30 0 0 | 01 Oct 2021
Skim-Attention: Learning to Focus via Document Layout | Laura Nguyen, Thomas Scialom, Jacopo Staiano, Benjamin Piwowarski | | 18 9 0 | 02 Sep 2021
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents | Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park | VLM | 34 150 0 | 10 Aug 2021
DocFormer: End-to-End Transformer for Document Understanding | Srikar Appalaraju, Bhavan A. Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha | ViT | 29 270 0 | 22 Jun 2021
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups | Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey | VLM | 16 34 0 | 01 Jun 2021
InfographicVQA | Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar | | 22 203 0 | 26 Apr 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, D. Florêncio, Cha Zhang, Furu Wei | MLLM VLM | 32 127 0 | 18 Apr 2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Rafal Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michal Pietruszka, Gabriela Pałka | ViT | 33 157 0 | 18 Feb 2021
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents | Guillaume Jaume, H. K. Ekenel, Jean-Philippe Thiran | | 134 355 0 | 27 May 2019
Aggregated Residual Transformations for Deep Neural Networks | Saining Xie, Ross B. Girshick, Piotr Dollár, Z. Tu, Kaiming He | | 297 10,220 0 | 16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat | 716 6,743 0 | 26 Sep 2016