StrucTexT: Structured Text Understanding with Multi-Modal Transformers

6 August 2021

Yan Liu

Errui Ding

Papers citing "StrucTexT: Structured Text Understanding with Multi-Modal Transformers"

27 / 27 papers shown

Title
GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network Panfeng Cao Jian Wu 30 9 0 02 Oct 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data Yufan Shen Chuwei Luo Zhaoqing Zhu Yang Chen Qi Zheng Zhi Yu Jiajun Bu Cong Yao 53 2 0 17 Jul 2024
A Hybrid Approach for Document Layout Analysis in Document images Tahira Shehzadi Didier Stricker Muhammad Zeshan Afzal 37 5 0 27 Apr 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing Ran Zmigrod Zhiqiang Ma Armineh Nourbakhsh Sameena Shah 28 4 0 07 Feb 2024
DocGraphLM: Documental Graph Language Model for Information Extraction Dongsheng Wang Zhiqiang Ma Armineh Nourbakhsh Kang Gu Sameena Shah 40 8 0 05 Jan 2024
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision Yukun Zhai Xiaoqiang Zhang Xiameng Qin Sanyuan Zhao Xingping Dong Jianbing Shen 51 4 0 06 Jun 2023
A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images Kai Hu Zhuoyuan Wu Zhuoyao Zhong Weihong Lin Lei-huan Sun Qiang Huo 26 11 0 17 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild Zhibo Yang Rujiao Long Pengfei Wang Sibo Song Humen Zhong Wenqing Cheng X. Bai Cong Yao 41 22 0 23 Mar 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction Jiabang He Lei Wang Yingpeng Hu Ning Liu Hui-juan Liu Xingdong Xu Hengtao Shen MLLM 8 46 0 09 Mar 2023
Unifying Vision, Text, and Layout for Universal Document Processing Zineng Tang Ziyi Yang Guoxin Wang Yuwei Fang Yang Liu Chenguang Zhu Michael Zeng Chao-Yue Zhang Joey Tianyi Zhou VLM 32 107 0 05 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models Lei Wang Jian He Xingdong Xu Ning Liu Hui-juan Liu 41 2 0 27 Nov 2022
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion Zixiang Zhao Hao Bai Jiangshe Zhang Yulun Zhang Shuang Xu Zudi Lin Radu Timofte Luc Van Gool 40 311 0 26 Nov 2022
Unimodal and Multimodal Representation Training for Relation Extraction Ciaran Cooney Rachel Heyburn Liam Maddigan Mairead O'Cuinn Chloe Thompson Joana Cavadas 36 2 0 11 Nov 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding Qiming Peng Yinxu Pan Wenjin Wang Bin Luo Zhenyu Zhang ... Shi Feng Yu Sun Hao Tian Hua Wu Haifeng Wang 13 83 0 12 Oct 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding Wenjin Wang Zhengjie Huang Bin Luo Qianglong Chen Qiming Peng ... Weichong Yin Shi Feng Yu Sun Dianhai Yu Yin Zhang ViT 35 11 0 18 Sep 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding Song Tao Zijian Wang Tiantian Fan Canjie Luo Can Huang SSL 42 2 0 28 Jul 2022
Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration Zhenyu Zhang Yu Bowen Haiyang Yu Tingwen Liu Cheng Fu Jingyang Li Chengguang Tang Jian Sun Yongbin Li 44 6 0 14 Jul 2022
Relational Representation Learning in Visually-Rich Documents Xin Li Yan Zheng Yiqing Hu H. Cao Yunfei Wu Deqiang Jiang Yinsong Liu Bo Ren 25 12 0 05 May 2022
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding Jiapeng Wang Lianwen Jin Kai Ding VLM 35 140 0 28 Feb 2022
Modelling the semantics of text in complex document layouts using graph transformer networks T. Barillot Jacob Saks Polena Lilyanova Edward Torgas Yachen Hu Yuanqing Liu Varun Balupuri P. Gaskell GNN 42 0 0 18 Feb 2022
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents Teakgyu Hong Donghyun Kim Mingi Ji Wonseok Hwang Daehyun Nam Sungrae Park VLM 34 152 0 10 Aug 2021
InfographicVQA Minesh Mathew Viraj Bagal Rubèn Pérez Tito Dimosthenis Karatzas Ernest Valveny C. V. Jawahar 42 209 0 26 Apr 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei ... D. Florêncio Cha Zhang Wanxiang Che Min Zhang Lidong Zhou ViT MLLM 153 502 0 29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents Guillaume Jaume H. K. Ekenel Jean-Philippe Thiran 143 357 0 27 May 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks Tong He Zhi-Li Zhang Hang Zhang Zhongyue Zhang Junyuan Xie Mu Li 224 1,400 0 04 Dec 2018
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Zhuowen Tu Kaiming He 348 10,237 0 16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Zhehuai Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 718 6,750 0 26 Sep 2016