2403.07553
The future of document indexing: GPT and Donut revolutionize table of content processing
12 March 2024
Degaga Wolde Feyisa
Haylemicheal Berihun
Amanuel Zewdu
Mahsa Najimoghadam
Marzieh Zare
Papers citing
"The future of document indexing: GPT and Donut revolutionize table of content processing"
11 / 11 papers shown
Title
PP-StructureV2: A Stronger Document Analysis System
Chenxia Li
Ruoyu Guo
Jun Zhou
Mengtao An
Yuning Du
Lingfeng Zhu
Yi Liu
Xiaoguang Hu
Dianhai Yu
98
23
0
11 Oct 2022
DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding: A Case Study in Collaboration with UiPath Inc
Emad Elwany
Allison Hegel
Marina Shah
Brendan Roof
Genevieve Peaslee
Quentin Rivet
32
2
0
17 Aug 2022
PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System
Chenxia Li
Weiwei Liu
Ruoyu Guo
Xiaoyue Yin
Kaitao Jiang
...
Lingfeng Zhu
Baohua Lai
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
109
113
0
07 Jun 2022
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
Teakgyu Hong
Donghyun Kim
Mingi Ji
Wonseok Hwang
Daehyun Nam
Sungrae Park
VLM
87
154
0
10 Aug 2021
ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents
Weihong Lin
Qifang Gao
Lei-huan Sun
Zhuoyao Zhong
Kaiqin Hu
Qin Ren
Qiang Huo
68
39
0
25 May 2021
Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts
Tomasz Stanislawek
Filip Graliński
Anna Wróblewska
Dawid Lipiński
Agnieszka Kaliska
Paulina Rosalska
Bartosz Topolski
P. Biecek
81
95
0
12 May 2021
ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction
Zheng Huang
Kai Chen
Jianhua He
X. Bai
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
62
321
0
18 Mar 2021
PP-OCR: A Practical Ultra Lightweight OCR System
Yuning Du
Chenxia Li
Ruoyu Guo
Xiaoting Yin
Weiwei Liu
...
Yifan Bai
Zilin Yu
Yehua Yang
Qingqing Dang
Hongya Wang
78
195
0
21 Sep 2020
Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout
Filip Graliński
Tomasz Stanislawek
Anna Wróblewska
Dawid Lipiński
Agnieszka Kaliska
Paulina Rosalska
Bartosz Topolski
P. Biecek
67
41
0
04 Mar 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
139
712
0
31 Dec 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
343
1,920
0
17 Sep 2019