2103.10213
ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction
18 March 2021
Zheng Huang
Kai Chen
Jianhua He
X. Bai
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
Papers citing
"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
50 / 186 papers shown
Title
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yalin Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Hairu Wang
Hao Liu
Can Huang
61
21
0
02 Jul 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
...
Liangming Pan
Yu-Gang Jiang
Jiaqi Wang
Yixin Cao
Aixin Sun
ELM
RALM
VLM
39
25
0
01 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
42
15
0
27 Jun 2024
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
49
2
0
17 Jun 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
Jiefeng Ma
Yan Wang
Chenyu Liu
Jun Du
Yu Hu
Zhenrong Zhang
Pengfei Hu
Qing Wang
Jianshu Zhang
41
0
0
13 Jun 2024
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset
Abdelrahman Abdallah
Mahmoud Abdalla
M. Kasem
Mohamed Mahmoud
Ibrahim Abdelhalim
Mohamed Elkasaby
Yasser Elbendary
Adam Jatowt
44
0
0
06 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
54
1
0
05 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Pengyuan Lyu
Yulin Li
Hao Zhou
Weihong Ma
Xingyu Wan
...
Liang Wu
Chengquan Zhang
Kun Yao
Errui Ding
Jingdong Wang
41
7
0
31 May 2024
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents
Yanfei Dong
Lambert Deng
Jiazheng Zhang
Xiaodong Yu
Ting Lin
Francesco Gelli
Soujanya Poria
W. Lee
45
0
0
08 May 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
42
1
0
06 May 2024
KVP10k : A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents
O. Naparstek
Roi Pony
Inbar Shapira
Foad Abo Dahood
Ophir Azulai
...
Idan Friedman
Orit Prince
Yevgeny Burshtein
Adi Raz Goldfarb
Udi Barzelay
28
3
0
01 May 2024
CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto
Youngmin Baek
Geewook Kim
Ryota Nakao
Donghyun Kim
Moonbin Yim
Seunghyun Park
Bado Lee
37
1
0
01 May 2024
Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer
Da Chang
Yu Li
72
2
0
19 Apr 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
76
30
0
19 Apr 2024
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
Bozhi Luan
Hao Feng
Hong Chen
Yonghui Wang
Wen-gang Zhou
Houqiang Li
MLLM
42
11
0
15 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
45
24
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
42
41
0
08 Apr 2024
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod
Dongsheng Wang
Mathieu Sibue
Yulong Pei
Petr Babkin
...
Antony Papadimitriou
William Watson
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
27
4
0
05 Apr 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
56
28
0
28 Mar 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
37
1
0
27 Mar 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Hongsheng Li
VGen
LRM
MLLM
71
43
0
25 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
37
8
0
25 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Shafiq Joty
Shih-Fu Chang
Chenhui Xu
AI4TS
77
18
0
18 Mar 2024
The future of document indexing: GPT and Donut revolutionize table of content processing
Degaga Wolde Feyisa
Haylemicheal Berihun
Amanuel Zewdu
Mahsa Najimoghadam
Marzieh Zare
39
0
0
12 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
45
12
0
06 Mar 2024
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
Hongshen Xu
Lu Chen
Zihan Zhao
Da Ma
Ruisheng Cao
Zichen Zhu
Kai Yu
42
2
0
28 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
59
5
0
15 Feb 2024
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
Ashish Shenoy
Yichao Lu
Srihari Jayakumar
Debojeet Chatterjee
Mohsen Moslehpour
...
Shicong Zhao
Longfang Zhao
Ankit Ramchandani
Xin Luna Dong
Anuj Kumar
MLLM
40
2
0
12 Feb 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
31
4
0
07 Feb 2024
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
Jinghui Lu
Ziwei Yang
Yanjie Wang
Xuejing Liu
Brian Mac Namee
Can Huang
MoE
58
5
0
07 Feb 2024
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
David Peer
Philemon Schöpf
V. Nebendahl
A. Rietzler
Sebastian Stabinger
35
3
0
06 Feb 2024
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents
Ahmed Masry
Amir Hajian
37
2
0
26 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
21
23
0
24 Jan 2024
UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents
Kai Hu
Jiawei Wang
Weihong Lin
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
45
1
0
17 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
29
2
0
07 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
24
53
0
31 Dec 2023
Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey
M. Kasem
Mohamed Mahmoud
H. Kang
30
7
0
19 Dec 2023
Toward Real Text Manipulation Detection: New Dataset and New Solution
Dongliang Luo
Yuliang Liu
Rui Yang
Xianjin Liu
Jishen Zeng
Yu Zhou
Xiang Bai
42
3
0
12 Dec 2023
EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
A. Singh
Venkatapathy Subramanian
Ayush Maheshwari
Pradeep Narayan
D. P. Shetty
Ganesh Ramakrishnan
25
3
0
23 Nov 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
73
19
0
22 Nov 2023
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding
Mahmoud Limam
M. Dhiaf
Yousri Kessentini
23
2
0
20 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
36
2
0
01 Nov 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
19
0
0
25 Oct 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao
Ye Wang
Qiang Zhang
Zaiqiao Meng
SyDa
29
6
0
24 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang
Xiahua Chen
Rui Wang
Chenhui Chu
29
0
0
23 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang
Ya Guo
Yi Tu
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
Tao Gui
3DV
40
20
0
17 Oct 2023
PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction
S. Saifullah
S. Agne
Andreas Dengel
Sheraz Ahmed
21
0
0
05 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
38
13
0
04 Oct 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
39
64
0
20 Sep 2023
AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification
Abdelrahman Abdallah
Mahmoud Abdalla
Mohamed Elkasaby
Yasser Elbendary
Adam Jatowt
35
0
0
18 Sep 2023