ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.08609
  4. Cited By
Document AI: Benchmarks, Models and Applications

Document AI: Benchmarks, Models and Applications

16 November 2021
Lei Cui
Yiheng Xu
Tengchao Lv
Furu Wei
    VLM
ArXivPDFHTML

Papers citing "Document AI: Benchmarks, Models and Applications"

40 / 40 papers shown
Title
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Yan Shu
Weichao Zeng
Fangmin Zhao
Zeyu Chen
Z. Li
...
Paolo Rota
Xiang Bai
Lianwen Jin
Xu-Cheng Yin
N. Sebe
CoGe
52
0
0
30 Apr 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
50
0
0
24 Mar 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
45
1
0
25 Feb 2025
See then Tell: Enhancing Key Information Extraction with Vision
  Grounding
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
26
0
0
29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
37
1
0
18 Sep 2024
OfficeBench: Benchmarking Language Agents across Multiple Applications
  for Office Automation
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation
Zilong Wang
Yuedong Cui
Li Zhong
Zimin Zhang
Da Yin
Bill Yuchen Lin
Jingbo Shang
51
4
0
26 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
40
2
0
17 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding
  with Efficient Visual Slimming
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
34
15
0
27 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
XFormParser: A Simple and Effective Multimodal Multilingual
  Semi-structured Form Parser
XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng
Hang Zhang
Jian Yang
Xiang Li
Weixiao Zhou
...
Fei Liu
Wei Zhang
Tao Sun
Tongliang Li
Zhoujun Li
44
2
0
27 May 2024
A Hybrid Approach for Document Layout Analysis in Document images
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
29
5
0
27 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
35
23
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for
  Document Understanding
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
29
38
0
08 Apr 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of
  Legibility and Layout Quality in Relation to Prediction Confidence
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
22
1
0
27 Mar 2024
DocLLM: A layout-aware generative language model for multimodal document
  understanding
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
14
50
0
31 Dec 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
18
4
0
25 Oct 2023
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and
  Beyond
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
Cong Yao
34
6
0
19 Oct 2023
GridFormer: Towards Accurate Table Structure Recognition via Grid
  Prediction
GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction
Pengyuan Lyu
Weihong Ma
Hongyi Wang
Yu Yu
Chengquan Zhang
Kun Yao
Yang Xue
Jingdong Wang
LMTD
26
8
0
26 Sep 2023
Comprehensive Overview of Named Entity Recognition: Models,
  Domain-Specific Applications and Challenges
Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and Challenges
Kalyani Pakhale
23
21
0
25 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
25
63
0
20 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding
  with Selective Region Concentration
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
17
13
0
03 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based
  Models, and Convolutional Neural Networks For Document Layout Analysis
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
Sotirios Kastanas
Shaomu Tan
Yijiang He
25
1
0
29 Aug 2023
Vision Grid Transformer for Document Layout Analysis
Vision Grid Transformer for Document Layout Analysis
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
26
27
0
29 Aug 2023
A Graphical Approach to Document Layout Analysis
A Graphical Approach to Document Layout Analysis
Jilin Wang
Michael Krumdick
Baojia Tong
Hamima Halim
M. Sokolov
Vadym Barda
Delphine Vendryes
Christy Tanner
21
8
0
03 Aug 2023
DocTr: Document Transformer for Structured Information Extraction in
  Documents
DocTr: Document Transformer for Structured Information Extraction in Documents
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Z. Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
15
11
0
16 Jul 2023
Global Structure Knowledge-Guided Relation Extraction Method for
  Visually-Rich Document
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
32
5
0
23 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text
  Documents via Semantic-Oriented Hierarchical Graphs
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
37
3
0
03 May 2023
Structure Diagram Recognition in Financial Announcements
Structure Diagram Recognition in Financial Announcements
Meixuan Qiao
Jun Wang
Junfu Xiang
Qiyu Hou
Ruixuan Li
27
1
0
26 Apr 2023
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Chuwei Luo
Changxu Cheng
Qi Zheng
Cong Yao
19
43
0
21 Apr 2023
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of
  Document Structures
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures
Jiefeng Ma
Jun Du
Pengfei Hu
Zhenrong Zhang
Jianshu Zhang
Huihui Zhu
Cong Liu
19
15
0
24 Mar 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document
  Understanding
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
22
11
0
19 Dec 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
48
13
0
06 Oct 2022
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot
  Document-Level Question Answering
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
T. McDonald
Brian Tsan
Amar Saini
Juanita Ordoñez
Luis Gutierrez
Phan-Anh-Huy Nguyen
Blake Mason
Brenda Ng
RALM
19
3
0
04 Oct 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document
  Understanding
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
27
11
0
18 Sep 2022
Towards Complex Document Understanding By Discrete Reasoning
Towards Complex Document Understanding By Discrete Reasoning
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
31
42
0
25 Jul 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image
  Masking
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
25
432
0
18 Apr 2022
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks
  with Unified Vision-and-Language BERTs
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
Taichi Iki
Akiko Aizawa
LLMAG
11
6
0
15 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
35
159
0
04 Mar 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document
  Understanding
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
131
355
0
27 May 2019
1