2012.14740
Cited By
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
29 December 2020
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
Papers citing
"LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding"
50 / 88 papers shown
Title
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
33
0
0
03 Apr 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
47
0
0
07 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
44
0
0
26 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
S. Kamath S
Nakul Sharma
Manish Gupta
Anand Mishra
48
1
0
28 Jan 2025
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
53
1
0
31 Oct 2024
GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network
Panfeng Cao
Jian Wu
28
9
0
02 Oct 2024
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
37
1
0
18 Sep 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
40
2
0
17 Jul 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
48
2
0
12 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
41
1
0
05 Jun 2024
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents
Yanfei Dong
Lambert Deng
Jiazheng Zhang
Xiaodong Yu
Ting Lin
Francesco Gelli
Soujanya Poria
W. Lee
35
0
0
08 May 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
40
1
0
19 Apr 2024
Noise-Aware Training of Layout-Aware Language Models
Ritesh Sarkhel
Xiaoqi Ren
Lauro Beltrao Costa
Guolong Su
Vincent Perot
Yanan Xie
Emmanouil Koukoumidis
Arnab Nandi
VLM
44
0
0
30 Mar 2024
DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering
Alex Nguyen
Zilong Wang
Jingbo Shang
Dheeraj Mekala
33
1
0
30 Mar 2024
LOCR: Location-Guided Transformer for Optical Character Recognition
Yu Sun
Dongzhan Zhou
Chen Lin
Conghui He
Wanli Ouyang
Han-Sen Zhong
34
1
0
04 Mar 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
24
4
0
07 Feb 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang
Zhiqiang Ma
Armineh Nourbakhsh
Kang Gu
Sameena Shah
31
8
0
05 Jan 2024
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
55
18
0
22 Nov 2023
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency
Azhar Shaikh
Michael Cochez
Denis Diachkov
Michiel de Rijcke
Sahar Yousefi
25
0
0
09 Nov 2023
DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding
Anran Wu
Luwei Xiao
Xingjiao Wu
Shuwen Yang
Junjie Xu
Zisong Zhuang
Nian Xie
Cheng Jin
Liang He
24
0
0
29 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
18
4
0
25 Oct 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
28
3
0
21 Sep 2023
A Graphical Approach to Document Layout Analysis
Jilin Wang
Michael Krumdick
Baojia Tong
Hamima Halim
M. Sokolov
Vadym Barda
Delphine Vendryes
Christy Tanner
21
8
0
03 Aug 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
26
3
0
21 Jun 2023
Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path
Zilong Wang
Jingbo Shang
36
0
0
23 May 2023
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai
27
39
0
12 May 2023
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
10
6
0
11 May 2023
Language Independent Neuro-Symbolic Semantic Parsing for Form Understanding
Bhanu Prakash Voutharoja
Lizhen Qu
Fatemeh Shiri
22
1
0
08 May 2023
Structure Diagram Recognition in Financial Announcements
Meixuan Qiao
Jun Wang
Junfu Xiang
Qiyu Hou
Ruixuan Li
32
1
0
26 Apr 2023
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
M. Dhouib
G. Bettaieb
A. Shabou
17
20
0
24 Apr 2023
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures
Jiefeng Ma
Jun Du
Pengfei Hu
Zhenrong Zhang
Jianshu Zhang
Huihui Zhu
Cong Liu
21
15
0
24 Mar 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
32
19
0
23 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents
Sana Khamekhem Jemni
Sourour Ammar
Mohamed Ali Souibgui
Yousri Kessentini
A. Cheddad
17
3
0
06 Mar 2023
Entry Separation using a Mixed Visual and Textual Language Model: Application to 19th century French Trade Directories
Bertrand Duménieu
Edwin Carlinet
N. Abadie
Joseph Chazalon
24
0
0
17 Feb 2023
LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization
Laura Nguyen
Thomas Scialom
Benjamin Piwowarski
Jacopo Staiano
27
7
0
26 Jan 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
Ryota Tanaka
Kyosuke Nishida
Kosuke Nishida
Taku Hasegawa
Itsumi Saito
Kuniko Saito
16
72
0
12 Jan 2023
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
33
2
0
27 Nov 2022
Unimodal and Multimodal Representation Training for Relation Extraction
Ciaran Cooney
Rachel Heyburn
Liam Maddigan
Mairead O'Cuinn
Chloe Thompson
Joana Cavadas
25
2
0
11 Nov 2022
Radically Lower Data-Labeling Costs for Visually Rich Document Extraction Models
Yichao Zhou
James Bradley Wendt
Navneet Potti
Jing Xie
Sandeep Tata
VLM
29
1
0
28 Oct 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
34
18
0
14 Oct 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Qiming Peng
Yinxu Pan
Wenjin Wang
Bin Luo
Zhenyu Zhang
...
Shi Feng
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
13
83
0
12 Oct 2022
PP-StructureV2: A Stronger Document Analysis System
Chenxia Li
Ruoyu Guo
Jun Zhou
Mengtao An
Yuning Du
Lingfeng Zhu
Yi Liu
Xiaoguang Hu
Dianhai Yu
49
22
0
11 Oct 2022
HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response
Selim Fekih
Nicolò Tamagnone
Benjamin Minixhofer
R. Shrestha
Ximena Contla
Ewan Oglethorpe
Navid Rekabsaz
11
6
0
10 Oct 2022
Key Information Extraction in Purchase Documents using Deep Learning and Rule-based Corrections
R. Arroyo
J. Yebes
E. Martínez
Hector Corrales
Javier Lorenzo
31
1
0
07 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
48
13
0
06 Oct 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
27
11
0
18 Sep 2022
DM²S²: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
Shunsuke Kitada
Yuki Iwazaki
Riku Togashi
Hitoshi Iyatomi
21
1
0
07 Sep 2022
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks
Andrea Gemelli
Sanket Biswas
Enrico Civitelli
Josep Lladós
S. Marinai
13
15
0
23 Aug 2022
TaCo: Textual Attribute Recognition via Contrastive Learning
Chang Nie
Yiqing Hu
Yanqiu Qu
Hao Liu
Deqiang Jiang
Bo Ren
27
0
0
22 Aug 2022