2012.14740
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
29 December 2020
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
Papers citing
"LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding"
41 / 91 papers shown
Title
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks
Andrea Gemelli
Sanket Biswas
Enrico Civitelli
Josep Lladós
S. Marinai
13
15
0
23 Aug 2022
TaCo: Textual Attribute Recognition via Contrastive Learning
Chang Nie
Yiqing Hu
Yanqiu Qu
Hao Liu
Deqiang Jiang
Bo Ren
27
0
0
22 Aug 2022
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Siwen Luo
Yi Ding
Siqu Long
Josiah Poon
S. Han
GNN
20
16
0
22 Aug 2022
Information Extraction from Scanned Invoice Images using Text Analysis and Layout Features
H. Ha
Ales Horak
23
14
0
08 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding
Song Tao
Zijian Wang
Tiantian Fan
Canjie Luo
Can Huang
SSL
32
2
0
28 Jul 2022
Towards Complex Document Understanding By Discrete Reasoning
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
31
42
0
25 Jul 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
46
348
0
17 Jun 2022
Test-Time Adaptation for Visual Document Understanding
Sayna Ebrahimi
Sercan Ö. Arik
Tomas Pfister
OOD
33
6
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
54
527
0
13 Jun 2022
V-Doc : Visual questions answers with Documents
Yihao Ding
Zhe Huang
Runlin Wang
Yanhang Zhang
Xianru Chen
Yuzhong Ma
Hyunsuk Chung
S. Han
25
15
0
27 May 2022
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun
Xingyu Chen
Lu Chen
Tianle Dai
Zichen Zhu
Kai Yu
LLMAG
20
50
0
23 May 2022
LayoutXLM vs. GNN: An Empirical Evaluation of Relation Extraction for Documents
Hervé Déjean
S. Clinchant
Jean-Luc Meunier
22
4
0
09 May 2022
Relational Representation Learning in Visually-Rich Documents
Xin Li
Yan Zheng
Yiqing Hu
H. Cao
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Bo Ren
18
12
0
05 May 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song
Jianqiang Wan
Zhibo Yang
Jun Tang
Wenqing Cheng
Xiang Bai
Cong Yao
VLM
41
24
0
29 Apr 2022
Digitizing Historical Balance Sheet Data: A Practitioner's Guide
Sergio Correia
Stephan Luck
26
10
0
31 Mar 2022
End-to-end Document Recognition and Understanding with Dessurt
Brian L. Davis
B. Morse
Brian L. Price
Chris Tensmeyer
Curtis Wigington
Vlad I. Morariu
VLM
ViT
24
73
0
30 Mar 2022
DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition
Denis Coquenet
Clément Chatelain
Thierry Paquet
30
57
0
23 Mar 2022
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding
Zhangxuan Gu
Changhua Meng
Ke Wang
Jun Lan
Weiqiang Wang
Ming Gu
Liqing Zhang
31
76
0
14 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
35
159
0
04 Mar 2022
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
Jiapeng Wang
Lianwen Jin
Kai Ding
VLM
30
138
0
28 Feb 2022
WebFormer: The Web-page Transformer for Structure Information Extraction
Qifan Wang
Yi Fang
Anirudh Ravula
Fuli Feng
Xiaojun Quan
Dongfang Liu
ViT
141
65
0
01 Feb 2022
DocEnTr: An End-to-End Document Image Enhancement Transformer
Mohamed Ali Souibgui
Sanket Biswas
Sana Khamekhem Jemni
Yousri Kessentini
Alicia Fornés
Josep Lladós
Umapada Pal
ViT
55
45
0
25 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
29
100
0
23 Dec 2021
Value Retrieval with Arbitrary Queries for Form-like Documents
M. Gao
Le Xue
Chetan Ramaiah
Chen Xing
Ran Xu
Caiming Xiong
15
6
0
15 Dec 2021
OCR-free Document Understanding Transformer
Geewook Kim
Teakgyu Hong
Moonbin Yim
Jeongyeon Nam
Jinyoung Park
Jinyeong Yim
Wonseok Hwang
Sangdoo Yun
Dongyoon Han
Seunghyun Park
ViT
50
262
0
30 Nov 2021
Document AI: Benchmarks, Models and Applications
Lei Cui
Yiheng Xu
Tengchao Lv
Furu Wei
VLM
21
69
0
16 Nov 2021
ICDAR 2021 Competition on Document Visual Question Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
35
23
0
10 Nov 2021
Information Extraction from Visually Rich Documents with Font Style Embeddings
Ismail Oussaid
William Vanhuffel
Pirashanth Ratnamogan
Mhamed Hajaiej
Alexis Mathey
Thomas Gilles
19
1
0
07 Nov 2021
Entity Relation Extraction as Dependency Parsing in Visually Rich Documents
Yue Zhang
Bo-Wen Zhang
Rui Wang
Junjie Cao
Chen Li
Zuyi Bao
40
32
0
19 Oct 2021
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Junlong Li
Yiheng Xu
Lei Cui
Furu Wei
VLM
3DGS
25
59
0
16 Oct 2021
OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis
Sumit Shekhar
Bhanu Prakash Reddy Guda
Ashutosh Chaubey
Ishan Jindal
Avanish Jain
30
0
0
01 Oct 2021
Skim-Attention: Learning to Focus via Document Layout
Laura Nguyen
Thomas Scialom
Jacopo Staiano
Benjamin Piwowarski
18
9
0
02 Sep 2021
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
Teakgyu Hong
Donghyun Kim
Mingi Ji
Wonseok Hwang
Daehyun Nam
Sungrae Park
VLM
34
150
0
10 Aug 2021
DocFormer: End-to-End Transformer for Document Understanding
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
29
270
0
22 Jun 2021
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
Zejiang Shen
Kyle Lo
Lucy Lu Wang
Bailey Kuehl
Daniel S. Weld
Doug Downey
VLM
16
34
0
01 Jun 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
22
203
0
26 Apr 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu
Tengchao Lv
Lei Cui
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Furu Wei
MLLM
VLM
32
127
0
18 Apr 2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski
Łukasz Borchmann
Dawid Jurkiewicz
Tomasz Dwojak
Michal Pietruszka
Gabriela Pałka
ViT
33
157
0
18 Feb 2021
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
134
355
0
27 May 2019
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,220
0
16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016