Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.11539
Cited By
v1
v2 (latest)
DocFormer: End-to-End Transformer for Document Understanding
22 June 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"DocFormer: End-to-End Transformer for Document Understanding"
50 / 185 papers shown
Title
Noise-Aware Training of Layout-Aware Language Models
Ritesh Sarkhel
Xiaoqi Ren
Lauro Beltrao Costa
Guolong Su
Vincent Perot
Yanan Xie
Emmanouil Koukoumidis
Arnab Nandi
VLM
86
0
0
30 Mar 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
97
37
0
28 Mar 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
86
1
0
27 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
80
10
0
25 Mar 2024
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ruiping Liu
Philip Torr
Rainer Stiefelhagen
OOD
68
7
0
21 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
66
12
0
06 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
156
12
0
05 Mar 2024
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
Hongshen Xu
Lu Chen
Zihan Zhao
Da Ma
Ruisheng Cao
Zichen Zhu
Kai Yu
60
2
0
28 Feb 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
82
10
0
21 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
116
6
0
15 Feb 2024
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing
Jacob Tyo
Motolani Olarinre
Youngseog Chung
Zachary Chase Lipton
74
0
0
12 Feb 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
59
4
0
07 Feb 2024
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye
Jing Zhang
Juhua Liu
Chenyu Liu
Baocai Yin
Cong Liu
Bo Du
Dacheng Tao
VLM
106
15
0
31 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
72
23
0
24 Jan 2024
Watermark Text Pattern Spotting in Document Images
Mateusz Krubiński
Stefan Matcovici
Diana Grigore
Daniel Voinea
A. Popa
WaLM
57
2
0
10 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
51
2
0
07 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
72
13
0
07 Jan 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang
Zhiqiang Ma
Armineh Nourbakhsh
Kang Gu
Sameena Shah
67
8
0
05 Jan 2024
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
82
17
0
25 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
117
67
0
20 Nov 2023
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu
Alekh Agarwal
Mandar Joshi
Robin Jia
Jesse Thomason
Kristina Toutanova
72
2
0
16 Nov 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
70
7
0
15 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
94
5
0
15 Nov 2023
Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset
Jacob Tyo
Youngseog Chung
Motolani Olarinre
Zachary Chase Lipton
57
0
0
14 Nov 2023
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
Muntabir Hasan Choudhury
Lamia Salsabil
William A. Ingram
Edward A. Fox
Jian Wu
42
0
0
07 Nov 2023
Image Generation and Learning Strategy for Deep Document Forgery Detection
Yamato Okamoto
Osada Genki
Iu Yahiro
Rintaro Hasegawa
Peifei Zhu
Hirokatsu Kataoka
AAML
69
0
0
07 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
87
2
0
01 Nov 2023
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi
Dezhi Peng
Wenhui Liao
Zening Lin
Xinhong Chen
Chongyu Liu
Yuyi Zhang
Lianwen Jin
MLLM
99
44
0
25 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
57
0
0
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
90
4
0
25 Oct 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao
Ye Wang
Qiang Zhang
Zaiqiao Meng
SyDa
82
7
0
24 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang
Xiahua Chen
Rui Wang
Chenhui Chu
67
0
0
23 Oct 2023
PHD: Pixel-Based Language Modeling of Historical Documents
Nadav Borenstein
Phillip Rust
Desmond Elliott
Isabelle Augenstein
60
3
0
22 Oct 2023
DSG: An End-to-End Document Structure Generator
Johannes Rausch
Gentiana Rashiti
Maxim Gusev
Ce Zhang
Stefan Feuerriegel
82
3
0
13 Oct 2023
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks
Ritesh Kumar
Saurabh Goyal
Ashish Verma
Vatche Isahagian
53
3
0
03 Oct 2023
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Nidhi Hegde
S. Paul
Gagan Madan
Gaurav Aggarwal
66
9
0
25 Sep 2023
Document Understanding for Healthcare Referrals
Jimit Mistry
N. Arzeno
MedIm
30
1
0
22 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
114
66
0
20 Sep 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
96
34
0
19 Sep 2023
Vision Grid Transformer for Document Layout Analysis
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
99
32
0
29 Aug 2023
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher
Guillem Cucurull
Thomas Scialom
Robert Stojnic
ViT
66
120
0
25 Aug 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
105
7
0
24 Aug 2023
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
64
8
0
15 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
49
7
0
03 Aug 2023
SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding
Saleem Ahmed
Pengyu Yan
David Doermann
S. Setlur
Venugopal Govindaraju
36
2
0
03 Aug 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
192
226
0
24 Jul 2023
DocTr: Document Transformer for Structured Information Extraction in Documents
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
70
12
0
16 Jul 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
106
3
0
21 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
31
3
0
15 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
99
18
0
09 Jun 2023
Previous
1
2
3
4
Next