ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.11539
  4. Cited By
DocFormer: End-to-End Transformer for Document Understanding

DocFormer: End-to-End Transformer for Document Understanding

22 June 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
    ViT
ArXivPDFHTML

Papers citing "DocFormer: End-to-End Transformer for Document Understanding"

50 / 185 papers shown
Title
Transformers and Language Models in Form Understanding: A Comprehensive
  Review of Scanned Document Analysis
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
40
12
0
06 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
67
12
0
05 Mar 2024
Hierarchical Multimodal Pre-training for Visually Rich Webpage
  Understanding
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
Hongshen Xu
Lu Chen
Zihan Zhao
Da Ma
Ruisheng Cao
Zichen Zhu
Kai Yu
37
2
0
28 Feb 2024
Improving Language Understanding from Screenshots
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
43
10
0
21 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
54
5
0
15 Feb 2024
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road
  Racing
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing
Jacob Tyo
Motolani Olarinre
Youngseog Chung
Zachary Chase Lipton
43
0
0
12 Feb 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
24
4
0
07 Feb 2024
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text
  Segmentation
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye
Jing Zhang
Juhua Liu
Chenyu Liu
Baocai Yin
Cong Liu
Bo Du
Dacheng Tao
VLM
37
11
0
31 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document
  Understanding with Instructions
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
21
23
0
24 Jan 2024
Watermark Text Pattern Spotting in Document Images
Watermark Text Pattern Spotting in Document Images
Mateusz Krubiński
Stefan Matcovici
Diana Grigore
Daniel Voinea
A. Popa
WaLM
17
2
0
10 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for
  End-to-end Document Pair Extraction
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
24
2
0
07 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
16
12
0
07 Jan 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang
Zhiqiang Ma
Armineh Nourbakhsh
Kang Gu
Sameena Shah
36
8
0
05 Jan 2024
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
18
11
0
25 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the
  Frequency Domain for Versatile Document Understanding
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
25
60
0
20 Nov 2023
Efficient End-to-End Visual Document Understanding with Rationale
  Distillation
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu
Alekh Agarwal
Mandar Joshi
Robin Jia
Jesse Thomason
Kristina Toutanova
37
2
0
16 Nov 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder
  Transformer Models
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
42
7
0
15 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
46
5
0
15 Nov 2023
Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset
Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset
Jacob Tyo
Youngseog Chung
Motolani Olarinre
Zachary Chase Lipton
29
0
0
14 Nov 2023
ETDPC: A Multimodality Framework for Classifying Pages in Electronic
  Theses and Dissertations
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
Muntabir Hasan Choudhury
Lamia Salsabil
William A. Ingram
Edward A. Fox
Jian Wu
19
0
0
07 Nov 2023
Image Generation and Learning Strategy for Deep Document Forgery
  Detection
Image Generation and Learning Strategy for Deep Document Forgery Detection
Yamato Okamoto
Osada Genki
Iu Yahiro
Rintaro Hasegawa
Peifei Zhu
Hirokatsu Kataoka
AAML
33
0
0
07 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich
  Document Entity Retrieval
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
36
2
0
01 Nov 2023
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and
  In-depth Evaluation
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi
Dezhi Peng
Wenhui Liao
Zening Lin
Xinhong Chen
Chongyu Liu
Yuyi Zhang
Lianwen Jin
MLLM
30
44
0
25 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A
  Robust Approach for Information Extraction in Visually-Rich Documents
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
16
0
0
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
20
4
0
25 Oct 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao
Ye Wang
Qiang Zhang
Zaiqiao Meng
SyDa
29
5
0
24 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via
  Visually-Asymmetric Consistency Learning
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang
Xiahua Chen
Rui-cang Wang
Chenhui Chu
27
0
0
23 Oct 2023
PHD: Pixel-Based Language Modeling of Historical Documents
PHD: Pixel-Based Language Modeling of Historical Documents
Nadav Borenstein
Phillip Rust
Desmond Elliott
Isabelle Augenstein
33
3
0
22 Oct 2023
DSG: An End-to-End Document Structure Generator
DSG: An End-to-End Document Structure Generator
Johannes Rausch
Gentiana Rashiti
Maxim Gusev
Ce Zhang
Stefan Feuerriegel
31
3
0
13 Oct 2023
ProtoNER: Few shot Incremental Learning for Named Entity Recognition
  using Prototypical Networks
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks
Ritesh Kumar
Saurabh Goyal
Ashish Verma
Vatche Isahagian
15
3
0
03 Oct 2023
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document
  Question Answering
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Nidhi Hegde
S. Paul
Gagan Madan
Gaurav Aggarwal
20
8
0
25 Sep 2023
Document Understanding for Healthcare Referrals
Document Understanding for Healthcare Referrals
Jimit Mistry
N. Arzeno
MedIm
13
0
0
22 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
31
63
0
20 Sep 2023
LMDX: Language Model-based Document Information Extraction and
  Localization
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
56
29
0
19 Sep 2023
Vision Grid Transformer for Document Layout Analysis
Vision Grid Transformer for Document Layout Analysis
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
40
27
0
29 Aug 2023
Nougat: Neural Optical Understanding for Academic Documents
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher
Guillem Cucurull
Thomas Scialom
Robert Stojnic
ViT
21
107
0
25 Aug 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
40
6
0
24 Aug 2023
Enhancing Visually-Rich Document Understanding via Layout Structure
  Modeling
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
28
7
0
15 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for
  First-Order Logic
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
21
5
0
03 Aug 2023
SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart
  Understanding
SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding
Saleem Ahmed
Pengyu Yan
David Doermann
S. Setlur
Venugopal Govindaraju
18
2
0
03 Aug 2023
A Real-World WebAgent with Planning, Long Context Understanding, and
  Program Synthesis
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
39
198
0
24 Jul 2023
DocTr: Document Transformer for Structured Information Extraction in
  Documents
DocTr: Document Transformer for Structured Information Extraction in Documents
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Z. Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
29
11
0
16 Jul 2023
On Evaluation of Document Classification using RVL-CDIP
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
31
3
0
21 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
DocumentNet: Bridging the Data Gap in Document Pre-Training
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
22
3
0
15 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
38
18
0
09 Jun 2023
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich
  Document Images
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images
Wenwen Yu
Chengquan Zhang
H. Cao
Wei Hua
Bohan Li
...
M. Zhang
Dimosthenis Karatzas
Xingchao Sun
Jingdong Wang
Xiang Bai
26
11
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
28
39
0
02 Jun 2023
End-to-End Document Classification and Key Information Extraction using
  Assignment Optimization
End-to-End Document Classification and Key Information Extraction using Assignment Optimization
Ciaran Cooney
Joana Cavadas
Liam Madigan
Bradley Savage
Rachel Heyburn
Mairead O'Cuinn
11
0
0
01 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image
  Question Answering
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
21
24
0
01 Jun 2023
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training
  for Document Understanding
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding
Yi Tu
Ya Guo
Huan Chen
Jinyang Tang
31
15
0
30 May 2023
Previous
1234
Next