ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.14457
  4. Cited By
Towards a Multi-modal, Multi-task Learning based Pre-training Framework
  for Document Representation Learning

Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning

30 September 2020
Subhojeet Pramanik
Shashank Mujumdar
Hima Patel
ArXivPDFHTML

Papers citing "Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning"

21 / 21 papers shown
Title
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Conghui He
101
5
0
10 Dec 2024
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao
Jiapeng Wang
Hongliang Li
Chengyu Wang
Jun Huang
Lianwen Jin
38
0
0
27 Aug 2024
LongFin: A Multimodal Document Understanding Model for Long Financial
  Domain Documents
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents
Ahmed Masry
Amir Hajian
22
2
0
26 Jan 2024
Beyond Document Page Classification: Design, Datasets, and Challenges
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
37
6
0
24 Aug 2023
On Evaluation of Document Classification using RVL-CDIP
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
26
3
0
21 Jun 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document
  Information Extraction
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Nils Loose
Chun-Liang Li
Hao Zhang
Timothy Dozat
Felix Mächtle
...
Shangbang Long
Siyang Qin
Yasuhisa Fujii
Nan Hua
T. Eisenbarth
SSL
48
17
0
04 May 2023
DocILE Benchmark for Document Information Localization and Extraction
DocILE Benchmark for Document Information Localization and Extraction
vStvepán vSimsa
Milan vSulc
Michal Uvrivcávr
Yash J. Patel
Ahmed Hamdi
...
Matyávs Skalický
Jivrí Matas
Antoine Doucet
Mickael Coustaty
Dimosthenis Karatzas
24
33
0
11 Feb 2023
Unifying Vision, Text, and Layout for Universal Document Processing
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Mohit Bansal
VLM
30
105
0
05 Dec 2022
Unimodal and Multimodal Representation Training for Relation Extraction
Unimodal and Multimodal Representation Training for Relation Extraction
Ciaran Cooney
Rachel Heyburn
Liam Maddigan
Mairead O'Cuinn
Chloe Thompson
Joana Cavadas
28
2
0
11 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
34
18
0
14 Oct 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image
  Masking
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
25
432
0
18 Apr 2022
DiT: Self-supervised Pre-training for Document Image Transformer
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
35
159
0
04 Mar 2022
MarkupLM: Pre-training of Text and Markup Language for Visually-rich
  Document Understanding
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Junlong Li
Yiheng Xu
Lei Cui
Furu Wei
VLM
3DGS
23
59
0
16 Oct 2021
Skim-Attention: Learning to Focus via Document Layout
Skim-Attention: Learning to Focus via Document Layout
Laura Nguyen
Thomas Scialom
Jacopo Staiano
Benjamin Piwowarski
13
9
0
02 Sep 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
27
113
0
06 Aug 2021
SelfDoc: Self-Supervised Document Representation Learning
SelfDoc: Self-Supervised Document Representation Learning
Peizhao Li
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
R. Jain
Varun Manjunatha
Hongfu Liu
ViT
SSL
20
159
0
07 Jun 2021
End-to-End Hierarchical Relation Extraction for Generic Form
  Understanding
End-to-End Hierarchical Relation Extraction for Generic Form Understanding
Tuan-Anh Dang Nguyen
Duc Thanh Hoang
Q. Tran
Chih-Wei Pan
T. Nguyen
11
10
0
02 Jun 2021
ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for
  Key Information Extraction from Documents
ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents
Weihong Lin
Qifang Gao
Lei-huan Sun
Zhuoyao Zhong
Kaiqin Hu
Qin Ren
Qiang Huo
23
37
0
25 May 2021
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
Te-Lin Wu
Cheng-rong Li
Mingyang Zhang
Tao Chen
Spurthi Amba Hombaiah
Michael Bendersky
13
14
0
16 Apr 2021
A Survey of Deep Learning Approaches for OCR and Document Understanding
A Survey of Deep Learning Approaches for OCR and Document Understanding
Nishant Subramani
Alexandre Matton
Malcolm Greaves
Adrian Lam
11
48
0
27 Nov 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
134
355
0
27 May 2019
1