ResearchTrend.AI
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

18 February 2021
Rafal Powalski
Łukasz Borchmann
Dawid Jurkiewicz
Tomasz Dwojak
Michal Pietruszka
Gabriela Pałka
    ViT

Papers citing "Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer"

50 / 117 papers shown
DocVXQA: Context-Aware Visual Explanations for Document Question Answering
Mohamed Ali Souibgui
Changkyu Choi
Andrey Barsky
Kangsoo Jung
Ernest Valveny
Dimosthenis Karatzas
25
0
0
12 May 2025
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
M. Turski
Mateusz Chiliński
Łukasz Borchmann
28
0
0
14 Apr 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation
Martin Kostelník
Karel Beneš
Michal Hradiš
37
0
0
20 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
47
0
0
07 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
46
0
0
26 Feb 2025
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
22
2
0
14 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang
Yi Tu
Yixi Zhao
Chenshu Yuan
Huan Chen
...
Mingxu Chai
Ya Guo
Huijia Zhu
Qi Zhang
Tao Gui
43
2
0
29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
39
1
0
18 Sep 2024
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
Łukasz Borchmann
Michał Pietruszka
Wojciech Jaśkowski
Dawid Jurkiewicz
Piotr Halama
...
Gabriela Nowakowska
Artur Zawłocki
Łukasz Duhr
Paweł Dyda
Michał Turski
VLM
34
1
0
08 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
39
6
0
02 Aug 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification
S. Saifullah
S. Agne
Andreas Dengel
Sheraz Ahmed
29
0
0
04 Jul 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
41
1
0
05 Jun 2024
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
Omar Hamed
Souhail Bakkali
Marie-Francine Moens
Matthew Blaschko
Jordy Van Landeghem
27
1
0
21 May 2024
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models
Haoxiang Shi
Jiaan Wang
Jiarong Xu
Cen Wang
Tetsuya Sakai
LMTD
28
0
0
20 May 2024
Federated Document Visual Question Answering: A Pilot Study
Khanh Nguyen
Dimosthenis Karatzas
FedML
44
0
0
10 May 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
42
1
0
06 May 2024
CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto
Youngmin Baek
Geewook Kim
Ryota Nakao
Donghyun Kim
Moonbin Yim
Seunghyun Park
Bado Lee
27
1
0
01 May 2024
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
Lei Kang
Rubèn Pérez Tito
Ernest Valveny
Dimosthenis Karatzas
37
5
0
29 Apr 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
34
5
0
27 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
43
24
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
31
38
0
08 Apr 2024
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod
Dongsheng Wang
Mathieu Sibue
Yulong Pei
Petr Babkin
...
Antony Papadimitriou
William Watson
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
25
4
0
05 Apr 2024
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
Eri Onami
Shuhei Kurita
Taiki Miyanishi
Taro Watanabe
25
1
0
28 Mar 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
29
1
0
27 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
32
8
0
25 Mar 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
MLLM
19
15
0
21 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
32
12
0
06 Mar 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
33
10
0
21 Feb 2024
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
38
1
0
17 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
46
5
0
15 Feb 2024
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Gilles Baechler
Srinivas Sunkara
Maria Wang
Fedir Zubach
Hassan Mansoor
Vincent Etter
Victor Carbune
Jason Lin
Jindong Chen
Abhanshu Sharma
117
47
0
07 Feb 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
16
12
0
07 Jan 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang
Zhiqiang Ma
Armineh Nourbakhsh
Kang Gu
Sameena Shah
36
8
0
05 Jan 2024
Privacy-Aware Document Visual Question Answering
Rubèn Pérez Tito
Khanh Nguyen
Marlon Tobaben
Raouf Kerkouche
Mohamed Ali Souibgui
...
Lei Kang
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
35
13
0
15 Dec 2023
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu
Alekh Agarwal
Mandar Joshi
Robin Jia
Jesse Thomason
Kristina Toutanova
32
2
0
16 Nov 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
37
7
0
15 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
46
5
0
15 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
21
2
0
01 Nov 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
16
0
0
25 Oct 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao
Ye Wang
Qiang Zhang
Zaiqiao Meng
SyDa
26
5
0
24 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang
Xiahua Chen
Rui-cang Wang
Chenhui Chu
19
0
0
23 Oct 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiao-Qi Zhai
Radu Soricut
MLLM
VLM
30
93
0
13 Oct 2023
PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction
S. Saifullah
S. Agne
Andreas Dengel
Sheraz Ahmed
16
0
0
05 Oct 2023
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks
Ritesh Kumar
Saurabh Goyal
Ashish Verma
Vatche Isahagian
10
3
0
03 Oct 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
50
29
0
19 Sep 2023
Long-Range Transformer Architectures for Document Understanding
Thibault Douzon
S. Duffner
Christophe Garcia
Jérémy Espinas
VLM
24
2
0
11 Sep 2023
Improving Information Extraction on Business Documents with Specific Pre-Training Tasks
Thibault Douzon
S. Duffner
Christophe Garcia
Jérémy Espinas
11
6
0
11 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
22
13
0
03 Sep 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
37
6
0
24 Aug 2023