ICDAR 2019 Competition on Scene Text Visual Question Answering

30 June 2019
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas

Papers citing "ICDAR 2019 Competition on Scene Text Visual Question Answering"

50 / 54 papers shown
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
M. Turski
Mateusz Chiliński
Łukasz Borchmann
35
0
0
14 Apr 2025
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
Armineh Nourbakhsh
Siddharth Parekh
Pranav Shetty
Zhao Jin
Sameena Shah
Carolyn Rose
54
0
0
24 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
55
0
0
07 Mar 2025
Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations
Archita Srivastava
Abhas Kumar
Rajesh Kumar
Prabhakar Srinivasan
35
0
0
08 Jan 2025
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Marlon Tobaben
Mohamed Ali Souibgui
Rubèn Pérez Tito
Khanh Nguyen
Raouf Kerkouche
...
Josep Lladós
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
FedML
39
0
0
06 Nov 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
34
0
0
04 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
39
5
0
02 Oct 2024
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Yonghui Wang
Wengang Zhou
Hao Feng
Houqiang Li
VLM
35
0
0
30 Aug 2024
Large Language Models for Page Stream Segmentation
H. Heidenreich
Ratish Dalvi
Rohith Mukku
Nikhil Verma
Neven Pičuljan
35
0
0
21 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
47
6
0
02 Aug 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
39
15
0
27 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
39
6
0
10 Jun 2024
TANQ: An open domain dataset of table answered questions
Mubashara Akhtar
Chenxi Pang
Andreea Marzoca
Yasemin Altun
Julian Martin Eisenschlos
LMTD
RALM
49
1
0
13 May 2024
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Wonjoong Kim
S. Park
Yeonjun In
Seokwon Han
Chanyoung Park
LRM
ReLM
32
3
0
22 Feb 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
27
2
0
19 Feb 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
22
53
0
31 Dec 2023
Privacy-Aware Document Visual Question Answering
Rubèn Pérez Tito
Khanh Nguyen
Marlon Tobaben
Raouf Kerkouche
Mohamed Ali Souibgui
...
Lei Kang
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
38
13
0
15 Dec 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
66
19
0
22 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
25
63
0
20 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
46
5
0
15 Nov 2023
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou
Dan Guo
Jia Li
Xun Yang
Ming Wang
21
5
0
13 Oct 2023
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Nidhi Hegde
S. Paul
Gagan Madan
Gaurav Aggarwal
20
8
0
25 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
51
117
0
07 Sep 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
24
56
0
23 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao
Yutao Hu
Peng Gao
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Hongsheng Li
Yuning Qiao
Ping Luo
VLM
MLLM
34
2
0
07 Aug 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
30
219
0
29 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
38
159
0
15 Jun 2023
ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes
Minghao Fu
Xin Man
Yihan Xu
Jie Shao
33
2
0
04 Jun 2023
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
36
39
0
02 Jun 2023
Document Understanding Dataset and Evaluation (DUDE)
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
...
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
24
53
0
15 May 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM
MLLM
34
55
0
13 May 2023
DePlot: One-shot visual language reasoning by plot-to-table translation
Fangyu Liu
Julian Martin Eisenschlos
Francesco Piccinno
Syrine Krichene
Chenxi Pang
Kenton Lee
Mandar Joshi
Wenhu Chen
Nigel Collier
Yasemin Altun
VLM
ReLM
LRM
30
89
0
20 Dec 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Qiming Peng
Yinxu Pan
Wenjin Wang
Bin Luo
Zhenyu Zhang
...
Shi Feng
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
13
83
0
12 Oct 2022
MUST-VQA: MUltilingual Scene-text VQA
Emanuele Vivoli
Ali Furkan Biten
Andrés Mafla
Dimosthenis Karatzas
Lluís Gómez
34
6
0
14 Sep 2022
Understanding Attention for Vision-and-Language Tasks
Feiqi Cao
S. Han
Siqu Long
Changwei Xu
Josiah Poon
47
5
0
17 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding
Song Tao
Zijian Wang
Tiantian Fan
Canjie Luo
Can Huang
SSL
40
2
0
28 Jul 2022
Test-Time Adaptation for Visual Document Understanding
Sayna Ebrahimi
Sercan Ö. Arik
Tomas Pfister
OOD
35
6
0
15 Jun 2022
OCR-IDL: OCR Annotations for Industry Document Library Dataset
Ali Furkan Biten
Rubèn Pérez Tito
Lluís Gómez
Ernest Valveny
Dimosthenis Karatzas
27
26
0
25 Feb 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
36
100
0
23 Dec 2021
ICDAR 2021 Competition on Document VisualQuestion Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
40
23
0
10 Nov 2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski
Łukasz Borchmann
Dawid Jurkiewicz
Tomasz Dwojak
Michal Pietruszka
Gabriela Pałka
ViT
36
157
0
18 Feb 2021
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
25
30
0
24 Oct 2020
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han
Hantao Huang
Tao Han
22
51
0
06 Oct 2020
Document Visual Question Answering Challenge 2020
Minesh Mathew
Rubèn Pérez Tito
Dimosthenis Karatzas
R. Manmatha
C. V. Jawahar
6
15
0
20 Aug 2020
Label or Message: A Large-Scale Experimental Survey of Texts and Objects Co-Occurrence
Koki Takeshita
Juntaro Shioyama
S. Uchida
12
1
0
30 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
22
85
0
23 Jul 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
6
655
0
01 Jul 2020
Multimodal grid features and cell pointers for Scene Text Visual Question Answering
Lluís Gómez
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Marçal Rusiñol
Ernest Valveny
Dimosthenis Karatzas
13
20
0
01 Jun 2020
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
23
59
0
01 Jun 2020
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao
Ke Li
Ruiping Wang
Shiguang Shan
Xilin Chen
16
111
0
31 Mar 2020