ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.08622
  4. Cited By
Multiple-Question Multiple-Answer Text-VQA

Multiple-Question Multiple-Answer Text-VQA

15 November 2023
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
ArXiv (abs)PDFHTML

Papers citing "Multiple-Question Multiple-Answer Text-VQA"

30 / 30 papers shown
Title
Enhancing Vision-Language Pre-training with Rich Supervisions
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLMCLIP
123
12
0
05 Mar 2024
Unifying Vision, Text, and Layout for Universal Document Processing
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Joey Tianyi Zhou
VLM
86
112
0
05 Dec 2022
YORO -- Lightweight End to End Visual Grounding
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
53
21
0
15 Nov 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich
  Document Understanding
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Qiming Peng
Yinxu Pan
Wenjin Wang
Bin Luo
Zhenyu Zhang
...
Shi Feng
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
64
83
0
12 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLMVLM
104
722
0
14 Sep 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
137
556
0
27 May 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image
  Masking
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
92
457
0
18 Apr 2022
DocFormer: End-to-End Transformer for Document Understanding
DocFormer: End-to-End Transformer for Document Understanding
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
88
279
0
22 Jun 2021
StructuralLM: Structural Pre-training for Form Understanding
StructuralLM: Structural Pre-training for Form Understanding
Chenliang Li
Bin Bi
Ming Yan
Wei Wang
Songfang Huang
Fei Huang
Luo Si
LMTDAI4CE
83
134
0
24 May 2021
A Survey of Data Augmentation Approaches for NLP
A Survey of Data Augmentation Approaches for NLP
Steven Y. Feng
Varun Gangal
Jason W. Wei
Sarath Chandar
Soroush Vosoughi
Teruko Mitamura
Eduard H. Hovy
AIMat
106
823
0
07 May 2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout
  Transformer
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski
Łukasz Borchmann
Dawid Jurkiewicz
Tomasz Dwojak
Michal Pietruszka
Gabriela Pałka
ViT
66
157
0
18 Feb 2021
VisualMRC: Machine Reading Comprehension on Document Images
VisualMRC: Machine Reading Comprehension on Document Images
Ryota Tanaka
Kyosuke Nishida
Sen Yoshida
91
145
0
27 Jan 2021
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
94
144
0
08 Dec 2020
Document Visual Question Answering Challenge 2020
Document Visual Question Answering Challenge 2020
Minesh Mathew
Rubèn Pérez Tito
Dimosthenis Karatzas
R. Manmatha
C. V. Jawahar
45
16
0
20 Aug 2020
Spatially Aware Multimodal Transformers for TextVQA
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
82
86
0
23 Jul 2020
DocVQA: A Dataset for VQA on Document Images
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
142
739
0
01 Jul 2020
Structured Multimodal Attentions for TextVQA
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
72
59
0
01 Jun 2020
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene
  Text
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao
Ke Li
Ruiping Wang
Shiguang Shan
Xilin Chen
73
112
0
31 Mar 2020
LayoutLM: Pre-training of Text and Layout for Document Image
  Understanding
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
135
707
0
31 Dec 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal
  Transformers for TextVQA
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
71
197
0
14 Nov 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
207
12,085
0
13 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
445
20,298
0
23 Oct 2019
ICDAR 2019 Competition on Scene Text Visual Question Answering
ICDAR 2019 Competition on Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
54
76
0
30 Jun 2019
Scene Text Visual Question Answering
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
103
360
0
31 May 2019
Towards VQA Models That Can Read
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
97
1,244
0
18 Apr 2019
DVQA: Understanding Data Visualizations via Question Answering
DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle
Brian L. Price
Scott D. Cohen
Christopher Kanan
AIMat
77
396
0
24 Jan 2018
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
903
6,796
0
26 Sep 2016
Layer Normalization
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
413
10,526
0
21 Jul 2016
Pointer Networks
Pointer Networks
Oriol Vinyals
Meire Fortunato
Navdeep Jaitly
121
3,059
0
09 Jun 2015
VQA: Visual Question Answering
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
211
5,497
0
03 May 2015
1