ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.00753
  4. Cited By
Structured Multimodal Attentions for TextVQA
v1v2 (latest)

Structured Multimodal Attentions for TextVQA

1 June 2020
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
ArXiv (abs)PDFHTMLGithub (11★)

Papers citing "Structured Multimodal Attentions for TextVQA"

26 / 26 papers shown
Title
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using
  Transformer-based Method in Vietnamese Text-based Visual Question Answering
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
75
0
0
18 Oct 2024
Multiple-Question Multiple-Answer Text-VQA
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
98
5
0
15 Nov 2023
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou
Dan Guo
Jia Li
Xun Yang
Ming Wang
98
14
0
13 Oct 2023
A Survey on Image-text Multimodal Models
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
133
7
0
23 Sep 2023
Separate and Locate: Rethink the Text in Text-based Visual Question
  Answering
Separate and Locate: Rethink the Text in Text-based Visual Question Answering
Chengyang Fang
Jiangnan Li
Liang Li
Can Ma
Dayong Hu
83
13
0
31 Aug 2023
Making the V in Text-VQA Matter
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
87
4
0
01 Aug 2023
DocTr: Document Transformer for Structured Information Extraction in
  Documents
DocTr: Document Transformer for Structured Information Extraction in Documents
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
70
12
0
16 Jul 2023
Visual Question Answering: A Survey on Techniques and Common Trends in
  Recent Literature
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonccalves de Aguiar Neto
C. F. G. Santos
68
27
0
18 May 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for
  Scene-Text VQA
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Ziqiang Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
60
7
0
04 Apr 2023
Towards Models that Can See and Read
Towards Models that Can See and Read
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
75
13
0
18 Jan 2023
Universal Multimodal Representation for Language Understanding
Universal Multimodal Representation for Language Understanding
Zhuosheng Zhang
Kehai Chen
Rui Wang
Masao Utiyama
Eiichiro Sumita
Z. Li
Hai Zhao
SSL
109
22
0
09 Jan 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual
  question answering
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
118
5
0
16 Dec 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question
  Answering
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
149
22
0
21 Sep 2022
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Jun Wang
M. Gao
Yuqian Hu
Ramprasaath R. Selvaraju
Chetan Ramaiah
Ran Xu
Joseph Jaja
Larry S. Davis
ViT
72
18
0
03 Aug 2022
Towards Multimodal Vision-Language Models Generating Non-Generic Text
Towards Multimodal Vision-Language Models Generating Non-Generic Text
Wes Robbins
Zanyar Zohourianshahzadi
Jugal Kalita
53
1
0
09 Jul 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
178
564
0
27 May 2022
Towards Escaping from Language Bias and OCR Error: Semantics-Centered
  Text Visual Question Answering
Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering
Chengyang Fang
Gangyan Zeng
Yu Zhou
Daiqing Wu
Can Ma
Dayong Hu
Weiping Wang
63
8
0
24 Mar 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
127
102
0
23 Dec 2021
Graph Relation Transformer: Incorporating pairwise object features into
  the Transformer architecture
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture
Michael Yang
Aditya Anantharaman
Zach Kitowski
Derik Clive Robert
ViT
64
4
0
11 Nov 2021
EKTVQA: Generalized use of External Knowledge to empower Scene Text in
  Text-VQA
EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA
Arka Ujjal Dey
Ernest Valveny
Gaurav Harit
43
3
0
22 Aug 2021
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
86
27
0
20 Aug 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped
  scene text
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
86
174
0
12 May 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual
  Explanations
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
97
19
0
29 Apr 2021
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Qi Zhu
Chenyu Gao
Peng Wang
Qi Wu
92
54
0
09 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
107
144
0
08 Dec 2020
Finding the Evidence: Localization-aware Answer Prediction for Text
  Visual Question Answering
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han
Hantao Huang
Tao Han
60
51
0
06 Oct 2020
1