VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
186
100
0
22 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM, ReLM, LRM
304
1,303
0
20 Sep 2022
Enabling Conversational Interaction with Mobile UI using Large Language Models
Bryan Wang
Gang Li
Yang Li
219
146
0
18 Sep 2022
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Yu-Chung Hsiao
Fedir Zubach
Victor Carbune
Jason Lin
Maria Wang
Yun Zhu
Jindong Chen
RALM
224
30
0
16 Sep 2022
Exploring Visual Interpretability for Contrastive Language-Image Pre-training
Yi Li
Hualiang Wang
Yiqun Duan
Han Xu
Xiaomeng Li
CLIP, VLM
153
28
0
15 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
94
7
0
15 Sep 2022
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Nanning Zheng
111
8
0
14 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM, VLM
238
742
0
14 Sep 2022
ImageArg: A Multi-modal Tweet Dataset for Image Persuasiveness Mining
Zhexiong Liu
M. Guo
Y. Dai
Diane Litman
85
16
0
14 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
193
29
0
12 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
120
8
0
12 Sep 2022
Ask Before You Act: Generalising to Novel Environments by Asking Questions
Ross Murphy
S. Mosesov
Javier Leguina Peral
Thymo ter Doest
LRM
65
0
0
10 Sep 2022
Pre-training image-language transformers for open-vocabulary tasks
A. Piergiovanni
Weicheng Kuo
A. Angelova
VLM, ViT
119
10
0
09 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
114
90
0
07 Sep 2022
Interactive Question Answering Systems: Literature Review
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
111
23
0
04 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Tengjiao Wang
Ming-Hsuan Yang
DiffM, MedIm
568
1,428
0
02 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM, CLIP
100
27
0
29 Aug 2022
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective
Jiangmeng Li
Yanan Zhang
Jingyao Wang
Hui Xiong
Chengbo Jiao
Xiaohui Hu
Changwen Zheng
Gang Hua
CML
116
30
0
26 Aug 2022
AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in Educational Applications
Yusen Zhang
Zhongli Li
Qingyu Zhou
Ziyi Liu
Chao Li
Mina W. Ma
Yunbo Cao
Hongzhi Liu
93
1
0
26 Aug 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
Stan Weixian Lei
Difei Gao
Jay Zhangjie Wu
Yuxuan Wang
Wei Liu
Meng Zhang
Mike Zheng Shou
81
38
0
24 Aug 2022
How good are deep models in understanding the generated images?
Ali Borji
OOD
55
6
0
23 Aug 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
56
0
0
23 Aug 2022
Neuro-Symbolic Visual Dialog
Adnen Abdessaied
Mihai Bâce
Andreas Bulling
NAI
57
3
0
22 Aug 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
YunLong Yu
Zhong Ji
Errui Ding
Jingdong Wang
64
23
0
21 Aug 2022
Causality-Inspired Taxonomy for Explainable Artificial Intelligence
Pedro C. Neto
Tiago B. Gonçalves
João Ribeiro Pinto
W. Silva
Ana F. Sequeira
Arun Ross
Jaime S. Cardoso
XAI
112
13
0
19 Aug 2022
VAuLT: Augmenting the Vision-and-Language Transformer for Sentiment Classification on Social Media
Georgios Chochlakis
Tejas Srinivasan
Jesse Thomason
Shrikanth Narayanan
VLM
89
4
0
18 Aug 2022
Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
VLM
88
53
0
17 Aug 2022
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
Zihan Ding
Zixiang Ding
Tianrui Hui
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
Si Liu
96
14
0
11 Aug 2022
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM, KELM
161
176
0
10 Aug 2022
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning
Adam Dahlgren Lindström
Savitha Sam Abraham
62
58
0
10 Aug 2022
Boosting Video-Text Retrieval with Explicit High-Level Semantics
Haoran Wang
Di Xu
Dongliang He
Fu Li
Zhong Ji
Jungong Han
Errui Ding
88
13
0
08 Aug 2022
Neural Message Passing for Visual Relationship Detection
Yue Hu
Siheng Chen
Xu Chen
Ya Zhang
Xiao Gu
89
17
0
08 Aug 2022
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
Jaeseok Byun
Taebaek Hwang
Jianlong Fu
Taesup Moon
VLM
95
11
0
08 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Bingning Wang
Feiya Lv
Ting Yao
Yiming Yuan
Jin Ma
Yu Luo
Haijin Liang
68
3
0
05 Aug 2022
Prompt Tuning for Generative Multimodal Pretrained Models
Han Yang
Junyang Lin
An Yang
Peng Wang
Chang Zhou
Hongxia Yang
VLM, LRM, VPVLM
91
31
0
04 Aug 2022
Fine-Grained Semantically Aligned Vision-Language Pre-Training
Juncheng Li
Xin He
Longhui Wei
Long Qian
Linchao Zhu
Lingxi Xie
Yueting Zhuang
Qi Tian
Siliang Tang
VLM
106
80
0
04 Aug 2022
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Jun Wang
M. Gao
Yuqian Hu
Ramprasaath R. Selvaraju
Chetan Ramaiah
Ran Xu
Joseph Jaja
Larry S. Davis
ViT
72
18
0
03 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
104
18
0
01 Aug 2022
Generative Bias for Robust Visual Question Answering
Jae-Won Cho
Dong-Jin Kim
H. Ryu
In So Kweon
OOD, CML
109
20
0
01 Aug 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
104
15
0
28 Jul 2022
DoRO: Disambiguation of referred object for embodied agents
Pradip Pramanick
Chayan Sarkar
S. Paul
R. Roychoudhury
Brojeshwar Bhowmick
LM&Ro
51
14
0
28 Jul 2022
Portrait Interpretation and a Benchmark
Yixuan Fan
Zhaopeng Dou
Yali Li
Shengjin Wang
61
0
0
27 Jul 2022
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base
Jinyeong Chae
Jihie Kim
64
2
0
27 Jul 2022
NewsStories: Illustrating articles with visual summaries
Reuben Tan
Bryan A. Plummer
Kate Saenko
J. P. Lewis
Avneesh Sud
Thomas Leung
VLM, SSL
145
5
0
26 Jul 2022
LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Zhuo Chen
Yufen Huang
Jiaoyan Chen
Yuxia Geng
Yin Fang
Jeff Z. Pan
Ningyu Zhang
Wen Zhang
95
38
0
26 Jul 2022
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Yang Liu
Guanbin Li
Liang Lin
LRM
185
89
0
26 Jul 2022
Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?
P. Bongini
Federico Becattini
A. Bimbo
56
13
0
25 Jul 2022
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Yudong Han
Liqiang Nie
Jianhua Yin
Jianlong Wu
Yan Yan
92
14
0
24 Jul 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering
Yudong Han
Jianhua Yin
Jianlong Wu
Yin-wei Wei
Liqiang Nie
68
8
0
21 Jul 2022
Correspondence Matters for Video Referring Expression Comprehension
Meng Cao
Ji Jiang
Long Chen
Yuexian Zou
VOS
91
20
0
21 Jul 2022