ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks
  with Unified Vision-and-Language BERTs
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
Taichi Iki
Akiko Aizawa
LLMAG
66
6
0
15 Mar 2022
Can you even tell left from right? Presenting a new challenge for VQA
Can you even tell left from right? Presenting a new challenge for VQA
Sairaam Venkatraman
Rishi Rao
S. Balasubramanian
C. Vorugunti
R. R. Sarma
CoGe
86
0
0
15 Mar 2022
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
Carlos E. Jimenez
Olga Russakovsky
Karthik Narasimhan
CoGe
84
14
0
15 Mar 2022
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual
  Entailment
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
Haoyu Song
Li Dong
Weinan Zhang
Ting Liu
Furu Wei
VLMCLIP
108
139
0
14 Mar 2022
The worst of both worlds: A comparative analysis of errors in learning
  from data in psychology and machine learning
The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning
Jessica Hullman
Sayash Kapoor
Priyanka Nanayakkara
Andrew Gelman
Arvind Narayanan
147
39
0
12 Mar 2022
Can I see an Example? Active Learning the Long Tail of Attributes and
  Relations
Can I see an Example? Active Learning the Long Tail of Attributes and Relations
Tyler L. Hayes
Maximilian Nickel
Christopher Kanan
Ludovic Denoyer
Arthur Szlam
VLM
71
3
0
11 Mar 2022
REX: Reasoning-aware and Grounded Explanation
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
93
18
0
11 Mar 2022
LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text
  Retrieval
LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
Jie Lei
Xinlei Chen
Ning Zhang
Meng-xing Wang
Joey Tianyi Zhou
Tamara L. Berg
Licheng Yu
118
12
0
10 Mar 2022
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of
  Pretrained Models to Classification Tasks
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
Nan Ding
Xi Chen
Tomer Levinboim
Soravit Changpinyo
Radu Soricut
86
30
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and
  Vision-Language Tasks
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILMELMLRM
138
68
0
09 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for
  Egocentric Assistant
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
77
29
0
08 Mar 2022
MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
Shiming Chen
Ziming Hong
Guosen Xie
Wenhan Wang
Qinmu Peng
Kai Wang
Jian-jun Zhao
Xinge You
VLM
116
107
0
07 Mar 2022
Modeling Coreference Relations in Visual Dialog
Modeling Coreference Relations in Visual Dialog
Mingxiao Li
Marie-Francine Moens
51
10
0
06 Mar 2022
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for
  Knowledge-based Visual Question Answering
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering
Mingxiao Li
Marie-Francine Moens
92
13
0
06 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large
  Models
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TSVLM
83
37
0
03 Mar 2022
Quantity over Quality: Training an AV Motion Planner with Large Scale
  Commodity Vision Data
Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data
Lukas Platinsky
Tayyab Naseer
Hui Chen
Benjamin A. Haines
Haoyue Zhu
Hugo Grimmett
Luca Del Pero
61
1
0
03 Mar 2022
High-Modality Multimodal Transformer: Quantifying Modality & Interaction
  Heterogeneity for High-Modality Representation Learning
High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Jeffrey Tsaw
Yudong Liu
Shentong Mo
Dani Yogatama
Louis-Philippe Morency
Ruslan Salakhutdinov
96
33
0
02 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
126
93
0
02 Mar 2022
Recent, rapid advancement in visual question answering architecture: a
  review
Recent, rapid advancement in visual question answering architecture: a review
V. Kodali
Daniel Berleant
92
9
0
02 Mar 2022
There is a Time and Place for Reasoning Beyond the Image
There is a Time and Place for Reasoning Beyond the Image
Xingyu Fu
Ben Zhou
I. Chandratreya
Carl Vondrick
Dan Roth
163
22
0
01 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based
  Multi-Granular Alignment
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
82
31
0
01 Mar 2022
On Modality Bias Recognition and Reduction
On Modality Bias Recognition and Reduction
Yangyang Guo
Liqiang Nie
Harry Cheng
Zhiyong Cheng
Mohan S. Kankanhalli
A. Bimbo
80
28
0
25 Feb 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
Joint Answering and Explanation for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
78
17
0
25 Feb 2022
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
69
3
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViTVLM
308
529
0
22 Feb 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning
A Review of Emerging Research Directions in Abstract Visual Reasoning
Mikolaj Malkiñski
Jacek Mańdziuk
111
41
0
21 Feb 2022
3DRM:Pair-wise relation module for 3D object detection
3DRM:Pair-wise relation module for 3D object detection
Yuqing Lan
Yao Duan
Yifei Shi
Hui Huang
Kai Xu
3DPC
46
4
0
20 Feb 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
108
38
0
18 Feb 2022
A Survey of Vision-Language Pre-Trained Models
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
173
190
0
18 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLMHAI
75
103
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
186
228
0
18 Feb 2022
XFBoost: Improving Text Generation with Controllable Decoders
XFBoost: Improving Text Generation with Controllable Decoders
Xiangyu Peng
Michael Sollami
75
1
0
16 Feb 2022
Privacy Preserving Visual Question Answering
Privacy Preserving Visual Question Answering
Cristian-Paul Bara
Q. Ping
Abhinav Mathur
Govind Thattai
M. Rohith
Gaurav Sukhatme
111
1
0
15 Feb 2022
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with
  Omni Retrieval
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
Licheng Yu
Jun Chen
Animesh Sinha
Mengjiao MJ Wang
Hugo Chen
Tamara L. Berg
Ning Zhang
VLM
95
39
0
15 Feb 2022
An experimental study of the vision-bottleneck in VQA
An experimental study of the vision-bottleneck in VQA
Pierre Marza
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
95
1
0
14 Feb 2022
GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges
GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges
Junde Wu
Huihui Fang
Fei Li
Huazhu Fu
Fengbin Lin
...
Q. Hu
Hrvoje Bogunović
J. Orlando
Xiulan Zhang
Yanwu Xu
90
64
0
14 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
73
167
0
11 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive
  Reasoning
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
231
51
0
10 Feb 2022
Can Open Domain Question Answering Systems Answer Visual Knowledge
  Questions?
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Jiawen Zhang
Abhijit Mishra
Avinesh P.V.S
Siddharth Patwardhan
Sachin Agarwal
75
0
0
09 Feb 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of
  Text-to-Image Generation Models
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
ViT
258
193
0
08 Feb 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
144
7
0
08 Feb 2022
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations
  (CGL)
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)
Binoy Saha
Sukhendu Das
75
1
0
05 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
97
55
0
04 Feb 2022
Grounding Answers for Visual Questions Asked by Visually Impaired People
Grounding Answers for Visual Questions Asked by Visually Impaired People
Chongyan Chen
Samreen Anjum
Danna Gurari
109
49
0
04 Feb 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLMMLLM
79
16
0
30 Jan 2022
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training
  via Multi-Stage Learning
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li
Zhihao Fan
Huaixiao Tou
Jingjing Chen
Zhongyu Wei
Xuanjing Huang
86
18
0
29 Jan 2022
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's
  Progressive Matrices
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
Mikolaj Malkiñski
Jacek Mańdziuk
225
43
0
28 Jan 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
586
4,443
0
28 Jan 2022
Language-biased image classification: evaluation based on semantic
  representations
Language-biased image classification: evaluation based on semantic representations
Yoann Lemesle
Masataka Sawayama
Guillermo Valle Pérez
Maxime Adolphe
Hélene Sauzéon
Pierre-Yves Oudeyer
VLM
49
7
0
26 Jan 2022
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering
Peixi Xiong
Yilin Shen
Hongxia Jin
35
5
0
25 Jan 2022
Previous
123...303132...585960
Next