VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization
Jia-Hong Huang
L. Murn
M. Mrak
Marcel Worring
ViT
152
37
0
26 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
148
56
0
23 Apr 2021
Towards Solving Multimodal Comprehension
Pritish Sahu
Karan Sikka
Ajay Divakaran
31
2
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
Open Challenges on Generating Referring Expressions for Human-Robot Interaction
Fethiye Irmak Dogan
Iolanda Leite
91
4
0
19 Apr 2021
SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
Satwik Kottur
Seungwhan Moon
A. Geramifard
Babak Damavandi
86
92
0
18 Apr 2021
Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models
Tejas Srinivasan
Yonatan Bisk
VLM
83
56
0
18 Apr 2021
Question Decomposition with Dependency Graphs
Matan Hasson
Jonathan Berant
GNN
91
10
0
17 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
111
348
0
17 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
67
20
0
16 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
83
19
0
16 Apr 2021
Ensemble of MRR and NDCG models for Visual Dialog
Idan Schwartz
58
9
0
15 Apr 2021
Do Neural Network Weights account for Classes Centers?
Ioannis Kansizoglou
Loukas Bampis
Antonios Gasteratos
40
10
0
14 Apr 2021
Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention
Leon Bergen
Dzmitry Bahdanau
Timothy J. O'Donnell
FedML
29
1
0
14 Apr 2021
Neuro-Symbolic VQA: A review from the perspective of AGI desiderata
Ian Berlot-Attwell
27
3
0
13 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
102
163
0
13 Apr 2021
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
Shailaja Keyur Sampat
Akshay Kumar
Yezhou Yang
Chitta Baral
78
26
0
13 Apr 2021
Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
Jae-Won Cho
Dong-Jin Kim
Jinsoo Choi
Yunjae Jung
In So Kweon
VLM
57
17
0
13 Apr 2021
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning
Roshanak Mirzaee
Hossein Rajaby Faghihi
Qiang Ning
Parisa Kordjamshidi
56
83
0
12 Apr 2021
Towards a Collective Agenda on AI for Earth Science Data Analysis
D. Tuia
R. Roscher
Jan Dirk Wegner
Nathan Jacobs
Xiaoxiang Zhu
Gustau Camps-Valls
AI4CE
84
70
0
11 Apr 2021
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework
Santiago Castro
Ruoyao Wang
Pingxuan Huang
Ian Stewart
Oana Ignat
Nan Liu
Jonathan C. Stroud
Rada Mihalcea
AIMat
89
11
0
09 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
84
69
0
09 Apr 2021
Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning
Vikram Shree
B. Asfora
Rachel Zheng
Samantha Hong
Jacopo Banfi
M. Campbell
46
10
0
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
63
28
0
08 Apr 2021
Multimodal Entity Linking for Tweets
Omar Adjali
Romaric Besançon
Olivier Ferret
Hervé Le Borgne
Brigitte Grau
71
49
0
07 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
94
78
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM, ViT
160
274
0
07 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
120
99
0
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
68
26
0
02 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
103
53
0
01 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
227
1,194
0
01 Apr 2021
Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation
Aviral Joshi
Chengzhi Huang
H. Singh
54
2
0
31 Mar 2021
Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation
Amrutha Saseendran
Kathrin Skubch
Margret Keuper
VLM, GAN
45
2
0
31 Mar 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
128
198
0
31 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
99
121
0
30 Mar 2021
Grounding Open-Domain Instructions to Automate Web Support Tasks
N. Xu
Sam Masling
Michael Du
Giovanni Campagna
Larry Heck
James A. Landay
M. Lam
LLMAG, AI4TS
73
44
0
30 Mar 2021
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
85
119
0
30 Mar 2021
Domain-robust VQA with diverse datasets and methods but no target labels
Ruotong Wang
Tristan D. Maidment
Ahmad Diab
Adriana Kovashka
R. Hwa
OOD
129
23
0
29 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Hila Chefer
Shir Gur
Lior Wolf
ViT
103
328
0
29 Mar 2021
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks
Man Luo
Shailaja Keyur Sampat
Riley Tallman
Yankai Zeng
Manuha Vancha
Akarshan Sajja
Chitta Baral
53
10
0
28 Mar 2021
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models
Arijit Ray
Michael Cogswell
Xiaoyu Lin
Kamran Alipour
Ajay Divakaran
Yi Yao
Giedrius Burachas
FAtt
40
4
0
26 Mar 2021
On the hidden treasure of dialog in video question answering
Deniz Engin
François Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
76
12
0
26 Mar 2021
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Fangqiu Yi
Baoxiong Jia
Song-Chun Zhu
Yixin Zhu
132
72
0
26 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
68
26
0
24 Mar 2021
Scene-Intuitive Agent for Remote Embodied Visual Grounding
Xiangru Lin
Guanbin Li
Yizhou Yu
LM&Ro
80
53
0
24 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
60
16
0
23 Mar 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
171
146
0
23 Mar 2021
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo
Hamish Ivison
S. Han
Josiah Poon
MILM
120
52
0
20 Mar 2021
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation
Chen Liang
Yu Wu
Yawei Luo
Yi Yang
VOS
101
30
0
19 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
64
5
0
18 Mar 2021