VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings
Kiran Ramnath
M. Hasegawa-Johnson
66
9
0
31 Dec 2020
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
93
30
0
30 Dec 2020
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu
J. Rose
VLM
50
87
0
23 Dec 2020
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe
Nithin Holla
Shantanu Chandra
S. Rajamanickam
Georgios Antoniou
Ekaterina Shutova
H. Yannakoudakis
60
74
0
23 Dec 2020
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu
Albert Gatt
Anette Frank
Iacer Calixto
LRM
101
49
0
22 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
82
16
0
21 Dec 2020
Learning content and context with language bias for Visual Question Answering
Chao Yang
Su Feng
Dongsheng Li
Huawei Shen
Guoqing Wang
Bin Jiang
68
21
0
21 Dec 2020
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
128
188
0
20 Dec 2020
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
71
3
0
18 Dec 2020
On Modality Bias in the TVQA Dataset
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
87
35
0
18 Dec 2020
Can Transformers Reason About Effects of Actions?
Pratyay Banerjee
Chitta Baral
Man Luo
Arindam Mitra
Kuntal Kumar Pal
Tran Cao Son
Neeraj Varshney
LRM
AI4CE
79
10
0
17 Dec 2020
Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
Xi Zhu
Zhendong Mao
Chunxiao Liu
Peng Zhang
Bin Wang
Yongdong Zhang
SSL
58
117
0
17 Dec 2020
MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
Te-Lin Wu
Shikhar Singh
S. Paul
Gully A. Burns
Nanyun Peng
41
18
0
16 Dec 2020
Towards Recognizing New Semantic Concepts in New Visual Domains
Massimiliano Mancini
OOD
141
0
0
16 Dec 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
98
44
0
15 Dec 2020
Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution
Ron Zhu
73
81
0
15 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
94
37
0
14 Dec 2020
Learning Contextual Causality from Time-consecutive Images
Hongming Zhang
Yintong Huo
Xinran Zhao
Yangqiu Song
Dan Roth
CML
59
6
0
13 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
99
67
0
10 Dec 2020
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Qi Zhu
Chenyu Gao
Peng Wang
Qi Wu
92
54
0
09 Dec 2020
Intrinsically Motivated Compositional Language Emergence
Rishi Hazra
Sonu Dixit
Sayambhu Sen
65
1
0
09 Dec 2020
Emotional Conversation Generation with Heterogeneous Graph Neural Network
Yunlong Liang
Fandong Meng
Yingxue Zhang
Jinan Xu
Jie Zhou
68
25
0
09 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
107
144
0
08 Dec 2020
CASTing Your Model: Learning to Localize Improves Self-Supervised Representations
Ramprasaath R. Selvaraju
Karan Desai
Justin Johnson
Nikhil Naik
SSL
101
80
0
08 Dec 2020
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates
Muhammed Samil Atesoglu
Cagatay Yigit
İlker Kesen
Mert Kobaş
Erkut Erdem
Aykut Erdem
T. Goksun
Deniz Yuret
61
31
0
08 Dec 2020
Edited Media Understanding Frames: Reasoning About the Intent and Implications of Visual Misinformation
Jeff Da
Maxwell Forbes
Rowan Zellers
Anthony Zheng
Jena D. Hwang
Antoine Bosselut
Yejin Choi
DiffM
87
13
0
08 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
110
36
0
04 Dec 2020
Classification of Multimodal Hate Speech -- The Winning Solution of Hateful Memes Challenge
Xiayu Zhong
59
15
0
02 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
102
120
0
30 Nov 2020
Point and Ask: Incorporating Pointing into Visual Question Answering
Arjun Mani
Nobline Yoo
William Fu-Hinthorn
Olga Russakovsky
3DPC
82
38
0
27 Nov 2020
Transformation Driven Visual Reasoning
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
85
23
0
26 Nov 2020
Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption
Ivan Evtimov
Russ Howes
Brian Dolhansky
Hamed Firooz
Cristian Canton Ferrer
AAML
49
10
0
25 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
87
61
0
25 Nov 2020
XTQA: Span-Level Explanations of the Textbook Question Answering
Jie Ma
Q. Zheng
Jun Liu
Qingyu Yin
Jianlong Zhou
Y. Huang
34
13
0
25 Nov 2020
Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention
Varnith Chordia
B. Vijaykumar
57
7
0
23 Nov 2020
Interpretable Visual Reasoning via Induced Symbolic Space
Zhonghao Wang
Kai Wang
Mo Yu
Jinjun Xiong
Wen-mei W. Hwu
M. Hasegawa-Johnson
Humphrey Shi
LRM
OCL
63
20
0
23 Nov 2020
Video SemNet: Memory-Augmented Video Semantic Network
Prashanth Vijayaraghavan
D. Roy
40
0
0
22 Nov 2020
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
75
4
0
19 Nov 2020
Improving Calibration in Deep Metric Learning With Cross-Example Softmax
Andreas Veit
Kimberly Wilber
24
2
0
17 Nov 2020
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha
Gurneet Arora
Navpreet Kaloty
66
37
0
16 Nov 2020
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar
Farhad Pourpanah
Sadiq Hussain
Dana Rezazadegan
Li Liu
...
Xiaochun Cao
Abbas Khosravi
U. Acharya
V. Makarenkov
S. Nahavandi
BDL
UQCV
380
1,952
0
12 Nov 2020
Deep Multimodal Fusion by Channel Exchanging
Yikai Wang
Wenbing Huang
Gang Hua
Tingyang Xu
Yu Rong
Junzhou Huang
76
246
0
10 Nov 2020
Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning
Rachel Gardner
M. Varma
Clare Zhu
Ranjay Krishna
43
6
0
10 Nov 2020
Multi-document Summarization via Deep Learning Techniques: A Survey
Congbo Ma
W. Zhang
Mingyu Guo
Hu Wang
Quan Z. Sheng
125
129
0
10 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
66
7
0
05 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
55
45
0
04 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
Michael R. Lyu
Irwin King
75
16
0
03 Nov 2020
Pairwise Relations Discriminator for Unsupervised Raven's Progressive Matrices
Nicholas Quek Wei Kiat
Duo Wang
M. Jamnik
81
5
0
02 Nov 2020
Reasoning Over History: Context Aware Visual Dialog
Muhammad A. Shah
Shikib Mehri
Tejas Srinivasan
33
4
0
02 Nov 2020
DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation
Jia-Hong Huang
Chao-Han Huck Yang
Fangyu Liu
Meng Tian
Yi-Chieh Liu
...
Kang Wang
Hiromasa Morikawa
Hernghua Chang
Jesper N. Tegnér
M. Worring
MedIm
66
48
0
01 Nov 2020