ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
Learning to Predict Visual Attributes in the Wild
Learning to Predict Visual Attributes in the Wild
Khoi Pham
Kushal Kafle
Zhe Lin
Zhi Ding
Scott D. Cohen
Q. Tran
Abhinav Shrivastava
52
114
0
17 Jun 2021
$C^3$: Compositional Counterfactual Contrastive Learning for
  Video-grounded Dialogues
C3C^3C3: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
62
2
0
16 Jun 2021
How Modular Should Neural Module Networks Be for Systematic
  Generalization?
How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D’Amario
Tomotake Sasaki
Xavier Boix
69
17
0
15 Jun 2021
Vision-Language Navigation with Random Environmental Mixup
Vision-Language Navigation with Random Environmental Mixup
Chong Liu
Fengda Zhu
Xiaojun Chang
Xiaodan Liang
Zongyuan Ge
Yi-Dong Shen
LM&Ro
135
88
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFinMQAI4MH
179
864
0
14 Jun 2021
Step-Wise Hierarchical Alignment Network for Image-Text Matching
Step-Wise Hierarchical Alignment Network for Image-Text Matching
Zhong Ji
Kexin Chen
Haoran Wang
83
95
0
11 Jun 2021
NAAQA: A Neural Architecture for Acoustic Question Answering
NAAQA: A Neural Architecture for Acoustic Question Answering
Jerome Abdelnour
Jean Rouat
G. Salvi
92
4
0
11 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
79
11
0
10 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute
  Extraction
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
88
36
0
08 Jun 2021
Check It Again: Progressive Visual Question Answering via Visual
  Entailment
Check It Again: Progressive Visual Question Answering via Visual Entailment
Q. Si
Zheng Lin
Mingyu Zheng
Peng Fu
Weiping Wang
79
48
0
08 Jun 2021
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused
  Interventions
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg
Itai Gat
Amir Feder
Roi Reichart
AAML
91
16
0
08 Jun 2021
Conversational Fashion Image Retrieval via Multiturn Natural Language
  Feedback
Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback
Yifei Yuan
W. Lam
49
43
0
08 Jun 2021
Human-Adversarial Visual Question Answering
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OODEgoVAAML
58
63
0
04 Jun 2021
Visual Question Rewriting for Increasing Response Rate
Visual Question Rewriting for Increasing Response Rate
Jiayi Wei
Xilian Li
Yi Zhang
Xin Eric Wang
56
3
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual
  Learning
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
116
119
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of
  the state of the art
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
93
45
0
03 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA
  Models
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAMLVLM
116
75
0
01 Jun 2021
Volta at SemEval-2021 Task 6: Towards Detecting Persuasive Texts and
  Images using Textual and Multimodal Ensemble
Volta at SemEval-2021 Task 6: Towards Detecting Persuasive Texts and Images using Textual and Multimodal Ensemble
Kshitij Gupta
Devansh Gautam
R. Mamidi
57
15
0
01 Jun 2021
Transfer Learning for Sequence Generation: from Single-source to
  Multi-source
Transfer Learning for Sequence Generation: from Single-source to Multi-source
Xuancheng Huang
Jingfang Xu
Maosong Sun
Yang Liu
61
5
0
31 May 2021
Longer Version for "Deep Context-Encoding Network for Retinal Image
  Captioning"
Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"
Jia-Hong Huang
Ting-Wei Wu
Chao-Han Huck Yang
Marcel Worring
MedIm
66
29
0
30 May 2021
LPF: A Language-Prior Feedback Objective Function for De-biased Visual
  Question Answering
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering
Zujie Liang
Haifeng Hu
Jiaying Zhu
99
38
0
29 May 2021
Maria: A Visual Experience Powered Conversational Agent
Maria: A Visual Experience Powered Conversational Agent
Zujie Liang
Huang Hu
Can Xu
Chongyang Tao
Xiubo Geng
Yining Chen
Fan Liang
Daxin Jiang
91
32
0
27 May 2021
CrystalCandle: A User-Facing Model Explainer for Narrative Explanations
CrystalCandle: A User-Facing Model Explainer for Narrative Explanations
Jilei Yang
Diana M. Negoescu
P. Ahammad
37
1
0
27 May 2021
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
S. McCrae
Kehan Wang
A. Zakhor
60
15
0
26 May 2021
CARLS: Cross-platform Asynchronous Representation Learning System
CARLS: Cross-platform Asynchronous Representation Learning System
Chun-Ta Lu
Yun Zeng
Da-Cheng Juan
Yicheng Fan
Zhe Li
...
Ariel Fuxman
Futang Peng
Zhen Li
Tom Duerig
Andrew Tomkins
30
0
0
26 May 2021
What data do we need for training an AV motion planner?
What data do we need for training an AV motion planner?
Long Chen
Lukas Platinsky
Stefanie Speichert
B. Osinski
Oliver Scheel
Yawei Ye
Hugo Grimmett
Luca Del Pero
Peter Ondruska
62
13
0
26 May 2021
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese
  Spell Checking
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking
Heng-Da Xu
Zhongli Li
Qingyu Zhou
Chao Li
Zizhen Wang
Yunbo Cao
Heyan Huang
Xian-Ling Mao
98
97
0
26 May 2021
ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction
  Detection in Videos
ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos
Meng-Jiun Chiou
Chun-Yu Liao
Li-Wei Wang
Roger Zimmermann
Jiashi Feng
107
27
0
25 May 2021
Multi-modal Understanding and Generation for Medical Images and Text via
  Vision-Language Pre-Training
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
Edward Choi
MedIm
110
161
0
24 May 2021
Human-centric Relation Segmentation: Dataset and Solution
Human-centric Relation Segmentation: Dataset and Solution
Si Liu
Zitian Wang
Yulu Gao
Lejian Ren
Yue Liao
Guanghui Ren
Bo Li
Shuicheng Yan
43
12
0
24 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
237
59
0
24 May 2021
Geographic Question Answering: Challenges, Uniqueness, Classification,
  and Future Directions
Geographic Question Answering: Challenges, Uniqueness, Classification, and Future Directions
Gengchen Mai
K. Janowicz
Rui Zhu
Ling Cai
Ni Lao
61
62
0
19 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
187
507
0
18 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
138
142
0
17 May 2021
Show Why the Answer is Correct! Towards Explainable AI using
  Compositional Temporal Attention
Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention
Nihar Bendre
K. Desai
Peyman Najafirad
CoGe
74
6
0
15 May 2021
Premise-based Multimodal Reasoning: Conditional Inference on Joint
  Textual and Visual Clues
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
Qingxiu Dong
Ziwei Qin
Heming Xia
Tian Feng
Shoujie Tong
...
Weidong Zhan
Sujian Li
Zhongyu Wei
Tianyu Liu
Zuifang Sui
LRM
64
11
0
15 May 2021
Conversational AI Systems for Social Good: Opportunities and Challenges
Conversational AI Systems for Social Good: Opportunities and Challenges
Peng Qi
Jing Huang
Youzheng Wu
Xiaodong He
Bowen Zhou
86
5
0
13 May 2021
Designing Multimodal Datasets for NLP Challenges
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
56
5
0
12 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language
  Matching
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
86
4
0
12 May 2021
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
Yikang Shen
Chun-Fu Chen
Quanfu Fan
Ximeng Sun
Kate Saenko
A. Oliva
Rogerio Feris
97
50
0
11 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
78
11
0
11 May 2021
gComm: An environment for investigating generalization in Grounded
  Language Acquisition
gComm: An environment for investigating generalization in Grounded Language Acquisition
Rishi Hazra
Sonu Dixit
75
0
0
09 May 2021
e-ViL: A Dataset and Benchmark for Natural Language Explanations in
  Vision-Language Tasks
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
Maxime Kayser
Oana-Maria Camburu
Leonard Salewski
Cornelius Emde
Virginie Do
Zeynep Akata
Thomas Lukasiewicz
VLM
114
101
0
08 May 2021
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Feng Ji
Ji Zhang
A. Bimbo
71
35
0
05 May 2021
LFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual Explanation
Kwang Hee Lee
Chaewon Park
J. Oh
Nojun Kwak
FAtt
100
28
0
03 May 2021
A survey on VQA_Datasets and Approaches
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
81
18
0
02 May 2021
Discover the Unknown Biased Attribute of an Image Classifier
Discover the Unknown Biased Attribute of an Image Classifier
Zhiheng Li
Chenliang Xu
88
50
0
29 Apr 2021
Comparing Visual Reasoning in Humans and AI
Comparing Visual Reasoning in Humans and AI
Shravan Murlidaran
Wenjie Wang
Miguel P. Eckstein
63
1
0
29 Apr 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual
  Explanations
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
97
19
0
29 Apr 2021
Contextualized Keyword Representations for Multi-modal Retinal Image
  Captioning
Contextualized Keyword Representations for Multi-modal Retinal Image Captioning
Jia-Hong Huang
Ting-Wei Wu
Marcel Worring
MedIm
125
27
0
26 Apr 2021
Previous
123...353637...585960
Next