ResearchTrend.AI
VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Asking questions on handwritten document collections
Minesh Mathew
Lluís Gómez
Dimosthenis Karatzas
C. V. Jawahar
RALM
130
11
0
02 Oct 2021
Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations
Arjun Srinivasan
Nikhila Nyapathy
Bongshin Lee
Steven Drucker
J. Stasko
108
75
0
01 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
Zhuowan Li
Elias Stengel-Eskin
Yixiao Zhang
Cihang Xie
Q. Tran
Benjamin Van Durme
Alan Yuille
VLM
73
15
0
01 Oct 2021
HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning
Shiming Chen
Guosen Xie
Yang Liu
Qinmu Peng
Baigui Sun
Hao Li
Xinge You
Ling Shao
146
128
0
30 Sep 2021
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
173
180
0
28 Sep 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
66
17
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
67
19
0
27 Sep 2021
The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service
Nan Zhao
Haoran Li
Youzheng Wu
Xiaodong He
Bowen Zhou
50
9
0
27 Sep 2021
An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog
Xingyao Wang
David Jurgens
68
5
0
24 Sep 2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis
Sylvain Lobry
B. Kellenberger
Bertrand Le Saux
D. Tuia
87
20
0
24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
304
224
0
24 Sep 2021
Caption Enriched Samples for Improving Hateful Memes Detection
Efrat Blaier
Itzik Malkiel
Lior Wolf
VLM
96
24
0
22 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin
Shivanshu Gupta
Matt Gardner
Jonathan Berant
CoGe
105
29
0
22 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
110
29
0
22 Sep 2021
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval
Zhijian Hou
Chong-Wah Ngo
W. Chan
71
44
0
21 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
123
47
0
16 Sep 2021
Knowledge-based Embodied Question Answering
Sinan Tan
Mengmeng Ge
Di Guo
Huaping Liu
F. Sun
96
23
0
16 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
101
61
0
15 Sep 2021
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Da Yin
Liunian Harold Li
Ziniu Hu
Nanyun Peng
Kai-Wei Chang
159
56
0
14 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
116
20
0
13 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
299
423
0
10 Sep 2021
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
146
23
0
10 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
66
0
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
57
14
0
10 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
33
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
118
38
0
09 Sep 2021
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering
Man Luo
Yankai Zeng
Pratyay Banerjee
Chitta Baral
RALM
131
66
0
09 Sep 2021
Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs
Zhepeng Wang
Zhiding Liang
Shangli Zhou
Caiwen Ding
Yiyu Shi
Weiwen Jiang
121
33
0
08 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu
Wenliang Dai
Zihan Liu
Pascale Fung
105
74
0
06 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
77
1
0
06 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
85
19
0
04 Sep 2021
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
91
87
0
01 Sep 2021
Spatio-Temporal Perturbations for Video Attribution
Zhenqiang Li
Weimin Wang
Zuoyue Li
Yifei Huang
Yoichi Sato
60
6
0
01 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
73
19
0
01 Sep 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
77
0
0
28 Aug 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
87
19
0
28 Aug 2021
Vision-Language Navigation: A Survey and Taxonomy
Wansen Wu
Tao Chang
Xinmeng Li
LM&Ro
73
24
0
26 Aug 2021
OOWL500: Overcoming Dataset Collection Bias in the Wild
Brandon Leung
Chih-Hui Ho
Amir Persekian
David Orozco
Yen Chang
Erik Sandström
Bo Liu
Nuno Vasconcelos
65
3
0
24 Aug 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
117
37
0
24 Aug 2021
Maximum Likelihood Estimation for Multimodal Learning with Missing Modality
Fei Ma
Xiangxiang Xu
Shao-Lun Huang
Lin Zhang
137
11
0
24 Aug 2021
Embodied AI-Driven Operation of Smart Cities: A Concise Review
Farzan Shenavarmasouleh
F. Mohammadi
M. Amini
H. Arabnia
94
8
0
22 Aug 2021
EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA
Arka Ujjal Dey
Ernest Valveny
Gaurav Harit
43
3
0
22 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
62
10
0
21 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
59
144
0
20 Aug 2021
Semantic Compositional Learning for Low-shot Scene Graph Generation
Tao He
Lianli Gao
Jingkuan Song
Jianfei Cai
Yuan-Fang Li
CoGe
88
8
0
19 Aug 2021
Social Fabric: Tubelet Compositions for Video Relation Detection
Shuo Chen
Zenglin Shi
Pascal Mettes
Cees G. M. Snoek
ViT
83
21
0
18 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
88
31
0
18 Aug 2021
A Game Interface to Study Semantic Grounding in Text-Based Models
Timothee Mickus
Mathieu Constant
Denis Paperno
18
0
0
17 Aug 2021
Indoor Semantic Scene Understanding using Multi-modality Fusion
Muraleekrishna Gopinathan
Giang Truong
Jumana Abu-Khalaf
56
0
0
17 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
77
56
0
16 Aug 2021