ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
Vision and Structured-Language Pretraining for Cross-Modal Food
  Retrieval
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Mustafa Shukor
Nicolas Thome
Matthieu Cord
CLIPCoGe
97
9
0
08 Dec 2022
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level
  Natural Language Explanations
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
83
6
0
08 Dec 2022
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal
  Dialogue Dataset
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Jonghwan Hyeon
Ho-Jin Choi
91
8
0
08 Dec 2022
Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning
Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning
Yuyang Gao
Siyi Gu
Junji Jiang
S. Hong
Dazhou Yu
Liang Zhao
76
42
0
07 Dec 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Hao Li
Yizhi Zhang
Junzhe Zhu
Shaoxiong Wang
Michelle A. Lee
Huazhe Xu
Edward H. Adelson
Li Fei-Fei
Ruohan Gao
Jiajun Wu
77
65
0
07 Dec 2022
Named Entity and Relation Extraction with Multi-Modal Retrieval
Named Entity and Relation Extraction with Multi-Modal Retrieval
Xinyu Wang
Jiong Cai
Yong Jiang
Pengjun Xie
Kewei Tu
Wei Lu
90
52
0
03 Dec 2022
Generalizing Multiple Object Tracking to Unseen Domains by Introducing
  Natural Language Representation
Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation
En Yu
Songtao Liu
Zhuoling Li
Jinrong Yang
Zeming Li
Shoudong Han
Wenbing Tao
110
13
0
03 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation
  Learning
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
75
2
0
02 Dec 2022
What do you MEME? Generating Explanations for Visual Semantic Role
  Labelling in Memes
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes
Shivam Sharma
Siddhant Agarwal
Tharun Suresh
Preslav Nakov
Md. Shad Akhtar
Tanmoy Charkraborty
VLM
111
22
0
01 Dec 2022
Localization vs. Semantics: Visual Representations in Unimodal and
  Multimodal Models
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
Zhuowan Li
Cihang Xie
Benjamin Van Durme
Alan Yuille
VLMSSL
56
2
0
01 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual
  Reasoning
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Max Planck Institute for Informatics
OODLRM
112
70
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph
  Riddles
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
68
12
0
29 Nov 2022
Abstract Visual Reasoning with Tangram Shapes
Abstract Visual Reasoning with Tangram Shapes
Anya Ji
Noriyuki Kojima
N. Rush
Alane Suhr
Wai Keen Vong
Robert D. Hawkins
Yoav Artzi
LRM
77
40
0
29 Nov 2022
PiggyBack: Pretrained Visual Question Answering Environment for Backing
  up Non-deep Learning Professionals
PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals
Zhihao Zhang
Siwen Luo
Junyi Chen
Sijia Lai
Siqu Long
Hyunsuk Chung
S. Han
46
1
0
29 Nov 2022
SLAN: Self-Locator Aided Network for Cross-Modal Understanding
SLAN: Self-Locator Aided Network for Cross-Modal Understanding
Jiang-Tian Zhai
Qi Zhang
Tong Wu
Xinghan Chen
Jiangjiang Liu
Bo Ren
Ming-Ming Cheng
ObjDVLM
70
1
0
28 Nov 2022
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
Yixuan Wang
Wen-gang Zhou
Jianmin Bao
Weilun Wang
Li Li
Houqiang Li
GANCLIP
68
6
0
28 Nov 2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for
  3D Visual Grounding
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Eslam Mohamed Bakr
Yasmeen Alsaedy
Mohamed Elhoseiny
3DPC
92
44
0
25 Nov 2022
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images
Soumya Jahagirdar
Shankar Gangisetty
Anand Mishra
97
4
0
23 Nov 2022
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
Zihao Wang
Junli Wang
Changjun Jiang
MLLM
67
10
0
23 Nov 2022
Open-vocabulary Attribute Detection
Open-vocabulary Attribute Detection
M. A. Bravo
Sudhanshu Mittal
Simon Ging
Thomas Brox
VLMObjD
92
31
0
23 Nov 2022
Smart Agriculture : A Novel Multilevel Approach for Agricultural Risk
  Assessment over Unstructured Data
Smart Agriculture : A Novel Multilevel Approach for Agricultural Risk Assessment over Unstructured Data
Hasna Najmi
M. Mikram
Maryem Rhanoui
Siham Yousfi
77
0
0
22 Nov 2022
A Short Survey of Systematic Generalization
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
105
1
0
22 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative
  Latent Attention
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
84
9
0
21 Nov 2022
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Qinjie Zheng
Chaoyue Wang
Daqing Liu
Dadong Wang
Dacheng Tao
LRM
66
0
0
21 Nov 2022
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Yunhao Gou
Tom Ko
Hansi Yang
James T. Kwok
Yu Zhang
Mingxuan Wang
VLM
78
11
0
20 Nov 2022
A survey on knowledge-enhanced multimodal learning
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
174
15
0
19 Nov 2022
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual
  Question Answering
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
Yao Zhang
Haokun Chen
A. Frikha
Yezi Yang
Denis Krompass
Gengyuan Zhang
Jindong Gu
Volker Tresp
VLMLRM
88
7
0
19 Nov 2022
Cross-Modal Adapter for Text-Video Retrieval
Cross-Modal Adapter for Text-Video Retrieval
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
136
38
0
17 Nov 2022
Text-Aware Dual Routing Network for Visual Question Answering
Text-Aware Dual Routing Network for Visual Question Answering
Luoqian Jiang
Yifan He
Jian Chen
38
0
0
17 Nov 2022
AlignVE: Visual Entailment Recognition Based on Alignment Relations
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
52
7
0
16 Nov 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
61
48
0
15 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Navigating Connected Memories with a Task-oriented Dialog System
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
51
2
0
15 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
127
106
0
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling
  Approaches
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
95
7
0
15 Nov 2022
Versatile Diffusion: Text, Images and Variations All in One Diffusion
  Model
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
Xingqian Xu
Zhangyang Wang
Eric Zhang
Kai Wang
Humphrey Shi
DiffM
192
198
0
15 Nov 2022
Visually Grounded VQA by Lattice-based Retrieval
Visually Grounded VQA by Lattice-based Retrieval
Daniel Reich
F. Putze
Tanja Schultz
58
2
0
15 Nov 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh
Vicent Chen
Ting-Hao Haung
Lun-Wei Ku
CoGe
115
7
0
14 Nov 2022
Towards Reasoning-Aware Explainable VQA
Towards Reasoning-Aware Explainable VQA
Rakesh Vaideeswaran
Feng Gao
Abhinav Mathur
Govind Thattai
LRM
85
3
0
09 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity
  Recognition
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
62
5
0
08 Nov 2022
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties
  via Video Question Answering
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
129
12
0
07 Nov 2022
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content
  Dilutions
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions
Gaurav Verma
Vishwa Vinay
Ryan A. Rossi
Srijan Kumar
VLMAAML
62
8
0
04 Nov 2022
lilGym: Natural Language Visual Reasoning with Reinforcement Learning
lilGym: Natural Language Visual Reasoning with Reinforcement Learning
Anne Wu
Kianté Brantley
Noriyuki Kojima
Yoav Artzi
ReLMOffRLLRM
132
4
0
03 Nov 2022
Globally Gated Deep Linear Networks
Globally Gated Deep Linear Networks
Qianyi Li
H. Sompolinsky
AI4CE
81
11
0
31 Oct 2022
Generative Negative Text Replay for Continual Vision-Language
  Pretraining
Generative Negative Text Replay for Continual Vision-Language Pretraining
Shipeng Yan
Lanqing Hong
Hang Xu
Jianhua Han
Tinne Tuytelaars
Zhenguo Li
Xuming He
VLMCLLCLIP
79
18
0
31 Oct 2022
Unsupervised Audio-Visual Lecture Segmentation
Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
75
5
0
29 Oct 2022
A Survey on Causal Representation Learning and Future Work for Medical
  Image Analysis
A Survey on Causal Representation Learning and Future Work for Medical Image Analysis
Chang-Tien Lu
OODBDLCMLMedIm
111
0
0
28 Oct 2022
Generalization Differences between End-to-End and Neuro-Symbolic
  Vision-Language Reasoning Systems
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems
Wang Zhu
Jesse Thomason
Robin Jia
VLMOODNAILRM
58
6
0
26 Oct 2022
What's Different between Visual Question Answering for Machine
  "Understanding" Versus for Accessibility?
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao
Kyle Seelman
Kyungjun Lee
Hal Daumé
44
5
0
26 Oct 2022
Multi-Viewpoint and Multi-Evaluation with Felicitous Inductive Bias
  Boost Machine Abstract Reasoning Ability
Multi-Viewpoint and Multi-Evaluation with Felicitous Inductive Bias Boost Machine Abstract Reasoning Ability
Qinglai Wei
Diancheng Chen
Beiming Yuan
79
11
0
26 Oct 2022
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual
  Question Answering
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Q. Si
Yuanxin Liu
Zheng Lin
Peng Fu
Weiping Wang
VLM
120
1
0
26 Oct 2022
Previous
123...242526...585960
Next