ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
Not All Relations are Equal: Mining Informative Labels for Scene Graph
  Generation
Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
108
33
0
26 Nov 2021
Confounder Identification-free Causal Visual Feature Learning
Confounder Identification-free Causal Visual Feature Learning
Xin Li
Zhizheng Zhang
Guoqiang Wei
Cuiling Lan
Wenjun Zeng
Xin Jin
Zhibo Chen
CMLOOD
106
14
0
26 Nov 2021
Two-stage Rule-induction Visual Reasoning on RPMs with an Application to
  Video Prediction
Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction
Wentao He
Jianfeng Ren
Ruibin Bai
Xudong Jiang
LRM
70
5
0
24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language
  Modeling
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
146
117
0
23 Nov 2021
RedCaps: web-curated image-text data created by the people, for the
  people
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
137
169
0
22 Nov 2021
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Arthur Douillard
Alexandre Ramé
Guillaume Couairon
Matthieu Cord
CLL
149
314
0
22 Nov 2021
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating
  Visio-Linguistic Reasoning
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
Keng Ji Chow
Samson Tan
MingSung Kan
LRM
65
4
0
21 Nov 2021
Medical Visual Question Answering: A Survey
Medical Visual Question Answering: A Survey
Zhihong Lin
Donghao Zhang
Qingyi Tao
Danli Shi
Gholamreza Haffari
Qi Wu
M. He
Z. Ge
114
122
0
19 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained
  Embedding Matching
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
82
33
0
17 Nov 2021
Achieving Human Parity on Visual Question Answering
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
63
13
0
17 Nov 2021
Language bias in Visual Question Answering: A Survey and Taxonomy
Language bias in Visual Question Answering: A Survey and Taxonomy
Desen Yuan
103
13
0
16 Nov 2021
Sentiment Analysis of Fashion Related Posts in Social Media
Sentiment Analysis of Fashion Related Posts in Social Media
Yifei Yuan
W. Lam
70
8
0
15 Nov 2021
Visual Intelligence through Human Interaction
Visual Intelligence through Human Interaction
Ranjay Krishna
Mitchell L. Gordon
Fei-Fei Li
Michael S. Bernstein
74
8
0
12 Nov 2021
Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
  Distributions
Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions
Huan Ma
Zongbo Han
Changqing Zhang
Huazhu Fu
Qiufeng Wang
Q. Hu
EDLUQCV
138
44
0
11 Nov 2021
A Survey of Visual Transformers
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGSViT
203
356
0
11 Nov 2021
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
Jiangchao Yao
Shengyu Zhang
Yang Yao
Feng Wang
Jianxin Ma
...
Kun Kuang
Chao-Xiang Wu
Leilei Gan
Jingren Zhou
Hongxia Yang
167
103
0
11 Nov 2021
ICDAR 2021 Competition on Document VisualQuestion Answering
ICDAR 2021 Competition on Document VisualQuestion Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
86
23
0
10 Nov 2021
Visual Question Answering based on Formal Logic
Visual Question Answering based on Formal Logic
Muralikrishnna G. Sethuraman
Ali Payani
Faramarz Fekri
J. C. Kerce
NAI
28
3
0
08 Nov 2021
NarrationBot and InfoBot: A Hybrid System for Automated Video Description
Shasta Ihorn
Y. Siu
Aditya Bodi
Lothar D Narins
Jose M. Castanon
Yash Kant
Abhishek Das
Ilmi Yoon
Pooyan Fazli
46
3
0
07 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language
  Modeling
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
292
405
0
06 Nov 2021
Medicines Question Answering System, MeQA
Medicines Question Answering System, MeQA
Jesús Santamaría
8
0
0
04 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language
  Transformers
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
106
381
0
03 Nov 2021
Revisiting spatio-temporal layouts for compositional action recognition
Revisiting spatio-temporal layouts for compositional action recognition
Gorjan Radevski
Marie-Francine Moens
Tinne Tuytelaars
104
26
0
02 Nov 2021
Introspective Distillation for Robust Question Answering
Introspective Distillation for Robust Question Answering
Yulei Niu
Hanwang Zhang
94
60
0
01 Nov 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from
  Video and Language
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding
Zhenfang Chen
Tao Du
Ping Luo
J. Tenenbaum
Chuang Gan
VGenPINNOCL
103
75
0
28 Oct 2021
Perceptual Score: What Data Modalities Does Your Model Perceive?
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat
Idan Schwartz
Alex Schwing
99
32
0
27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual
  Language Reasoning
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
173
206
0
25 Oct 2021
Instance-Conditional Knowledge Distillation for Object Detection
Instance-Conditional Knowledge Distillation for Object Detection
Zijian Kang
Peizhen Zhang
Xinming Zhang
Jian Sun
N. Zheng
100
79
0
25 Oct 2021
Challenges in Procedural Multimodal Machine Comprehension:A Novel Way To
  Benchmark
Challenges in Procedural Multimodal Machine Comprehension:A Novel Way To Benchmark
Pritish Sahu
Karan Sikka
Ajay Divakaran
47
1
0
22 Oct 2021
Simple Dialogue System with AUDITED
Simple Dialogue System with AUDITED
Eugenio Clerico
Piotr Koniusz
75
2
0
22 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
63
12
0
21 Oct 2021
Evaluating and Improving Interactions with Hazy Oracles
Evaluating and Improving Interactions with Hazy Oracles
Stephan J. Lemmer
Jason J. Corso
41
2
0
19 Oct 2021
Unifying Multimodal Transformer for Bi-directional Image and Text
  Generation
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
81
59
0
19 Oct 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
98
19
0
17 Oct 2021
Explore before Moving: A Feasible Path Estimation and Memory Recalling
  Framework for Embodied Navigation
Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation
Yang Wu
Shirui Feng
Guanbin Li
Liang Lin
23
0
0
16 Oct 2021
Guiding Visual Question Generation
Guiding Visual Question Generation
Nihir Vedd
Zixu Wang
Marek Rei
Yishu Miao
Lucia Specia
140
22
0
15 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
434
1,115
0
13 Oct 2021
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual
  Transformers with Joint Student-Teacher Learning
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning
Ankit Parag Shah
Shijie Geng
Peng Gao
A. Cherian
Takaaki Hori
Tim K. Marks
Jonathan Le Roux
Chiori Hori
68
24
0
13 Oct 2021
Improving Users' Mental Model with Attention-directed Counterfactual
  Edits
Improving Users' Mental Model with Attention-directed Counterfactual Edits
Kamran Alipour
Arijit Ray
Xiaoyu Lin
Michael Cogswell
J. Schulze
Yi Yao
Giedrius Burachas
OOD
61
9
0
13 Oct 2021
Topic Scene Graph Generation by Attention Distillation from Caption
Topic Scene Graph Generation by Attention Distillation from Caption
Wenbin Wang
R. Wang
X. Chen
DiffM
94
14
0
12 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive
  Language-Image Pre-training Paradigm
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLMCLIP
167
458
0
11 Oct 2021
Beyond Accuracy: A Consolidated Tool for Visual Question Answering
  Benchmarking
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Vath
Pascal Tilli
Ngoc Thang Vu
82
4
0
11 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$
  Videos
Pano-AVQA: Grounded Audio-Visual Question Answering on 360∘^\circ∘ Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
100
86
0
11 Oct 2021
Calling to CNN-LSTM for Rumor Detection: A Deep Multi-channel Model for
  Message Veracity Classification in Microblogs
Calling to CNN-LSTM for Rumor Detection: A Deep Multi-channel Model for Message Veracity Classification in Microblogs
Abderrazek Azri
Cécile Favre
Nouria Harbi
Jérôme Darmont
C. Noûs
57
11
0
11 Oct 2021
Braxlines: Fast and Interactive Toolkit for RL-driven Behavior
  Engineering beyond Reward Maximization
Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization
S. Gu
Manfred Diaz
Daniel Freeman
Hiroki Furuta
Seyed Kamyar Seyed Ghasemipour
Anton Raichuk
Byron David
Erik Frey
Erwin Coumans
Olivier Bachem
82
14
0
10 Oct 2021
Efficient Multi-Modal Embeddings from Structured Data
Efficient Multi-Modal Embeddings from Structured Data
A. Vero
Ann A. Copestake
35
4
0
06 Oct 2021
Coarse-to-Fine Reasoning for Visual Question Answering
Coarse-to-Fine Reasoning for Visual Question Answering
Binh X. Nguyen
Tuong Khanh Long Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
NAI
138
40
0
06 Oct 2021
Word Acquisition in Neural Language Models
Word Acquisition in Neural Language Models
Tyler A. Chang
Benjamin Bergen
90
40
0
05 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual
  Question Answering
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAMLOOD
119
37
0
03 Oct 2021
ProTo: Program-Guided Transformer for Program-Guided Tasks
ProTo: Program-Guided Transformer for Program-Guided Tasks
Zelin Zhao
Karan Samel
Binghong Chen
Le Song
ViTLM&Ro
98
30
0
02 Oct 2021
Previous
123...323334...585960
Next