VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
Georgios Paraskevopoulos
Efthymios Georgiou
Alexandros Potamianos
70
27
0
24 Jan 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
116
3
0
24 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP, VLM
83
40
0
15 Jan 2022
Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks
Yuqi Wang
Xu-Yao Zhang
Cheng-Lin Liu
Zhaoxiang Zhang
57
2
0
14 Jan 2022
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao
Q. Ping
Govind Thattai
Aishwarya N. Reganti
Yingting Wu
Premkumar Natarajan
74
17
0
14 Jan 2022
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
Yehao Li
Jiahao Fan
Yingwei Pan
Ting Yao
Weiyao Lin
Tao Mei
MLLM, ObjD
81
19
0
11 Jan 2022
On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering
Ankur Sikarwar
Gabriel Kreiman
ViT
48
1
0
11 Jan 2022
Prior Knowledge Enhances Radiology Report Generation
Song Wang
Liyan Tang
Mingquan Lin
George Shih
Ying Ding
Yifan Peng
MedIm
67
24
0
11 Jan 2022
Language-driven Semantic Segmentation
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
153
629
0
10 Jan 2022
COIN: Counterfactual Image Generation for VQA Interpretation
Zeyd Boukhers
Timo Hartmann
Jan Jurjens
49
7
0
10 Jan 2022
Contrastive Neighborhood Alignment
Pengkai Zhu
Zhaowei Cai
Yuanjun Xiong
Zhuowen Tu
Luis Goncalves
Vijay Mahadevan
Stefano Soatto
49
3
0
06 Jan 2022
Multi Document Reading Comprehension
Avi Chawla
96
0
0
05 Jan 2022
Semantically Grounded Visual Embeddings for Zero-Shot Learning
Shah Nawaz
Jacopo Cavazza
Alessio Del Bue
ObjD, FedML, VLM
105
3
0
03 Jan 2022
OpenQA: Hybrid QA System Relying on Structured Knowledge Base as well as Non-structured Data
Gaochen Wu
Bin Xu
Yuxin Qin
Yang Liu
Lingyu Liu
Ziwei Wang
91
0
0
31 Dec 2021
VisRecall: Quantifying Information Visualisation Recallability via Question Answering
Yao Wang
Chuhan Jiao
Mihai Bâce
Andreas Bulling
145
5
0
30 Dec 2021
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
Sedigheh Eslami
Gerard de Melo
Christoph Meinel
CLIP, MedIm
84
121
0
27 Dec 2021
Multi-Image Visual Question Answering
Harsh Raj
Janhavi Dadhania
Akhilesh Bhardwaj
Prabuchandran KJ
40
2
0
27 Dec 2021
Understanding and Measuring Robustness of Multimodal Learning
Nishant Vishwamitra
Hongxin Hu
Ziming Zhao
Long Cheng
Feng Luo
AAML
86
5
0
22 Dec 2021
A Survey of Natural Language Generation
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
Mengzhao Chen
Junxin Li
Ying Shen
Min Yang
3DV
89
45
0
22 Dec 2021
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation
Xu Yan
Zhihao Yuan
Yuhao Du
Yinghong Liao
Yao Guo
Zhen Li
Shuguang Cui
3DPC, CoGe
62
17
0
22 Dec 2021
Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization
Md Tahmid Rahman Laskar
Enamul Hoque
J. Huang
95
45
0
22 Dec 2021
Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions
Shahin Atakishiyev
Mohammad Salameh
Hengshuai Yao
Randy Goebel
114
138
0
21 Dec 2021
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Manling Li
Xudong Lin
Haoyang Wen
...
Joey Tianyi Zhou
Avirup Sil
Shih-Fu Chang
Alex Schwing
Heng Ji
82
32
0
20 Dec 2021
General Greedy De-bias Learning
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Qi Tian
109
9
0
20 Dec 2021
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
115
208
0
20 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM, CLIP
53
34
0
17 Dec 2021
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen
Zi-Quan Hong
Wenjin Hou
Guosen Xie
Yibing Song
Jian-jun Zhao
Xinge You
Shuicheng Yan
Ling Shao
ViT
101
47
0
16 Dec 2021
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui
Borui Wang
Qiuyuan Huang
Alexander G. Hauptmann
Yonatan Bisk
Jianfeng Gao
88
162
0
16 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM, LRM
69
33
0
16 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
94
49
0
15 Dec 2021
Dual-Key Multimodal Backdoors for Visual Question Answering
Matthew Walmer
Karan Sikka
Indranil Sur
Abhinav Shrivastava
Susmit Jha
AAML
78
38
0
14 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
75
21
0
14 Dec 2021
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision
A. Shrivastava
Ramprasaath R. Selvaraju
Nikhil Naik
Vicente Ordonez
VLM, CLIP
66
6
0
14 Dec 2021
ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition
Xinyu Wang
Min Gui
Yong Jiang
Zixia Jia
Nguyen Bach
Tao Wang
Zhongqiang Huang
Fei Huang
Kewei Tu
97
55
0
13 Dec 2021
Change Detection Meets Visual Question Answering
Zhenghang Yuan
Lichao Mou
Zhitong Xiong
Xiaoxiang Zhu
80
48
0
12 Dec 2021
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Karl Lowenmark
C. Taal
S. Schnabel
Marcus Liwicki
Fredrik Sandin
52
7
0
11 Dec 2021
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg
Sid Black
Samuel Weinbach
Letitia Parcalabescu
Anette Frank
MLLM, VLM
74
101
0
09 Dec 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Yining Hong
Li Yi
J. Tenenbaum
Antonio Torralba
Chuang Gan
76
40
0
09 Dec 2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yi-Liang Nie
Linjie Li
Zhe Gan
Shuohang Wang
Chenguang Zhu
Michael Zeng
Zicheng Liu
Joey Tianyi Zhou
Lijuan Wang
66
6
0
08 Dec 2021
Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Hariom A. Pandya
Brijesh S. Bhatt
85
27
0
07 Dec 2021
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
Fangzhi Xu
Qika Lin
Jing Liu
Lingling Zhang
Tianzhe Zhao
Qianyi Chai
Yudai Pan
55
2
0
06 Dec 2021
Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction
Yikai Wang
Gang Hua
Wenbing Huang
Fengxiang He
Dacheng Tao
104
31
0
04 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM, CLIP
230
584
0
02 Dec 2021
FNR: A Similarity and Transformer-Based Approach to Detect Multi-Modal Fake News in Social Media
Faeze Ghorbanpour
Maryam Ramezani
M. A. Fazli
Hamid R. Rabiee
47
22
0
02 Dec 2021
Consensus Graph Representation Learning for Better Grounded Image Captioning
Wenqiao Zhang
Haochen Shi
Siliang Tang
Jun Xiao
Qiang Yu
Yueting Zhuang
81
56
0
02 Dec 2021
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
Christopher Clark
Jordi Salvador
Dustin Schwenk
Derrick Bonafilia
Mark Yatskar
...
Aaron Sarnat
Hannaneh Hajishirzi
Aniruddha Kembhavi
Oren Etzioni
Ali Farhadi
MLLM
57
5
0
01 Dec 2021
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei
Difei Gao
Yuxuan Wang
Dongxing Mao
Zihan Liang
L. Ran
Mike Zheng Shou
69
8
0
30 Nov 2021
Classification-Regression for Chart Comprehension
Matan Levy
Rami Ben-Ari
Dani Lischinski
67
16
0
29 Nov 2021
A Simple Long-Tailed Recognition Baseline via Vision-Language Model
Teli Ma
Shijie Geng
Mengmeng Wang
Jing Shao
Jiasen Lu
Hongsheng Li
Peng Gao
Yu Qiao
VLM
120
47
0
29 Nov 2021
LAFITE: Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou
Ruiyi Zhang
Changyou Chen
Chunyuan Li
Chris Tensmeyer
Tong Yu
Jiuxiang Gu
Jinhui Xu
Tong Sun
VLM
103
168
0
27 Nov 2021