OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

31 May 2019
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi

Papers citing "OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge"

50 / 781 papers shown
K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen
Chunyuan Li
Xiaowei Hu
Jianwei Yang
Yujia Xie
...
Ce Liu
Kurt Keutzer
Trevor Darrell
Anna Rohrbach
Jianfeng Gao
CLIP, VLM
70
85
0
20 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
86
152
0
19 Apr 2022
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
Shunyu Zhang
X. Jiang
Zequn Yang
T. Wan
Zengchang Qin
60
12
0
10 Apr 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
58
64
0
17 Mar 2022
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara
Tatsuya Harada
98
10
0
15 Mar 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
90
94
0
12 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
89
18
0
11 Mar 2022
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
Nan Ding
Xi Chen
Tomer Levinboim
Soravit Changpinyo
Radu Soricut
79
29
0
10 Mar 2022
HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
Zhengkun Zhang
Wenya Guo
Xiaojun Meng
Yasheng Wang
Yadao Wang
Xin Jiang
Qun Liu
Zhenglu Yang
80
17
0
08 Mar 2022
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering
Mingxiao Li
Marie-Francine Moens
82
13
0
06 Mar 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
67
17
0
25 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM, HAI
73
101
0
18 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
73
167
0
11 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
219
51
0
10 Feb 2022
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Jiawen Zhang
Abhijit Mishra
Avinesh P.V.S
Siddharth Patwardhan
Sachin Agarwal
70
0
0
09 Feb 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
141
7
0
08 Feb 2022
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao
Q. Ping
Govind Thattai
Aishwarya N. Reganti
Yingting Wu
Premkumar Natarajan
63
17
0
14 Jan 2022
Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey
Jiaoyan Chen
Yuxia Geng
Zhuo Chen
Jeff Z. Pan
Yuan He
Wen Zhang
Ian Horrocks
Hua-zeng Chen
119
49
0
18 Dec 2021
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui
Borui Wang
Qiuyuan Huang
Alexander G. Hauptmann
Yonatan Bisk
Jianfeng Gao
75
162
0
16 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
94
49
0
15 Dec 2021
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection
Diego Garcia-Olano
Yasumasa Onoe
Joydeep Ghosh
69
18
0
13 Dec 2021
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg
Sid Black
Samuel Weinbach
Letitia Parcalabescu
Anette Frank
MLLM, VLM
65
101
0
09 Dec 2021
Medical Visual Question Answering: A Survey
Zhihong Lin
Donghao Zhang
Qingyi Tao
Danli Shi
Gholamreza Haffari
Qi Wu
M. He
Z. Ge
112
122
0
19 Nov 2021
Language bias in Visual Question Answering: A Survey and Taxonomy
Desen Yuan
86
13
0
16 Nov 2021
Transferring Domain-Agnostic Knowledge in Video Question Answering
Tianran Wu
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Haruo Takemura
53
8
0
26 Oct 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
60
7
0
18 Oct 2021
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
VLM, VPVLM, MLLM
117
138
0
16 Oct 2021
MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants
Alkesh Patel
Joel Ruben Antony Moniz
R. Nguyen
Nicholas Tzou
Hadas Kotek
Vincent Renkens
VGen
23
1
0
13 Oct 2021
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Vath
Pascal Tilli
Ngoc Thang Vu
72
4
0
11 Oct 2021
Coarse-to-Fine Reasoning for Visual Question Answering
Binh X. Nguyen
Tuong Khanh Long Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
NAI
134
40
0
06 Oct 2021
Knowledge-based Embodied Question Answering
Sinan Tan
Mengmeng Ge
Di Guo
Huaping Liu
F. Sun
96
23
0
16 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
92
61
0
15 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
269
423
0
10 Sep 2021
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering
Man Luo
Yankai Zeng
Pratyay Banerjee
Chitta Baral
RALM
131
66
0
09 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
83
19
0
04 Sep 2021
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
78
87
0
01 Sep 2021
EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA
Arka Ujjal Dey
Ernest Valveny
Gaurav Harit
43
3
0
22 Aug 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava
Chengshu Li
Michael Lingelbach
Roberto Martín-Martín
Fei Xia
...
Chenxi Liu
Silvio Savarese
H. Gweon
Jiajun Wu
Li Fei-Fei
LM&Ro
251
168
0
06 Aug 2021
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
Qi Wu
Cheng-Ju Wu
Yixin Zhu
Jungseock Joo
97
14
0
05 Aug 2021
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
87
70
0
12 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
197
791
0
25 Jun 2021
NAAQA: A Neural Architecture for Acoustic Question Answering
Jerome Abdelnour
Jean Rouat
G. Salvi
92
4
0
11 Jun 2021
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD, EgoV, AAML
55
63
0
04 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
229
59
0
24 May 2021
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Feng Ji
Ji Zhang
A. Bimbo
71
35
0
05 May 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
81
18
0
02 May 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
147
30
0
16 Apr 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
164
146
0
23 Mar 2021
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Aman Jain
Mayank Kothyari
Vishwajeet Kumar
Preethi Jyothi
Ganesh Ramakrishnan
Soumen Chakrabarti
68
36
0
09 Mar 2021
Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users
Ruolin Wang
Zixuan Chen
Mingrui Ray Zhang
Zhaoheng Li
Zhixiu Liu
Zihan Dang
Chun Yu
Xiang Anthony Chen
OnRL
79
31
0
01 Feb 2021