
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

31 May 2019
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi

Papers citing "OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge"

50 / 781 papers shown
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
Jiaxian Guo
Junnan Li
Dongxu Li
A. M. H. Tiong
Boyang Albert Li
Dacheng Tao
Steven C. H. Hoi
VLM MLLM
75
118
0
21 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
136
120
0
21 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM VLM
98
96
0
10 Dec 2022
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes
Shivam Sharma
Siddhant Agarwal
Tharun Suresh
Preslav Nakov
Md. Shad Akhtar
Tanmoy Chakraborty
VLM
96
22
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
66
12
0
29 Nov 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
155
15
0
19 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM VLM LRM
153
439
0
18 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
83
18
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
111
106
0
15 Nov 2022
Towards Reasoning-Aware Explainable VQA
Rakesh Vaideeswaran
Feng Gao
Abhinav Mathur
Govind Thattai
LRM
83
3
0
09 Nov 2022
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
124
12
0
07 Nov 2022
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao
Kyle Seelman
Kyungjun Lee
Hal Daumé
41
5
0
26 Oct 2022
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Q. Si
Yuanxin Liu
Zheng Lin
Peng Fu
Weiping Wang
VLM
113
1
0
26 Oct 2022
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
Sahithya Ravi
Aditya Chinchure
Leonid Sigal
Renjie Liao
Vered Shwartz
66
29
0
24 Oct 2022
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
Jialin Wu
Raymond J. Mooney
RALM
131
11
0
18 Oct 2022
MoSE: Modality Split and Ensemble for Multimodal Knowledge Graph Completion
Yu Zhao
Xiangrui Cai
Yike Wu
Haiwei Zhang
Ying Zhang
Guoqing Zhao
Ning Jiang
53
26
0
17 Oct 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
A. M. H. Tiong
Junnan Li
Boyang Albert Li
Silvio Savarese
Guosheng Lin
MLLM
104
109
0
17 Oct 2022
COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti
A. S. Penamakuri
Revant Teotia
Anand Mishra
Shubhashis Sengupta
Roshni Ramnani
ReLM LRM
34
4
0
16 Oct 2022
SQA3D: Situated Question Answering in 3D Scenes
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
91
160
0
14 Oct 2022
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Oscar Manas
Pau Rodríguez López
Saba Ahmadi
Aida Nematzadeh
Yash Goyal
Aishwarya Agrawal
VLM VPVLM
63
51
0
13 Oct 2022
Retrieval Augmented Visual Question Answering with Outside Knowledge
Weizhe Lin
Bill Byrne
RALM
105
77
0
07 Oct 2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
Wenhu Chen
Hexiang Hu
Xi Chen
Pat Verga
William W. Cohen
RALM
100
160
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
105
19
0
05 Oct 2022
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
Guosheng Lin
VLM
192
56
0
15 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM VLM
187
741
0
14 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
105
8
0
12 Sep 2022
Interactive Question Answering Systems: Literature Review
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
109
22
0
04 Sep 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
Stan Weixian Lei
Difei Gao
Jay Zhangjie Wu
Yuxuan Wang
Wei Liu
Meng Zhang
Mike Zheng Shou
71
38
0
24 Aug 2022
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base
Jinyeong Chae
Jihie Kim
52
2
0
27 Jul 2022
LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Zhuo Chen
Yufen Huang
Jiaoyan Chen
Yuxia Geng
Yin Fang
Jeff Z. Pan
Ningyu Zhang
Wen Zhang
95
38
0
26 Jul 2022
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Yudong Han
Liqiang Nie
Jianhua Yin
Jianlong Wu
Yan Yan
81
14
0
24 Jul 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering
Yudong Han
Jianhua Yin
Jianlong Wu
Yin-wei Wei
Liqiang Nie
62
8
0
21 Jul 2022
Modern Question Answering Datasets and Benchmarks: A Survey
Zhen Wang
85
23
0
30 Jun 2022
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
Yangyang Guo
Liqiang Nie
Yongkang Wong
Yebin Liu
Zhiyong Cheng
Mohan S. Kankanhalli
119
40
0
30 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD VLM MLLM
160
412
0
17 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
73
102
0
13 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM MLLM
69
28
0
03 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
74
556
0
03 Jun 2022
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin
Yujia Xie
Dongdong Chen
Yichong Xu
Chenguang Zhu
Lu Yuan
86
75
0
02 Jun 2022
Neural Retriever and Go Beyond: A Thesis Proposal
Man Luo
100
1
0
31 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
172
562
0
27 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
64
12
0
25 May 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
78
18
0
23 May 2022
Gender and Racial Bias in Visual Question Answering Datasets
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
187
55
0
17 May 2022
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
Zihan Zhao
Lu Chen
Ruisheng Cao
Hongshen Xu
Xingyu Chen
Kai Yu
83
9
0
13 May 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro LLMAG AI4CE
213
827
0
12 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
101
76
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM VLM
420
3,617
0
29 Apr 2022
Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Y. Heo
Eun-Sol Kim
Woo Suk Choi
Byoung-Tak Zhang
65
28
0
22 Apr 2022
Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
48
3
0
20 Apr 2022