OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

31 May 2019
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi

Papers citing "OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge"

50 / 781 papers shown
Title
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
152
203
0
29 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
83
7
0
28 May 2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia
P. Narayana
Arjun Reddy Akula
G. Pruthi
Haoran Su
Sugato Basu
Varun Jampani
VLM, OffRL
79
4
0
28 May 2023
Zero-shot Visual Question Answering with Language Model Feedback
Yifan Du
Junyi Li
Tianyi Tang
Wayne Xin Zhao
Ji-Rong Wen
61
16
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
111
61
0
25 May 2023
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization
Shivam Sharma
S Ramaneswaran
Udit Arora
Md. Shad Akhtar
Tanmoy Chakraborty
77
9
0
25 May 2023
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Marco Bellagente
Manuel Brack
H. Teufel
Felix Friedrich
Bjorn Deiseroth
...
Koen Oostermeijer
Andres Felipe Cruz Salinas
P. Schramowski
Kristian Kersting
Samuel Weinbach
141
20
0
24 May 2023
The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models
Jingyuan Qi
Zhiyang Xu
Ying Shen
Minqian Liu
Dingnan Jin
Qifan Wang
Lifu Huang
ReLM, LRM, KELM
56
13
0
24 May 2023
Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering
Xingyu Fu
Ben Zhou
Sihao Chen
Mark Yatskar
Dan Roth
LRM
61
0
0
24 May 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Sherzod Hakimov
David Schlangen
VLM
61
5
0
23 May 2023
Enhance Reasoning Ability of Visual-Language Models via Large Language Models
Yueting Yang
Xintong Zhang
Wenjuan Han
VLM, ReLM, LRM
56
1
0
22 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
58
1
0
19 May 2023
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonçalves de Aguiar Neto
C. F. G. Santos
68
27
0
18 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM, LRM
337
815
0
17 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
185
2,101
0
11 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
95
14
0
10 May 2023
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
T. Gong
Chengqi Lyu
Shilong Zhang
Yudong Wang
Miao Zheng
Qianmengke Zhao
Kuikun Liu
Wenwei Zhang
Ping Luo
Kai-xiang Chen
MLLM
103
273
0
08 May 2023
LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li
Baotian Hu
Xinyu Chen
Lin Ma
Yong-mei Xu
Hao Fei
MLLM, VLM
91
28
0
05 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
53
4
0
02 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM, VLM
82
89
0
02 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
81
9
0
30 Apr 2023
Retrieval-based Knowledge Augmented Vision Language Pre-training
Jiahua Rao
Zifei Shan
Long Liu
Yao Zhou
Yuedong Yang
VLM
163
14
0
27 Apr 2023
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
Alireza Salemi
Juan Altmayer Pizzorno
Hamed Zamani
38
15
0
26 Apr 2023
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation
Jheng-Hong Yang
Carlos Lassance
Rafael Sampaio de Rezende
Krishna Srinivasan
Miriam Redi
Stéphane Clinchant
Jimmy J. Lin
81
12
0
04 Apr 2023
IRFL: Image Recognition of Figurative Language
Ron Yosef
Yonatan Bitton
Dafna Shahaf
92
20
0
27 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM, CLL
76
12
0
25 Mar 2023
FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering
Weizhe Lin
Zhilin Wang
Bill Byrne
AAML
110
4
0
19 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
92
6
0
18 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM, LRM, ReLM
136
468
0
14 Mar 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
71
0
0
10 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
85
2
0
09 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
89
21
0
07 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
151
1,678
0
06 Mar 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kan Chen
Xiangqian Wu
CoGe
52
9
0
05 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
99
4
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
184
11
0
03 Mar 2023
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang
Nanning Zheng
MoE
112
6
0
02 Mar 2023
EVJVQA Challenge: Multilingual Visual Question Answering
Ngan Luu-Thuy Nguyen
Nghia Hieu Nguyen
Duong T.D. Vo
K. Tran
Kiet Van Nguyen
69
7
0
23 Feb 2023
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?
Yang Chen
Hexiang Hu
Yi Luan
Haitian Sun
Soravit Changpinyo
Alan Ritter
Ming-Wei Chang
131
94
0
23 Feb 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
119
61
0
22 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Aman Chadha
Vinija Jain
111
0
0
19 Feb 2023
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Zhu Wang
Sourav Medya
Sathya Ravi
VLM
90
0
0
11 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
101
32
0
01 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
445
4,666
0
30 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
91
3
0
25 Jan 2023
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
Kun Li
G. Vosselman
M. Yang
80
7
0
23 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM, VLM
107
41
0
12 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
84
9
0
11 Jan 2023
Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
David M. Chan
Shalini Ghosh
Ariya Rastrow
Björn Hoffmeister
OffRL
76
6
0
06 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
71
16
0
26 Dec 2022