ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1906.00067
  4. Cited By
OK-VQA: A Visual Question Answering Benchmark Requiring External
  Knowledge
v1v2 (latest)

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

31 May 2019
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi
ArXiv (abs)PDFHTML

Papers citing "OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge"

50 / 781 papers shown
Title
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common
  Sense Reasoning
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
Niki Maria Foteinopoulou
Enjie Ghorbel
Djamila Aouada
133
4
0
01 Oct 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal
  Language Models
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao
Renrui Zhang
Xu Luo
Yan Wang
Shanghang Zhang
Peng Gao
91
0
0
01 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLMMLLM
133
41
1
30 Sep 2024
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
Zicheng Zhang
Ziheng Jia
H. Wu
Chunyi Li
Zijian Chen
...
Wei Sun
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
105
10
0
30 Sep 2024
T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness
  Recognition
T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition
Chen Yeh
You-Ming Chang
Wei-Chen Chiu
Ning Yu
66
2
0
29 Sep 2024
Visual Question Decomposition on Multimodal Large Language Models
Visual Question Decomposition on Multimodal Large Language Models
Haowei Zhang
Jianzhe Liu
Zhen Han
Shuo Chen
Bailan He
Volker Tresp
Zhiqiang Xu
Jindong Gu
157
2
0
28 Sep 2024
TrojVLM: Backdoor Attack Against Vision Language Models
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu
Lu Pang
Tengfei Ma
Haibin Ling
Chao Chen
MLLM
97
11
0
28 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLMAuLLM
164
12
0
26 Sep 2024
EAGLE: Egocentric AGgregated Language-video Engine
EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi
Yunlong Tang
Luchuan Song
Ali Vosoughi
Nguyen Nguyen
Chenliang Xu
97
11
0
26 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art
  Multimodal Models
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLMVLM
110
13
0
25 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
124
9
0
23 Sep 2024
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision
  Language Models
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
Nam Hyeon-Woo
Moon Ye-Bin
Wonseok Choi
Lee Hyun
Tae-Hyun Oh
CoGe
68
3
0
23 Sep 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGeVLM
171
2
0
19 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLMVLMLRM
119
73
0
17 Sep 2024
Guiding Vision-Language Model Selection for Visual Question-Answering
  Across Tasks, Domains, and Knowledge Types
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
Neelabh Sinha
Vinija Jain
Aman Chadha
70
3
0
14 Sep 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak
  and Adversarial Attacks
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAMLVLM
81
6
0
11 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViTVLM
114
4
0
06 Sep 2024
An overview of domain-specific foundation model: key technologies, applications and challenges
An overview of domain-specific foundation model: key technologies, applications and challenges
Haolong Chen
Hanzhi Chen
Zijian Zhao
Kaifeng Han
Guangxu Zhu
Yichen Zhao
Ying Du
Wei Xu
Qingjiang Shi
ALMVLM
111
5
0
06 Sep 2024
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal
  Models
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
55
0
0
03 Sep 2024
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene
  Understanding
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Yonghui Wang
Wengang Zhou
Hao Feng
Houqiang Li
VLM
68
1
0
30 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLMMLLM
111
121
0
29 Aug 2024
Law of Vision Representation in MLLMs
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
145
12
0
29 Aug 2024
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language
  Models for Trait Discovery from Biological Images
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
M. Maruf
Arka Daw
Kazi Sajeed Mehrab
Harish Babu Manogaran
Abhilash Neog
...
Wei-Lun Chao
Charles V. Stewart
T. Berger-Wolf
Wasila Dahdul
Anuj Karpatne
CoGe
89
4
0
28 Aug 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Fangxun Shu
Yue Liao
Le Zhuo
Chenning Xu
Guanghao Zhang
...
Bolin Li
Zhelun Yu
Si Liu
Hongsheng Li
Hao Jiang
VLMMoE
63
18
0
28 Aug 2024
A Survey on Evaluation of Multimodal Large Language Models
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MAELMLRM
110
26
0
28 Aug 2024
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and
  Analysis
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis
Aishik Nagar
Shantanu Jaiswal
Cheston Tan
ReLMLRM
60
12
0
27 Aug 2024
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Yiwei Ma
Jiayi Ji
Ke Ye
Weihuang Lin
Zhibin Wang
Yonghan Zheng
Qiang-feng Zhou
Xiaoshuai Sun
Rongrong Ji
123
11
0
26 Aug 2024
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
Suyash Vardhan Mathur
J. Bafna
Kunal Kartik
Harshita Khandelwal
Manish Shrivastava
Vivek Gupta
Joey Tianyi Zhou
Dan Roth
LMTD
113
2
0
25 Aug 2024
Building and better understanding vision-language models: insights and
  future directions
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
135
78
0
22 Aug 2024
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in
  Visual Question Answering
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering
Yuliang Cai
Mohammad Rostami
CLLVLMMLLM
123
4
0
21 Aug 2024
Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Qizhou Chen
Taolin Zhang
Chengyu Wang
Xiaofeng He
Dakan Wang
Tingting Liu
KELM
159
4
0
19 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
103
12
0
17 Aug 2024
IIU: Independent Inference Units for Knowledge-based Visual Question
  Answering
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
Yili Li
Jing Yu
Keke Gai
Gang Xiong
51
0
0
15 Aug 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
123
14
0
15 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal
  Large Language Models
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLMVLM
86
139
0
09 Aug 2024
How Well Can Vision Language Models See Image Details?
How Well Can Vision Language Models See Image Details?
Chenhui Gou
Abdulwahab Felemban
Faizan Farooq Khan
Deyao Zhu
Jianfei Cai
Hamid Rezatofighi
Mohamed Elhoseiny
VLMMLLM
100
5
0
07 Aug 2024
Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
Weiqi Feng
Yangrui Chen
Shaoyu Wang
Size Zheng
H. Lin
Minlan Yu
MLLMAI4CE
138
4
0
07 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLMSyDaVLM
166
865
0
06 Aug 2024
Fairness and Bias Mitigation in Computer Vision: A Survey
Fairness and Bias Mitigation in Computer Vision: A Survey
Sepehr Dehdashtian
Ruozhen He
Yi Li
Guha Balakrishnan
Nuno Vasconcelos
Vicente Ordonez
Vishnu Boddeti
137
5
0
05 Aug 2024
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented
  Generation via Knowledge-enhanced Reranking and Noise-injected Training
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training
Rivik Setty
Chengjin Xu
Vinay Setty
Jian Guo
83
13
0
31 Jul 2024
Autonomous Improvement of Instruction Following Skills via Foundation
  Models
Autonomous Improvement of Instruction Following Skills via Foundation Models
Zhiyuan Zhou
P. Atreya
Abraham Lee
Homer Walke
Oier Mees
Sergey Levine
95
14
0
30 Jul 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li
Delin Chen
Tianle Cai
Peihao Chen
Yining Hong
Zhenfang Chen
Yikang Shen
Chuang Gan
VLM
125
5
0
29 Jul 2024
Visual Riddles: a Commonsense and World Knowledge Challenge for Large
  Vision and Language Models
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
Nitzan Bitton-Guetta
Aviv Slobodkin
Aviya Maimon
Eliya Habba
Royi Rassin
Yonatan Bitton
Idan Szpektor
Amir Globerson
Yuval Elovici
ReLMVLMLRM
71
6
0
28 Jul 2024
Data Processing Techniques for Modern Multimodal Models
Data Processing Techniques for Modern Multimodal Models
Yinheng Li
Han Ding
Hang Chen
VLM
87
0
0
27 Jul 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons
  of Vision Language Models
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
Xinyu Pi
Mingyuan Wu
Jize Jiang
Haozhen Zheng
Beitong Tian
Chengxiang Zhai
Klara Nahrstedt
Zhiting Hu
VLM
108
1
0
25 Jul 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal
  Large Language Model
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLMVLM
105
2
0
23 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
68
2
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLMLRM
126
13
0
22 Jul 2024
Knowledge Acquisition Disentanglement for Knowledge-based Visual
  Question Answering with Large Language Models
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
Wenbin An
Feng Tian
Jiahao Nie
Wenkai Shi
Haonan Lin
Yan Chen
Qianying Wang
Y. Wu
Guang Dai
Ping Chen
VLM
94
4
0
22 Jul 2024
MIBench: Evaluating Multimodal Large Language Models over Multiple
  Images
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu
Xi Zhang
Haiyang Xu
Yaya Shi
Chaoya Jiang
...
Ji Zhang
Fei Huang
Chunfen Yuan
Bing Li
Weiming Hu
VLM
94
15
0
21 Jul 2024
Previous
123456...141516
Next