ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense

30 October 2023

Wei Bin Au Yeong

Papers citing "ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense"

Title | Authors | Tags | Counts | Date
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation | Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung | ELM | 130, 10, 0 | 28 Oct 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Antonia Wüst, Tim Nelson Tobiasch, Lukas Helff, Inga Ibs, Wolfgang Stammer, Devendra Singh Dhami, Constantin Rothkopf, Kristian Kersting | CoGe, ReLM, VLM, LRM | 127, 2, 0 | 25 Oct 2024
VLind-Bench: Measuring Language Priors in Large Vision-Language Models | Kang-il Lee, Minbeom Kim, Seunghyun Yoon, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung | CoGe, VLM | 147, 8, 0 | 13 Jun 2024
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | Wenliang Dai, Junnan Li, Dongxu Li, A. M. H. Tiong, Junqi Zhao, Weisheng Wang, Boyang Albert Li, Pascale Fung, Steven C. H. Hoi | MLLM, VLM | 134, 2,095, 0 | 11 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, ..., Junfeng Tian, Qiang Qi, Ji Zhang, Feiyan Huang, Jingren Zhou | VLM, MLLM | 286, 955, 0 | 27 Apr 2023
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training | Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, Pascale Fung | MLLM, VLM | 82, 67, 0 | 14 Oct 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality | Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross | CoGe | 106, 427, 0 | 07 Apr 2022
Things not Written in Text: Exploring Spatial Commonsense from Visual Signals | Xiao Liu, Da Yin, Yansong Feng, Dongyan Zhao | LRM | 64, 46, 0 | 15 Mar 2022
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | Haoyu Song, Li Dong, Weinan Zhang, Ting Liu, Furu Wei | VLM, CLIP | 81, 139, 0 | 14 Mar 2022
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Guosheng Lin | FaML | 221, 1,972, 0 | 16 Jul 2021
Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models | Bill Yuchen Lin, Seyeon Lee, Rahul Khanna, Xiang Ren | AIMat | 68, 158, 0 | 02 May 2020
Inducing Relational Knowledge from BERT | Zied Bouraoui, Jose Camacho-Collados, Steven Schockaert | | 92, 167, 0 | 28 Nov 2019
From Recognition to Cognition: Visual Commonsense Reasoning | Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi | LRM, BDL, OCL, ReLM | 164, 881, 0 | 27 Nov 2018