Filling the Image Information Gap for VQA: Prompting Large Language
Models to Proactively Ask Questions

20 November 2023

Peng Li

Papers citing "Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions"

14 / 14 papers shown

Title | Authors | Tags | Counts | Date
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Xiaowei Bi, Zheyuan Xu | - | 58 1 0 | 12 Mar 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | Xinwei Long, Zhiyuan Ma, Ermo Hua, Kaiyan Zhang, Biqing Qi, Bowen Zhou | RALM | 48 0 0 | 23 Feb 2025
A Unified Hallucination Mitigation Framework for Large Vision-Language Models | Yue Chang, Liqiang Jing, Xiaopeng Zhang, Yue Zhang | VLM, MLLM | 65 2 0 | 24 Sep 2024
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, Qianying Wang, Y. Wu, Guang Dai, Ping Chen | VLM | 50 4 0 | 22 Jul 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu | - | 36 0 0 | 22 Apr 2024
Language Models Still Struggle to Zero-shot Reason about Time Series | Mike A. Merrill, Mingtian Tan, Vinayak Gupta, Tom Hartvigsen, Tim Althoff | AI4TS, LRM | 45 28 0 | 17 Apr 2024
Autonomous Evaluation and Refinement of Digital Agents | Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr | ELM | 43 48 0 | 09 Apr 2024
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee | VLM | 47 51 0 | 27 Mar 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning | Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang | - | 29 1 0 | 27 Mar 2024
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models | Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong | LRM | 37 1 0 | 21 Feb 2024
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning | Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq R. Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles | VLM, MLLM | 41 45 0 | 30 Nov 2023
Retrieval Augmented Visual Question Answering with Outside Knowledge | Weizhe Lin, Bill Byrne | RALM | 74 69 0 | 07 Oct 2022
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 319 11,953 0 | 04 Mar 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA | Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang | - | 180 402 0 | 10 Sep 2021