Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.13947
Cited By
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
22 April 2024
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering"
14 / 14 papers shown
Title
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
75
103
0
15 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
146
693
0
14 Nov 2022
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
Yangyang Guo
Liqiang Nie
Yongkang Wong
Yebin Liu
Zhiyong Cheng
Mohan S. Kankanhalli
85
39
0
30 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
39
530
0
03 Jun 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
120
859
0
07 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
52
54
0
04 Feb 2022
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui
Borui Wang
Qiuyuan Huang
Alexander G. Hauptmann
Yonatan Bisk
Jianfeng Gao
46
155
0
16 Dec 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
101
144
0
23 Mar 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
63
180
0
20 Dec 2020
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
192
3,659
0
06 Aug 2019
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
100
4,201
0
25 Jul 2017
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
168
3,696
0
28 Feb 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
285
3,187
0
02 Dec 2016
FVQA: Fact-based Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
A. Dick
CoGe
61
455
0
17 Jun 2016
1