Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.02317
Cited By
Webly Supervised Concept Expansion for General Purpose Vision Models
4 February 2022
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Webly Supervised Concept Expansion for General Purpose Vision Models"
47 / 47 papers shown
Title
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
50
0
0
24 Dec 2024
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
122
58
0
22 Dec 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
55
5
0
28 Oct 2024
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
Wenbin An
Feng Tian
Jiahao Nie
Wenkai Shi
Haonan Lin
Yan Chen
Qianying Wang
Y. Wu
Guang Dai
Ping Chen
VLM
50
4
0
22 Jul 2024
Multimodal Reranking for Knowledge-Intensive Visual Question Answering
Haoyang Wen
Honglei Zhuang
Hamed Zamani
Alexander Hauptmann
Michael Bendersky
39
0
0
17 Jul 2024
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Cheng Tan
Jingxuan Wei
Linzhuang Sun
Zhangyang Gao
Siyuan Li
Bihui Yu
Ruifeng Guo
Stan Z. Li
ReLM
LRM
3DV
69
6
0
31 May 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
36
0
0
22 Apr 2024
Knowledge Condensation and Reasoning for Knowledge-based VQA
Dongze Hao
Jian Jia
Longteng Guo
Qunbo Wang
Te Yang
...
Yanhua Cheng
Bo Wang
Quan Chen
Han Li
Jing Liu
44
0
0
15 Mar 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
34
2
0
04 Feb 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
32
4
0
19 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
37
144
0
28 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
38
7
0
11 Dec 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
31
7
0
23 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
30
11
0
22 Nov 2023
From Categories to Classifier: Name-Only Continual Learning by Exploring the Web
Ameya Prabhu
Hasan Hammoud
Ser-Nam Lim
Guohao Li
Philip H. S. Torr
Adel Bibi
CLL
127
9
0
19 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
28
30
0
11 Nov 2023
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
KAI-QING Zhou
Kwonjoon Lee
Teruhisa Misu
Xin Eric Wang
LRM
34
3
0
09 Oct 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjD
VLM
28
20
0
24 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
27
72
0
14 Jun 2023
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Zaid Khan
B. Vijaykumar
S. Schulter
Xiang Yu
Y. Fu
Manmohan Chandraker
VLM
MLLM
29
17
0
06 Jun 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe
CLIP
VLM
27
19
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
52
43
0
24 May 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
A. Maharana
Amita Kamath
Christopher Clark
Joey Tianyi Zhou
Aniruddha Kembhavi
40
3
0
28 Mar 2023
Internet Explorer: Targeted Representation Learning on the Open Web
Alexander C. Li
Ellis L Brown
Alexei A. Efros
Deepak Pathak
VLM
16
24
0
27 Feb 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Yikang Shen
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
33
35
0
12 Jan 2023
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
Jiaxian Guo
Junnan Li
Dongxu Li
A. M. H. Tiong
Boyang Albert Li
Dacheng Tao
Steven C. H. Hoi
VLM
MLLM
29
107
0
21 Dec 2022
Universal Object Detection with Large Vision Model
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM
ObjD
32
8
0
19 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM
VLM
40
89
0
10 Dec 2022
SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
F. Bastani
Piper Wolters
Ritwik Gupta
Joe Ferdinando
Aniruddha Kembhavi
32
98
0
28 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
79
402
0
18 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
26
55
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
19
24
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
45
101
0
15 Nov 2022
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
Sahithya Ravi
Aditya Chinchure
Leonid Sigal
Renjie Liao
Vered Shwartz
37
26
0
24 Oct 2022
A Survey of Computer Vision Technologies In Urban and Controlled-environment Agriculture
Jiayun Luo
Boyang Albert Li
Cyril Leung
53
11
0
20 Oct 2022
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
S. Hoi
VLM
126
51
0
15 Sep 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
56
392
0
17 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
16
505
0
03 Jun 2022
Training Vision-Language Transformers from Captions
Liangke Gui
Yingshan Chang
Qiuyuan Huang
Subhojit Som
Alexander G. Hauptmann
Jianfeng Gao
Yonatan Bisk
VLM
ViT
174
11
0
19 May 2022
GRIT: General Robust Image Task Benchmark
Tanmay Gupta
Ryan Marten
Aniruddha Kembhavi
Derek Hoiem
VLM
OOD
ObjD
16
31
0
28 Apr 2022
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
Shufan Wang
Laure Thompson
Mohit Iyyer
180
66
0
13 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
405
0
13 Jul 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
225
899
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
310
3,708
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
262
525
0
04 Feb 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
260
157
0
02 Jan 2021
VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions
Oytun Ulutan
A S M Iftekhar
B. S. Manjunath
69
186
0
11 Mar 2020
1