2402.18695
Cited By
Grounding Language Models for Visual Entity Recognition
28 February 2024
Zilin Xiao
Ming Gong
Paola Cascante-Bonilla
Xingyao Zhang
Jie Wu
Vicente Ordonez
VLM
Papers citing
"Grounding Language Models for Visual Entity Recognition"
15 / 15 papers shown
Title
Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation
Jinwoo Ahn
Hyeokjoon Kwon
Hwiyeon Yoo
ObjD
VLM
77
0
0
23 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
172
1
0
19 Nov 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
39
1
0
31 Oct 2024
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
A. S. Penamakuri
Anand Mishra
26
1
0
24 Oct 2024
Unified Multi-Modal Interleaved Document Representation for Information Retrieval
Jaewoo Lee
Joonho Ko
Jinheon Baek
Soyeong Jeong
Sung Ju Hwang
25
1
0
03 Oct 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
34
35
0
23 Apr 2024
Ontology Enrichment for Effective Fine-grained Entity Typing
Si-yuan Ouyang
Jiaxin Huang
Pranav Pillai
Yunyi Zhang
Yu Zhang
Jiawei Han
107
4
0
11 Oct 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
50
55
0
22 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
272
4,244
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
181
687
0
22 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,848
0
18 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
290
1,084
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,198
0
01 Sep 2014