Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.13837
Cited By
Democratizing Fine-grained Visual Recognition with Large Language Models
24 January 2024
Mingxuan Liu
Subhankar Roy
Wenjing Li
Zhun Zhong
N. Sebe
Elisa Ricci
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Democratizing Fine-grained Visual Recognition with Large Language Models"
20 / 20 papers shown
Title
Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
Hari Chandana Kuchibhotla
Sai Srinivas Kancheti
Abbavaram Gowtham Reddy
Vineeth N. Balasubramanian
45
0
0
02 May 2025
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Jianyang Zhang
Qianli Luo
Guowu Yang
Wenjing Yang
Weide Liu
Guosheng Lin
Fengmao Lv
59
0
0
26 Mar 2025
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo
Shaobin Zhuang
Kunchang Li
Yu Qiao
Yali Wang
VLM
CLIP
32
0
0
16 Oct 2024
Tree of Attributes Prompt Learning for Vision-Language Models
Tong Ding
Wanhua Li
Zhongqi Miao
Hanspeter Pfister
VLM
52
1
0
15 Oct 2024
Organizing Unstructured Image Collections using Natural Language
Mingxuan Liu
Zhun Zhong
Jun Li
Gianni Franchi
Subhankar Roy
Elisa Ricci
VLM
39
3
0
07 Oct 2024
FungiTastic: A multi-modal dataset and benchmark for image categorization
Lukás Picek
Klara Janouskova
Milan Šulc
Jirí Matas
77
1
0
24 Aug 2024
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
Guofeng Mei
Luigi Riz
Yiming Wang
Fabio Poiesi
ISeg
VLM
59
3
0
20 Aug 2024
CoAPT: Context Attribute words for Prompt Tuning
Gun Lee
Subin An
Sungyong Baik
Soochahn Lee
VPVLM
VLM
35
1
0
18 Jul 2024
Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Siyuan Ma
Weidi Luo
Yu Wang
Xiaogeng Liu
35
20
0
25 May 2024
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziñski
Rameswar Panda
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
102
38
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,229
0
30 Jan 2023
What does a platypus look like? Generating customized prompts for zero-shot image classification
Sarah M Pratt
Ian Covert
Rosanne Liu
Ali Farhadi
VLM
128
212
0
07 Sep 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
310
4,097
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
358
8,457
0
28 Jan 2022
Fine-Grained Image Analysis with Deep Learning: A Survey
Xiu-Shen Wei
Yi-Zhe Song
Oisin Mac Aodha
Jianxin Wu
Yuxin Peng
Jinhui Tang
Jian Yang
Serge J. Belongie
71
279
0
11 Nov 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
174
402
0
10 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
330
2,267
0
02 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,775
0
29 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,700
0
11 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,194
0
01 Sep 2014
1