Learning Deep Representations of Fine-grained Visual Descriptions

17 May 2016

Bernt Schiele

Papers citing "Learning Deep Representations of Fine-grained Visual Descriptions"

50 / 351 papers shown

Few-shot Novel Category Discovery Chunming Li Shidong Wang Haofeng Zhang 64 0 0 13 May 2025
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models Aishwarya Venkataramanan P. Bodesheim Joachim Denzler BDL VLM 102 0 0 08 May 2025
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval Zehong Ma Hao Chen Wei Zeng Limin Su Shiliang Zhang AI4TS 126 0 0 10 Apr 2025
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning Huajie Jiang Zechao Li Xiaohan Yu Yongli Hu Baocai Yin Jian Yang Yuankai Qi VLM 78 0 0 29 Mar 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities Raman Dutt Harleen Hanspal Guoxuan Xia Petru-Daniel Tudosiu Alexander Black Yongxin Yang Jingyu Sun Sarah Parisot MoE 102 0 0 28 Mar 2025
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing Yulin Pan Xiangteng He Chaojie Mao Zhen Han Zeyinzi Jiang Junxuan Zhang Yu Liu EGVM VLM 114 2 0 18 Mar 2025
MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification Xiangyan Qu Jing Yu Jiamin Zhuang Gaopeng Gou Gang Xiong Qi Wu VLM 134 0 0 10 Mar 2025
End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings Yeruru Asrar Ahmed Anurag Mittal DiffM 122 0 0 03 Feb 2025
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects Jiayi Su Youhe Feng Zheng Li Jinhua Song Yangfan He Botao Ren Botian Xu AI4CE 158 3 0 10 Dec 2024
TaxaBind: A Unified Embedding Space for Ecological Applications Srikumar Sastry Subash Khanal Aayush Dhakal Adeel Ahmad Nathan Jacobs 132 11 0 01 Nov 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning Joshua Forster Feinglass Yezhou Yang 65 0 0 30 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios Christian Ganhor Marta Moscati Anna Hausberger Shah Nawaz Markus Schedl 65 2 0 26 Sep 2024
Finetuning CLIP to Reason about Pairwise Differences Dylan Sam Devin Willmott João Dias Semedo J. Zico Kolter VLM 115 4 0 15 Sep 2024
Making Large Vision Language Models to be Good Few-shot Learners Fan Liu Wenwen Cai Jian Huo Chuanyi Zhang Delong Chen Jun Zhou 89 0 0 21 Aug 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach Muhammad Saad Saeed Shah Nawaz Muhammad Zaigham Zaheer Muhammad Haris Khan Karthik Nandakumar Muhammad Haroon Yousaf Hassan Sajjad Tom De Schepper Markus Schedl 93 0 0 14 Aug 2024
From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification Fanzhi Jiang Su Yang Mark W. Jones Liumei Zhang 104 1 0 31 Jul 2024
ZeroDDI: A Zero-Shot Drug-Drug Interaction Event Prediction Method with Semantic Enhanced Learning and Dual-Modal Uniform Alignment Ziyan Wang Zhankun Xiong Feng Huang Xuan Liu Wen Zhang 82 6 0 01 Jul 2024
On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization Jordi Armengol-Estapé Vincent Michalski Ramnath Kumar P. St-Charles Doina Precup Samira Ebrahimi Kahou 143 0 0 29 May 2024
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features Yao Rong David Scheerer Enkelejda Kasneci 80 0 0 16 May 2024
A separability-based approach to quantifying generalization: which layer is best? Luciano Dyballa Evan Gerritz Steven W. Zucker OOD 113 4 0 02 May 2024
AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models Zhiqiang Tang Haoyang Fang Su Zhou Taojiannan Yang Zihan Zhong Tony Hu Katrin Kirchhoff George Karypis 109 14 0 24 Apr 2024
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning W. Hou Shiming Chen Shuhuang Chen Ziming Hong Yan Wang Xuetao Feng Salman Khan Fahad Shahbaz Khan Xinge You 99 12 0 23 Apr 2024
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search Jintao Sun Zhedong Zheng Gangyi Ding 124 8 0 16 Apr 2024
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning Haojian Huang Xiaozhen Qiao Zhuo Chen Haodong Chen Bingyu Li Zhe Sun Mulin Chen Xuelong Li 117 11 0 15 Apr 2024
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection Ting Lei Shaofeng Yin Yang Liu VLM 115 9 0 09 Apr 2024
Improving deep learning with prior knowledge and cognitive models: A survey on enhancing explainability, adversarial robustness and zero-shot learning F. Mumuni A. Mumuni AAML 103 7 0 11 Mar 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities Jorge Sánchez Rodrigo Laguna VLM 80 0 0 29 Jan 2024
Data-Free Generalized Zero-Shot Learning Bowen Tang Long Yan Jing Zhang Qian Yu Lu Sheng Dong Xu VLM 83 11 0 28 Jan 2024
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions Oindrila Saha Grant Van Horn Subhransu Maji VLM 144 24 0 04 Jan 2024
Prototype-Guided Text-based Person Search based on Rich Chinese Descriptions ZiQiang Wu Bingpeng Ma 56 0 0 22 Dec 2023
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval Kaipeng Fang Jingkuan Song Lianli Gao Pengpeng Zeng Zhi-Qi Cheng Xiyao Li Hengtao Shen VLM 71 11 0 19 Dec 2023
LIME: Localized Image Editing via Attention Regularization in Diffusion Models Enis Simsar A. Tonioni Yongqin Xian Thomas Hofmann Federico Tombari DiffM 68 9 0 14 Dec 2023
Large Language Models are Good Prompt Learners for Low-Shot Image Classification Zhao-Heng Zheng Jingmin Wei Xuefeng Hu Haidong Zhu Ramkant Nevatia VLM 106 5 0 07 Dec 2023
TextAug: Test time Text Augmentation for Multimodal Person Re-identification Mulham Fawakherji Eduard Vazquez P. Giampa Binod Bhattarai 86 3 0 04 Dec 2023
Holistic Evaluation of Text-To-Image Models Tony Lee Michihiro Yasunaga Chenlin Meng Yifan Mai Joon Sung Park ... Jun-Yan Zhu Fei-Fei Li Jiajun Wu Stefano Ermon Percy Liang 241 139 0 07 Nov 2023
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images Zalan Fabian Zhongqi Miao Chunyuan Li Yuanhan Zhang Ziwei Liu ... Laura Siabatto Andrés Link Pablo Arbelaez Rahul Dodhia J. L. Ferres 98 11 0 02 Nov 2023
Recognize Any Regions Haosen Yang Chuofan Ma Bin Wen Yi Jiang Zehuan Yuan Xiatian Zhu ObjD VLM 96 3 0 02 Nov 2023
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping Srikumar Sastry Subash Khanal Aayush Dhakal Di Huang Nathan Jacobs 77 10 0 29 Oct 2023
Open-Set Image Tagging with Multi-Grained Text Supervision Xinyu Huang Yi-Jie Huang Youcai Zhang Weiwei Tian Rui Feng Yuejie Zhang Yanchun Xie Yaqian Li Lei Zhang VLM 87 35 0 23 Oct 2023
Dual Feature Augmentation Network for Generalized Zero-shot Learning L. Xiang Yuan Zhou Haoran Duan Yang Long 80 1 0 25 Sep 2023
Exploring Meta Information for Audio-based Zero-shot Bird Classification Alexander Gebhard Andreas Triantafyllopoulos Teresa Bez Lukas Christ Alexander Kathan Björn W. Schuller 97 6 0 15 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey Lin Geng Foo Hossein Rahmani Jing Liu 278 31 0 27 Aug 2023
Improving Generalization of Image Captioning with Unsupervised Prompt Learning Hongchen Wei Zhenzhong Chen VLM 79 3 0 05 Aug 2023
General-Purpose Multi-Modal OOD Detection Framework Viet Duong Qiong Wu Zhengyi Zhou Eric Zavesky Jiahe Chen Xiangzhou Liu Wen-Ling Hsu Huajie Shao OODD 78 2 0 24 Jul 2023
Learning Adversarial Semantic Embeddings for Zero-Shot Recognition in Open Worlds Tianqi Li Guansong Pang Xiao Bai Jingyi Zheng Lei Zhou Xin Ning VLM 86 30 0 07 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models Uddeshya Upadhyay Shyamgopal Karthik Massimiliano Mancini Zeynep Akata MLLM VLM 86 4 0 01 Jul 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing Kai Zhang Lingbo Mo Wenhu Chen Huan Sun Yu-Chuan Su EGVM 226 277 0 16 Jun 2023
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts Karsten Roth Jae Myung Kim A. Sophia Koepke Oriol Vinyals Cordelia Schmid Zeynep Akata VLM 95 76 0 12 Jun 2023
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark Shuyu Yang Yinan Zhou Yaxiong Wang Yujiao Wu Li Zhu Zhedong Zheng VLM DiffM 144 92 0 05 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work Qiangchang Wang Yilong Yin 100 0 0 02 Jun 2023