Title
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models Yuncheng Guo Xiaodong Gu OffRL VLM 27 0 0 15 May 2025
Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting Zheang Huai Hui Tang Yi Li Zhengzhang Chen Xiaomeng Li VLM 33 0 0 13 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Zhaochen Su Linjie Li Mingyang Song Yunzhuo Hao Zhengyuan Yang ... Guanjie Chen Jiawei Gu Juntao Li Xiaoye Qu Yu Cheng OffRL LRM 31 0 0 13 May 2025
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models Songlin Dong Chenhao Ding Jiangyang Li Jizhou Han Qiang Wang Yuhang He Yihong Gong CLL VLM 40 0 0 12 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization Seongjae Kang Dong Bok Lee Hyungjoon Jang Sung Ju Hwang VLM 57 0 0 12 May 2025
A Vision-Language Foundation Model for Leaf Disease Identification Khang Nguyen Quoc Lan Le Thi Thu Luyl-Da Quach VLM 26 0 0 11 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images Yicheng Song Tiancheng Lin Die Peng Su Yang Yi Xu MedIm 31 0 0 10 May 2025
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models F. Khan Jun Chen Youssef Mohamed Chun-Mei Feng Mohamed Elhoseiny VLM 33 0 0 08 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP Hanxun Huang Sarah Monazam Erfani Yige Li Xingjun Ma James Bailey AAML 44 0 0 08 May 2025
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding Feng Xiao Hongbin Xu Guocan Zhao Wenxiong Kang 48 0 0 07 May 2025
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation HsiaoYuan Hsu Yuxin Peng 26 0 0 06 May 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability L. Wang Senmao Li Fei Yang Jianye Wang Ziheng Zhang Yong-Jin Liu Y. Wang Jian Yang DiffM 61 0 0 06 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification Utsav Nareti S. Chattopadhyay Prolay Mallick Suraj Kumar Ayush Vikas Daga Chandranath Adak Adarsh Wase Arjab Roy 23 0 0 05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Xuzhi Zhang Jintao Guo Shanshan Zhao Minghao Fu Lunhao Duan Guo-Hua Wang Qing-Guo Chen Zhao Xu Weihua Luo Kaifu Zhang DiffM 74 0 0 05 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities Madhukar Reddy Vongala Saurabh Srivastava Jana Kosecka CLIP CoGe VLM 36 0 0 04 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models Chaomeng Chen Zitong Yu J. Dong Sen Su L. Shen Shutao Xia Xiaochun Cao FedML VLM 143 0 0 03 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models Wufei Ma Luoxin Ye Nessa McWeeney Celso M de Melo A. Yuille Jieneng Chen LRM 65 1 0 01 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision Weicai Yan Wang Lin Zirun Guo Ye Wang Fangming Feng Xiaoda Yang Zhilin Wang Tao Jin DiffM 129 2 0 30 Apr 2025
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models Mainak Singha Subhankar Roy Sarthak Mehrotra Ankit Jha Moloud Abdar Biplab Banerjee Elisa Ricci VLM VPVLM 119 0 0 29 Apr 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia Valerie Zermatten J. Castillo-Navarro Pallavi Jain D. Tuia Diego Marcos 62 0 0 28 Apr 2025
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification Shuanglin Yan Neng Dong Shuang Li Rui Yan Hao Tang Jing Qin 136 0 0 25 Apr 2025
Revisiting Data Auditing in Large Vision-Language Models Hongyu Zhu Sichu Liang Luu Anh Tuan Boheng Li Tongxin Yuan Fangqi Li Shilin Wang Zhuosheng Zhang VLM 185 0 0 25 Apr 2025
EmoSEM: Segment and Explain Emotion Stimuli in Visual Art Jing Zhang Dan Guo Zhangbin Li Meng Wang 33 0 0 20 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation Jiachen Li Qing Xie Xiaohan Yu Hongyun Wang Jinyu Xu Yongjian Liu ObjD 78 0 0 20 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network Daniel Bolya Po-Yao (Bernie) Huang Peize Sun Jang Hyun Cho Andrea Madotto ... Shiyu Dong Nikhila Ravi Daniel Li Piotr Dollár Christoph Feichtenhofer ObjD VOS 103 0 0 17 Apr 2025
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage Wenbo Zhang Ju Jia Xiaojun Jia Yihao Huang Xuzhao Li Cong Wu Lina Wang AAML 38 0 0 15 Apr 2025
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning Yichao Cai Yuhang Liu Erdun Gao Tianjiao Jiang Zhen Zhang Anton van den Hengel Javen Qinfeng Shi 62 0 0 14 Apr 2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Boyang Deng Songyou Peng Kyle Genova Gordon Wetzstein Noah Snavely Leonidas J. Guibas Thomas Funkhouser HAI 151 0 0 11 Apr 2025
Impact of Language Guidance: A Reproducibility Study Cherish Puniani Advika Sinha Shree Singhi Aayan Yadav VLM 47 0 0 10 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models Justus Westerhoff Erblina Purellku Jakob Hackstein Jonas Loos Leo Pinetzki Lorenz Hufe AAML 28 0 0 07 Apr 2025
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively Bingyang Wang Kaer Huang Bin Li Yiqiang Yan L. Zhang Huchuan Lu You He VLM 37 0 0 07 Apr 2025
A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text? Julio Silva-Rodríguez Jose Dolz Ismail ben Ayed VLM MedIm 38 0 0 07 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention Jiuniu Wang Wenjia Xu Qingzhong Wang Antoni B. Chan 45 0 0 03 Apr 2025
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Jie Ma Zhitao Gao Qi Chai Xiaozhong Liu P. Wang Jing Tao Zhou Su 54 0 0 01 Apr 2025
Efficient Adaptation For Remote Sensing Visual Grounding Hasan Moughnieh Mohamad Chalhoub Hasan Nasrallah Cristiano Nattero Paolo Campanella Giovanni Nico A. Ghandour 51 0 0 29 Mar 2025
VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection Bin Zhang Xiaoyang Qu Guokuan Li Jiguang Wan Jianzong Wang VLM 56 0 0 28 Mar 2025
Feature Calibration enhanced Parameter Synthesis for CLIP-based Class-incremental Learning J. Guo Xiaoguang Zhu Lianlong Sun Liangyu Teng Yang Liu Di Li Wei Zhou Liang Song CLL VLM 59 1 0 24 Mar 2025
GOAL: Global-local Object Alignment Learning Hyungyu Choi Young Kyun Jang Chanho Eom VLM 130 0 0 22 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding Jinlong Li Cristiano Saltori Fabio Poiesi N. Sebe 168 0 0 20 Mar 2025
TULIP: Towards Unified Language-Image Pretraining Zineng Tang Long Lian Seun Eisape Xudong Wang Roei Herzig Adam Yala Alane Suhr Trevor Darrell David M. Chan VLM CLIP MLLM 103 3 0 19 Mar 2025
Continual Multimodal Contrastive Learning Xiaohao Liu Xiaobo Xia See-Kiong Ng Tat-Seng Chua CLL 57 0 0 19 Mar 2025
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation Thomas Pickard Aline Villavicencio Maggie Mi Wei He Dylan Phelps Carolina Scarton 78 1 0 19 Mar 2025
Advancing Medical Representation Learning Through High-Quality Data Negin Baghbanzadeh Adibvafa Fallahpour Yasaman Parhizkar Franklin Ogidi Shuvendu Roy ... Vahid Reza Khazaie Michael Colacci Ali Etemad Arash Afkanpour Elham Dolatabadi LM&MA 85 0 0 18 Mar 2025
ChatBEV: A Visual Language Model that Understands BEV Maps Qingyao Xu S. Chen Guang Chen Yanfeng Wang Yuyao Zhang 51 0 0 18 Mar 2025
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models Xiaojun Jia Sensen Gao Simeng Qin Ke Ma Xianrui Li Yihao Huang Wei Dong Yang Liu Xiaochun Cao AAML VLM 60 0 0 17 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification Ans Munir Faisal Z. Qureshi M. H. Khan Mohsen Ali VLM 70 0 0 15 Mar 2025
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis Hongyu Sun Qiuhong Ke Ming Cheng Yunhong Wang Deying Li Chenhui Gou Jianfei Cai 3DPC 92 0 0 15 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification Xiangyan Qu Gaopeng Gou Jiamin Zhuang Jing Yu Kun Song Qihao Wang Yili Li Gang Xiong VLM 91 0 0 13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens Ju He Qihang Yu Qihao Liu Liang-Chieh Chen 68 0 0 13 Mar 2025
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning Quanxing Zha Xin Liu Shu-Juan Peng Y. Cheung X. Xu Nannan Wang 50 0 0 13 Mar 2025