2110.05208
Cited By
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
11 October 2021
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
Papers citing
"Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm"
50 / 324 papers shown
Title
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
19
0
0
16 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
44
0
0
08 May 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten
J. Castillo-Navarro
Pallavi Jain
D. Tuia
Diego Marcos
62
0
0
28 Apr 2025
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi
Dewei Feng
Sekitoshi Kanai
Kazuki Adachi
Daiki Chijiwa
VLM
34
0
0
17 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
47
0
0
10 Apr 2025
ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu
Shanmukha Vellamchetti
Sathyanarayanan N. Aakur
EgoV
52
0
0
04 Apr 2025
SAVeD: Learning to Denoise Low-SNR Video for Improved Downstream Performance
Suzanne Stathatos
Michael Hobley
Markus Marks
Pietro Perona
35
0
0
31 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
53
1
0
21 Mar 2025
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
46
0
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuxiang Cai
Yongheng Shang
Jianwei Yin
54
0
0
16 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
91
0
0
13 Mar 2025
DiffCLIP: Differential Attention Meets CLIP
Hasan Hammoud
Guohao Li
VLM
44
0
0
09 Mar 2025
Inclusive STEAM Education: A Framework for Teaching Coding and Robotics to Students with Visually Impairment Using Advanced Computer Vision
Mahmoud Hamash
Md Raqib Khan
Peter Tiernan
37
0
0
06 Mar 2025
A Shared Encoder Approach to Multimodal Representation Learning
Shuvendu Roy
Franklin Ogidi
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
47
0
0
03 Mar 2025
Multi-Faceted Multimodal Monosemanticity
Hanqi Yan
Xiangxiang Cui
Lu Yin
Paul Pu Liang
Yulan He
Yifei Wang
44
0
0
16 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
104
2
0
06 Feb 2025
AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
B. Alawode
I. I. Ganapathi
S. Javed
Naoufel Werghi
Mohammed Bennamoun
Arif Mahmood
CLIP
VLM
78
1
0
03 Feb 2025
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou
Amine Ouasfi
Vincent Gripon
A. Boukhayma
VLM
51
0
0
19 Jan 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
42
1
0
12 Jan 2025
Bringing Multimodality to Amazon Visual Search System
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
74
3
0
17 Dec 2024
Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality
Qitong Wang
Tang Li
Kien X. Nguyen
Xi Peng
85
0
0
17 Dec 2024
Attention Head Purification: A New Perspective to Harness CLIP for Domain Generalization
Yingfan Wang
Guoliang Kang
VLM
82
0
0
10 Dec 2024
DiffCLIP: Few-shot Language-driven Multimodal Classifier
Jiaqing Zhang
Mingxiang Cao
Xue Yang
Kai Jiang
Yunsong Li
VLM
82
0
0
10 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
73
2
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
79
2
0
02 Dec 2024
Gen-AI for User Safety: A Survey
Akshar Prabhu Desai
Tejasvi Ravi
Mohammad Luqman
Mohit Sharma
Nithya Kota
Pranjul Yadav
35
1
0
10 Nov 2024
Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales
Tang Li
Mengmeng Ma
Xi Peng
39
2
0
31 Oct 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
51
0
0
31 Oct 2024
Active Learning for Vision-Language Models
Bardia Safaei
Vishal M. Patel
VLM
47
2
0
29 Oct 2024
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
Ce Zhang
Simon Stepputtis
Katia P. Sycara
Yaqi Xie
VLM
35
5
0
16 Oct 2024
LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty
Joey Wilson
Ruihan Xu
Yile Sun
Parker Ewen
Minghan Zhu
Kira Barton
Maani Ghaffari
36
0
0
15 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
28
0
0
10 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
29
5
0
08 Oct 2024
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Rabin Adhikari
Safal Thapaliya
Manish Dhakal
Bishesh Khanal
MLLM
VLM
35
0
0
07 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
37
0
0
01 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Learning to Obstruct Few-Shot Image Classification over Restricted Classes
Amber Yijia Zheng
Chiao-An Yang
Raymond A. Yeh
32
1
0
28 Sep 2024
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar
Raghav Singhal
Pranamya Kulkarni
Deval Mehta
Kshitij Jadhav
23
0
0
26 Sep 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
27
9
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
50
1
0
26 Sep 2024
Adversarial Backdoor Defense in CLIP
Junhao Kuang
Siyuan Liang
Jiawei Liang
Kuanrong Liu
Xiaochun Cao
AAML
36
2
0
24 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
30
0
0
12 Sep 2024
A Multi-Modal Deep Learning Based Approach for House Price Prediction
Md Hasebul Hasan
Md Abid Jahan
Mohammed Eunus Ali
Yuan-Fang Li
Timos Sellis
16
0
0
09 Sep 2024
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
Eman Ali
Sathira Silva
Muhammad Haris Khan
VLM
37
0
0
16 Aug 2024
On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey
Jingcai Guo
Zhijie Rao
Zhi Chen
Song Guo
Jingren Zhou
Dacheng Tao
33
3
0
09 Aug 2024
FMiFood: Multi-modal Contrastive Learning for Food Image Classification
Xinyue Pan
Jiangpeng He
F. Zhu
42
2
0
07 Aug 2024
Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline
Tianqi Wei
Zhi Chen
Zi Huang
Xin Yu
22
6
0
06 Aug 2024
BioRAG: A RAG-LLM Framework for Biological Question Reasoning
Chengrui Wang
Qingqing Long
Meng Xiao
Xunxin Cai
Chengjun Wu
Zhen Meng
Xuezhi Wang
Yuanchun Zhou
44
26
0
02 Aug 2024
LADDER: Language Driven Slice Discovery and Error Rectification
Shantanu Ghosh
Rayan Syed
Chenyu Wang
Clare B. Poynton
Kayhan Batmanghelich
39
0
0
31 Jul 2024