2111.07991
Cited By
LiT: Zero-Shot Transfer with Locked-image text Tuning
15 November 2021
Xiaohua Zhai
Xiao Wang
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
Papers citing
"LiT: Zero-Shot Transfer with Locked-image text Tuning"
50 / 422 papers shown
Title
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
Bang-ju Yang
Yong Dai
Xuxin Cheng
Yaowei Li
Asif Raza
Yuexian Zou
VLM
39
4
0
30 Jan 2024
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Qingpei Guo
Furong Xu
Hanxiao Zhang
Wang Ren
Ziping Ma
Lin Ju
Jian Wang
Jingdong Chen
Ming Yang
VLM
MLLM
27
2
0
29 Jan 2024
Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss
Jordan Shipard
Arnold Wiliem
Kien Nguyen Thanh
Wei Xiang
Clinton Fookes
VLM
CLIP
38
2
0
22 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel C. F. Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA
MedIm
50
26
0
19 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
34
13
0
11 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
16
12
0
07 Jan 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan
Xiangtai Li
Chong Zhou
Yining Li
Kai Chen
Chen Change Loy
VLM
29
51
0
05 Jan 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma
Furong Xu
Jian Liu
Ming Yang
Qingpei Guo
VLM
42
3
0
04 Jan 2024
Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Ehsan Abbasnejad
Hamed Damirchi
Ignacio M. Jara
Felipe Bravo-Marquez
Anton Van Den Hengel
VLM
51
1
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
924
0
21 Dec 2023
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin
Conghui He
Alex Jinpeng Wang
Bin Wang
Weijia Li
Mike Zheng Shou
36
7
0
21 Dec 2023
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez
Sina Hajimiri
Ismail Ben Ayed
Jose Dolz
VLM
26
27
0
20 Dec 2023
LaViP: Language-Grounded Visual Prompts
Nilakshan Kunananthaseelan
Jing Zhang
Mehrtash Harandi
VLM
22
0
0
18 Dec 2023
Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model
Shuailei Ma
Chen-Wei Xie
Ying-yu Wei
Siyang Sun
Jiaqi Fan
Xiaoyi Bao
Yuxin Guo
Yun Zheng
VLM
VPVLM
26
2
0
18 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
48
3
0
15 Dec 2023
Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
27
0
0
14 Dec 2023
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao
Jiacheng Ruan
Suncheng Xiang
Zefang Yu
Ke Ji
Mingye Xie
Ting Liu
Yuzhuo Fu
MLLM
VLM
VPVLM
24
15
0
13 Dec 2023
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun
Runjia Li
Philip H. S. Torr
Xiuye Gu
Siyang Li
VLM
CLIP
31
32
0
12 Dec 2023
Open Domain Generalization with a Single Network by Regularization Exploiting Pre-trained Features
Inseop Chung
Kiyoon Yoo
Nojun Kwak
VLM
16
0
0
08 Dec 2023
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel
Changhoon Kim
Sheng Cheng
Chitta Baral
Yezhou Yang
VLM
27
18
0
07 Dec 2023
Combining inherent knowledge of vision-language models with unsupervised domain adaptation through self-knowledge distillation
Thomas Westfechtel
Dexuan Zhang
Tatsuya Harada
VLM
26
2
0
07 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
20
6
0
06 Dec 2023
CLAMP: Contrastive LAnguage Model Prompt-tuning
Piotr Teterwak
Ximeng Sun
Bryan A. Plummer
Kate Saenko
Ser-Nam Lim
MLLM
VLM
35
1
0
04 Dec 2023
TextAug: Test time Text Augmentation for Multimodal Person Re-identification
Mulham Fawakherji
Eduard Vazquez
P. Giampa
Binod Bhattarai
43
2
0
04 Dec 2023
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Xuan-Bac Nguyen
Xin Li
Pawan Sinha
Samee U. Khan
Khoa Luu
ViT
MedIm
29
0
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
53
1
0
30 Nov 2023
A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes
Tuan-Hung Vu
Andrei Bursuc
Patrick Pérez
Raoul de Charette
VLM
23
14
0
29 Nov 2023
Language-conditioned Detection Transformer
Jang Hyun Cho
Philipp Krahenbuhl
VLM
ObjD
47
1
0
29 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen
Thanh-Dat Truong
Xuan-Bac Nguyen
Ashley Dowling
Xin Li
Khoa Luu
VLM
24
19
0
26 Nov 2023
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Peng Xia
Xingtong Yu
Ming Hu
Lie Ju
Zhiyong Wang
Peibo Duan
Zongyuan Ge
VLM
54
9
0
23 Nov 2023
Active Prompt Learning in Vision Language Models
Jihwan Bang
Sumyeong Ahn
Jae-Gil Lee
VLM
11
9
0
18 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
16
54
0
06 Nov 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Jameel Hassan
Hanan Gani
Noor Hussein
Muhammad Uzair Khattak
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
VLM
OOD
53
61
0
02 Nov 2023
Latent Space Translation via Semantic Alignment
Valentino Maiorca
Luca Moschella
Antonio Norelli
Marco Fumero
Francesco Locatello
Emanuele Rodolà
27
20
0
01 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM
DiffM
29
10
0
01 Nov 2023
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
Peter Zachares
Vahan Hovhannisyan
Alan Mosca
Yarin Gal
29
1
0
01 Nov 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
29
67
0
23 Oct 2023
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
G. O. D. Santos
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
Luiz Pereira
...
H. Maia
Nádia Da Silva
Esther Colombini
Hélio Pedrini
Sandra Avila
VLM
CLIP
34
4
0
20 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
26
33
0
20 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
28
7
0
16 Oct 2023
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
Sumedh Rasal
Sanjay K. Boddhu
32
5
0
15 Oct 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiaohua Zhai
Radu Soricut
MLLM
VLM
32
93
0
13 Oct 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao
Max F. Burg
Samuel Albanie
Matthias Bethge
VLM
31
9
0
12 Oct 2023
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli
Ashish Ramayee Asokan
Lakshay Sharma
R. V. Babu
VLM
24
15
0
12 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
34
28
0
11 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
24
26
0
11 Oct 2023
Visual Storytelling with Question-Answer Plans
Danyang Liu
Mirella Lapata
Frank Keller
CoGe
11
9
0
08 Oct 2023
LAN-grasp: Using Large Language Models for Semantic Object Grasping
Reihaneh Mirjalili
Michael Krawez
Simone Silenzi
Yannik Blei
Wolfram Burgard
VLM
62
27
0
08 Oct 2023
IPMix: Label-Preserving Data Augmentation Method for Training Robust Classifiers
Zhenglin Huang
Xianan Bao
Na Zhang
Qingqi Zhang
Xiaomei Tu
Biao Wu
Xi Yang
30
7
0
07 Oct 2023