Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.07991
Cited By
LiT: Zero-Shot Transfer with Locked-image text Tuning
15 November 2021
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LiT: Zero-Shot Transfer with Locked-image text Tuning"
50 / 422 papers shown
Title
Image Captioners Are Scalable Vision Learners Too
Michael Tschannen
Manoj Kumar
Andreas Steiner
Xiaohua Zhai
N. Houlsby
Lucas Beyer
VLM
CLIP
21
53
0
13 Jun 2023
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
Ming Y. Lu
Bowen Chen
Andrew Zhang
Drew F. K. Williamson
Richard J. Chen
Tong Ding
L. Le
Yung-Sung Chuang
Faisal Mahmood
VLM
MedIm
25
99
0
13 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
31
26
0
12 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
33
18
0
09 Jun 2023
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
21
46
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
14
19
0
08 Jun 2023
BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields
AKM SHAHARIAR AZAD RABBY
Chengcui Zhang
24
26
0
05 Jun 2023
Open-world Text-specified Object Counting
Niki Amini-Naieni
Kiana Amini-Naieni
Tengda Han
Andrew Zisserman
VLM
24
16
0
02 Jun 2023
Consistency-guided Prompt Learning for Vision-Language Models
Shuvendu Roy
Ali Etemad
VLM
VPVLM
22
52
0
01 Jun 2023
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
24
155
0
31 May 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Rameswar Panda
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
38
52
0
31 May 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLM
CLL
71
37
0
30 May 2023
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
Paul S. Scotti
Atmadeep Banerjee
J. Goode
Stepan Shabalin
A. Nguyen
...
Nathalie Verlinde
Elad Yundler
David Weisberg
K. A. Norman
Tanishq Mathew Abraham
DiffM
36
106
0
29 May 2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia
P. Narayana
Arjun Reddy Akula
G. Pruthi
Haoran Su
Sugato Basu
Varun Jampani
VLM
OffRL
15
4
0
28 May 2023
Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models
Yunhao Ge
Jie Jessie Ren
Jiaping Zhao
Kaifeng Chen
Andrew Gallagher
Laurent Itti
Balaji Lakshminarayanan
VLM
ObjD
26
1
0
26 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
39
11
0
26 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Bin Wang
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
23
220
0
24 May 2023
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
Sangwoo Mo
Minkyu Kim
Kyungmin Lee
Jinwoo Shin
VLM
CLIP
44
21
0
23 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
27
57
0
22 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Jiaheng Liu
13
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
42
114
0
18 May 2023
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Zhang Tao
Su He
D. Tao
Bin Chen
Zhi Wang
Shutao Xia
VLM
32
21
0
18 May 2023
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM
CLIP
45
54
0
11 May 2023
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
ViT
VLM
25
73
0
11 May 2023
Text-To-Concept (and Back) via Cross-Model Alignment
Mazda Moayeri
Keivan Rezaei
Maziar Sanjabi
S. Feizi
CLIP
38
41
0
10 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
37
844
0
09 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM
VPVLM
39
2
0
03 May 2023
CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations
Gengchen Mai
Ni Lao
Yutong He
Jiaming Song
Stefano Ermon
80
58
0
01 May 2023
Learning Human-Human Interactions in Images from Weak Textual Supervision
Morris Alper
Hadar Averbuch-Elor
VLM
39
2
0
27 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
44
273
0
24 Apr 2023
Image retrieval outperforms diffusion models on data augmentation
Max F. Burg
F. Wenzel
Dominik Zietlow
Max Horn
Osama Makansi
Francesco Locatello
Chris Russell
VLM
DiffM
42
16
0
20 Apr 2023
Learning Sample Difficulty from Pre-trained Models for Reliable Prediction
Peng Cui
Dan Zhang
Zhijie Deng
Yinpeng Dong
Junyi Zhu
18
12
0
20 Apr 2023
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
Yihao Chen
Xianbiao Qi
Jianan Wang
Lei Zhang
18
16
0
17 Apr 2023
Progressive Visual Prompt Learning with Contrastive Feature Re-formation
C. Xu
Yuhan Zhu
Haocheng Shen
Fengyuan Shi
Boheng Chen
Yixuan Liao
Xiaoxin Chen
Limin Wang
VLM
33
20
0
17 Apr 2023
Chain of Thought Prompt Tuning in Vision Language Models
Jiaxin Ge
Hongyin Luo
Siyuan Qian
Yulu Gan
Jie Fu
Shanghang Zhang
VLM
LRM
MLLM
35
27
0
16 Apr 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
28
13
0
12 Apr 2023
APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP
Mainak Singha
Ankit Jha
Bhupendra S. Solanki
Shirsha Bose
Biplab Banerjee
VLM
19
27
0
12 Apr 2023
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
VLM
3DV
33
25
0
11 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
33
19
0
05 Apr 2023
What's in a Name? Beyond Class Indices for Image Recognition
Kai Han
Yandong Li
S. Vaze
Jie Li
Xuhui Jia
VLM
19
7
0
05 Apr 2023
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation
Jheng-Hong Yang
Carlos Lassance
Rafael Sampaio de Rezende
Krishna Srinivasan
Miriam Redi
S. Clinchant
Jimmy J. Lin
42
12
0
04 Apr 2023
Learning to Name Classes for Vision and Language Models
Sarah Parisot
Yongxin Yang
Steven G. McDonagh
VLM
17
10
0
04 Apr 2023
Black Box Few-Shot Adaptation for Vision-Language models
Yassine Ouali
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
VLM
26
31
0
04 Apr 2023
Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation
Yabo Zhang
Zihao Wang
Jun Hao Liew
Jingjia Huang
Manyu Zhu
Jiashi Feng
W. Zuo
VLM
16
4
0
03 Apr 2023
Probabilistic Prompt Learning for Dense Prediction
Hyeongjun Kwon
Taeyong Song
Somi Jeong
Jin-Hwa Kim
Jinhyun Jang
Kwanghoon Sohn
VLM
25
18
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
41
483
0
03 Apr 2023
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Paola Cascante-Bonilla
Khaled Shehada
James Smith
Sivan Doveh
Donghyun Kim
...
Gül Varol
A. Oliva
Vicente Ordonez
Rogerio Feris
Leonid Karlinsky
VLM
SyDa
29
40
0
30 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
29
23
0
29 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
38
743
0
28 Mar 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
30
951
0
27 Mar 2023
Previous
1
2
3
4
5
6
7
8
9
Next