ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.07991
  4. Cited By
LiT: Zero-Shot Transfer with Locked-image text Tuning

LiT: Zero-Shot Transfer with Locked-image text Tuning

15 November 2021
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
    VLM
ArXivPDFHTML

Papers citing "LiT: Zero-Shot Transfer with Locked-image text Tuning"

50 / 422 papers shown
Title
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
35
3
0
21 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only
  Language Supervision
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
14
24
0
17 Nov 2022
AltCLIP: Altering the Language Encoder in CLIP for Extended Language
  Capabilities
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
Zhongzhi Chen
Guangyi Liu
Bo-Wen Zhang
Fulong Ye
Qinghong Yang
Ledell Yu Wu
VLM
37
79
0
12 Nov 2022
Understanding and Mitigating Overfitting in Prompt Tuning for
  Vision-Language Models
Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
Cheng Ma
Yang Liu
Jiankang Deng
Lingxi Xie
Weiming Dong
Changsheng Xu
VLM
VPVLM
26
43
0
04 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
21
2
0
01 Nov 2022
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity
  Representation
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation
Zekun Li
Jina Kim
Yao-Yi Chiang
Muhao Chen
87
27
0
21 Oct 2022
LAION-5B: An open large-scale dataset for training next generation
  image-text models
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
43
3,261
0
16 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in
  Vision-Language Pre-training
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Wenliang Dai
Zihan Liu
Ziwei Ji
Dan Su
Pascale Fung
MLLM
VLM
29
62
0
14 Oct 2022
Caption supervision enables robust learners
Caption supervision enables robust learners
Ben Feuer
Ameya Joshi
C. Hegde
SSL
CLIP
VLM
39
2
0
13 Oct 2022
A Memory Transformer Network for Incremental Learning
A Memory Transformer Network for Incremental Learning
Ahmet Iscen
Thomas Bird
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLL
121
14
0
10 Oct 2022
APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal
  Representations
APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations
Elan Rosenfeld
Preetum Nakkiran
Hadi Pouransari
Oncel Tuzel
Fartash Faghri
68
6
0
08 Oct 2022
MaPLe: Multi-modal Prompt Learning
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
194
531
0
06 Oct 2022
Matching Text and Audio Embeddings: Exploring Transfer-learning
  Strategies for Language-based Audio Retrieval
Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval
Benno Weck
Miguel Pérez Fernández
Holger Kirchhoff
Xavier Serra
13
3
0
06 Oct 2022
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Jon Almazán
ByungSoo Ko
Geonmo Gu
Diane Larlus
Yannis Kalantidis
ObjD
VLM
36
7
0
05 Oct 2022
When and why vision-language models behave like bags-of-words, and what
  to do about it?
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM
CoGe
28
362
0
04 Oct 2022
ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
Antonio Norelli
Marco Fumero
Valentino Maiorca
Luca Moschella
Emanuele Rodolà
Francesco Locatello
VLM
81
33
0
04 Oct 2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language
  Models
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Huayu Chen
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM
VLM
ObjD
49
134
0
30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Han-Hung Lee
Angel X. Chang
24
63
0
30 Sep 2022
Linearly Mapping from Image to Text Space
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
164
104
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
29
4
0
29 Sep 2022
FreeSeg: Free Mask from Interpretable Contrastive Language-Image Pretraining for Semantic Segmentation
Yi Li
Huifeng Yao
Hualiang Wang
Xiaomeng Li
ISeg
VLM
35
2
0
27 Sep 2022
I2DFormer: Learning Image to Document Attention for Zero-Shot Image
  Classification
I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Yongqin Xian
Luc Van Gool
F. Tombari
VLM
17
37
0
21 Sep 2022
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language
  Models
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
Manli Shu
Weili Nie
De-An Huang
Zhiding Yu
Tom Goldstein
Anima Anandkumar
Chaowei Xiao
VLM
VPVLM
186
282
0
15 Sep 2022
Exploring Visual Interpretability for Contrastive Language-Image
  Pre-training
Exploring Visual Interpretability for Contrastive Language-Image Pre-training
Yi Li
Hualiang Wang
Yiqun Duan
Han Xu
Xiaomeng Li
CLIP
VLM
98
25
0
15 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
37
682
0
14 Sep 2022
Segmenting Known Objects and Unseen Unknowns without Prior Knowledge
Segmenting Known Objects and Unseen Unknowns without Prior Knowledge
Stefano Gasperini
Alvaro Marcos-Ramiro
Michael Schmidt
Nassir Navab
Benjamin Busam
F. Tombari
40
8
0
12 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task
  Applications
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
23
19
0
08 Sep 2022
Contrastive Audio-Language Learning for Music
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
27
44
0
25 Aug 2022
Patching open-vocabulary models by interpolating weights
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM
KELM
32
166
0
10 Aug 2022
Learning Visual Representation from Modality-Shared Contrastive
  Language-Image Pre-training
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
21
48
0
26 Jul 2022
Exploring the Design of Adaptation Protocols for Improved Generalization
  and Machine Learning Safety
Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety
Puja Trivedi
Danai Koutra
Jayaraman J. Thiagarajan
AAML
25
0
0
26 Jul 2022
Assaying Out-Of-Distribution Generalization in Transfer Learning
Assaying Out-Of-Distribution Generalization in Transfer Learning
F. Wenzel
Andrea Dittadi
Peter V. Gehler
Carl-Johann Simon-Gabriel
Max Horn
...
Chris Russell
Thomas Brox
Bernt Schiele
Bernhard Schölkopf
Francesco Locatello
OOD
OODD
AAML
54
71
0
19 Jul 2022
Is a Caption Worth a Thousand Images? A Controlled Study for
  Representation Learning
Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning
Shibani Santurkar
Yann Dubois
Rohan Taori
Percy Liang
Tatsunori Hashimoto
CLIP
VLM
19
41
0
15 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in
  E-commerce
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
27
16
0
01 Jul 2022
ProtoCLIP: Prototypical Contrastive Language Image Pretraining
ProtoCLIP: Prototypical Contrastive Language Image Pretraining
Delong Chen
Zhao Wu
Fan Liu
Zaiquan Yang
Huaxi Huang
Ying Tan
Erjin Zhou
VLM
CLIP
27
28
0
22 Jun 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language
  Representation Learning
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
43
64
0
17 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
24
124
0
15 Jun 2022
Masked Unsupervised Self-training for Label-free Image Classification
Masked Unsupervised Self-training for Label-free Image Classification
Junnan Li
Silvio Savarese
Steven C. H. Hoi
VLM
SSL
12
12
0
07 Jun 2022
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture
  of Experts
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa
C. Riquelme
J. Puigcerver
Rodolphe Jenatton
N. Houlsby
VLM
MoE
28
183
0
06 Jun 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual
  Word Alignment
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
20
0
0
23 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
19
307
0
12 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
64
1,256
0
04 May 2022
Better plain ViT baselines for ImageNet-1k
Better plain ViT baselines for ImageNet-1k
Lucas Beyer
Xiaohua Zhai
Alexander Kolesnikov
ViT
VLM
33
111
0
03 May 2022
Data Determines Distributional Robustness in Contrastive Language Image
  Pre-training (CLIP)
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
33
138
0
03 May 2022
Visual Spatial Reasoning
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
30
158
0
30 Apr 2022
Continual Learning with Foundation Models: An Empirical Study of Latent
  Replay
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
O. Ostapenko
Timothée Lesort
P. Rodríguez
Md Rifat Arefin
Arthur Douillard
Irina Rish
Laurent Charlin
29
51
0
30 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,334
0
29 Apr 2022
CLIP-Dissect: Automatic Description of Neuron Representations in Deep
  Vision Networks
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
Tuomas P. Oikarinen
Tsui-Wei Weng
VLM
15
81
1
23 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
15
572
0
01 Apr 2022
Large-scale Bilingual Language-Image Contrastive Learning
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
29
14
0
28 Mar 2022
Previous
123456789
Next