ResearchTrend.AI
LiT: Zero-Shot Transfer with Locked-image text Tuning

15 November 2021
Xiaohua Zhai
Xiao Wang
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
    VLM

Papers citing "LiT: Zero-Shot Transfer with Locked-image text Tuning"

50 / 422 papers shown
Enhancing Vision-Language Model Pre-training with Image-text Pair Pruning Based on Word Frequency
Mingliang Liang
Martha Larson
VLM
CLIP
26
0
0
09 Oct 2024
Temporal Image Caption Retrieval Competition -- Description and Results
Jakub Pokrywka
Piotr Wierzchoń
Kornel Weryszko
Krzysztof Jassem
52
0
0
08 Oct 2024
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Wei Wu
Kecheng Zheng
Shuailei Ma
Fan Lu
Yuxin Guo
Yifei Zhang
Wei Chen
Qingpei Guo
Yujun Shen
Zheng-Jun Zha
VLM
30
9
0
07 Oct 2024
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Rabin Adhikari
Safal Thapaliya
Manish Dhakal
Bishesh Khanal
MLLM
VLM
35
0
0
07 Oct 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai
Vasileios Saveris
C. L. P. Chen
Hong-You Chen
Haotian Zhang
...
Wenze Hu
Zhe Gan
Peter Grasch
Meng Cao
Yinfei Yang
VLM
33
3
0
03 Oct 2024
The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath
Cheng-Yu Hsieh
Kai-Wei Chang
Ranjay Krishna
CLIP
CoGe
VLM
30
5
0
26 Sep 2024
VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection
Liangyu Zhong
Joachim Sicking
Fabian Hüger
Hanno Gottschalk
VLM
35
0
0
25 Sep 2024
Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
Ilaria Manco
Justin Salamon
Oriol Nieto
23
1
0
17 Sep 2024
Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts
Assefa Seyoum Wahd
B. Felfeliyan
Yuyue Zhou
Shrimanti Ghosh
Adam McArthur
Jiechen Zhang
Jacob L. Jaremko
A. Hareendranathan
VLM
MedIm
45
1
0
10 Sep 2024
How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular Retrieval
Philip Fradkin
Puria Azadi
Karush Suri
Frederik Wenkel
A. Bashashati
Maciej Sypetkowski
Dominique Beaini
33
1
0
10 Sep 2024
Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment
Konstantin Schall
Kai Uwe Barthel
Nico Hezel
Klaus Jung
VLM
36
3
0
03 Sep 2024
How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?
Sicheng Wang
Che Liu
Rossella Arcucci
VLM
MedIm
36
0
0
31 Aug 2024
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
Eman Ali
Sathira Silva
Muhammad Haris Khan
VLM
34
0
0
16 Aug 2024
Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation
Tri Ton
Ji Woo Hong
Soohwan Eom
Jun Yeop Shim
Junyeong Kim
Chang D. Yoo
3DPC
ISeg
47
2
0
16 Aug 2024
Towards Flexible Visual Relationship Segmentation
Fangrui Zhu
Jianwei Yang
Huaizu Jiang
VOS
34
1
0
15 Aug 2024
ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers
Aristi Papastavrou
Maria Lymperaiou
Giorgos Stamou
AI4CE
32
1
0
12 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
37
2
0
11 Aug 2024
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang
Minsu Cho
ObjD
VLM
40
9
0
09 Aug 2024
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
Kuo Wang
Lechao Cheng
Weikai Chen
Pingping Zhang
Liang Lin
Fan Zhou
Guanbin Li
VLM
ObjD
36
1
0
31 Jul 2024
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
Yannis Vasilakis
Rachel M. Bittner
Johan Pauwels
40
0
0
25 Jul 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
Yifan Li
Yikai Wang
Yanwei Fu
Dongyu Ru
Zheng-Wei Zhang
Tong He
VLM
42
4
0
25 Jul 2024
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment
Shenghong Dai
Shiqi Jiang
Yifan Yang
Ting Cao
Mo Li
Suman Banerjee
Lili Qiu
49
2
0
25 Jul 2024
Multi-label Cluster Discrimination for Visual Representation Learning
Xiang An
Kaicheng Yang
Xiangzi Dai
Ziyong Feng
Jiankang Deng
VLM
45
6
0
24 Jul 2024
Robust Calibration of Large Vision-Language Adapters
Balamurali Murugesan
Julio Silva-Rodríguez
Ismail Ben Ayed
Jose Dolz
OODD
VLM
32
6
0
18 Jul 2024
CoAPT: Context Attribute words for Prompt Tuning
Gun Lee
Subin An
Sungyong Baik
Soochahn Lee
VPVLM
VLM
35
1
0
18 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM
MQ
32
5
0
15 Jul 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper
Hadar Averbuch-Elor
VLM
32
6
0
11 Jul 2024
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization
Jinlong Li
Zequn Jie
Elisa Ricci
Lin Ma
N. Sebe
VLM
39
0
0
11 Jul 2024
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang
Zhuotao Tian
Kai Li
Chunming He
Hantao Zhou
Hengshuang Zhao
Xiu Li
Jiaya Jia
CLL
VLM
34
20
0
07 Jul 2024
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
Marco Mistretta
Alberto Baldrati
Marco Bertini
Andrew D. Bagdanov
VPVLM
VLM
35
6
0
03 Jul 2024
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen
Stefan Uhlich
Fabien Cardinaux
Lukas Mauch
Marzieh Edraki
Aaron Courville
OODD
CLL
VLM
54
3
0
03 Jul 2024
Semantically Guided Representation Learning For Action Anticipation
Anxhelo Diko
D. Avola
Bardh Prenkaj
Federico Fontana
Luigi Cinque
AI4TS
43
2
0
02 Jul 2024
Semantic Compositions Enhance Vision-Language Contrastive Learning
Maxwell Mbabilla Aladago
Lorenzo Torresani
Soroush Vosoughi
CoGe
VLM
CLIP
41
0
0
01 Jul 2024
GM-DF: Generalized Multi-Scenario Deepfake Detection
Yingxin Lai
Zitong Yu
Jing Yang
Bin Li
Xiangui Kang
Linlin Shen
32
7
0
28 Jun 2024
Dataset Size Recovery from LoRA Weights
Mohammad Salama
Jonathan Kahana
Eliahu Horwitz
Yedid Hoshen
39
5
0
27 Jun 2024
Latent Space Translation via Inverse Relative Projection
Valentino Maiorca
Luca Moschella
Marco Fumero
Francesco Locatello
Emanuele Rodolà
34
1
0
21 Jun 2024
IWISDM: Assessing instruction following in multimodal models at scale
Xiaoxuan Lei
Lucas Gomez
Hao Yuan Bai
P. Bashivan
VLM
33
1
0
20 Jun 2024
MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling
Philipp Seeberger
Dominik Wagner
K. Riedhammer
27
0
0
18 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
43
3
0
17 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
41
35
0
12 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
49
1
0
11 Jun 2024
Nomic Embed Vision: Expanding the Latent Space
Zach Nussbaum
Brandon Duderstadt
Andriy Mulyar
VLM
33
5
0
06 Jun 2024
Radar Spectra-Language Model for Automotive Scene Parsing
Mariia Pushkareva
Yuri Feldman
Csaba Domokos
K. Rambach
Dotan Di Castro
55
1
0
04 Jun 2024
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong
Xin Li
Bhiksha Raj
Jackson Cothren
Khoa Luu
DiffM
VLM
51
1
0
03 Jun 2024
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
35
3
0
31 May 2024
Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images
Mansi Kakkar
D. Shanbhag
Chandan Aladahalli
M. GurunathReddy
VLM
LM&MA
MedIm
23
2
0
31 May 2024
Topological Perspectives on Optimal Multimodal Embedding Spaces
Abdul Aziz
Abdul Rahim
BDL
39
0
0
29 May 2024
Low-Rank Few-Shot Adaptation of Vision-Language Models
Maxime Zanella
Ismail Ben Ayed
OffRL
VLM
48
26
0
28 May 2024
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
Jiawei Ma
Yulei Niu
Shiyuan Huang
G. Han
Shih-Fu Chang
VLM
39
1
0
28 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
A. Hengel
VLM
40
1
0
27 May 2024