Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.20088
Cited By
v1
v2 (latest)
Improving CLIP Training with Language Rewrites
31 May 2023
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
Re-assign community
ArXiv (abs)
PDF
HTML
Github (280★)
Papers citing
"Improving CLIP Training with Language Rewrites"
32 / 32 papers shown
Title
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen
Guoqiang Gong
Tianxiang Hao
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Jungong Han
Guiguang Ding
24
0
0
10 Jun 2025
Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Kyeonghyun Kim
Jinhee Jang
Juhwan Choi
Yoonji Lee
Kyohoon Jin
Youngbin Kim
28
0
0
09 Jun 2025
Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning
Amit Peleg
Naman D. Singh
Matthias Hein
CoGe
VLM
37
0
0
30 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
62
0
0
28 May 2025
CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs
Yiqing Zhang
Xiaozhong Liu
Fabricio Murai
41
0
0
24 May 2025
Cultural Awareness in Vision-Language Models: A Cross-Country Exploration
Avinash Madasu
Vasudev Lal
Phillip Howard
VLM
24
0
0
23 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
329
9
0
17 Apr 2025
Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization
Ruichuan An
Kai Zeng
Ming Lu
Sihan Yang
Renrui Zhang
Huitong Ji
Qizhe Zhang
Yihao Luo
Hao Liang
Wentao Zhang
136
1
0
17 Mar 2025
Enhanced Continual Learning of Vision-Language Models with Model Fusion
Haoyuan Gao
Zicong Zhang
Yuqi Wei
Linglan Zhao
Guilin Li
Yuezun Li
Linghe Kong
Weiran Huang
CLL
VLM
381
0
0
12 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
196
1
0
07 Mar 2025
FAA-CLIP: Federated Adversarial Adaptation of CLIP
Yihang Wu
Ahmad Chaddad
Christian Desrosiers
Tareef Daqqaq
R. Kateb
VLM
102
0
0
26 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xiang Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP
VLM
124
10
0
20 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begüm Demir
Ioannis Papoutsis
VLM
196
1
0
13 Feb 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi
Besmira Nushi
Vidhisha Balachandran
Varun Chandrasekaran
Vibhav Vineet
Neel Joshi
Baharan Mirzasoleiman
MLLM
VLM
170
0
0
07 Jan 2025
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
348
3
0
02 Dec 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
291
1
0
18 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
183
3
0
11 Nov 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
129
3
0
21 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
128
9
0
30 Sep 2024
Finetuning CLIP to Reason about Pairwise Differences
Dylan Sam
Devin Willmott
João Dias Semedo
J. Zico Kolter
VLM
115
4
0
15 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
264
15
0
02 Sep 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Xu Zhang
Yang Zheng
Kaitai Guo
Jimin Liang
171
2
0
27 Aug 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
150
5
0
09 Jul 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
102
22
0
30 Mar 2024
A Deep Learning Method for Classification of Biophilic Artworks
Purna Kar
Jordan J. Bird
Yangang Xing
Alexander Sumich
Andrew Knight
Ahmad Lotfi
Benedict Carpenter van Barthold
79
0
0
08 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
146
2
0
01 Mar 2024
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
Hyunjae Kim
Seunghyun Yoon
Trung Bui
Handong Zhao
Quan Tran
Franck Dernoncourt
Jaewoo Kang
CLIP
138
2
0
23 Feb 2024
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
119
1
0
10 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
123
6
0
06 Dec 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
107
5
0
02 Oct 2023
Improving Multimodal Datasets with Image Captioning
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
99
77
0
19 Jul 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
111
28
0
12 Jun 2023
1