Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment

3 September 2024
Konstantin Schall
Kai Uwe Barthel
Nico Hezel
Klaus Jung
    VLM

Papers citing "Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment"

MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
71
8
0
13 Jun 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP, VLM
254
1,200
0
27 Mar 2023
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
198
725
0
14 Nov 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
200
3,502
0
16 Oct 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM, CLIP
79
27
0
29 Aug 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Jenq-Neng Hwang
Jianfeng Gao
ObjD, VLM
90
300
0
12 Jun 2022
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM, SSL
139
227
0
07 Apr 2022
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIP, VLM
113
618
0
07 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
502
15,788
0
20 Dec 2021
GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval
Konstantin Schall
Kai Uwe Barthel
Nico Hezel
Klaus Jung
VLM
41
20
0
25 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Xiao Wang
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
115
560
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
477
7,827
0
11 Nov 2021
DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
Min Yang
Dongliang He
M. Fan
Baorong Shi
Xuetong Xue
Fu Li
Errui Ding
Jizhou Huang
105
98
0
06 Aug 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
1.0K
29,926
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,000
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
450
1,142
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM, CLIP
463
3,901
0
11 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
682
41,483
0
22 Oct 2020
Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval
Tobias Weyand
A. Araújo
Bingyi Cao
Jack Sim
80
371
0
03 Apr 2020
Natural Adversarial Examples
Dan Hendrycks
Kevin Zhao
Steven Basart
Jacob Steinhardt
Dawn Song
OODD
236
1,484
0
16 Jul 2019
Learning Robust Global Representations by Penalizing Local Predictive Power
Haohan Wang
Songwei Ge
Eric Xing
Zachary Chase Lipton
OOD
122
967
0
29 May 2019
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
Filip Radenovic
Ahmet Iscen
Giorgos Tolias
Yannis Avrithis
Ondřej Chum
60
381
0
29 Mar 2018
Fine-tuning CNN Image Retrieval with No Human Annotation
Filip Radenovic
Giorgos Tolias
Ondřej Chum
84
1,307
0
03 Nov 2017
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM, ObjD
1.7K
39,615
0
01 Sep 2014
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,875
0
01 May 2014
Fine-Grained Visual Classification of Aircraft
Subhransu Maji
Esa Rahtu
Juho Kannala
Matthew Blaschko
Andrea Vedaldi
126
2,272
0
21 Jun 2013