Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.15682
Cited By
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
21 February 2025
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval"
50 / 76 papers shown
Title
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Michael Tschannen
A. Gritsenko
Xiao Wang
Muhammad Ferjad Naeem
Ibrahim Alabdulmohsin
...
Basil Mustafa
Olivier J. Hénaff
Jeremiah Harmsen
Andreas Steiner
Xiaohua Zhai
VLM
127
66
0
21 Feb 2025
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Yikun Liu
Pingan Chen
Jiayin Cai
Xiaolong Jiang
Feng-Long Xie
Jiangchao Yao
Yanfeng Wang
Weidi Xie
RALM
91
7
0
02 Dec 2024
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
Naoya Sogi
Takashi Shibata
Makoto Terao
VLM
68
2
0
17 Jul 2024
Data curation via joint example selection further accelerates multimodal learning
Talfan Evans
Nikhil Parthasarathy
Hamza Merzic
Olivier J. Hénaff
81
14
0
25 Jun 2024
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
80
68
0
11 Apr 2024
CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora
Zijun Long
Xuri Ge
R. McCreadie
Joemon M. Jose
44
7
0
23 Feb 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
49
23
0
09 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
224
1,102
0
21 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
46
17
0
08 Dec 2023
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen
60
65
0
28 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
69
50
0
28 Nov 2023
Sieve: Multimodal Dataset Pruning Using Image Captioning Models
Anas Mahmoud
Mostafa Elhoushi
Amro Abbas
Yu Yang
Newsha Ardalani
Hugh Leather
Ari S. Morcos
VLM
CLIP
50
21
0
03 Oct 2023
Data Filtering Networks
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
CLIP
62
135
0
29 Sep 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
41
21
0
28 Aug 2023
Zero-shot Composed Text-Image Retrieval
Yikun Liu
Jiangchao Yao
Ya Zhang
Yanfeng Wang
Weidi Xie
50
24
0
12 Jun 2023
DataComp: In search of the next generation of multimodal datasets
S. Gadre
Gabriel Ilharco
Alex Fang
J. Hayase
Georgios Smyrnis
...
A. Dimakis
J. Jitsev
Y. Carmon
Vaishaal Shankar
Ludwig Schmidt
VLM
68
434
0
27 Apr 2023
Learnable Pillar-based Re-ranking for Image-Text Retrieval
Leigang Qu
Meng Liu
Wenjie Wang
Zhedong Zheng
Liqiang Nie
Tat-Seng Chua
VLM
32
13
0
25 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
429
4,678
0
17 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
128
1,131
0
27 Mar 2023
Zero-Shot Composed Image Retrieval with Textual Inversion
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
60
106
0
27 Mar 2023
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
HeeJae Jun
Yoohoon Kang
Sangdoo Yun
DiffM
50
55
0
21 Mar 2023
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Amro Abbas
Kushal Tirumala
Daniel Simig
Surya Ganguli
Ari S. Morcos
61
175
0
16 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
401
4,527
0
30 Jan 2023
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
66
324
0
01 Dec 2022
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim
Nam-Won Kim
Suha Kwak
85
37
0
30 Nov 2022
A Light Touch Approach to Teaching Transformers Multi-view Geometry
Yash Bhalgat
Joao F. Henriques
Andrew Zisserman
ViT
42
6
0
28 Nov 2022
A Tri-Layer Plugin to Improve Occluded Detection
Guanqi Zhan
Weidi Xie
Andrew Zisserman
39
20
0
18 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
139
3,420
0
16 Oct 2022
Turbo Training with Token Dropout
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
51
11
0
10 Oct 2022
Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
85
439
0
29 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
86
160
0
14 Jun 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
131
1,293
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
344
3,515
0
29 Apr 2022
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLM
VPVLM
139
1,615
0
23 Mar 2022
Instance-wise Occlusion and Depth Orders in Natural Scenes
Hyunmin Lee
Jaesik Park
3DV
46
26
0
29 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
167
1,943
0
16 Jul 2021
Step-Wise Hierarchical Alignment Network for Image-Text Matching
Zhong Ji
Kexin Chen
Haoran Wang
40
94
0
11 Jun 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
110
99
0
17 May 2021
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
Shiyang Yan
Li Yu
Yuan Xie
69
34
0
21 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
117
1,545
0
18 Apr 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
47
139
0
30 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Hila Chefer
Shir Gur
Lior Wolf
ViT
60
317
0
29 Mar 2021
Instance-level Image Retrieval using Reranking Transformers
Fuwen Tan
Jiangbo Yuan
Vicente Ordonez
ViT
109
92
0
22 Mar 2021
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
Stephen Hausler
Sourav Garg
Ming Xu
Michael Milford
Tobias Fischer
86
334
0
02 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
816
29,167
0
26 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
426
1,120
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
419
3,826
0
11 Feb 2021
Training Vision Transformers for Image Retrieval
Alaaeldin El-Nouby
Natalia Neverova
Ivan Laptev
Hervé Jégou
ViT
117
158
0
10 Feb 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
463
205
0
13 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
275
335
0
05 Jan 2021
1
2
Next