arXiv: 2410.05160
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

3 January 2025
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
    MLLM
    VLM

Papers citing "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"

15 / 15 papers shown
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
Jiajun Qin
Yuan Pu
Zhuolun He
Seunggeun Kim
David Z. Pan
Bei Yu
12
0
0
17 May 2025
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru
Rui Meng
Yong-Jin Liu
Karthikeyan K
Mingyi Su
Ping Nie
Semih Yavuz
Yingbo Zhou
Wenhu Chen
Bhuwan Dhingra
17
0
0
16 May 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Tiancheng Gu
Kaicheng Yang
Ziyong Feng
Xingjun Wang
Yanzhao Zhang
Dingkun Long
Yingda Chen
Weidong Cai
Jiankang Deng
VLM
220
2
0
24 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
Kenneth Enevoldsen
Niklas Muennighoff
VLM
42
0
0
14 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
213
0
0
11 Apr 2025
IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval
Bangwei Liu
Yicheng Bao
Shaohui Lin
Xuhong Wang
Xin Tan
Yansen Wang
Yuan Xie
Chaochao Lu
84
0
0
01 Apr 2025
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
208
0
0
27 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin
Mike Zheng Shou
VGen
213
1
0
12 Mar 2025
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts
Wenzhuo Du
G. Wang
Guancheng Chen
Hang Zhao
X. Li
Jian Gao
217
0
0
08 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
189
0
0
01 Mar 2025
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
Lang Huang
Qiyu Wu
Zhongtao Miao
T. Yamasaki
201
0
0
27 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
Ze Liu
Junjie Zhou
Yueze Wang
Zheng Liu
Defu Lian
OffRL
151
0
0
17 Feb 2025
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
Hao Fei
118
9
0
22 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
257
1
0
21 Nov 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
70
22
0
27 Jun 2024