2407.01449
Cited By
ColPali: Efficient Document Retrieval with Vision Language Models
27 June 2024
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
Papers citing
"ColPali: Efficient Document Retrieval with Vision Language Models"
23 / 23 papers shown
Title
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
François Role
Sébastien Meyer
Victor Amblard
VLM
50
0
0
06 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Zhaoxin Fan
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
1
0
26 Apr 2025
ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring
Kaili Huang
Thejas Venkatesh
Uma Dingankar
Antonio Mallia
Daniel Campos
...
Matei A. Zaharia
Kwabena Boahen
Omar Khattab
Saarthak Sarup
Keshav Santhanam
32
0
0
21 Apr 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
52
0
0
14 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
39
6
0
07 Apr 2025
One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image
Ezzeldin Shereen
Dan Ristea
Burak Hasircioglu
Shae McFadden
V. Mavroudis
Chris Hicks
46
0
0
02 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
39
1
0
23 Mar 2025
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data
Anatole Callies
Quentin Bodinier
Philippe Ravaud
Kourosh Davarpanah
48
0
0
19 Mar 2025
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
S. Han
Peng Xia
Ruiyi Zhang
Tong Sun
Yun-Qing Li
Hongtu Zhu
Huaxiu Yao
VLM
92
3
0
18 Mar 2025
VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan
Zhi Rui Tam
Ya-Ting Pai
Yen-Wei Lee
Yun-Nung Chen
CoGe
101
0
0
13 Mar 2025
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models
Jonathan Bourne
77
0
0
24 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
4
0
12 Feb 2025
VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos
Xubin Ren
Lingrui Xu
Long Xia
S. Wang
Dawei Yin
Chao Huang
VGen
VLM
73
3
0
03 Feb 2025
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
M. Zhang
116
7
0
22 Dec 2024
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Yew Ken Chia
Liying Cheng
Hou Pong Chan
Chaoqun Liu
Maojia Song
Sharifah Mahani Aljunied
Soujanya Poria
Lidong Bing
RALM
VLM
43
4
0
09 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
M. Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Wei Ping
65
10
0
04 Nov 2024
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Wei He
Zhiheng Xi
Wanxu Zhao
Xiaoran Fan
Yiwen Ding
Zifei Shan
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
51
5
0
24 Oct 2024
Unified Multi-Modal Interleaved Document Representation for Information Retrieval
Jaewoo Lee
Joonho Ko
Jinheon Baek
Soyeong Jeong
Sung Ju Hwang
20
1
0
03 Oct 2024
Towards Trustworthy Reranking: A Simple yet Effective Abstention Mechanism
Hippolyte Gisserot-Boukhlef
Manuel Faysse
Emmanuel Malherbe
Céline Hudelot
Pierre Colombo
34
2
0
20 Feb 2024
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Jianlv Chen
Shitao Xiao
Peitian Zhang
Kun Luo
Defu Lian
Zheng Liu
115
328
0
05 Feb 2024
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
86
72
0
25 May 2022
PLAID: An Efficient Engine for Late Interaction Retrieval
Keshav Santhanam
Omar Khattab
Christopher Potts
Matei A. Zaharia
VLM
58
72
0
19 May 2022
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
231
966
0
17 Apr 2021