2504.05299
SmolVLM: Redefining small and efficient multimodal models
7 April 2025
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
Pedro Cuenca
Cyril Zakka
Loubna Ben Allal
Anton Lozhkov
Nouamane Tazi
Vaibhav Srivastav
Joshua Lochner
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
Papers citing "SmolVLM: Redefining small and efficient multimodal models"
7 / 7 papers shown
Title
EmoGist: Efficient In-Context Learning for Visual Emotion Understanding
Ronald Seoh
Dan Goldwasser
VLM
7
0
0
20 May 2025
VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection
Aditya Taparia
Noel Ngu
Mario Leiva
Joshua Shay Kricheli
John Corcoran
Nathaniel D. Bastian
Gerardo Simari
Paulo Shakarian
Ransalu Senanayake
ObjD
7
0
0
19 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
57
0
0
02 May 2025
WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada
Braeden Sherritt
Isar Nejadgholi
Marzieh Amini
VLM
44
0
0
17 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
70
19
1
14 Apr 2025
One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image
Ezzeldin Shereen
Dan Ristea
Burak Hasircioglu
Shae McFadden
V. Mavroudis
Chris Hicks
49
0
0
02 Apr 2025
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli
Artemis LLabres
Mohamed Ali Soubgui
Marco Bertini
Ernest Valveny Llobet
Dimosthenis Karatzas
65
0
0
11 Mar 2025