2506.01850
Cited By
MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs
2 June 2025
Wayner Barrios
Andrés Villa
Juan Carlos León Alcázar
SouYoung Jin
Bernard Ghanem
Papers citing
"MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs"
5 / 5 papers shown
Title
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models
Andrés Villa
Juan Carlos León Alcázar
Motasem Alfarra
Vladimir Araujo
Alvaro Soto
Bernard Ghanem
VLM
44
2
0
06 Jan 2025
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM
MLLM
70
311
0
11 Jan 2024
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
83
1,076
0
27 Mar 2023
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
63
241
0
25 Nov 2021
MST: Masked Self-Supervised Transformer for Visual Representation
Zhaowen Li
Zhiyang Chen
Fan Yang
Wei Li
Yousong Zhu
...
Rui Deng
Liwei Wu
Rui Zhao
Ming Tang
Jinqiao Wang
ViT
67
166
0
10 Jun 2021