ResearchTrend.AI


MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs
arXiv:2506.01850 · 2 June 2025
Wayner Barrios, Andrés Villa, Juan Carlos León Alcázar, SouYoung Jin, Bernard Ghanem

Papers citing "MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs"

5 / 5 papers shown
1. EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models
   Andrés Villa, Juan Carlos León Alcázar, Motasem Alfarra, Vladimir Araujo, Alvaro Soto, Bernard Ghanem
   VLM · 44 · 2 · 0 · 06 Jan 2025

2. Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
   Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi-An Ma, Yann LeCun, Saining Xie
   VLM, MLLM · 70 · 311 · 0 · 11 Jan 2024

3. Sigmoid Loss for Language Image Pre-Training
   Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer
   CLIP, VLM · 83 · 1,076 · 0 · 27 Mar 2023

4. SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
   Kevin Qinghong Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
   ViT · 63 · 241 · 0 · 25 Nov 2021

5. MST: Masked Self-Supervised Transformer for Visual Representation
   Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, ..., Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
   ViT · 67 · 166 · 0 · 10 Jun 2021