Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs

16 May 2025
Abhishek Dey
Aabha Bothera
Samhita Sarikonda
Rishav Aryan
Sanjay Kumar Podishetty
Akshay Havalgi
Gaurav Singh
Saurabh Srivastava
Abstract

In this paper, we study the challenges of detecting events on social media, where traditional unimodal systems struggle because data is disseminated rapidly and across multiple modalities. We employ a range of models, including unimodal ModernBERT and ConvNeXt-V2, multimodal fusion techniques, and advanced generative models such as GPT-4o and LLaVA. We also study the effect of providing multimodal generative models (such as GPT-4o) with only a single modality to assess their efficacy. Our results indicate that while multimodal approaches notably outperform their unimodal counterparts, generative approaches, despite their large number of parameters, lag behind supervised methods in precision. Furthermore, we found that they also lag behind instruction-tuned models because of their inability to generate event classes correctly. During our error analysis, we discovered that common social media issues such as leet speak and text elongation are handled effectively by generative approaches but are hard to tackle using supervised approaches.
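The abstract names leet speak (digit or symbol substitutions such as "h3lp" for "help") and text elongation (repeated characters such as "sooooo") as common social-media noise. The sketch below is an illustrative assumption, not the authors' pipeline: it shows what a naive rule-based normalizer for these two phenomena might look like. The substitution table `LEET_MAP` and the two-character cap in `collapse_elongation` are hypothetical choices.

```python
import re

# Hypothetical leet-speak substitution table (illustrative assumption,
# not taken from the paper).
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def collapse_elongation(text: str, max_repeat: int = 2) -> str:
    """Shrink any run of 3+ identical characters down to `max_repeat` copies."""
    # (.)\1{n,} matches one character followed by n or more repeats of itself.
    pattern = re.compile(r"(.)\1{%d,}" % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


def de_leet(text: str) -> str:
    """Map leet characters back to letters, but only in tokens that contain
    at least one letter, so plain numbers such as '2025' are left untouched.
    Note: splitting and re-joining collapses runs of whitespace to one space."""
    tokens = []
    for token in text.split():
        if any(ch.isalpha() for ch in token):
            token = token.translate(LEET_MAP)
        tokens.append(token)
    return " ".join(tokens)


def normalize(text: str) -> str:
    """Lowercase, collapse elongated runs, then undo simple leet substitutions."""
    return de_leet(collapse_elongation(text.lower()))


if __name__ == "__main__":
    print(normalize("H3lp!!! Fl00d"))  # help!! flood
```

Fixed rules like these are brittle: capping runs at two characters preserves legitimate double letters such as "flood", but it turns "warniiiing" into "warniing" rather than "warning".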

@article{dey2025_2505.10836,
  title={Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs},
  author={Abhishek Dey and Aabha Bothera and Samhita Sarikonda and Rishav Aryan and Sanjay Kumar Podishetty and Akshay Havalgi and Gaurav Singh and Saurabh Srivastava},
  journal={arXiv preprint arXiv:2505.10836},
  year={2025}
}