Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

12 April 2024
Hassan Ali
Philipp Allgeuer
Stefan Wermter
Abstract

Human intention-based systems enable robots to perceive and interpret user actions to interact with humans and adapt to their behavior proactively. Therefore, intention prediction is pivotal in creating a natural interaction with social robots in human-designed environments. In this paper, we examine using Large Language Models (LLMs) to infer human intention in a collaborative object categorization task with a physical robot. We propose a novel multimodal approach that integrates user non-verbal cues, like hand gestures, body poses, and facial expressions, with environment states and user verbal cues to predict user intentions in a hierarchical architecture. Our evaluation of five LLMs shows the potential for reasoning about verbal and non-verbal user cues, leveraging their context-understanding and real-world knowledge to support intention prediction while collaborating on a task with a social robot. Video: this https URL
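To make the described approach concrete, the sketch below shows one way multimodal cues could be serialized into a textual context and passed to an LLM for intention prediction. This is a minimal illustration only, assuming a generic text-completion backend; the data fields, function names, and prompt wording are hypothetical and are not taken from the paper's actual implementation.

# Hypothetical sketch of LLM-based intention prediction from fused cues.
# All names (ObservedCues, build_prompt, predict_intention) and the prompt
# format are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass, field

@dataclass
class ObservedCues:
    """Snapshot of user and environment state perceived by the robot."""
    utterance: str                     # transcribed verbal cue
    hand_gesture: str                  # e.g. "pointing_at_apple"
    body_pose: str                     # e.g. "facing_table"
    facial_expression: str             # e.g. "neutral"
    environment: dict = field(default_factory=dict)  # e.g. visible objects

def build_prompt(cues: ObservedCues) -> str:
    """Serialize verbal and non-verbal cues into one textual LLM context."""
    objects = ", ".join(cues.environment.get("objects", []))
    return (
        "You assist a robot in a collaborative object categorization task.\n"
        f"Objects on the table: {objects}\n"
        f"User said: \"{cues.utterance}\"\n"
        f"Hand gesture: {cues.hand_gesture}\n"
        f"Body pose: {cues.body_pose}\n"
        f"Facial expression: {cues.facial_expression}\n"
        "Infer the user's most likely intention and answer in one short phrase."
    )

def predict_intention(cues: ObservedCues, llm_call) -> str:
    """Query any text-completion callable with the fused cue context."""
    return llm_call(build_prompt(cues)).strip()

if __name__ == "__main__":
    cues = ObservedCues(
        utterance="Put that one with the fruit.",
        hand_gesture="pointing_at_apple",
        body_pose="facing_table",
        facial_expression="neutral",
        environment={"objects": ["apple", "orange", "screwdriver"]},
    )
    # Trivial stand-in for a real LLM backend so the sketch runs standalone.
    fake_llm = lambda prompt: "place the apple in the fruit category"
    print(predict_intention(cues, fake_llm))

In the paper's hierarchical architecture, such a prediction step would sit downstream of the perception modules that extract gestures, poses, and expressions; the sketch only illustrates the cue-fusion and LLM-query stage.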

@article{ali2025_2404.08424,
  title={Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task},
  author={Hassan Ali and Philipp Allgeuer and Stefan Wermter},
  journal={arXiv preprint arXiv:2404.08424},
  year={2025}
}