MuDoC: An Interactive Multimodal Document-grounded Conversational AI System

14 February 2025
Karan Taneja
Ashok K. Goel
Abstract

Multimodal AI is an important step towards building effective tools to leverage multiple modalities in human-AI communication. Building a multimodal document-grounded AI system to interact with long documents remains a challenge. Our work aims to fill the research gap of directly leveraging grounded visuals from documents alongside textual content in documents for response generation. We present an interactive conversational AI agent 'MuDoC' based on GPT-4o to generate document-grounded responses with interleaved text and figures. MuDoC's intelligent textbook interface promotes trustworthiness and enables verification of system responses by allowing instant navigation to source text and figures in the documents. We also discuss qualitative observations based on MuDoC responses highlighting its strengths and limitations.

@article{taneja2025_2502.09843,
  title={MuDoC: An Interactive Multimodal Document-grounded Conversational AI System},
  author={Karan Taneja and Ashok K. Goel},
  journal={arXiv preprint arXiv:2502.09843},
  year={2025}
}