Chain-of-Description: What I can understand, I can put into words

22 February 2025
Jiaxin Guo
Daimeng Wei
Zongyao Li
Hengchao Shang
Yuanchang Luo
Hao Yang
Abstract

In this paper, we propose a novel strategy defined as Chain-of-Description (CoD) Prompting, tailored for Multi-Modal Large Language Models. This approach involves having the model first provide a detailed description of the multi-modal input before generating an answer to the question. When applied to models such as Qwen2-Audio, Qwen2-VL, and Qwen2.5-VL, CoD Prompting significantly enhances performance compared to standard prompting methods. This is demonstrated by nearly a 4% improvement in the speech category of the audio benchmark AIR-Bench-Chat and a 5.3% improvement in the hard-level portion of the vision benchmark MMMU_Pro. Our ablation study further validates the effectiveness of CoD Prompting.
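
The describe-then-answer idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the prompt wording nor whether the description and the answer come from one call or two, so the two-call structure, the instruction strings, and the `model` callable (any function taking the media input and a text prompt and returning text) are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical wording; the abstract does not state the actual prompt text.
DESCRIBE_INSTRUCTION = "Describe the provided input in detail."

# Assumed model interface: (media, text prompt) -> generated text.
Model = Callable[[object, str], str]


def standard_prompting(model: Model, media: object, question: str) -> str:
    """Baseline: ask the question directly about the input."""
    return model(media, question)


def chain_of_description(model: Model, media: object, question: str) -> str:
    """Describe the input first, then answer with the description in context."""
    # Step 1: have the model put its understanding of the input into words.
    description = model(media, DESCRIBE_INSTRUCTION)
    # Step 2: ask the question, supplying the description alongside the input.
    answer_prompt = (
        f"Description: {description}\n"
        f"Question: {question}\n"
        "Answer the question using the input and the description."
    )
    return model(media, answer_prompt)
```

To try the sketch, wrap any multi-modal model behind the assumed `(media, prompt)` interface and compare the outputs of `standard_prompting` and `chain_of_description` on the same question.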

@article{guo2025_2502.16137,
  title={Chain-of-Description: What I can understand, I can put into words},
  author={Jiaxin Guo and Daimeng Wei and Zongyao Li and Hengchao Shang and Yuanchang Luo and Hao Yang},
  journal={arXiv preprint arXiv:2502.16137},
  year={2025}
}