WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management

6 June 2025
Bowen Yuan
Selena Song
Javier Fernandez
Yadan Luo
Mahsa Baktashmotlagh
Zijian Wang
Main: 6 Pages
2 Figures
Bibliography: 2 Pages
3 Tables
Abstract

Wheat management strategies play a critical role in determining yield. Traditional management decisions often rely on labour-intensive expert inspections, which are expensive, subjective, and difficult to scale. Recently, Vision-Language Models (VLMs) have emerged as a promising way to provide scalable, data-driven management support. However, because they lack domain-specific knowledge, VLMs applied directly to wheat management tasks quantify and reason poorly, ultimately producing vague or even misleading management recommendations. In response, we propose WisWheat, a wheat-specific dataset with a three-layered design to enhance VLM performance on wheat management tasks: (1) a foundational pretraining dataset of 47,871 image-caption pairs for coarsely adapting VLMs to wheat morphology; (2) a quantitative dataset of 7,263 VQA-style image-question-answer triplets for quantitative trait measurement tasks; and (3) an instruction fine-tuning dataset of 4,888 samples targeting biotic and abiotic stress diagnosis and management planning for different phenological stages. Extensive experimental results demonstrate that fine-tuning open-source VLMs (e.g., Qwen2.5 VL 7B) on our dataset leads to significant performance improvements. Specifically, Qwen2.5 VL 7B fine-tuned on our wheat instruction dataset achieves accuracy scores of 79.2% and 84.6% on wheat stress and growth stage conversation tasks, respectively, surpassing even general-purpose commercial models such as GPT-4o by margins of 11.9% and 34.6%, respectively.
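
The three-layer composition described in the abstract can be tallied with a short Python sketch. This is only an illustration: the sample counts come from the abstract, while the `Tier` class, the tier names, and the helper functions are hypothetical and do not reflect the file layout or API of any released WisWheat code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One layer of the dataset (hypothetical structure, for illustration only)."""
    name: str
    purpose: str
    num_samples: int


# Sample counts are taken from the abstract; names and wording are assumptions.
WISWHEAT_TIERS = [
    Tier("pretraining", "image-caption pairs for coarse adaptation to wheat morphology", 47_871),
    Tier("quantitative", "VQA-style triplets for quantitative trait measurement", 7_263),
    Tier("instruction", "stress diagnosis and stage-specific management planning", 4_888),
]


def total_samples(tiers):
    """Return the combined number of samples across all tiers."""
    return sum(t.num_samples for t in tiers)


def tier_shares(tiers):
    """Return each tier's fraction of the combined sample count."""
    total = total_samples(tiers)
    return {t.name: t.num_samples / total for t in tiers}


if __name__ == "__main__":
    print(f"total samples: {total_samples(WISWHEAT_TIERS):,}")
    for name, share in tier_shares(WISWHEAT_TIERS).items():
        print(f"{name:>12}: {share:.1%}")
```

Under these counts the pretraining layer accounts for most of the samples, which matches its stated role as a coarse adaptation step ahead of the smaller, task-specific layers.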

@article{yuan2025_2506.06084,
  title={WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management},
  author={Bowen Yuan and Selena Song and Javier Fernandez and Yadan Luo and Mahsa Baktashmotlagh and Zijian Wang},
  journal={arXiv preprint arXiv:2506.06084},
  year={2025}
}