DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

10 March 2025
Ming Wang
Fang Wang
Minghao Hu
Li He
Haiyang Wang
Jun Zhang
Tianwei Yan
Li Li
Zhunchen Luo
Wei Luo
Xiaoying Bai
Guotong Geng
Abstract

Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and the fine-grained annotation needed to effectively decompose tasks, resulting in shallow, disorganized article generation. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by its hierarchical decomposition strategy and the integration of domain-specific knowledge with multi-level annotations, ensuring granular control and enhanced depth in article generation. To construct the dataset, we propose a multi-agent collaborative pipeline that systematically segments the generation process into four stages: Data Miner, Cite Retriever, Q&A Annotator, and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: web retrieval, local retrieval, and grounded reference. We fine-tuned the Qwen2-7B-Instruct model on the DeFine training set. The experimental results showed significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset is publicly available to facilitate future research.
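The abstract describes a four-stage multi-agent pipeline (Data Miner, Cite Retriever, Q&A Annotator, Data Cleaner) but this page carries no code. The sketch below is a minimal, hypothetical illustration of how such stages might compose into one annotated example; every name here (Section, mine_data, retrieve_citations, annotate_qa, clean, build_example) is an assumption for illustration and is not taken from the DeFine release.

# Hypothetical sketch of the four-stage pipeline named in the abstract.
# All identifiers are illustrative, not from the DeFine codebase.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Section:
    heading: str
    text: str
    citations: List[str] = field(default_factory=list)       # attached references
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)

def mine_data(topic: str) -> List[Section]:
    """Data Miner: collect raw source material and split it into sections."""
    raise NotImplementedError  # e.g. query a corpus for `topic`

def retrieve_citations(section: Section) -> Section:
    """Cite Retriever: attach supporting references to a section."""
    raise NotImplementedError

def annotate_qa(section: Section) -> Section:
    """Q&A Annotator: derive fine-grained question-answer annotations."""
    raise NotImplementedError

def clean(sections: List[Section]) -> List[Section]:
    """Data Cleaner: drop duplicates, broken citations, low-quality text."""
    raise NotImplementedError

def build_example(topic: str) -> List[Section]:
    """Run the four stages in order to produce one annotated training example."""
    sections = mine_data(topic)
    sections = [annotate_qa(retrieve_citations(s)) for s in sections]
    return clean(sections)

The chaining order (mine, then retrieve citations, then annotate Q&A, then clean last) follows the sequence given in the abstract; how the actual agents communicate is not specified on this page.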

@article{wang2025_2503.07170,
  title={DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation},
  author={Ming Wang and Fang Wang and Minghao Hu and Li He and Haiyang Wang and Jun Zhang and Tianwei Yan and Li Li and Zhunchen Luo and Wei Luo and Xiaoying Bai and Guotong Geng},
  journal={arXiv preprint arXiv:2503.07170},
  year={2025}
}