Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation

4 June 2025
Mingxuan Xia, Haobo Wang, Yixuan Li, Zewei Yu, Jindong Wang, Junbo Zhao, Runze Wu
Main: 8 pages · 6 figures · 13 tables · Bibliography: 5 pages · Appendix: 8 pages
Abstract

Recently, Large Language Models (LLMs) have demonstrated significant potential for data annotation, markedly reducing the labor costs associated with downstream applications. However, existing methods mostly adopt an aggressive strategy, prompting the LLM to determine a single gold label for each unlabeled sample. Due to the inherent uncertainty within LLMs, they often produce incorrect labels for difficult samples, severely compromising the data quality for downstream applications. Motivated by ambiguity aversion in human behavior, we propose a novel candidate annotation paradigm wherein large language models are encouraged to output all possible labels when they are uncertain. To ensure unique labels are provided for downstream tasks, we develop CanDist, a teacher-student framework that distills candidate annotations with a Small Language Model (SLM). We further provide a rigorous justification demonstrating that distilling candidate annotations from the teacher LLM offers superior theoretical guarantees compared to directly using single annotations. Extensive experiments across six text classification tasks validate the effectiveness of our proposed method. The source code is available at this https URL.
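
To make the abstract's idea concrete, below is a minimal illustrative sketch, not the authors' released CanDist implementation, of the two steps described: prompting a teacher LLM for a set of candidate labels instead of a single gold label, and training a small student model against those candidate sets with a partial-label-style objective. The label set, prompt wording, and loss form are assumptions chosen for illustration only.

# Illustrative sketch only -- NOT the authors' CanDist implementation.
# (1) Prompt a teacher LLM for *all* plausible labels when it is uncertain.
# (2) Train a small student model against those candidate sets.
# The label set, prompt text, and loss form below are assumptions.

import torch
import torch.nn.functional as F

LABELS = ["positive", "negative", "neutral"]  # hypothetical task labels

def candidate_prompt(text: str) -> str:
    """Ask the teacher LLM to list every plausible label when unsure."""
    return (
        f"Classify the text into one of {LABELS}.\n"
        "If you are uncertain, list ALL labels that could be correct, "
        "separated by commas.\n\n"
        f"Text: {text}\nLabels:"
    )

def candidate_distillation_loss(student_logits: torch.Tensor,
                                candidate_mask: torch.Tensor) -> torch.Tensor:
    """One simple way to learn from candidate sets: maximize the probability
    mass the student assigns to the teacher's candidate labels (a common
    partial-label objective), rather than forcing a single, possibly wrong,
    teacher label.

    student_logits: (batch, num_labels)
    candidate_mask: (batch, num_labels), 1 where the teacher listed the
                    label as a candidate, 0 otherwise.
    """
    probs = F.softmax(student_logits, dim=-1)
    candidate_mass = (probs * candidate_mask).sum(dim=-1).clamp_min(1e-12)
    return -candidate_mass.log().mean()

# Example: 2 samples, 3 labels; sample 0 has one candidate, sample 1 has two.
logits = torch.randn(2, len(LABELS), requires_grad=True)
mask = torch.tensor([[1., 0., 0.],
                     [0., 1., 1.]])
loss = candidate_distillation_loss(logits, mask)
loss.backward()

The design point mirrored here is that the student only needs to place probability mass somewhere inside the teacher's candidate set, so a single incorrect label from an uncertain teacher does not directly poison the training signal.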

@article{xia2025_2506.03857,
  title={Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation},
  author={Mingxuan Xia and Haobo Wang and Yixuan Li and Zewei Yu and Jindong Wang and Junbo Zhao and Runze Wu},
  journal={arXiv preprint arXiv:2506.03857},
  year={2025}
}