JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry

Online platforms are increasingly interested in using Data-to-Text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We leverage Language Models such as T5, GPT-3.5, GPT-4, and LLaMa2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce a metric JaccDiv to evaluate the diversity of a set of texts. This research extends its relevance beyond the music industry, proving beneficial in various fields where repetitive automated content generation is prevalent.
View on arXiv@article{afzal2025_2504.20849, title={ JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry }, author={ Anum Afzal and Alexandre Mercier and Florian Matthes }, journal={arXiv preprint arXiv:2504.20849}, year={ 2025 } }