JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry

29 April 2025

Anum Afzal

Alexandre Mercier

Florian Matthes

Abstract

Online platforms are increasingly interested in using data-to-text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, producing monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches for automatically generating marketing texts that are of sufficient quality and diverse enough for broad adoption. We use language models such as T5, GPT-3.5, GPT-4, and LLaMa2 with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce JaccDiv, a metric for evaluating the diversity of a set of texts. This research is relevant beyond the music industry, to other fields where repetitive automated content generation is prevalent.
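The abstract does not say how JaccDiv is computed. To illustrate the general idea of a Jaccard-based diversity score, the following Python sketch assumes that a set of texts is scored as one minus the mean pairwise Jaccard similarity of their lowercased word n-gram sets. This formulation and the names `ngram_set`, `jaccard`, and `jaccard_diversity` are hypothetical and are not taken from the paper. The actual JaccDiv definition may differ.

```python
from __future__ import annotations

from itertools import combinations


def ngram_set(text: str, n: int = 1) -> set[tuple[str, ...]]:
    """Return the set of lowercased, whitespace-tokenized word n-grams of a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_diversity(texts: list[str], n: int = 1) -> float:
    """Hypothetical diversity score: 1 minus the mean pairwise Jaccard similarity.

    0.0 means every pair of texts shares the same n-gram set;
    1.0 means no pair of texts shares any n-gram.
    """
    if len(texts) < 2:
        raise ValueError("need at least two texts to measure diversity")
    sets = [ngram_set(t, n) for t in texts]
    sims = [jaccard(a, b) for a, b in combinations(sets, 2)]
    return 1.0 - sum(sims) / len(sims)


texts = [
    "new album out now",
    "new album out now",
    "live tour tickets available",
]
print(jaccard_diversity(texts))  # two duplicates plus one distinct text
```

In this example one pair of texts is identical (similarity 1) and the other two pairs share no words (similarity 0), so the mean similarity is 1/3 and the score is 2/3. Averaging over all pairs makes the score sensitive to any repeated template in a gallery, not just to consecutive duplicates. The cost is quadratic in the number of texts.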

@article{afzal2025_2504.20849,
  title={ JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry },
  author={ Anum Afzal and Alexandre Mercier and Florian Matthes },
  journal={arXiv preprint arXiv:2504.20849},
  year={ 2025 }
}