
GUM-SAGE: A Novel Dataset and Approach for Graded Entity Salience Prediction

Main: 9 Pages
20 Figures
Bibliography: 2 Pages
5 Tables
Appendix: 7 Pages
Abstract

Determining and ranking the most salient entities in a text is critical for user-facing systems, especially as users increasingly rely on models to interpret long documents they only partially read. Graded entity salience addresses this need by assigning entities scores that reflect their relative importance in a text. Existing approaches fall into two main categories: subjective judgments of salience, which allow for gradient scoring but lack consistency, and summarization-based methods, which define salience as mention-worthiness in a summary, promoting explainability but limiting outputs to binary labels (entities are either summary-worthy or not). In this paper, we introduce a novel approach for graded entity salience that combines the strengths of both approaches. Using an English dataset spanning 12 spoken and written genres, we collect 5 summaries per document and calculate each entity's salience score based on its presence across these summaries. Our approach shows stronger correlation with scores based on human summaries and alignments, and outperforms existing techniques, including LLMs. We release our data and code at this https URL to support further research on graded salient entity extraction.
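
The scoring idea in the abstract (an entity's graded salience derived from its presence across 5 summaries) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the score is simply the number of summaries that mention the entity, and it substitutes naive case-insensitive substring matching for the entity-to-summary alignment the paper relies on. The function name `graded_salience`, the alias lists, and the example summaries are all hypothetical.

```python
from typing import Dict, List


def graded_salience(
    entities: Dict[str, List[str]], summaries: List[str]
) -> Dict[str, int]:
    """Count, for each entity, how many summaries mention any of its aliases.

    Assumption: score = number of summaries containing the entity (0..len(summaries)).
    Assumption: substring matching stands in for a real mention alignment step.
    """
    lowered = [summary.lower() for summary in summaries]
    scores: Dict[str, int] = {}
    for entity_id, aliases in entities.items():
        scores[entity_id] = sum(
            any(alias.lower() in summary for alias in aliases)
            for summary in lowered
        )
    return scores


# Hypothetical document with five summaries and four candidate entities.
summaries = [
    "Marie Curie discovered radium.",
    "Curie won two Nobel Prizes.",
    "The article describes Curie's research on radium in Paris.",
    "Marie Curie studied radioactivity.",
    "A biography of a pioneering physicist.",
]
entities = {
    "curie": ["Marie Curie", "Curie"],
    "radium": ["radium"],
    "paris": ["Paris"],
    "sorbonne": ["Sorbonne"],
}

scores = graded_salience(entities, summaries)
print(scores)  # {'curie': 4, 'radium': 2, 'paris': 1, 'sorbonne': 0}
```

Under these assumptions, an entity mentioned in more summaries receives a higher grade, which gives a 0 to 5 scale rather than a binary summary-worthy label.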

@article{lin2025_2504.10792,
  title={GUM-SAGE: A Novel Dataset and Approach for Graded Entity Salience Prediction},
  author={Jessica Lin and Amir Zeldes},
  journal={arXiv preprint arXiv:2504.10792},
  year={2025}
}