arXiv:2208.00194

Streaming Algorithms for Diversity Maximization with Fairness Constraints

30 July 2022
Yanhao Wang
Francesco Fabbri
M. Mathioudakis
Abstract

Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific for $m=2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
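
To make the two definitions in the abstract concrete, the following Python sketch evaluates the max-min diversity of a candidate subset and checks the per-group fairness quotas $k_i$. It also includes a generic one-pass distance-threshold filter to show why an unconstrained streaming pass can violate fairness. This is an illustration only and not either of the paper's two algorithms; the function names (`diversity`, `is_fair`, `threshold_pass`), the threshold `mu`, and the toy 1-D data are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def diversity(subset, dist):
    """Max-min diversity: the smallest distance between any two distinct elements."""
    if len(subset) < 2:
        return math.inf  # no pair exists, so the minimum is vacuous
    return min(dist(a, b) for a, b in combinations(subset, 2))


def is_fair(subset, group_of, quotas):
    """True if the subset holds exactly quotas[g] elements from every group g."""
    counts = Counter(group_of(x) for x in subset)
    no_extra_groups = set(counts) <= set(quotas)
    return no_extra_groups and all(counts.get(g, 0) == q for g, q in quotas.items())


def threshold_pass(stream, dist, mu, k):
    """Hypothetical one-pass filter: keep an element only if it is at least
    mu away from everything kept so far, up to k elements.
    It ignores groups, so its output may violate the fairness quotas."""
    kept = []
    for x in stream:
        if len(kept) < k and all(dist(x, y) >= mu for y in kept):
            kept.append(x)
    return kept


# Toy data: (position, group) pairs on a line; distance is the absolute difference.
points = [(0.0, "a"), (1.0, "b"), (5.0, "a"), (5.5, "b"), (10.0, "a")]
dist = lambda p, q: abs(p[0] - q[0])
group_of = lambda p: p[1]
quotas = {"a": 2, "b": 1}  # k_a = 2, k_b = 1, so k = 3

fair_subset = [(0.0, "a"), (5.5, "b"), (10.0, "a")]
print(diversity(fair_subset, dist))          # 4.5
print(is_fair(fair_subset, group_of, quotas))  # True

unconstrained = threshold_pass(points, dist, mu=4.0, k=3)
print(diversity(unconstrained, dist))            # 5.0
print(is_fair(unconstrained, group_of, quotas))  # False: all three come from group "a"
```

In this toy instance the unconstrained pass reaches diversity 5.0 but selects only group "a", while the fair subset reaches 4.5. The gap between the two is the kind of cost that a fairness-aware streaming algorithm has to bound.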
