Local Correlation Clustering with Asymmetric Classification Errors

11 August 2021

Abstract

In the Correlation Clustering problem, we are given a complete weighted graph $G$ whose edges are labeled "similar" or "dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of $G$, a similar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to distinct clusters, and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector $\text{dis}$ is a vector indexed by the vertices of $G$ whose $v$-th coordinate $\text{dis}_v$ equals the total weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: every similar edge has weight in the range $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
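To make the objective concrete, here is a minimal sketch that evaluates the disagreements vector $\text{dis}$ and its $\ell_p$ norm for a given clustering. It also checks the weight assumption with parameters $\alpha$ and $\mathbf{w}$. It does not implement the paper's approximation algorithm. The edge representation and the function names `disagreements`, `lp_norm`, and `satisfies_weight_assumption` are hypothetical choices made for illustration.

```python
import math
from itertools import combinations

# Hypothetical edge representation (not from the paper): edges[(u, v)] with u < v
# maps to (weight, is_similar) for every pair of the complete graph on 0..n-1.
Edges = dict[tuple[int, int], tuple[float, bool]]


def disagreements(n: int, edges: Edges, clustering: list[int]) -> list[float]:
    """Return dis, where dis[v] is the total weight of disagreeing edges at v."""
    dis = [0.0] * n
    for u, v in combinations(range(n), 2):
        weight, is_similar = edges[(u, v)]
        same_cluster = clustering[u] == clustering[v]
        # A similar edge disagrees when cut; a dissimilar edge disagrees when kept
        # inside a cluster. Both cases reduce to is_similar != same_cluster.
        if is_similar != same_cluster:
            dis[u] += weight
            dis[v] += weight
    return dis


def lp_norm(dis: list[float], p: float) -> float:
    """The l_p objective for p >= 1; p = math.inf gives the max disagreement."""
    if p == math.inf:
        return max(dis, default=0.0)
    return sum(x ** p for x in dis) ** (1.0 / p)


def satisfies_weight_assumption(edges: Edges, alpha: float, w: float) -> bool:
    """Check: similar weights lie in [alpha*w, w]; dissimilar weights are >= alpha*w."""
    for weight, is_similar in edges.values():
        if weight < alpha * w:
            return False
        if is_similar and weight > w:
            return False
    return True


if __name__ == "__main__":
    # Triangle: 0-1 and 1-2 similar, 0-2 dissimilar, all with weight 1.
    edges: Edges = {(0, 1): (1.0, True), (1, 2): (1.0, True), (0, 2): (1.0, False)}
    dis = disagreements(3, edges, clustering=[0, 0, 1])
    print(dis)                     # [0.0, 1.0, 1.0]
    print(lp_norm(dis, 1))         # 2.0
    print(lp_norm(dis, math.inf))  # 1.0
    print(satisfies_weight_assumption(edges, alpha=0.5, w=1.0))  # True
```

The example triangle has two similar edges and one dissimilar edge. Every clustering of it has at least one disagreement, so the objective is strictly positive here. For the bound stated above, the exponent $\frac{1}{2}-\frac{1}{2p}$ equals $0$ at $p=1$, which gives $O(\log\frac{1}{\alpha})$. As $p\to\infty$, the exponent approaches $\frac{1}{2}$.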
