Joint work with Ioana Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, and Melanie Schmidt.
Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased.
At NIPS 2017, Chierichetti et al. proposed a model for fair clustering that requires the representation in each cluster to (approximately) preserve the fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and an O(t)-approximation for the fair k-median problem, where t is a parameter of the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center.
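To make the fairness requirement concrete, the following is a minimal sketch of a check that every cluster preserves the overall fraction of each protected class up to a tolerance. It is an illustration only: the names `class_fractions`, `is_fair`, and the `tolerance` parameter are assumptions made here, and the additive-tolerance test is a simplification, not the exact definition (with parameter t) used by Chierichetti et al.

```python
from collections import Counter
from fractions import Fraction


def class_fractions(labels):
    """Return the fraction of each protected class among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: Fraction(n, total) for cls, n in counts.items()}


def is_fair(assignment, classes, tolerance=Fraction(0)):
    """Check whether each cluster mirrors the overall class fractions.

    assignment[i] is the cluster id of point i (hypothetical encoding),
    classes[i] is the protected class of point i.
    A cluster passes if, for every class, its local fraction differs from
    the overall fraction by at most `tolerance` (an assumed relaxation).
    """
    overall = class_fractions(classes)

    # Group the class labels of the points by cluster.
    clusters = {}
    for cluster_id, cls in zip(assignment, classes):
        clusters.setdefault(cluster_id, []).append(cls)

    for members in clusters.values():
        local = class_fractions(members)
        for cls, target in overall.items():
            if abs(local.get(cls, Fraction(0)) - target) > tolerance:
                return False
    return True
```

For example, with classes `["a", "b", "a", "b"]`, the assignment `[0, 0, 1, 1]` puts one point of each class in each cluster and passes the check, while `[0, 1, 0, 1]` separates the classes entirely and fails it.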
We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we give bi-criteria constant-factor approximations for all of the classical clustering objectives: k-center, k-supplier, k-median, k-means, and facility location.
All approximations are achieved in a single framework: it takes an existing unfair solution and a fair LP solution and combines them into an essentially fair clustering using a weakly supervised rounding scheme. In this way, a fair clustering can be established after the fact, even when the centers are already fixed.