58
0

The Cost of Balanced Training-Data Production in an Online Data Market

Abstract

Many ethical issues in machine learning are connected to the training data. Online data markets are an important source of training data, facilitating both production and distribution. Recently, a trend has emerged of for-profit "ethical" participants in online data markets. This trend raises a fascinating question: Can online data markets sustainably and efficiently address ethical issues in the broader machine-learning economy?In this work, we study this question in a stylized model of an online data market. We investigate the effects of intervening in the data market to achieve balanced training-data production. The model reveals the crucial role of market conditions. In small and emerging markets, an intervention can drive the data producers out of the market, so that the cost of fairness is maximal. Yet, in large and established markets, the cost of fairness can vanish (as a fraction of overall welfare) as the market grows.Our results suggest that "ethical" online data markets can be economically feasible under favorable market conditions, and motivate more models to consider the role of data production and distribution in mediating the impacts of ethical interventions.

View on arXiv
Comments on this paper