Ethic-by-design Data Preparation

Nowadays, large-scale technologies for the management and analysis of big data have a significant and positive impact: they can improve people's lives, accelerate scientific discovery and innovation, and bring about positive societal change. At the same time, it is increasingly important to understand the nature of these impacts at the social level and to take responsibility for them, especially when human-related data are involved. Properties such as fairness, diversity, serendipity, and coverage have recently been studied for some specific data processing systems, such as recommendation systems, as additional dimensions that complement basic accuracy measures with the goal of improving user satisfaction. Because of this social relevance, and because the ethical need to take responsibility has also been made mandatory by the recent General Data Protection Regulation (GDPR) of the European Union, developing solutions that satisfy non-discrimination requirements by design is currently one of the main challenges in data processing, and it is becoming increasingly crucial at every data processing stage, including data management.

Our current interest is in taking non-discrimination requirements, such as fairness and diversity, into account during the query processing pipeline. Developing technological solutions that satisfy such requirements is one of the main challenges in data management, and it has been investigated for many front-end data processes (e.g., ranking and set selection). We focus mainly on non-discriminating approaches for data preparation, that is, on back-end strategies: an unfair data preparation process might have a significant impact on front-end analysis, since every downstream task consumes the prepared data.
More precisely, the aim of our research is to design, implement, and evaluate ad hoc query processing techniques that automatically enforce specific beyond-accuracy properties, with particular reference to fairness and diversity, during the data preparation step.
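To make the idea of enforcing a beyond-accuracy property inside a query more concrete, here is a minimal Python sketch under stated assumptions: a single selection query with a numeric threshold (score >= threshold), a hypothetical protected attribute gender, and a coverage constraint requiring at least k tuples per group in the result. The helper names (select, satisfies_coverage, coverage_aware_select) and the data are hypothetical, and the greedy threshold relaxation shown here is only one illustrative way of rewriting a query to meet such a constraint, not the technique developed in this research.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    gender: str   # hypothetical protected attribute
    score: float  # attribute used by the selection predicate


def select(rows, threshold):
    """Plain selection query: keep rows with score >= threshold."""
    return [r for r in rows if r.score >= threshold]


def satisfies_coverage(result, attr, groups, k):
    """Coverage constraint: every group in `groups` has at least k tuples in `result`."""
    counts = {g: 0 for g in groups}
    for r in result:
        value = getattr(r, attr)
        if value in counts:
            counts[value] += 1
    return all(c >= k for c in counts.values())


def coverage_aware_select(rows, threshold, attr, groups, k):
    """Illustrative rewriting (an assumption, not the authors' method): lower the
    threshold as little as possible so that the selection satisfies coverage.
    Candidate thresholds are the original one plus the distinct scores below it,
    tried from highest to lowest."""
    candidates = sorted({r.score for r in rows if r.score <= threshold} | {threshold},
                        reverse=True)
    for t in candidates:
        result = select(rows, t)
        if satisfies_coverage(result, attr, groups, k):
            return t, result
    # The constraint cannot be met on this data: fall back to the original query.
    return None, select(rows, threshold)


# Usage on toy, made-up data: the original query (score >= 0.8) returns only one "F" tuple.
rows = [
    Person("a", "F", 0.9),
    Person("b", "M", 0.85),
    Person("c", "M", 0.8),
    Person("d", "F", 0.6),
    Person("e", "M", 0.95),
]
t, res = coverage_aware_select(rows, 0.8, "gender", {"F", "M"}, 2)
print(t, [r.name for r in res])  # 0.6 ['a', 'b', 'c', 'd', 'e']
```

Because lowering the threshold can only add tuples, and group counts never decrease when tuples are added, the first candidate that satisfies the constraint is the smallest relaxation of the original query among the candidates. A realistic setting would also have to deal with several predicates at once and with how far the rewritten query drifts from the user's original intent.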