
Combe, C. G. (2020). Research Ethics in Data. In R. Iphofen (Ed.), Handbook of Research Ethics and Scientific Integrity (pp. 1–17). Springer. https://doi.org/10.1007/978-3-319-76040-7_13-1


Ninety percent of the data created in the entire history of humanity has been created in the last two years (2.5 quintillion bytes of data are generated every day; Marr 2018). In this context, the use of data in research is currently undergoing a revolution. Data has traditionally served as a marker of excellence, since it is data that makes it possible to reproduce and validate an experiment. Yet its current profusion makes it highly volatile, which affects how the research community uses it. But what exactly is data? How are data points created, and what ethical issues arise in collecting, building (as in feature engineering), and using data in and for research? In data science, feature engineering refers to all the interventions (reformatting, processing, cleaning, enrichment, calibration) performed on raw data before a learning algorithm takes it into account. There is an abundant literature on the importance of data, on collection methodologies, and on the need to make collected datasets verifiable. However, the feasibility of such requirements in the current context of data abundance is rarely considered. Data is not only content but also information about that content, as when we speak of metadata. Once a unique, circumscribed object, data has become multiple, abundant, and "Big." The accuracy of data and the speed at which it is transmitted are equally important. Today, data is transmitted digitally, in dematerialized form, and, increasingly stored in the "Cloud," it is ever less tangible. Overabundant content and fast-moving carriers have redefined our understanding of data and its uses and have created new ethical challenges, particularly because the structure and speed of data profoundly determine its quality and reliability (Krippendorff 2008).
This chapter addresses the ethical issues posed to research, on the one hand, by new data architectures (Ross 2003) and, on the other, by data as a tool for measuring the validity and integrity of a research process.