Software Engineering Research Group
University of Oviedo
giis.uniovi.es
Test-driven Anonymization for Artificial Intelligence
2019 IEEE International Conference on Artificial Intelligence Testing (AITest 2019)
Cristian Augusto
Department of Computing
University of Oviedo
Gijón, Spain
augustocristian@uniovi.es
Jesús Morán
Department of Computing
University of Oviedo
Gijón, Spain
moranjesus@uniovi.es
Claudio de la Riva
Department of Computing
University of Oviedo
Gijón, Spain
claudio@uniovi.es
Javier Tuya
Department of Computing
University of Oviedo
Gijón, Spain
tuya@uniovi.es
Abstract
In recent years, the amount of data published and shared with third parties to develop artificial intelligence (AI) tools
and services has increased significantly. When regulatory or internal requirements govern data privacy,
anonymization techniques are used to preserve privacy by transforming the data. As a
side effect, anonymization may render the data useless for training and testing the AI, because the AI is highly
dependent on the quality of the data. To overcome this problem, we propose a test-driven anonymization
approach for artificial intelligence tools. The approach tests different anonymization efforts to achieve a
trade-off between privacy (non-functional quality) and the functional suitability of the artificial intelligence
technique (functional quality). The approach has been validated with two real-life datasets from the
domains of healthcare and health insurance. Each of these datasets is anonymized with several privacy
protections and then used to train AI classifiers. The results show how the data can be anonymized to
achieve adequate functional suitability in the AI context while keeping the privacy of the
anonymized data as high as possible.
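
The loop described in the abstract (anonymize with several efforts, train a classifier on each anonymized dataset, and keep the strongest protection that still passes a functional test) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy age data, the use of bin width as a stand-in for the anonymization effort, the majority-label classifier, and the names `generalize`, `train`, `accuracy`, and `select_effort`.

```python
from collections import Counter, defaultdict


def generalize(value, width):
    """Coarsen a numeric value to the lower bound of its bin (hypothetical anonymization)."""
    return (value // width) * width


def train(rows):
    """Toy classifier: majority label per generalized value, plus a global fallback label."""
    by_key = defaultdict(Counter)
    for x, y in rows:
        by_key[x][y] += 1
    fallback = Counter(y for _, y in rows).most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}
    return table, fallback


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    table, fallback = model
    hits = sum(1 for x, y in rows if table.get(x, fallback) == y)
    return hits / len(rows)


def select_effort(train_rows, test_rows, widths, min_accuracy):
    """Return the largest bin width whose classifier passes the accuracy test, and all scores."""
    results = {}
    for w in widths:
        anon_train = [(generalize(x, w), y) for x, y in train_rows]
        anon_test = [(generalize(x, w), y) for x, y in test_rows]
        results[w] = accuracy(train(anon_train), anon_test)
    passing = [w for w in widths if results[w] >= min_accuracy]
    return (max(passing) if passing else None), results


# Toy data (assumed): a single attribute "age" and a binary label.
train_rows = [(age, int(age >= 50)) for age in range(20, 85)]
test_rows = [(age, int(age >= 50)) for age in range(20, 85, 3)]

best, results = select_effort(train_rows, test_rows, [1, 10, 25, 40, 100], min_accuracy=0.9)
print("accuracy per bin width:", results)
print("strongest effort passing the test:", best)
```

With these toy values, bin widths 1, 10, and 25 pass the 0.9 accuracy threshold and width 25 is selected as the coarsest one that still passes. A real study would replace the bin width with a proper privacy model and the lookup table with the AI technique under test.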
Article available at: http://hdl.handle.net/10651/56868