Cleansing data of impurities is an integral part of data processing and maintenance. This has led to the development of a broad range of methods intended to enhance the accuracy, and thereby the usability, of existing data. This paper presents a survey of data cleansing problems, approaches, and methods. We classify the various types of anomalies occurring in data that have to be eliminated, and we define a set of quality criteria that comprehensively cleansed data has to meet. Based on this classification, we evaluate and compare existing approaches to data cleansing with respect to the types of anomalies they handle and eliminate. We also describe in general terms the different steps in data cleansing, specify the methods used within the cleansing process, and give an outlook on research directions that complement the existing systems.