October 2024
Software systems with machine learning (ML) components are being used in a wide range of domains. Developers of such systems face challenges different from those of traditional systems because the performance of ML systems is directly tied to their input data. This work shows that ML systems can be improved over time by actively monitoring the data that passes through them and retraining their models when drift is detected. To this end, we first assess some widely used statistical and distance-based methods for data drift detection, discussing their pros and cons. We then present results from experiments that apply these methods to real-world and synthetic datasets to detect data drifts and automatically improve the system's robustness.
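The abstract does not specify which detector or retraining policy the experiments use, so the following is only a minimal sketch of the general monitor-detect-retrain loop it describes. It uses a two-sample Kolmogorov-Smirnov test as a stand-in statistical drift check; the significance level, window size, and the `retrain_fn` hook are hypothetical choices, not the authors' settings.

```python
# Illustrative sketch, not the paper's implementation: monitor an incoming
# feature stream in fixed-size windows, flag drift with a statistical test,
# and retrain when drift is detected.
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(reference: np.ndarray, window: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag drift when the KS test rejects that both samples share a distribution."""
    _, p_value = ks_2samp(reference, window)
    return p_value < alpha


def monitor_and_retrain(model, reference, stream, window_size=500, retrain_fn=None):
    """Scan the stream window by window; on drift, retrain and update the reference."""
    buffer = []
    for x in stream:
        buffer.append(x)
        if len(buffer) >= window_size:
            window = np.asarray(buffer)
            if detect_drift(reference, window):
                # Hypothetical hook: refit the model on recent data and
                # adopt the new window as the reference distribution.
                if retrain_fn is not None:
                    model = retrain_fn(model, window)
                reference = window
            buffer.clear()
    return model, reference
```

A distance-based detector (e.g., comparing histograms of the reference and current windows) could be swapped in for `detect_drift` without changing the surrounding loop, which matches the paper's framing of evaluating several interchangeable detection methods.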