May 2025
·
6 Reads
International Journal for Population Data Science
Source and contact tracing (SCT) is a core public health measure that is used to contain the spread of infectious diseases. It aims to identify a source of infection, and to advise those who have been exposed to this source. Due to the rapid increases in incidence of COVID-19 in the Netherlands, the capacity to conduct a full SCT quickly became insufficient. Therefore, the public health services (PHS) might benefit from a restricted strategy targeted to geographical regions where (predicted) case-to-case transmission is high. In this study, we set out to develop a prediction model for the number of COVID-19 cases per postal code within the Netherlands using geographic and demographic features. The study population consists of individuals residing in one of the participating nine Dutch PHS regions who tested positive for SARS-CoV-2 between 1 June 2020 and 27 February 2021. Using a machine learning random forest regression model, we predicted the top 100 postal codes with the highest number of cases with an accuracy of 49% for the current week, 42% for next week, and 44% for two weeks from present. In addition, the age groups of 20-39 and 40-64 years had a higher prediction accuracy than groups outside these age ranges. The developed model provides a starting point for targeted preventive SCT efforts that incorporate geospatial and demographic characteristics of a neighbourhood. It should nonetheless be noted that during the early stages of the outbreak, the number of available datapoints needed to inform such models are likely insufficient. Given the accuracy and data requirements of the developed model, it is unlikely that this class of models can play a pivotal role in informing policy during the early phases of a future epidemic.