Background and objective:
Traditional statistical screening for thalassemia, which is based on red blood cell (RBC) indices, is gradually being replaced by machine learning. Here, we developed deep neural networks (DNNs) that outperformed the traditional method in predicting thalassemia.
Method:
Using a dataset of 8693 records comprising genetic test results and 11 other features, we constructed 11 DNN models and 4 traditional statistical models. We then compared their performance and analysed feature importance to interpret the DNN models.
Results:
For our best model, the area under the receiver operating characteristic curve, accuracy, Youden's index, F1 score, sensitivity, specificity, positive predictive value and negative predictive value were 0.960, 0.897, 0.794, 0.897, 0.883, 0.911, 0.914 and 0.882, respectively. Compared with the traditional statistical model based on the mean corpuscular volume, these values were increased by 10.22%, 10.09%, 26.55%, 8.92%, 4.13%, 16.90%, 13.86% and 6.07%, respectively; compared with the model based on the mean cellular haemoglobin, they were increased by 15.38%, 11.70%, 31.70%, 9.89%, 3.05%, 22.13%, 17.11% and 5.94%, respectively. The performance of the DNN model decreased when age, RBC distribution width (RDW), sex, or both white blood cell count (WBC) and platelet count (PLT) were removed.
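The threshold-based metrics listed above all derive from a single confusion matrix. The following is a minimal illustrative sketch of the standard definitions, not the code used in this study. The `Confusion` class, the `screening_metrics` function and the example counts are hypothetical. The area under the curve is omitted because it requires the full set of predicted scores rather than one confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary screening test (hypothetical container)."""
    tp: int  # affected individuals flagged positive
    fp: int  # unaffected individuals flagged positive
    tn: int  # unaffected individuals flagged negative
    fn: int  # affected individuals flagged negative


def screening_metrics(c: Confusion) -> dict:
    """Compute the standard threshold-based screening metrics."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # positive predictive value
    npv = c.tn / (c.tn + c.fn)  # negative predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    youden = sensitivity + specificity - 1  # Youden's index J
    return {
        "accuracy": accuracy,
        "youden": youden,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Hypothetical counts, chosen only for illustration (not data from the study).
example = Confusion(tp=90, fp=10, tn=85, fn=15)
metrics = screening_metrics(example)

# Consistency check on the reported values: Youden's index should equal
# sensitivity + specificity - 1 for the best model.
reported_youden = 0.883 + 0.911 - 1

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print(f"Youden's index from reported values: {reported_youden:.3f}")
```

Applied to the reported sensitivity (0.883) and specificity (0.911), the definition gives a Youden's index of 0.794, which agrees with the value stated above.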
Conclusions:
Our DNN model outperformed the current screening model. Among the 8 features, RDW and age were the most useful, followed by sex and the combination of WBC and PLT; the remaining features were nearly useless.