An integrated classification algorithm is a decision-making method that is not limited to a single classifier; it combines multiple classifiers to maintain high classification performance across various datasets. This study investigated the feasibility of an integrated classification algorithm for offender profiling. Offender profiling is the analysis of a crime scene using statistical and psychological methods to estimate information such as the age, job, and criminal record of the offender. In this study, the following 12 machine learning algorithms were used: decision tree (C5.0, and CART with entropy or Gini splitting), logistic regression analysis (LR), naïve Bayes (NB), random forest (RF), bagging, boosting, support vector machine (SVM with a radial basis function or polynomial kernel), k-nearest neighbor (KNN), and neural network (NN). The results showed that the classification performance of each algorithm varied with the objective variable of the dataset (e.g., criminal record, age, or job of residential burglary offenders). However, majority decisions made by a combination of three classifier algorithms (e.g., decision tree, LR, and NB) showed high classification performance regardless of the dataset.
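The majority decision described above can be illustrated with a minimal sketch. It assumes that each of three classifiers has already produced one binary label per case. The function names (`majority_vote`, `accuracy`) and all of the toy predictions are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return one label per case: the label chosen by most classifiers.

    `predictions` is a list of equal-length label lists, one per classifier.
    With three classifiers and binary labels a tie cannot occur.
    """
    combined = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(predicted, actual):
    """Proportion of cases whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy labels (1 = has a criminal record, 0 = does not).
actual = [1, 0, 1, 1, 0, 0]
tree_pred = [1, 0, 0, 1, 0, 1]  # stand-in for a decision tree
lr_pred = [1, 1, 1, 1, 0, 0]    # stand-in for logistic regression
nb_pred = [0, 0, 1, 1, 1, 0]    # stand-in for naive Bayes

ensemble_pred = majority_vote([tree_pred, lr_pred, nb_pred])

for name, pred in [("tree", tree_pred), ("LR", lr_pred),
                   ("NB", nb_pred), ("majority", ensemble_pred)]:
    print(name, round(accuracy(pred, actual), 3))
```

In this made-up example the three classifiers err on different cases, so the vote corrects each individual error. That is the mechanism by which a majority decision can stay stable when individual classifiers vary across objective variables; the toy numbers say nothing about the size of the effect in real data.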
The present study compared decision tree analysis with logistic regression analysis to investigate whether decision tree analysis is sufficiently able to construct a model that predicts offender characteristics from crime scene and/or victim information. The data were collected from solved single homicide cases that occurred in Japan between 2004 and 2009 (n=1226). Models predicting the offender's criminal history were constructed by logistic regression analysis and by decision tree analysis, and the AUC (area under the ROC curve) and predictive values of the models were compared. The AUC was .75 (p<.001) for the logistic regression model and .71 (p<.001) for the decision tree model. No significant difference between these AUCs was observed (χ²(1)=3.71, p=.05). The predictive value was 67.3% for both the logistic regression model and the decision tree model. These findings suggest that decision tree analysis is comparable to logistic regression analysis in constructing a model that predicts the offender's criminal history from offence characteristics in single homicide cases.
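As a rough illustration of the AUC comparison, the sketch below computes the area under the ROC curve from predicted scores using the rank-based (Mann-Whitney) formulation: the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name `auc` and the toy scores are hypothetical, and the significance test reported in the abstract is not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `scores` are predicted probabilities (or any ranking score);
    `labels` are 1 for the positive class and 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = offender has a criminal history).
labels = [1, 1, 1, 0, 0, 0]
lr_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]    # stand-in for logistic regression
tree_scores = [0.8, 0.8, 0.3, 0.8, 0.3, 0.1]  # stand-in for a decision tree

print("LR AUC:  ", round(auc(lr_scores, labels), 3))
print("Tree AUC:", round(auc(tree_scores, labels), 3))
```

A decision tree assigns the same score to every case in a leaf, so its scores contain many ties, which is why the tie handling matters when the two kinds of model are compared on AUC.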