Results of MCC and Brier score for the BS1, BS2, BS3, BS4, BS5, and BS6 use cases. normMCC = (MCC + 1)/2. complBS = 1 - BS. The values of both normMCC and complBS lie in the [0, 1] interval, with the worst value equal to 0 and the best value equal to 1. We report the details of these use cases in Table 4.
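As a quick illustration of the two normalizations in this caption, here is a minimal Python sketch (the helper names are ours, not from the paper):

    def norm_mcc(mcc):
        # Map MCC from [-1, 1] onto [0, 1]: worst case 0, best case 1.
        return (mcc + 1.0) / 2.0

    def compl_bs(bs):
        # Complement the Brier score (lower BS is better) so that,
        # like normMCC, higher values mean better predictions.
        return 1.0 - bs

    # A perfect classifier (MCC = +1, BS = 0) scores 1.0 on both scales;
    # a maximally wrong one (MCC = -1, BS = 1) scores 0.0 on both.
    print(norm_mcc(1.0), compl_bs(0.0))   # 1.0 1.0
    print(norm_mcc(-1.0), compl_bs(1.0))  # 0.0 0.0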

Source publication
Article
Although measuring the outcome of binary classifications is a pivotal task in machine learning and statistics, no consensus has yet been reached about which statistical rate to employ to this end. In the last century, the computer science and statistics communities have introduced several scores summing up the correctness of the predictions with res...

Context in source publication

Context 1
... 100   0   0        -1.000   0.000   1.000
K2    0   90  10   0   -1.000  -0.220   0.780
K3    0   80  20   0   -1.000  -0.471   0.529
K4    0   70  30   0   -1.000  -0.724   0.276
K5    0   60  40   0   -1.000  -0.923   0.077
K6    0   50  50   0   -1.000  -1.000   0.000
... zero. To highlight these differences, we represent them as barplots in Figure 5. ...
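The rows above can be reproduced from their four counts. A minimal Python sketch, assuming the counts are ordered TP, FN, FP, TN (under this reading the middle metric column matches Cohen's kappa; the MCC is unaffected by swapping FN and FP here):

    import math

    def mcc(tp, fn, fp, tn):
        # Matthews correlation coefficient from the four confusion-matrix cells.
        num = tp * tn - fp * fn
        den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return num / den if den else 0.0

    def cohen_kappa(tp, fn, fp, tn):
        # Observed agreement corrected for chance agreement.
        n = tp + fn + fp + tn
        po = (tp + tn) / n
        pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
        return (po - pe) / (1 - pe)

    # K2 row: counts 0, 90, 10, 0
    print(round(mcc(0, 90, 10, 0), 3))          # -1.0
    print(round(cohen_kappa(0, 90, 10, 0), 3))  # -0.22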

Citations

... TP and TN are when the ML algorithm makes a correct classification, whereas FP and FN represent classification errors. In addition to macro-F1 and micro-F1, a micro-averaged Matthews correlation coefficient (MCC) [93] was used, ...
... are net classification outcomes, that is, sums over the outcomes after each rating class was treated as the positive class and all others as negative. The metric was resistant to imbalance in rating classes through its inclusion of all classification outcomes in the MCC [93,94]. A final metric called kappa, derived from Cohen's kappa statistic, considers the observed agreement with respect to a random-guess classifier and is interpreted as a comparison of overall accuracy (OA) to the expected accuracy (EA) under a random guess [95]: kappa = (OA - EA) / (1 - EA) ...
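A micro-averaged MCC of the kind described in these excerpts can be sketched as follows: treat each rating class in turn as the positive class, sum the four outcome counts over classes, and apply the binary MCC formula to the sums. This is one common reading of micro-averaging, not necessarily the exact procedure of [93,94]:

    import numpy as np

    def micro_mcc(cm):
        # cm: square multiclass confusion matrix; cm[i, j] counts samples
        # of true class i predicted as class j.
        cm = np.asarray(cm, dtype=float)
        n = cm.sum()
        tp = fn = fp = tn = 0.0
        for c in range(cm.shape[0]):
            # One-vs-rest outcomes with class c as the positive class.
            tp_c = cm[c, c]
            fn_c = cm[c, :].sum() - tp_c
            fp_c = cm[:, c].sum() - tp_c
            tn_c = n - tp_c - fn_c - fp_c
            tp, fn, fp, tn = tp + tp_c, fn + fn_c, fp + fp_c, tn + tn_c
        num = tp * tn - fp * fn
        den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return num / den if den else 0.0

    # Toy 3-class (e.g., 3 rating levels) confusion matrix.
    print(micro_mcc([[50, 2, 3], [4, 40, 6], [1, 2, 42]]))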
Article
Optimization of medication therapy depends on maximizing benefits and minimizing side effects of medications. This research showed how a joint approach using text mining, natural language processing, and machine learning can provide information for personalized and optimized medication therapy. Reviews on the benefits and side effects of prescription and over-the-counter medications were used to determine how well an integrated supervised and unsupervised learning method could learn medication satisfaction. Supervised learning with naïve Bayes, non-linear support vector machine with radial basis function kernels, and random forests with CART decision trees was measured by a micro-aggregated Matthews correlation coefficient and a macro-averaged F1 measure. Random forests outperformed support vector machines by almost 250% and naïve Bayes by 600% on the two evaluation metrics. All models did better with three rating levels instead of five. Topic modeling and stacked cluster analysis were coupled with parts-of-speech tagging and text mining operations to establish a robust data preprocessing procedure to eliminate noisy features from the data. Unsupervised topic modeling and clustering represented an exploratory validation of how easy supervised classification would be. Well-defined latent topics were discovered, including topics on "sleep quality", "the opportunity to get back to work", and "weight gain". Overlapping clusters revealed that incorporating more information on social, demographic, or medical history variables could improve classifier performance. This research provided evidence that medication satisfaction can be learned with carefully designed joint supervised, unsupervised, and natural language learning techniques.
... In addition, we have observed that metrics such as balanced accuracy (B. Acc), the F-measure Fβ, Cohen's kappa coefficient κ, and the Matthews correlation coefficient (MCC) have proven to be adequate in similar contexts, although they have their own limitations (Chicco et al., 2021; Chicco and Jurman, 2020; Lee et al., 2021). For this reason, Table 1 presents the results of these metrics in relation to the parameters of our model and the final results obtained in our real fire. ...
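All four metrics named in this excerpt are available in scikit-learn; a minimal usage sketch with toy labels (not the authors' data):

    from sklearn.metrics import (balanced_accuracy_score, fbeta_score,
                                 cohen_kappa_score, matthews_corrcoef)

    y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

    print(balanced_accuracy_score(y_true, y_pred))  # mean of per-class recalls
    print(fbeta_score(y_true, y_pred, beta=1.0))    # F-beta; beta=1 gives F1
    print(cohen_kappa_score(y_true, y_pred))        # agreement beyond chance
    print(matthews_corrcoef(y_true, y_pred))        # MCC, in [-1, +1]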
... Given their impact on public health, early identification of these diseases is crucial to prevent complications and save lives. These conditions are often diagnosed through invasive tests, which can be uncomfortable and costly for the patient, as well as consume significant resources from the healthcare system [Chicco and Jurman, 2020; Chicco et al., 2021]. ...
... In this work we used a public database (DB) called the Heart failure clinical records Data Set. The data were collected from 299 patients with heart failure, and machine learning (ML) algorithms were applied to predict the survival of patients with cardiovascular diseases [Chicco and Jurman, 2020; Chicco et al., 2021]. Investing in information systems (IS) ...
Article
This article introduces an approach to diagnose heart diseases utilizing the K-Nearest Neighbor algorithm and diverse correlation filters for selecting the most pertinent attributes. Results high- light that meticulous filter selection enhances survival predictions in patients with heart diseases. Employing K = 5 and correlation filter CF = 0.1, key attributes for classification were identified as anemia, high blood pressure, serum creatinine, and sex. Omitting the 'time' attribute led to information loss but was crucial to prevent biases and generalize predictions across various clinical scenarios. Utilizing these classification parameters, we designed an Android mobile application called “Heart Info System”, functioning as an artificial intelligence service. It employs the K-Nearest Neighbor algorithm with optimal parameters to evaluate the probability of survival in the progression of heart disease. The main activity of the application retrieves data from a Firebase database. While the study results show promise, the accuracy of the application may be influenced by inaccurate or incomplete input data. Nevertheless, this application has the potential to improve the early detection of heart diseases, paving the way for life-saving interventions.
... In addition, to draw more robust conclusions on the predictive performance, we also included the Adjusted Rand Index (ARI) and the complemented Brier score (1 - Brier score) [22,23]. Note that the MCC is considered more informative than both the ARI and the Brier score in binary classification evaluations [24]. When regularization techniques were applied, the definitive metric score was established by identifying the highest performance value among the three regularization techniques: Lasso, Ridge, and Elastic Net. ...
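Both supplementary scores are also a one-liner each in scikit-learn; a minimal sketch with toy values:

    from sklearn.metrics import adjusted_rand_score, brier_score_loss

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]
    y_prob = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]  # predicted P(class = 1)

    print(adjusted_rand_score(y_true, y_pred))     # ARI: chance-corrected agreement
    print(1.0 - brier_score_loss(y_true, y_prob))  # 1 - Brier score: higher is better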
Article
Background: Numerous transcriptomic-based models have been developed to predict or understand the fundamental mechanisms driving biological phenotypes. However, few models have successfully transitioned into clinical practice due to challenges associated with generalizability and interpretability. To address these issues, researchers have turned to dimensionality reduction methods and have begun implementing transfer learning approaches.
Methods: In this study, we aimed to determine the optimal combination of dimensionality reduction and regularization methods for predictive modeling. We applied seven dimensionality reduction methods to various datasets, including two supervised methods (linear optimal low-rank projection and low-rank canonical correlation analysis), two unsupervised methods [principal component analysis and consensus independent component analysis (c-ICA)], and three methods [autoencoder (AE), adversarial variational autoencoder, and c-ICA] within a transfer learning framework, trained on > 140,000 transcriptomic profiles. To assess the performance of the different combinations, we used a cross-validation setup encapsulated within a permutation testing framework, analyzing 30 different transcriptomic datasets with binary phenotypes. Furthermore, we included datasets with small sample sizes and phenotypes of varying degrees of predictability, and we employed independent datasets for validation.
Results: Our findings revealed that regularized models without dimensionality reduction achieved the highest predictive performance, challenging the necessity of dimensionality reduction when the primary goal is to achieve optimal predictive performance. However, models using AE and c-ICA with transfer learning for dimensionality reduction showed comparable performance, with enhanced interpretability and robustness of predictors, compared to models using non-dimensionality-reduced data.
Conclusion: These findings offer valuable insights into the optimal combination of strategies for enhancing the predictive performance, interpretability, and generalizability of transcriptomic-based models.
... While widely used, F1 score and accuracy can lead to overly optimistic performance estimates, particularly in datasets with a positive class imbalance [84]. Previous research has demonstrated that MCC offers a more informative and reliable evaluation compared to OA [85], F1 score [85], and Cohen's kappa [86], especially when dealing with challenging imbalanced classification tasks. This is because MCC provides a more balanced assessment of classifiers, no matter which class is positive [84]. ...
Article
Accurate urban land cover information is crucial for effective urban planning and management. While convolutional neural networks (CNNs) demonstrate superior feature learning and prediction capabilities using image-level annotations, the inherent mixed-category nature of input image patches leads to classification errors along object boundaries. Fully convolutional neural networks (FCNs) excel at pixel-wise fine segmentation, making them less susceptible to heterogeneous content, but they require fully annotated dense image patches, which may not be readily available in real-world scenarios. This paper proposes an object-based semi-supervised spatial attention residual UNet (OS-ARU) model. First, multiscale segmentation is performed to obtain segments from a remote sensing image, and segments containing sample points are assigned the categories of the corresponding points, which are used to train the model. Then, the trained model predicts class probabilities for all segments. Each unlabeled segment’s probability distribution is compared against those of labeled segments for similarity matching under a threshold constraint. Through label propagation, pseudo-labels are assigned to unlabeled segments exhibiting high similarity to labeled ones. Finally, the model is retrained using the augmented training set incorporating the pseudo-labeled segments. Comprehensive experiments on aerial image benchmarks for Vaihingen and Potsdam demonstrate that the proposed OS-ARU achieves higher classification accuracy than state-of-the-art models, including OCNN, 2OCNN, and standard OS-U, reaching an overall accuracy (OA) of 87.83% and 86.71%, respectively. The performance improvements over the baseline methods are statistically significant according to the Wilcoxon Signed-Rank Test. Despite using significantly fewer sparse annotations, this semi-supervised approach still achieves comparable accuracy to the same model under full supervision. The proposed method thus makes a step forward in substantially alleviating the heavy sampling burden of FCNs (densely sampled deep learning models) to effectively handle the complex issue of land cover information identification and classification.
... The search for Vth was carried out as a sequential scan within the range of the entropy feature, selecting the value that maximized the MCC (Matthews correlation coefficient [53]). We calculated the MCC for the entire database without dividing it into test and training data, which is equivalent to calculating the MCC on training data. ...
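The threshold search described here amounts to scanning candidate values of Vth across the feature's observed range and keeping the one that maximizes the MCC. A minimal sketch of that idea (the function name and grid size are ours, not the authors' code):

    import numpy as np
    from sklearn.metrics import matthews_corrcoef

    def best_threshold(feature, labels, n_steps=1000):
        # Sequentially scan thresholds over the feature's range and keep
        # the one whose induced binary split maximizes the MCC.
        grid = np.linspace(feature.min(), feature.max(), n_steps)
        scores = [matthews_corrcoef(labels, (feature >= t).astype(int))
                  for t in grid]
        best = int(np.argmax(scores))
        return grid[best], scores[best]

    # Toy data: an entropy-like feature that separates two classes imperfectly.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])
    y = np.concatenate([np.zeros(200, int), np.ones(200, int)])
    vth, score = best_threshold(x, y)
    print(vth, score)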
Article
The classification of time series using machine learning (ML) analysis and entropy-based features is an urgent task for the study of nonlinear signals in the fields of finance, biology and medicine, including EEG analysis and Brain–Computer Interfacing. As several entropy measures exist, the problem is assessing the effectiveness of entropies used as features for the ML classification of nonlinear dynamics of time series. We propose a method, called global efficiency (GEFMCC), for assessing the effectiveness of entropy features using several chaotic mappings. GEFMCC is a fitness function for optimizing the type and parameters of entropies for time series classification problems. We analyze fuzzy entropy (FuzzyEn) and neural network entropy (NNetEn) for four discrete mappings, the logistic map, the sine map, the Planck map, and the two-memristor-based map, with a base length time series of 300 elements. FuzzyEn has greater GEFMCC in the classification task compared to NNetEn. However, NNetEn classification efficiency is higher than FuzzyEn for some local areas of the time series dynamics. The results of using horizontal visibility graphs (HVG) instead of the raw time series demonstrate the GEFMCC decrease after HVG time series transformation. However, the GEFMCC increases after applying the HVG for some local areas of time series dynamics. The scientific community can use the results to explore the efficiency of the entropy-based classification of time series in “The Entropy Universe”. An implementation of the algorithms in Python is presented.
... We considered the following classifiers: Logistic Regression (LR), Support Vector Machine (SVM) [34], Decision Tree (Tree), Random Forest (RF) [35], and XGBoost (XGB) [36] (evaluation metric: logloss; objective function: binary/logistic). To evaluate the performance of the classifiers, the Matthews correlation coefficient (MCC) [38] on the cross-validated test sets was considered because of its proven ability to summarize results from contingency tables and its invariance to class swapping [39-41,60]. Specifically, the MCC can take values ranging from -1 to +1, where -1 represents the misclassification of all observations, 0 represents random association, and +1 perfect classification. ...
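Evaluating classifiers by the MCC on cross-validated test sets can be done with a custom scorer; a minimal sketch on a synthetic dataset (all parameters illustrative only):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import make_scorer, matthews_corrcoef
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, n_features=10, random_state=42)
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    mcc_scorer = make_scorer(matthews_corrcoef)

    scores = cross_val_score(clf, X, y, cv=5, scoring=mcc_scorer)
    print(scores.mean(), scores.std())  # MCC over the 5 held-out folds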
Article
Background: Systemic inflammatory response syndrome (SIRS) and sepsis are the most common causes of in-hospital death. However, the characteristics associated with improvement in patient conditions during the ICU stay have not been fully elucidated for each population, nor have the possible differences between the two.
Goal: The aim of this study is to highlight the differences between the prognostic clinical features for the survival of patients diagnosed with SIRS and those of patients diagnosed with sepsis by using a multi-variable predictive modeling approach with a reduced set of easily available measurements collected at admission to the intensive care unit (ICU).
Methods: Data were collected from 1,257 patients (816 non-sepsis SIRS and 441 sepsis) admitted to the ICU. We compared the performance of five machine learning models in predicting patient survival. The Matthews correlation coefficient (MCC) was used to evaluate model performance and feature importance, applying Monte Carlo stratified cross-validation.
Results: Extreme Gradient Boosting (MCC = 0.489) and Logistic Regression (MCC = 0.533) achieved the highest results for the SIRS and sepsis cohorts, respectively. In order of importance, APACHE II, mean platelet volume (MPV), eosinophil counts (EoC), and C-reactive protein (CRP) showed higher importance for predicting sepsis patient survival, whereas SOFA, APACHE II, platelet counts (PLTC), and CRP obtained higher importance in the SIRS cohort.
Conclusion: By using complete blood count parameters as predictors of ICU patient survival, machine learning models can accurately predict the survival of SIRS and sepsis ICU patients. Interestingly, feature importance highlights the role of CRP and APACHE II in both SIRS and sepsis populations. In addition, MPV and EoC are shown to be important features for the sepsis population only, whereas SOFA and PLTC have higher importance for SIRS patients.
... When the MCC results are observed, there is a difference of 4.75% between XGB and KNN in favor of the proposed model. The MCC is one of the most reliable statistical indices, yielding a high value only when the classifier performs well across all four categories of the confusion matrix [45]. The differences are significantly higher when comparing the values of the F1 score, Kappa, and DYI. ...
Article
Simple Summary: Non-alcoholic fatty liver disease (NAFLD) is the most prevalent chronic liver condition globally. The increasing incidence of NAFLD suggests that in the upcoming years, NAFLD-related hepatocellular carcinoma (HCC) is poised to become the leading cause of this type of tumor. The aim of this study is to evaluate the survival rates of these patients and identify the primary risk factors contributing to a less favorable prognosis. To accomplish this, we have employed machine learning techniques. This introduces a novel approach for identifying these factors that can be targeted to enhance the life expectancy of these patients, offering a more personalized and effective management strategy. This enhanced management approach not only aids in the optimization of patient care but also facilitates the delivery of the most effective available treatments.
Abstract: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease worldwide, with an incidence that is exponentially increasing. Hepatocellular carcinoma (HCC) is the most frequent primary tumor. There is an increasing relationship between these entities due to the potential risk of developing NAFLD-related HCC and the prevalence of NAFLD. There is limited evidence regarding prognostic factors at the diagnosis of HCC. This study compares the prognosis of HCC in patients with NAFLD against other etiologies. It also evaluates the prognostic factors at the diagnosis of these patients. For this purpose, a multicenter retrospective study was conducted involving a total of 191 patients. Out of the total, 29 presented NAFLD-related HCC. The extreme gradient boosting (XGB) method was employed to develop the reference predictive model. Patients with NAFLD-related HCC showed a worse prognosis compared to other potential etiologies of HCC. Among the variables with the worst prognosis, alcohol consumption in NAFLD patients had the greatest weight within the developed predictive model. In comparison with other studied methods, XGB obtained the highest values for the analyzed metrics. In conclusion, patients with NAFLD-related HCC and alcohol consumption, obesity, cirrhosis, and clinically significant portal hypertension (CSPH) exhibited a worse prognosis than other patients. XGB developed a highly efficient predictive model for the assessment of these patients.
... The Matthews Correlation Coefficient (MCC) takes into account true positives, true negatives, false positives, and false negatives to measure the quality of the algorithm in classifying attack and non-attack. It returns a value between -1 and 1, where 1 indicates a perfect prediction, 0 represents a random prediction, and -1 indicates total disagreement between prediction and observation; it is given by Equation (21) [56]. ...
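Equation (21) itself is not reproduced in this excerpt; the standard binary MCC definition it describes is:

MCC = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

with MCC conventionally set to 0 when the denominator vanishes.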
Article
The rise of smart cities, smart homes, and smart health powered by the Internet of Things (IoT) presents significant challenges in design, deployment, and security. The seamless data processing across a complex network of interconnected devices in unprotected conditions makes it vulnerable to potential breaches, underscoring the need for robust security at various levels of the network. Traditional security methods based on statistics often struggle to comprehend data patterns and provide the desired level of security. This work proposes a novel hybrid framework that combines Whale Optimization and Deep Learning with a trust index to identify malicious nodes engaging in various attacks such as DoS, DDoS, drop attacks, and tamper attacks, thus enhancing IoT node security. The developed framework first calculates a trust-index score for IoT nodes based on drop attack, tamper attack, replay attack, and multiple-max attack. Subsequently, it utilizes the trust-index score in the Optimized Neural Network model to effectively identify the malicious IoT node. The neural network optimization is achieved through a fitness function that determines optimal weights using the Whale Optimization Algorithm. The proposed framework has been tested across varying network sizes, comprising 100, 500, and 1000 nodes. The resulting outcomes were evaluated against benchmark security methods such as Logistic Regression, Random Forest, Support Vector Machine, Bayesian models, ANN, Elephant herding optimization, and the Lion algorithm using metrics like specificity, sensitivity, accuracy, precision, False Positive Rate, False Negative Rate, False Discovery Rate, Error, F1 score, Matthews Correlation Coefficient, and Negative Predictive Value. The results reveal a notable enhancement in accuracy (26.63%, 13.04%, 17.78%, 30.52%, 22.45%, 4.26%, and 2.24%) for a 100-node network when compared to the benchmark security methods. Furthermore, the proposed framework consistently demonstrates strong performance even when applied to larger IoT networks with a higher node count.
... Cohen's Kappa is a popular measurement used in qualitative coding to calculate inter-rater reliability, which is also often used to compare human and AI coding (Kolesnyk & Khairova, 2022). Several scholars criticised the use of Cohen's Kappa to measure the performance of classification models in favour of the more robust Matthews correlation coefficient (Delgado & Tibau, 2019; Chicco et al., 2021). Others suggested the use of Gwet's AC1, a metric developed to correct for the influence of prevalence on Cohen's Kappa (Gwet, 2008; Blood & Spratt, 2007). ...
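A minimal sketch contrasting Cohen's Kappa with Gwet's AC1 on binary codes from two raters, using the common two-rater formulas (our illustration, not code from the cited works); with one highly prevalent code, Kappa collapses despite 80% raw agreement, while AC1 stays high:

    def kappa_and_ac1(a, b):
        # a, b: binary code vectors from two raters (e.g., human vs. AI coder).
        n = len(a)
        po = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
        p1, p2 = sum(a) / n, sum(b) / n              # prevalence of code 1 per rater
        pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)     # chance agreement (Kappa)
        pi = (p1 + p2) / 2
        pe_ac1 = 2 * pi * (1 - pi)                   # chance agreement (AC1)
        return ((po - pe_kappa) / (1 - pe_kappa),
                (po - pe_ac1) / (1 - pe_ac1))

    a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    b = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
    print(kappa_and_ac1(a, b))  # Kappa ~ -0.11, AC1 ~ 0.76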