Figure 5 - uploaded by Frank Krüger
1.: Confusion matrix for multi-class classification. The confusion matrix of a classification with n classes. When considering class k (0 ≤ k ≤ n), four different classification results can be obtained: true positive (green), true negative (orange), false positive (brown), and false negative (red).
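As a minimal sketch of the four outcomes named in the caption, the function below reads the counts for one class k off a multi-class confusion matrix. The function name `one_vs_rest_counts`, the example matrix, and the convention that rows hold actual classes and columns hold predicted classes are assumptions for illustration; they are not taken from the figure.

```python
def one_vs_rest_counts(matrix, k):
    """Return the TP, FN, FP, and TN counts for class k.

    Assumption: matrix[i][j] counts samples of actual class i
    that were predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                              # actual k, predicted k
    fn = sum(matrix[k]) - tp                       # actual k, predicted as another class
    fp = sum(matrix[i][k] for i in range(n)) - tp  # another class, predicted as k
    tn = total - tp - fn - fp                      # neither actual nor predicted k
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
example = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 3, 7],
]
print(one_vs_rest_counts(example, 1))
# {'TP': 6, 'FN': 3, 'FP': 4, 'TN': 12}
```

If the matrix is stored with predicted classes in rows instead, the FN and FP expressions swap.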

Source publication
Thesis
As computers are becoming more and more a part of our everyday life, the vision of Mark Weiser about ubiquitous computing becomes true. One of the core tasks of such devices is to assist the users in achieving their goals. To do this, the assistive system has to have knowledge about the current situation as well as the user's goal. Such knowledge a...

Citations

... It is a table with four or more (depending on the number of classes) different predicted and actual values. The confusion matrix gives the amount of (mis)classifications for each class, and it uses TP, TN, FP, and FN, where they stand for the following [42]: ...
Article
Search engines are significant tools for finding and retrieving information. Every day, many new web pages in various languages are added. The threats of cyberattacks are expanding rapidly with this massive volume of data. The majority of studies on the detection of malicious websites focus on English-language websites. This necessitates more studies on malicious detection on Arabic-content websites. In this research, we aimed to investigate the security of Arabic-content websites by developing a detection tool that analyzes Arabic content based on artificial intelligence (AI) techniques. We contributed to the field of cybersecurity and AI by building a new dataset of 4048 Arabic-content websites. We created and conducted a comparative performance evaluation for four different machine-learning (ML) models using feature extraction and selection techniques: extreme gradient boosting, support vector machines, decision trees, and random forests. The best-performing model was then integrated into a Chrome plugin, created based on a random forest (RF) model, and utilized the features selected via the chi-square technique. This produced plugin tool attained an accuracy of 92.96% for classifying Arabic-content websites as phishing, suspicious, or benign. To our knowledge, this is the first tool designed specifically for Arabic-content websites.
... The categorization process in the confusion matrix is represented by four terms: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The illustration of the confusion matrix for the case of multiclass classification is shown in FIGURE 6 [21]. ...
Conference Paper
One of the leading causes of death is cardiovascular disease (CVD). This disease was the cause of 31% of deaths worldwide in 2016, and 85% of them were heart attacks. The traditional way to detect CVD is based on medical records and clinical analysis of the patient. Electrocardiogram (ECG) analysis is one way to determine irregular heartbeat or arrhythmia. Computer assistance with implementing specific machine learning algorithms can help recognize irregular heartbeats automatically. However, raw ECG data may contain noise that affects the accuracy of irregular heartbeat detection. In this study, the ECG data used was from the Massachusetts Institute of Technology–Beth Israel Hospital (MIT-BIH) database. The data has four categories: Normal, Atrial Fibrillation, PVC Bigeminy, and Ventricular Tachycardia. The raw ECG data is processed using multilevel discrete wavelet transforms (DWT) based on Haar and Daubechies wavelets. The process uses various values of mode (i.e. db1 to db10), level (i.e. level 1 and level 2), and filter (low- and high-pass filter), and the result is 20 processed data sets. Each data set is used to create a model with several classification algorithms, i.e. K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve Bayes, Decision Tree, Random Forest, and Deep Forest. The validation process uses 10-fold cross-validation. The results of this study indicate that multilevel discrete wavelet transforms improve irregular heartbeat detection accuracy when compared to raw ECG and data processed using a single DWT. The best detection accuracy is achieved by the Deep Forest model, with an accuracy value of 63.57% using data processed with db1 mode values, level 2, and a combination of high- and low-pass filters.
... The expert was chosen because they hold a bachelor's degree in Indonesian Language Education and have competence in this field. The accuracy, precision, recall, and F-measure values are then calculated [10]. The confusion matrix in Table 2 is one example of a parameter commonly used as an indicator to measure and compare the similarity between the prediction results and the original data. ...
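The accuracy, precision, recall, and F-measure mentioned in this excerpt can all be derived from a confusion matrix. The following is a minimal sketch, not the cited authors' procedure. It assumes rows hold actual classes and columns hold predicted classes, takes the F-measure to be the balanced F1 score, and returns 0.0 when a denominator is zero. The function name `classification_metrics` and the example matrix are hypothetical.

```python
def classification_metrics(matrix):
    """Return (accuracy, per_class), where per_class is a list of dicts
    holding precision, recall, and F1 for each class.

    Assumption: matrix[i][j] counts samples of actual class i
    that were predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    accuracy = correct / total if total else 0.0

    per_class = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(matrix[i][k] for i in range(n)) - tp
        # Convention chosen for this sketch: an undefined ratio becomes 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class


# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
accuracy, per_class = classification_metrics([
    [5, 1, 0],
    [2, 6, 1],
    [0, 3, 7],
])
print(round(accuracy, 2))               # 0.72
print(round(per_class[1]["recall"], 3))  # 0.667
```

Averaging the per-class values (macro averaging) is one common way to report a single precision, recall, or F1 figure for a multi-class problem.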
Article
This study proposes a classification of public response to the government's decision to move the Indonesian capital using the lexicon method. Accuracy is measured using a confusion matrix. The data in this study are tweets from Twitter containing community responses to the decision to move the Indonesian capital. The data passes through five preprocessing steps, namely case folding, punctuation removal, stopword removal, stemming, and tokenizing. The lexicon method is used because it produces good accuracy values. This study also looks for the dictionary that gives the best classification results. The results show a good classification that approaches the results obtained by experts.
... A TP (TN) indicates a sample in the positive (negative) class was classified correctly, and an FP (FN) a sample in the negative (positive) class that was classified as positive (negative). The multi-class classification model of the confusion matrix can then be extrapolated as follows (Krüger, 2016), see Figure 4. Per row $n \in C$, the confusion matrix $E \in \mathbb{N}^{(N+1) \times (N+1)}$ comprises a $1 \times (N+1)$ vector whose $n'$-th entry is $\sum_{m:\, c_m = n} \mathbb{1}\{n_m = n'\}$. The entries of the $n$-th row of $E$, with the $n$-th entry removed, correspond to the FN count for class $n$. ...
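The row-wise construction in this excerpt can be sketched in a few lines. This is an illustration under stated assumptions, not the cited paper's code: `c_m` is read as the true class of sample m, `n_m` as its predicted class, and classes are assumed to be integers starting at 0. The function names and the sample labels are hypothetical.

```python
def build_confusion_matrix(true_labels, predicted_labels, num_classes):
    """Entry [n][n'] counts the samples m with true class c_m == n
    and predicted class n_m == n' (the indicator sum from the text)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for c_m, n_m in zip(true_labels, predicted_labels):
        matrix[c_m][n_m] += 1
    return matrix


def false_negatives_per_class(matrix):
    """Sum each row with its diagonal entry removed: the FN count per class."""
    return [sum(row) - row[n] for n, row in enumerate(matrix)]


# Hypothetical labels for seven samples and three classes.
true_labels = [0, 0, 1, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0, 2]

E = build_confusion_matrix(true_labels, predicted_labels, num_classes=3)
print(E)                             # [[1, 1, 0], [0, 2, 0], [1, 0, 2]]
print(false_negatives_per_class(E))  # [1, 0, 1]
```

Applying the same off-diagonal sum to columns instead of rows gives the FP count per class.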
Article
Machine learning (ML)/Artificial Intelligence (AI) has widespread applications and has revolutionized many industries due to advanced and matured sensor technology, as well as large-scale data collection efforts. One of the key tasks for effective ML/AI operations is the extraction and identification of useful and usable data to identify complex interrelationships and solve problems efficiently. The usefulness of the data is the value and meaning of the data within the desired model, while the usability of the data refers to the ease of use of data in a model. Complex supervised and unsupervised ML models, which used to be the domain of cutting-edge scientists and academics, can now be invoked as basic function calls in public domain packages within Python, R, MATLAB, and other languages. While these functions require effective data preprocessing to overcome the unpredicted impacts of data quality in the real world (e.g. missing data, environmental noise, synchronizing at different sampling rates, etc.), their ease of use means they are often called with little to no understanding of the underlying math or ways to efficiently work through the data set. The approachability provided by the packages enables users to dive into complex problem sets with little advance preparation. However, in doing so there is a lack of understanding which will inevitably cause problems, skew results, or force the user to take a less efficient path to get to a similar answer. Each package provides relatively simple examples that deal with specific public data sets, yet not many provide the background knowledge and comprehensive methods required for building the inputs for extensive and effective time-series data modeling. Typically, the complex nature of time-series data requires an in-depth understanding of signals analysis and domain subject expertise to use in ML/AI predictive models.
This paper will provide the reader an overview of the problems associated with time-series data modelling, propose a common set of preprocessing steps to follow, demonstrate a taxonomy classification for time series data, provide introductory reasoning regarding the underlying process, and discuss the models that would benefit from such a methodology. This is done here with the goal of equipping non-knowledge-domain experts with updated and approachable techniques to find which features to focus on while preprocessing for their time-series data preparation efforts.
... Workflow diagram of this study. Fig. 9. Confusion matrix for multi-class classification [25] ...
... Multiclass confusion matrix [20]. TP or True Positive is a positive prediction result with positive actual data, TN or True Negative is a negative prediction result with negative actual data, FP or False Positive is a positive prediction result with negative actual data, and FN or False Negative is a negative prediction result with positive actual data. Lastly, C is a class or label from the dataset. ...
Article
Personality is a unique set of motivations, feelings, and behaviors humans possess. Personality detection on social media is a research topic commonly conducted in computer science. Personality models often used for personality detection research are the Big Five Indicator (BFI) and Myers-Briggs Type Indicator (MBTI) models. Unlike the BFI, which classifies personalities based on an individual’s traits, the MBTI model classifies personalities based on the type of the individual. So, MBTI performs better in several scenarios than the Big Five model. Many studies use machine learning to detect personality on social media, such as Logistic Regression, Naïve Bayes, and Support Vector Machine. With the recent popularity of Deep Learning, we can use language models such as DistilBERT to classify personality on social media, because of DistilBERT’s ability to process large sentences and its ability for parallelization thanks to the transformer architecture. Therefore, the proposed research will detect MBTI personality on Reddit using DistilBERT. The evaluation shows that removing stopwords in the data preprocessing stage can reduce the model’s performance, and with class imbalance handling, DistilBERT performs worse than without class imbalance handling. Also, as a comparison, DistilBERT outperforms other machine learning classifiers such as Naïve Bayes, SVM, and Logistic Regression in accuracy, precision, recall, and f1-score.
... For this purpose, a large dataset of superpixels with color features -color spaces (RGB, LAB, HSV, LUV, and YCrCb [37]) and color vegetation indices (NDI, ExG, ExGR, CIVE, and COM2 [38,39]) -and statistical features (standard deviation, skewness, kurtosis, entropy, minimum, maximum, mean, and median [34]), whose data was labeled, and the Random Forest Classifier (RFC) [40,41] was implemented. The complexity of the machine learning model, in terms of quantity and depth of decision trees, was considerably reduced by removing features with the smallest variances and by analyzing the performance metrics confusion matrix, accuracy, precision, recall, and F1-score [42], aiming for real-time application. ...
Article
This paper presents an Image data-based autonomous navigation system for an under-canopy agricultural mini-rover called TerraSentia. This kind of navigation is a very challenging problem due to the lack of GNSS accuracy. This happens because the crop leaves and stems attenuate the GNSS signal and produce multi-path data. In such a scenario, reactive navigation techniques based on the detection of crop rows using image data have proved to be an efficient alternative to GNSS. However, it also presents some challenges, mainly owing to leaves occlusions under the canopy and dealing with varying weather conditions. Our system addresses these issues by combining different image-based approaches using low-cost hardware. Tests were carried out using multiple robots, in different field conditions, and in different locations. The results show that our system is able to safely navigate without interventions in fields without significant gaps in the crop rows. In addition to this, we see as future steps, not only comparing more recent convolutional neural networks based on processing power needs and accuracy, but also the fusion of these vision-based approaches previously developed by our group in order to obtain the best of both approaches.
... d) Random forest: Breiman introduced Random Forests (RF) as a tree-based ensemble learning approach for classification and regression in 2001 [15], [23]. It has been frequently used in the healthcare field due to its simple structure and superior performance compared to other machine learning approaches [24]. ...
Article
Given the increasing number of COVID-19 cases and the risk of new variants, early prediction of disease severity in critical care patients is essential to optimize treatment options. In this study, we set up an experiment on 236 patients infected with COVID-19 and hospitalized at the Sidi Said hospital in Meknes, Morocco. This work proposes a new multivariate classification model to predict which patients admitted to hospital with COVID-19 will require special care (oxygen therapy, intensive care, resuscitation) or will die following an abrupt deterioration in their state of health. This model will help healthcare professionals (doctors) make decisions about recommending appropriate medical treatments to patients. A comparative study of different multivariate machine learning algorithms (Support Vector Machine (SVM), K-nearest neighbor (KNN), Decision Tree (DT) and Random Forest (RF)) is also presented in this article. The result obtained shows that the SVM classifier is a reliable, powerful and efficient algorithm to predict the level of risk of patients contaminated with COVID-19.
... The confusion matrix was used to determine the classification performance of the proposed models. Figure 6 illustrates the confusion matrix for an N-class model (Krüger 2016). Observations on correct and incorrect classifications are collected into the confusion matrix $C = (c_{ij})$, where $c_{ij}$ represents the frequency of class $i$ being identified as class $j$. ...
... Confusion matrix for a classification with n classes (Krüger 2016) ...
Article
Purpose: Soil classification is important in the field of geotechnical engineering. Soil types are usually defined by combinations of soil properties, which are interrelated and interactive. This means that distinguishing between soil types is laborious and uncertain. Instead, images are routine and widespread. Thus, this study presents a framework using convolutional neural networks (CNN) to determine soil types from soil images. Besides, the image acquisition distance factors are incorporated and evaluated in the framework. Methods: The properties of color and texture were collocated to define eight types of soil. To collect images effectively, an image acquisition method was designed. Then, the images of eight types of soil were collected at eight acquisition distances (10 to 80 cm). Two types of models, the single-range scale model and the multirange scale model, were trained and evaluated as per the framework. Results: The accuracy of the single-range scale models was between 19 and 98%. In addition, the multirange scale models achieved an accuracy range between 51 and 98%. Moreover, the mean uncertainty of the former was between 0.11 and 0.16, and the latter was between 0.05 and 0.12. Conclusion: The models can effectively infer soil types from images and improve robustness through multi-distance training as per the proposed framework. The property of color has a higher priority in the classification than the texture. Moreover, image-based soil classification is extremely sensitive to distance factors, and the perceptual distance of 70 cm was shown to be the better one among the eight selected distances.