Conference Paper

Evaluation of Distance Measures for Multi-class Classification in Binary SVM Decision Tree


Abstract

Multi-class classification can often be constructed as a generalization of binary classification. The approach that we use for solving this kind of classification problem is the SVM-based Binary Decision Tree architecture (SVM-BDT). It takes advantage of both the efficient computation of the decision tree architecture and the high classification accuracy of SVMs. The hierarchy of binary decision subtasks using SVMs is designed with a clustering algorithm. In this work, we investigate how different distance measures used in the clustering influence the predictive performance of the SVM-BDT. The distance measures that we consider are the Euclidean distance, the standardized Euclidean distance and the Mahalanobis distance. We use five different datasets to evaluate the performance of the SVM-based Binary Decision Tree architecture with these distances. We also compare the performance of this architecture with four other SVM-based approaches, ensembles of decision trees and a neural network. The results of the experiments suggest that the performance of the architecture varies significantly depending on the distance measure applied in the clustering process.
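The clustering step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grouping rule (seed with the two most distant class centroids, then assign the remaining classes to the nearer seed) and the function names are assumptions made for the sake of the example; any of the three distance measures can be plugged in for `dist`.

```python
import numpy as np

def split_classes(centroids, dist):
    """Split class centroids into two groups for one SVM-BDT node:
    seed with the two most distant classes, then assign each remaining
    class to the group whose seed is closer (by the given distance)."""
    n = len(centroids)
    # find the pair of classes with the largest mutual distance
    pairs = [(dist(centroids[i], centroids[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    _, a, b = max(pairs)
    left, right = [a], [b]
    for k in range(n):
        if k in (a, b):
            continue
        (left if dist(centroids[k], centroids[a]) <=
                 dist(centroids[k], centroids[b]) else right).append(k)
    return sorted(left), sorted(right)

euclidean = lambda u, v: float(np.linalg.norm(u - v))

# four class centroids in 2-D: two well-separated groups of two
cents = np.array([[0., 0.], [1., 0.], [10., 0.], [11., 0.]])
print(split_classes(cents, euclidean))  # ([0, 1], [2, 3])
```

Each node of the tree then trains one binary SVM separating the left group from the right group, and the procedure recurses until each group holds a single class.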
... In this tree, nodes that have two or more branches are called decision nodes, and nodes that have no branches are called leaf nodes. Cheong et al. [11], Madzarov et al. [35] and Madzarov and Gjorgjevikj [36] presented the use of an SVM-based Binary Decision Tree (SVM-BDT) for the multi-class classification problem. Pros: ...
Object recognition is one of the main research areas in the field of computer vision and image processing because of its varied applications in surveillance and security systems, biometrics, intelligent vehicle systems, content-based image retrieval, etc. Many researchers have already done a great deal of work in this area, but issues such as scale, rotation, illumination, viewpoint, occlusion and background clutter, among many others, still draw researchers' attention. Object recognition is the task of recognizing an object and labeling it in an image. The main goal of this survey is to present a comprehensive study of 2D object recognition. An object is recognized by extracting its features, such as color, texture, shape or other properties; based on these features, objects are classified into various classes, and each class is assigned a name. In this paper, the feature extraction techniques and classification algorithms required for object recognition are discussed. As deep learning has brought tremendous improvements to object recognition, the paper also presents the recognition results achieved with various deep learning methods. The survey further covers applications of object recognition systems and the challenges faced while recognizing objects. Pros and cons of the feature extraction and classification algorithms are discussed, which may help other researchers during the initial period of their study. The authors also report an analysis of various studies, describing the techniques used for object recognition and the accuracy achieved on particular image datasets. Finally, the paper ends with concluding notes and future directions. The aim of this study is to introduce researchers to the various techniques used in object recognition systems.
... It measures the separation of two groups of samples. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant [71]. Therefore, it is also called statistical distance [68], hence its relative advantage. ...
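The scale invariance mentioned in this excerpt is easy to verify numerically. The following sketch (names and data are illustrative) computes the Mahalanobis distance from the inverse sample covariance and shows that rescaling a feature leaves the distance unchanged, unlike the Euclidean distance.

```python
import numpy as np

def mahalanobis(u, v, VI):
    """Mahalanobis distance given the inverse covariance matrix VI."""
    d = u - v
    return float(np.sqrt(d @ VI @ d))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[:, 1] *= 10.0                       # second feature on a much larger scale

VI = np.linalg.inv(np.cov(X, rowvar=False))
d_m = mahalanobis(X[0], X[1], VI)

# rescale the second feature: Euclidean distances change, Mahalanobis does not,
# because the sample covariance transforms along with the data
Y = X.copy()
Y[:, 1] /= 10.0
VI_y = np.linalg.inv(np.cov(Y, rowvar=False))
d_m_scaled = mahalanobis(Y[0], Y[1], VI_y)
print(abs(d_m - d_m_scaled) < 1e-6)   # True: scale-invariant
```

This invariance under invertible linear transformations is exactly why the excerpt calls the Mahalanobis distance a statistical distance.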
Energy generation from biomass requires a nexus of different sources irrespective of origin. A detailed and scientific understanding of the class to which a biomass resource belongs is therefore essential for energy generation. An intelligent classification of biomass resources based on their properties offers high prospects for analytical, operational and strategic decision-making. This study proposes a k-Nearest Neighbour (k-NN) classification model to classify biomass based on its properties. The study classified a dataset of 214 biomass samples obtained from several articles published in reputable journals. Four different values of k (k=1,2,3,4) were tested with various self-normalizing distance functions, and their results were compared for effectiveness and efficiency in order to determine the optimal model. The k-NN model based on the Mahalanobis distance function achieved its best accuracy at k=3, with Root Mean Squared Error (RMSE), Accuracy, Error, Sensitivity, Specificity, False positive rate, Kappa statistic and Computation time (in seconds) of 1.42, 0.703, 0.297, 0.580, 0.953, 0.047, 0.622 and 4.7 respectively. The authors conclude that a k-NN based classification model is feasible and reliable for biomass classification, and that k-NN can serve as a handy tool for classifying biomass resources irrespective of their sources and origins.
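A minimal sketch of k-NN classification under the Mahalanobis distance, as used in the study above; the toy data and function names are illustrative, not taken from the paper.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k, VI):
    """Classify x by majority vote among its k nearest training points
    under the Mahalanobis distance (VI = inverse covariance of X_train)."""
    diffs = X_train - x
    # squared Mahalanobis distance to every training point
    d2 = np.einsum('ij,jk,ik->i', diffs, VI, diffs)
    nearest = y_train[np.argsort(d2)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

# toy 2-class data; class 1 sits in the upper-right corner
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
VI = np.linalg.inv(np.cov(X, rowvar=False))
print(knn_predict(X, y, np.array([5.5, 5.5]), k=3, VI=VI))  # 1
```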
... They are based on 1. indirect approaches, namely one-against-all, one-against-one, all-against-all and directed acyclic graph SVM, and 2. direct approaches, which attempt to find separating boundaries for all classes in one step [8][9][10]. Numerous articles have built on these basic techniques for multi-class classification [11,12]. Although they are widely used, they have the drawback of handling only one measure at a time, and hence consume more computational power and are more expensive. ...
A feature selection technique is highly preferred prior to data classification to improve prediction performance, especially in high-dimensional spaces. In general, filter techniques can be considered as primary or auxiliary selection mechanisms on account of their simplicity, scalability, and low computational complexity. Nonetheless, a series of simple counterexamples shows that filter techniques can yield less accurate performance, since they disregard dependencies between features. Although a few publications have attempted to reveal the relationships between features via multivariate techniques, these strategies describe the connections among features only with linear methods, and such simple linear combination relationships limit the achievable improvement in performance. In this paper, we use a kernel method for SVM-RFE with an mRMR approach to find inherent nonlinear relationships among features, as well as between features and the target. To show the effectiveness of our technique, we performed several experiments and compared the results of our technique with those of other competitive multivariate feature selectors. In our experiments, we used three classifiers (support vector machine, neural network and averaged perceptron) on two groups of datasets, namely two-class and multi-class datasets (principally focused on the SVM). The experimental results show that the performance of our technique is better than the others, particularly on three hard datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma. Note: the entire implementation was developed with MS Machine Learning Studio.
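As a rough illustration of the mRMR idea mentioned above (relevance to the target minus redundancy with already-selected features), here is a greedy ranking sketch. The absolute Pearson correlation stands in for mutual information purely for simplicity; the paper's actual method combines kernel SVM-RFE with mRMR and is not reproduced here.

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy mRMR-style selection: at each step pick the feature with the
    highest (relevance - redundancy) score, where both terms are measured
    by absolute Pearson correlation (a stand-in for mutual information)."""
    n_feat = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_feat)])
    selected, remaining = [], list(range(n_feat))
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in remaining:
            # average correlation with the features already selected
            red = (np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                            for s in selected]) if selected else 0.0)
            score = relevance[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(1)
y = rng.normal(size=200)
X = np.column_stack([y + 0.1 * rng.normal(size=200),  # f0: relevant
                     y + 0.1 * rng.normal(size=200),  # f1: relevant, redundant with f0
                     rng.normal(size=200)])           # f2: irrelevant
print(mrmr_rank(X, y, 2))
```

The first feature picked is always one of the two relevant ones; the redundancy term then penalizes selecting its near-duplicate next.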
... They are based on 1. indirect approaches, which are one-against-all, one-against-one, all-against-all and directed acyclic graph SVM, and 2. direct approaches, which attempt to find separate boundaries for all classes in one step [16, 17, 18]. Many articles came out based on these basic techniques for multi-class classification [19, 20]. Even though they are widely used, they have the downside of forming only one measure at a time, hence consuming more computational power and being more expensive. ...
... There are two common approaches to designing the hierarchical structure of the BT. The first one applies the average distances between classes [30]. Namely, a larger average-distance value indicates better separability, so the corresponding class should be placed at the top of the hierarchical structure of the BT. ...
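The separability ordering described in this excerpt can be sketched as follows, assuming Euclidean distance; the function name and toy data are illustrative.

```python
import numpy as np

def class_separability(X, y):
    """Average Euclidean distance from each class to all other classes;
    the most separable class (largest value) is split off first at the
    top of the binary tree."""
    labels = np.unique(y)
    avg = {}
    for c in labels:
        own = X[y == c]
        others = X[y != c]
        # mean pairwise distance between this class and the rest
        d = np.linalg.norm(own[:, None, :] - others[None, :, :], axis=2)
        avg[int(c)] = float(d.mean())
    return sorted(avg, key=avg.get, reverse=True)

X = np.array([[0., 0.], [0., 1.],       # class 0
              [1., 0.], [1., 1.],       # class 1, close to class 0
              [10., 10.], [10., 11.]])  # class 2, far from both
y = np.array([0, 0, 1, 1, 2, 2])
print(class_separability(X, y))  # [2, 0, 1]
```

Class 2 has the largest average distance to the rest, so under this rule it would be separated out at the root node of the tree.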
... The most widespread architecture consists of embedding an SVM model in each internal node of a decision tree (Figure 5.15). The goal of this architecture is to take advantage of the accuracy of SVMs and the simplicity of decision trees [Madzarov and Gjorgjevikj, 2010], an SVM on one-dimensional data being simpler. As with SVMs, a possible application of decision trees to our sound classification problem could be made in two ways. ...
In many countries around the world, the number of elderly people living alone has been increasing. In the last few years, a significant number of research projects on monitoring elderly people have been launched. Most of them make use of several modalities such as video streams, sound and fall detection in order to monitor the activities of an elderly person, to supply them with a natural way to communicate with their "smart home", and to render assistance in case of an emergency. This work is part of the Industrial Research ANR VERSO project Sweet-Home. The goals of the project are to propose a domotic system that enables natural interaction (using touch and voice commands) between an elderly person and their house, and to provide a higher level of safety through the detection of distress situations. Thus, the goal of this work is to come up with solutions for recognizing the sounds of daily life in a realistic context. Sound recognition runs prior to an Automatic Speech Recognition system, so the speech recognizer's performance relies on the reliability of the speech/non-speech separation. Furthermore, good recognition of a few kinds of sounds, complemented by other sources of information (presence detection, fall detection, etc.), could allow for better monitoring of the person's activities and thus better detection of dangerous situations. We first investigated methods from the Speaker Recognition and Verification field. As part of this, we experimented with methods based on GMMs and SVMs. In particular, we tested a sequence-discriminant SVM kernel called SVM-GSL (SVM GMM Super Vector Linear Kernel). SVM-GSL is a combination of GMM and SVM whose basic idea is to map a sequence of vectors of arbitrary length into one high-dimensional vector, called a Super Vector, which is used as the input of an SVM.
Experiments were carried out using a locally created sound database (containing 18 sound classes and over 1000 recordings), and then using the Sweet-Home project's corpus. Our daily-sound recognition system was integrated into a more complete system that also performs multi-channel sound detection and speech recognition. These first experiments were all performed using one kind of acoustical coefficients, the MFCC coefficients. Thereafter, we focused on the study of other families of acoustical coefficients. The aim of this study was to assess the usability of other acoustical coefficients for environmental sound recognition. Our motivation was to find a few representations that are simpler and/or more effective than the MFCC coefficients. Using 15 different families of acoustical coefficients, we also experimented with two approaches to map a sequence of vectors into one vector usable with a linear SVM. The first approach consists of computing a fixed number of statistical coefficients and using them instead of the whole sequence. The second one, which is one of the novel contributions of this work, makes use of a discretization method to find, for each feature within an acoustical vector, the best cut points that associate a given class with one or more intervals of values. The likelihood of the sequence is estimated for each interval, and the obtained likelihood values are used to build one single vector that replaces the sequence of acoustical vectors. The obtained results show that a few families of coefficients are actually more appropriate for the recognition of some sound classes. For most sound classes, we noticed that the best recognition performances were obtained with one or more families other than MFCC. Moreover, a number of these families are less complex than MFCC: they are in fact one-feature-per-frame families, whereas MFCC vectors contain 16 features per frame.
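The first sequence-to-vector approach described above (replacing a variable-length frame sequence with a fixed set of statistics) might look like the following sketch; the particular statistics chosen here are an assumption, not the thesis's exact set.

```python
import numpy as np

def sequence_to_stats(frames):
    """Map a variable-length sequence of acoustic frames (n_frames x n_feat)
    to one fixed-length vector of per-feature statistics, usable with a
    linear SVM: mean, standard deviation, min and max of each feature."""
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0),
                           frames.min(axis=0), frames.max(axis=0)])

# two sequences of different lengths map to vectors of the same size
short = np.array([[1., 2.], [3., 4.]])
long_ = np.array([[1., 2.], [3., 4.]] * 3)
v1, v2 = sequence_to_stats(short), sequence_to_stats(long_)
print(v1.shape, np.allclose(v1, v2))  # (8,) True
```

Whatever the sequence length, the output dimension is fixed (here 4 statistics x 2 features), which is what makes a plain linear SVM applicable.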
(Alternative download link for the AI, Machine Learning & Deep Learning textbook: ) This book is an exposition of the four basic problem-solving techniques in Artificial Intelligence, Machine Learning and Deep Learning (AI, ML & DL): Searching, Reasoning, Planning and Learning. Each technique comprises many methods that can be used to solve particular cases; the choice of method must therefore be matched to the problem to be solved. In the accompanying course, several assignments will be given in the form of presentations, illustrations and case studies that implement a number of these techniques and methods, to ease understanding and to illustrate how to choose the right technique and method for general problem solving. Imam Cholissodin, Lecturer of the Data Science Stream course, FILKOM UB
Supervised learning, or classification, is a cornerstone of Data Mining. A well-known, simple, and effective algorithm for supervised classification is k-Nearest Neighbor (k-NN). The distance measure provides significant support in the classification process, and its correct choice is among the most influential decisions in the technique. The choice of k in the k-Nearest Neighbor algorithm likewise plays an important role in the accuracy of the classifier. The aim of this paper is to analyze the combined effect of various distance measures and different values of k in the k-Nearest Neighbor algorithm on different data sets taken from the UCI machine learning repository.
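A leave-one-out comparison of distance measures and k values, in the spirit of the analysis above, could be sketched as follows; the Minkowski family (p=1 Manhattan, p=2 Euclidean) and the synthetic data are illustrative choices.

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 Euclidean."""
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

def loo_accuracy(X, y, k, p):
    """Leave-one-out accuracy of k-NN under the given Minkowski metric."""
    hits = 0
    for i in range(len(X)):
        # distance to every other point; exclude the held-out point itself
        d = np.array([minkowski(X[i], X[j], p) if j != i else np.inf
                      for j in range(len(X))])
        votes = y[np.argsort(d)[:k]]
        vals, counts = np.unique(votes, return_counts=True)
        hits += vals[np.argmax(counts)] == y[i]
    return hits / len(X)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
for k in (1, 3, 5):
    for p in (1, 2):
        print(f"k={k} p={p}: accuracy={loo_accuracy(X, y, k, p):.2f}")
```

On real datasets the metric and k interact with the data's scale and dimensionality, which is exactly the combined effect the paper analyzes.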
In this paper, we compare the performance of classification techniques for multiclass support vector machines in an unstructured environment. In particular, we consider the following methods: one-against-all, one-against-one, decision directed acyclic graph, and adaptive directed acyclic graph. The performance is compared in terms of classification accuracy, training time, and evaluation time. An audio surveillance application is examined under different noise conditions and varying signal-to-noise ratios, with mel-frequency cepstral coefficients and other commonly used time- and frequency-domain features. The results show that while there is not much difference in classification accuracy among the four approaches under clean and low-noise conditions, the one-against-all method gave relatively better classification accuracy in high-noise conditions when trained with clean samples only. However, the results were much more even with multi-conditional training. Also, the training time of the one-against-all approach increased significantly as the training data increased fourfold, while the one-against-one approach showed a significantly higher evaluation time.
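One practical difference between the compared schemes is how many binary SVMs each one trains, which underlies the training- and evaluation-time observations above. A small sketch (the function name is illustrative):

```python
def n_binary_problems(n_classes, scheme):
    """Number of binary SVMs each multiclass scheme trains:
    one-against-all fits one SVM per class; one-against-one (and the
    DAG-based variants) fit one per unordered pair of classes."""
    if scheme == "one-against-all":
        return n_classes
    if scheme in ("one-against-one", "dagsvm"):
        return n_classes * (n_classes - 1) // 2
    raise ValueError(scheme)

for c in (3, 10, 50):
    print(c, n_binary_problems(c, "one-against-all"),
          n_binary_problems(c, "one-against-one"))
# 3 3 3
# 10 10 45
# 50 50 1225
```

One-against-one trains many more machines, but each sees only two classes' worth of data, so per-SVM training is cheaper; at evaluation time, however, all pairwise machines must be consulted, which matches the higher evaluation time reported above.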
This paper presents a new rolling bearing fault diagnosis method based on local mean decomposition (LMD), improved multiscale fuzzy entropy (IMFE), the Laplacian score (LS) and an improved support vector machine based binary tree (ISVM-BT). When a fault occurs in a rolling bearing, the measured vibration signal is a multi-component amplitude-modulated and frequency-modulated (AM-FM) signal. LMD, a new self-adaptive time-frequency analysis method, can decompose any complicated signal into a series of product functions (PFs), each of which is exactly a mono-component AM-FM signal; hence, LMD is introduced to preprocess the vibration signal. Furthermore, IMFE, which is designed to avoid the inaccurate estimation of fuzzy entropy, is utilized to quantify the complexity and self-similarity of a time series over a range of scales based on fuzzy entropy. In addition, the LS approach is introduced to refine the fault features by sorting the scale factors. Subsequently, the obtained features are fed into the multi-fault classifier ISVM-BT to automatically carry out the fault pattern identification. The experimental results validate the effectiveness of the methodology and demonstrate that the proposed algorithm can be applied to recognize different categories and severities of rolling bearing faults.
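The multiscale part of a method like IMFE rests on coarse-graining the time series before computing an entropy at each scale. A minimal sketch of that coarse-graining step only (the fuzzy entropy computation itself, and the paper's improvements to it, are omitted):

```python
import numpy as np

def coarse_grain(x, scale):
    """Coarse-grain a time series for multiscale entropy analysis:
    average consecutive non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

x = np.arange(12, dtype=float)  # 0, 1, ..., 11
print(coarse_grain(x, 3))       # [ 1.  4.  7. 10.]
```

An entropy value computed on each coarse-grained series, one per scale factor, yields the multiscale feature vector that the LS step then sorts and refines.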
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?