Conference Paper

A first approach for cost-sensitive classification with linguistic Genetic Fuzzy Systems in imbalanced data-sets

Dept. of Comput. Sci. & A.I., Univ. of Granada, Granada, Spain
DOI: 10.1109/ISDA.2010.5687187 Conference: Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
Source: IEEE Xplore

ABSTRACT Classification in imbalanced domains has become one of the most relevant problems in Machine Learning at present. The problem has risen in significance because it appears in many real applications. It occurs when the distribution of the examples available for learning differs greatly between the classes (often in binary-class data-sets). Usually, the underrepresented class is the concept of most interest for the problem, and the cost of misclassifying its examples is much higher than that of the remaining examples. In this work we analyze the behaviour of a cost-sensitive learning method for Fuzzy Rule Based Classification Systems on highly imbalanced data-sets. Specifically, we focus on one representative rule learning approach for Genetic Fuzzy Systems, the Fuzzy Hybrid Genetics-Based Machine Learning algorithm. The experimental results show that, in this type of domain, our cost-sensitive approach obtains very accurate solutions in shorter training times and with lower complexity than other approaches proposed for imbalanced classification, such as preprocessing to rebalance the class distribution.
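To make the cost-sensitive idea in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. The abstract does not state how misclassification costs are assigned, so the scheme used here (each class's error cost equals the majority count divided by that class's count) is purely an illustrative assumption, and the function names `imbalance_ratio`, `misclassification_costs`, and `total_cost` are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def misclassification_costs(labels):
    """Assumed cost scheme (not taken from the paper): an error on a class
    costs (majority count / that class's count), so minority errors cost more."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority / n for cls, n in counts.items()}


def total_cost(y_true, y_pred, costs):
    """Sum the cost of the true class over every misclassified example."""
    return sum(costs[t] for t, p in zip(y_true, y_pred) if t != p)


# Toy data: nine majority ("neg") examples and one minority ("pos") example.
y_true = ["neg"] * 9 + ["pos"]
always_neg = ["neg"] * 10                       # misses the minority example
one_false_alarm = ["pos"] + ["neg"] * 8 + ["pos"]  # one majority error instead

costs = misclassification_costs(y_true)
print(imbalance_ratio(y_true))
print(total_cost(y_true, always_neg, costs))
print(total_cost(y_true, one_false_alarm, costs))
```

Both toy predictors make exactly one error in ten, so plain accuracy cannot separate them, while the cost-weighted total penalizes the one that misses the minority example.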

  • Source
    ABSTRACT: There are several aspects that might influence the performance achieved by existing learning systems. It has been reported that one of these aspects is related to class imbalance in which examples in training data belonging to one class heavily outnumber the examples in the other class. In this situation, which is found in real world data describing an infrequent but important event, the learning system may have difficulties to learn the concept related to the minority class. In this work we perform a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets. Our experiments provide evidence that class imbalance does not systematically hinder the performance of learning systems. In fact, the problem seems to be related to learning with too few minority class examples in the presence of other complicating factors, such as class overlapping. Two of our proposed methods, Smote + Tomek and Smote + ENN, deal with these conditions directly, allying a known over-sampling method with data cleaning methods in order to produce better-defined class clusters. Our comparative experiments show that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC). This result seems to contradict results previously published in the literature. Smote + Tomek and Smote + ENN presented very good results for data sets with a small number of positive examples. Moreover, Random over-sampling, a very simple over-sampling method, is very competitive with more complex over-sampling methods. Since the over-sampling methods provided very good performance results, we also measured the syntactic complexity of decision trees induc...
    ACM SIGKDD Explorations Newsletter 06/2004; 6:20-29.
  • Source
    ABSTRACT: In classification problems, we often encounter classes with very different percentages of patterns: some classes with a high pattern percentage and others with a low one. These problems are known as “classification problems with imbalanced data-sets”. In this paper we study the behaviour of fuzzy rule based classification systems in the framework of imbalanced data-sets, focusing on the synergy between instance preprocessing mechanisms and the configuration of fuzzy rule based classification systems. We analyse the necessity of applying a preprocessing step to deal with the problem of imbalanced data-sets. Regarding the components of the fuzzy rule based classification system, we are interested in the granularity of the fuzzy partitions, the use of distinct conjunction operators, the application of some approaches to compute the rule weights and the use of different fuzzy reasoning methods.
    Fuzzy Sets and Systems 09/2008; · 1.88 Impact Factor
  • Source
    ABSTRACT: Intestinal motility assessment with video capsule endoscopy arises as a novel and challenging clinical fieldwork. This technique is based on the analysis of the patterns of intestinal contractions shown in a video provided by an ingestible capsule with a wireless micro-camera. The manual labeling of all the motility events requires a large amount of time for offline screening in search of findings with low prevalence, which currently makes this procedure impractical. In this paper, we propose a machine learning system to automatically detect the phasic intestinal contractions in video capsule endoscopy, turning a useful but infeasible clinical routine into a feasible clinical procedure. Our proposal is based on a sequential design which involves the analysis of textural, color, and blob features together with SVM classifiers. Our approach tackles the reduction of the imbalance rate of data and allows the inclusion of domain knowledge as new stages in the cascade. We present a detailed analysis, both quantitative and qualitative, by providing several measures of performance and an assessment study of interobserver variability. Our system achieves 70% sensitivity for individual detection, whilst obtaining patterns equivalent to those of the experts for density of contractions.
    IEEE Transactions on Medical Imaging 06/2009; 29(2):246-59.
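The first related abstract above names Random over-sampling as a very simple rebalancing method, and the main abstract contrasts cost-sensitive learning with this kind of preprocessing. A minimal Python sketch of the general idea follows. It is an illustration under stated assumptions, not the implementation evaluated in any of the listed papers: the function name `random_oversample`, the `seed` parameter, and the choice to balance every class up to the majority count are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one (assumed balancing target)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Original examples of this class, used as the pool to draw copies from.
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data: four majority examples (label 0) and one minority example (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because the copies are exact duplicates, this adds no new information about the minority class; it only changes how much weight that class carries during training, which is also why it increases training-set size relative to a cost-sensitive approach.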
