Conference Paper

uRule: A Rule-Based Classification System for Uncertain Data.

DOI: 10.1109/ICDMW.2010.73
Conference: ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 14 December 2010
Source: DBLP

ABSTRACT Data uncertainty is common in real-world applications. Various reasons lead to data uncertainty, including imprecise measurements, network latency, outdated sources and sampling errors. These kinds of uncertainties have to be handled cautiously, or else the data mining results could be unreliable or wrong. In this demo, we will show uRule, a new rule-based classification and prediction system for uncertain data. This system uses new measures for generating, pruning and optimizing classification rules. These new measures are computed considering uncertain data intervals and probability distribution functions. Based on the new measures, the optimal splitting attributes and splitting values can be identified and used in classification rules. uRule can process uncertainty in both numerical and categorical data. It has satisfactory classification performance even when data is highly uncertain.
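To make the idea of choosing a splitting value from uncertain intervals concrete, here is a minimal sketch. It is not the uRule implementation, and the abstract does not specify the actual measures. The sketch assumes each uncertain numerical value is an interval with a uniform probability distribution, lets each record contribute fractionally to both sides of a candidate split, and scores splits with an ordinary information gain over those fractional counts. All function names (`prob_at_most`, `expected_info_gain`, `best_split`) are hypothetical.

```python
import math


def prob_at_most(interval, split):
    """P(value <= split), assuming the value is uniform on [lo, hi]."""
    lo, hi = interval
    if hi == lo:  # a certain (point) value
        return 1.0 if lo <= split else 0.0
    return min(1.0, max(0.0, (split - lo) / (hi - lo)))


def entropy(counts):
    """Shannon entropy (bits) of a dict of possibly fractional class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def expected_info_gain(records, split):
    """Information gain of a split, using fractional (probabilistic) counts.

    records: list of ((lo, hi), label) pairs.
    """
    left, right, overall = {}, {}, {}
    for interval, label in records:
        p = prob_at_most(interval, split)  # probability mass on the left side
        left[label] = left.get(label, 0.0) + p
        right[label] = right.get(label, 0.0) + (1.0 - p)
        overall[label] = overall.get(label, 0.0) + 1.0
    n = len(records)
    n_left = sum(left.values())
    n_right = sum(right.values())
    return (entropy(overall)
            - (n_left / n) * entropy(left)
            - (n_right / n) * entropy(right))


def best_split(records):
    """Try every interval endpoint as a candidate split; keep the best one."""
    candidates = sorted({e for (lo, hi), _ in records for e in (lo, hi)})
    return max(candidates, key=lambda s: expected_info_gain(records, s))


# Illustrative (made-up) training data: two classes with uncertain readings.
records = [((1.0, 2.0), "a"), ((1.5, 2.5), "a"),
           ((4.0, 5.0), "b"), ((4.5, 5.5), "b")]
split = best_split(records)
```

Using interval endpoints as candidates is a simplification chosen for this sketch; a real system could consider other candidate points or non-uniform distributions.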

  • Source
    ABSTRACT: The paper introduces an annotation scheme for a political debate dataset that consists mainly of video and audio annotations. The annotations contain various kinds of information, ranging from general linguistic to domain-specific information. Some are produced with automatic tools, and some are added manually. One of the goals is to use this information to predict the categories of the speaker's answers to disruptions. A typology of such answers is proposed, and an automatic categorization based on a multimodal parametrization is successfully performed.
  • Source
    ABSTRACT: Data uncertainty is common in real-world applications and can be caused by many factors, such as imprecise measurements, network latency, outdated sources and sampling errors. When mining knowledge from these applications, data uncertainty needs to be handled with caution. Otherwise, unreliable or even wrong mining results would be obtained. In this paper, we propose a rule induction algorithm, called uRule, to learn rules from uncertain data. The key problem in learning rules is to efficiently identify the optimal cut points from training data. For uncertain numerical data, we propose an optimization mechanism which merges adjacent bins that have equal classifying class distribution, and we prove its soundness. For uncertain categorical data, we also propose a new method to select cut points based on possible world semantics. We then present the uRule algorithm in detail. Our experimental results show that the uRule algorithm can generate rules from uncertain numerical data with potentially higher accuracies, and that the proposed optimization method is effective in cut point selection for both certain and uncertain numerical data. Furthermore, uRule has quite stable performance when mining uncertain categorical data.
    Knowledge and Information Systems 01/2011; 29:103-130. · 2.23 Impact Factor
  • Source
    International Journal of Computers. 01/2014; 8:66-75.
