Conference Paper

uRule: A Rule-Based Classification System for Uncertain Data.

DOI: 10.1109/ICDMW.2010.73 Conference: ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 14 December 2010
Source: DBLP


Data uncertainty is common in real-world applications. Various reasons lead to data uncertainty, including imprecise measurements, network latency, outdated sources and sampling errors. These kinds of uncertainties have to be handled cautiously, or else the data mining results could be unreliable or wrong. In this demo, we will show uRule, a new rule-based classification and prediction system for uncertain data. This system uses new measures for generating, pruning and optimizing classification rules. These new measures are computed considering uncertain data intervals and probability distribution functions. Based on the new measures, the optimal splitting attributes and splitting values can be identified and used in classification rules. uRule can process uncertainty in both numerical and categorical data. It has satisfactory classification performance even when data is highly uncertain.
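The abstract does not give the formulas for uRule's measures, so what follows is only an illustrative sketch of the general idea of scoring a split when values are intervals with a probability distribution. It assumes each uncertain numerical value is uniformly distributed over its interval and uses expected class-weighted entropy as the split measure; both choices, and all function names (`prob_below`, `expected_split_entropy`, `best_split`), are assumptions made for illustration and are not taken from the paper.

```python
import math

def prob_below(lo, hi, split):
    """P(x <= split) for x assumed uniform on [lo, hi] (illustrative assumption)."""
    if hi <= lo:  # degenerate interval: treat as a certain value
        return 1.0 if lo <= split else 0.0
    return min(1.0, max(0.0, (split - lo) / (hi - lo)))

def entropy(counts):
    """Shannon entropy (bits) of a dict of fractional class counts."""
    total = sum(counts.values())
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def expected_split_entropy(records, split):
    """Weighted entropy of the two partitions induced by `split`.

    records: non-empty list of ((lo, hi), label). Each record contributes
    probability mass p to the left partition and 1 - p to the right one.
    """
    left, right = {}, {}
    for (lo, hi), label in records:
        p = prob_below(lo, hi, split)
        left[label] = left.get(label, 0.0) + p
        right[label] = right.get(label, 0.0) + (1.0 - p)
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)

def best_split(records):
    """Try every interval end point as a candidate and keep the lowest score."""
    candidates = sorted({e for (lo, hi), _ in records for e in (lo, hi)})
    return min(candidates, key=lambda s: expected_split_entropy(records, s))
```

For example, `best_split([((0, 2), "a"), ((1, 3), "a"), ((4, 6), "b"), ((5, 7), "b")])` picks a threshold between the two groups of intervals, because every record then falls entirely on one side and both partitions are pure.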



Available from: Biao Qin
  • Source
    • "Nevertheless, this system increased the computational cost due to the reproduction of rules. Moreover, it is questionable if the system can produce stable rules from small data. Additionally, in [50], a new rule based algorithm called uRule was developed to handle uncertain continuous attributes. It is built based on REP-based family and used new heuristics to optimize and prune resulting rules, identify the optimal thresholds, and handle uncertain values. In [51], the empirical result that tested this algorithm was discussed to find that it can handle uncertainty in continuous and discrete attributes. However, it was found that it consumes a lot of time because of the complexity of rule pruning step."

  • Source
    • "Uncertain data calls for new processing approaches where uncertainty is explicitly accounted for, and it has led to a solid body of work on building probabilistic databases, such as MystiQ, Trio, and MayBMS. Albeit at a smaller scale, there is effort to adapt well-known data mining tasks to uncertain data, e.g., in discovery of frequent patterns and association rules [14], clustering [5], and classification [10]. However, to the best of our knowledge, prior work only considers limited probabilistic data models based on a simplifying independence assumption and circumvents the hardness of probability computation by the use of expected values and Monte-Carlo sampling."
    ABSTRACT: DAGger is a clustering algorithm for uncertain data. In contrast to prior work, DAGger can work on arbitrarily correlated data and can compute both exact and approximate clusterings with error guarantees. We demonstrate DAGger using a real-world scenario in which partial discharge data from UK Power Networks is clustered to predict asset failure in the energy network.
  • Source
    ABSTRACT: The paper introduces an annotation scheme for a political debate dataset which is mainly in the form of video, and audio annotations. The annotation contains various information ranging from general linguistic to domain specific information. Some are annotated with automatic tools, and some are manually annotated. One of the goals is to use the information to predict the categories of the answers by the speaker to the disruptions. A typology of such answers is proposed and an automatic categorization system based on a multimodal parametrization is successfully performed.