Comparing pattern discovery and back-propagation classifiers
ABSTRACT: The pattern discovery (PD) algorithm of Wang and Wong was applied as a classifier to several continuous-valued data sets. These data sets were generated to explore performance across a selection of interesting linearly and non-linearly separable class distributions. The performance of several configurations of PD and back-propagation (BP) neural network classifiers, and of a minimum inter-class distance (MICD) classifier, was quantified and compared. The best performance of the PD and BP classifiers was similar for all class distributions studied and, for linearly separable class distributions, close to the optimal MICD performance. The performance of both PD and BP classifiers depended on the classifier configuration. PD classifier performance depended on the number of intervals used to quantize the continuous data in a predictable, class-distribution-independent way. BP performance depended on the number of hidden nodes in a class-distribution-dependent way that was difficult to determine a priori. Because the patterns used and the decisions made by PD classifiers are transparent and statistically valid, these classifiers are highly suitable for problems in which the rationale for, and confidence in, each classification are required, so that multiple classifications can be combined effectively to support decisions in a broader context such as medical diagnosis. The strong absolute and relative performance of PD classifiers, and the relative simplicity of applying them to continuous-valued data, suggest that they can be used effectively in decision support systems whose underlying data are continuous- or discrete-valued.
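The abstract reports that PD performance depends on the number of intervals used to quantize continuous data, but it does not say how those intervals were chosen. The following is a minimal sketch, assuming equal-frequency (quantile) intervals computed independently for each feature; the function names `equal_frequency_cuts`, `quantize` and `quantize_dataset` are hypothetical and are not taken from the paper. The resulting discrete interval labels are the kind of discrete-valued input that pattern discovery operates on.

```python
import bisect
import statistics


def equal_frequency_cuts(values, n_intervals):
    """Interior cut points that split `values` into `n_intervals` bins of roughly equal count.

    Assumption: equal-frequency binning; the paper may use a different scheme.
    """
    # statistics.quantiles(..., n=k) returns the k - 1 cut points between k equal-probability groups.
    return statistics.quantiles(values, n=n_intervals, method="inclusive")


def quantize(value, cuts):
    """Map one continuous value to an interval label in 0..len(cuts)."""
    # bisect_right counts how many cut points are <= value, i.e. the index of its interval.
    return bisect.bisect_right(cuts, value)


def quantize_dataset(samples, n_intervals):
    """Quantize every feature (column) of `samples` independently.

    Returns the per-sample tuples of interval labels and the cut points per feature.
    """
    n_features = len(samples[0])
    cuts = [equal_frequency_cuts([s[f] for s in samples], n_intervals)
            for f in range(n_features)]
    labels = [tuple(quantize(s[f], cuts[f]) for f in range(n_features))
              for s in samples]
    return labels, cuts
```

Raising `n_intervals` gives finer intervals with fewer samples each, which is the configuration parameter the abstract identifies as driving PD classifier performance.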
ABSTRACT: Uncovering qualitative and quantitative patterns in a data set is a challenging task for research in machine learning and data analysis. Because real-world data are complex, high-order (polythetic) patterns or event associations must be acquired in addition to first-order class-dependent relationships. Once patterns of different orders are found, they should be represented in a form appropriate for further analysis and interpretation. The authors propose a novel method to discover qualitative and quantitative patterns (or event associations) inherent in a data set. It uses adjusted residual analysis from statistics to test whether the observed occurrence of a pattern candidate deviates significantly from its expected occurrence. To avoid an exhaustive search of all possible combinations of primary events, techniques for eliminating impossible pattern candidates are developed. The detected patterns of different orders are then represented in an attributed hypergraph, which makes them easy to interpret and analyze. Test results on artificial and real-world data are discussed toward the end of the paper.
IEEE Transactions on Knowledge and Data Engineering 01/1997; 9:877-893.
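The adjusted residual test described in the abstract above can be illustrated for the simplest case: a second-order event formed by combining two primary events. The sketch below uses Haberman's adjusted residual for a single cell of a two-way contingency table, assuming samples are tuples of discrete values; the helper names `adjusted_residual` and `is_significant` are hypothetical, and the authors' formulation for higher-order events is not reproduced here.

```python
import math


def adjusted_residual(samples, i, a, j, b):
    """Haberman adjusted residual for the second-order event (x_i = a, x_j = b).

    `samples` is a list of tuples of discrete values; i and j are attribute
    indices, and a and b are the primary-event values being combined.
    """
    n = len(samples)
    n_a = sum(1 for s in samples if s[i] == a)   # count of primary event x_i = a
    n_b = sum(1 for s in samples if s[j] == b)   # count of primary event x_j = b
    observed = sum(1 for s in samples if s[i] == a and s[j] == b)
    expected = n_a * n_b / n                     # expected count if the two events are independent
    variance = (1 - n_a / n) * (1 - n_b / n)     # variance factor of the standardized residual
    if expected == 0 or variance == 0:
        return 0.0                               # degenerate case: no evidence either way
    return (observed - expected) / math.sqrt(expected * variance)


def is_significant(residual, z_critical=1.96):
    """Two-sided test at roughly the 95% level under a normal approximation."""
    return abs(residual) > z_critical
```

In a classifier, `j` would typically index the class label, so a significant positive residual marks a quantized feature value that occurs with that class more often than independence would predict, and a significant negative residual marks one that occurs less often.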