Micheline Kamber's research while affiliated with Simon Fraser University and other places

Publications (51)

Chapter
This chapter discusses the advanced methods of frequent pattern mining, which mines more complex forms of frequent patterns and considers user preferences or constraints to speed up the mining process. Frequent pattern mining has reached far beyond the basics due to substantial research, numerous extensions of the problem scope, and broad applicati...
Chapter
This chapter introduces the basic concepts of frequent patterns, associations, and correlations and studies how they can be mined efficiently. How to judge whether the patterns found are interesting is also discussed. Frequent patterns are patterns (e.g., itemsets, subsequences, or substructures) that appear frequently in a data set. For example, a...
Chapter
This chapter presents a high-level overview of mining complex data types, which includes mining sequence data such as time series, symbolic sequences, and biological sequences; mining graphs and networks; and mining other kinds of data, including spatiotemporal and cyber-physical system data, multimedia, text and Web data, and data streams. Trends...
Chapter
This chapter presents the basic concepts and methods of cluster analysis. The requirements of clustering methods for massive amounts of data and various applications are studied. Several basic clustering techniques are discussed, organized into the following categories: partitioning methods, hierarchical methods, density-based meth...
Chapter
This chapter presents an overview of data warehouse and online analytical processing (OLAP) technology. This overview is essential for understanding the overall data mining and knowledge discovery process. Data warehouses generalize and consolidate data in multidimensional space. The construction of data warehouses involves data cleaning, data inte...
Chapter
This chapter discusses the advanced techniques for data classification starting with Bayesian belief networks, which do not assume class conditional independence. Bayesian belief networks allow class conditional independencies to be defined between subsets of variables. They provide a graphical model of causal relationships, on which learning can b...
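Because a belief network makes each variable depend directly only on its parents, the joint probability factorizes as P(x_1, ..., x_n) = ∏ P(x_i | Parents(x_i)). The following is a minimal sketch of that product for a tiny network with Boolean variables; the network structure, variable names, and CPT numbers are hypothetical and chosen for illustration only.

```python
# Hypothetical network: FamilyHistory -> LungCancer <- Smoker
# parents[var] lists the parents of var, in the order used as CPT keys.
parents = {
    "FamilyHistory": (),
    "Smoker": (),
    "LungCancer": ("FamilyHistory", "Smoker"),
}

# Hypothetical CPTs: P(var = True | parent values). Values are illustrative.
cpt = {
    "FamilyHistory": {(): 0.1},
    "Smoker": {(): 0.3},
    "LungCancer": {
        (True, True): 0.8,
        (True, False): 0.5,
        (False, True): 0.7,
        (False, False): 0.1,
    },
}


def joint_probability(assignment):
    """Multiply P(x_i | Parents(x_i)) over all variables for a full Boolean assignment."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


# 0.1 * 0.3 * 0.8, i.e. about 0.024
print(joint_probability({"FamilyHistory": True, "Smoker": True, "LungCancer": True}))
```

Summing `joint_probability` over all 2^3 assignments gives 1, which is a quick sanity check that the CPTs are consistent.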
Chapter
This chapter introduces the basic concepts of data preprocessing. The methods for data preprocessing are organized into the following categories: data cleaning, data integration, data reduction, and data transformation. Data have quality if they satisfy the requirements of the intended use. There are many factors comprising data quality, includi...
Chapter
This chapter is about getting familiar with the data. Knowledge about the data is useful for data preprocessing, the first major task of the data mining process. The various attribute types are studied. These include nominal attributes, binary attributes, ordinal attributes, and numeric attributes. Basic statistical descriptions can be used to lear...
Chapter
Classification is a form of data analysis that extracts models describing important data classes. Such models, called classifiers, predict categorical (discrete, unordered) class labels. Such analysis can help provide users with a better understanding of the data at large. Classification and numeric prediction are the two major types of prediction...
Chapter
This chapter aims to study outlier detection techniques. The different types of outliers are defined. An overview of outlier detection methods is also presented. Assume that a given statistical process is used to generate a set of data objects. An outlier is a data object that deviates significantly from the rest of the objects, as if it were gener...
Chapter
This chapter focuses on data cube technology. Data warehouse systems provide online analytical processing (OLAP) tools for interactive analysis of multidimensional data at varied granularity levels. OLAP tools typically use the data cube and a multidimensional data model to provide flexible access to summarized data. A data cube can interactively e...
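A data cube summarizes a measure over every subset of the dimensions; each subset is a cuboid, so n dimensions give 2^n cuboids, from the base cuboid (all dimensions) down to the apex cuboid (no dimensions, the grand total). The sketch below computes every cuboid by brute force with SUM as the measure; the toy fact table and its column names are hypothetical, and this is an illustration of the concept rather than any of the efficient cube-computation methods the chapter covers.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dims, measure):
    """Return {cuboid_dims: {group_key: SUM(measure)}} for all 2^n cuboids."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):
            agg = defaultdict(float)
            for row in rows:
                # Group-by key for this cuboid; () for the apex cuboid.
                key = tuple(row[d] for d in cuboid)
                agg[key] += row[measure]
            cube[cuboid] = dict(agg)
    return cube


# Hypothetical fact table
sales = [
    {"city": "Vancouver", "item": "TV", "year": 2011, "amount": 400.0},
    {"city": "Vancouver", "item": "PC", "year": 2011, "amount": 900.0},
    {"city": "Toronto", "item": "TV", "year": 2012, "amount": 350.0},
]
cube = compute_cube(sales, ("city", "item", "year"), "amount")
print(len(cube))         # 8 cuboids for 3 dimensions
print(cube[()])          # apex cuboid: {(): 1650.0}
print(cube[("city",)])   # {('Vancouver',): 1300.0, ('Toronto',): 350.0}
```

Rolling up from ("city", "item") to ("city",) corresponds to reading a coarser cuboid from the same dictionary.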
Chapter
This chapter discusses the advanced topics of cluster analysis. In conventional cluster analysis, an object is assigned to one cluster exclusively. However, in some applications, there is a need to assign an object to one or more clusters in a fuzzy or probabilistic way. Fuzzy clustering and probabilistic model-based clustering allow an object to b...
Book
This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market-leading edition. This was the first (and is still the best and most popular) of its kind. Combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like...
Book
Data Mining: Concepts and Techniques, 2nd edition
Article
Metarule-guided mining is an interactive approach to data mining, where users probe the data under analysis by specifying hypotheses in the form of metarules, or pattern templates. Previous methods for metarule-guided mining of association rules have primarily used a transaction/relation table-based structure. Such approaches require costly, mu...
Article
Rule: Basic Concepts
• Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
• E.g., 98% of people who purchase tires and auto accessories also get automotive services done
• Applications ...
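For a rule A ⇒ B, the support is the fraction of transactions that contain every item of A ∪ B, and the confidence is support(A ∪ B) / support(A); the "98%" in the tires example is a confidence. The following is a minimal sketch of both measures over a hypothetical four-transaction database (the item names are invented for illustration).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A); 0.0 if A never occurs."""
    both = frozenset(antecedent) | frozenset(consequent)
    sup_a = support(antecedent, transactions)
    return support(both, transactions) / sup_a if sup_a else 0.0


# Hypothetical transaction database
transactions = [frozenset(t) for t in (
    {"tires", "accessories", "service"},
    {"tires", "accessories", "service"},
    {"tires", "accessories"},
    {"tires", "oil"},
)]
lhs, rhs = {"tires", "accessories"}, {"service"}
print(support(lhs | rhs, transactions))    # 0.5
print(confidence(lhs, rhs, transactions))  # about 0.667 (2 of the 3 baskets with lhs)
```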
Article
A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data mining functions, including characterization, comparison, association, classification, prediction, and clustering. By incorporating several interesting...
Conference Paper
Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration of efficient induction in large databases and the analysis of data at multiple abstraction levels. The paper addresses the efficiency and scalability i...
Conference Paper
A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data mining functions, including characterization, comparison, association, classification, prediction, and clustering. By incorporating several interesting...
Article
In this paper, we employ a novel approach to metarule-guided, multi-dimensional association rule mining which explores a data cube structure. We propose algorithms for metarule-guided mining: given a metarule containing p predicates, we compare mining on an n-dimensional (n-D) cube structure (where p < n) with mining on smaller multiple p-dimensio...
Conference Paper
In order to re-use existing models of the environment, mobile robots must be able to estimate their position and orientation in such models. Most of the existing methods for position estimation are based on special-purpose sensors or aim at tracking the ...
Article
Human investigators instinctively segment medical images into their anatomical components, drawing upon prior knowledge of anatomy to overcome image artifacts, noise, and lack of tissue contrast. The authors describe: 1) the development and use of a brain tissue probability model for the segmentation of multiple sclerosis (MS) lesions in magnetic r...
Article
Three-dimensional MRI imaging techniques offer new possibilities for qualitative and quantitative studies of gross neuroanatomy, functional neuroanatomy and for neurosurgical planning. The digital nature of the data allows for the reconstruction of realistic three-dimensional models of an individual brain which can be sliced at arbitrary orientatio...
Article
The accuracy of Positron Emission Tomography (PET) for measuring in vivo concentrations of radiolabelled pharmaceuticals is affected by the limited tomograph resolution. Using computer simulations, we developed a model reproducing the distribution of the tracer [18F]fluoroDOPA which is specifically taken up in the normal human striatum. Validation...
Article
A 3D simulation procedure has been developed to generate simulated PET brain images from MRI data. MRI slices were segmented into gray matter, white matter, and CSF structures, and these were assigned radionuclide distributions. Projections through these regions were generated according to physical characteristics of a positron tomograph, including 3D sampli...
Conference Paper
Artificial intelligence techniques of machine learning, pattern recognition, and the use of domain knowledge were employed in the segmentation, or automated detection, of multiple sclerosis (MS) lesions in magnetic resonance images of the human brain. The performances of the statistical minimum distance and Bayesian classif...
Article
This paper describes the development and use of a brain tissue probability model for the segmentation of multiple sclerosis lesions in magnetic resonance (MR) images of the human brain. Based on MR data obtained from a group of healthy volunteers, the model was constructed to provide prior probabilities of grey matter, white matter, ventricular cer...
Article
In this paper, we employ a novel approach to metarule-guided, multi-dimensional association rule mining which explores a data cube structure. We propose algorithms for metarule-guided mining: given a metarule containing p predicates, we compare mining on an n-dimensional (n-D) cube structure (where p < n) with mining on smaller multiple p-dimension...
Article
There are a good number of introductory-level textbooks on data warehousing and OLAP technology, including Kimball and Ross (KR02), Imhoff, Galemmo and Geiger (IGG03), Inmon (Inm96), Berson and Smith (BS97), and Thomsen (Tho97). Chaudhuri and Dayal (CD97) provide a general overview of data warehousing and OLAP technology. A set of research papers o...
Article
Mining complex types of data has been a fast-developing, popular research field, with many research papers and tutorials appearing in conferences and journals on data mining and database systems. This chapter covers a few important themes, including multidimensional analysis and mining of complex data objects, spatial data mining, multimedia data m...

Citations

... • Data cleaning and normalisation using outlier removal (Interquartile Range Method) ( Vinutha et al., 2018 ) and Min-Max scaler method ( Han et al., 2012 ). • Multiple factor analysis to remove multi-dimensionality of the data and to select key variables for the model. ...
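The two cleaning steps named in the snippet can be sketched as follows: the IQR method keeps values inside [Q1 − 1.5·IQR, Q3 + 1.5·IQR], and min-max normalization maps v to (v − min)/(max − min)·(new_max − new_min) + new_min. This is a minimal sketch, not the cited study's pipeline: the readings are hypothetical, the multiplier 1.5 is the conventional choice, and the quartiles come from Python's `statistics.quantiles` with its default method, whereas other tools may interpolate quartiles slightly differently.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Map each v to (v - min) / (max - min) * (new_max - new_min) + new_min."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


raw = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # hypothetical readings
clean = remove_outliers_iqr(raw)   # 102 falls above Q3 + 1.5*IQR and is dropped
scaled = min_max_scale(clean)      # 10 -> 0.0, 15 -> 1.0
print(clean)
print(scaled)
```

Removing outliers before scaling matters here: had 102 stayed in, it would have become the maximum and squeezed every other value into the bottom tenth of [0, 1].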
... In particular, the Talairach and the Montreal Neurological Institute Hospital stereotaxic space emerged as CCFs for human brain tissue. For rodent brains, the Paxinos' Rat brain stereotaxic coordinate system and Waxholm space 11,12 are widely used [13][14][15][16][17] . However, a CCF that works for the brain 18 does not necessarily work for other human organs that might be much larger (large intestine), in/deflated (lung), or highly variable in size (lymph nodes). ...
... Different algorithms were used in this study, but we report the three with the highest accuracy. In this classification, the hadith dataset is divided randomly into ten parts; each part is held out once to test the classifier, which is trained on the remaining nine parts, so that the whole dataset is used for both training and testing. This is known as the cross-validation method [20]. Figure 3 shows the overall accuracy for each classifier. Table 5 and Figure 5 show the Precision, Recall, and F1-measure (F1) for each individual category. ...
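The ten-part procedure described above is k-fold cross-validation with k = 10. The following is a minimal sketch of it; the `fit`/`predict` callables and the majority-class classifier in the usage example are hypothetical stand-ins for the study's real algorithms.

```python
import random
from collections import Counter


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx); each of the k random folds is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Overall accuracy, pooling the predictions made on every held-out fold."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical baseline classifier: always predict the training set's majority label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(20))
y = ["a"] * 15 + ["b"] * 5
print(cross_val_accuracy(X, y, fit_majority, predict_majority))  # 0.75
```

Because every sample lands in exactly one test fold, the pooled accuracy uses each sample once for testing and nine times for training.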
... In other words, the hyperplane should be chosen so that the margin, i.e., the distance between the hyperplane and the closest instances $\vec{x}_i$, is maximized. Each hyperplane can be described as follows (Han et al., 2011): ...
... One reason was that SVM models are capable of finding nonlinear decision boundaries in the input space [24]. Moreover, the SVM algorithm can separate class instances with a hyperplane by using the kernel trick (i.e., implicitly mapping the inputs into a higher-dimensional space). ...
... Clustering is employed to find similar or hidden patterns [25]. In these investigations, different clustering methods were used, such as k-means, spectral, hierarchical, and fuzzy c-means clustering [25][26][27][28]. The aim is to determine how well the ten individual data sets can be identified as a cluster using the individual methods for different averaging times. ...
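Of the methods listed, k-means is the simplest partitioning method: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until the centroids stop changing. The following is a minimal sketch of that loop (Lloyd's algorithm) in plain Python; the 2-D points are hypothetical, and initializing from randomly sampled data points is one common choice among several.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: returns (centroids, clusters) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```

`math.dist` requires Python 3.8 or later. The result can depend on the initial centroids, which is why k-means is often restarted with several seeds.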
... The goal of the induction process is to classify the data in the matrix into groups such that the data in each group belong to the same class. This paper uses the ID3 decision tree algorithm, which uses information gain as its attribute selection measure, to classify the data into groups [22]. The ID3 decision tree algorithm takes two basic inputs: the performance data matrix from the virtual training scenarios, and the list of attributes that were varied in each scenario. ...
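ID3's information gain for an attribute A is Gain(A) = Info(D) − Info_A(D), where Info(D) = −Σ p_i log2 p_i is the entropy of the class labels and Info_A(D) is the size-weighted entropy of the partitions induced by A's values; at each node ID3 splits on the attribute with the highest gain. The following is a minimal sketch of that selection step only (not a full tree builder); the attribute names and rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Gain(A) = Info(D) - sum(|D_j|/|D| * Info(D_j)) over the values of attribute A."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    expected = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - expected


def best_attribute(rows, labels, attributes):
    """The attribute ID3 would split on: the one with maximal information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical scenario data: two varied attributes and an outcome label
rows = [
    {"speed": "fast", "light": "day"},
    {"speed": "fast", "light": "night"},
    {"speed": "slow", "light": "day"},
    {"speed": "slow", "light": "night"},
]
labels = ["fail", "fail", "pass", "pass"]
print(information_gain(rows, labels, "speed"))          # 1.0: speed separates the classes
print(information_gain(rows, labels, "light"))          # 0.0: light tells us nothing
print(best_attribute(rows, labels, ["speed", "light"]))  # speed
```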
... The detection of outliers is one of its many applications. Given a vector of samples y and its mean, the MD of the samples is defined as [19]: ...
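The defining equation from [19] is elided in this excerpt. Assuming MD denotes the Mahalanobis distance, its standard form is MD(y) = sqrt((y − μ)ᵀ S⁻¹ (y − μ)), where μ is the sample mean and S is the sample covariance. Points with a large MD are outlier candidates. The following is a minimal dependency-free sketch under that assumption; the 2-D samples are hypothetical.

```python
import math


def mean_vector(samples):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance(samples, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(samples), len(mu)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting; assumes m is non-singular."""
    d = len(m)
    aug = [list(row) + [float(i == j) for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(y, mu, cov_inv):
    """sqrt((y - mu)^T S^-1 (y - mu))."""
    diff = [a - b for a, b in zip(y, mu)]
    d = len(diff)
    return math.sqrt(sum(diff[i] * cov_inv[i][j] * diff[j]
                         for i in range(d) for j in range(d)))


# Hypothetical samples: spread along the second axis is twice that along the first
samples = [(1, 0), (-1, 0), (0, 2), (0, -2)]
mu = mean_vector(samples)
cov_inv = invert(covariance(samples, mu))
print(mahalanobis((1, 0), mu, cov_inv))  # same MD as (0, 2) although the
print(mahalanobis((0, 2), mu, cov_inv))  # Euclidean distances are 1 and 2
```

Unlike Euclidean distance, the MD rescales each direction by the data's own spread, which is what makes it useful for flagging outliers in correlated or unevenly scaled data.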
... Fig. 3 The comparison strategies in UcVE Matrix design. (a) Using a Bezier curve to depict the petal's outer shape causes value bias; (b) plotting the visualization-unit petals as polylines makes it possible to compare value differences. Spatiotemporal data relate to both space and time (Han et al. 2012). Every spatial location typically carries different information at different times, and this information is usually accompanied by multi-dimensional attributes. ...
... Since the vectors generated through sentence-transformers encapsulate underlying semantics, higher cosine similarity scores have been found to indicate higher convergence of underlying semantics between the represented texts [cf. 6,31,42]. ...
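Cosine similarity compares the direction of two embedding vectors, cos(u, v) = u·v / (‖u‖‖v‖), ranging from −1 to 1, and is independent of their lengths. The following is a minimal plain-Python sketch. The short vectors are hypothetical stand-ins, since real sentence-transformer embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional "embeddings"
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0: same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0: orthogonal
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0: opposite
```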