Chapter

Other Analytical Techniques

Authors:
  • National Institute of Technical Teachers Training and Research, Chandigarh

Abstract

Machine learning techniques such as regression, Naïve Bayes, decision tree, and SVM, discussed in the previous chapters, are mainly used for prediction. However, other analytical techniques also play an important role in data analysis, such as clustering, association rule mining, random forest, principal component analysis, and logistic regression. Clustering is an analytical technique that groups similar objects into two or more groups based on their data attributes and features. Association rule mining is widely used in e-markets to enhance customer purchases by offering suitable products to customers. Random forest is a classification technique that builds a number of decision trees and combines their decisions at the classification stage. In this chapter, these analytical techniques are discussed with examples.
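As a quick illustration of the techniques the chapter names, the sketch below applies clustering, principal component analysis, random forest, and logistic regression to synthetic data. The scikit-learn calls and toy dataset are this summary's own assumptions, not code from the chapter.

```python
# Illustrative sketch only: toy data and scikit-learn estimators chosen to mirror
# the techniques named in the abstract; not code taken from the chapter.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Clustering: group similar objects without using the labels y.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# PCA: project the five features onto two principal components.
X2 = PCA(n_components=2).fit_transform(X)

# Random forest and logistic regression: supervised prediction of the labels.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(rf.score(X_te, y_te), lr.score(X_te, y_te))
```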




... During this same period there was an increase in the application of plant chemistry to reconnaissance-level geochemical mapping (e.g., Dunn et al., 1992a,b, 1994a,b,c, 1996a; Dunn and Balma, 1997; Selinus, 1988), with much of the data obtained from INAA of dry and ashed material, and ICP-ES analysis of ash for those elements for which INAA is not an appropriate method. On the analytical scene, method development included improvements in the analysis of precious metals (Hall et al., 1990a,b, 1991; Hall, 1995). ...
... Partial leaches of vegetation have been shown to provide further insight into the labile phases of elements in plant structures, notably the halogens (Dunn et al., 2006). Details of these developments are given in Hall (1995) and Dunn (2007). ...
Article
Significant refinements of biogeochemical methods applied to mineral exploration have been made over the past quarter century, with advances on all continents. Databases from surveys around the world provide enhanced information on which species and tissues to collect from all major climatic environments, and how, why and when to sample. Recent commercialization of sophisticated new ICP-MS analytical technology and the information emerging from studies involving the Synchrotron are permitting immensely more insight into the multi-element composition of plants. From determinations of precise ultra-trace levels of mineral 'pathfinder' elements in dry tissues, and the recognition of element distribution patterns with respect to concealed mineralization, a steady understanding of relationships is unfurling. Data are now readily available on the biogeochemistry of almost all elements of the Periodic Table. Surveys conducted in undisturbed areas over the past 30 years can now be put into the context of results from subsequent exploration activities, some of which have been developed as mines. Examples are presented from surveys for Au, PGEs, U, and kimberlites by reference to a wealth of previously unpublished data that until recently has been classified as confidential.
... The precision of an analytical method expresses the degree of scatter between a series of measurements obtained from multiple sampling of the same homogeneous sample under prescribed conditions [18]. Intraday precision refers to use of the analytical procedure within a laboratory over a short period of time by the same operator with the same equipment, whereas interday precision involves estimation of the variation in analysis when a method is used within a laboratory on different days, by different analysts [19,20]. Repeatability (intraday) was assessed by analyzing these three different ...
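The scatter described above is usually reported as percent relative standard deviation (%RSD). A minimal sketch, with invented replicate values, is:

```python
# Hedged sketch: percent relative standard deviation, the usual scatter measure
# behind intraday/interday precision; the replicate values below are invented.
import numpy as np

def percent_rsd(replicates):
    """%RSD = 100 * sample standard deviation / mean."""
    x = np.asarray(replicates, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()

intraday = [99.8, 100.2, 99.5, 100.1, 99.9, 100.3]  # same analyst, same day (hypothetical)
interday = [99.6, 100.4, 99.2, 100.8, 99.7, 100.5]  # different days/analysts (hypothetical)
print(f"intraday %RSD = {percent_rsd(intraday):.2f}")
print(f"interday %RSD = {percent_rsd(interday):.2f}")
```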
Article
A rapid, simple, selective and precise UV-Visible spectrophotometric method has been developed for the determination of Curcumin in bulk forms and solid dosage formulations. Spectrophotometric detection was carried out at an absorption maximum of 421 nm using methanol as solvent. The method was validated for specificity, linearity, accuracy, precision, robustness and ruggedness. The detector response for Curcumin was linear over the selected concentration range of 1 to 7 μg/ml with a correlation coefficient of 0.9995. The accuracy was between 99.1% and 101.4%. The precision (R.S.D.) among six sample preparations was 0.39%. The LOD and LOQ are 0.05 and 0.172 μg/ml, respectively. The recovery of Curcumin was about 100.4%. The results demonstrated that the excipients in the commercial tablets did not interfere with the method, which can therefore be conveniently employed for routine quality control analysis of Curcumin in bulk drug, marketed tablets and other formulations.
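A minimal sketch of how such linearity and LOD/LOQ figures are typically derived, assuming the common ICH-style formulas LOD = 3.3·σ/S and LOQ = 10·σ/S (the absorbance readings below are invented; only the concentration range follows the abstract):

```python
# Hedged sketch: calibration line over the 1-7 µg/mL range plus LOD/LOQ from the
# residual standard deviation of the fit. Absorbance values are hypothetical.
import numpy as np

conc = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])                      # µg/mL
absorbance = np.array([0.081, 0.162, 0.240, 0.322, 0.401, 0.483, 0.559])  # invented

slope, intercept = np.polyfit(conc, absorbance, 1)
r = np.corrcoef(conc, absorbance)[0, 1]

residuals = absorbance - (slope * conc + intercept)
sigma = residuals.std(ddof=2)     # residual standard deviation (2 fitted parameters)

lod = 3.3 * sigma / slope
loq = 10.0 * sigma / slope
print(f"r = {r:.4f}, LOD = {lod:.3f} µg/mL, LOQ = {loq:.3f} µg/mL")
```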
Article
A review is given of the empirical and approximate theoretical expressions which have been developed to describe various aspects of impact at hypervelocities, where the projectile and some of the target materials undergo massive plastic deformation, breakup, melting or vaporization. Various stages of the penetration process are identified on the basis of experimental evidence. Empirical fits to experimental data, or, at velocities above the experimental range, to results of finite-difference calculations of the penetration event, are reviewed. In some cases simple theories, usually requiring the evaluation of some undetermined parameters from experimental or numerical data, have been developed. These are described, with emphasis on those which have found use in the design of offensive or defensive systems.
Article
Principal component analysis is one of the most important and powerful methods in chemometrics as well as in a wealth of other areas. This paper provides a description of how to understand, use, and interpret principal component analysis. The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.
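As a concrete reference for the method described above, a minimal PCA sketch via the singular value decomposition of mean-centred data; the toy data and variable names are this summary's assumptions, not material from the paper.

```python
# Hedged sketch: PCA through the SVD of the centred data matrix; scores and
# explained-variance fractions as commonly defined.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # correlated toy variables

Xc = X - X.mean(axis=0)                  # centre each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T                       # projections onto the principal components
explained = s**2 / np.sum(s**2)          # fraction of variance per component
print(np.round(explained, 3))
```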
Chapter
Many medical and epidemiologic studies incorporate an ordinal response variable. In some cases an ordinal response Y represents levels of a standard measurement scale such as severity of pain (none, mild, moderate, severe). In other cases, ordinal responses are constructed by specifying a hierarchy of separate endpoints. For example, clinicians may specify an ordering of the severity of several component events and assign patients to the worst event present from among none, heart attack, disabling stroke, and death. Still another use of ordinal response methods is the application of rank-based methods to continuous responses so as to obtain robust inferences. For example, the proportional odds model described later allows for a continuous Y and is really a generalization of the Wilcoxon-Mann-Whitney rank test.
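For reference, the proportional odds model mentioned above is conventionally written as follows; this is a standard textbook parameterization (sign conventions differ between texts), not a formula quoted from this chapter.

```latex
% Proportional odds (cumulative logit) model for an ordinal response Y with
% ordered categories 1 < 2 < ... < k and covariate vector X:
\operatorname{logit} P(Y \le j \mid X) \;=\; \alpha_j - X\beta, \qquad j = 1, \dots, k-1
% A single slope vector \beta is shared by all k-1 cumulative logits (the
% "proportional odds" assumption); only the intercepts \alpha_j vary with j.
```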
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
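A hedged sketch of the same ingredients, randomized feature selection at each split, out-of-bag error as an internal estimate, and variable importance, using scikit-learn; the dataset and parameter choices are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: a scikit-learn random forest illustrating the ideas in the
# abstract above (OOB error, feature importance); parameters are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=500,      # many trees, each grown on a bootstrap sample
    max_features="sqrt",   # random subset of features considered at each split
    oob_score=True,        # internal (out-of-bag) generalization estimate
    random_state=0,
).fit(X, y)

print("OOB accuracy:", forest.oob_score_)
print("Largest feature importances:", sorted(forest.feature_importances_, reverse=True)[:5])
```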
Article
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
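To make the K-means algorithm discussed above concrete, here is a minimal Lloyd's-algorithm implementation in NumPy; the details (random initialization, fixed iteration count, no empty-cluster handling) are simplifications by this summary.

```python
# Hedged sketch: minimal K-means (Lloyd's algorithm). Deliberately simplified.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # initial centroids
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (no handling of empty clusters in this sketch).
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
labels, centers = kmeans(X, k=3)
print(centers)
```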
Article
This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
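To complement the partitional K-means example above, a short hierarchical (agglomerative) clustering sketch with SciPy, one branch of the taxonomy such reviews describe; the synthetic data and the choice of Ward linkage are assumptions for illustration only.

```python
# Hedged sketch: agglomerative clustering via SciPy's linkage/fcluster on toy data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])

Z = linkage(X, method="ward")                      # build the dendrogram bottom-up
labels = fcluster(Z, t=2, criterion="maxclust")    # cut it into two flat clusters
print(np.bincount(labels))
```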
Article
We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling. We investigate the idea of item reordering, which can improve the low-level efficiency of the algorithm. Second, we present a new way of generating "implication rules," which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurrence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the performance of the system and the form of the results.
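The "implication rules" described above are normalized by both the antecedent and the consequent; conviction is one such measure. The toy sketch below (invented transactions) shows how support, confidence, and conviction are computed for a single rule.

```python
# Hedged sketch: support, confidence, and conviction for the rule {bread} -> {milk}
# on invented market-basket data; names and transactions are illustrative only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def conviction(antecedent, consequent):
    conf = confidence(antecedent, consequent)
    return float("inf") if conf == 1 else (1 - support(consequent)) / (1 - conf)

rule = ({"bread"}, {"milk"})
print(support(rule[0] | rule[1]), confidence(*rule), conviction(*rule))
```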
Fast algorithms for mining association rules
  • R Agrawal
  • R Srikant