Figure - uploaded by Ali A. Al-Subaihi
Source publication
Most multivariate statistical techniques rely on the assumption of multivariate normality. The effects of nonnormality on multivariate tests are assumed to be negligible when variance–covariance matrices and sample sizes are equal. Therefore, in practice, investigators usually do not attempt to assess multivariate normality. In this simulation stud...
Citations
... The Box–Cox transformation was recently considered by Giles and Kipling (2003) for use with microarray data. Some modifications to this method have been proposed (see Sakia (1992) for a review and bibliography, and also Kirisci et al. (2005)); however, the original transformation was used here. In our work, an empirical approach to fitting was adopted in which each candidate transformation was applied to each gene's data (across subjects) and the Kolmogorov–Smirnov (KS) goodness-of-fit (GOF) statistic was used to identify the best-fitting transformation in each case. ...
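The selection step described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the cited authors' procedure: the candidate grid of Box–Cox exponents, the function names, and the choice to compare against a normal distribution with fitted mean and standard deviation are all assumptions made for the example.

```python
import math
from statistics import NormalDist


def box_cox(values, lam):
    """Original Box-Cox power transform; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def ks_statistic_normal(values):
    """KS distance between the empirical CDF and a normal with fitted mean/sd."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0.0:
        raise ValueError("constant data cannot be compared with a normal")
    d = 0.0
    for i, v in enumerate(sorted(values)):
        cdf = 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))
        # Largest gap just after and just before each step of the empirical CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def best_lambda(values, candidates=(-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)):
    """Return the candidate exponent with the smallest KS statistic (assumed grid)."""
    scores = {lam: ks_statistic_normal(box_cox(values, lam)) for lam in candidates}
    return min(scores, key=scores.get), scores


# Hypothetical data for one gene: exact log-normal quantiles, so the
# log transform (lam = 0.0) should give the closest fit to normality.
gene = [math.exp(NormalDist().inv_cdf((i + 0.5) / 40)) for i in range(40)]
lam, scores = best_lambda(gene)
print(lam, round(scores[lam], 4))
```

In a full analysis the same call would be repeated for each gene, giving one selected exponent per gene.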
Data derived from gene expression microarrays are often used for purposes of classification and discovery. Many methods have been proposed for accomplishing these and related aims; however, the statistical properties of such methods are generally not well established. To this end, it is desirable to develop realistic mathematical and statistical models that can be used in a simulation context so that the impacts of data analysis methods and testing approaches can be established. A method is developed in which variation among arrays can be characterized simultaneously for a large number of genes, resulting in a multivariate model of gene expression. The method is based on selecting mathematical transformations of the underlying expression measures such that the transformed variables approximately follow a Gaussian distribution, and then estimating the associated parameters, including correlations. The result is a multivariate normal distribution that serves to model transformed gene expression values within a subject population, while accounting for covariances among genes and/or probes. This model is then used to simulate microarray expression and probe intensity data by employing a modified Cholesky matrix factorization technique that addresses the singularity problem for the "small n, big p" situation. An example is given using prostate cancer data and, as an illustration, it is shown how data normalization can be investigated using this approach.
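To make the simulation step concrete, here is a rough sketch of drawing multivariate normal samples through a Cholesky factor when there are fewer subjects than genes. The abstract does not say how the factorization was modified, so this example swaps in a simple, well-known workaround (adding a small ridge to the diagonal of the sample covariance). The function names, the ridge size, and the toy data are all assumptions.

```python
import math
import random


def sample_covariance(rows):
    """Column means and sample covariance of n rows (subjects) by p columns (genes)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return means, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a; raises if a pivot is not positive."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def ridge_cholesky(cov, ridge=1e-6):
    """Factor cov + eps * I, with eps = ridge * mean(diag(cov)).

    A stand-in for the paper's unspecified modification, not a reproduction of it.
    """
    p = len(cov)
    eps = ridge * sum(cov[i][i] for i in range(p)) / p
    shifted = [[cov[i][j] + (eps if i == j else 0.0) for j in range(p)]
               for i in range(p)]
    return cholesky(shifted)


def simulate(means, L, n_draws, rng):
    """Draw n_draws vectors x = means + L z, with z standard normal."""
    p = len(means)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        draws.append([means[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i in range(p)])
    return draws


# Toy "small n, big p" data: 3 subjects, 5 genes, so the covariance is rank-deficient.
rows = [[5.1, 4.8, 6.0, 5.5, 4.9],
        [5.4, 5.0, 6.3, 5.2, 5.1],
        [4.9, 4.7, 5.8, 5.9, 4.6]]
means, cov = sample_covariance(rows)
L = ridge_cholesky(cov)
draws = simulate(means, L, n_draws=4, rng=random.Random(0))
print(len(draws), len(draws[0]))
```

The ridge keeps every pivot positive, at the cost of slightly inflating each gene's variance. Other remedies, such as an eigenvalue-based factor or a shrinkage covariance estimator, would serve the same purpose.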
The Taguchi design method, which uses orthogonal arrays to study quality characteristics with only a small number of experiments, produces outstanding outcomes when applied to industrial processes. However, nearly all industrial data is concealed via interactive data transformations, for which Box–Cox, arcsine and logit with computer-based management are proposed. The efficiency of each transformation, based on the mean and signal-to-noise ratio, was investigated for different numbers of replicates and noise levels on the response. A total of four simulated scenarios, each with 100 noisy data sets, were used to examine the system performance. The numerical results indicate that Box–Cox and arcsine transformations are superior to logit transformations. Moreover, the analytical outcomes from the interactive ranges of transformation parameters, obtained via the hybridisation of variable neighbourhood search and particle swarm optimisation methods, were a close fit to the natural data. In an actual computer-based application of the interactive data transformation system for a ball swaging process, Box–Cox and arcsine transformations followed the simulated numerical data and provided outcomes much closer to the natural data. The target of a gram load control was then met via process parameters with higher levels of ranking results of contribution ratio, as expected from the natural data.
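For readers unfamiliar with the quantities named in this abstract, the sketch below shows the arcsine and logit transformations for proportion data together with the three textbook Taguchi signal-to-noise ratios. The abstract does not state which signal-to-noise form or which arcsine variant the authors used, so the formulas and function names here are illustrative assumptions.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed variant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


def logit(p):
    """Log-odds transform of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def sn_smaller_the_better(ys):
    """Textbook Taguchi S/N ratio in dB when small responses are desirable."""
    return -10.0 * math.log10(sum(y ** 2 for y in ys) / len(ys))


def sn_larger_the_better(ys):
    """Textbook Taguchi S/N ratio in dB when large responses are desirable."""
    return -10.0 * math.log10(sum(1.0 / y ** 2 for y in ys) / len(ys))


def sn_nominal_the_best(ys):
    """Textbook Taguchi S/N ratio in dB for a target value: 10 log10(mean^2 / variance)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return 10.0 * math.log10(mean ** 2 / var)


# Hypothetical replicates of a response at one orthogonal-array run.
replicates = [9.0, 11.0]
print(sn_nominal_the_best(replicates))
print(arcsine_sqrt(0.25), logit(0.5))
```

Comparing the mean and the signal-to-noise ratio before and after each transformation, run by run, is one plausible way to carry out the kind of efficiency comparison the abstract describes.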