Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)

Umeå University, Umeå, Västerbotten, Sweden
Analytical and Bioanalytical Chemistry (Impact Factor: 3.58). 11/2004; 380(3):419-29. DOI: 10.1007/s00216-004-2783-y
Source: PubMed

ABSTRACT This article describes the applicability of multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares (PLS) projections to latent structures, to the large-volume high-density data structures obtained within genomics, proteomics, and metabonomics. PCA and PLS, and their extensions, derive their usefulness from their ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y. Three examples are used as illustrations: the first example is a genomics data set and involves modeling of microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second example contains NMR-metabonomics data, measured on urine samples of male rats treated with either of the drugs chloroquine or amiodarone. The third and last data set describes sequence-function classification studies in a set of G-protein-coupled receptors using hierarchical PCA.

  • ABSTRACT: Radiotherapy is one of the mainstays of glioblastoma (GBM) treatment. This study aims to investigate and characterise differences in protein expression patterns in brain tumour tissue following radiotherapy, in order to gain a more detailed understanding of its biological effects. Rat BT4C glioma cells were implanted into the brains of two groups of 12 BDIX rats. One group received radiotherapy (12 Gy single fraction). Protein expression in normal and tumour brain tissue, collected at four different time points after irradiation, was analysed using surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF-MS). Mass spectrometric data were analysed by principal component analysis (PCA) and partial least squares (PLS). Using these multivariate projection methods, we detected differences between tumours and normal tissue, radiation treatment-induced changes, and temporal effects. We discovered 77 peaks whose intensity changed significantly after radiotherapy. The prompt changes in protein expression following irradiation might help elucidate biological events induced by radiation. The combination of SELDI-TOF-MS with PCA and PLS seems to be well suited for studying these changes. In the longer term, these findings may prove useful in the development of new GBM treatment approaches.
    British Journal of Cancer 07/2006; 94(12):1853-63. DOI:10.1038/sj.bjc.6603190 · 4.82 Impact Factor
  • ABSTRACT: In this study, unsupervised and supervised classification methods were compared for comprehensive analysis of the fingerprints of 26 Phyllanthus samples from different geographical regions and species. A total of 63 compounds were identified and tentatively assigned structures for the establishment of fingerprints using high-performance liquid chromatography time-of-flight mass spectrometry (HPLC/TOFMS). Unsupervised and supervised pattern recognition techniques, including principal component analysis (PCA), the nearest neighbors algorithm (NN), partial least squares discriminant analysis (PLS-DA), and artificial neural networks (ANN), were employed. Results showed that Phyllanthus samples could be correctly classified according to their geographical locations and species by ANN and PLS-DA. Important variables for cluster discrimination were also identified by PCA. Although unsupervised and supervised pattern recognition methods each have their own disadvantages and scope of application, they are effective and reliable for studying fingerprints of traditional Chinese medicines (TCM). The two approaches are complementary and can be combined. Our study is the first holistic comparison of supervised and unsupervised pattern recognition techniques in TCM chemical fingerprinting. The two approaches showed advantages in sample classification and data mining, respectively.
    Analytical and Bioanalytical Chemistry 12/2014; 407(5). DOI:10.1007/s00216-014-8371-x · 3.58 Impact Factor