Feature extraction in the analysis of proteomic mass spectra

Eastern Virginia Medical School, Norfolk, Virginia, United States
PROTEOMICS (Impact Factor: 3.81). 07/2006; 6(7):2095-100. DOI: 10.1002/pmic.200500459
Source: PubMed


Feature extraction, or biomarker selection, is a critical step in disease diagnosis and knowledge discovery based on protein MS. Many studies have discussed the classification methods applied in proteomics; however, few address feature extraction in detail. In this paper, we develop a systematic approach for extracting mass spectrum peak apexes and peak areas, with special emphasis on noise filtration and peak calibration. Application to a head and neck cancer data set generated at the Eastern Virginia Medical School [Wadsworth, J. T., Somers, K. D., Cazares, L. H., Malik, G. et al., Clin. Cancer Res. 2004, 10, 1625-1632] showed that the new feature extraction method yields consistent and highly discriminatory biomarkers.


Available from: Oliver John Semmes, Jan 12, 2015
    • "Subsequently, peak picking was performed by finding all the local maxima and eliminating those with intensities lower than a non-uniform threshold proportional to the noise level (Currie, 1999) (Yasui et al., 2003). Since the mass spectra could be inaccurately aligned after the calibration procedure, a maximum tolerance distance equal to 600 ppm of the m/z value was accepted for the comparison (Fushiki et al., 2006) (Wang et al., 2006). Finally, a classification of peaks based on the peak detection rate (PDR) was performed, which was expressed by the ratio between the number of spectra containing the considered peak and the total number of analysed spectra (Mantini et al., 2007). "
    ABSTRACT: The growth of spoiling yeasts in beverages results in reduced quality, economic and image losses. Therefore, biochemical and DNA-based identification methods have been developed but are mostly time-consuming and laborious. Matrix-Assisted-Laser-Desorption/Ionization-Time-Of-Flight Mass Spectrometry (MALDI-TOF MS) could deliver discriminative peptide mass fingerprints within minutes and could thus be a rapid and reliable tool for identification and differentiation. However, routine analysis of yeasts by MALDI-TOF MS is yet impaired by low reproducibility and effects of different physiological states of organisms on the reliability of the identification method are still controversial. The aim of this study was to optimize sample preparation and measurement parameterization using three spoilage yeasts (Saccharomyces cerevisiae var. diastaticus, Wickerhamomyces anomalus and Debaryomyces hansenii). The influence of environmental or physiological parameters including oxygen availability, different nutrients, cell density and growth phase were analysed and revealed small differences in mass fingerprints. Yeasts grown in the presence or absence of oxygen were precisely differentiated along these differences in mass fingerprints and a crude classification of growth phase was possible. Cell concentration did not affect the spectra distinctly, neither qualitatively nor quantitatively, and an influence of available nutrients could not be measured in each case. However, core mass peaks remained constant under all tested conditions enabling reliable identification.
    Article · Dec 2013 · Food Microbiology
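The peak-picking steps quoted in this entry (local maxima, a noise-proportional threshold, a 600 ppm matching tolerance, and the peak detection rate) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function names, the `noise_factor` parameter, and the single global noise estimate based on the median absolute deviation are assumptions made here, whereas the quoted text describes a non-uniform threshold.

```python
from statistics import median


def pick_peaks(mz, intensity, noise_factor=3.0):
    """Return (m/z, intensity) pairs for local maxima above a noise threshold.

    Assumption: one global noise level (median absolute deviation) is used;
    the quoted method applies a non-uniform threshold instead.
    """
    baseline = median(intensity)
    noise = median(abs(y - baseline) for y in intensity)
    threshold = baseline + noise_factor * noise
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_local_max and intensity[i] > threshold:
            peaks.append((mz[i], intensity[i]))
    return peaks


def within_tolerance(reference_mz, candidate_mz, tol_ppm=600.0):
    """True if candidate_mz lies within tol_ppm of reference_mz."""
    return abs(candidate_mz - reference_mz) <= tol_ppm * 1e-6 * reference_mz


def peak_detection_rate(reference_mz, spectra_peaks, tol_ppm=600.0):
    """Fraction of spectra containing a peak that matches reference_mz."""
    hits = sum(
        any(within_tolerance(reference_mz, m, tol_ppm) for m, _ in peaks)
        for peaks in spectra_peaks
    )
    return hits / len(spectra_peaks)
```

Here `spectra_peaks` is a list with one peak list per spectrum, as returned by `pick_peaks`; peaks can then be ranked or filtered by their detection rate.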
    • "abundance, such as the peptide peak intensity (height or area of a peak), the peptide precursor ion peak height, and the peak height of product ions, can be extracted. Using such information individually or combinatorially, numerous label-free methods have been developed, including two extensively applied but fundamentally different strategies: quantitation based on spectral counting [10] and peptide ion peak area [11]. Spectral counting estimates protein abundance by counting the number of spectra matched to peptides from a specific protein. "
    ABSTRACT: To address the challenges associated with differential expression proteomics, label-free mass spectrometric protein quantification methods have been developed as alternatives to array-based, gel-based, and stable isotope tag or label-based approaches. In this paper, we focus on the issues associated with label-free methods that rely on quantitation based on peptide ion peak area measurement. These issues include chromatographic alignment, peptide qualification for quantitation, and normalization. In addressing these issues, we present various approaches, assembled in a recently developed label-free quantitative mass spectrometry platform, that overcome these difficulties and enable comprehensive, accurate, and reproducible protein quantitation in highly complex protein mixtures from experiments with many sample groups. As examples of the utility of this approach, we present a variety of cases where the platform was applied successfully to assess differential protein expression or abundance in body fluids, in vitro nanotoxicology models, tissue proteomics in genetic knock-in mice, and cell membrane proteomics.
    Article · Jan 2013
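Spectral counting, as described in the quoted passage of this entry, amounts to tallying matched spectra per protein. The sketch below illustrates that idea only; the `(spectrum_id, peptide, protein)` tuple layout and the function name are hypothetical and not taken from the cited platform.

```python
from collections import Counter


def spectral_counts(psms):
    """Count the spectra matched to peptides of each protein.

    psms: iterable of (spectrum_id, peptide, protein) tuples
    (a hypothetical layout chosen for this illustration).
    """
    seen = set()
    counts = Counter()
    for spectrum_id, _peptide, protein in psms:
        # Count each spectrum at most once per protein.
        if (spectrum_id, protein) not in seen:
            seen.add((spectrum_id, protein))
            counts[protein] += 1
    return counts
```

The resulting counts are relative abundance estimates; comparing them across samples usually requires normalization, which this sketch leaves out.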
    • "Some use wavelet-based transform such as using discrete wavelet transform to denoising the spectrum (Coombes et al., 2005b; Morris et al., 2005; Randolph and Yasui, 2006) or continuous wavelet-based pattern matching to detect peaks (Du et al., 2006). Also there are statistical and model-based methods (Dijkstra et al., 2006; Wang et al., 2006; Noy and Fasulo 2007). However, most of these peak detection algorithms identify "
    ABSTRACT: Mass spectrometry (MS) has shown great potential in detecting disease-related biomarkers for early diagnosis of stroke. To discover potential biomarkers from large volume of noisy MS data, peak detection must be performed first. This article proposes a novel automatic peak detection method for the stroke MS data. In this method, a mixture model is proposed to model the spectrum. Bayesian approach is used to estimate parameters of the mixture model, and Markov chain Monte Carlo method is employed to perform Bayesian inference. By introducing a reversible jump method, we can automatically estimate the number of peaks in the model. Instead of separating peak detection into substeps, the proposed peak detection method can do baseline correction, denoising and peak identification simultaneously. Therefore, it minimizes the risk of introducing irrecoverable bias and errors from each substep. In addition, this peak detection method does not require a manually selected denoising threshold. Experimental results on both simulated dataset and stroke MS dataset show that the proposed peak detection method not only has the ability to detect small signal-to-noise ratio peaks, but also greatly reduces false detection rate while maintaining the same sensitivity.
    Article · Aug 2008 · Bioinformatics