Feature extraction in the analysis of proteomic mass spectra

State University of New York, Stony Brook, NY, USA.
PROTEOMICS (Impact Factor: 3.81). 07/2006; 6(7):2095-100. DOI: 10.1002/pmic.200500459
Source: PubMed


Feature extraction, or biomarker selection, is a critical step in disease diagnosis and knowledge discovery based on protein MS. Many studies have discussed the classification methods applied in proteomics; however, few address feature extraction in detail. In this paper, we develop a systematic approach for extracting mass spectrum peak apex and peak area, with special emphasis on noise filtration and peak calibration. Application to a head and neck cancer dataset generated at the Eastern Virginia Medical School [Wadsworth, J. T., Somers, K. D., Cazares, L. H., Malik, G. et al., Clin. Cancer Res. 2004, 10, 1625-1632] revealed that the new feature extraction method yields consistent and highly discriminatory biomarkers.
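As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch smooths a spectrum, estimates a noise threshold, and reports the apex and area of each region above it. It is not the authors' method: the moving-average filter, the scaled-MAD noise estimate, the global median baseline, and all names (`extract_peaks`, `summarize_region`, `simulate_spectrum`, `snr`, `min_points`) are assumptions chosen for brevity, and peak calibration across spectra is omitted.

```python
import math
import random
import statistics


def moving_average(values, window=5):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def extract_peaks(mz, intensity, window=5, snr=3.0, min_points=3):
    """Return one dict (apex_mz, height, area) per region above the noise threshold."""
    smooth = moving_average(intensity, window)

    # Assumed noise model: scaled MAD of the raw-minus-smoothed residual.
    residual = [raw - s for raw, s in zip(intensity, smooth)]
    center = statistics.median(residual)
    noise = 1.4826 * statistics.median([abs(r - center) for r in residual])

    # Assumed baseline: one global median, adequate only for a flat baseline.
    baseline = statistics.median(smooth)
    threshold = baseline + snr * noise

    peaks = []
    start = None
    # Scan one index past the end so a region touching the last point is closed.
    for i in range(len(smooth) + 1):
        above = i < len(smooth) and smooth[i] > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_points:  # drop very short, noise-like regions
                peaks.append(summarize_region(mz, smooth, start, i, baseline))
            start = None
    return peaks


def summarize_region(mz, smooth, start, end, baseline):
    """Apex and trapezoidal area of smooth[start:end] above the baseline."""
    apex = max(range(start, end), key=lambda k: smooth[k])
    area = sum(
        0.5 * (smooth[k] + smooth[k + 1] - 2 * baseline) * (mz[k + 1] - mz[k])
        for k in range(start, end - 1)
    )
    return {"apex_mz": mz[apex], "height": smooth[apex] - baseline, "area": area}


def simulate_spectrum(seed=0):
    """Hypothetical test spectrum: two Gaussian peaks (sigma = 10) plus unit noise."""
    rng = random.Random(seed)
    mz = [float(m) for m in range(1000, 8001)]
    intensity = [
        100.0 * math.exp(-((m - 2000.0) ** 2) / 200.0)
        + 50.0 * math.exp(-((m - 5000.0) ** 2) / 200.0)
        + rng.gauss(0.0, 1.0)
        for m in mz
    ]
    return mz, intensity
```

Calling `extract_peaks(*simulate_spectrum())` and sorting the result by `area` puts the two simulated peaks, near m/z 2000 and 5000, first. Real spectra typically have a drifting baseline and m/z-dependent noise, so a local baseline and a local threshold would be needed in practice.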



Available from: Oliver John Semmes, Jan 12, 2015
  • Source
    • "abundance, such as the peptide peak intensity (height or area of a peak), the peptide precursor ion peak height, and the peak height of product ions, can be extracted. Using such information individually or combinatorially, numerous label-free methods have been developed, including two extensively applied but fundamentally different strategies: quantitation based on spectral counting [10] and peptide ion peak area [11]. Spectral counting estimates protein abundance by counting the number of spectra matched to peptides from a specific protein. "
    ABSTRACT: To address the challenges associated with differential expression proteomics, label-free mass spectrometric protein quantification methods have been developed as alternatives to array-based, gel-based, and stable isotope tag or label-based approaches. In this paper, we focus on the issues associated with label-free methods that rely on quantitation based on peptide ion peak area measurement. These issues include chromatographic alignment, peptide qualification for quantitation, and normalization. In addressing these issues, we present various approaches, assembled in a recently developed label-free quantitative mass spectrometry platform, that overcome these difficulties and enable comprehensive, accurate, and reproducible protein quantitation in highly complex protein mixtures from experiments with many sample groups. As examples of the utility of this approach, we present a variety of cases where the platform was applied successfully to assess differential protein expression or abundance in body fluids, in vitro nanotoxicology models, tissue proteomics in genetic knock-in mice, and cell membrane proteomics.
    01/2013; 2013(1):756039. DOI:10.1155/2013/756039
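The spectral counting idea quoted in this entry can be sketched in a few lines of Python. This is only a minimal illustration: the input format (pairs of spectrum identifier and matched protein) and the function name `spectral_counts` are assumptions, and real workflows usually add filtering and normalization steps that are not shown.

```python
from collections import Counter


def spectral_counts(psms):
    """Count matched spectra per protein.

    `psms` is an assumed input format: an iterable of
    (spectrum_id, protein_id) pairs, one per peptide-spectrum match.
    """
    return Counter(protein for _spectrum_id, protein in psms)


# Hypothetical matches: three spectra assigned to P1, one to P2.
example_psms = [("s1", "P1"), ("s2", "P1"), ("s3", "P2"), ("s4", "P1")]
counts = spectral_counts(example_psms)
```

Here `counts["P1"]` is 3 and `counts["P2"]` is 1, so P1 would be ranked as the more abundant protein under this simple scheme.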
  • Source
    • "Some use wavelet-based transform such as using discrete wavelet transform to denoising the spectrum (Coombes et al., 2005b; Morris et al., 2005; Randolph and Yasui, 2006) or continuous wavelet-based pattern matching to detect peaks (Du et al., 2006). Also there are statistical and model-based methods (Dijkstra et al., 2006; Wang et al., 2006; Noy and Fasulo 2007). However, most of these peak detection algorithms identify "
    ABSTRACT: Mass spectrometry (MS) has shown great potential in detecting disease-related biomarkers for early diagnosis of stroke. To discover potential biomarkers from large volumes of noisy MS data, peak detection must be performed first. This article proposes a novel automatic peak detection method for stroke MS data. In this method, a mixture model is used to model the spectrum. A Bayesian approach is used to estimate the parameters of the mixture model, and a Markov chain Monte Carlo method is employed to perform Bayesian inference. By introducing a reversible jump method, the number of peaks in the model can be estimated automatically. Instead of separating peak detection into substeps, the proposed method performs baseline correction, denoising and peak identification simultaneously. It therefore minimizes the risk of introducing irrecoverable bias and errors from each substep. In addition, the method does not require a manually selected denoising threshold. Experimental results on both a simulated dataset and a stroke MS dataset show that the proposed method not only detects peaks with small signal-to-noise ratios, but also greatly reduces the false detection rate while maintaining the same sensitivity.
    Bioinformatics 08/2008; 24(13):i407-13. DOI:10.1093/bioinformatics/btn143 · 4.98 Impact Factor
  • Source
    • "Given the significant consistency of noise level across spectra, also after preprocessing, an alternative technique for the reduction of false peak discovery was proposed in the literature [15]. It is not based on data smoothing, but on the determination of a proper non-uniform threshold, in order to exclude intensities associated with the noise. "
    ABSTRACT: Mass spectrometry protein profiling is a promising tool for biomarker discovery in clinical proteomics. However, the development of a reliable approach for the separation of protein signals from noise is required. In this paper, LIMPIC, a computational method for the detection of protein peaks from linear-mode MALDI-TOF data, is proposed. LIMPIC is based on novel techniques for background noise reduction and baseline removal. Peak detection is performed considering the presence of a non-homogeneous noise level in the mass spectrum. A comparison of the peaks collected from multiple spectra is used to classify them on the basis of a detection rate parameter, and hence to separate the protein signals from other disturbances. LIMPIC preprocessing proves to be superior to other classical preprocessing techniques, allowing for a reliable decomposition of the background noise and the baseline drift from the MALDI-TOF mass spectra. It provides a lower coefficient of variation associated with the peak intensity, improving the reliability of the information that can be extracted from single spectra. Our results show that LIMPIC peak-picking is effective even in low protein concentration regimes. The analytical comparison with commercial and freeware peak-picking algorithms demonstrates its superior performance in terms of sensitivity and specificity, both on in-vitro purified protein samples and human plasma samples. The quantitative information on the peak intensity extracted with LIMPIC could be used for the recognition of significant protein profiles by means of advanced statistical tools: LIMPIC might be valuable in the perspective of biomarker discovery.
    BMC Bioinformatics 02/2007; 8(1):101. DOI:10.1186/1471-2105-8-101 · 2.58 Impact Factor
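The notion of classifying peaks by a detection rate across multiple spectra can be illustrated with a short Python sketch. This is not the LIMPIC algorithm: the grouping rule (chaining peaks whose m/z values differ by at most a relative tolerance), the `tolerance` default, and the function name `detection_rates` are all assumptions made for illustration.

```python
def detection_rates(peak_lists, tolerance=0.002):
    """Group peaks from several spectra and report how often each group occurs.

    `peak_lists` holds one list of peak m/z values per spectrum. Peaks are
    sorted by m/z and chained into a group while the gap to the previous
    peak is at most `tolerance * mz` (an assumed relative tolerance).
    Returns (mean_mz, detection_rate) pairs, where detection_rate is the
    fraction of spectra contributing at least one peak to the group.
    """
    tagged = sorted(
        (mz, index) for index, peaks in enumerate(peak_lists) for mz in peaks
    )
    groups = []
    for mz, index in tagged:
        if groups and mz - groups[-1]["mzs"][-1] <= tolerance * mz:
            groups[-1]["mzs"].append(mz)
            groups[-1]["spectra"].add(index)
        else:
            groups.append({"mzs": [mz], "spectra": {index}})
    n_spectra = len(peak_lists)
    return [
        (sum(g["mzs"]) / len(g["mzs"]), len(g["spectra"]) / n_spectra)
        for g in groups
    ]


# Hypothetical peak lists from three spectra.
example_peaks = [
    [2000.0, 5000.0],
    [2001.0, 5003.0, 6500.0],
    [1999.5, 4998.0],
]
rates = detection_rates(example_peaks)
```

With these inputs the groups near m/z 2000 and 5000 are found in all three spectra, while the peak at 6500 appears in only one, so a cutoff on the detection rate would discard it. Chaining neighbors in this way can merge distinct peaks in dense regions, which is one reason a production method would need a more careful alignment step.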