Feature extraction in the analysis of proteomic mass spectra

State University of New York, Stony Brook, NY, USA.
PROTEOMICS (Impact Factor: 3.97). 07/2006; 6(7):2095-100. DOI: 10.1002/pmic.200500459
Source: PubMed

ABSTRACT Feature extraction or biomarker selection is a critical step in disease diagnosis and knowledge discovery based on protein MS. Many studies have discussed the classification methods applied in proteomics; however, few address feature extraction in detail. In this paper, we developed a systematic approach for the extraction of mass spectrum peak apex and peak area, with special emphasis on noise filtration and peak calibration. Application to a head and neck cancer dataset generated at the Eastern Virginia Medical School [Wadsworth, J. T., Somers, K. D., Cazares, L. H., Malik, G. et al., Clin. Cancer Res. 2004, 10, 1625-1632] showed that the new feature extraction method yields consistent and highly discriminatory biomarkers.
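
The abstract names two features, peak apex and peak area, and stresses noise filtration, but gives no algorithmic detail. The following is only a minimal sketch of what such an extraction could look like, not the authors' method. The moving-average filter, the median baseline, the median-absolute-deviation noise estimate, and every function and parameter name (`smooth`, `extract_peaks`, `half_window`, `snr`) are assumptions made for illustration.

```python
from math import exp
from statistics import median


def smooth(intensity, half_window=2):
    """Moving-average noise filter (assumed choice); the window shrinks at the edges."""
    n = len(intensity)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(intensity[lo:hi]) / (hi - lo))
    return out


def extract_peaks(mz, intensity, half_window=2, snr=3.0):
    """Return apex position, apex intensity, and area for each detected peak.

    Hypothetical procedure: smooth, estimate a flat baseline as the median,
    estimate noise as the median absolute deviation, keep local maxima whose
    height above baseline is at least `snr` times the noise, then integrate
    between the neighbouring valleys with the trapezoidal rule.
    """
    y = smooth(intensity, half_window)
    base = median(y)
    noise = median(abs(v - base) for v in y) or 1e-12  # avoid division by zero
    peaks = []
    for i in range(1, len(y) - 1):
        is_apex = y[i] > y[i - 1] and y[i] >= y[i + 1]
        if not is_apex or (y[i] - base) / noise < snr:
            continue
        # Walk downhill on both sides to find the valleys bounding the peak.
        left = i
        while left > 0 and y[left - 1] < y[left]:
            left -= 1
        right = i
        while right < len(y) - 1 and y[right + 1] < y[right]:
            right += 1
        # Trapezoidal area above the baseline, clamped at zero.
        area = 0.0
        for k in range(left, right):
            h0 = max(0.0, y[k] - base)
            h1 = max(0.0, y[k + 1] - base)
            area += (mz[k + 1] - mz[k]) * (h0 + h1) / 2
        peaks.append({"apex_mz": mz[i], "apex_intensity": y[i], "area": area})
    return peaks


if __name__ == "__main__":
    # Synthetic spectrum: one Gaussian peak at m/z 1050 on a flat baseline of 5.
    mz = [1000.0 + i for i in range(101)]
    intensity = [5.0 + 100.0 * exp(-((m - 1050.0) ** 2) / (2 * 3.0 ** 2)) for m in mz]
    for peak in extract_peaks(mz, intensity):
        print(peak)
```

In this sketch the apex and area are both measured on the smoothed trace, so the reported apex intensity is somewhat lower than the raw maximum. Real spectra would also need the peak calibration step the abstract mentions, which is not modelled here.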

Available from: Oliver John Semmes, Jan 12, 2015
  • Source
    ABSTRACT: To address the challenges associated with differential expression proteomics, label-free mass spectrometric protein quantification methods have been developed as alternatives to array-based, gel-based, and stable isotope tag or label-based approaches. In this paper, we focus on the issues associated with label-free methods that rely on quantitation based on peptide ion peak area measurement. These issues include chromatographic alignment, peptide qualification for quantitation, and normalization. In addressing these issues, we present various approaches, assembled in a recently developed label-free quantitative mass spectrometry platform, that overcome these difficulties and enable comprehensive, accurate, and reproducible protein quantitation in highly complex protein mixtures from experiments with many sample groups. As examples of the utility of this approach, we present a variety of cases where the platform was applied successfully to assess differential protein expression or abundance in body fluids, in vitro nanotoxicology models, tissue proteomics in genetic knock-in mice, and cell membrane proteomics.
    01/2013; 2013:756039. DOI:10.1155/2013/756039
  • Source
    ABSTRACT: Mass spectrometry (MS) has shown great potential in detecting disease-related biomarkers for early diagnosis of stroke. To discover potential biomarkers from large volumes of noisy MS data, peak detection must be performed first. This article proposes a novel automatic peak detection method for stroke MS data. In this method, a mixture model is used to represent the spectrum. A Bayesian approach is used to estimate the parameters of the mixture model, and a Markov chain Monte Carlo method is employed to perform Bayesian inference. By introducing a reversible jump method, the number of peaks in the model can be estimated automatically. Instead of separating peak detection into substeps, the proposed method performs baseline correction, denoising and peak identification simultaneously. It therefore minimizes the risk of introducing irrecoverable bias and errors from each substep. In addition, this peak detection method does not require a manually selected denoising threshold. Experimental results on both a simulated dataset and a stroke MS dataset show that the proposed method not only detects peaks with a small signal-to-noise ratio, but also greatly reduces the false detection rate while maintaining the same sensitivity.
    Bioinformatics 08/2008; 24(13):i407-13. DOI:10.1093/bioinformatics/btn143 · 4.62 Impact Factor
  • Source
    ABSTRACT: Early detection of cancer is a critical issue for improving patient survival rates. Recent progress in mass spectrometry has shown the promising potential of biomarker discovery in the diagnosis of diseases, especially in early stages. In the present study, an alternative approach to feature extraction from mass spectrometry data of prostate cancer is proposed that results in the definition of different biomarkers. The latter provide information-rich features that improve the performance of an MLP classifier in differentiating among datasets with different PSA levels of prostate cancer and with no evidence of disease. The prostate cancer dataset was collected from the National Cancer Institute Clinical Proteomics Database. The overall accuracy in correctly classifying 63 spectra with no evidence of disease (PSA<1) and 69 spectra with prostate cancer (PSA≥4) was 95%. Furthermore, the overall classification accuracy in discriminating 26 spectra of prostate cancer with (4≤PSA<10) from 43 spectra of prostate cancer with (PSA>10) was 93%. The high accuracies obtained by the proposed method might lead to informative biomarkers for early-stage prostate cancer diagnosis.