Computational prediction of human proteins that can be secreted into the bloodstream.

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA.
Bioinformatics (Impact Factor: 5.47). 09/2008; 24(20):2370-5. DOI: 10.1093/bioinformatics/btn418
Source: PubMed

ABSTRACT We present a novel computational method for predicting which proteins from highly and abnormally expressed genes in diseased human tissues, such as cancers, can be secreted into the bloodstream, suggesting possible marker proteins for follow-up serum proteomic studies. A major challenge in tackling this problem is that our understanding of the downstream localization of proteins after they are secreted from cells is very limited and insufficient to provide useful hints about secretion into the bloodstream. To bypass this difficulty, we have taken a data mining approach: we first collected, through extensive literature searches, human proteins that are known to be secreted into the bloodstream under various pathological conditions, as detected by previous proteomic studies, and then asked the question: 'what do these secreted proteins have in common in terms of their physical and chemical properties, amino acid sequence and structural features that can be used to predict them?' We have identified a list of features, such as signal peptides, transmembrane domains, glycosylation sites, disordered regions, secondary structural content, hydrophobicity and polarity measures, that show relevance to protein secretion. Using these features, we have trained a support vector machine-based classifier to predict protein secretion into the bloodstream. On a large test set containing 98 secretory and 6601 non-secretory human proteins, our classifier achieved approximately 90% prediction sensitivity and approximately 98% prediction specificity. Several additional datasets were used to further assess the performance of our classifier. On a set of 122 proteins that were found to be of abnormally high abundance in human blood due to various cancers, our program predicted 62 as blood-secreted proteins. By applying our program to abnormally highly expressed genes in gastric cancer and lung cancer tissues detected through microarray gene expression studies, we predicted 13 and 31 proteins, respectively, as blood secreted, suggesting that they could serve as potential biomarkers for these two cancers. Our study demonstrated that our method can provide highly useful information to link genomic and proteomic studies for disease biomarker discovery. Our software can be accessed at http://csbl1.bmb.uga.edu/cgi-bin/Secretion/secretion.cgi.
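To make the reported test-set figures concrete, the following minimal sketch back-calculates the confusion-matrix counts implied by the rounded values in the abstract (98 secretory proteins, 6601 non-secretory proteins, roughly 90% sensitivity and 98% specificity). The counts are approximations derived from those rounded percentages, not numbers reported by the authors, and the function names are hypothetical.

```python
def implied_counts(n_pos, n_neg, sensitivity, specificity):
    """Approximate confusion-matrix counts implied by rounded rates.

    These are back-calculated estimates, not values from the paper.
    """
    tp = round(n_pos * sensitivity)   # secretory proteins correctly predicted
    fn = n_pos - tp                   # secretory proteins missed
    tn = round(n_neg * specificity)   # non-secretory proteins correctly rejected
    fp = n_neg - tn                   # non-secretory proteins wrongly predicted
    return tp, fn, tn, fp


def precision(tp, fp):
    """Fraction of positive predictions that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Test-set sizes and approximate rates quoted in the abstract.
tp, fn, tn, fp = implied_counts(98, 6601, 0.90, 0.98)
print(tp, fn, tn, fp)
print(round(precision(tp, fp), 2))
```

Under these assumptions, about 132 of the 6601 non-secretory proteins would be predicted as secreted, against about 88 true positives. With classes this imbalanced, even a high specificity can leave a modest share of positive predictions correct, which is worth keeping in mind when reading the biomarker candidate counts.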

  • Source
    ABSTRACT: A comparative analysis of genome-scale transcriptomic data of two types of skin cancer, melanoma and basal cell carcinoma, in comparison with other cancer types, was conducted with the aim of identifying key regulatory factors that either cause or contribute to the aggressiveness of melanoma, while basal cell carcinoma generally remains a mild disease. Multiple cancer-related pathways, such as cell proliferation, apoptosis, angiogenesis, cell invasion and metastasis, are considered, but our focus is on the energy metabolism, cell invasion and metastasis pathways. Our findings include the following. (a) Both types of skin cancer use both glycolysis and increased oxidative phosphorylation (electron transfer chain) for their energy supply. (b) Advanced melanoma shows substantial up-regulation of key genes involved in fatty acid metabolism (β-oxidation) and oxidative phosphorylation, with aerobic metabolism being far more efficient than anaerobic glycolysis, providing a source of the energy necessary to support the rapid growth of this cancer. (c) While advanced melanoma is similar to pancreatic cancer in terms of the activity level of genes involved in promoting cell invasion and metastasis, the main metastatic form of basal cell carcinoma shows substantially reduced activity of these genes, partially explaining why this cancer type has been considered far less aggressive. Our method of using comparative analyses of transcriptomic data of multiple cancer types, focused on specific pathways, provides a novel and highly effective approach to cancer studies in general.
    PLoS ONE 01/2012; 7(1):e30750. · 3.53 Impact Factor
  • Source
    02/2012; ISBN: 978-953-307-812-0
  • Source
    ABSTRACT: Proteins can move from blood circulation into salivary glands through active transportation, passive diffusion or ultrafiltration; some of these are then released into saliva and hence can potentially serve as biomarkers for diseases if accurately identified. We present a novel computational method for predicting salivary proteins that come from circulation. The basis for the prediction is a set of physicochemical and sequence features that we found to discriminate between human proteins known to be movable from circulation to saliva and proteins deemed not to be in saliva. A classifier was trained on these features using a support vector machine to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Considering the possibility that our negative training data (i.e., proteins predicted not to be in saliva) may not be highly reliable, we have also trained a ranking method, based on the same features, that aims to rank the known salivary proteins from circulation highest among the proteins in the general background. This prediction capability can be used to predict potential biomarker proteins for specific human diseases when coupled with information on differentially expressed proteins in diseased versus healthy control tissues and a prediction capability for blood-secretory proteins. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer.
    PLoS ONE 01/2013; 8(11):e80211. · 3.53 Impact Factor
