Identification of candidate colon cancer biomarkers by applying a random forest approach on microarray data.
ABSTRACT Colon cancer is the third most common cancer and one of the leading causes of cancer-related death worldwide. Identifying biomarkers that capture the biological characteristics of the disease is therefore a key problem for the early diagnosis of colon cancer. In this study, we used a random forest approach to discover biomarkers from a set of oligonucleotide microarray data of colon cancer. Real-time PCR was used to validate the relative expression levels of the biomarkers selected by our approach. Furthermore, ROC curves were used to analyze the sensitivity and specificity of each biomarker in both the training and test sample sets. Finally, we analyzed the clinical significance of each biomarker based on its differential expression. A single classifier consisting of 4 genes (IL8, WDR77, MYL9 and VIP) was selected by random forests, with an average sensitivity and specificity of 83.75% and 76.15%, respectively. The differential expression of each biomarker was validated by real-time PCR in 48 test colon cancer samples compared with the matched normal tissues. Patients with high expression of IL8 and WDR77 and low expression of MYL9 and VIP had a significantly reduced median survival compared with other colon cancer patients. The results indicate that our approach can be employed for biomarker identification based on microarray data. The 4 genes identified by our approach have the potential to act as clinical biomarkers for the early diagnosis of colon cancer.
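As an illustration of the ROC analysis mentioned in the abstract, the following minimal Python sketch shows how sensitivity, specificity and the area under the ROC curve could be computed for a single biomarker. It is not the authors' pipeline: the function names, the expression values and the 1.5 cutoff are hypothetical, and the marker is assumed to be up-regulated in tumors (as reported for IL8 and WDR77), so higher values are called tumor.

```python
from typing import List, Tuple


def sensitivity_specificity(scores: List[float], labels: List[int],
                            threshold: float) -> Tuple[float, float]:
    """Call a sample 'tumor' when score >= threshold (labels: 1 = tumor, 0 = normal)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, sweeping the cutoff from high to low."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, t)
        points.append((1.0 - spec, sens))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical relative expression values for one up-regulated marker.
expression = [2.1, 3.4, 1.2, 0.8, 2.9, 0.5, 1.9, 0.7]
is_tumor = [1, 1, 1, 0, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(expression, is_tumor, threshold=1.5)
area = auc(roc_points(expression, is_tumor))
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

For a down-regulated marker (as reported for MYL9 and VIP), the same functions could be applied to the negated expression values so that lower expression maps to a tumor call.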