ABSTRACT: The development of high-throughput technologies makes it possible to survey the whole genome. Genomic studies have been extensively conducted, searching for markers with predictive power for the prognosis of complex diseases such as cancer, diabetes and obesity. Most existing statistical analyses focus on developing marker selection techniques, while little attention is paid to the underlying prognosis models. In this article, we review three commonly used prognosis models, namely the Cox, additive risk and accelerated failure time models. We conduct simulations and show that gene identification can be unsatisfactory under model misspecification. We analyze three cancer prognosis studies under the three models, and show that the gene identification results, the prediction performance of all identified genes combined, and the reproducibility of each identified gene are model-dependent. We suggest that in practical data analysis, more attention should be paid to the model assumptions, and multiple models may need to be considered.
Briefings in Bioinformatics 07/2010; 11(4):385-93. · 5.30 Impact Factor
ABSTRACT: Extensive biomedical studies have shown that clinical and environmental risk factors may not have sufficient predictive power for cancer prognosis. The development of high-throughput profiling technologies makes it possible to survey the whole genome and search for genomic markers with predictive power. Many existing studies assume the interchangeability of gene effects and ignore the coordination among them.
We adopt the weighted co-expression network to describe the interplay among genes. Although there are several different ways of defining gene networks, the weighted co-expression network may be preferred because it is computationally simple, performs satisfactorily in empirical studies, and does not demand additional biological experiments. For cancer prognosis studies with gene expression measurements, we propose a new marker selection method that can properly incorporate the network connectivity of genes. We analyze six prognosis studies on breast cancer and lymphoma. We find that the proposed approach identifies genes that differ significantly from those identified using alternatives. We search the published literature and find that genes identified using the proposed approach are biologically meaningful. In addition, they have better prediction performance and reproducibility than genes identified using alternatives.
The network contains important information on the functionality of genes. Incorporating the network structure can improve cancer marker identification.
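The abstract above does not spell out how the weighted co-expression network is built. As a minimal sketch, assuming one common construction in which the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to a power, the network and each gene's connectivity could be computed as follows. The function names, the power `beta=6`, and the toy data are illustrative assumptions, not taken from the paper, and the sketch does not implement the proposed marker selection method itself.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_coexpression(expr, beta=6):
    """Build a weighted co-expression network.

    expr: list of per-gene expression profiles, each a list over samples.
    beta: soft-thresholding power (hypothetical default, chosen for illustration).
    Returns (adjacency, connectivity), where adjacency[i][j] = |cor(i, j)| ** beta
    with a zero diagonal, and connectivity[i] is the sum of row i.
    """
    p = len(expr)
    adjacency = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            weight = abs(pearson(expr[i], expr[j])) ** beta
            adjacency[i][j] = adjacency[j][i] = weight
    connectivity = [sum(row) for row in adjacency]
    return adjacency, connectivity


# Toy data: three genes measured on four samples (made up for illustration).
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
adjacency, connectivity = weighted_coexpression(toy_expr)
```

Because the weights are continuous rather than thresholded to 0 or 1, no hard cutoff on the correlation is needed, and the only input is the expression matrix already collected for the prognosis study.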