ABSTRACT: Infection with the malaria parasite Plasmodium falciparum leads to widely different clinical conditions in children, ranging from mild flu-like symptoms to coma and death. Despite the immense medical implications, the genetic and molecular basis of this diversity remains largely unknown. Studies of in vitro gene expression have found few transcriptional differences between different parasite strains. Here we present a large study of in vivo expression profiles of parasites derived directly from blood samples from infected patients. The in vivo expression profiles define three distinct transcriptional states. The biological basis of these states can be interpreted by comparison with an extensive compendium of expression data in the yeast Saccharomyces cerevisiae. The three states in vivo closely resemble, first, active growth based on glycolytic metabolism, second, a starvation response accompanied by metabolism of alternative carbon sources, and third, an environmental stress response. The glycolytic state is highly similar to the known profile of the ring stage in vitro, but the other states have not been observed in vitro. The results reveal a previously unknown physiological diversity in the in vivo biology of the malaria parasite, in particular evidence for a functional mitochondrion in the asexual-stage parasite, and indicate in vivo and in vitro studies to determine how this variation may affect disease manifestations and treatment.
ABSTRACT: Gene expression analysis has identified biologically relevant subclasses of breast cancer. However, most classification schemes do not robustly cluster all HER2+ breast cancers, in part due to limitations and bias of clustering techniques used. In this article, we propose an alternative approach that first separates the HER2+ tumors using a gene amplification signal for Her2/neu amplicon genes and then applies consensus ensemble clustering separately to the HER2+ and HER2- clusters to look for further substructure. We applied this procedure to a microarray data set of 286 early-stage breast cancers treated only with surgery and radiation and identified two basal and four luminal subtypes in the HER2- tumors, as well as two novel and robust HER2+ subtypes. HER2+ subtypes had median distant metastasis-free survival of 99 months [95% confidence interval (95% CI), 83-118 months] and 33 months (95% CI, 11-54 months), respectively, and recurrence rates of 11% and 58%, respectively. The low recurrence subtype had a strong relative overexpression of lymphocyte-associated genes and was also associated with a prominent lymphocytic infiltration on histologic analysis. These data suggest that early-stage HER2+ cancers associated with lymphocytic infiltration are a biologically distinct subtype with an improved natural history.
Cancer Research 12/2007; 67(22):10669-76. · 8.65 Impact Factor
ABSTRACT: The high dimensionality of global transcription profiles, the expression level of 20,000 genes in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here, we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia and lung cancer data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means to detect and remove sample contamination.
Proceedings of the National Academy of Sciences 05/2007; 104(14):5959-64. · 9.74 Impact Factor
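The idea of metagenes as positive linear combinations of genes can be illustrated with a small sketch. The abstract does not say how the metagenes are computed, so the choice below is an assumption made only for illustration: non-negative matrix factorization (NMF) with multiplicative updates, followed by a pseudo-inverse projection of new samples. The function names `nmf` and `project`, the toy matrix, and all parameter values are hypothetical.

```python
import numpy as np

def nmf(A, k, n_iter=300, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (genes x samples) as W @ H.

    W (genes x k) holds k metagenes as non-negative gene weights;
    H (k x samples) holds each sample's metagene expression.
    NMF is an assumed technique here, not one named in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = A.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(W, A_new):
    """Least-squares projection of new samples onto the metagene space.

    The pseudo-inverse can return small negative coefficients.
    """
    return np.linalg.pinv(W) @ A_new

# Toy data (hypothetical): 6 genes x 4 samples with two gene groups.
A = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 5.0],
])
W, H = nmf(A, k=2)
H_new = project(W, A[:, :3])  # reuse three columns as stand-in "new" samples
print("metagene expression of projected samples:")
print(np.round(H_new, 2))
```

In this sketch a data set with thousands of genes would be summarized by the `k` rows of `H`, and a second data set measured on the same genes could be compared with the first by projecting it through the same `W`.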
ABSTRACT: Clustering analysis of microarray data is often criticized for giving ambiguous results because of sensitivity to data perturbation or clustering techniques used. In this paper, we describe a new method based on principal component analysis and ensemble consensus clustering that avoids these problems.
We illustrate the method on a public microarray dataset from 36 breast cancer patients of whom 31 were diagnosed with at least two of three pathological stages of disease (atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC)). Our method identifies an optimum set of genes and divides the samples into stable clusters which correlate with clinical classification into Luminal, Basal-like and Her2+ subtypes. Our analysis reveals a hierarchical portrait of breast cancer progression and identifies genes and pathways for each stage, grade and subtype. An intriguing observation is that the disease phenotype is distinguishable in ADH and progresses along distinct pathways for each subtype. The genetic signature for disease heterogeneity across subtypes is greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes have distinct progression pathways. Our method identifies six disease subtype clusters and one normal cluster. The first split separates the normal samples from the cancer samples. Next, the cancer cluster splits into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3) while the normal cluster is unchanged. Further, the low grade cluster splits into two subclusters and the high grade cluster into four. The final six disease clusters are mapped to one Luminal A, three Luminal B, one Basal-like and one Her2+ cluster.
We confirm that the cancer phenotype can be identified at an early stage because the genes altered at this stage are progressively altered further as the disease progresses through DCIS into IDC. We identify six subtypes of disease which have distinct genetic signatures and remain separated in the clustering hierarchy. Our findings suggest that the heterogeneity of disease across subtypes is higher than the heterogeneity of the disease progression within a subtype, indicating that the subtypes are in fact distinct diseases.
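The consensus clustering idea in this abstract, averaging cluster assignments over data perturbations so that only stable groupings survive, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' procedure: it uses a single clustering method (a plain k-means written here), random subsampling of samples as the only perturbation, and no principal component step. The names `kmeans` and `consensus_matrix` and all parameter values are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # an empty cluster keeps its old center
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_runs=100, frac=0.8, seed=0):
    """Fraction of runs in which two samples fall in the same cluster.

    Each run clusters a random subsample (an assumed perturbation scheme);
    entry (i, j) is averaged only over runs where both i and j were drawn.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return together / np.maximum(sampled, 1.0)

# Toy data (hypothetical): two well-separated groups of 10 samples each.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 0.1, size=(10, 5)),
    data_rng.normal(5.0, 0.1, size=(10, 5)),
])
M = consensus_matrix(X, k=2)
print("mean within-group consensus:", M[:10, :10].mean())
print("mean between-group consensus:", M[:10, 10:].mean())
```

Entries of the consensus matrix near 1 or 0 indicate pairs that are consistently grouped together or apart; intermediate values flag samples whose assignment depends on the perturbation. An ensemble version would also vary the clustering algorithm across runs.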
ABSTRACT: We describe a new method based on principal component analysis and robust consensus ensemble clustering to identify and elucidate the subtypes of breast cancer disease. The method was applied to microarray gene expression data using micro-dissection of samples from 36 breast cancer patients with at least two of three pathological stages of disease. Controls were normal breast epithelial cells from 3 disease-free patients. Our method identified an optimum set of genes and strong, stable clusters which correlated well with clinical classification into Luminal, Basal and Her2+ subtypes based on ER, PR and Her2 status. It also revealed a hierarchical portrait of disease progression through various grades and stages and identified genes and functional pathways for each stage, grade and disease subtype. We found that gene expression heterogeneity across subtypes is much greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes are distinct disease processes. The averaging over data perturbations and clustering methods is critical in the robust identification of subtypes and gene markers for grade and progression.
Genome informatics. International Conference on Genome Informatics 02/2007; 18:130-40.