Learning to predict relapse in invasive ductal carcinomas based on the subcellular localization of junctional proteins

Article in Breast Cancer Research and Treatment 121(2):527-38 · September 2009
Impact Factor: 3.94 · DOI: 10.1007/s10549-009-0557-0 · Source: PubMed


The complexity of breast cancer biology makes it challenging to analyze large datasets of clinicopathologic and molecular attributes in order to identify the key prognostic features and to produce systems capable of predicting which patients are likely to relapse. We applied machine-learning techniques to a dataset of well-characterized primary breast cancers that specified the abundance and localization of various junctional proteins. We hypothesized that disruption of junctional complexes would lead to the cytoplasmic/nuclear redistribution of the protein components and their potential interactions with growth-regulating molecules, which would promote relapse, and that machine-learning techniques could use the subcellular locations of these proteins, together with standard clinicopathological data, to produce an efficient prognostic classifier. We used immunohistochemistry to assess the expression and subcellular distribution of six junctional proteins, in addition to a panel of eight standard clinical features and the concentrations of four "growth-regulating" proteins, to produce a database of 36 features across 66 primary invasive ductal breast carcinomas. A machine-learning system was applied to this clinicopathologic dataset to produce a decision-tree classifier that could predict whether a novel breast cancer patient would relapse. We show that this decision-tree classifier, which incorporates a combination of only four features (nuclear alpha- and beta-catenin levels, the total level of PTEN and the number of involved axillary lymph nodes), correctly classifies patient outcomes approximately 80% of the time. Further, this classifier is significantly better than classifiers based on any subgroup of these 36 features.
This study demonstrates that autonomous machine-learning techniques are able to generate simple and efficient decision-tree prognostic classifiers from a wide variety of clinical, pathologic and biomarker data and, unlike other analytic methods, suggest testable biologic relationships among explicitly identified key variables. The decision-tree classifier resulting from these analytic methods is simple and should be widely applicable to a spectrum of clinical cancer settings. Further, the subcellular distribution of junctional proteins, which influences growth regulatory pathways involved in locoregional and metastatic relapse of breast cancer, helped to identify which patients would relapse, whereas their total concentrations did not. This emphasizes the need to evaluate the subcellular distribution of junctional proteins in assessing their contribution to tumor progression.
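
To make the idea of learning a decision-tree classifier from a table of patient features concrete, the following is a minimal sketch in plain Python. It is not the authors' system: the abstract does not name the learning algorithm, so a generic greedy tree learner using Gini impurity is assumed here. The feature names mirror the four features named above, but every value, label and resulting threshold in the toy dataset is invented purely for illustration and carries no clinical meaning.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for feat in rows[0]:
        for thr in sorted({r[feat] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[feat] <= thr]
            right = [y for r, y in zip(rows, labels) if r[feat] > thr]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (feat, thr), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree greedily; leaves hold the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feat, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feat] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[feat] > thr]
    return {
        "feature": feat,
        "threshold": thr,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical toy data: (alpha-catenin, beta-catenin, PTEN, nodes, outcome).
# All numbers are made up for illustration; they are NOT from the study.
toy = [
    (0, 0, 2, 0, "no_relapse"), (1, 0, 3, 0, "no_relapse"),
    (0, 1, 2, 1, "no_relapse"), (2, 2, 1, 5, "relapse"),
    (1, 3, 0, 4, "relapse"),    (3, 2, 0, 0, "relapse"),
    (0, 0, 3, 6, "relapse"),    (0, 1, 3, 1, "no_relapse"),
]
names = ["nuclear_alpha_catenin", "nuclear_beta_catenin", "total_pten", "positive_nodes"]
rows = [dict(zip(names, t[:4])) for t in toy]
labels = [t[4] for t in toy]

tree = build_tree(rows, labels)
correct = sum(predict(tree, r) == y for r, y in zip(rows, labels))
print(f"training accuracy: {correct}/{len(rows)}")
```

Because the learned model is a nested set of explicit feature/threshold tests, each path from root to leaf can be read as a rule, which is the property the abstract highlights when it says such classifiers suggest testable relationships among explicitly identified variables. Accuracy on the training rows, as printed here, is optimistic; a held-out or cross-validated estimate would be needed to support a figure like the one reported above.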