Breast Cancer Risk Estimation With Artificial Neural Networks Revisited Discrimination and Calibration

Industrial and Systems Engineering Department, University of Wisconsin, Madison, Wisconsin 53792-3252, USA.
Cancer (Impact Factor: 4.9). 07/2010; 116(14):3310-21. DOI: 10.1002/cncr.25081
Source: PubMed

ABSTRACT Discriminating malignant breast lesions from benign ones and accurately predicting the risk of breast cancer for individual patients are crucial to successful clinical decisions. In the past, several artificial neural network (ANN) models have been developed for breast cancer-risk prediction. All studies have reported discrimination performance, but not one has assessed calibration, which is an equivalently important measure for accurate risk prediction. In this study, the authors have evaluated whether an artificial neural network (ANN) trained on a large prospectively collected dataset of consecutive mammography findings can discriminate between benign and malignant disease and accurately predict the probability of breast cancer for individual patients.
Our dataset consisted of 62,219 consecutively collected mammography findings matched with the Wisconsin State Cancer Reporting System. The authors built a 3-layer feedforward ANN with 1000 hidden-layer nodes. The authors trained and tested their ANN by using 10-fold cross-validation to predict the risk of breast cancer. The authors used area the under the receiver-operating characteristic curve (AUC), sensitivity, and specificity to evaluate discriminative performance of the radiologists and their ANN. The authors assessed the accuracy of risk prediction (ie, calibration) of their ANN by using the Hosmer-Lemeshow (H-L) goodness-of-fit test.
Their ANN demonstrated superior discrimination (AUC, 0.965) compared with the radiologists (AUC, 0.939; P<.001). The authors' ANN was also well calibrated as shown by an H-L goodness of fit P-value of .13.
The authors' ANN can effectively discriminate malignant abnormalities from benign ones and accurately predict the risk of breast cancer for individual abnormalities.

Download full-text


Available from: Jagpreet Chhatwal, May 20, 2014
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Breast cancer is the most common non-skin cancer affecting women in the United States, where every year more than 20 million mammograms are performed. Breast biopsy is commonly performed on the suspicious findings on mammograms to confirm the presence of cancer. Currently, 700,000 biopsies are performed annually in the U.S.; 55%-85% of these biopsies ultimately are found to be benign breast lesions, resulting in unnecessary treatments, patient anxiety, and expenditures. This paper addresses the decision problem faced by radiologists: When should a woman be sent for biopsy based on her mammographic features and demographic factors? This problem is formulated as a finite-horizon discrete-time Markov decision process. The optimal policy of our model shows that the decision to biopsy should take the age of patient into account; particularly, an older patient's risk threshold for biopsy should be higher than that of a younger patient. When applied to the clinical data, our model outperforms radiologists in the biopsy decision-making problem. This study also derives structural properties of the model, including sufficiency conditions that ensure the existence of a control-limit type policy and nondecreasing control-limits with age.
    Operations Research 11/2010; 58(6):1577-1591. DOI:10.1287/opre.1100.0877
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In this work we show that combining physician rules and machine learned rules may improve the performance of a classifier that predicts whether a breast cancer is missed on percutaneous, image-guided breast core needle biopsy (subsequently referred to as "breast core biopsy"). Specifically, we show how advice in the form of logical rules, derived by a sub-specialty, i.e. fellowship trained breast radiologists (subsequently referred to as "our physicians") can guide the search in an inductive logic programming system, and improve the performance of a learned classifier. Our dataset of 890 consecutive benign breast core biopsy results along with corresponding mammographic findings contains 94 cases that were deemed non-definitive by a multidisciplinary panel of physicians, from which 15 were upgraded to malignant disease at surgery. Our goal is to predict upgrade prospectively and avoid surgery in women who do not have breast cancer. Our results, some of which trended toward significance, show evidence that inductive logic programming may produce better results for this task than traditional propositional algorithms with default parameters. Moreover, we show that adding knowledge from our physicians into the learning process may improve the performance of the learned classifier trained only on data.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2011; 2011:349-55.
  • [Show abstract] [Hide abstract]
    ABSTRACT: Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used by radiologists when studying each finding. Although the lexicon is standard, the annotation accuracy of the findings depends on the experience of the radiologist. Moreover, the accuracy of the classification of a mammography is also highly dependent on the expertise of the radiologist. A correct classification is paramount due to economical and humanitarian reasons. The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a data set consisting of 348 consecutive breast masses that underwent image guided or surgical biopsy performed between October 2005 and December 2007 on 328 female subjects. The main conclusions are threefold: (1) automatic classification of a mammography, independent on information about mass density, can reach equal or better results than the classification performed by a physician; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) a machine learning model can predict mass density with a quality as good as the specialist blind to biopsy, which is one of our main contributions. Our model can predict malignancy in the absence of the mass density attribute, since we can fill up this attribute using our mass density predictor.
    11/2011; 2011. DOI:10.1109/BIBM.2011.71