Scale normalization of histopathological images for batch invariant cancer diagnostic models


Histopathological images acquired from different experimental set-ups often suffer from batch effects due to color and scale variations. In this paper, we develop a novel scale normalization model for histopathological images based on nuclear area distributions. Results indicate that the normalization model closely fits empirical values for two renal tumor datasets. We study the effect of scale normalization on the classification of renal tumor images. Scale normalization improves classification performance in most cases but decreases it in a few. To understand this, we propose two methods to filter extracted image features that are sensitive to image scaling and features that are uncorrelated with the scaling factor. Feature filtering improves the classification performance of cases that were initially negatively affected by scale normalization.
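
The abstract does not spell out the normalization model itself, so the following is only a minimal sketch of the general idea of deriving a scale factor from nuclear area distributions. It assumes that nuclear area in pixels grows with the square of the linear image scale, so that the ratio of median nuclear areas between a reference batch and a target batch approximates the squared scale factor. The function names `estimate_scale_factor` and `rescaled_size` are hypothetical and are not taken from the paper.

```python
import math
from statistics import median


def estimate_scale_factor(reference_areas, target_areas):
    """Estimate the linear factor that brings a target image to the reference scale.

    Assumption (not from the paper): nuclear area in pixels is proportional
    to the square of the linear scale, so the square root of the ratio of
    median areas gives the linear resize factor.
    """
    if not reference_areas or not target_areas:
        raise ValueError("both area lists must be non-empty")
    return math.sqrt(median(reference_areas) / median(target_areas))


def rescaled_size(width, height, factor):
    """Return the (width, height) in pixels after resizing by the linear factor."""
    return max(1, round(width * factor)), max(1, round(height * factor))


# Hypothetical nuclear areas (in pixels) measured in two batches.
reference = [400, 420, 380]
target = [100, 105, 95]
factor = estimate_scale_factor(reference, target)
new_size = rescaled_size(512, 384, factor)
```

The median is used here because it is less sensitive than the mean to segmentation outliers such as merged or fragmented nuclei; the paper's model may well use a different statistic of the distribution.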

  • ABSTRACT: With the objective of bringing clinical decision support systems to reality, this article reviews histopathological whole-slide imaging informatics methods, associated challenges, and future research opportunities. This review targets pathologists and informaticians who have a limited understanding of the key aspects of whole-slide image (WSI) analysis and/or a limited knowledge of state-of-the-art technologies and analysis methods. First, we discuss the importance of imaging informatics in pathology and highlight the challenges posed by histopathological WSI. Next, we provide a thorough review of current methods for: quality control of histopathological images; feature extraction that captures image properties at the pixel, object, and semantic levels; predictive modeling that utilizes image features for diagnostic or prognostic applications; and data and information visualization that explores WSI for de novo discovery. In addition, we highlight future research directions and discuss the impact of large public repositories of histopathological data, such as the Cancer Genome Atlas, on the field of pathology informatics. Following the review, we present a case study to illustrate a clinical decision support system that begins with quality control and ends with predictive modeling for several cancer endpoints. Currently, state-of-the-art software tools only provide limited image processing capabilities instead of complete data analysis for clinical decision-making. We aim to inspire researchers to conduct more research in pathology imaging informatics so that clinical decision support can become a reality.
    Article · Aug 2013 · Journal of the American Medical Informatics Association
  • ABSTRACT: Researchers have developed computer-aided decision support systems for translational medicine that aim to objectively and efficiently diagnose cancer using histopathological images. However, the performance of such systems is confounded by nonbiological experimental variations or "batch effects" that can commonly occur in histopathological data, especially when images are acquired using different imaging devices and patient samples. This is even more problematic in large-scale studies in which cross-laboratory sharing of large volumes of data is necessary. Batch effects can change quantitative morphological image features and decrease the prediction performance. Using four batches of renal tumor images, we compare one image-level and five feature-level batch effect removal methods. Principal component variation analysis shows that batch is a large source of variance in image features. Results show that feature-level normalization methods reduce batch-contributed variance to almost zero. Moreover, feature-level normalization, especially ComBatN, improves cross-batch and combined-batch prediction performance. Compared to no normalization, ComBatN improves performance in 83% and 90% of cross-batch and combined-batch prediction models, respectively.
    Article · May 2014
  • ABSTRACT: Clinical decision support systems use image processing and machine learning methods to objectively predict cancer in histopathological images. Integral to the development of machine learning classifiers is the ability to generalize from training data to unseen future data. A classification model's ability to accurately predict the class label for new unseen data is measured by performance metrics, which also inform the classifier model selection process. Based on our research, commonly used metrics in the literature (such as accuracy and the ROC curve) do not accurately reflect the trained model's robustness. To the best of our knowledge, no research has been conducted to quantitatively compare performance metrics in the context of cancer prediction in histopathological images. In this paper, we evaluate various performance metrics and show that the Lift metric has the highest correlation between internal and external validation sets of a nested cross-validation pipeline (R² = 0.57). Thus, we demonstrate that the Lift metric best generalizes classifier performance among the 23 metrics that were evaluated. Using the Lift metric, we develop a classifier with a misclassification rate of 0.25 (4-class classifier) for data that the model was not trained on (external validation).
    Article · Aug 2014