REPRODUCIBILITY OF IMAGE
ANALYSIS FOR BREAST ULTRASOUND
M. Galperin, M.P. Andre, C.H. Barker, L.K. Olson, M. O’Boyle, K. Richman,
Almen Laboratories, Inc., Vista, California 92084, USA; Dept. of Radiology, San Diego VA
Healthcare System, San Diego, California 92161, USA; Dept. of Radiology, University of
California, San Diego, California 92093, USA
Abstract: We employ a Case-Based Reasoning approach to analyze breast masses in
ultrasound and to classify them for level of suspicion for cancer following the
ACR BI-RADS® protocol. Our computer-aided imaging system (Breast
Companion®, BC) measures numeric features of the mass, determines Relative
Similarity (RS) between the mass of interest and images in a database of
masses with known findings and outcomes, then retrieves and displays the
images of the most similar known masses instantaneously for the radiologist to
review during interpretation. This study tested BC for reproducibility of
performance in comparison to that of three radiologists under a variety of
operating conditions. The long-term goal is to standardize diagnosis, reduce
radiologist variability and reduce false positives.
Key words: Computer-aided diagnosis, Breast cancer, Ultrasound, Sonography, ROC
analysis, Relative similarity
INTRODUCTION

The breast ultrasound (US) exam is widely recognized to be one of the more
difficult imaging procedures to perform and interpret. The American College
of Radiology developed the Breast Imaging Reporting and Data System
(BI-RADS®) scheme to standardize scanning, interpretation and reporting.
Acceptance and utilization of BI-RADS® are increasing, but the method has
proven difficult to teach:
the quality of breast ultrasound is still regarded as highly operator
dependent, and many published reports show radiologists are uncomfortable
with the number of benign and malignant masses that overlap
in appearance. As a result, even with combined information from
mammography and ultrasound it is often the case that each radiologist will
apply a different threshold for deciding to biopsy a suspicious mass.
Over several years, we have developed, tested and validated a
sophisticated computer-aided system for analyzing breast ultrasound [1–5].
This system (Breast Companion®, BC, Almen Laboratories, Inc.) provides
extensive tools to define and segment breast masses, computes numeric
features of the mass, compares the mass using Relative Similarity (RS) to
images in a database of masses with known findings and outcomes
(Reference Template Database), then retrieves and displays almost
instantaneously a cluster of the most similar cases. BC uses case-based
reasoning analysis (computerized lesion assessment, CLA) derived from
measurement of the following categories of lesion parameters: margins,
shape, echogenicity, echo texture, orientation, and posterior acoustic
attenuation pattern. BC requires no classifier training. An entirely new
graphical user interface was developed for BC that incorporates a medical
reporting system in conformance with the BI-RADS® sonography protocol. It is
tailored for the diagnostic breast ultrasound examination, with the goal of
helping to standardize interpretation and reporting and of potentially
reducing radiologist variability.
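To make the feature measurements concrete, the six parameter categories can be pictured as components of the feature vector used later for similarity ranking. A minimal Python sketch follows; the field names are invented here for illustration, since the chapter does not enumerate BC's actual numeric features:

```python
from dataclasses import dataclass, astuple

@dataclass
class LesionFeatures:
    """One measured feature vector for a segmented mass.

    Field names are illustrative stand-ins for the six parameter
    categories named in the text, not BC's actual measurements."""
    margin_irregularity: float    # margins
    shape_elongation: float       # shape
    mean_echogenicity: float      # echogenicity
    texture_heterogeneity: float  # echo texture
    orientation_angle: float      # orientation
    posterior_attenuation: float  # posterior acoustic attenuation pattern

    def as_vector(self):
        """Return the vector form used for similarity ranking below."""
        return astuple(self)
```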
The purpose of this study was to examine some factors that may impact
the reproducibility of results from this CAD system in future clinical use.
ROC analysis (Analyze-It®) was used as the performance measure to estimate
BC's variability in comparison to the intra- and inter-reader variability of three
radiologists. Specifically, the following issues were addressed in this study:
(1) BC performance compared to that of three radiologists reading three sets
of similar or identical cases, (2) radiologists’ reproducibility and (3) BC
reproducibility of performance with three independent datasets having
increasing number of test cases (152, 291, 595) compared to a constant
Reference Template Database (331 templates).
METHODS AND MATERIALS
BI-RADS® requires the radiologist to assign an assessment value of 0–6 to
an image, where 0 is equivocal (requires more information), 1 is “no
finding,” 2 is “definitely benign,” 3 is “probably benign,” 4 is “probably
malignant,” 5 is “definitely malignant,” and 6 is a known cancer.
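As a quick reference for the sketches that follow, this scale can be written as a simple lookup table (a convenience for the illustrative examples below, not part of BC):

```python
# BI-RADS® assessment categories as paraphrased in the text above.
BIRADS_ASSESSMENT = {
    0: "equivocal (requires more information)",
    1: "no finding",
    2: "definitely benign",
    3: "probably benign",
    4: "probably malignant",
    5: "definitely malignant",
    6: "known cancer",
}
```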
Breast Ultrasound Computer-Aided Diagnosis
The procedural steps for BC analysis of a mass identified on breast
ultrasound are as follows (a schematic sketch in code appears after the list):

1. Radiologist reviews all images in the study and then selects image
   view(s) to be assessed.
2. Radiologist pre-processes the image using a standard set of tools such as
   windowing, enhancing, smoothing, etc.
3. Radiologist guides automatic or manual segmentation of the suspicious
   mass. BC measures image features of the defined mass.
4. Radiologist selects BI-RADS® reporting descriptors of the mass from the
   report chart and records the overall BI-RADS® Assessment of the lesion.
5. BC retrieves and displays the most similar masses from the Reference
   Template Database.
6. Based on all information, including the retrieved similar cases with
   known findings, the radiologist decides whether to have BC compute an
   independent assessment, CLA (Fig. 1).
7. Radiologist completes an intermediate report that contains impressions,
   the BI-RADS® Assessment and, optionally, the results of BC CLA, which
   represents the highest assessed BI-RADS® Category for a view selected by
   the radiologist.
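Read procedurally, the seven steps can be sketched as a small driver. Every type and function below is a hypothetical stand-in, not BC's API; steps 1–4 are interactive radiologist actions in BC and are simulated here with fixed inputs:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Template:
    features: list   # stored feature vector of a known case
    birads: int      # confirmed finding on the 2 (benign) ... 5 (malignant) scale

@dataclass
class Report:
    descriptors: dict = field(default_factory=dict)  # step 4: BI-RADS descriptors
    assessment: int = 0                              # radiologist's overall category
    cla: Optional[float] = None                      # step 6: optional computer score

def most_similar(features, template_db, k=7):
    """Step 5 stand-in: rank templates by a toy feature-space distance
    (the Relative Similarity sketch in the next section refines this)."""
    dist = lambda t: sum((a - b) ** 2 for a, b in zip(features, t.features))
    return sorted(template_db, key=dist)[:k]

def compute_cla(retrieved):
    """Step 6 stand-in: summarize the retrieved known findings as a 2-5 score."""
    return sum(t.birads for t in retrieved) / len(retrieved)

# Toy usage, with steps 1-3 (view selection, pre-processing, segmentation
# and feature measurement) assumed already performed:
db = [Template([0.2, 0.4], 2), Template([0.9, 0.8], 5), Template([0.3, 0.5], 3)]
similar = most_similar([0.25, 0.45], db, k=2)
report = Report(assessment=3, cla=compute_cla(similar))   # step 7 records both
print(report)
```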
Relative Similarity (RS) of the unknown mass is determined as follows.
The combination of measured features of the mass from step 3 above
may be represented by an N-dimensional vector P used to calculate the
“Relative Similarity,” R, of one lesion to another. A new case with an
unknown finding is compared directly to the database of stored images and a
measure of R is computed for the different lesions with confirmed findings.
Similarity is calculated for a particular lesion Pi (i is the index of this “lesion in
question” object) compared to the other lesions, Pk (k = 1,…,L). Image pre-
processing reduces speckle, increases contrast, enhances edge gradient, and
reduces shading effects to facilitate segmentation of the borders of the mass
but all measurements are made on the unprocessed image. Segmentation
involves a sequence of multi-level thresholding, radial gradient and region
growing. The process is successful with all patient cases but the radiologist
may choose to guide or edit the segmentation of more difficult masses. Our
system requires that the radiologist always be “in the loop” throughout the
process to ensure accuracy of lesion border definition. The process is open
(“white box”) and the reasons for a particular CLA assessment may be
displayed. Retrieval of the most similar cases for the radiologist to review
during interpretation is nearly instantaneous. BC provides a numeric 2–5
score for CLA on a continuous scale; therefore, in this study the performance
of the CAD was examined for a variety of conditions using ROC analysis [6].
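The chapter does not reproduce the RS formula itself, so the sketch below substitutes a simple normalized feature-space distance mapped to [0, 1], plus a similarity-weighted CLA; both are placeholders for illustration, not BC's actual computations:

```python
import numpy as np

def relative_similarity(p_i, p_k, scale):
    """Placeholder for BC's Relative Similarity R between feature vectors.

    R = 1 means identical features; `scale` holds assumed per-feature
    normalization constants. The true formula is not given in the text."""
    p_i, p_k, scale = (np.asarray(v, dtype=float) for v in (p_i, p_k, scale))
    d = np.linalg.norm((p_i - p_k) / scale) / np.sqrt(p_i.size)
    return float(np.clip(1.0 - d, 0.0, 1.0))

def retrieve_and_score(p_query, templates, scale, k=7):
    """Rank stored templates Pk (k = 1,...,L) by R, keep the top k, and form
    a toy CLA on the continuous 2-5 scale as the similarity-weighted mean of
    the retrieved templates' confirmed BI-RADS categories."""
    ranked = sorted(templates,
                    key=lambda t: relative_similarity(p_query, t["P"], scale),
                    reverse=True)[:k]
    w = np.array([relative_similarity(p_query, t["P"], scale) for t in ranked])
    c = np.array([t["birads"] for t in ranked])
    return ranked, float((w * c).sum() / w.sum())

# Seven retrieved benign templates yield a low CLA, as in Fig. 1.
templates = [{"P": [0.20 + 0.01 * j, 0.40], "birads": 2} for j in range(10)]
top7, cla = retrieve_and_score([0.22, 0.41], templates, scale=[1.0, 1.0])
print(f"CLA = {cla:.1f}")   # 2.0 here; the Fig. 1 case scored 2.3
```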
Figure 1 shows a screen image of BC for a complex cyst compared to
other images in the Reference Template Database. The mass is dark, with
some internal echoes consistent with a cyst but with irregular indistinct
margins more consistent with a solid mass and higher suspicion for cancer.
Seven cases are automatically retrieved and displayed (with contours) on the
right listed in rank order of Relative Similarity. In this case, all seven of the
“similar” masses were benign and a low CLA score of 2.3 was calculated.
Figure 1. Complex cyst (left image) is compared to other images in the Reference
Template Database.
Under IRB approval, four different sets of breast sonography data were
developed by retrieving cases chronologically from the medical center PACS
and computerized medical records systems. When a suitable case was found
where truth was confirmed (two-year benign follow up or biopsy), all
ultrasound images of the study were examined to ensure they were free of
graphic overlays or markers, had at least two views of each mass, had
minimal artifacts, had conclusive pathology results, etc., in accordance with
our acceptance criteria. Each included case was anonymized and added
with all of its images to the Research PACS archive. Cases were assigned a
sequential code number in the order of retrieval following our research
protocol. Although arduous, our data mining methods are now highly refined
and offer a very high yield of cases suitable for our research protocol.
The sizes of the three data sets read by the radiologists were: Set 1 (112
cases), Set 2 (215) and Set 3 (331). All three data sets contained non-overlapping
cases and had the following average mix of findings: 30% simple cyst, 18%
complicated cysts, 30% solid benign and 22% malignant. A fourth
independent data set (Set 4, 595 cases) was recently assembled with a
statistically identical mix of cases. The radiologists’ interpretation of these
cases is not yet complete but performance of BC was measured with this
new data set using the Reference Template Database (331 templates). For
comparison we used a cohort of 152 cases assembled in a similar manner
during our previous validation study in 2003–04 [4,5]. The new Set 4 of 595
cases was sampled to provide a smaller set of cases (291) with the same mix
of findings. The age range was comparable for all data sets, 21–90 years old.
The age distribution and mix of findings in these research databases
correspond to the 5-year average population of cases in our Breast Imaging
Service so they are presumed to be representative samples.
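How the 291-case subset was drawn from Set 4 is not described; sampling within each finding category is one natural way to preserve the mix, sketched here under that assumption (not the authors' procedure):

```python
import random

def stratified_subsample(cases, mix, n_target, seed=0):
    """Draw a smaller case set that preserves the mix of findings, e.g. to
    derive a 291-case set from Set 4 (595). `mix` maps finding -> share.
    Rounding means the result can differ from n_target by a case or two."""
    rng = random.Random(seed)
    sample = []
    for finding, share in mix.items():
        pool = [c for c in cases if c["finding"] == finding]
        sample.extend(rng.sample(pool, round(share * n_target)))
    return sample

# Shares quoted in the text for the research data sets:
MIX = {"simple cyst": 0.30, "complicated cyst": 0.18,
       "solid benign": 0.30, "malignant": 0.22}
```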
RESULTS

For the three data sets (N = 112, 215, 331), the area under the ROC curve, AZ,
for the three radiologists varied from 0.90 ± 0.03 to 0.83 ± 0.03; inter- and
intra-reader ROC areas were consistent within each data set and not
significantly different (Table 1). In Data Set 3, with 331 lesions, the weighted
kappa for the radiologists varied from 0.43 to 0.53, suggesting a “moderate”
level of agreement. The standard deviation for AZ was consistently ±0.02 to
±0.03 regardless of the size of the data set. BC was in an early form of
development when Set 1 was tested, but by the time Set 3 was analyzed,
extensive optimization of BC had been completed. The stand-alone performance
of BC was significantly higher than that of the three radiologists on the same
Data Set 3 (Table 1).
Table 1. Radiologist ROC performance (AZ ± SD; Set 2 values show two readings per radiologist)

Set 1 (112):  0.91 ± 0.02    0.90 ± 0.03    0.87 ± 0.03
Set 2 (215):  0.86 ± 0.03 / 0.87 ± 0.02    0.85 ± 0.03 / 0.83 ± 0.03    0.87 ± 0.02 / 0.87 ± 0.02
Set 3 (331):  BC (stand-alone): 0.98 ± 0.03
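The weighting scheme behind the reported kappa values is not stated; as an illustration, a weighted kappa between two readers' BI-RADS assessments can be computed with scikit-learn, assuming linear weights and toy data:

```python
from sklearn.metrics import cohen_kappa_score

# Toy BI-RADS assessments from two radiologists on the same ten lesions.
reader_a = [2, 3, 4, 2, 5, 3, 4, 2, 3, 5]
reader_b = [2, 3, 3, 2, 5, 4, 4, 3, 3, 4]

# Weighted kappa penalizes larger category disagreements more heavily;
# the chapter reports 0.43-0.53 ("moderate" agreement) on Set 3.
kappa = cohen_kappa_score(reader_a, reader_b, weights="linear")
print(f"weighted kappa = {kappa:.2f}")
```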
Table 2. Breast Companion ROC performance (AZ ± SD) versus number of test cases

        152 cases       291 cases       595 cases
BC      0.958 ± 0.02    0.965 ± 0.02    0.982 ± 0.02
When the size of the test set was increased from 152 to 595 cases (Table 2),
the area under the ROC curve, AZ, for Breast Companion increased from 0.96
to 0.98 with a consistent standard deviation of ±0.02. The absolute number of
benign cases was of course larger in the larger data sets, but cancer
prevalence remained constant at 21 ± 1%.
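As a sanity check on the reported standard deviations (not the authors' computation), the standard error of an ROC area can be approximated with the formula from Hanley and McNeil's 1982 companion paper to reference [6], using assumed case counts:

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an ROC area (Hanley & McNeil, 1982).

    auc: area under the curve; n_pos, n_neg: numbers of actually
    malignant and benign cases, respectively."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

# Assumed counts: 595 cases at 21% cancer prevalence -> ~125 malignant.
print(f"{hanley_mcneil_se(0.982, 125, 470):.3f}")   # ~0.008
```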
DISCUSSION

It remains to be seen how much variation will be acceptable for ultrasound
CAD in practice, but the variability of the radiologists themselves offers a
potential standard. The radiologists have not yet completed analysis of Set 4
(595 cases), so direct comparison to BC is planned for a future study.
Nonetheless, the AZ values of BC for Set 3 and for all three data sets in
Table 2 are significantly higher than those of the three radiologists in
Table 1. It appears BC's performance is
consistent and stable with the highly suspicious cases (BI-RADS®
Categories 4 and 5) and improves with the benign component because of
very high accuracy of CLA on low-suspicion-level lesions (BI-RADS®
Categories 2 and 3).
These results suggest we may be able to study the impact on radiologist
reading performance by having them interpret the set of 595 cases with and
without using BC (95% power). The goal will be to estimate the potential
reduction in the number of False Positives without a statistically significant
increase in False Negatives. Much additional analysis needs to be done
including evaluating effects of the size of the Reference Template Database
on the reading performance and accuracy of CLA computations in general.
ACKNOWLEDGMENTS

This work was supported in part by NIH/NCI 1 R41 CA108053-01 and
NIH/NCI 1 R44 CA112858-01.
REFERENCES

1. MP Andre, M Galperin, G Contro, N Omid, L Olson: Acoustical Imaging 28 (2007) p. 267.
2. MP Andre, M Galperin, G Contro, N Omid, et al.: Acoustical Imaging 28 (2007) p. 341.
3. MP Andre, M Galperin, LK Olson, et al.: SPIE Medical Imaging 4322 (2001) p. 507.
4. M Galperin: SPIE Medical Imaging 5034 (2003).
5. MP André, M Galperin, LK Olson, et al.: Acoustical Imaging 26 (2002) p. 453.
6. JA Hanley, BJ McNeil: Radiology 148 (1983) p. 839.