Received: 29 July 2022 Revised: 17 March 2023 Accepted: 17 March 2023
DOI: 10.1002/mp.16389
RESEARCH ARTICLE
Automated segmentation and measurement of the female
pelvic floor from the mid-sagittal plane of 3D ultrasound
volumes
Zachary Szentimrey1, Golafsoun Ameri2, Christopher X. Hong3,
Rachel Y. K. Cheung4, Eranga Ukwatta1, Ahmed Eltahawi2,5
1School of Engineering, University of Guelph,
Guelph, Ontario, Canada
2Cosm Medical, Toronto, Ontario, Canada
3Department of Obstetrics & Gynaecology,
University of Michigan, Ann Arbor, Michigan,
USA
4Department of Obstetrics & Gynaecology,
Faculty of Medicine, The Chinese University
of Hong Kong, Hong Kong
5Information System Department, Faculty of
Computers and Informatics, Suez Canal
University, Ismailia, Egypt
Correspondence
Zachary Szentimrey and Eranga Ukwatta,
School of Engineering, University of Guelph,
Guelph, Ontario, Canada.
Email: zszentim@uoguelph.ca and
eukwatta@uoguelph.ca
Funding information
Mitacs Accelerate program, NSERC
Discovery grant and Cosm Medical; Canada
Foundation for Innovation; Government of
Ontario; Ontario Research Fund—Research
Excellence; University of Toronto
Abstract
Background: Transperineal ultrasound (TPUS) is a valuable imaging tool for evaluating patients with pelvic floor disorders, including pelvic organ prolapse (POP). Currently, measurements of anatomical structures in the mid-sagittal plane of 2D and 3D US volumes are obtained manually, which is time-consuming, has high intra-rater variability, and requires an expert in pelvic floor US interpretation. Manual segmentation and biometric measurement can take 15 min per 2D mid-sagittal image by an expert operator. An automated segmentation method would provide quantitative data relevant to pelvic floor disorders and improve the efficiency and reproducibility of segmentation-based biometric methods.
Purpose: Develop a fast, reproducible, and automated method of acquiring biometric measurements and organ segmentations from the mid-sagittal plane of female 3D TPUS volumes.
Methods: Our method used a nnU-Net segmentation model to segment the pubic symphysis, urethra, bladder, rectum, rectal ampulla, and anorectal angle in the mid-sagittal plane of female 3D TPUS volumes. We developed an algorithm to extract relevant biometrics from the segmentations. Our dataset included 248 3D TPUS volumes (126/122 rest/Valsalva split) from 135 patients. System performance was assessed by comparing the automated results with manual ground truth data using the Dice similarity coefficient (DSC) and average absolute difference (AD). Intra-class correlation coefficient (ICC) and time difference were used to compare the reproducibility and efficiency of the manual and automated methods, respectively. High ICC, low AD, and a reduction in time indicated an accurate and reliable automated system, making TPUS an efficient alternative for POP assessment. A paired t-test and the non-parametric Wilcoxon signed-rank test were conducted, with p < 0.05 determining significance.
Results: The nnU-Net segmentation model reported average DSC and p values (in brackets), compared to the next best tested model, of 87.4% (<0.0001), 68.5% (<0.0001), 61.0% (0.1), 54.6% (0.04), 49.2% (<0.0001), and 33.7% (0.02) for the bladder, rectum, urethra, pubic symphysis, anorectal angle, and rectal ampulla, respectively. The average ADs for the bladder neck position, bladder descent, rectal ampulla descent, and retrovesical angle were 3.2 mm, 4.5 mm, 5.3 mm, and 27.3°, respectively. The biometric algorithm had an ICC > 0.80 for the bladder neck position, bladder descent, and rectal ampulla descent when compared to manual measurements, indicating high reproducibility. The proposed algorithms required approximately 1.27 s to analyze one image. The manual ground truths were produced by a single expert operator. In addition, due to the high operator dependency of TPUS image collection, further studies with images collected from multiple operators are needed.
Conclusions: Based on our search in scientific databases (i.e., Web of Science, IEEE Xplore Digital Library, Elsevier ScienceDirect, and PubMed), this is the first reported work of an automated segmentation and biometric measurement system for the mid-sagittal plane of 3D TPUS volumes. The proposed algorithm pipeline improves efficiency (1.27 s compared to 15 min manually) and has high reproducibility (high ICC values) compared to manual TPUS analysis for pelvic floor disorder diagnosis. Further studies are needed to verify this system's viability using multiple TPUS operators and multiple experts for performing manual segmentation and extracting biometrics from the images.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2023 The Authors. Medical Physics published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.
KEYWORDS
3D ultrasound, deep learning, female pelvic floor, medical image segmentation, pelvic organ
prolapse
1 INTRODUCTION
Transperineal ultrasound (TPUS) is a valuable imaging
tool for evaluating patients with a variety of pelvic floor disorders, including pelvic organ prolapse (POP), urinary and fecal incontinence, voiding dysfunction, and pelvic floor trauma, among others.1,2 These highly prevalent disorders can have a detrimental impact on patients' quality of life.3 For example, patients with pelvic organ prolapse have abnormal descent of one or more pelvic organs (i.e., bladder, uterus, vagina) through the genital hiatus, which is often experienced by the patient as a persistent bothersome bulge protruding from the vaginal opening.2 While evaluation of pelvic floor pathology often involves a clinical pelvic exam, ultrasound imaging of the pelvic floor provides additional quantitative assessment of pelvic floor structures that can aid in diagnosis and treatment selection.4
TPUS is performed using a two-dimensional (2D)
or three-dimensional (3D) ultrasound transducer, which
is placed on the perineum to acquire a sonogram or
ultrasound volume of pelvic floor structures.1,5 While
2D TPUS remains common due to the prevalence and
availability of 2D US scanners, 3D US is becoming
increasingly common.6–8 Compared to 2D US, 3D US
allows for pelvic floor assessment in multiple planes
simultaneously, and is thus less operator-dependent in
acquiring the true mid-sagittal plane.1 When used for evaluating pelvic organ prolapse, the anatomic positions of the pubic symphysis, urethra, bladder, rectum, rectal ampulla, and puborectalis muscle (represented by the anorectal angle) are typically captured for analysis in the mid-sagittal plane.9
The locations of anatomical structures are currently measured manually from the US volumes, which is time-consuming and requires a reviewer with prior training in pelvic floor ultrasound interpretation.10–16 An expert pelvic floor US interpreter may take up to 15 min to manually segment and analyze a single 2D mid-sagittal plane image. Moreover, the time required to analyze a single patient's data increases when additional images per patient are taken during pelvic muscle maneuvers, such as Valsalva and contraction. Furthermore, prior studies have identified significant intra-rater variability in manual measurement of pelvic floor biometrics such as bladder neck descent.17 These limitations have hindered more widespread use of TPUS for evaluating pelvic floor disorders in both research and clinical settings. In addition, emerging technologies, such as patient-specific 3D printed vaginal pessaries designed using ultrasound biometrics, may require an efficient and accurate means of analyzing TPUS data. As such, there is the opportunity and need for automated methods for biometric extraction, especially for 3D TPUS volumes, which are increasingly common in use.6–8 Automated methods are reproducible, quick, less expensive with regard to human involvement, and non-user-dependent compared to manual methods.18 Previous studies have shown that biometrics from the mid-sagittal plane, including the bladder neck descent, bladder descent, and rectal ampulla descent, are key metrics for determining the severity of POP.9–16 These studies calculated the biometrics manually, and no automated biometric extraction method has been reported for female pelvic
3D US. However, other applications of 3D US, includ-
ing prostate and neonatal cerebral ventricle applications,
have used automated methods for segmenting and
measuring clinically useful biometrics such as ventri-
cle size and prostate volume.19–21 These studies used
the U-Net model as the backbone and benchmark for
their segmentation methods, which had shown success
in 3D US images.19–21 In addition to 3D US, the U-Net
model has been successfully used over 2500 times for
various medical image applications and was developed
specifically for small medical image datasets.22–24
This work presents a fast, reliable, and automated
method for acquiring biometrics from the mid-sagittal
plane of 3D US female pelvic floor volumes. The
proposed method consists of an automated con-
volutional neural network (CNN) with an nnU-Net25
architecture to segment structures of interest (i.e., pubic
symphysis, urethra, bladder, rectum, rectal ampulla and
anorectal angle), followed by a segmentation-based
automated biometric extraction system that can iden-
tify and measure useful biometric distances. To our
knowledge, there are no reported studies that use
an automated biometric extraction and segmentation
method for the mid-sagittal plane in 3D US volumes.
Based on our search in scientific databases (i.e., Web of Science, IEEE Xplore Digital Library, Elsevier ScienceDirect, and PubMed) including the key terms: female pelvic floor, 3D ultrasound, pelvic organ prolapse, this is the first
reported work of an automated segmentation and bio-
metric measurement system for the mid-sagittal plane
of female pelvic floor 3D US volumes. An automated
system can improve the efficiency and reproducibility
of TPUS analysis, advancing the current methods used
for diagnosis of pelvic floor disorders.
2 MATERIALS AND METHODS
2.1 Description of the dataset
The research ethics board approval was obtained at the
Chinese University of Hong Kong and the data were
available to the University of Guelph through a data-
sharing agreement. The available dataset consisted of
248 3D TPUS volumes from 135 patients who pre-
sented to a tertiary urogynecology clinic with symptoms
of pelvic floor disorders (i.e., POP, urinary incontinence).
The patients from our study were classified using the
pelvic organ prolapse quantification (POP-Q) system.26
There were 10, 93, 25, and 3 patients classified as hav-
ing stage 1, 2, 3 or 4 POP, respectively. Four patients did
not have POP. As a comparison, stage 1 indicates that the most distal portion of the prolapsed organ is no more than 1 cm from the hymen, stage 2 is between 1 cm from and 1 cm past the hymen, stage 3 is more than 1 cm but less than 2 cm past the hymen, and stage 4 is complete eversion of the organ. In US images, the distance of each organ from the reference line provides a means of determining the degree of POP: organs above the reference line are likely prolapsed, while those below it are less likely to be. Of the 248 3D US volumes, 126 were captured at rest and 122 were captured during a Valsalva maneuver. Volumes were captured using
a 3D volumetric probe (RAB4-8, GE Healthcare, IL,
USA). A single expert with prior training in pelvic floor
ultrasound interpretation then identified the mid-sagittal
plane and manually segmented relevant pelvic anatom-
ical structures and markers; these included the pubic
symphysis, urethra, bladder, rectum, rectal ampulla, and
anorectal angle. The five US biometrics extracted from each image included the pubic symphysis horizontal reference line, bladder descent, bladder neck position, rectal ampulla descent, and retrovesical angle. These biometrics were also measured manually by the expert based on the mid-sagittal plane US image. The manual segmentations and biometrics were used as ground truth when comparing segmentation methods and were used to validate our automated biometric extraction system. The methods we examined used the same data split and the same images in each split to ensure a proper comparison could be made. The data were split patient-wise, such that images from the same patient were not present in multiple sets; this prevents the segmentation models from memorizing patient-specific features. We separated the images into train/validation/test sets of 167/10/71 images, respectively. The training set had 91 patients, the validation set had five patients, and the test set had 39 patients. The rest/Valsalva image splits for the training, validation, and test sets were 84/83, 5/5, and 37/34, respectively. The validation set was used for model tuning and hyperparameter selection.
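For illustration, a patient-wise split of this kind can be sketched with scikit-learn's GroupShuffleSplit, which assigns whole groups (here, patients) to a single set. The arrays and proportions below are placeholders, not the study's actual data-handling code:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder arrays: one entry per mid-sagittal image, with its patient ID.
images = np.arange(248)
patients = np.random.randint(0, 135, size=248)

# Hold out a test set, then split the remainder into train/validation.
# Note: test_size is the proportion of *groups* (patients), so the image
# counts only approximate the paper's 167/10/71 split.
gss = GroupShuffleSplit(n_splits=1, test_size=0.29, random_state=0)
trainval_idx, test_idx = next(gss.split(images, groups=patients))

gss_val = GroupShuffleSplit(n_splits=1, test_size=0.05, random_state=0)
train_rel, val_rel = next(gss_val.split(trainval_idx, groups=patients[trainval_idx]))
train_idx, val_idx = trainval_idx[train_rel], trainval_idx[val_rel]

# No patient appears in more than one set.
assert not set(patients[train_idx]) & set(patients[test_idx])
assert not set(patients[train_idx]) & set(patients[val_idx])
```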
2.2 Overview of total pipeline
The proposed segmentation and biometric extraction pipeline is shown in Figure 1. The pipeline used a nnU-Net25 model for segmentation of the mid-sagittal plane and biometric extraction to measure distances of interest from the segmentation outputs. The data were pre-processed using normalization to produce images with intensities between 0 and 1. After pre-processing, the images were fed into the nnU-Net segmentation model, which produced an output segmentation mask. The biometric extraction method was applied to the output masks to extract various biometrics, including the pubic symphysis horizontal reference line, bladder descent, bladder neck position, rectal ampulla descent, and retrovesical angle (representing the puborectalis muscle). These biometrics were overlaid onto the segmentation masks and output in text format for visibility.
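As a concrete illustration, the min-max intensity normalization used in the pre-processing step can be written as follows (a minimal sketch; the function name is ours, not from the study's code):

```python
import numpy as np

def normalize_intensity(image: np.ndarray) -> np.ndarray:
    """Min-max normalize US image intensities to the [0, 1] range."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo + 1e-8)  # epsilon guards against a constant image
```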
2.3 nnU-Net segmentation model
The proposed segmentation model was the 2D nnU-
Net,25 which is a general segmentation model that
can adapt and reconfigure itself to produce optimal
results for any given medical image dataset. Tuning
and developing CNN models manually for a specific
medical image application is challenging and requires
much expertise. However, nnU-Net does not require any
expertise and holistically configures the entire segmen-
tation pipeline without manual intervention.
FIGURE 1 Proposed automated segmentation and biometric extraction pipeline. The segmentation model is a 2D nnU-Net network which
segments structures of interest from the 2D mid-sagittal plane. The biometric extraction system accepts the segmentation output, performs
post-processing, and uses an extraction method to measure the biometrics of interest.
The nnU-Net model architecture consists of an encoder-decoder framework, where the images are first encoded into a
higher dimensional space then decoded back into the
spatial domain. We chose the nnU-Net for its ability
to adapt well to segmentation tasks with few images
by being data-efficient and making design choices that
have worked on larger and more diverse datasets.
In addition, the nnU-Net model was validated against
ten different datasets from the Medical Segmentation
Decathlon27 and was shown to perform the best overall
compared to all other methods. The nnU-Net works for
datasets with variable image sizes, such as ours, meaning we did not need to downsample and lose image resolution.
2.4 Segmentation model
hyperparameters
The nnU-Net was tuned automatically to yield the best
configuration for the input data, with this configuration described as follows. The nnU-Net first normalized
the image intensities between 0 and 1. The nnU-Net
included 3 × 3 convolutional blocks with the number of pooling operations per axis being [5, 6] for the [x, y] axes, respectively. A patch size of 192 × 320 was used with
a batch size of eight images. The model was trained
for 1000 epochs at 250 batches per epoch. The Adam
optimizer28 was used with an initial learning rate of 1e-2
that decreased by 1e-6 every epoch. The output activa-
tion layer used a softmax function due to the multi-class
format of our data. The multi-class loss function used a combination of Dice and cross-entropy losses, as shown in Equation (1).

$$\mathcal{L}_{\mathrm{Total}} = \mathcal{L}_{\mathrm{Dice}} + \mathcal{L}_{\mathrm{CE}} \qquad (1)$$
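For illustration, a combined soft Dice plus cross-entropy loss of this form can be sketched in PyTorch as follows. This is our own minimal implementation, not the nnU-Net source, which configures its loss internally:

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Combined soft Dice + cross-entropy loss for multi-class segmentation.
    logits: (B, C, H, W) raw network outputs; target: (B, H, W) class indices."""
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial dimensions, per class
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    soft_dice = (2.0 * intersection / (cardinality + eps)).mean()
    return ce + (1.0 - soft_dice)  # minimizing (1 - Dice) maximizes overlap
```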
In addition to the nnU-Net, we also implemented and manually tuned a vanilla 2D U-Net as a benchmark and an Attention U-Net model as a current state-of-the-art model.29 For the vanilla 2D U-Net and Attention U-Net, the images used an image size of 192 × 320 for an unbiased comparison to the nnU-Net, with image values normalized between 0 and 1. The vanilla U-Net and Attention U-Net used 3 × 3 convolutional filters with [6, 6] pooling operations for the [x, y] axes, respectively. A batch size of eight with the Adam optimizer28 was used with a learning rate of 1e-2. The vanilla U-Net and Attention U-Net used the same Dice and cross-entropy loss function as the nnU-Net. A softmax output activation function was used. The convolutional filters were initialized with the He normal distribution.30 Batch normalization and dropout layers were applied to mitigate overfitting.31
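A minimal sketch of the optimizer and per-epoch learning-rate decay described above is shown below; the model is a stand-in, since the exact network definitions are not reproduced here:

```python
import torch

model = torch.nn.Conv2d(1, 7, kernel_size=3)  # stand-in for the segmentation network

# Adam with an initial learning rate of 1e-2.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Decrease the learning rate by 1e-6 every epoch; LambdaLR scales the base lr,
# so the schedule is expressed as a multiplicative factor.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: (1e-2 - 1e-6 * epoch) / 1e-2)

for epoch in range(1000):
    # ... training over 250 batches per epoch would go here ...
    scheduler.step()
```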
2.5 Post-processing of the
segmentation
Due to the challenges in segmenting small struc-
tures in the US images, post-processing techniques
were applied to improve segmentation accuracy, which
would in turn improve biometric accuracy. Separate
techniques were applied for each structure and these
post-processing techniques (i.e., filter sizes for mor-
phological operations) were tuned iteratively on the
validation set until the optimal techniques were found
using Dice similarity coefficient (DSC) as the tuning
metric. The post-processing methods for each organ
can be seen in Figure 2. All structures required post-processing except for the bladder and rectum, which did not suffer from data imbalance. Due to the poor signal-to-noise ratio, speckling, low-quality images, and large data imbalance, structures such as the pubic symphysis and rectal ampulla may not have been segmented at all by the automated system, even though they exist in the ground truth images. In these cases, it is important to predict their location, since the biometrics depend on their segmentations, especially the pubic symphysis, which is required to measure all the biometrics. The pubic symphysis was first predicted as a circular shape based on the location of the urethra, if not already segmented. The location was predicted relative to the urethra, to which it is anatomically adjacent.

FIGURE 2 Overview of post-processing techniques for each structure. Each segmentation structure was first isolated, had the post-processing technique applied, then was added back to create the final segmentation output. An example segmentation output is shown with the corresponding segmentation legend.

The average
distance in both the horizontal and vertical dimensions was calculated from the centroid of the pubic symphysis to the centroid of the urethra across the training images; these were approximately 7.5 mm horizontally and 2.5 mm vertically. The size of the pubic symphysis was calculated similarly, based on the training images, with an approximate size of 19.6 mm2. The location of the pubic symphysis is most important, as the biometrics are determined based on relative location, so a circular shape was used, this being the shape most similar to the ground truth segmentations. Next, dilation and closing morphological operations using a filter kernel of size 5 × 5 were used
to join multiple disconnected segmentations. For the
urethra, we first obtained the largest object of that
class using a connected component algorithm, then,
connected it to the bladder if not already joined. The
connection was made by locating the closest points on
both the urethra and bladder segmentations. For the
urethra, which is the most irregularly shaped and one
of the smallest segmented structures, keeping only the
largest component is important, as shown in Figure 3, where it is common for the urethra to have multiple disconnected regions owing to the poor quality of the US images. The rectal ampulla was first predicted as a circular shape, if not already present, based on the location of the rectum centroid. The process for predicting
the ampulla shape, size, and location was the same as
the pubic symphysis except the rectum was used as the
reference structure since the rectal ampulla is located
inside the rectum. The horizontal and vertical distances between the rectal ampulla and rectum centroids were both found to be 2.5 mm, with a predicted size of 78.5 mm2. The ampulla was then dilated with a 5 × 5 filter kernel and only the largest object was kept. Finally, the anorectal angle suffers from under-segmentation, so we dilated that structure of interest with a 5 × 5 filter kernel. The post-processed organs were then combined
to form the final segmentation mask.
An example demonstrating the post-processing meth-
ods on one mid-sagittal plane US image is shown in
Figure 3. The post-processing methods can remove
the smaller disconnected urethra and rectal ampulla
segmentations.
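The morphological steps described above can be sketched with SciPy as follows. The function names are ours, and the millimeter offsets and sizes reported in the text would first need to be converted to pixels using the image spacing:

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest connected component of a binary mask."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, range(1, n + 1))
    return (labeled == (np.argmax(sizes) + 1)).astype(mask.dtype)

def dilate(mask: np.ndarray, size: int = 5) -> np.ndarray:
    """Morphological dilation with a size x size square kernel."""
    return ndimage.binary_dilation(mask, structure=np.ones((size, size))).astype(mask.dtype)

def circle_if_missing(mask: np.ndarray, ref_centroid, offset_px, radius_px) -> np.ndarray:
    """If a structure was not segmented at all, place a circular placeholder at a
    fixed (row, col) offset from a reference structure's centroid (offsets and
    radii given in pixels, converted from mm via the image spacing)."""
    if mask.any():
        return mask
    cy = ref_centroid[0] + offset_px[0]
    cx = ref_centroid[1] + offset_px[1]
    yy, xx = np.ogrid[:mask.shape[0], :mask.shape[1]]
    out = mask.copy()
    out[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius_px ** 2] = 1
    return out
```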
2.6 Pelvic floor biometrics
The biometrics of interest included the horizontal refer-
ence line at the level of the pubic symphysis, bladder
neck position distance, bladder descent distance, rec-
tal ampulla descent distance and retrovesical angle.9,12
These measurements and their locations on an example segmentation output can be seen in Figure 4. The biometrics of interest are calculated following previously established studies that showed correlation between these key metrics and the degree of POP.9–16 The manual and algorithm-generated biometrics follow the same established studies.
FIGURE 3 Example of an image before and after post-processing. (a) The mid-sagittal plane US image. (b) The ground truth manually segmented structures. (c) Segmentation output before post-processing. Observe the multiple disconnected urethra and rectal ampulla segmentations. (d) Segmentation output after post-processing.

FIGURE 4 Example of the biometrics, shown in red, measured from the segmentations. (1) The horizontal reference line at the level of the pubic symphysis. (2) The bladder neck position distance. (3) The bladder descent distance. (4) The rectal ampulla descent distance. (5) The retrovesical angle.

The horizontal line at the level of the pubic symphysis provides a reference with which organ descent can be measured.9 In particular, the bladder neck position is commonly used to diagnose POP.1 The horizontal line for the pubic symphysis was calculated as the line perpendicular to the US transducer that intersects the pubic symphysis object centroid. The bladder neck position was calculated as the distance between the reference line and the point at which the urethra and bladder were connected. If multiple pixels were connected, the center-most point was calculated and returned. To calculate the bladder descent distance, the point of the bladder posterior to the urethra with the largest descent was identified, with the largest descent meaning the most inferior anatomical point. For images with minimal bladder descent (such as for patients without POP and in the rest position), we located the point on the bladder most posterior to the urethra. If multiple points were present, the most inferior point
was measured as the leading edge of the prolapse.
The rectal ampulla distance was measured to be the
distance between the reference line and rectal ampulla
centroid. Finally, the retrovesical angle was calculated
to be the intersecting angle of a line running through
the urethra centroid and bladder neck point with a line
running between the bladder neck point and the bladder
descent point. All biometrics were extracted automat-
ically from the segmentation masks and required no
user intervention.
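For illustration, the geometric core of these biometric computations can be sketched as follows, assuming binary masks as 2D NumPy arrays in (row, col) convention and a known pixel spacing (the names are ours):

```python
import numpy as np
from scipy import ndimage

def centroid(mask: np.ndarray) -> np.ndarray:
    """Centroid of a binary mask as (row, col)."""
    return np.array(ndimage.center_of_mass(mask))

def descent_mm(point_rc, ref_row, mm_per_px):
    """Signed vertical distance (mm) of a point from the horizontal reference
    line drawn at the level of the pubic symphysis centroid."""
    return (point_rc[0] - ref_row) * mm_per_px

def angle_deg(vertex, p1, p2):
    """Angle (degrees) at `vertex` between the rays to p1 and p2."""
    v1 = np.asarray(p1, float) - np.asarray(vertex, float)
    v2 = np.asarray(p2, float) - np.asarray(vertex, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

For example, the retrovesical angle would be angle_deg(bladder_neck, urethra_centroid, bladder_descent_point), with the vertex at the bladder neck point.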
2.7 Resources
The resources used for this study were provided by
Compute Canada and the SciNet HPC Consortium32,33
and included an Intel Skylake CPU (2.4 GHz) and an Nvidia V100-SXM2-32GB GPU. The code was written in Python and used both TensorFlow and PyTorch backends for the deep learning implementation.34,35 Visual
analysis was performed with help from the 3D Slicer
open-source software (www.slicer.org).36
2.8 Evaluation metrics
We evaluated our methods by comparing both the seg-
mentation masks and biometric measurements with
expert manual segmentations. First, we compared the
predicted segmentation masks with the manual seg-
mentation using the DSC as shown in Equation (2).
The DSC describes the degree of spatial overlap
between segmentations and is commonly used in image
segmentation.37 We measured DSC for each class
separately in %.
$$\mathrm{DSC}(m) = \frac{2\sum_{i=1}^{N}\left[y_i\,\hat{y}_i\right]}{\sum_{i=1}^{N}\left[y_i\right]+\sum_{i=1}^{N}\left[\hat{y}_i\right]} \times 100, \qquad (2)$$
where $N$ is the number of pixels in the current image $m$, $y_i$ is the ground truth value of pixel $i$, and $\hat{y}_i$ is the predicted value of pixel $i$. To further compare the segmentations, we used our biometric extraction method to calculate the five biometrics of interest on the ground truth segmentations and the predicted segmentations. We then compared the biometrics using the absolute difference (AD), shown in Equation (3), between the ground truth and predicted distances/angles.

$$\mathrm{AD} = \left|\mathrm{BM}_{\hat{y}} - \mathrm{BM}_{y}\right|, \qquad (3)$$
where $\mathrm{BM}_{\hat{y}}$ is the biometric value obtained by applying our extraction method to the predicted segmentation mask and $\mathrm{BM}_{y}$ is the biometric value obtained by applying our extraction method to the ground truth segmentation mask. To compare the 2D locations of the bladder neck, bladder descent, and rectal ampulla points, we calculated the Euclidean distance (ED) between the ground truth and predicted 2D locations, as shown in Equation (4).

$$\mathrm{ED} = \left\|\mathrm{BM}(u,v)_{\hat{y}} - \mathrm{BM}(u,v)_{y}\right\|_{2}, \qquad (4)$$

where $\mathrm{BM}(u,v)_{\hat{y}}$ is the 2D location of the given biometric from the predicted segmentation mask, $\mathrm{BM}(u,v)_{y}$ is the 2D location of the given biometric from the ground truth segmentation mask, and $u$ and $v$ are the horizontal and vertical biometric locations in the US image, respectively. The lower the AD and ED, the better the result.
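Equations (2)–(4) translate directly into NumPy; a minimal sketch for binary masks and 2D point locations:

```python
import numpy as np

def dsc_percent(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Dice similarity coefficient in %, per Equation (2), for binary masks."""
    inter = float(np.sum(y_true * y_pred))
    return 100.0 * 2.0 * inter / (float(np.sum(y_true)) + float(np.sum(y_pred)))

def ad(bm_pred: float, bm_true: float) -> float:
    """Absolute difference between biometric values, per Equation (3)."""
    return abs(bm_pred - bm_true)

def ed(p_pred, p_true) -> float:
    """Euclidean distance between 2D biometric locations (u, v), per Equation (4)."""
    return float(np.linalg.norm(np.asarray(p_pred, float) - np.asarray(p_true, float)))
```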
To verify the biometric extraction method, we com-
pared the biometrics our system measured from the
ground truth segmentations to manual ground truth bio-
metrics measured by an expert on the US volume.
In addition, we compared the biometrics our system
measured from the nnU-Net automated segmentation
method to manual ground truth biometrics. The compar-
isons were made using the AD as well as the ICC and
the 95% confidence interval (CI).38 The ICC is used to evaluate reliability (the extent to which measurements can be replicated) between the automated biometric and manual methods.
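As an illustration of this reliability analysis, an ICC of this kind can be computed with the pingouin package from a long-format table of paired measurements (the data below are hypothetical):

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format table: each image measured by two "raters"
# (manual expert vs. automated extraction) for one biometric, in mm.
df = pd.DataFrame({
    "image": [0, 0, 1, 1, 2, 2],
    "rater": ["manual", "auto"] * 3,
    "value": [3.1, 3.4, 5.0, 4.6, 2.2, 2.5],
})
icc = pg.intraclass_corr(data=df, targets="image", raters="rater", ratings="value")
print(icc[["Type", "ICC", "CI95%"]])
```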
Since manual estimations are used as the gold
standard to quantify the results of our methods, we per-
formed a variability study on the manually-generated
measurements obtained from the expert. To this end,
the expert calculated the biometrics from five different
images, three times each. The standard deviation for
every biometric was calculated across each of the three
runs. Finally, the average from the five images was found.
Two statistical tests were conducted on the segmenta-
tion results to compare performance statistically. These
tests included the paired t-test, used to compare the biometric measurements, and the non-parametric Wilcoxon signed-rank test, used for DSC comparison.39 These were used to compare the results of the three tested models: the 2D U-Net, Attention U-Net, and nnU-Net. Hypothesis testing with p < 0.05 was used to determine significance.

FIGURE 5 Example of segmentation and biometric outputs for a patient in Valsalva maneuver. (a) The original mid-sagittal plane US image. (b) The manually segmented structures, where the pubic symphysis is purple, urethra is dark blue, bladder is light blue, rectum is yellow, rectal ampulla is orange, and anorectal angle is brown. (c) The segmentations using the vanilla 2D U-Net. (d) The segmentations using the nnU-Net.
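As an illustration of these statistical tests, a SciPy sketch with placeholder arrays standing in for the per-image metrics:

```python
import numpy as np
from scipy import stats

# Placeholder per-image DSC values for two models.
dsc_unet = np.array([0.51, 0.58, 0.84, 0.63, 0.28, 0.38])
dsc_nnunet = np.array([0.55, 0.61, 0.87, 0.69, 0.34, 0.49])

# Wilcoxon signed-rank test on paired per-image DSC values.
w_stat, p_w = stats.wilcoxon(dsc_unet, dsc_nnunet)

# Paired t-test on biometric measurements (placeholder arrays, in mm).
bm_a = np.array([4.0, 5.1, 6.3, 3.0])
bm_b = np.array([3.8, 3.9, 5.8, 2.6])
t_stat, p_t = stats.ttest_rel(bm_a, bm_b)

print(p_w < 0.05, p_t < 0.05)  # significance at p < 0.05
```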
3 RESULTS
Visualizations of segmented structures generated by the nnU-Net, the vanilla 2D U-Net, and manually are shown in Figures 5 and 6. The segmentations and biometrics (colored lines) were overlaid onto the US image. A summary of quantitative results for validating the segmentations is shown in Table 1, and an example of a training/validation loss graph is shown in Figure S1. We trained the nnU-Net, vanilla 2D U-Net, and Attention U-Net segmentation models three times, averaging the results from each test because of the stochastic nature of CNN training. The results represent the mean for the 71 test images across the three training runs. In addition, the standard deviation is reported between the three runs. The results for verifying our automated biometric extraction method are shown in Tables 2 and 3. The biometric validation results used 68 of the test images, given that the remaining three images did not have their biometrics measured manually. The biometric extraction method is deterministic. The pubic symphysis horizontal reference line was not compared for the biometric extraction since it is only used as a reference and is not a distance metric itself.
We measured the time required to run inference over
the entire pipeline after acquiring the mid-sagittal plane.
On average the time required to read an image and seg-
ment using the nnU-Net model is 220 ms (on a GPU).
The nnU-Net post-processing and biometric extraction
processes required 1.05 s on average per image (on a
CPU). The total time required to read, segment, post-
process and measure the biometrics on an image was
1.27 s per image. This time could be improved if all
processes were performed on the GPU. In addition to
running inference, the training time for one epoch of the
nnU-Net was approximately 42 s on average.
From the variability study, the bladder neck descent, bladder descent, rectal ampulla descent, and retrovesical angle were calculated to have average standard deviations of 1.1 mm, 1.1 mm, 1.5 mm, and 4.8°, respectively. The variability in the measurements was found to be low when compared to the AD for these same biometrics.
4 DISCUSSION
A fast, accurate, and fully automated method for seg-
mentation of multiple female pelvic structures and
biometric extraction was developed using deep learning
and segmentation-based heuristics on the mid-sagittal
plane of the female pelvic floor from 3D US volumes.
A segmentation model was used for measuring the biometrics, rather than measuring the metrics directly from the US image, since US images have a poor signal-to-noise ratio, ill-defined object boundaries, and US speckle.20,40
FIGURE 6 Example of segmentation and biometric outputs for a patient in rest position. (a) The original mid-sagittal plane US image. (b) The manually segmented structures, where the pubic symphysis is purple, urethra is dark blue, bladder is light blue, rectum is yellow, rectal ampulla is orange, and anorectal angle is brown. (c) The segmentations using the vanilla 2D U-Net. (d) The segmentations using the nnU-Net.
TABLE 1 Summary of the segmentation results for 71 test images. The segmentations from the vanilla 2D U-Net, Attention U-Net, and nnU-Net were compared to manual segmentations based on the DSC, AD, and ED. The results represent the mean of every image across each training run and the standard deviation between the three runs, with the best results in bold. DSC was measured in %, while AD and ED were measured in mm, except for the retrovesical angle, which was measured in °. The p values compare all models against each other per metric, with statistical significance noted in bold. The 2D U-Net, Attention U-Net, and nnU-Net are referred to as Models 1, 2, and 3, respectively, when comparing p values.

| Metric | Structure / biometric | 2D U-Net (Model 1) | Attention U-Net (Model 2) | nnU-Net (Model 3) | p (1 vs. 2) | p (2 vs. 3) | p (1 vs. 3) |
|---|---|---|---|---|---|---|---|
| DSC (%) | Pubic symphysis | 50.6 ± 1.1 | 52.0 ± 0.3 | **54.6 ± 0.7** | 0.3 | **0.04** | **0.003** |
| DSC (%) | Urethra | 57.7 ± 1.6 | 57.1 ± 0.7 | **61.0 ± 1.2** | 0.8 | **0.004** | 0.1 |
| DSC (%) | Bladder | 83.9 ± 0.6 | 84.4 ± 0.3 | **87.4 ± 0.1** | 0.8 | **<0.0001** | **<0.0001** |
| DSC (%) | Rectum | 62.9 ± 2.1 | 61.1 ± 0.6 | **68.5 ± 0.1** | 0.08 | **<0.0001** | **<0.0001** |
| DSC (%) | Rectal ampulla | 27.7 ± 1.6 | 29.3 ± 1.1 | **33.7 ± 0.5** | 0.2 | **0.02** | **0.005** |
| DSC (%) | Anorectal angle | 38.1 ± 0.9 | 39.6 ± 0.5 | **49.2 ± 0.1** | 0.2 | **<0.0001** | **<0.0001** |
| AD (mm) | Pubic symphysis horizontal ref. line | 3.0 ± 0.5 | 3.0 ± 0.1 | **2.6 ± 0.1** | 0.5 | 0.1 | 0.1 |
| AD (mm) | Bladder neck position | 4.0 ± 0.6 | 4.1 ± 0.3 | **3.8 ± 0.2** | 0.3 | 0.2 | 0.3 |
| AD (mm) | Bladder descent | 5.1 ± 0.7 | 5.1 ± 0.6 | **3.9 ± 0.4** | 0.5 | **0.04** | **0.02** |
| AD (mm) | Rectal ampulla descent | 6.3 ± 0.5 | 6.2 ± 0.5 | **5.8 ± 0.1** | 0.4 | 0.2 | 0.2 |
| AD (°) | Retrovesical angle | 31.9 ± 3.3 | 32.1 ± 3.3 | **29.6 ± 1.4** | 0.5 | 0.1 | 0.1 |
| ED (mm) | Bladder neck position | 3.8 ± 0.2 | 3.9 ± 0.1 | **3.2 ± 0.5** | 0.4 | **0.03** | **0.04** |
| ED (mm) | Bladder descent | 5.7 ± 0.8 | 6.4 ± 0.8 | **3.6 ± 0.5** | 0.06 | **0.0006** | **0.002** |
| ED (mm) | Rectal ampulla descent | 7.9 ± 0.2 | 8.1 ± 0.5 | **7.3 ± 0.1** | 0.3 | 0.06 | 0.1 |
TABLE 2 Summary of the biometric extraction method results for 68 of the test images. These results compare the biometrics extracted from the ground truth segmentations to manual ground truth biometrics. The results are the AD (in mm; the retrovesical angle in °) and the ICC with its 95% CI. The AD values are the mean ± standard deviation across all images.

| Metric | Bladder neck position | Bladder descent | Rectal ampulla descent | Retrovesical angle (°) |
|---|---|---|---|---|
| AD | 3.2 ± 2.8 | 4.5 ± 4.1 | 5.2 ± 3.6 | 27.3 ± 18.9 |
| ICC [95% CI] | 0.95 [0.88–0.98] | 0.95 [0.91–0.97] | 0.84 [0.45–0.93] | 0.50 [0.30–0.66] |
TABLE 3 Summary of the biometric extraction method results for 68 of the test images. These results compare the biometrics extracted from the nnU-Net automated segmentations to manual ground truth biometrics. The results are the AD (in mm; the retrovesical angle in °) and the ICC with its 95% CI. The AD values are the mean ± standard deviation across all images.

| Metric | Bladder neck position | Bladder descent | Rectal ampulla descent | Retrovesical angle (°) |
|---|---|---|---|---|
| AD | 4.7 ± 4.1 | 5.0 ± 4.6 | 6.6 ± 5.5 | 31.5 ± 24.9 |
| ICC [95% CI] | 0.89 [0.82–0.93] | 0.93 [0.89–0.96] | 0.70 [0.51–0.82] | 0.50 [0.30–0.66] |
Another benefit of using a segmentation-based biometric extraction method is that structures with relatively low DSC accuracy (such as the rectal ampulla at 33.7%) can still achieve a reasonable biometric measurement (such as the rectal ampulla descent with an ICC of 0.84), as shown in Tables 1 and 2. Our
proposed pipeline can reduce the time required for
measuring the biometrics, from 15 min manually by an
expert to 1.27 s automatically, and provide a non-user
dependent segmentation and biometric system. The
proposed methods provide a quantitative assessment
of pelvic floor anatomy, which may aid in the diagno-
sis and treatment selection for pelvic floor disorders.
To our knowledge, we are the first to develop a fully automated segmentation and biometric extraction system on the mid-sagittal plane that includes images from both the rest position and Valsalva maneuver. We validated our methods using multiple distance- and overlap-based metrics. In addition, the biometric extraction method was compared to manual measurements and was found to mostly provide excellent agreement.
The quantitative results indicated that the nnU-Net
was the superior segmentation method when com-
pared to the vanilla 2D U-Net and Attention U-Net as
demonstrated in Table 1. The nnU-Net outperformed
in both distance- and region-based metrics. As shown
in Table 1, the DSC values for the segmentations by
the nnU-Net model were significantly different compared
to those for the 2D U-Net and Attention U-Net, except
the urethra segmentation. For the bladder descent AD,
bladder neck position ED and bladder descent ED, the
metrics reported by the nnU-Net model were signifi-
cantly different to those of the 2D U-Net and Attention
U-Net. We found no significant difference in the met-
rics reported for the 2D U-Net and Attention U-Net
segmentations for all the tested metrics. We found our
biometric extraction method was comparable to manual
measurements on the mid-sagittal plane. Our methods
for measuring the bladder neck position and bladder
descent showed excellent agreement with the manual
measurements (ICC >0.9) while the rectal ampulla
measurements were deemed good when compared
to manual (0.9 >ICC >0.75) as demonstrated in
Table 2. Finally, our methods yielded moderate agree-
ment to ground truth with regards to the retrovesical
angle (0.75 >ICC >0.5) as shown in Table 2. These
ICC agreement intervals are as stated in the work by
Koo and Li.38 The reason for this difference between
manual and automated measurements is that the manual measurements are based on the US images themselves, not the segmentations, whereas the automated methods rely on the quality of the segmentations from the deep learning models.
A previous work by Dietz et al.12 measured the pelvic organ descent of 825 women with symptoms of POP and found the mean bladder descent and rectocele descent to be 6.2 ± 17.4 mm and 8.7 ± 14.3 mm, respectively. Moreover, a study by Volløyhaug et al.11 with 581 females (68 with symptomatic POP and 513 without symptomatic POP) found the bladder descent and rectal ampulla descent to be in the range of 31.0 mm to 26.8 mm and 26.6 mm to 23.5 mm for females with POP symptoms, and 35.1 mm to 33.0 mm and 33.5 mm to 30.9 mm for females without POP symptoms, respectively. As shown in Table 3, our automated methods, when compared to manual measurements, have an average AD error of 5.0 ± 4.6 mm and 6.6 ± 5.5 mm for bladder descent and rectal ampulla descent. Based on these previous studies with many participants, we believe the biometric error in our automated system is less than the expected variability ranges for women with or without pelvic floor dysfunction symptoms, and the system would therefore have the capacity to measure pelvic floor biometrics with sufficient accuracy. When comparing measurement variability to previous work, Brækken et al.17 found an intra-user variability ICC of 0.86 for the bladder neck position when measured twice by the same expert. Our system with automatically generated segmentations had an ICC of 0.89 for the bladder neck position when compared to manual measurements, as shown in Table 3. Therefore, our proposed method is more reliable than the same expert performing measurements at two different time intervals.

FIGURE 7 Example of a nnU-Net segmentation from a patient in the Valsalva maneuver with the urethra mis-segmented. (a) The mid-sagittal plane US image. (b) The manually segmented structures, where the pubic symphysis is purple, urethra is dark blue, bladder is light blue, rectum is yellow, rectal ampulla is orange, and anorectal angle is brown. (c) The nnU-Net output with the urethra mis-segmented. The urethra was incorrectly identified, leading to poor results for the bladder descent distance, bladder neck position distance, and retrovesical angle.
This study had several limitations, including the dependence of the biometrics on the quality of the segmentation, the low number of images used in our study, and the use of only one view plane for our segmentation method. In Figure 7, an example is presented of the urethra being mis-segmented by the nnU-Net model. A poor urethra segmentation can lead to poor biometrics, as the bladder neck position and retrovesical angle rely heavily on the urethra segmentation. In this example, the bladder neck position AD was 7 mm and the retrovesical angle AD was 47.8°, both worse than the average. This indicates that in order to obtain reliable biometrics, the segmentations need to be accurate. For US images, this can be difficult due to ill-defined boundaries and a poor signal-to-noise ratio.
In addition to image quality, our images had large patient variability, we used a low number of mid-sagittal images, and the manual segmentations were performed by one expert. Typical CNN models require thousands of images, which are difficult to acquire in the medical domain. We used the nnU-Net model purposely because of its ability to optimize the hyperparameters for small datasets. The locations of structures, such as the urethra, vary immensely depending on the patient, making a CNN model with high generalizability much more difficult to develop, albeit more valuable to use clinically. Manual segmentations and measurements from one expert can introduce bias and, as such, are a limitation. Future studies will investigate inter-observer variability to quantify the error between multiple experts and compare our methods on US images acquired by multiple operators. In addition, due to the high operator dependency of TPUS image collection, we would need to perform further studies to compare our methods when images are collected from multiple operators.
We were able to capture much of the relevant anatomy and detail using only the mid-sagittal plane, but this plane may not be the optimal viewing plane for all structures. For example, the rectal ampulla is much easier to segment when the coronal plane is provided. Because we only used the sagittal view, our DSC scores for this anatomical structure were generally low. Our future work will extend this problem to 3D segmentation by including 3D annotations. The expectation is that the additional information would make segmenting structures such as the rectal ampulla easier, thus improving the results. 3D segmentations would also allow for identifying the mid-sagittal plane automatically (segmentation-based), without this having to be performed manually prior to segmentation and biometric extraction.
5 CONCLUSIONS
We propose a fast and automated deep learning-based segmentation and biometric extraction system for the mid-sagittal plane of female pelvic floor 3D US volumes. Our pipeline consisted of a nnU-Net segmentation network and a deterministic biometric extraction method for measuring the biometrics of interest. We demonstrated reasonable accuracy when compared to previous literature and moderate to excellent agreement when compared to manual measurements. Our results were reported on 248 3D US volumes, which included patients in both the rest position and Valsalva maneuver. Based on our search in scientific databases (i.e., Web of Science, IEEE Xplore Digital Library, Elsevier ScienceDirect, and PubMed) including the key terms: female pelvic floor, 3D ultrasound, pelvic organ prolapse, this is the first reported work of an automated segmentation and biometric measurement system for the mid-sagittal plane of female pelvic 3D US volumes. Our automated system can improve efficiency (1.27 s compared to 15 min manually) and has high reproducibility (high ICC values) compared to manual TPUS analysis, which will increase accessibility and enable more widespread application of this imaging technique for the assessment of pelvic floor disorders. However, further studies using multiple TPUS operators and manual segmentations and biometrics from multiple experts are needed to verify our system's viability on unseen data.
ACKNOWLEDGEMENTS
This study was supported by the Mitacs Accelerate program, an NSERC Discovery grant, and Cosm Medical. ZS acknowledges scholarship support from the Ontario Graduate Scholarship (OGS). This research was enabled in part by support provided by Compute Canada. Computations were performed on the Mist supercomputer at the SciNet HPC Consortium. SciNet is funded by: the Canada Foundation for Innovation; the Government of Ontario; the Ontario Research Fund—Research Excellence; and the University of Toronto.
CONFLICT OF INTEREST STATEMENT
Ahmed Eltahawi and Golafsoun Ameri are full-time
employees at Cosm Medical. Christopher Hong is an
advisor/consultant for Cosm Medical.
DATA AVAILABILITY STATEMENT
Data is not available for sharing.
REFERENCES
1. Dietz HP. Ultrasound imaging of the pelvic floor. Part I: two-
dimensional aspects. Ultrasound Obstet Gynecol. 2004;23(1):80-
92. doi: 10.1002/uog.939
2. Kuncharapu I, Majeroni BA, Johnson DW. Pelvic organ pro-
lapse. Am Fam Physician. 2010;81(9):1111-1117. doi: 10.1891/
9780826159311.0017q
3. Braverman M,Atan IK, Turel F,Friedman T,Dietz HP.Does patient
posture affect the ultrasound evaluation of pelvic organ pro-
lapse? J Ultrasound Med. 2019;38(1):233-238. doi: 10.1002/jum.
14688
4. Ameri G, Barker KC, Sham D, Fenster A, Mechanically-assisted
3D ultrasound scanner for Urogynecological applications: pre-
liminary results. In: Proc. SPIE 11319, Medical Imaging 2020:
Ultrasonic Imaging and Tomography, 113190U. 2020:31. doi: 10.
1117/12.2550574
5. Schaer GN, Koechli OR, Schuessler B, Haller U. Perineal
ultrasound for evaluating the bladder neck in urinary stress
incontinence. Obstet Gynecol. 1995;85(2):220-224.
6. Noort F, Manzini C, Vaart CH, Limbeek MAJ, Slump CH, Grob
ATM. Automatic identification and segmentation of the slice of
minimal hiatal dimensions in transperineal ultrasound volumes.
Ultrasound Obstet Gynecol. 2022;60:507-576. doi: 10.1002/uog.
24810
7. Manzini C, van den Noort F, Grob ATM, Withagen MIJ, Slump
CH, van der Vaart CH. Appearance of the levator ani muscle
subdivisions on 3D transperineal ultrasound. Insights Imaging.
2021;12(1). doi: 10.1186/s13244-021-01037-y
8. Dietz HP. Ultrasound imaging of the pelvic floor. Part II: three-
dimensional or volume imaging. Ultrasound Obstet Gynecol.
2004;23(6):615-625.doi: 10.1002/uog.1072
9. Dietz HP. Pelvic floor ultrasound: a review. Am J Obstet Gynecol.
2010;202(4):321-334.doi: 10.1016/j.ajog.2009.08.018
10. Gao Y, Zhao Z, Yang Y, Zhang M, Wu J, Miao Y. Diagnostic value
of pelvic floor ultrasonography for diagnosis of pelvic organ pro-
lapse: a systematic review. Int Urogynecol J. 2019;31(1):15-33.
doi: 10.1007/s00192-019-04066-w
11. Volløyhaug I, Rojas RG, Mørkved S, Salvesen KÅ. Comparison
of transperineal ultrasound with POP-Q for assessing symptoms
of prolapse. Int Urogynecol J. 2019;30:595-602. doi: 10.1007/
s00192-018-3722-3
12. Dietz HP, Kamisan Atan I, Salita A. Association between ICS
POP-Q coordinates and translabial ultrasound findings: implica-
tions for definition of “normal pelvic organ support”. Ultrasound
Obstet Gynecol. 2016;47(3):363-368.doi: 10.1002/uog.14872
13. Lone F, Sultan AH, Stankiewicz A, Thakar R. The value of
pre-operative multicompartment pelvic floor ultrasonography: a
1-year prospective study. Br J Radiol. 2014;87. doi: 10.1259/bjr.
20140145
14. Arian A, Ghanbari Z, Chegini N, Hosseiny M. Agreement of
ultrasound measures with POP-Q in patients with pelvic organ
prolapse.Iran J Radiol.2018;15(4).doi:10.5812/iranjradiol.68461
15. Lone FW, Thakar R, Sultan AH, Stankiewicz A. Accuracy of
assessing pelvic organ prolapse quantification points using
dynamic 2D transperineal ultrasound in women with pelvic organ
prolapse. Int Urogynecol J. 2012;23:1555-1560. doi: 10.1007/
s00192-012-1779-y
16. Broekhuis SR, Kluivers KB, Hendriks JCM, Fütterer JJ, Barentsz
JO, Vierhout ME. POP-Q, dynamic MR imaging, and perineal
ultrasonography: do they agree in the quantification of female
pelvic organ prolapse? Int Urogynecol J. 2009;20:541-549. doi:
10.1007/s00192-009-0821-1
17. Brækken IH, Majida M, Ellstrøm-Engh M, Dietz HP, Umek W, Bø K. Test-retest and intra-observer repeatability of two-, three-
and four-dimensional perineal ultrasound of pelvic floor muscle
anatomy and function. Int Urogynecol J. 2008;19(2):227-235. doi:
10.1007/s00192-007-0408-7
18. Trimpl MJ, Primakov S, Lambin P, Stride EPJ, Vallis KA, Gooding
MJ. Beyond automatic medical image segmentation—the spec-
trum between fully manual and fully automatic delineation. Phys
Med Biol. 2022;67. doi: 10.1088/1361-6560/ac6d9c
19. Szentimrey Z, de Ribaupierre S, Fenster A, Ukwatta E. Auto-
matic deep learning-based segmentation of neonatal cerebral
ventricles from 3D ultrasound images. Proc SPIE 11600, Med
Imaging 2021 Biomed Appl Mol Struct Funct Imaging. 2021:1-7.
doi: 10.1117/12.2581749
20. Szentimrey Z,de Ribaupierre S,Fenster A, Ukwatta E.Automated
3D U-net based segmentation of neonatal cerebral ventricles
from 3D ultrasound images. Med Phys. 2022;49(2):1034-1046.
doi: 10.1002/mp.15432
21. Orlando N, Gillies DJ, Gyacskov I, Romagnoli C, D’Souza D,
Fenster A. Automatic prostate segmentation using deep learn-
ing on clinically diverse 3D transrectal ultrasound images. Med
Phys. 2020;47(6):2413-2426. doi: 10.1002/mp.14134
22. Du G, Cao X, Liang J, Chen X, Zhan Y. Medical image seg-
mentation based on U-Net: a review. J Imaging Sci Technol.
2020;64(2):1-12. doi: 10.2352/J.ImagingSci.Technol.2020.64.2.
020508
23. Yin XX, Sun L, Fu Y, Lu R, Zhang Y. U-Net-based medical
image segmentation. J Healthc Eng. 2022;2022. doi: 10.1155/
2022/4189781
24. Wang R,Lei T, Cui R,Zhang B,Meng H,Nandi AK. Medical image
segmentation using deep learning: a survey.IET Image Process.
2022;16(5):1243-1267.doi: 10.1049/ipr2.12419
25. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomed-
ical image segmentation. Nat Methods. 2021;18(2):203-211. doi:
10.1038/s41592-020-01008-z
26. Persu C,Chapple CR,Cauni V, Gutue S,Geavlete P.Pelvic organ
prolapse quantification system (POP-Q) - a new era in pelvic
prolapse staging. J Med Life. 2011;4(1):75-81.
27. Antonelli M, Reinke A, Bakas S, et al. The Medical Segmentation
Decathlon. 2021.
28. Kingma DP, Ba JL, Adam: a method for stochastic optimization.
3rd Int Conf Learn Represent ICLR 2015 - Conf Track Proc.
2015:1-15.
29. Oktay O, Schlemper J, Le Folgoc L, et al. Attention U-Net: Learn-
ing where to look for the pancreas. 1st Conf Med Imaging with
Deep Learn. 2018:1-10.
30. He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Sur-
passing Human-Level Performance on ImageNet Classification.
2015 IEEE Int Conf Comput Vis. 2015:1026-1034.
31. Ioffe S, Szegedy C. Batch normalization: accelerating deep net-
work training by reducing internal covariate shift. Proc 32nd
Int Conf Mach Learn PMLR. 2015;37:448-456. doi: 10.1080/
17512786.2015.1058180
32. Ponce M, Van Zon R, Northrup S, et al. Deploying a top-100
supercomputer for large parallel workloads: The Niagara super-
computer. ACM Int Conf Proceeding Ser. 2019. doi: 10.1145/
3332186.3332195
33. Loken C, Gruner D, Groer L, et al. SciNet: lessons learned from
building a power-efficient top-20 system and data centre. J Phys
Conf Ser. 2010;256(1):012026. doi: 10.1088/1742-6596/256/1/
012026
34. Abadi M, Barham P, Chen J, et al. TensorFlow: A System for
Large-Scale Machine Learning. Proc 12th USENIX Symp Oper
Syst Des Implement (OSDI ’16). 2016:265-283. doi: 10.1016/
0076-6879(83)01039-3
35. Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style,
high-performance deep learning library. Adv Neural Inf Process
Syst. 2019;32:8024-8035.
36. Fedorov A, Beichel R, Kalpathy-Cramer J, et al. 3D Slicer as an
image computing platform for the quantitative imaging network.
Magn Reson Imaging. 2012;30(9):1323-1341. doi: 10.1016/j.mri.
2012.05.001
37. Zou KH, Warfield SK, Bharatha A, et al. Statistical validation of
image segmentation quality based on a spatial overlap index.
Acad Radiol. 2004;11(2):178-189. doi: 10.1016/S1076-6332(03)
00671-8
38. Koo TK, Li MY. A guideline of selecting and reporting intraclass
correlation coefficients for reliability research. J Chiropr Med.
2016;15(2):155-163.doi: 10.1016/j.jcm.2016.02.012
39. Wilcoxon F. Individual comparisons by ranking methods. Biomet-
rics Bull.1945;1(6):80-83. doi: 10.1093/jee/39.2.269
40. Contreras Ortiz SH, Chiu T, Fox MD. Ultrasound image enhance-
ment: a review. Biomed Signal Process Control. 2012;7(5):419-
428. doi: 10.1016/j.bspc.2012.02.002
SUPPORTING INFORMATION
Additional supporting information can be found online
in the Supporting Information section at the end of this
article.
How to cite this article: Szentimrey Z, Ameri G,
Hong CX, Cheung RYK, Ukwatta E, Eltahawi A.
Automated segmentation and measurement of
the female pelvic floor from the mid-sagittal plane
of 3D ultrasound volumes. Med Phys.
2023;50:6215–6227.
https://doi.org/10.1002/mp.16389
... Szentimrey et al. created a nnU-Net segmentation model that could sharpen the efficiency of transperineal ultrasound by reducing the time needed for analysis (1.27 s vs. 15 min) and that had high reproducibility, presenting significant Dice similarity coefficients for the bladder (87.4%), the rectum (68.5%) and the anorectal angle (49.2%) [48,49]. Yin and Wang created an effective CNN algorithm to enhance ultrasonic images' processing, allowing them to measure the effect of pelvic floor rehabilitation training in pregnant women with POP with a sensitivity of 93%, a positive predictive value of 98%, and a Dice coefficient of 81% [49,50]. ...
Article
Full-text available
Artificial intelligence (AI) is the new medical hot topic, applied mainly in specialties with a strong imaging component. In the domain of gynecology, AI has been tested and has shown vast potential in several areas, with promising results and an emphasis on oncology. However, fewer studies have focused on urogynecology, a branch of gynecology known for using multiple imaging exams (IEs) and tests in the management of women's pelvic floor health. This review aims to illustrate the current state of AI in urogynecology, namely the use of machine learning (ML) and deep learning (DL) in diagnostics and as imaging tools, discuss future prospects for AI in this field, and go over the limitations that challenge its safe implementation.
... [12] In 2023, Zhang et al. achieved automated segmentation of the pelvic floor muscles by improving the DenseUnet network [13]. In the same year, Szentimrey et al. used a nnU-Net segmentation model to segment the pubic symphysis, urethra, bladder, rectum, rectal ampulla, and anorectal angle in the mid-sagittal plane of female 3D transperineal ultrasound (TPUS) volumes [31], with the segmentation results being used to evaluate pelvic floor function. Convolutional neural networks (CNNs) have demonstrated significant achievements in a wide range of medical applications and have played an important role in the diagnosis of various diseases. ...
Article
Full-text available
Background Although the uterus, bladder, and rectum are distinct organs, their muscular fasciae are often interconnected. Clinical experience suggests that they may share common risk factors and associations. When one organ experiences prolapse, it can potentially affect the neighboring organs. However, the current assessment of disease severity still relies on manual measurements, which can yield varying results depending on the physician, thereby leading to diagnostic inaccuracies. Purpose This study aims to develop a multilabel grading model based on deep learning to classify the degree of prolapse of three organs in the female pelvis using stress magnetic resonance imaging (MRI) and provide interpretable result analysis. Methods We utilized sagittal MRI sequences taken at rest and during maximum Valsalva maneuver from 662 subjects. The training set included 464 subjects, the validation set 98 subjects, and the test set 100 subjects. We designed a feature extraction module specifically for pelvic floor MRI using the vision transformer architecture and employed a label masking training strategy and pre-training methods to enhance model convergence. The grading results were evaluated using Precision, Kappa, Recall, and Area Under the Curve (AUC). To validate the effectiveness of the model, the designed model was compared with classic grading methods. Finally, we provided interpretability charts illustrating the model's operational principles on the grading task. Results In terms of POP grading detection, the model achieved an average Precision, Kappa coefficient, Recall, and AUC of 0.86, 0.77, 0.76, and 0.86, respectively. Compared to existing studies, our model achieved the highest performance metrics. The average time taken to diagnose a patient was 0.38 s. Conclusions The proposed model achieved detection accuracy that is comparable to or even exceeds that of physicians, demonstrating the effectiveness of the vision transformer architecture and the label masking training strategy for assisting in the grading of POP under static and maximum Valsalva conditions. This offers a promising option for computer-aided diagnosis and treatment planning of POP.
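Grading metrics of the kind reported above (Precision, Kappa, Recall, AUC) are commonly computed with scikit-learn. The sketch below shows this for a toy binary grading task; the labels and scores are made-up illustrations, not data from the study:

    from sklearn.metrics import (precision_score, recall_score,
                                 cohen_kappa_score, roc_auc_score)

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # reference grades
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]                    # hard model predictions
    y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]   # class-1 probabilities

    print("Precision:", precision_score(y_true, y_pred))
    print("Recall:", recall_score(y_true, y_pred))
    print("Kappa:", cohen_kappa_score(y_true, y_pred))
    print("AUC:", roc_auc_score(y_true, y_score))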
... In recent years, artificial intelligence has had an increasing impact on automating the analysis of medical imaging data and has gradually been applied to ultrasound [14,15]. The application of artificial intelligence optimizes the workflow of pelvic floor ultrasound, reduces the workload of sonographers, and improves diagnostic efficiency [16,17,18,19]. Van et al. [20] developed a convolutional neural network (CNN) for automatically and reliably segmenting puborectalis muscles and the urogenital hiatus in transperineal ultrasound images of the pelvic floor in the plane of minimal hiatal dimensions. ...
Article
Background: The anal sphincter complex comprises the anal sphincter and the U-shaped deep and superficial puborectalis muscle. As an important supporting structure of the posterior pelvic floor, together with its surrounding tissues and muscles, the anal sphincter complex maintains the normal physiological functions of defecation and continence. Objective: The plane required for diagnosing anal sphincter injury and the diagnosis of anal sphincter integrity through pelvic floor ultrasound are highly dependent on sonographers' experience. We developed a deep learning (DL) tool for the automatic diagnosis of anal sphincter integrity via pelvic floor ultrasound. Methods: A 2D detection network was trained to detect the bounding box of the anal sphincter. The pelvic floor ultrasound image and its corresponding oval mask were input into a 2D classification network to determine the integrity of the anal sphincter. The average precision (AP) and intersection over union (IoU) were used to evaluate the performance of anal sphincter detection. Receiver operating characteristic (ROC) analysis was used to evaluate the performance of the classification model. Results: The Pearson correlation coefficients (r values) of the topmost and bottommost layers detected by the CNN and sonographers were 0.932 and 0.978, respectively. The best DL model yielded the highest area under the curve (AUC) of 0.808 (95% CI: 0.698-0.921) in the test cohort. The results from the CNN agreed well with the diagnostic results of experienced sonographers. Conclusions: We proposed, for the first time, a CNN to obtain the plane required for diagnosing anal sphincter injury on the basis of pelvic floor ultrasound and for preliminarily diagnosing anal sphincter injury.
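The intersection over union used above to evaluate detection has a simple closed form for axis-aligned bounding boxes. A minimal sketch follows; the coordinates are toy values, not measurements from the study:

    def box_iou(box_a, box_b):
        # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        iw = max(0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
        ih = max(0, min(ay2, by2) - max(ay1, by1))   # intersection height
        inter = iw * ih
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union else 0.0

    # Predicted vs. ground-truth box for a detected structure (toy numbers).
    print(box_iou((10, 10, 50, 50), (20, 20, 60, 60)))  # ~0.391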
Article
Full-text available
Background In recent years, the integration of artificial intelligence (AI) techniques into medical imaging has shown great potential to transform the diagnostic process. This review aims to provide a comprehensive overview of current state-of-the-art applications for AI in abdominal and pelvic ultrasound imaging. Methods We searched the PubMed, FDA, and ClinicalTrials.gov databases for applications of AI in abdominal and pelvic ultrasound imaging. Results A total of 128 titles were identified from the database search and were eligible for screening. After screening, 57 manuscripts were included in the final review. The main anatomical applications included multi-organ detection (n = 16, 28%), gynecology (n = 15, 26%), hepatobiliary system (n = 13, 23%), and musculoskeletal (n = 8, 14%). The main methodological applications included deep learning (n = 37, 65%), machine learning (n = 13, 23%), natural language processing (n = 5, 9%), and robots (n = 2, 4%). The majority of the studies were single-center (n = 43, 75%) and retrospective (n = 56, 98%). We identified 17 FDA approved AI ultrasound devices, with only a few being specifically used for abdominal/pelvic imaging (infertility monitoring and follicle development). Conclusion The application of AI in abdominal/pelvic ultrasound shows promising early results for disease diagnosis, monitoring, and report refinement. However, the risk of bias remains high because very few of these applications have been prospectively validated (in multi-center studies) or have received FDA clearance.
Article
Full-text available
Artificial intelligence (AI) has gained prominence in medical imaging, particularly in obstetrics and gynecology (OB/GYN), where ultrasound (US) is the preferred method. It is considered cost effective and easily accessible but is time consuming and hindered by the need for specialized training. To overcome these limitations, AI models have been proposed for automated plane acquisition, anatomical measurements, and pathology detection. This study aims to overview recent literature on AI applications in OB/GYN US imaging, highlighting their benefits and limitations. For the methodology, a systematic literature search was performed in the PubMed and Cochrane Library databases. Matching abstracts were screened based on the PICOS (Participants, Intervention or Exposure, Comparison, Outcome, Study type) scheme. Articles with full text copies were distributed to the sections of OB/GYN and their research topics. As a result, this review includes 189 articles published from 1994 to 2023. Among these, 148 focus on obstetrics and 41 on gynecology. AI-assisted US applications span fetal biometry, echocardiography, or neurosonography, as well as the identification of adnexal and breast masses, and assessment of the endometrium and pelvic floor. To conclude, the applications for AI-assisted US in OB/GYN are abundant, especially in the subspecialty of obstetrics. However, while most studies focus on common application fields such as fetal biometry, this review outlines emerging and still experimental fields to promote further research.
Article
Full-text available
International challenges have become the de facto standard for comparative assessment of image analysis algorithms. Although segmentation is the most widely investigated medical image processing task, the various challenges have been organized to focus only on specific clinical tasks. We organized the Medical Segmentation Decathlon (MSD), a biomedical image analysis challenge in which algorithms compete in a multitude of both tasks and modalities, to investigate the hypothesis that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. MSD results confirmed this hypothesis; moreover, the MSD winner continued to generalize well to a wide range of other clinical problems for the next two years. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to scientists who are not versed in AI model training.
Article
Full-text available
Semi-automatic and fully automatic contouring tools have emerged as an alternative to fully manual segmentation to reduce time spent contouring and to increase contour quality and consistency. In particular, fully automatic segmentation has seen exceptional improvements through the use of deep learning in recent years. These fully automatic methods may not require user interactions, but the resulting contours are often not suitable for use in clinical practice without review by a clinician. Furthermore, they need large amounts of labeled data to be available for training. This review presents alternatives to manual or fully automatic segmentation methods along the spectrum of variable user interactivity and data availability. The challenge lies in determining how much user interaction is necessary and how that interaction can be used most effectively. While deep learning is already widely used for fully automatic tools, interactive methods are only at the starting point of being transformed by it. Interaction between clinician and machine, via artificial intelligence, can go both ways, and this review presents the avenues being pursued to improve medical image segmentation.
Article
Full-text available
Deep learning has been extensively applied to segmentation in medical imaging. U-Net, proposed in 2015, shows the advantages of accurate segmentation of small targets and a scalable network architecture. With the increasing requirements for segmentation performance in medical imaging in recent years, U-Net has been cited academically more than 2500 times. Many scholars have been continually developing the U-Net architecture. This paper summarizes the medical image segmentation technologies based on U-Net structure variants with respect to their structure, innovation, and efficiency; reviews and categorizes the related methodology; and introduces the loss functions, evaluation parameters, and modules commonly applied to segmentation in medical imaging, providing a good reference for future research.
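As a concrete anchor for the architecture these variants share, the unit repeated at every U-Net resolution level is two 3x3 convolutions, each followed by normalization and a ReLU. A generic PyTorch sketch of that block is shown below; it is an illustration of the common pattern, not any specific variant from the review:

    import torch
    import torch.nn as nn

    class DoubleConv(nn.Module):
        # The (conv -> BatchNorm -> ReLU) x 2 unit used at each U-Net level.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.block(x)

    # One encoder step: feature extraction followed by 2x2 max-pool downsampling.
    x = torch.randn(1, 1, 64, 64)        # (batch, channels, height, width)
    feats = DoubleConv(1, 32)(x)         # -> (1, 32, 64, 64)
    down = nn.MaxPool2d(2)(feats)        # -> (1, 32, 32, 32)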
Article
Full-text available
Deep learning has been widely used for medical image segmentation, and a large number of papers have been published recording its success in the field. A comprehensive thematic survey on medical image segmentation using deep learning techniques is presented. This paper makes two original contributions. First, compared to traditional surveys that directly divide the literature on deep learning for medical image segmentation into many groups and introduce each group in detail, we classify the currently popular literature according to a multi-level structure from coarse to fine. Second, this paper focuses on supervised and weakly supervised learning approaches, without including unsupervised approaches, since they have been introduced in many older surveys and are not currently popular. For supervised learning approaches, we analyse the literature in three aspects: the selection of backbone networks, the design of network blocks, and the improvement of loss functions. For weakly supervised learning approaches, we investigate the literature according to data augmentation, transfer learning, and interactive segmentation, separately. Compared to existing surveys, this survey classifies the literature very differently and makes it more convenient for readers to understand the relevant rationale, guiding them toward appropriate improvements in medical image segmentation based on deep learning approaches.
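Of the weakly supervised axes listed above, data augmentation is the most mechanical to illustrate. A plausible torchvision pipeline for a scarce labeled medical training set might look like the following; the specific transforms and magnitudes are illustrative choices, not recommendations from the survey:

    import torchvision.transforms as T

    # Mild geometric and photometric perturbations preserve anatomy while
    # multiplying the effective size of a small labeled training set.
    augment = T.Compose([
        T.RandomRotation(degrees=10),
        T.RandomHorizontalFlip(p=0.5),
        T.RandomAffine(degrees=0, translate=(0.05, 0.05)),
        T.ColorJitter(brightness=0.1, contrast=0.1),
    ])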
Article
Full-text available
Background Intraventricular hemorrhage (IVH) within the cerebral lateral ventricles affects 20–30% of very low birth weight infants (<1500 g). As the ventricles increase in size, the intracranial pressure increases, leading to post-hemorrhagic ventricle dilatation (PHVD), an abnormal enlargement of the head. The most widely used imaging tool for measuring IVH and PHVD is cranial two-dimensional (2D) ultrasound (US). Estimating volumetric changes over time with 2D US is unreliable due to high user variability when locating the same anatomical location at different scanning sessions. Compared to 2D US, three-dimensional (3D) US is more sensitive to volumetric changes in the ventricles and does not suffer from variability in slice acquisition. However, 3D US images require segmentation of the ventricular surface, which is tedious and time-consuming when done manually. Purpose A fast, automated ventricle segmentation method for 3D US would provide quantitative information in a timely manner when monitoring IVH and PHVD in pre-term neonates. To this end, we developed a fast and fully automated segmentation method to segment neonatal cerebral lateral ventricles from 3D US images using deep learning. Methods Our method consists of a 3D U-Net ensemble model composed of three U-Net variants, each highlighting various aspects of the segmentation task such as the shape and boundary of the ventricles. The ensemble is made of a U-Net++, attention U-Net, and U-Net with a deep learning-based shape prior, combined using a mean voting strategy. We used a dataset consisting of 190 3D US images, which was separated into two subsets: one of 87 images containing both ventricles and one of 103 images containing only one ventricle (caused by a limited field-of-view during acquisition). We conducted fivefold cross-validation to evaluate the performance of the models on a larger amount of test data: 165 test images, of which 75 show two ventricles (two-ventricle images) and 90 show one ventricle (one-ventricle images). We compared these results to each stand-alone model and to previous works, including 2D multiplane U-Net and 2D SegNet models. Results Using fivefold cross-validation, the ensemble method reported a Dice similarity coefficient (DSC) of 0.720 ± 0.074, absolute volumetric difference (VD) of 3.7 ± 4.1 cm³, and a mean absolute surface distance (MAD) of 1.14 ± 0.41 mm on 75 two-ventricle test images. Using 90 test images with a single ventricle, the model after cross-validation reported DSC, VD, and MAD values of 0.806 ± 0.111, 3.5 ± 2.9 cm³, and 1.37 ± 1.70 mm, respectively. Compared to alternatives, the proposed ensemble yielded a higher segmentation accuracy on both test data sets. Our method required approximately 5 s to segment one image and was substantially faster than the state-of-the-art conventional methods. Conclusions Compared to the state-of-the-art non-deep learning methods, our method based on deep learning was more efficient in segmenting neonatal cerebral lateral ventricles from 3D US images with comparable or better DSC, VD, and MAD performance. Our dataset was the largest to date (190 images) for this segmentation problem and the first to include images that show only one lateral cerebral ventricle.
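The mean-voting fusion described above has a simple form: average the per-model foreground probability maps, then threshold. A minimal NumPy sketch follows, with toy 2x2 "slices" standing in for the three U-Net variants' outputs; this is an illustration of the fusion rule, not the authors' code:

    import numpy as np

    def mean_vote(prob_maps, threshold=0.5):
        # Average per-model foreground probabilities, then binarize.
        return (np.stack(prob_maps).mean(axis=0) >= threshold).astype(np.uint8)

    p1 = np.array([[0.9, 0.2], [0.6, 0.1]])   # e.g., U-Net++ output
    p2 = np.array([[0.8, 0.4], [0.5, 0.2]])   # e.g., attention U-Net output
    p3 = np.array([[0.7, 0.3], [0.4, 0.3]])   # e.g., shape-prior U-Net output
    print(mean_vote([p1, p2, p3]))            # [[1 0] [1 0]]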
Article
Full-text available
Objectives: Automatic selection and segmentation of the slice of minimal hiatal dimensions (SMHD) in transperineal ultrasound (TPUS) volumes. Methods: The SMHD was manually selected and the urogenital hiatus (UH) segmented in TPUS volumes of 116 women with symptomatic pelvic organ prolapse (POP). These data were used to train two deep learning algorithms: the first provides an estimate of the position of the SMHD; based on this estimate, a slice is selected and fed into the second algorithm, which automatically segments the UH. From this segmentation, measurements of hiatal area (HA) and anteroposterior (APD) and coronal (CD) diameters are computed. The mean absolute distance between the manually and automatically selected SMHD, the overlap (Dice similarity index (DSI)) between manual and automatic UH segmentation, and the intraclass correlation coefficient (ICC) between manual and automatic UH measurements were assessed on a test set of 30 TPUS volumes. Results: The mean absolute distance between the manually and automatically selected SMHD was 0.20 cm. DSI values between manual and automatic segmentation were all above 0.85. The ICC values and 95% confidence intervals between manual and automatic levator hiatus measurements were 0.94 (0.87-0.97) for levator HA, 0.92 (0.78-0.97) for APD, and 0.82 (0.66-0.91) for CD. Conclusions: Our deep learning algorithms allow for reliable automatic selection and segmentation of the SMHD in TPUS volumes of women with symptomatic POP. These algorithms can be implemented in the software of TPUS machines, reducing clinical analysis time and easing the examination of TPUS data for research or clinical purposes.
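Agreement statistics like the ICC values above can be computed with the third-party pingouin package, assuming it is installed; the paired measurements below are invented for illustration and do not come from the study:

    import pandas as pd
    import pingouin as pg  # third-party package, assumed installed

    # Toy paired hiatal-area measurements (cm^2): manual vs. automatic.
    df = pd.DataFrame({
        "subject": [1, 1, 2, 2, 3, 3, 4, 4],
        "rater": ["manual", "auto"] * 4,
        "area": [14.2, 14.5, 18.1, 17.6, 12.9, 13.3, 20.4, 19.8],
    })
    icc = pg.intraclass_corr(data=df, targets="subject",
                             raters="rater", ratings="area")
    print(icc[["Type", "ICC", "CI95%"]])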
Article
Full-text available
Background: The levator ani muscle (LAM) consists of different subdivisions, which play a specific role in the pelvic floor mechanics. The aim of this study is to identify and describe the appearance of these subdivisions on 3-Dimensional (3D) transperineal ultrasound (TPUS). To do so, a study designed in three phases was performed in which twenty 3D TPUS scans of vaginally nulliparous women were assessed. The first phase was aimed at getting acquainted with the anatomy of the LAM subdivisions and its appearance on TPUS: relevant literature was consulted, and the TPUS scan of one patient was analyzed to identify the puborectal, iliococcygeal, puboperineal, pubovaginal, and puboanal muscle. In the second phase, the five LAM subdivisions and the pubic bone and external sphincter, used as reference structures, were manually segmented in volume data obtained from five nulliparous women at rest. In the third phase, intra- and inter-observer reproducibility were assessed on twenty TPUS scans by measuring the Dice Similarity Index (DSI). Results: The mean inter-observer and median intra-observer DSI values (with interquartile range) were: puborectal 0.83 (0.13)/0.83 (0.10), puboanal 0.70 (0.16)/0.79 (0.09), iliococcygeal 0.73 (0.14)/0.79 (0.10), puboperineal 0.63 (0.25)/0.75 (0.22), pubovaginal muscle 0.62 (0.22)/0.71 (0.16), and the external sphincter 0.81 (0.12)/0.89 (0.03). Conclusion: Our results show that the LAM subdivisions of nulliparous women can be reproducibly identified on 3D TPUS data.
Article
Full-text available
Biomedical imaging is a driver of scientific discovery and a core component of medical care and is being stimulated by the field of deep learning. While semantic segmentation algorithms enable image analysis and quantification in many applications, the design of respective specialized solutions is non-trivial and highly dependent on dataset properties and hardware conditions. We developed nnU-Net, a deep learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training and post-processing for any new task. The key design choices in this process are modeled as a set of fixed parameters, interdependent rules and empirical decisions. Without manual intervention, nnU-Net surpasses most existing approaches, including highly specialized solutions on 23 public datasets used in international biomedical segmentation competitions. We make nnU-Net publicly available as an out-of-the-box tool, rendering state-of-the-art segmentation accessible to a broad audience by requiring neither expert knowledge nor computing resources beyond standard network training.