
Feasibility of a deep learning-based diagnostic platform to evaluate lower urinary tract disorders in men using simple uroflowmetry

Purpose: To diagnose lower urinary tract symptoms (LUTS) in a noninvasive manner, we created a prediction model for bladder outlet obstruction (BOO) and detrusor underactivity (DUA) using simple uroflowmetry. In this study, we used deep learning to analyze simple uroflowmetry. Materials and methods: We performed a retrospective review of 4,835 male patients aged ≥40 years who underwent a urodynamic study at a single center. We excluded patients with a disease or a history of surgery that could affect LUTS. A total of 1,792 patients were included in the study. We extracted a simple uroflowmetry graph automatically using the ABBYY Flexicapture® image capture program (ABBYY, Moscow, Russia). We applied a convolutional neural network (CNN), a deep learning method to predict DUA and BOO. A 5-fold cross-validation average value of the area under the receiver operating characteristic (AUROC) curve was chosen as an evaluation metric. When it comes to binary classification, this metric provides a richer measure of classification performance. Additionally, we provided the corresponding average precision-recall (PR) curves. Results: Among the 1,792 patients, 482 (26.90%) had BOO, and 893 (49.83%) had DUA. The average AUROC scores of DUA and BOO, which were measured using 5-fold cross-validation, were 73.30% (mean average precision [mAP]=0.70) and 72.23% (mAP=0.45), respectively. Conclusions: Our study suggests that it is possible to differentiate DUA from non-DUA and BOO from non-BOO using a simple uroflowmetry graph with a fine-tuned VGG16, which is a well-known CNN model.
Seokhwan Bang1,*,†, Sokhib Tukhtaev2,*, Kwang Jin Ko1, Deok Hyun Han1, Minki Baek1,
Hwang Gyun Jeon1, Baek Hwan Cho2, Kyu-Sung Lee1
1Department of Urology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul,
2Medical AI Research Center, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
Keywords: Artificial intelligence; Bladder outlet obstruction; Detrusor underactivity; Lower urinary tract symptoms
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted
non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Original Article - Lower Urinary Tract Dysfunction
Received: 9 November, 2021 Revised: 23 January, 2022 Accepted: 24 February, 2022 Published online: 25 March, 2022
Corresponding Author: Kyu-Sung Lee https://orcid.org/0000-0003-0891-2488
Department of Urology, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul 06351, Korea
TEL: +82-2-3410-3554, FAX: +82-2-3410-3027, E-mail: ksleedr@skku.edu
Baek Hwan Cho https://orcid.org/0000-0001-7722-5660
Medical AI Research Center, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul 06351, Korea
TEL: +82-2-3410-0885, FAX: +82-2-3410-0878, E-mail: baekhwan.cho@samsung.com
*These authors contributed equally to this study and should be considered co-first authors.
†Current affiliation: Department of Urology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea.
The Korean Urological Association www.icurology.org
Investig Clin Urol 2022;63:301-308.
https://doi.org/10.4111/icu.20210434
pISSN 2466-0493 • eISSN 2466-054X
INTRODUCTION
Lower urinary tract symptoms (LUTS) are a common condition with
multifactorial causes. The most common cause of LUTS in men is benign
prostatic hyperplasia (BPH). Up to 50% of men over 50 years of age and 80%
of men over 80 years of age experience LUTS caused by BPH [1]. Detrusor
underactivity (DUA) is another very common cause of LUTS. One review found
that between 9% and 28% of patients with LUTS under 50 years of age had
DUA, while 48% of those over 70 years of age had DUA [2]. LUTS is a concept
that includes voiding dysfunction and storage dysfunction, each feature
represented by DUA and bladder outlet obstruction (BOO), respectively [3].
It is critical to distinguish between these two diseases because their
treatments and clinical responses differ.
Urodynamic studies (UDSs) are the gold standard for the diagnosis and
evaluation of LUTS. However, the use of UDS is limited by its invasiveness.
Porru et al. [4] found that 4% to 45% of patients experience UDS
complications, mostly urinary tract infection and hematuria. In addition,
several patients report feeling shame and discomfort during the test and
post-test anxiety [5].
Simple uroflowmetry, one component of UDS, is a simple, noninvasive
diagnostic screening procedure used to measure the flow rate of urine over
time. Uroflowmetry produces a uroflowmetry graph that contains information
regarding the voiding volume and maximum urine flow rate (Qmax) [6]. Several
previous trials have attempted to categorize simple uroflowmetry graphs into
several groups; however, there has been insufficient evidence and a lack of
objective standards, including a lack of pressure data, to achieve this end.
There is a lack of evidence that uroflowmetry can distinguish obstructed
voiding from DUA. However, as we have mentioned, this distinction is crucial
in determining the appropriate treatment for LUTS.
Medical image analysis using deep learning algorithms has recently gained
popularity alongside the development of technologies such as image
recognition [7,8]. Many studies have used deep learning algorithms to
classify and diagnose several diseases based on images [9]. For instance,
work on convolutional neural networks (CNNs) has recently focused on
analyzing images, recognizing patterns, and predicting trends. In 2012, the
CNN proposed by Krizhevsky et al. [10] demonstrated high performance on an
image classification task. Since then, researchers in the medical domain
have been exploiting deep learning algorithms for various tasks to fully or
partially automate disease diagnosis.
This study sought to develop a fully automated device to distinguish DUA
and BOO using patterns of simple uroflowmetry with a deep learning method.
MATERIALS AND METHODS
1. Ethics statement
This study was performed at a single center and was conducted according to
the tenets of the Declaration of Helsinki. The Institutional Review Board
of Samsung Medical Center approved this study (approval number:
2019-12-062). Informed consent was waived by the Institutional Review Board
of Samsung Medical Center (Seoul, Korea) because of the study’s
retrospective design.
2. Patients
We retrospectively reviewed the clinical data of 4,835 men who underwent a
pressure-flow study at Samsung Medical Center between December 2006 and
December 2017. We analyzed all patients who were ≥40 years of age and who
underwent a pressure-flow study, and we focused on the pattern of
uroflowmetry regardless of storage function. Those with diseases that can
affect lower urinary tract function, including bladder cancer and prostate
cancer, were excluded. Patients who had undergone previous prostate,
bladder, and/or urethral surgeries and those with indwelling catheters (or
needing regular catheterization) were also excluded. Patients with a history
of cerebrovascular accident, neurologic disorders, or spinal or pelvic bone
trauma that could affect LUTS were excluded. Patients who had voided volumes
less than 150 mL during simple uroflowmetry were also excluded. Finally, we
excluded 77 patients whose study graphs were insufficient for analysis.
Therefore, 1,792 patients were ultimately included (Fig. 1).
Fig. 1. Study design. Screening: n=4,835. Exclusion criteria: catheterized,
1,187; CVA history, 274; bladder or prostate cancer or lower urinary tract
surgery, 664; voided volume <150 mL, 841; insufficient test, 77.
Enrollment: n=1,869. Analysis: n=1,792. CVA, cerebrovascular accident.
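The patient flow in Fig. 1 can be checked with simple arithmetic. The short sketch below uses only the counts listed in the figure; the variable names are illustrative, and the assumption that the first four exclusions precede enrollment while the 77 insufficient tests are removed afterwards is inferred from the totals rather than stated explicitly.

```python
# Counts as listed in Fig. 1 (variable names are illustrative).
screened = 4835

# Assumption: these four exclusions are applied before enrollment.
exclusions_before_enrollment = {
    "catheterized": 1187,
    "cva_history": 274,
    "cancer_or_lower_urinary_tract_surgery": 664,
    "voided_volume_below_150_ml": 841,
}

# Assumption: insufficient tests are removed after enrollment.
insufficient_test = 77

enrolled = screened - sum(exclusions_before_enrollment.values())
analyzed = enrolled - insufficient_test

print(enrolled, analyzed)  # prints "1869 1792"
```

Under this reading, the counts reproduce both the enrollment figure (n=1,869) and the final analysis cohort (n=1,792).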
3. Urodynamic examination
The UDSs were performed by experts according to the International
Continence Society Good Urodynamic Practices protocol [11] using an
Aquarius TT UDS system (Laborie Medical Technologies, Toronto, ON, Canada)
and a DORADO-KT (Laborie Medical Technologies). The UDSs were recorded in
four versions (7 Rel Z, 8 Rel A, 11 Rel 6, 12 Rel 0), each of which has a
different output format.
DUA was defined as a bladder contractility index
(BCI = PdetQmax + 5 × Qmax) <100 [12]. BOO was defined as a BOO index
(BOOI = PdetQmax − 2 × Qmax) >40 [12]. Here, PdetQmax is the detrusor
pressure at maximum flow.
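The two index definitions above translate directly into code. The following is a minimal sketch of how the reference labels could be derived from pressure-flow measurements; the function names and the example values are hypothetical, and the units (cmH2O for PdetQmax, mL/s for Qmax) are the conventional ones rather than something stated in this section.

```python
def bladder_contractility_index(pdet_qmax, qmax):
    """BCI = PdetQmax + 5 * Qmax."""
    return pdet_qmax + 5.0 * qmax


def boo_index(pdet_qmax, qmax):
    """BOOI = PdetQmax - 2 * Qmax."""
    return pdet_qmax - 2.0 * qmax


def label_patient(pdet_qmax, qmax):
    """Apply the study's thresholds: DUA if BCI < 100, BOO if BOOI > 40."""
    bci = bladder_contractility_index(pdet_qmax, qmax)
    booi = boo_index(pdet_qmax, qmax)
    return {"BCI": bci, "BOOI": booi, "DUA": bci < 100, "BOO": booi > 40}


# Hypothetical patient: PdetQmax = 30 (assumed cmH2O), Qmax = 10 (assumed mL/s).
print(label_patient(30, 10))
```

Because the two labels are computed independently, a single patient can be positive for both, for one, or for neither, which is why the DUA and BOO models are trained as separate binary classifiers.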
4. Data pre-processing
The patients’ personal information and identification numbers were deleted
according to the regulations. The uroflowmetry graph was extracted
separately. The original test sheet contained the graph together with
numerical information and other data that were not necessary for the deep
learning procedure. We separated the graph data using ABBYY Flexicapture®
(ABBYY, Moscow, Russia), a program that permits the automated extraction of
the necessary parts of an image, excluding text. Using the ABBYY program,
we extracted a uroflowmetry graph from the simple uroflowmetry test sheet
(Fig. 2).
Deep learning models typically require a fixed image specification for
training. Szegedy et al. [13] gained more accuracy with an input size of
299×299 pixels while keeping the computational effort constant. Zoph et al.
[14] used both 299×299 and 331×331 pixels for training ImageNet models.
Similarly, we resized all images to 299×299 pixels. Owing to the limited
number of uroflowmetry graphs, we applied data augmentation to improve the
classification performance of the trained models. The aim of data
augmentation is to expand the size of a training dataset by generating
modified versions of the images in the dataset. Uroflowmetry graphs differ
greatly from natural images such as images of dogs, cars, and pedestrians.
Thus, it is impractical to apply popular data augmentation techniques such
as flipping and rotation because the spatial correlation of the
uroflowmetry graph should be maintained. Therefore, we applied cropping as
the only data augmentation: we cropped the top-left, top-right, bottom-left,
and bottom-right areas along with the central area, each of which
maintained approximately 90% of the original graph.
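The cropping scheme described above (four corner crops plus one central crop) can be sketched as follows. This is an illustration rather than the authors' implementation: the function name is hypothetical, and the `keep` parameter assumes that "approximately 90%" refers to each side length of the image, which the text does not specify.

```python
def five_crop_boxes(width, height, keep=0.9):
    """Return (left, upper, right, lower) boxes for four corner crops
    and one center crop, each keeping `keep` of the width and height."""
    crop_w = round(width * keep)
    crop_h = round(height * keep)
    max_left = width - crop_w   # largest valid left offset
    max_top = height - crop_h   # largest valid top offset
    center_left = max_left // 2
    center_top = max_top // 2
    return {
        "top_left": (0, 0, crop_w, crop_h),
        "top_right": (max_left, 0, width, crop_h),
        "bottom_left": (0, max_top, crop_w, height),
        "bottom_right": (max_left, max_top, width, height),
        "center": (center_left, center_top,
                   center_left + crop_w, center_top + crop_h),
    }


# Crop boxes for a 299x299 input image.
for name, box in five_crop_boxes(299, 299).items():
    print(name, box)
```

The tuples follow the (left, upper, right, lower) convention used by Pillow's `Image.crop`, so each box could be passed to it directly and the result resized back to the network input size. No flipping or rotation is involved, so the time axis of the graph keeps its orientation.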
5. Deep neural network model implementation
We adopted ResNet-18 [15], Inception-V3 [16], and VGG16 [17] for the
classification of the uroflowmetry images. After initializing with
ImageNet-pretrained models, we extensively tuned hyper-parameters such as
the learning rate, batch size, and activation functions in the training
process. We trained DUA classification models and BOO classification models
separately with the corresponding datasets.
Fig. 2. An outline of the uroflowmetry graph extraction and data
augmentation pipeline. The ABBYY program provides the extraction area from
the original test sheet (A), then image augmentations (C) are made using
the original crop (B).
To evaluate our models, 5-fold cross-validation was performed.
Pre-processed images were randomly divided into five non-overlapping
subsets: four subsets were used for training and one was left for
validation. This process was repeated for all five subsets so that each
subset served as the held-out set once. The results were averaged and
recorded. The average value of the area under the receiver operating
characteristic (AUROC) curve derived from 5-fold cross-validation and the
corresponding mean average precision (mAP) values for both the DUA and BOO
datasets were chosen as evaluation metrics [18].
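The evaluation protocol described above can be sketched in plain Python. The helper names below are hypothetical, `score_fn` stands in for whatever routine trains a model on the training folds and returns scores for the held-out fold, and the AUROC is computed by direct pairwise comparison, which is adequate for illustration but slower than library implementations.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once and split them into k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs both positive and negative samples")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def cross_validated_auroc(labels, score_fn, k=5, seed=0):
    """Hold out each fold once and average the per-fold AUROC.

    score_fn(train_idx, val_idx) is a hypothetical hook that trains on
    train_idx and returns one score per element of val_idx.
    """
    folds = k_fold_indices(len(labels), k, seed)
    fold_aucs = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores = score_fn(train_idx, val_idx)
        fold_aucs.append(auroc([labels[j] for j in val_idx], scores))
    return sum(fold_aucs) / k, fold_aucs


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

Note that this sketch splits at the image level. If augmented crops of the same graph are generated before splitting, they should be kept in the same fold so that the held-out set contains no near-duplicates of training images.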
Keras, a high-level Python API, was used as our deep learning platform,
enabling fast experimentation. The networks were implemented in an Ubuntu
16.04 LTS environment equipped with a GeForce 1080 Ti series GPU.
ResNet-18 has been widely used in the deep learning community for the last
half decade; residual networks allow researchers to train deeper networks
simply by adding identity mappings to every few stacked layers. We chose
ResNet-18 because it is lightweight and suitable for our dataset. Similarly,
Inception-V3 is a CNN model that gained popularity in the deep learning
community for its approach toward keeping the compute cost constant.
Moreover, Inception-V3 is known to improve the training ability of a
network through variations in properties. We employed Inception-V3 to
determine whether it could capture low-level features of our uroflowmetry
graphs. The last network we experimented with was VGG16, developed by the
Visual Geometry Group of the University of Oxford. It presented a
thoroughly evaluated network of increased depth, sticking to 3×3
convolutional filters. The model is relatively more straightforward than
its ResNet and Inception counterparts and has achieved promising results in
various tasks. Therefore, we adopted VGG16 for the DUA and BOO datasets as
well. Since VGG16 outperformed the former networks, we present the details
of hyperparameter tuning for the VGG network alone. The model was optimized
for DUA classification using stochastic gradient descent with a learning
rate of 0.003. The hyperparameters for BOO classification were the same as
those for DUA, except for a learning rate of 0.01. An input size of 299×299
pixels showed better results than smaller input sizes for both datasets.
6. Statistical analysis
Data analysis was performed using the Statistical Package for the Social
Sciences (SPSS® Statistics version 25.0; SPSS Inc., IBM Corp., Chicago, IL,
USA), and a Student’s t-test was used to compare patient characteristics.
Statistical significance was set at a p-value of <0.05.
RESULTS
As shown in Table 1, among the 1,792 patients, 482 (26.90%) had BOO, and
893 (49.83%) had DUA. There were significant differences between BOO and
non-BOO patients in UDS parameters except time to voiding time. Between DUA
and non-DUA patients, there were significant differences in all the
pressure-flow study parameters, except age and voided volume.
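The class proportions above matter when reading the precision-recall results, because a classifier that labels every patient as positive has a precision equal to the prevalence of that class; this is the no-skill line on a PR curve. The snippet below simply recomputes the proportions from the reported counts (the function name is illustrative).

```python
def prevalence(n_positive, n_total):
    """Fraction of positive cases; also the no-skill precision on a PR curve."""
    return n_positive / n_total


n_total = 1792
boo_prevalence = prevalence(482, n_total)
dua_prevalence = prevalence(893, n_total)

print(f"BOO prevalence: {boo_prevalence:.2%}")  # prints "BOO prevalence: 26.90%"
print(f"DUA prevalence: {dua_prevalence:.2%}")  # prints "DUA prevalence: 49.83%"
```

An mAP of 0.45 for BOO should therefore be compared against a baseline of about 0.27, and an mAP of 0.70 for DUA against a baseline of about 0.50.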
Table 1. Baseline patient characteristics
Characteristic | BOO, No (n=1,310) | BOO, Yes (n=482) | p-value | DUA, No (n=899) | DUA, Yes (n=893) | p-value
Age, y | 66.41 | 64.01 | <0.001 | 64.39 | 64.93 | 0.229
BOOI | 18.06 | 61.08 | <0.001 | 33.01 | 26.22 | <0.001
BCI | 98.86 | 114.68 | <0.001 | 127.66 | 78.38 | <0.001
Voiding efficacy | 86.35 | 77.78 | <0.001 | 86.16 | 81.92 | <0.001
Qmax, mL/s | 13.95 | 9.99 | 0.001 | 14.67 | 11.09 | 0.001
Average flow, mL/s | 6.38 | 4.58 | <0.001 | 6.86 | 4.95 | <0.001
Voiding time, s | 66.13 | 72.83 | 0.022 | 54.38 | 81.58 | <0.001
Flow time, s | 50.00 | 57.28 | <0.001 | 44.86 | 58.66 | 0.001
Time to peak flow, s | 20.92 | 24.50 | 0.001 | 16.65 | 27.16 | <0.001
Voided volume, mL | 272.91 | 233.89 | <0.001 | 262.04 | 262.79 | 0.881
Residual volume, mL | 48.67 | 77.30 | <0.001 | 45.82 | 66.99 | <0.001
Values are presented as mean value only. p-values are from Student's t-test.
BOO, bladder outlet obstruction; DUA, detrusor underactivity; BOOI, bladder
outlet obstruction index; BCI, bladder contractility index; Qmax, maximum
urine flow rate.
In the deep learning evaluations, the mean 5-fold cross-validation AUROC
values for DUA classification with the ResNet-18 and Inception-V3 networks
were 0.699 and 0.648, respectively. As mentioned, the best score of
0.733 was obtained with a fine-tuned VGG16 network. For BOO classification,
the AUROC values with the ResNet-18 and Inception-V3 networks were 0.661
and 0.560, respectively. The VGG16 network trained with the BOO dataset
also achieved a higher AUROC (0.722) than ResNet-18 and Inception-V3.
Figs. 3 and 4 show the ROC curves and PR curves of the VGG16 network for
the DUA and BOO datasets, respectively. We also calculated the sensitivity
and specificity values of the DUA and BOO models. The sensitivity and
specificity of the VGG16 network for the DUA dataset were 65.9% and 68.9%,
respectively, at the maximum Youden’s index [19]. The sensitivity and
specificity of the VGG16 network for the BOO dataset were 65.1% and 68.9%,
respectively, at the maximum Youden’s index. Furthermore, because the
fine-tuned VGG16 performed best among the three experimental models, we
present Grad-CAM++ visualizations for this model only [20]. Visual
explanation techniques such as Grad-CAM++ are used to produce rough
localization maps by highlighting important regions in the image.
Grad-CAM++ provides feature maps with respect to a specific class score to
generate visual explanations. Fig. 5 illustrates some sample uroflowmetry
images with their respective maps next to them. Grad-CAM++ activated on the
signal graphs rather than on the background regions. This implies that the
models learned to identify clinically proper regions in the images.
Fig. 3. The mean ROC curve (A) and the mean PR curve (B) of the VGG16
network for DUA vs. non-DUA classification. (A) ROC curve of DUA vs.
non-DUA (x-axis, false positive rate; y-axis, true positive rate): fold
AUCs, 0.762, 0.758, 0.723, 0.710, and 0.711; mean ROC AUC=0.733±0.02, shown
with ±1 standard deviation and the chance line. (B) PR curve of DUA vs.
non-DUA (x-axis, recall; y-axis, precision): fold areas, 0.753, 0.710,
0.698, 0.685, and 0.674; overall PR-curve area=0.698, shown with the
no-skill line. ROC, receiver operating characteristic; DUA, detrusor
underactivity; AUC, area under the curve; PR, precision-recall.
Fig. 4. The mean ROC curve (A) and the mean PR curve (B) of the VGG16
network for BOO vs. non-BOO classification. (A) ROC curve of BOO vs.
non-BOO (x-axis, false positive rate; y-axis, true positive rate): fold
AUCs, 0.728, 0.722, 0.720, 0.727, and 0.712; mean ROC AUC=0.722±0.00, shown
with ±1 standard deviation and the chance line. (B) PR curve of BOO vs.
non-BOO (x-axis, recall; y-axis, precision): fold areas, 0.484, 0.454,
0.503, 0.508, and 0.516; overall PR-curve area=0.452, shown with the
no-skill line. ROC, receiver operating characteristic; BOO, bladder outlet
obstruction; AUC, area under the curve; PR, precision-recall.
Fig. 5. Model explainability with Grad-CAM++. The first row presents
samples from the VGG16 model trained with the DUA dataset while the second
row depicts samples from the VGG16 model trained with the BOO dataset. Each
sample is shown as a pair of panels, the input image and its Grad-CAM++
map, with axes in pixels (0 to 250). BOO, bladder outlet obstruction.
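The sensitivity and specificity reported above were taken at the operating point that maximizes Youden's index, J = sensitivity + specificity − 1. The sketch below shows one straightforward way to find that point from held-out labels and scores; the function name and the toy data are hypothetical, and it assumes that a score at or above the threshold is called positive.

```python
def youden_optimal_threshold(labels, scores):
    """Return (threshold, sensitivity, specificity, J) maximizing
    J = sensitivity + specificity - 1 over all observed score values."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    best = None
    for threshold in sorted(set(scores)):
        # Assumption: score >= threshold is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[3]:  # keep the lowest threshold on ties
            best = (threshold, sensitivity, specificity, j)
    return best


# Toy example with four samples.
print(youden_optimal_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# prints (0.35, 1.0, 0.5, 0.5)
```

At the reported operating points, J is 0.659 + 0.689 − 1 = 0.348 for the DUA model and 0.651 + 0.689 − 1 = 0.340 for the BOO model.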
DISCUSSION
Since the introduction of simple uroflowmetry in 1948 [6], several attempts
have been made to establish a pattern of analysis for this technique. Van
de Beek et al. [21] attempted to classify and predict uroflowmetry. In that
study, the group attempted to formalize uroflowmetry and identify
diagnostic patterns among specialists. However, the predictive rate was
only 36%. Gacci et al. [22] published a common flow pattern in 2007. They
formulated uroflowmetric parameters and searched for the items of
diagnostic suspicion in uroflowmetry curves. However, the agreement was not
satisfactory, with a kappa value of 0.05. Moreover, the analysis was
limited by the lack of reproducibility of simple uroflowmetry, whose
characteristics vary greatly depending on the environment.
There have also been other attempts to predict or diagnose BOO. Bladder
wall thickness (BWT), which can be measured by ultrasound, has been
proposed as an indicator that increases with BOO [23]. Manieri et al. [24]
first discussed this possibility. Using 5 mm as the reference point and a
significant difference (r>0.6), this group found that 63% of the normal
group had values <5 mm, while 88% of patients with BOO had values >5 mm. In
contrast, Hakenberg et al. [25] found that the BWT increased slightly with
age, but not significantly.
The penile cuff test has also been applied to measure BOO. This test
measures the detrusor contractility by detecting the iso-volumetric bladder
pressure [26]. An inflatable cuff is placed around the penile shaft and
inflates automatically until the urine flow is interrupted. The cuff then
deflates rapidly to restart the flow. This cycle can be repeated until
urination ends. The pressure required to interrupt urinary flow during the
cycle is considered to represent bladder pressure (Pcuff.int) [27].
However, this method has several limitations, including its high cost and
the need for patients to be seated when they take the test. The seated
nature of the test may introduce bias, as most men void while standing
[28]. We attempted to mitigate these limitations using deep learning.
The prediction of BPH through AI has also been suggested by other
researchers. Torshizi et al. [29] predicted the severity of BPH using a
fuzzy-ontology approach, with an accuracy of about 90%. However, that study
determined severity from questionnaire and clinical examination results,
whereas our study differs substantially in that it examined the possibility
of diagnosis by graph analysis alone. In addition, a non-invasive
prediction of LUTS using an artificial neural network (ANN) was also
presented [30], but its accuracy did not satisfy expectations. In this
study, we tried to overcome such limitations using a CNN, and this is the
first attempt at predicting DUA.
In this study, we proposed the use of a deep learning tool as a diagnostic
alternative to invasive UDSs. To our knowledge, this is a novel approach.
We believe that it can be used as the basis for the development of a tool
to compensate for the drawbacks of the UDS. This study sought to determine
whether graph patterns can be used to predict disease. We compared patients
with and without DUA and those with and without BOO. We did not account for
patients who may have both DUA and BOO. Given the large number of other
patients with LUTS, the study attempted to identify these complex diseases.
We used a CNN to confirm the accuracy of predictions for patients with BOO
and DUA using only a simple uroflowmetry graph. The raw signal data of the
urodynamic test results graph were not provided by the urodynamic test
device. Hence, an image capture software program, ABBYY Flexicapture®, was
used to extract 1,792 data samples, with no error cases. Thus, we believe
that the image capture process was robust. This research is meaningful in
that it used a deep learning method to approach areas that have not been
investigated using prototype trials. We consider this a meaningful work
that will serve as a cornerstone for further research. We experimented with
known algorithms offered for classification tasks such as ResNet-18,
ResNet-50, Inception-V3, and EfficientNet-B0; however, their final
predictions were not as good as VGG16’s (data not shown). In addition, with
our dataset, VGG19 attained the same result as VGG16; therefore, we
selected the lighter one. As this is a feasibility study of deep learning
models on urodynamic test data, a further study with a larger dataset will
be needed. We will also consider experimenting with recent models in our
future study.
This study has several limitations. First, the prediction rate of this
study is only slightly over 70%, which indicates that a higher prediction
rate is required. The mean AUC scores of the fine-tuned VGG16 may be
improved by increasing the number of training images. Second, our ability
to establish the basis for the model's predictions is limited by the
absence of external data. Although the Grad-CAM++ visual interpretations in
Fig. 5 provided some evidence that the model discriminated between the
signal graph and the gridlines in the background, full interpretability
needs to be addressed in future work. Third, this study excluded patients
who had both BOO and DUA. We included patients who had only BOO or only
DUA. Further studies need to include this complex situation to develop a
useful device for diagnosing BOO and DUA.
CONCLUSIONS
Our study suggests the possibility of an automated and non-invasive device
to differentiate DUA from non-DUA and BOO from non-BOO using a simple
uroflowmetry graph with a fine-tuned VGG16, which is a well-known CNN model.
CONFLICTS OF INTEREST
The authors have nothing to disclose.
FUNDING
This work was supported by a National Research Foundation of Korea (NRF)
grant funded by the Korean government (Ministry of Science and ICT) (No.
2017R1E1A1A01077487, 2020R1F1A1070952).
AUTHORS’ CONTRIBUTIONS
Research conception and design: Seokhwan Bang and
Kyu-Sung Lee. Data acquisition: Sokhib Tukhtaev and Baek
Hwan Cho. Statistical analysis: Seokhwan Bang, Sokhib
Tukhtaev, and Baek Hwan Cho. Data analysis and inter-
pretation: Seokhwan Bang and Deok Hyun Han. Drafting
of the manuscript: Seokhwan Bang and Sokhib Tukhtaev.
Critical revision of the manuscript: Deok Hyun Han and
Kyu-Sung Lee. Obtaining funding: Kyu-Sung Lee and Baek
Hwan Cho. Administrative, technical, or material support:
Minki Baek and Hwang Gyun Jeon. Supervision: Kwang Jin
Ko and Deok Hyun Han. Approval of the final manuscript:
Kyu-Sung Lee.
REFERENCES
1. Egan KB. The epidemiology of benign prostatic hyperplasia
associated with lower urinary tract symptoms: prevalence and
incident rates. Urol Clin North Am 2016;43:289-97.
2. Osman NI, Esperto F, Chapple CR. Detrusor underactivity and
the underactive bladder: a systematic review of preclinical and
clinical studies. Eur Urol 2018;74:633-43.
3. Han DH, Jeong YS, Choo MS, Lee KS. The efficacy of trans-
urethral resection of the prostate in the patients with weak
bladder contractility index. Urology 2008;71:657-61.
4. Porru D, Madeddu G, Campus G, Montisci I, Scarpa RM, Usai
E. Evaluation of morbidity of multi-channel pressure-flow
studies. Neurourol Urodyn 1999;18:647-52.
5. Yeung JY, Eschenbacher MA, Pauls RN. Pain and embarrass-
ment associated with urodynamic testing in women. Int Uro-
gynecol J 2014;25:645-50.
6. Chancellor MB, Rivas DA, Mulholland SG, Drake WM Jr. The
invention of the modern uroflowmeter by Willard M. Drake, Jr
at Jefferson Medical College. Urology 1998;51:671-4.
7. Shen D, Wu G, Suk HI. Deep learning in medical image analy-
sis. Annu Rev Biomed Eng 2017;19:221-48.
8. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Gha-
foorian M, et al. A survey on deep learning in medical image
analysis. Med Image Anal 2017;42:60-88.
9. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanas-
wamy A, et al. Development and validation of a deep learning
algorithm for detection of diabetic retinopathy in retinal fun-
dus photographs. JAMA 2016;316:2402-10.
10. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification
with deep convolutional neural networks. Adv Neural Inf Pro-
cess Syst 2012;25:1097.
11. Schäfer W, Abrams P, Liao L, Mattiasson A, Pesce F, Spangberg
A, et al. Good urodynamic practices: uroflowmetry, filling
cystometry, and pressure-flow studies. Neurourol Urodyn
2002;21:261-74.
12. Abrams P. Bladder outlet obstruction index, bladder contrac-
tility index and bladder voiding efficiency: three simple indices
to define bladder voiding function. BJU Int 1999;84:14-5.
13. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethink-
ing the inception architecture for computer vision. ArXiv.
1512.00567 [Preprint]. 2015 [cited 2021 Jun 23]. Available
from: https://arxiv.org/abs/1512.00567.
14. Zoph B, Vasudevan V, Shlens J, Le QV. Learning transferable
architectures for scalable image recognition. ArXiv. 1707.07012
[Preprint]. 2018 [cited 2021 Jun 23]. Available from:
https://arxiv.org/abs/1707.07012.
15. He K, Zhang X, Ren S, Sun J. Deep residual learning for image
recognition. ArXiv. 1512.03385 [Preprint]. 2015 [cited 2021
Aug 5]. Available from: https://arxiv.org/abs/1512.03385.
16. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al.
Going deeper with convolutions. ArXiv. 1409.4842 [Preprint].
2014 [cited 2021 Aug 5]. Available from: https://arxiv.org/
abs/1409.4842.
17. Simonyan K, Zisserman A. Very deep convolutional networks
for large-scale image recognition. ArXiv. 1409.1556 [Preprint].
2015 [cited 2021 Aug 5]. Available from: https://arxiv.org/
abs/1409.1556.
18. Hajian-Tilaki K. Receiver operating characteristic (ROC)
curve analysis for medical diagnostic test evaluation. Caspian J
Intern Med 2013;4:627-35.
19. Youden WJ. Index for rating diagnostic tests. Cancer
1950;3:32-5.
20. Chattopadhay A, Sarkar A, Howlader P, Balasubramanian VN.
Grad-CAM++: improved visual explanations for deep convo-
lutional networks. ArXiv. 1710.11063 [Preprint]. 2018 [cited
2018 Nov 9]. Available from: https://arxiv.org/abs/1710.11063.
21. Van de Beek C, Stoevelaar HJ, McDonnell J, Nijs HG, Casparie
AF, Janknegt RA. Interpretation of uroflowmetry curves by
urologists. J Urol 1997;157:164-8.
22. Gacci M, Del Popolo G, Artibani W, Tubaro A, Palli D, Vittori
G, et al. Visual assessment of uroflowmetry curves: description
and interpretation by urodynamists. World J Urol 2007;25:333-7.
23. Lee HN, Lee YS, Han DH, Lee KS. Change of ultrasound estimated
bladder weight and bladder wall thickness after treatment
of bladder outlet obstruction with dutasteride. Low Urin
Tract Symptoms 2017;9:67-74.
24. Manieri C, Carter SS, Romano G, Trucchi A, Valenti M,
Tubaro A. The diagnosis of bladder outlet obstruction in men
by ultrasound measurement of bladder wall thickness. J Urol
1998;159:761-5.
25. Hakenberg OW, Linne C, Manseck A, Wirth MP. Bladder wall
thickness in normal adults and men with mild lower urinary
tract symptoms and benign prostatic enlargement. Neurourol
Urodyn 2000;19:585-93.
26. Van Mastrigt R, Pel JJ. Towards a noninvasive urodynamic diagnosis
of infravesical obstruction. BJU Int 1999;84:195-203.
27. Griffiths CJ, Rix D, MacDonald AM, Drinnan MJ, Pickard RS,
Ramsden PD. Noninvasive measurement of bladder pressure
by controlled inflation of a penile cuff. J Urol 2002;167:1344-7.
28. Mangera A, Chapple C. Modern evaluation of lower urinary
tract symptoms in 2014. Curr Opin Urol 2014;24:15-20.
29. Torshizi AD, Zarandi MH, Torshizi GD, Eghbali K. A hybrid
fuzzy-ontology based intelligent system to determine
level of severity and treatment recommendation for Benign
Prostatic Hyperplasia. Comput Methods Programs Biomed
2014;113:301-13.
30. Sonke GS, Heskes T, Verbeek AL, de la Rosette JJ, Kiemeney
LA. Prediction of bladder outlet obstruction in men with lower
urinary tract symptoms using artificial neural networks. J Urol
2000;163:300-5.