Voxels Intersecting along Orthogonal Levels Attention U-Net (viola-Unet) to Segment Intracerebral Haemorrhage Using Computed Tomography Head Scans

Qinghui Liu, Bradley J MacIntosh, Till Schellhorn, Karoline Skogen, Kyrre Eeg Emblem, and Atle Bjørnerud
Oslo University Hospital (OUS), Rikshospitalet
0372 Oslo, Norway
{qiliu, bramac, uxscti, kaskog, kemblem, abjorner}
Abstract. We implemented two distinct 3-dimensional deep learning neural networks and evaluated their ability to segment intracranial hemorrhage (ICH) seen on non-contrast computed tomography (CT). One model, referred to as "Voxels-Intersecting along Orthogonal Levels of Attention U-Net" (viola-Unet), has architecture elements that are amenable to the INSTANCE 2022 Data Challenge. A second comparison model was derived from the no-new U-Net (nnU-Net). Input images and ground truth segmentation maps were used to train the two networks separately in a supervised manner; validation data were subsequently used for semi-supervised training. Model predictions were compared during 5-fold cross validation. The viola-Unet outperformed the comparison network on two out of four performance metrics (i.e., NSD and RVD). An ensemble model that combined viola-Unet and nnU-Net networks had the highest performance for DSC and HD. We demonstrate that there were ICH segmentation performance benefits associated with a 3D U-Net that efficiently incorporates spatially orthogonal features during the decoding branch of the U-Net. The code base, pretrained weights, and docker image of the viola-Unet AI tool will be made publicly available.
Keywords: U-Net · intracranial hemorrhage · head computed tomography · deep learning · semi-supervised training.
1 Introduction
A spontaneous intracranial hemorrhage (ICH) is the second most common cause of stroke, following ischemic stroke, but with disproportionately high mortality and long-term disability. Bleeding might occur in the brain parenchyma or in the surrounding anatomical spaces. Subdural, epidural, and subarachnoid bleedings are examples of ICH that occur in close proximity to layers near the skull and tend to be trauma-related. Parenchymal and subarachnoid ICH can lead to blood in the ventricles, which is a poor prognostic marker [1].
Supported by funding from the Helse Sør-Øst Regional Health Authority.
arXiv:2208.06313v1 [eess.IV] 12 Aug 2022
Non-contrast computed tomography is effective at detecting hemorrhage. Robust radiological ICH estimates are desirable. For example, a subdural hematoma with a width >1 cm typically warrants neurosurgery [2]. The risk of poor outcomes scales with each mL increase in stroke hematoma volume [1]. Current clinical practice is to calculate hematoma volume by hand, following the ABC/2 method, which uses the maximum hematoma diameter "A", the orthogonal in-plane diameter "B", and "C", the number of slices where the hematoma is visible, to produce a volume estimate. Although the ABC/2 estimate can be done quickly (i.e., within minutes), it is desirable to develop automated ICH segmentation methods [3].
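To make the ABC/2 arithmetic concrete, the sketch below computes a volume estimate from the three measurements. It assumes that "C" is converted to a length by multiplying the slice count by the slice thickness, which is not spelled out in the text above; the function name and the example measurements are hypothetical.

```python
def abc2_volume_ml(a_cm: float, b_cm: float, n_slices: int, slice_thickness_cm: float) -> float:
    """Estimate hematoma volume in mL (1 mL = 1 cm^3) with the ABC/2 approximation.

    a_cm: maximum hematoma diameter on the reference slice.
    b_cm: in-plane diameter orthogonal to A.
    n_slices: number of slices on which the hematoma is visible.
    slice_thickness_cm: slice thickness (assumption: used to turn the slice count into a length).
    """
    if min(a_cm, b_cm, slice_thickness_cm) < 0 or n_slices < 0:
        raise ValueError("all measurements must be non-negative")
    c_cm = n_slices * slice_thickness_cm  # through-plane extent
    return a_cm * b_cm * c_cm / 2.0


# Hypothetical measurements: A = 4 cm, B = 3 cm, hematoma visible on 6 slices of 5 mm.
example_volume = abc2_volume_ml(a_cm=4.0, b_cm=3.0, n_slices=6, slice_thickness_cm=0.5)
print(f"{example_volume:.1f} mL")
```

A voxel-wise segmentation replaces this ellipsoid approximation with a direct count of labeled voxels multiplied by the voxel volume.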
Fig. 1. An INSTANCE 2022 example (case: 088) from the training dataset. Each image shows a different Hounsfield Unit (HU) windowing level, to take advantage of an RGB-style 3-window data input combination (from left to right: [200, 1300], [0, 100], [20, 200], and the 3-window combination), with the ICH labeled segmentation regions highlighted by the pink edge lines.
Deep learning (DL) algorithms have recently received increasing attention in computer-aided automatic methods for medical data analysis. State-of-the-art medical image segmentation models tend to rely on the popular U-Net [4] architecture, an encoder-decoder convolutional neural network (CNN) based approach with an end-to-end training pipeline for pixel- or voxel-wise segmentation. Several U-Net-like models have tackled ICH segmentation using head CT scans [5,6,7,8,9], and these successes are mirrored in other brain imaging fields such as tumor segmentation of multi-modal MRI scans [10,11]. Isensee et al., in particular, used the nnU-Net framework [12], a self-configuring method for various DL-based biomedical image segmentation tasks, to present a winning model for the BraTS20 challenge [13]. Thus, we chose nnU-Net as the strong baseline model in the current work.
In this paper, we propose a novel solution for the ICH segmentation task of the INSTANCE 2022 challenge [14]. We demonstrate a deep learning model that is fast, accurate, robust, and computationally efficient in segmenting the ICH lesion. Section 2 presents our methodology in detail. The experimental procedure, evaluation, and test results are presented in Section 3. Finally, we draw conclusions based on our participation in the INSTANCE 2022 challenge in Section 4.
2 Methods
We first describe the proposed Viola U-Net (viola-Unet) framework and its variants. We then present our design choices and provide detailed information about the viola attention module for the ICH segmentation task.
2.1 Model architectures
Fig. 2. The Viola U-Net (viola-Unet) architecture powered by the proposed Voxels Intersecting along Orthogonal Levels Attention (viola) module. Two additional output heads are used only for deep supervision [15] training. [Figure legend: 3 x 3 x 1 and 3 x 3 x 3 Conv + BatchNorm + LeakyReLU blocks with 1 x 1 x 1, 2 x 2 x 1 or 2 x 2 x 2 strides; 2 x 2 x 2, 2 x 2 x 1 and 1 x 1 x 1 TransposeConv blocks; skip connections; viola attention modules; and a final 1 x 1 x 1 Conv with 2 output channels. Feature-map sizes: 160 x 160 x 16, 80 x 80 x 16, 40 x 40 x 8, 20 x 20 x 4, 10 x 10 x 2 and 5 x 5 x 1 (320 x 5 x 5 x 1 at the bottleneck).]
Baseline nnU-Net: As a baseline, we used a self-configured U-Net architecture from the official open-source nnU-Net framework. The nnU-Net had a depth of 6. The number of channels at each (symmetric) encoder and decoder level was 32, 64, 128, 256, 320 and 320. The input patch size was 1 × 320 × 320 × 16, with 5 scales of deep supervision training outputs.
Viola U-Net: Our solution is called "viola-Unet" as it relies on Voxels in feature space that Intersect along Orthogonal Levels to provide an Attention U-Net, which is an asymmetric encoder-decoder architecture with a depth of 7 (shown in Fig. 2). The number of channels at each encoder level was 32, 64, 96, 128, 192, 256 and 320, while the number of channels at each corresponding decoder level was 32, 64, 96, 128, 128 and 128. In addition, the input patch size was 3 × 160 × 160 × 16, with 2 extra scales of deep supervision outputs.
Architecture considerations: The viola-Unet is flexible and configurable: the strides and kernel sizes at each layer, the number of features in the encoder and decoder layers, whether the architecture is symmetric or asymmetric, and the number of deep supervision outputs can all be set. We can also incorporate other attention blocks such as gated attention [16]. The final submission used a larger version of viola-Unet configured as follows:
    ...  # the opening lines of the listing (constructor call and leading arguments) are missing in the source
    kernel_size=[[3, 3, 1], [3, 3, 1], [3, 3, 1], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]],  # argument name lost in extraction; assumed
    strides=[[1, 1, 1], [2, 2, 1], [2, 2, 1], [2, 2, 2], [2, 2, 2], [2, 2, 1], [1, 1, 1]],  # argument name lost in extraction; assumed
    upsample_kernel_size=[[2, 2, 1], [2, 2, 1], [2, 2, 2], [2, 2, 2], [2, 2, 1], [1, 1, 1]],  # argument name lost in extraction; assumed
    filters=(32, 64, 96, 128, 192, 256, 320),
    # can use a different number of features in each decoder layer
    dec_filters=(32, 64, 96, 128, 192, 256),
    norm_name=("BATCH", {"affine": True}),
    act_name=("leakyrelu", {"inplace": True, "negative_slope": ...}),  # value cut off in the source
    dropout=0.2,
    deep_supervision=True,
    deep_supr_num=4,
    res_block=True,
    trans_bias=True,
    viola_att=True,  # turn on or off viola attention
    gated_att=False,  # turn on or off gated attention
    sum_deep_supr=False,  # can sum all deep-supervision outputs during inference
)
Listing 1.1. viola-Unet-l model configurations
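As a rough check of what the stride list in Listing 1.1 implies, the following sketch traces the spatial size of a 160 × 160 × 16 patch through the seven encoder levels. It assumes "same"-style padding, so that each level produces ceil(size / stride); the helper name is hypothetical and the code is independent of the actual network implementation.

```python
from typing import List, Sequence, Tuple

# Per-level strides as given in Listing 1.1 (second list of the configuration).
STRIDES = [[1, 1, 1], [2, 2, 1], [2, 2, 1], [2, 2, 2], [2, 2, 2], [2, 2, 1], [1, 1, 1]]


def trace_feature_sizes(input_size: Sequence[int],
                        strides: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Return the spatial size after each encoder level.

    Assumption: padding keeps size / stride rounded up (ceil division).
    """
    sizes = []
    current = list(input_size)
    for stride in strides:
        current = [-(-dim // s) for dim, s in zip(current, stride)]  # ceil division
        sizes.append(tuple(current))
    return sizes


for level, size in enumerate(trace_feature_sizes((160, 160, 16), STRIDES)):
    print(f"level {level}: {size}")
```

Under these assumptions the in-plane size shrinks from 160 to 5, while the last two levels leave the slice (z) dimension untouched.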
Viola attention module: Squeeze-and-Excitation (SE) networks are able to recalibrate channel-wise feature responses by explicitly modelling interdependencies between channels on 2D feature planes [17]. The viola-Unet attention method is similar; Fig. 3 shows how the viola attention module incorporates features along orthogonal directions, which is an efficient way to incorporate through-plane features.
Fig. 3. Illustration of the Voxels Intersecting along Orthogonal Levels Attention (Viola) pipeline. Here AdaAvgPool denotes adaptive average pooling, and DDCM denotes the dense dilated convolutions' merging network [18].
Overall, the Viola module is composed of three key blocks. First, the adaptive average pooling (AdaAvgPool) module squeezes the input feature volume (e.g., $\mathbf{X} \in \mathbb{R}^{C \times H \times W \times D}$, where $C$, $H$, $W$, and $D$ represent the channel, height, width, and depth of a given feature volume) into three latent representation spaces (e.g., $\mathbf{X}_h \in \mathbb{R}^{C \times H}$, $\mathbf{X}_w \in \mathbb{R}^{C \times W}$, and $\mathbf{X}_d \in \mathbb{R}^{C \times D}$), one along each axis of the input feature patch. Second, the customized dense dilated convolutions merging (DDCM) networks fuse cross-channel and non-local contextual information along each orthogonal direction with adaptive kernel sizes (i.e., $k = [2(C//32) + 3, 1]$), dilation ratios (i.e., $dilation = [1, k, 2(k-1)+1, 3(k-1)+1]$) and strides (i.e., $strides = [(2,1),(2,1),(4,1),(4,1)]$). Third, the Viola unit constructs the voxels intersecting along orthogonal levels attention volume (i.e., $\mathbf{A}_{viola} \in \mathbb{R}^{C \times H \times W \times D}$) based on the fused and reshaped cross-channel-direction latent representation spaces (i.e., $\mathbf{X}_h \in \mathbb{R}^{C \times H \times 1 \times 1}$, $\mathbf{X}_w \in \mathbb{R}^{C \times 1 \times W \times 1}$, and $\mathbf{X}_d \in \mathbb{R}^{C \times 1 \times 1 \times D}$); see footnote 2.
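As a worked illustration of the adaptive kernel and dilation rules in the paragraph above, the sketch below evaluates them for a few channel counts. It reads the extracted formulas as k = 2·(C // 32) + 3 and dilation = [1, k, 2(k − 1) + 1, 3(k − 1) + 1]; the multiplication and minus signs are assumed readings of the damaged text, and the function name is hypothetical.

```python
from typing import List, Tuple


def ddcm_hyperparameters(channels: int) -> Tuple[Tuple[int, int], List[int], List[Tuple[int, int]]]:
    """Kernel size, dilation ratios and strides for a given channel count C.

    Assumed reading of the text: k = 2 * (C // 32) + 3.
    """
    k = 2 * (channels // 32) + 3
    kernel_size = (k, 1)
    dilations = [1, k, 2 * (k - 1) + 1, 3 * (k - 1) + 1]
    strides = [(2, 1), (2, 1), (4, 1), (4, 1)]  # fixed, as listed in the text
    return kernel_size, dilations, strides


for c in (32, 128, 320):
    kernel, dilations, _ = ddcm_hyperparameters(c)
    print(f"C={c}: kernel={kernel}, dilations={dilations}")
```

Under this reading, wider layers receive longer one-dimensional kernels and larger dilations, so the receptive field along each axis grows with the channel count.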
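[The equations preceding Eq. (5) are illegible in the source; only the fragments "X_d = 0.5 · …", "Att = …" and "Att = ReLU(…)" survive, with their tilde and hat accents detached.]
$$\mathbf{A}_{viola} = 0.1 \cdot (\tilde{Att} + \hat{Att}) + 0.3, \qquad \mathbf{A}_{viola} = \mathbf{A}_{viola} + \mathrm{L2Norm}(\mathbf{A}_{viola}), \tag{5}$$
$$\mathbf{X} = \mathbf{X} \odot \mathbf{A}_{viola}, \tag{6}$$
where $\sigma$ denotes the Sigmoid activation function, $\phi$ denotes a combination function of group normalization [19] ($G = 2$ in this work) and Tanh non-linearity, $\otimes$ denotes the tensor product and $\odot$ denotes the element-wise multiplication.
2 Unless particularly specified, we use bold capital characters for matrices and tensors, lowercase and capital characters in italics for scalars and bold italics for vectors.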
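The squeeze-and-recombine idea behind the module can be illustrated with a deliberately simplified NumPy sketch. It is not the authors' implementation: the DDCM fusion, the scaling constants of Eq. (5) and the L2 normalization are all omitted, and a plain sigmoid stands in for the learned fusion. It only shows how three axis-wise descriptors can be broadcast back into a C × H × W × D attention volume and applied element-wise as in Eq. (6).

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def orthogonal_axis_attention(x: np.ndarray) -> np.ndarray:
    """Simplified axis-wise attention for a feature volume of shape (C, H, W, D).

    Assumption: a sigmoid replaces the learned DDCM fusion of the paper.
    """
    if x.ndim != 4:
        raise ValueError("expected a (C, H, W, D) array")
    # Squeeze: average over the two other spatial axes (adaptive average pooling).
    x_h = x.mean(axis=(2, 3))  # (C, H)
    x_w = x.mean(axis=(1, 3))  # (C, W)
    x_d = x.mean(axis=(1, 2))  # (C, D)
    # Reshape to (C, H, 1, 1), (C, 1, W, 1) and (C, 1, 1, D).
    a_h = sigmoid(x_h)[:, :, None, None]
    a_w = sigmoid(x_w)[:, None, :, None]
    a_d = sigmoid(x_d)[:, None, None, :]
    attention = a_h * a_w * a_d  # broadcasts to (C, H, W, D)
    return x * attention  # element-wise re-weighting


features = np.random.default_rng(0).normal(size=(2, 4, 5, 3))
print(orthogonal_axis_attention(features).shape)
```

Each voxel's weight depends only on the three axis profiles that intersect at that voxel, which keeps the cost of the attention volume far below that of a dense 3D attention map.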
3 Data, experiments and results
3.1 Dataset and evaluation metrics
The INSTANCE 2022 challenge dataset [20,14] consists of 200 non-contrast 3D head CT scans of clinically diagnosed patients with ICH of various types, such as subdural hemorrhage (SDH), epidural hemorrhage (EDH), intraventricular hemorrhage (IVH), intraparenchymal hemorrhage (IPH), and subarachnoid hemorrhage (SAH). N=100 of the publicly available cases were used for training; the remaining N=100 cases were held out for validation (N=30 for the public leaderboard, and N=70 for the competitor rankings). The CT images had a matrix size of 512 × 512 × N, where N lies in [20, 70]. The average voxel spacing was approximately 0.45 × 0.45 × 5 mm.
Model performance was evaluated by four measures: Dice Similarity Coefficient (DSC), Hausdorff distance (HD), Relative absolute Volume Difference (RVD), and the Normalized Surface Dice (NSD).
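Two of the four measures have simple closed forms and can be sketched directly on binary masks: DSC = 2|P ∩ T| / (|P| + |T|) and RVD = |V_P − V_T| / V_T. The snippet below is a minimal illustration using these standard definitions, not the challenge's official evaluation code; the handling of empty masks is an assumption, and HD and NSD are omitted because they require surface-distance computations.

```python
import numpy as np


def dice_similarity(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denominator = pred.sum() + truth.sum()
    if denominator == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denominator)


def relative_volume_difference(pred: np.ndarray, truth: np.ndarray) -> float:
    """Relative absolute Volume Difference, measured in voxels."""
    pred_volume = int(np.asarray(pred, dtype=bool).sum())
    truth_volume = int(np.asarray(truth, dtype=bool).sum())
    if truth_volume == 0:
        raise ValueError("RVD is undefined for an empty ground-truth mask")
    return abs(pred_volume - truth_volume) / truth_volume


# Toy 2 x 2 x 2 example: the prediction covers half of the labeled region.
truth_mask = np.zeros((2, 2, 2), dtype=bool)
truth_mask[0, :, :] = True  # 4 voxels
pred_mask = np.zeros((2, 2, 2), dtype=bool)
pred_mask[0, 0, :] = True   # 2 voxels, all inside the truth
print(dice_similarity(pred_mask, truth_mask), relative_volume_difference(pred_mask, truth_mask))
```

Higher is better for DSC, while lower is better for RVD.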
3.2 Implementation
Our code for this study was written in Python 3 and PyTorch [21], using the open-source MONAI library, version 0.9.0. We adopted and modified MONAI's network code to implement the proposed models (both viola-Unet and the modified nnU-Net).
3.3 Training details
Guided by our empirical results, we trained all networks with randomly sampled patches of fixed size (3 × 160 × 160 × 16) as input and a batch size of 2. Each network was trained with 5-fold cross validation for up to 72,000 steps using a stochastic gradient descent (SGD) optimizer with Nesterov momentum of 0.99. The initial learning rate was $7 \times 10^{-3}$, and a cosine annealing scheduler [22] reduced the learning rate over epochs. We used a linear learning-rate warm-up during the first 1000 steps. A sliding window inference method (see footnote 4) was applied to evaluate the model on the local validation set after every 200 training steps. We stored the checkpoint with the highest mean Dice score on the validation set of the current fold during the training phase. Based on our training observations, and to achieve fast and stable convergence for each network, we applied a combination loss function of the Dice loss [23] and the Focal loss [24] for all our experiments.
4 The MONAI sliding window implementation was used.
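The learning-rate schedule described in this section (linear warm-up followed by cosine annealing) can be sketched as a plain function of the training step. The sketch assumes that the warm-up starts near zero, that the rate is annealed to zero at step 72,000, and that the schedule is updated per step rather than per epoch; none of these details is stated in the text, and the function name is hypothetical.

```python
import math


def learning_rate(step: int, base_lr: float = 7e-3,
                  warmup_steps: int = 1000, total_steps: int = 72000) -> float:
    """Linear warm-up to base_lr, then cosine annealing towards zero."""
    if step < warmup_steps:
        # Linear ramp; reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for s in (0, 999, 1000, 36500, 72000):
    print(s, f"{learning_rate(s):.6f}")
```

With these assumptions the rate peaks at 7e-3 once the warm-up ends, falls to half that value midway through the annealing phase, and reaches zero at the final step.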
3.4 Data pre-processing and augmentations
CT images and ground truth labels were reoriented into "RAS" format (i.e., Right, Anterior and Superior), then resampled to a standard spacing of 1 × 1 × 5 mm³ using trilinear interpolation for the image and nearest-neighbor interpolation for the label. Each CT image was windowed into three image intensity ranges (i.e., [0, 100], [20, 200], and [200, 1300], as shown in Fig. 1) and re-scaled to the range [0, 1] by min-max normalization. The three windowed images were then stacked as a 3-channel (RGB-style) volume to serve as input with shape (C, H, W, D), where C is the number of channels (e.g., 3), H the height (e.g., 160), W the width (e.g., 160) and D the depth (e.g., 16). Finally, the 3-channel 3D volume was normalized over non-zero values only, using the mean and standard deviation calculated on each channel separately. In addition, the following data augmentation steps were taken during the training phase:
- Random Crop: A fixed-size patch (3 × 160 × 160 × 16) was randomly cropped with a probability of 0.5. The patch center was either a foreground or a background voxel, based on the positive-to-negative ratio (1 : 1).
- Random Zoom: The volume was zoomed by a factor sampled uniformly from (0.9, 1.2) with a probability of 0.15.
- Gaussian Noise: Random Gaussian noise with mean 0 and standard deviation 0.01 was added to the input volume with a probability of 0.15.
- Gaussian Smooth: Gaussian smoothing, with the standard deviation of the Gaussian kernel sampled uniformly from (0.5, 1.15), was applied to the input volume with a probability of 0.15.
- Rotation: With a probability of 0.1, the input volume was rotated by 90 degrees along either the x- or y-axis.
- Random Shift: The intensity of the entire volume was shifted by an offset sampled uniformly from [-0.1, 0.1] with a probability of 0.5.
- Random Scale: The intensity of the volume was scaled with a probability of 0.15 by a factor sampled uniformly from [-0.3, 0.3].
- Flips: Volumes were randomly flipped along each of the x, y, and z axes independently, each with a probability of 0.25.
- Random Contrast: The volume intensity was randomly changed by a value sampled uniformly from (0.78, 1.25) with a probability of 0.15.
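The windowing step described at the start of Section 3.4 can be sketched as follows. The window bounds are copied as printed in the text, and using the window limits themselves as the minimum and maximum of the min-max normalization is an assumption; the function and constant names are hypothetical, and the later per-channel mean/std normalization is not shown.

```python
import numpy as np

# HU windows as printed in Section 3.4 (lower bound, upper bound).
WINDOWS_HU = [(0.0, 100.0), (20.0, 200.0), (200.0, 1300.0)]


def window_and_stack(volume_hu: np.ndarray, windows=WINDOWS_HU) -> np.ndarray:
    """Turn an (H, W, D) HU volume into a (3, H, W, D) RGB-style input in [0, 1]."""
    channels = []
    for low, high in windows:
        clipped = np.clip(volume_hu.astype(np.float32), low, high)
        # Assumption: min-max scaling uses the window limits as min and max.
        channels.append((clipped - low) / (high - low))
    return np.stack(channels, axis=0)


# Tiny example: four voxels spanning air-like, soft-tissue and bone-like values.
toy_volume = np.array([[[-50.0, 50.0, 150.0, 2000.0]]])  # shape (1, 1, 4)
stacked = window_and_stack(toy_volume)
print(stacked.shape)
print(stacked[0, 0, 0])
```

Each channel emphasizes a different intensity range, so the network sees three complementary views of the same scan.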
3.5 Semi-supervised learning
In this work, we used a self-training strategy for semi-supervised fine-tuning. The principle of semi-supervised learning with self-training algorithms is to train a model iteratively by assigning pseudo-labels to the set of unlabeled training samples, which are then used in conjunction with the labeled training set [25]. In practice, we manually selected the best prediction for each validation example from each submission as the pseudo-label, and added these pseudo-labels to our training set to fine-tune our models repeatedly.
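The selection step can be pictured with the small sketch below, which keeps, for every validation case, the submission whose prediction scored best. The authors describe this selection as manual, so the automatic criterion used here (a per-case quality score) and all names and numbers are hypothetical.

```python
from typing import Dict


def select_pseudo_labels(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each case to the submission with the highest score for that case.

    scores[submission][case] is a hypothetical per-case quality score.
    """
    best: Dict[str, tuple] = {}
    for submission, per_case in scores.items():
        for case, score in per_case.items():
            if case not in best or score > best[case][1]:
                best[case] = (submission, score)
    return {case: submission for case, (submission, _) in best.items()}


# Hypothetical scores for two submissions on two validation cases.
example_scores = {
    "submission_a": {"case_101": 0.71, "case_102": 0.80},
    "submission_b": {"case_101": 0.75, "case_102": 0.78},
}
print(select_pseudo_labels(example_scores))
```

The chosen predictions would then be added to the training set as pseudo-labels for the next fine-tuning round.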
During the self-training stage, we also gradually optimized our hyperparameters. First, based on our experimental observations, we optimized our model configurations and decided to use a larger version of viola-Unet, as shown in Table 1. Second, to ensure a fair comparison, we re-implemented a comparable version of nnU-Net using the MONAI library. Finally, we fine-tuned our models using slightly updated parameters: 1) a learning rate of $5 \times 10^{-3}$ with a warm-up of 10,000 steps, 2) windowing levels of [[0, 100], [-15, 200], [-100, 1300]], 3) a spacing of [0.902, 0.902, 4.997], and 4) a batch size of 3.
Table 1. Model configurations for three networks with increasing architecture complexity (DS: number of deep-supervision outputs; Residual: whether residual connections, as per [26], were used in the encoder layers; Dec-filters: the number of decoder filters for the bottom two layers; Z-strides: the z-axis strides of the last two encoder layers, i.e., whether the slice dimension is downsampled there). The number of parameters (Params) is given in millions. The inference time (Inf-Time) was measured in seconds on a GeForce RTX 2080TI GPU with an input patch size of 1 × 3 × 160 × 160 × 16. Note that inference time is fast for each model. All three networks used the same number of features for the encoder layers: [32, 64, 96, 128, 192, 256, 320]. The r in nnU-Net-r denotes that we re-implemented the nnU-Net after some optimization.
| Models | DS | Residual | Dec-filters | Z-strides | Params (M) | Inf-Time (s) |
|---|---|---|---|---|---|---|
| viola-Unet-s | 2 | ✗ | [128, 128] | [2, 1] | 12.77 | 0.051 |
| nnU-Net-r | 4 | ✓ | [192, 256] | [1, 1] | 22.01 | 0.026 |
| viola-Unet-l | 4 | ✓ | [192, 256] | [1, 1] | 22.12 | 0.052 |
3.6 Results
Table 2 shows the average Dice similarity coefficient (DSC) scores for each of the 5 folds with the nnU-Net baseline models and the viola-Unet models. The viola-Unet outperforms the baseline nnU-Net by a clear margin in mean DSC (0.7819 versus 0.7601; see Table 2).
Table 3 shows online validation results with the nnU-Net-base models and our viola-Unet-s models before applying semi-supervised learning methods. In terms of DSC and RVD, our models outperformed nnU-Net-base by about 1.2 and 3.9 percentage points, respectively, while underperforming by about 0.03 percentage points in terms of NSD.
In Table 4, we show the top 10 ranking scores for the INSTANCE 2022 online validation phase. Our semi-supervised viola-Unet-l models outperformed the comparison networks on two out of four performance metrics (i.e., NSD and RVD). An ensemble model that combined the viola-Unet-l and re-implemented nnU-Net-r networks had the highest performance for DSC and HD.
Table 2. Average Dice Similarity Coefficient (DSC) for each of the 5 folds. Results for a base nnU-Net configuration (i.e., using the official nnU-Net framework and training pipeline without any modification) are shown along with a smaller-sized version of viola-Unet (s denotes small).
| Model | nnU-Net-base | viola-Unet-s |
|---|---|---|
| Fold 0 | 0.7562 | 0.7786 |
| Fold 1 | 0.7345 | 0.7530 |
| Fold 2 | 0.7796 | 0.7990 |
| Fold 3 | 0.7555 | 0.8058 |
| Fold 4 | 0.7746 | 0.7730 |
| Mean DSC | 0.7601 | 0.7819 (+2.18%) |
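The mean row of Table 2 can be re-derived from the per-fold values, as in the short check below. It also shows that the reported "+2.18%" corresponds to an absolute difference of 2.18 percentage points in DSC rather than a relative change; the variable names are illustrative only.

```python
# Per-fold DSC values copied from Table 2.
nnunet_base = [0.7562, 0.7345, 0.7796, 0.7555, 0.7746]
viola_unet_s = [0.7786, 0.7530, 0.7990, 0.8058, 0.7730]


def mean(values):
    return sum(values) / len(values)


mean_nnunet = mean(nnunet_base)
mean_viola = mean(viola_unet_s)
gain_points = 100.0 * (mean_viola - mean_nnunet)              # absolute difference, percentage points
relative_gain = 100.0 * (mean_viola - mean_nnunet) / mean_nnunet  # relative change, percent
folds_won = sum(v > n for v, n in zip(viola_unet_s, nnunet_base))

print(round(mean_nnunet, 4), round(mean_viola, 4), round(gain_points, 2), folds_won)
```

The viola-Unet-s scores higher on four of the five folds, with Fold 4 being the exception.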
Table 3. Online validation results without semi-supervised training for the nnU-Net-base and viola-Unet-s models, i.e., DSC, HD (Hausdorff distance), NSD (Normalized Surface Dice) and RVD (Relative absolute Volume Difference).
| Model | DSC | HD | NSD | RVD |
|---|---|---|---|---|
| nnU-Net-base | 0.7251 ± 0.289 | null | 0.5151 ± 0.202 | 0.2674 ± 0.281 |
| viola-Unet-s | 0.7373 ± 0.260 | 20.613 ± 19.810 | 0.5148 ± 0.187 | 0.2284 ± 0.194 |
Table 4. Top 10 ranking scores for the INSTANCE 2022 online validation phase [data extracted on 7-Aug-2022]. Note that 3 submissions provided by our team scored in the top 3. A larger version of the viola-Unet (l denotes large) was fine-tuned with semi-supervised training and achieved the highest validation performance for the NSD and RVD scores, while an ensemble of nnU-Net-r with viola-Unet-l was top for DSC and HD.
| Team / Model | DSC | HD | NSD | RVD |
|---|---|---|---|---|
| arren | 0.7435 ± 0.236 | 31.616 ± 33.221 | 0.5201 ± 0.153 | 0.3580 ± 0.450 |
| asanner | 0.7456 ± 0.257 | 21.805 ± 21.735 | 0.5239 ± 0.175 | 1.1381 ± 0.112 |
| dongyuDylan | 0.7503 ± 0.237 | 29.072 ± 26.121 | 0.5280 ± 0.165 | 0.2301 ± 0.218 |
| testliver | 0.7537 ± 0.236 | 35.843 ± 28.453 | 0.5289 ± 0.165 | 0.2208 ± 0.206 |
| L Lawliet | 0.7640 ± 0.213 | 34.323 ± 29.207 | 0.5381 ± 0.145 | 0.2044 ± 0.175 |
| yangd05 | 0.7645 ± 0.237 | 25.725 ± 23.801 | 0.5403 ± 0.169 | 0.2322 ± 0.235 |
| amrn | 0.7821 ± 0.184 | 32.296 ± 30.039 | 0.5528 ± 0.127 | 0.2027 ± 0.182 |
| nnU-Net-r (our) | 0.7943 ± 0.174 | 22.799 ± 25.423 | 0.5673 ± 0.129 | 0.1952 ± 0.182 |
| viola-Unet-l (our) | 0.7951 ± 0.171 | 24.038 ± 29.236 | 0.5693 ± 0.125 | 0.1941 ± 0.179 |
| Ensemble (our) | 0.7953 ± 0.172 | 21.557 ± 25.021 | 0.5681 ± 0.125 | 0.1980 ± 0.180 |
Per-case results for each of the 30 validation cases, obtained with our best submission (i.e., the ensemble of nnU-Net-r and viola-Unet-l models), are shown in Fig. 4.
Fig. 4. The four graphs show the results for each of the 30 validation cases [denoted as index]. The prediction estimates correspond to one of the models that produced a high score on the leaderboard (i.e., an ensemble of nnU-Net-r and viola-Unet-l models).
4 Conclusions
We demonstrate that it is feasible to segment a range of ICH lesions on CT imaging by training a conventional nnU-Net and an architecture that we developed, referred to as the viola U-Net deep learning model. The viola U-Net architecture is a novel conception. Its flexible configurations were designed to achieve high performance despite a limited training sample size. Notably, the model relied on image inputs that retained the 3-dimensional information from the CT images, and orthogonal projections in the feature space were used to increase the between-plane information during the decoder layers of the U-Net. This design produced better segmentation results compared to the nnU-Net. The viola-Unet architecture did not incur additional computation costs and converged more rapidly than the nnU-Net, despite having a comparable number of trainable parameters.
References
1. Rodrigues, M.A., E Samarasekera, N., Lerpiniere, C., Perry, L.A., Moullaali, T.J., J M Loan, J., Wardlaw, J.M., Al-Shahi Salman, R.: Association between Computed Tomographic Biomarkers of Cerebral Small Vessel Diseases and Long-Term Outcome after Spontaneous Intracerebral Hemorrhage. Ann Neurol 89(2), 266–279 (Feb 2021)
2. Zumkeller, M., Behrmann, R., Heissler, H.E., Dietz, H.: Computed tomographic criteria and survival rate for patients with acute subdural hematoma. Neurosurgery 39(4), 708–712 (Oct 1996)
3. Kothari, R.U., Brott, T., Broderick, J.P., Barsan, W.G., Sauerbeck, L.R., Zuccarello, M., Khoury, J.: The ABCs of measuring intracerebral hemorrhage volumes. Stroke 27(8), 1304–1305 (Aug 1996)
4. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
5. Arab, A., Chinda, B., Medvedev, G., Siu, W., Guo, H., Gu, T., Moreno, S., Hamarneh, G., Ester, M., Song, X.: A fast and fully-automated deep-learning approach for accurate hemorrhage segmentation and volume quantification in non-contrast whole-head CT. Scientific Reports 10(1), 1–12 (2020)
6. Hssayeni, M.D., Croock, M.S., Salman, A.D., Al-khafaji, H.F., Yahya, Z.A., Ghoraani, B.: Intracranial hemorrhage segmentation using a deep convolutional model. Data 5(1), 14 (2020)
7. Patel, A., Schreuder, F.H., Klijn, C.J., Prokop, M., Ginneken, B.v., Marquering, H.A., Roos, Y.B., Baharoglu, M., Meijer, F.J., Manniesing, R.: Intracerebral haemorrhage segmentation in non-contrast CT. Scientific Reports 9(1), 1–11 (2019)
8. Sharrock, M.F., Mould, W.A., Ali, H., Hildreth, M., Awad, I.A., Hanley, D.F., Muschelli, J.: 3D deep neural network segmentation of intracerebral hemorrhage: Development and validation for clinical trials. Neuroinformatics 19(3), 403–415
9. Yu, N., Yu, H., Li, H., Ma, N., Hu, C., Wang, J.: A Robust Deep Learning Segmentation Method for Hematoma Volumetric Detection in Intracerebral Hemorrhage. Stroke 53(1), 167–176 (Jan 2022)
10. Futrega, M., Milesi, A., Marcinkiewicz, M., Ribalta, P.: Optimized U-Net for Brain Tumor Segmentation. arXiv preprint arXiv:2110.03352 (2021)
11. Luu, H.M., Park, S.H.: Extending nn-UNet for brain tumor segmentation. arXiv preprint arXiv:2112.04653 (2021)
12. Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
13. Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2014)
14. Li, X., Wang, K., Liu, J., Wang, H., Xu, M., Liang, X.: The 2022 Intracranial Hemorrhage Segmentation Challenge on Non-Contrast head CT (NCCT) (Mar
15. Zhu, Q., Du, B., Turkbey, B., Choyke, P.L., Yan, P.: Deeply-supervised CNN for prostate segmentation. In: 2017 International Joint Conference on Neural Networks (IJCNN). pp. 178–184. IEEE (2017)
16. Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.: Attention U-Net: Learning Where to Look For the Pancreas. arXiv preprint arXiv:1804.03999 (2018)
17. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7132–7141 (2018)
18. Liu, Q., Kampffmeyer, M., Jenssen, R., Salberg, A.B.: Dense Dilated Convolutions' Merging Network for Land Cover Classification. IEEE Transactions on Geoscience and Remote Sensing 58(9), 6309–6320 (2020)
19. Wu, Y., He, K.: Group normalization. In: Proceedings of the European conference on computer vision (ECCV). pp. 3–19 (2018)
20. Li, X., Luo, G., Wang, W., Wang, K., Gao, Y., Li, S.: Hematoma expansion context guided intracranial hemorrhage segmentation and uncertainty estimation. IEEE Journal of Biomedical and Health Informatics 26(3), 1140–1151 (2021)
21. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019)
22. Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. ICLR (2017)
23. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). pp. 565–571. IEEE (2016)
24. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. pp. 2980–2988 (2017)
25. Amini, M.R., Feofanov, V., Pauletto, L., Devijver, E., Maximov, Y.: Self-training: A survey. arXiv preprint arXiv:2202.12040 (2022)
26. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Brain tumor segmentation is essential for the diagnosis and prognosis of patients with gliomas. The brain tumor segmentation challenge has provided an abundant and high-quality data source to develop automatic algorithms for the task. This paper describes our contribution to the 2021 competition. We developed our methods based on nn-UNet, the winning entry of last year’s competition. We experimented with several modifications, including using a larger network, replacing batch normalization with group normalization and utilizing axial attention in the decoder. Internal 5-fold cross-validation and online evaluation from the organizers showed a minor improvement in quantitative metrics compared to the baseline. The proposed models won first place in the final ranking on unseen test data, achieving a dice score of 88.35%, 88.78%, 93.19% for the enhancing tumor, the tumor core, and the whole tumor, respectively. The codes, pretrained weights, and docker image for the winning submission are publicly available. (
Full-text available
Accurate segmentation of the Intracranial Hemorrhage (ICH) in non-contrast CT images is significant for computer-aided diagnosis. Although existing methods have achieved remarkable <sup>1</sup> <sup>1</sup> The code will be available from . results, none of them incorporated ICH’s prior information in their methods. In this work, for the first time, we proposed a novel SLice EXpansion Network (SLEX-Net), which incorporated hematoma expansion in the segmentation architecture by directly modeling the hematoma variation among adjacent slices. Firstly, a new module named Slice Expansion Module (SEM) was built, which can effectively transfer contextual information between two adjacent slices by mapping predictions from one slice to another. Secondly, to perceive contextual information from both upper and lower slices, we designed two information transmission paths: forward and backward slice expansion, and aggregated results from those paths with a novel weighing strategy. By further exploiting intra-slice and inter-slice context with the information paths, the network significantly improved the accuracy and continuity of segmentation results. Moreover, the proposed SLEX-Net enables us to conduct an uncertainty estimation with one-time inference, which is much more efficient than existing methods. We evaluated the proposed SLEX-Net and compared it with some state-of-the-art methods. Experimental results demonstrate that our method makes significant improvements in all metrics on segmentation performance and outperforms other existing uncertainty estimation methods in terms of several metrics.
Full-text available
Biomedical imaging is a driver of scientific discovery and a core component of medical care and is being stimulated by the field of deep learning. While semantic segmentation algorithms enable image analysis and quantification in many applications, the design of respective specialized solutions is non-trivial and highly dependent on dataset properties and hardware conditions. We developed nnU-Net, a deep learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training and post-processing for any new task. The key design choices in this process are modeled as a set of fixed parameters, interdependent rules and empirical decisions. Without manual intervention, nnU-Net surpasses most existing approaches, including highly specialized solutions on 23 public datasets used in international biomedical segmentation competitions. We make nnU-Net publicly available as an out-of-the-box tool, rendering state-of-the-art segmentation accessible to a broad audience by requiring neither expert knowledge nor computing resources beyond standard network training.
Full-text available
Objective A study was undertaken to assess whether cerebral small vessel disease (SVD) computed tomographic (CT) biomarkers are associated with long‐term outcome after intracerebral hemorrhage. Methods We performed a prospective, community‐based cohort study of adults diagnosed with spontaneous intracerebral hemorrhage between June 1, 2010 and May 31, 2013. A neuroradiologist rated the diagnostic brain CT for acute intracerebral hemorrhage features and SVD biomarkers. We used severity of white matter lucencies and cerebral atrophy, and the number of lacunes to calculate the CT SVD score. We assessed the association between CT SVD biomarkers and either death, or death or dependence (modified Rankin Scale scores = 4–6) 1 year after first‐ever intracerebral hemorrhage using logistic regression, adjusting for known predictors of outcome. Results Within 1 year of intracerebral hemorrhage, 224 (56%) of 402 patients died. In separate models, 1‐year death was associated with severe atrophy (adjusted odds ratio [aOR] = 2.54, 95% confidence interval [CI] = 1.44–4.49, p = 0.001) but not lacunes or severe white matter lucencies, and CT SVD sum score ≥ 1 (aOR = 2.50, 95% CI = 1.40–4.45, p = 0.002). Two hundred seventy‐seven (73%) of 378 patients with modified Rankin Scale data were dead or dependent at 1 year. In separate models, 1‐year death or dependence was associated with severe atrophy (aOR = 3.67, 95% CI = 1.71–7.89, p = 0.001) and severe white matter lucencies (aOR = 2.18, 95% CI = 1.06–4.51, p = 0.035) but not lacunes, and CT SVD sum score ≥ 1 (aOR = 2.81, 95% CI = 1.45–5.46, p = 0.002). Interpretation SVD biomarkers on the diagnostic brain CT are associated with 1‐year death and dependence after intracerebral hemorrhage, independent of known predictors of outcome.
This project aimed to develop and evaluate a fast and fully-automated deep-learning method applying convolutional neural networks with deep supervision (CNN-DS) for accurate hematoma segmentation and volume quantification in computed tomography (CT) scans. Non-contrast whole-head CT scans of 55 patients with hemorrhagic stroke were used. Individual scans were standardized to 64 axial slices of 128 × 128 voxels. Each voxel was annotated independently by experienced raters, generating a binary label of hematoma versus normal brain tissue based on majority voting. The dataset was split randomly into training (n = 45) and testing (n = 10) subsets. A CNN-DS model was built applying the training data and examined using the testing data. Performance of the CNN-DS solution was compared with three previously established methods. The CNN-DS achieved a Dice coefficient score of 0.84 ± 0.06 and recall of 0.83 ± 0.07, higher than patch-wise U-Net (< 0.76). CNN-DS average running time of 0.74 ± 0.07 s was faster than PItcHPERFeCT (> 1412 s) and slice-based U-Net (> 12 s). Comparable interrater agreement rates were observed between “method-human” vs. “human–human” (Cohen’s kappa coefficients > 0.82). The fully automated CNN-DS approach demonstrated expert-level accuracy in fast segmentation and quantification of hematoma, substantially improving over previous methods. Further research is warranted to test the CNN-DS solution as a software tool in clinical settings for effective stroke management.
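The labeling and evaluation steps described above (majority voting across raters, then a Dice coefficient and recall against the resulting label) can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation; the function names `majority_vote` and `dice_and_recall`, and the convention of returning 1.0 when both masks are empty, are assumptions made for the example.

```python
import numpy as np


def majority_vote(rater_masks):
    """Combine binary masks from several raters by per-voxel majority vote.

    Hypothetical helper: a voxel is labeled hematoma when strictly more
    than half of the raters marked it.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in rater_masks])
    return stacked.sum(axis=0) * 2 > stacked.shape[0]


def dice_and_recall(pred, truth):
    """Return (Dice coefficient, recall) for two binary masks.

    Dice = 2*TP / (|pred| + |truth|); recall = TP / |truth|.
    Assumption: both scores are defined as 1.0 when the denominator is zero.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    dice = 2.0 * tp / denom if denom else 1.0
    recall = tp / truth.sum() if truth.sum() else 1.0
    return float(dice), float(recall)


# Toy 1-D example with three raters and four voxels.
raters = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
label = majority_vote(raters)
dice, recall = dice_and_recall([1, 0, 1, 0], label)
```

The same functions apply unchanged to 3D volumes, because all operations are element-wise over arrays of any shape.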
Intracranial hemorrhage (ICH) occurs when a blood vessel ruptures in the brain. This leads to significant morbidity and mortality, the likelihood of which is predicated on the size of the bleeding event. X-ray computed tomography (CT) scans allow clinicians and researchers to qualitatively and quantitatively diagnose hemorrhagic stroke, guide interventions and determine inclusion criteria of patients in clinical trials. There is currently no open-source, validated tool available to quickly segment hemorrhage. Using an automated pipeline and 2D and 3D deep neural networks, we show that we can quickly and accurately estimate ICH volume with high agreement with time-consuming manual segmentation. The training and validation datasets include significant heterogeneity in terms of pathology, such as the presence of intraventricular (IVH) or subdural hemorrhages (SDH) as well as variable image acquisition parameters. We show that deep neural networks trained with an appropriate anatomic context in the network receptive field can effectively perform ICH segmentation, but those without enough context will overestimate hemorrhage along the skull and around calcifications in the ventricular system. We trained with all data from a multi-center phase II study (n = 112) achieving a best mean and median Dice coefficient of 0.914 and 0.919, a volume correlation of 0.979 and an average volume difference of 1.7 ml and root mean squared error of 4.7 ml in 500 out-of-sample scans from the corresponding multi-center phase III study. 3D networks with appropriate anatomic context outperformed both 2D and random forest models. Our results suggest that deep neural network models, when carefully developed, can be incorporated into the workflow of an ICH clinical trial series to quickly and accurately segment ICH, estimate total hemorrhage volume and minimize segmentation failures. The model, weights and scripts for deployment are located at .
This is the first publicly available neural network model for segmentation of ICH, the only model evaluated with the presence of both IVH and SDH and the only model validated in the workflow of a series of clinical trials.
Land cover classification of remote sensing images is a challenging task due to limited amounts of annotated data, highly imbalanced classes, frequent incorrect pixel-level annotations, and an inherent complexity in the semantic segmentation task. In this article, we propose a novel architecture called the dense dilated convolutions' merging network (DDCM-Net) to address this task. The proposed DDCM-Net consists of dense dilated image convolutions merged with varying dilation rates. This effectively utilizes rich combinations of dilated convolutions that enlarge the network's receptive fields with fewer parameters and features compared with the state-of-the-art approaches in the remote sensing domain. Importantly, DDCM-Net obtains fused local- and global-context information, in effect incorporating surrounding discriminative capability for multiscale and complex-shaped objects with similar color and textures in very high-resolution aerial imagery. We demonstrate the effectiveness, robustness, and flexibility of the proposed DDCM-Net on the publicly available ISPRS Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set. Our single model, trained on three-band Potsdam and Vaihingen data sets, achieves better accuracy in terms of both mean intersection over union (mIoU) and F1-score compared with other published models trained with more than three-band data. We further validate our model on the DeepGlobe data set, achieving state-of-the-art result 56.2% mIoU with much fewer parameters and at a lower computational cost compared with related recent work.
Traumatic brain injuries may cause intracranial hemorrhages (ICH). ICH can lead to disability or death if it is not accurately diagnosed and treated in a time-sensitive manner. The current clinical protocol for diagnosing ICH is for radiologists to examine computed tomography (CT) scans to detect ICH and localize its regions. However, this process relies heavily on the availability of an experienced radiologist. In this paper, we designed a study protocol to collect a dataset of 82 CT scans of subjects with a traumatic brain injury. Next, the ICH regions were manually delineated in each slice by a consensus decision of two radiologists. The dataset is publicly available online at the PhysioNet repository for future analysis and comparisons. In addition to publishing the dataset, which is the main purpose of this manuscript, we implemented a deep fully convolutional network (FCN), known as U-Net, to segment the ICH regions from the CT scans in a fully automated manner. As a proof of concept, the method achieved a Dice coefficient of 0.31 for ICH segmentation based on 5-fold cross-validation.
We propose an optimized U-Net architecture for a brain tumor segmentation task in the BraTS21 challenge. To find the optimal model architecture and the learning schedule, we have run an extensive ablation study to test: deep supervision loss, Focal loss, decoder attention, drop block, and residual connections. Additionally, we have searched for the optimal depth of the U-Net encoder, number of convolutional channels and post-processing strategy. Our method won the validation phase and took third place in the test phase. We have open-sourced the code to reproduce our BraTS21 submission at the NVIDIA Deep Learning Examples GitHub Repository (
Background and Purpose Hematoma volume (HV) is an important diagnostic measure for determining the clinical stage and therapeutic approach for intracerebral hemorrhage (ICH). The aim of this study is to develop a robust deep learning segmentation method for fast and accurate HV analysis using computed tomography. Methods A novel dimension reduction UNet (DR-UNet) model was developed for computed tomography image segmentation and HV measurement. Two data sets, 512 ICH patients with 12 568 computed tomography slices in the retrospective data set and 50 ICH patients with 1257 slices in the prospective data set, were used for network training, validation, and internal and external testing. Moreover, 13 irregular hematoma cases, 11 subdural and epidural hematoma cases, and 50 cases with different HVs divided into 3 groups (<30, 30–60, and >60 mL) were selected to further evaluate the robustness of DR-UNet. The image segmentation performance of DR-UNet was compared with those of UNet, the fuzzy clustering method, and the active contour method. The HV measurement performance was compared using DR-UNet, UNet, and the Coniglobus formula method. Results Using DR-UNet, the segmentation model achieved a performance similar to that of expert clinicians in 2 independent test data sets containing internal testing data (Dice of 0.861±0.139) and external testing data (Dice of 0.874±0.130). The HV measurement derived from DR-UNet was strongly correlated with that from manual segmentation (R² = 0.9979; P < 0.0001). In the irregularly shaped hematoma group and the subdural and epidural hematoma group, DR-UNet was more robust than UNet in both hematoma segmentation and HV measurement. There was no statistically significant difference in segmentation accuracy among the 3 HV groups.
Conclusions DR-UNet can segment hematomas from the computed tomography scans of ICH patients and quantify the HV with better accuracy and greater efficiency than the main existing methods and with similar performance to expert clinicians. Due to robust performance and stable segmentation on different ICHs, DR-UNet could facilitate the development of deep learning systems for a variety of clinical applications.
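For readers unfamiliar with how a hematoma volume is obtained from a segmentation mask, the sketch below shows the usual voxel-counting calculation: the number of foreground voxels multiplied by the physical volume of one voxel. It is an illustration only and not the DR-UNet pipeline; the function name `hematoma_volume_ml` and the example voxel spacing are assumptions.

```python
import numpy as np


def hematoma_volume_ml(mask, spacing_mm):
    """Estimate volume in millilitres from a binary mask.

    mask       : array of 0/1 (or bool) voxels, any shape
    spacing_mm : physical voxel size along each axis, in millimetres
    Hypothetical helper; 1 mL equals 1000 cubic millimetres.
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0


# Toy volume: 50 foreground voxels at an assumed 0.5 x 0.5 x 5.0 mm spacing.
mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[0:2, 0:5, 0:5] = 1
volume = hematoma_volume_ml(mask, (0.5, 0.5, 5.0))
```

In practice the spacing would be read from the scan header rather than hard-coded, since slice thickness varies between acquisitions.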