Automated SSIM Regression for Detection and
Quantification of Motion Artefacts in Brain MR
Images
Alessandro Sciarra, Soumick Chatterjee, Max Dünnwald,
Giuseppe Placidi, Andreas Nürnberger, Oliver Speck and Steffen Oeltze-Jafra
Abstract—Motion artefacts in magnetic resonance brain
images are a crucial issue. The assessment of MR image
quality is fundamental before proceeding with the clinical
diagnosis. If the motion artefacts alter a correct delineation
of structure and substructures of the brain, lesions, tu-
mours and so on, the patients need to be re-scanned.
Otherwise, neuro-radiologists could report an inaccurate
or incorrect diagnosis. The first step right after scanning
a patient is the image quality assessment in order to
decide if the acquired images are diagnostically accept-
able. An automated image quality assessment based on
the structural similarity index (SSIM) regression through
a residual neural network has been proposed here, with
the possibility of also performing a classification into different
groups by subdividing the SSIM range. This method
predicts SSIM values of an input image in the absence of
a reference ground truth image. The networks were able
to detect motion artefacts, and the best performance for
the regression and classification task has always been
achieved with ResNet-18 with contrast augmentation. Mean
and standard deviation of the residuals' distribution were
µ = -0.0009 and σ = 0.0139, respectively. For the classification
task in 3, 5 and 10 classes, the best accuracies were
97, 95 and 89%, respectively. The obtained results show
that the proposed method could be a tool in supporting
neuro-radiologists and radiographers in evaluating the im-
age quality before the diagnosis.
Index Terms—Motion artefacts, SSIM, image quality assessment, ResNet, regression, classification.
I. INTRODUCTION
Image quality assessment (IQA) is a fundamental apparatus
for evaluating MR images [1]–[3]. The main purpose of this
Alessandro Sciarra, Max Dünnwald and Steffen Oeltze-Jafra are
with the Medicine and Digitalization - MedDigit group, Department of
Neurology, Medical Faculty, University Hospital Magdeburg, Germany
Alessandro Sciarra, Soumick Chatterjee and Oliver Speck are with
the Department of Biomedical Magnetic Resonance, Faculty of Natural
Sciences, Otto von Guericke University, Germany
Soumick Chatterjee and Andreas Nürnberger are with the Data and
Knowledge Engineering Group, Faculty of Computer Science, Otto von
Guericke University, Germany
Max Dünnwald is with the Faculty of Computer Science, Otto von
Guericke University, Germany
Giuseppe Placidi is with the Department of Life, Health, and Environ-
mental Sciences, University of L’Aquila, Italy
Andreas Nürnberger, Oliver Speck and Steffen Oeltze-Jafra are with
the Center for Behavioral Brain Sciences, Magdeburg, Germany
Oliver Speck and Steffen Oeltze-Jafra are with the German Center for
Neurodegenerative Disease, Magdeburg, Germany
process is to find out whether the quality guarantees that the images
are diagnostically reliable and free from artefacts, in order to avoid
a possibly unreliable diagnosis [4], [5]. Often the evaluation
process requires time and is also subjectively dependent upon
the observer in charge of carrying it out [6]. Furthermore,
different levels of expertise and experience of the readers
(experts designated to perform the IQA) could lead to assessments
that do not perfectly match. Another intrinsic issue of the
IQA for MR images is the absence of a reference image.
No-Reference IQA techniques, with and without the support
of machine and deep learning, have been proposed
in recent years for the evaluation of the visual image
quality [3], [4], [7]–[14]. These techniques are able
to detect and quantify the level of blurriness or corruption
with different levels of accuracy and precision. However, there
are many factors to take into consideration when choosing
which technique to apply; the most important are [15]–[17]:
data requirements - deep learning requires large datasets,
while traditional machine learning (non-deep-learning-based)
techniques can be trained on less data; accuracy - deep
learning provides higher accuracy than traditional machine
learning; training time - deep learning takes longer to train
than traditional machine learning; hyperparameter tuning -
deep learning can be tuned in many different ways, and
it is not always possible to find the best parameters, while
machine learning offers limited tuning capabilities. In addition,
when choosing traditional machine learning techniques, the
fundamental step of feature extraction must be considered.
Although the list of traditional machine learning and deep
learning techniques used for regression and classification tasks
is constantly updated [18]–[21], there is still no gold standard
IQA for MR images [2]. The aim of this work is to create
an automated IQA tool that is able to detect the presence
of motion artefacts and quantify the level of corruption or
distortion compared to an "artefact-free" counterpart, based on
the regression of the structural similarity index (SSIM) [22].
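For reference, the SSIM between two images x and y, as defined in [22], combines their local means (µx, µy), variances (σ²x, σ²y) and covariance (σxy):

SSIM(x, y) = [(2 µx µy + c1)(2 σxy + c2)] / [(µ²x + µ²y + c1)(σ²x + σ²y + c2)],

where c1 and c2 are small constants stabilising the division; SSIM equals 1 for identical images and decreases with increasing corruption.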
This tool has been designed to work for a large
variety of MR image contrasts, such as T1-, T2-, PD- and FLAIR-
weighted images, and independently of the resolution and
orientation of the considered image. Additionally, a contrast
augmentation step has been introduced in order to increase
the range of variability of the weighting. In practice, when
the MRIs are acquired and if there are any artefacts in
the image, "artefact-free" counterparts are not available to
compare the image against for quality assessment. But for the
SSIM calculation, it is always necessary to have two images
(corrupted vs motion-artefact free images). For this reason, in
this work, the corrupted images were artificially created by
making use of two different algorithms - one implemented by
Shaw et al. [23] (package of the library TorchIO [24]) and a
second algorithm developed in-house [25]. Furthermore, when
training a neural network model in a fully-supervised manner,
as in this case, a large amount of labelled or annotated data is
typically required [26]. In this research on IQA, the regression
labels for training were created by comparing the artificially-
corrupted images against the original artefact-free images with
the help of SSIM and those SSIM values were finally used as
the regression labels.
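As an illustration of this labelling step, the following minimal sketch corrupts one slice with TorchIO's RandomMotion transform and computes the SSIM label with scikit-image; the function name, the motion parameters and the use of scikit-image are illustrative assumptions, not the authors' implementation.

import numpy as np
import torch
import torchio as tio
from skimage.metrics import structural_similarity

def make_regression_label(slice_2d: np.ndarray):
    """Corrupt one 2D slice with simulated motion and compute its SSIM label."""
    # TorchIO transforms expect a 4D tensor (channels, x, y, z); add dummy axes.
    tensor = torch.as_tensor(slice_2d, dtype=torch.float32)[None, :, :, None]
    corrupt = tio.RandomMotion(degrees=10, translation=10, num_transforms=2)
    corrupted = corrupt(tensor)[0, :, :, 0].numpy()
    # The SSIM of the corrupted vs. the original slice is the regression label.
    label = structural_similarity(
        slice_2d.astype(np.float32), corrupted,
        data_range=float(slice_2d.max() - slice_2d.min()))
    return corrupted, label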
II. METHODOLOGY
The proposed automatic IQA tool relies on residual neural
networks (ResNet) [27], [28]. Two different versions of ResNet
were used, with 18 (ResNet-18) and 101 (ResNet-101) residual
blocks. Every model was trained twice - with
and without the contrast augmentation step. The following steps
are executed during training (Figure 1):
1) Given a 3D input volume, one random slice (2D image)
is selected from one of the possible orientations - axial,
sagittal, and coronal. In the case of an anisotropic volume,
the slice selection is done only following the original
acquisition orientation.
2) If contrast augmentation is enabled, one of the following
contrast augmentation algorithms is selected randomly
and applied to the input image (a minimal sketch of this
step is given after the list):
- Random gamma adjustment [29]
- Random logarithmic adjustment [30]
- Random sigmoid adjustment [31]
- Random adaptive histogram adjustment [32]
3) Motion corruption is applied to the 2D image using one
of these two options:
- TorchIO [23], [24], Figure 2 (a)
- "in-house" algorithm, Figure 2 (b)
4) The SSIM is calculated between the 2D input image and
the corresponding corrupted one.
5) The calculated SSIM value and the corrupted image are
passed to the chosen model for training.
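A possible realisation of the contrast-augmentation step (step 2) is sketched below with scikit-image's exposure module; the randomisation ranges are illustrative assumptions, not the values used in this work.

import random
from skimage import exposure

def random_contrast_augmentation(img):
    """Apply one randomly chosen intensity transform to a normalised 2D image."""
    choice = random.choice(["gamma", "log", "sigmoid", "adaptive"])
    if choice == "gamma":      # random gamma adjustment [29]
        return exposure.adjust_gamma(img, gamma=random.uniform(0.5, 2.0))
    if choice == "log":        # random logarithmic adjustment [30]
        return exposure.adjust_log(img, gain=random.uniform(0.8, 1.2))
    if choice == "sigmoid":    # random sigmoid adjustment [31]
        return exposure.adjust_sigmoid(img, cutoff=random.uniform(0.3, 0.7))
    # random adaptive histogram equalisation [32]
    return exposure.equalize_adapthist(img, clip_limit=random.uniform(0.01, 0.03))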
Three datasets - train, validation, and test sets - were used
for this work, Table I. For training, 200 volumes were used,
while 50 were used for validation and 50 for testing. The
first group of 68 volumes was selected from the public IXI
dataset 1, the second group (Table I, Site-A) of 114 volumes
was acquired with a 3T scanner, the third group (Table I,
Site-B) of 93 volumes was acquired at 7T, and a final group
(Table I, Site-C) of 25 volumes was acquired with different
scanners (1.5 and 3T). The volumes from IXI, Site-A, and
Site-B were resampled in order to have an isotropic resolution
of 1.00 mm3.
1 Dataset available at: https://brain-development.org/ixi-dataset/
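Such a resampling can be performed, for instance, with TorchIO's Resample transform; the file paths below are purely illustrative.

import torchio as tio

# Resample a volume to 1.00 mm isotropic resolution (paths are illustrative).
volume = tio.ScalarImage("sub-01_T1w.nii.gz")
resampled = tio.Resample(1.0)(volume)
resampled.save("sub-01_T1w_1mm.nii.gz")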
Fig. 1. Graphical illustration of all steps for the training as explained in
Section II.
The loss during training was calculated using the mean
squared error (MSE) [33] and was optimised using the Adam
optimiser [34] with a learning rate of 1e-3 and a batch size
of 100 for 2000 epochs. All the images (during training,
validation, and testing) were always normalised, and resized
or padded to have a 2D matrix size of 256x256.
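A minimal sketch of this training setup, using torchvision's ResNet-18 adapted to single-channel input and a scalar regression output, is given below; it mirrors the stated hyperparameters but is not the authors' code, and the data loader is a placeholder.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# ResNet-18 with a single-channel first layer and one regression output.
model = resnet18(num_classes=1)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

criterion = nn.MSELoss()                                   # MSE loss [33]
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr = 1e-3 [34]

# Placeholder data: in practice, batches of 100 corrupted 256x256 slices
# and their SSIM labels come from the pipeline described in Section II.
images, labels = torch.rand(100, 1, 256, 256), torch.rand(100)
loader = DataLoader(TensorDataset(images, labels), batch_size=100)

for epoch in range(2000):
    for batch_images, batch_labels in loader:
        optimiser.zero_grad()
        predictions = model(batch_images).squeeze(1)
        loss = criterion(predictions, batch_labels)
        loss.backward()
        optimiser.step()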
For testing, a total of 10000 images were repeatedly
selected at random from the 50 volumes of the test dataset
and then corrupted - applying the random orientation selection,
the contrast augmentation, and finally the corruption - as
performed during the training stage.
In order to evaluate the performances of the trained models,
first the predicted SSIM values were plotted against the ground
truth SSIM values as shown in Figure 3, next the residuals
were calculated as follows: Residuals = SSIM_predicted − SSIM_groundtruth
(Figure 4).
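The residual analysis can be reproduced along these lines; the arrays below are placeholders for the predicted and ground-truth SSIM values of the 10000 test images.

import numpy as np
from scipy.stats import norm

ssim_predicted = np.random.rand(10000)    # placeholder: model outputs
ssim_groundtruth = np.random.rand(10000)  # placeholder: computed SSIM labels

residuals = ssim_predicted - ssim_groundtruth
mu, sigma = norm.fit(residuals)  # normal-distribution fit of the residuals
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")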
TABLE I
DATA FOR TRAINING, VALIDATION AND TESTING.

Data    Weighting        Volumes      Matrix Size                  Resolution (mm3)
                                      m(M) x m(M) x m(M)†          m(M) x m(M) x m(M)†
TRAINING
IXI     T1,T2,PD         15,15,15     230(240)x230(240)x134(162)   1.00 isotropic
Site-A  T1,T2,PD,FLAIR   20,20,20,20  168(168)x224(224)x143(144)   1.00 isotropic
Site-B  T1,T2,FLAIR      20,20,20     156(156)x224(224)x100(100)   1.00 isotropic
Site-C  T1               3            192(512)x256(512)x36(256)    0.45(1.00)x0.45(0.98)x0.98(4.40)
Site-C  T2               11           192(640)x192(640)x32(160)    0.42(1.09)x0.42(1.09)x1.00(4.40)
Site-C  FLAIR            1            320x320x34                   0.72x0.72x4.40
VALIDATION
IXI     T1,T2,PD         1,5,7        230(240)x230(240)x134(162)   1.00 isotropic
Site-A  T1,T2,PD,FLAIR   4,4,4,4      168(168)x224(224)x143(144)   1.00 isotropic
Site-B  T1,T2,FLAIR      6,6,4        156(156)x224(224)x100(100)   1.00 isotropic
Site-C  T1               3            176(240)x240(256)x118(256)   1.00 isotropic
Site-C  T2               1            240x320x80                   0.80x0.80x2.00
Site-C  PD               1            240x320x80                   0.80x0.80x2.00
TESTING
IXI     T1,T2,PD         2,4,4        230(240)x230(240)x134(162)   1.00 isotropic
Site-A  T1,T2,PD,FLAIR   6,4,4,4      168(168)x224(224)x143(144)   1.00 isotropic
Site-B  T1,T2,FLAIR      6,6,5        156(156)x224(224)x100(100)   1.00 isotropic
Site-C  T1               2            288(320)x288(320)x35(46)     0.72(0.87)x0.72(0.87)x3.00(4.40)
Site-C  T2               2            320(512)x320(512)x34(34)     0.44(0.72)x0.45(0.72)x4.40(4.40)
Site-C  FLAIR            1            320x320x35                   0.70x0.70x4.40
†: "m" indicates the minimum value while "M" the maximum.
Fig. 2. Samples of artificially corrupted images. Left column: original
images; right column: corrupted ones. (a): image corrupted using the
TorchIO library; (b): image corrupted using the in-house algorithm.
The predicted SSIM value of an image can be considered equivalent to measuring the distortion or corruption level of the
image. However, when applying this approach to a real clinical
case, it is challenging to compare this value with a subjective
assessment. To get around this problem, the regression task
was simplified into a classification one. To this end, three
different experiments were performed by choosing a different
number of classes - 3, 5 and 10 classes. For every case, the
SSIM range [0-1] was equally divided in order to have equal
sub-ranges. For instance, in case of 3 classes, there were three
sub-ranges, class-1:[0.00-0.33], class-2:[0.34-0.66] and class-
3:[0.67-1.00]. A similar step was also performed for creating
5 and 10 classes.
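The mapping from an SSIM value to one of these equally sized sub-ranges can be written as the following helper, which is illustrative and not taken from the paper.

def ssim_to_class(ssim: float, n_classes: int) -> int:
    """Map an SSIM value in [0, 1] to a class index in {1, ..., n_classes}."""
    # E.g. n_classes = 3: [0.00-0.33] -> 1, [0.34-0.66] -> 2, [0.67-1.00] -> 3.
    return min(int(ssim * n_classes), n_classes - 1) + 1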
A second dataset was also used for testing the trained models,
comprising randomly selected images from clinical acquisi-
tions. This dataset contained five subjects, each with a different
number of scans, as shown in Table II. In this case, there
were no ground truth reference images, and for this reason,
the images were also subjectively evaluated by one expert
using the following classification scheme: class 1 - images
of good to high quality that might have minor motion
artefacts not altering structures and substructures of the
brain (SSIM range between 0.85 and 1.00); class 2 - images
of sufficient to good quality; in this case, the images can
have motion artefacts that prevent a correct delineation of the
brain structures, substructures or lesions (SSIM range between
0.60 and 0.85); and class 3 - images of insufficient quality,
for which a re-scan will be required (SSIM range between 0.00 and
0.60). Additionally, this dataset contained different contrasts
not included in the training, such as diffusion-weighted images
(DWI).
III. RESULTS
The results for the first section, the regression task, are
presented in Figures 3 and 4. Figure 3 shows a scatter plot
where the predicted SSIM values are compared against the
ground truth values. Additionally, the plot shows the
linear fit performed for each trained model. Finally, the
distributions of the ground truth and predicted SSIM values
are also shown. Figure 3 presents general comparisons across
all the trained models and their qualitative dispersion levels. In
this case, the term dispersion implies how much the predicted
TABLE II
CLINICAL DATA

Data     Weighting            Volumes    Matrix Size                  Resolution (mm3)
                                         m(M) x m(M) x m(M)†          m(M) x m(M) x m(M)†
Subj. 1  T1,T2,FLAIR          1,4,2      130(560)x256(560)x26(256)    0.42(1.00)x0.42(0.94)x0.93(4.40)
Subj. 2  T2                   3          288(320)x288(320)x28(28)     0.76(0.81)x0.76(0.81)x5.50(5.50)
Subj. 3  T1,T2,FLAIR,DWI,(§)  1,2,1,4,1  256(640)x256(640)x32(150)    0.42(0.90)x0.42(0.90)x0.45(4.40)
Subj. 4  T2,FLAIR,DWI         1,2,6      144(512)x144(512)x20(34)     0.45(1.40)x0.45(1.40)x2.00(4.40)
Subj. 5  T2,FLAIR,DWI         3,1,4      256(640)x256(640)x28(42)     0.40(1.09)x0.40(1.09)x3.30(6.20)
†: "m" indicates the minimum value while "M" the maximum.
Fig. 3. Scatter plot for the regression task. The linear fits for each group
of data are also shown. Top: ground truth SSIM value distribution;
right side: predicted SSIM value distributions for each group of data.
SSIM values differ from the identity line SSIM_predicted = SSIM_groundtruth.
In Figure 4, on the other hand, the
results are shown separately for each model using scatter
plots, together with the corresponding residual distribution
plots (see Section II). For the residual distributions, a further statistical
normal distribution fitting was carried out using
the Python package SciPy [35]. The calculated mean and
standard deviation values are shown in Figure 4. According to
the statistical analysis, the model with the smallest standard
deviation (σ = 0.0139) and the mean value closest to zero
(µ = -0.0009) was the ResNet-18 model trained with contrast
augmentation, while the model with the mean value farthest
from zero and the largest standard deviation was the ResNet-101
trained without contrast augmentation. A clear effect of the
contrast augmentation for both models ResNet-18 and ResNet-
101 can be seen from the results - reflected as a reduction of
the standard deviation values, and this visually correlates with
a lower dispersion level in the scatter plots.
The results for the classification task are shown in Figure 5
and Table III. Figure 5 shows the logarithmic confusion matri-
ces obtained for the classification task. From the matrices, it
can be noted that all the trained models performed well and in
a similar way. In particular, none of the matrices presents non-
zero elements far from the diagonal, but only the neighbouring
ones - as commonly expected from a classification task.
Table III is complementary to Figure 5. It shows the class-wise,
macro-average and weighted average of precision, recall, and
f1-score for all the trained models. This table also presents
the accuracy. For all three scenarios (3, 5 and 10 classes,
as presented in section II), once again, the model with the best
results is ResNet-18 trained with contrast augmentation. This
model always obtained the highest accuracy value - 97, 95
and 89% for 3, 5, and 10 class scenarios, respectively. Even
though the ResNet-18 with contrast augmentation performed
better than the other models, no substantial differences can be
discerned from the tabular data. But once again, it is possible
to observe an improvement in terms of performance when the
contrast augmentation is applied.
The results regarding the clinical data samples are shown
in Figure 6. In this case, the SSIM predictions obtained with
each model are shown per slice, grouped by subject and
overlaid with the subjective scores. As
introduced in section II, the subjective ratings for the clinical
data samples were within the classes 1, 2 or 3 - after a careful
visual evaluation. If the predictions obtained with the different
models fall within the class assigned by the subjective
evaluation, there is an agreement between
the objective and subjective evaluations; when the objective
prediction lies outside the class assigned by the expert, the
two assessments disagree. The
percentage of agreement between subjective and objective
analysis is 76.6 ± 0.8% (mean ± standard deviation), with
a minimum value of 75.5% achieved by ResNet-101 without
contrast augmentation and a maximum of 77.7% by ResNet-
101 with contrast augmentation.
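The agreement percentage can be computed by checking, per slice, whether the predicted SSIM falls inside the SSIM range of the expert-assigned class; the sketch below uses the ranges of the scheme defined in Section II and is illustrative only.

import numpy as np

# SSIM ranges of the subjective classes defined in Section II.
CLASS_RANGES = {1: (0.85, 1.00), 2: (0.60, 0.85), 3: (0.00, 0.60)}

def agreement_rate(predicted_ssim, expert_classes):
    """Percentage of slices whose predicted SSIM lies in the expert's class range."""
    hits = [CLASS_RANGES[c][0] <= p <= CLASS_RANGES[c][1]
            for p, c in zip(predicted_ssim, expert_classes)]
    return 100.0 * np.mean(hits)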
IV. DISCUSSION
The performances of the trained models when solving the
regression task were very similar. However, when coupled with
contrast augmentation, both ResNet-18 and ResNet-101
showed a distinct improvement. Looking at
the residual distributions of the errors, for both models
contrast augmentation brought the mean values closer to zero
and decreased the standard deviations by factors of 1.5 and
1.44 for ResNet-18 and ResNet-101, respectively. The reduction of the standard
deviations is quite evident also in the scatter plots, where the
dispersion level is visibly less when the contrast augmentation
is applied.
While considering the classification task, the first notable
thing is that there is a linear decrease in the accuracy as
the number of classes increases - 97, 95 and 89%. This
Fig. 4. Scatter plot SSIM predicted against ground truth values and Residuals distribution for (a) ResNet-18 without contrast augmentation, (b)
ResNet-18 with contrast augmentation, (c) ResNet-101 without contrast augmentation and (d) ResNet-101 with contrast augmentation.
can be explained by the fact that as the number of classes
increases, the difficulty level also increases for each model to
classify the image in the correct pre-defined range of SSIM
values. The confusion matrices confirm this behaviour through
the increase of the out-of-diagonal values: considering
ResNet-18 not coupled with contrast augmentation, for the
classification task with three classes, the maximum out-of-
diagonal value is 0.04 (for class-2 and class-3), while for
the classification task with ten classes, the maximum value
is 0.50 (for class-1). This implies that ResNet-18, not
coupled with contrast augmentation, incorrectly classifies 50%
of the tested class-1 images in the 10-class classification
task. When contrast augmentation is applied, there is
an apparent reduction of wrongly classified images of class-
1. Although this is the general trend observed in Figure 5,
there are also contradictory results, e.g., when looking at the
5-class classification task for class-1, again considering
ResNet-18 without and with contrast augmentation, there is
a net increase of erroneously classified class-1 images, from
9 to 21% of the tested images.
The final application on clinical data also provided satis-
factory results, with a maximum agreement rate of 77.7%
between the objective and subjective assessments. A direct
comparison with the previous three-classes classification task
is not possible due to the different subjective schemes selected
(Section II). Although there is a visible reduction in the
performance when the trained models are applied to clinical
data, this can be justified by taking into account several factors.
First of all, the clinical data sample involved types of image
data, such as diffusion acquisitions and derived diffusion maps,
which were never seen by the models during the training
step; secondly, the artificially created motion artefacts
did not cover the infinite possible motion artefacts that can
appear in a truly motion-corrupted MR image. A possible
improvement could be obtained by introducing new contrasts,
resolutions and orientations into the training set. For
example, oblique acquisitions have not been considered in this
work. In addition, the artificial corruption methods used for
this work can be further improved, e.g., including corruption
algorithms based on motion log information recorded by a
tracking device, as commonly used for prospective motion
correction [37]–[39]. However, this would require the
availability of raw MR data, and the computational time
needed to de-correct the images, comparably slower than the
current approaches, would also have to be taken into account. Another point
to take into account for the subjective assessment is the bias
introduced by each expert while evaluating the image quality.
In this work, the expert's perception of image quality was
emulated with good accuracy (76.6 ± 0.8%), which, however, cannot
be considered a standard reference. Although the subjective
Fig. 5. Confusion matrices for the classification task. First row: 3-class case; second row: 5-class case; third row: 10-class case. The columns
correspond to (a) ResNet-18 without contrast augmentation, (b) ResNet-18 with contrast augmentation, (c) ResNet-101 without contrast
augmentation, and (d) ResNet-101 with contrast augmentation.
assessment can be repeated with the help of several experts,
there will always be differences between them, i.e., years of
experience or different sensitivity to the presence of motion
artefacts in the assessed image. It is also noteworthy that the
SSIM ranges defined for the three classes can be re-defined
following a different scheme. In the scenario explored in this
paper, the scheme has been defined by making use of the
artificially corrupted images and the ground truth images - this
allowed an exact calculation of the SSIM values, and it was
simple to define ranges that visually agree with the scheme
defined in Sect. II.
V. CONCLUSION
This research presents an SSIM-regression based IQA tech-
nique using ResNet models, coupled with contrast augmen-
tations to make them robust against changes in the image
contrasts in clinical scenarios. The method managed to predict
the SSIM values from artificially motion corrupted images
without the ground-truth (motion-free) images with high accu-
racy (residuals as low as -0.0009 ± 0.0139). Moreover,
the motion classes obtained from the predicted SSIMs were
very close to the true ones and achieved a maximum weighted
accuracy of 89% for the ten-class scenario as reported in
Table III, and achieved a maximum accuracy value of 97%
when the number of classes was three (Table III). Considering
the complexity of the problem in quantifying the image
degradation level due to motion artefacts and additionally the
variability of the type of contrast, resolution, etc., the results
obtained are very promising. Further evaluations, including
multiple subjective evaluations, will be performed on clinical
data to judge its clinical applicability and robustness against
changes in real-world scenarios. In addition, further trainings
will be carried out in order to cover a larger variety of
images, including common clinical routine acquisi-
tions such as diffusion-weighted imaging and Time-of-Flight
imaging. Furthermore, it would be beneficial to include images
also acquired at lower magnetic field strength (<1.5T).
Considering the results obtained by ResNet models in this
work, it is reasonable to think that future works can also be
targeted towards a different anatomical body part, focusing,
for instance, on abdominal or cardiac imaging. However, the
reproduction of realistic-looking motion artefacts plays a key
role in the performance of deep learning models trained to
serve as reference-less image quality assessment tools.
TABLE III
RESULTS FOR THE CLASSIFICATION TASK. THE CLASSIFICATION TASK HAS BEEN PERFORMED THREE TIMES, CONSIDERING 3, 5 AND 10 CLASSES,
RESPECTIVELY. "PREC." IS THE ABBREVIATION OF THE TERM PRECISION, WHILE "MACRO AVG" CORRESPONDS TO MACRO AVERAGE AND "WEIGHT.
AVG" TO WEIGHTED AVERAGE, CALCULATED USING THE PYTHON PACKAGE SCIKIT-LEARN [36]. (a) IS FOR RESNET-18 WITHOUT CONTRAST
AUGMENTATION, (b) IS FOR RESNET-18 WITH CONTRAST AUGMENTATION, (c) IS FOR RESNET-101 WITHOUT CONTRAST AUGMENTATION, (d) IS
FOR RESNET-101 WITH CONTRAST AUGMENTATION.

                    (a)                    (b)                    (c)                    (d)
Class (SSIM)        Prec. Recall f1-score  Prec. Recall f1-score  Prec. Recall f1-score  Prec. Recall f1-score  Support
1  [0.00 - 0.33]    0.94  0.97   0.95      0.93  0.97   0.95      0.93  0.98   0.96      0.97  0.89   0.93     117
2  [0.33 - 0.66]    0.95  0.96   0.95      0.97  0.96   0.96      0.94  0.97   0.95      0.98  0.94   0.96     4307
3  [0.66 - 1.00]    0.97  0.96   0.97      0.97  0.98   0.97      0.98  0.95   0.96      0.95  0.99   0.97     5576
accuracy            0.96                   0.97                   0.96                   0.96                   10000
macro avg           0.95  0.95   0.96      0.96  0.97   0.96      0.95  0.97   0.96      0.97  0.94   0.95     10000
weight. avg         0.96  0.96   0.96      0.97  0.97   0.97      0.96  0.96   0.96      0.96  0.96   0.96     10000
1  [0.00 - 0.20]    0.97  0.91   0.94      0.93  0.79   0.85      0.94  0.97   0.96      0.85  0.88   0.87     33
2  [0.20 - 0.40]    0.86  0.89   0.88      0.85  0.90   0.87      0.83  0.91   0.87      0.93  0.77   0.84     262
3  [0.40 - 0.60]    0.91  0.92   0.91      0.93  0.92   0.93      0.89  0.94   0.91      0.94  0.90   0.92     2320
4  [0.60 - 0.80]    0.94  0.95   0.94      0.95  0.96   0.96      0.94  0.94   0.94      0.94  0.96   0.95     5021
5  [0.80 - 1.00]    0.96  0.93   0.95      0.96  0.96   0.96      0.97  0.92   0.95      0.95  0.96   0.96     2364
accuracy            0.93                   0.95                   0.93                   0.94                   10000
macro avg           0.93  0.92   0.92      0.93  0.91   0.91      0.91  0.93   0.92      0.92  0.89   0.91     10000
weight. avg         0.93  0.93   0.93      0.95  0.95   0.95      0.93  0.93   0.93      0.94  0.94   0.94     10000
1  [0.00 - 0.10]    1.00  0.50   0.67      1.00  0.62   0.77      1.00  0.62   0.77      1.00  0.75   0.86     8
2  [0.10 - 0.20]    0.81  0.88   0.85      0.78  0.72   0.75      0.83  0.96   0.89      0.75  0.84   0.79     25
3  [0.20 - 0.30]    0.90  0.90   0.90      0.81  0.84   0.83      0.87  0.89   0.88      0.91  0.79   0.84     62
4  [0.30 - 0.40]    0.81  0.84   0.83      0.80  0.85   0.83      0.76  0.85   0.80      0.88  0.71   0.79     200
5  [0.40 - 0.50]    0.82  0.86   0.84      0.86  0.87   0.87      0.79  0.87   0.83      0.86  0.83   0.84     689
6  [0.50 - 0.60]    0.84  0.84   0.84      0.89  0.87   0.88      0.83  0.86   0.84      0.89  0.84   0.86     1631
7  [0.60 - 0.70]    0.86  0.88   0.87      0.89  0.89   0.89      0.85  0.87   0.86      0.88  0.88   0.88     2706
8  [0.70 - 0.80]    0.87  0.87   0.87      0.89  0.90   0.89      0.88  0.84   0.86      0.86  0.90   0.88     2315
9  [0.80 - 0.90]    0.86  0.88   0.87      0.89  0.92   0.90      0.89  0.85   0.87      0.87  0.91   0.89     1456
10 [0.90 - 1.00]    0.97  0.86   0.91      0.97  0.91   0.94      0.96  0.88   0.91      0.95  0.93   0.94     908
accuracy            0.87                   0.89                   0.86                   0.88                   10000
macro avg           0.88  0.83   0.84      0.88  0.84   0.85      0.86  0.85   0.85      0.88  0.84   0.86     10000
weight. avg         0.87  0.87   0.87      0.89  0.89   0.89      0.86  0.86   0.86      0.88  0.88   0.88     10000
REFERENCES
[1] M. Khosravy, N. Patel, N. Gupta, and I. K. Sethi, “Image quality
assessment: A review to full reference indexes, Recent trends in
communication, computing, and electronics, pp. 279–288, 2019.
[2] L. S. Chow and R. Paramesran, “Review of medical image quality
assessment,” Biomedical signal processing and control, vol. 27, pp. 145–
154, 2016.
[3] B. Mortamet, M. A. Bernstein, C. R. Jack Jr, J. L. Gunter, C. Ward,
P. J. Britson, R. Meuli, J.-P. Thiran, and G. Krueger, Automatic quality
assessment in structural brain magnetic resonance imaging,” Magnetic
Resonance in Medicine: An Official Journal of the International Society
for Magnetic Resonance in Medicine, vol. 62, no. 2, pp. 365–372, 2009.
[4] P. Bourel, D. Gibon, E. Coste, V. Daanen, and J. Rousseau, “Auto-
matic quality assessment protocol for mri equipment,” Medical physics,
vol. 26, no. 12, pp. 2693–2700, 1999.
[5] P. Jezzard, “The physical basis of spatial distortions in magnetic reso-
nance images,” 2009.
[6] J. J. Ma, U. Nakarmi, C. Y. S. Kin, C. M. Sandino, J. Y. Cheng, A. B.
Syed, P. Wei, J. M. Pauly, and S. S. Vasanawala, “Diagnostic image qual-
ity assessment and classification in medical imaging: Opportunities and
challenges,” in 2020 IEEE 17th International Symposium on Biomedical
Imaging (ISBI). IEEE, 2020, pp. 337–340.
[7] L. S. Chow and H. Rajagopal, “Modified-brisque as no reference
image quality assessment for structural mr images,” Magnetic resonance
imaging, vol. 43, pp. 74–87, 2017.
[8] S. J. Sujit, R. E. Gabr, I. Coronado, M. Robinson, S. Datta, and
P. A. Narayana, “Automated image quality evaluation of structural brain
magnetic resonance images using deep convolutional neural networks,
in 2018 9th Cairo International Biomedical Engineering Conference
(CIBEC). IEEE, 2018, pp. 33–36.
[9] O. Esteban, D. Birman, M. Schaer, O. O. Koyejo, R. A. Poldrack, and
K. J. Gorgolewski, “Mriqc: Advancing the automatic prediction of image
quality in mri from unseen sites,” PloS one, vol. 12, no. 9, p. e0184661,
2017.
[10] T. Küstner, A. Liebgott, L. Mauch, P. Martirosian, F. Bamberg, K. Niko-
laou, B. Yang, F. Schick, and S. Gatidis, Automated reference-free
detection of motion artifacts in magnetic resonance images,” Magnetic
Resonance Materials in Physics, Biology and Medicine, vol. 31, no. 2,
pp. 243–256, 2018.
[11] T. Küstner, S. Gatidis, A. Liebgott, M. Schwartz, L. Mauch, P. Mar-
tirosian, H. Schmidt, N. F. Schwenzer, K. Nikolaou, F. Bamberg et al.,
“A machine-learning framework for automatic reference-free quality
assessment in mri,” Magnetic Resonance Imaging, vol. 53, pp. 134–147,
2018.
[12] I. Fantini, L. Rittner, C. Yasuda, and R. Lotufo, Automatic detection of
motion artifacts on mri using deep cnn,” in 2018 International Workshop
on Pattern Recognition in Neuroimaging (PRNI). IEEE, 2018, pp. 1–4.
[13] I. Fantini, C. Yasuda, M. Bento, L. Rittner, F. Cendes, and R. Lotufo,
“Automatic mr image quality evaluation using a deep cnn: A reference-
free method to rate motion artifacts in neuroimaging,” Computerized
Medical Imaging and Graphics, vol. 90, p. 101897, 2021.
[14] L. L. Backhausen, M. M. Herting, J. Buse, V. Roessner, M. N. Smolka,
and N. C. Vetter, “Quality control of structural mri images applied using
freesurfer—a hands-on workflow to rate motion artifacts, Frontiers in
neuroscience, vol. 10, p. 558, 2016.
[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT press,
2016.
[16] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and
TensorFlow: Concepts, tools, and techniques to build intelligent systems.
O’Reilly Media, Inc.”, 2019.
[17] T. Amr, Hands-On Machine Learning with scikit-learn and Scientific
Python Toolkits: A practical guide to implementing supervised and
Fig. 6. Evaluation for the clinical dataset. The curves represent the SSIM predictions obtained with the different trained models, while the coloured
bars represent the subjective classification performed by the expert. When the curves lie within the coloured bars, the objective and subjective
evaluations agree; otherwise they disagree. The blue dashed lines indicate the separation between the different subjects.
unsupervised machine learning algorithms in Python. Packt Publishing
Ltd, 2020.
[18] W. Rawat and Z. Wang, “Deep convolutional neural networks for image
classification: A comprehensive review,” Neural computation, vol. 29,
no. 9, pp. 2352–2449, 2017.
[19] Y. Li, “Research and application of deep learning in image recognition,”
in 2022 IEEE 2nd International Conference on Power, Electronics and
Computer Applications (ICPECA). IEEE, 2022, pp. 994–999.
[20] V. E. Staartjes and J. M. Kernbach, “Foundations of machine learning-
based clinical prediction modeling: Part v—a practical approach to
regression problems,” in Machine Learning in Clinical Neuroscience.
Springer, 2022, pp. 43–50.
[21] P. Langley et al., “The changing science of machine learning,” Machine
learning, vol. 82, no. 3, pp. 275–279, 2011.
[22] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image
quality assessment: from error visibility to structural similarity, IEEE
transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
[23] R. Shaw, C. Sudre, S. Ourselin, and M. J. Cardoso, “Mri k-space motion
artefact augmentation: model robustness and task-specific uncertainty,”
in International Conference on Medical Imaging with Deep Learning–
Full Paper Track, 2018.
[24] F. Pérez-García, R. Sparks, and S. Ourselin, “Torchio: a python li-
brary for efficient loading, preprocessing, augmentation and patch-based
sampling of medical images in deep learning,” Computer Methods and
Programs in Biomedicine, vol. 208, p. 106236, 2021.
[25] S. Chatterjee, A. Sciarra, M. Dünnwald, S. Oeltze-Jafra, A. Nürnberger,
and O. Speck, “Retrospective motion correction of mr images using
prior-assisted deep learning,” arXiv preprint arXiv:2011.14134, 2020.
[26] A. S. Atukorale and T. Downs, “Using labeled and unlabeled data for
training,” 2001.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2016, pp. 770–778.
[28] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan,
T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf,
E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner,
L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style,
high-performance deep learning library, in Advances in Neural
Information Processing Systems 32. Curran Associates, Inc., 2019,
pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[29] W. Kubinger, M. Vincze, and M. Ayromlou, “The role of gamma cor-
rection in colour image processing,” in 9th European Signal Processing
Conference (EUSIPCO 1998). IEEE, 1998, pp. 1–4.
[30] R. Jain, R. Kasturi, and B. Schunck, “Machine vision mcgraw-hill
international editions,” New York, 1995.
[31] G. J. Braun and M. D. Fairchild, “Image lightness rescaling using sig-
moidal contrast enhancement functions,” Journal of Electronic Imaging,
vol. 8, no. 4, pp. 380–393, 1999.
[32] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz,
T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld,
“Adaptive histogram equalization and its variations,” Computer vision,
graphics, and image processing, vol. 39, no. 3, pp. 355–368, 1987.
[33] D. M. Allen, “Mean square error of prediction as a criterion for selecting
variables, Technometrics, vol. 13, no. 3, pp. 469–475, 1971.
[34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,
arXiv preprint arXiv:1412.6980, 2014.
[35] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy,
D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J.
van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J.
Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W.
Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henrik-
sen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro,
F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, “SciPy 1.0:
Fundamental Algorithms for Scientific Computing in Python,” Nature
Methods, vol. 17, pp. 261–272, 2020.
[36] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion,
O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vander-
plas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch-
esnay, “Scikit-learn: Machine learning in Python, Journal of Machine
Learning Research, vol. 12, pp. 2825–2830, 2011.
[37] M. Herbst, J. Maclaren, C. Lovell-Smith, R. Sostheim, K. Egger,
A. Harloff, J. Korvink, J. Hennig, and M. Zaitsev, “Reproduction
of motion artifacts for performance analysis of prospective motion
correction in mri,” Magnetic Resonance in Medicine, vol. 71, no. 1,
pp. 182–190, 2014.
[38] B. Zahneisen, B. Keating, A. Singh, M. Herbst, and T. Ernst, “Re-
verse retrospective motion correction, Magnetic resonance in medicine,
vol. 75, no. 6, pp. 2341–2349, 2016.
[39] A. Sciarra, H. Mattern, R. Yakupov, S. Chatterjee, D. Stucht, S. Oeltze-
Jafra, F. Godenschweger, and O. Speck, “Quantitative evaluation of
prospective motion correction in healthy subjects at 7t mri, Magnetic
resonance in medicine, vol. 87, no. 2, pp. 646–657, 2022.