Fine-tuning deep learning model parameters for improved super-resolution of dynamic
MRI with prior-knowledge
Chompunuch Sarasaen a,b,c,∗, Soumick Chatterjee a,c,d,e,∗, Mario Breitkopf a,c, Georg Rose b,c, Andreas Nürnberger d,e,g, Oliver Speck a,c,f,g,h
a Biomedical Magnetic Resonance, Otto von Guericke University Magdeburg, Germany
b Institute for Medical Engineering, Otto von Guericke University Magdeburg, Germany
c Research Campus STIMULATE, Otto von Guericke University Magdeburg, Germany
d Faculty of Computer Science, Otto von Guericke University Magdeburg, Germany
e Data and Knowledge Engineering Group, Otto von Guericke University Magdeburg, Germany
f German Center for Neurodegenerative Disease, Magdeburg, Germany
g Center for Behavioral Brain Sciences, Magdeburg, Germany
h Leibniz Institute for Neurobiology, Magdeburg, Germany

∗ C. Sarasaen and S. Chatterjee contributed equally to this work.
Abstract
Dynamic imaging is a beneficial tool for interventions to assess physiological changes. Nonetheless, during dynamic MRI, spatial resolution is compromised in order to achieve a high temporal resolution. To overcome this spatio-temporal trade-off, this research presents a super-resolution (SR) MRI reconstruction with prior knowledge based fine-tuning to maximise spatial information while reducing the required scan-time for dynamic MRI. A U-Net based network with perceptual loss was trained on a benchmark dataset and fine-tuned using one subject-specific static high resolution MRI as prior knowledge to obtain high resolution dynamic images during the inference stage. 3D dynamic data for three subjects were acquired with different parameters to test the generalisation capabilities of the network. The method was tested for different levels of in-plane undersampling of the dynamic MRI. The reconstructed dynamic SR results after fine-tuning showed higher similarity with the high resolution ground-truth, while achieving statistically significant quantitative improvement. The average SSIM values for the lowest resolution examined in this research (6.25% of the k-space) were 0.939 ± 0.008 before and 0.957 ± 0.006 after fine-tuning. This could theoretically result in an acceleration factor of 16, with which a volume can potentially be acquired in less than half a second. The proposed approach shows that super-resolution MRI reconstruction with prior information can alleviate the spatio-temporal trade-off in dynamic MRI, even for high acceleration factors.
Keywords: super-resolution, dynamic MRI, prior knowledge, fine-tuning, patch-based super-resolution, deep learning
1. Introduction
Magnetic Resonance Imaging (MRI) has been in clinical use for a few decades, offering non-ionising radiation, non-invasiveness and excellent soft tissue contrast. Considering the clear visibility of tumours owing to its high soft tissue contrast, together with real-time supervision (e.g. thermometry), MRI is a promising tool for interventions. Images visualising the lesion as well as the needle path have to be acquired prior to any interventional procedure, in a so-called planning scan or preinterventional MR imaging (Mahnken et al., 2009). Furthermore, in MR-guided interventions such as liver biopsy, it is necessary to continuously acquire data and reconstruct a series of images during the intervention, in order to examine the dynamic movements of internal organs (Bernstein et al., 2004). A clearly interpretable visualisation of the target lesion and the surrounding tissues, including risk structures, is crucial during interventions. Because of the inherently slow speed of image acquisition, the amount of data to be acquired has to be reduced in order to achieve a high temporal resolution during dynamic MRI, which may result in a loss of spatial resolution. Although there are a number of techniques dealing with this spatio-temporal trade-off (Tsao et al., 2003; Lustig et al., 2006, 2007; Jung et al., 2009), their speed of reconstruction creates a hindrance for real-time or near real-time imaging. Therefore, a compromise between spatial and temporal resolution is inevitable during real-time MRI and needs to be mitigated.
The so-called super-resolution (SR) algorithms aim to restore
images with high spatial resolution from the corresponding low
resolution images. SR approaches have been widely used for
various applications (Zhang et al., 2014; Sajjadi et al., 2017),
including for super-resolution of MRIs (SR-MRI) (Van Reeth
et al., 2012; Plenge et al., 2012; Isaac and Kulkarni, 2015).
Furthermore, deep learning based super-resolution reconstruction has been shown in recent times to be a successful tool for SR-MRI (Zeng et al., 2018; He et al., 2020), including for dynamic MRI (Qin et al., 2018; Lyu et al., 2020). However, most deep learning based methods need large training datasets, and finding such training data – matching the data of the real-time acquisition that needs to be reconstructed in terms
of contrast and sequence – can be a challenging task. Using a training set significantly different from the test set can produce results of poor quality (Wang and Deng, 2018; Wilson and Cook, 2020). Several techniques have been used to deal with the problem of small datasets in deep learning, such as data augmentation (Perez and Wang, 2017) and synthetic data generation (Lateh et al., 2017; Frid-Adar et al., 2018). However, these methods rely on artificially modifying the data to increase the size of the dataset. Patch-based training can also help cope with the small-dataset problem by splitting each volume into smaller patches. This effectively increases the number of samples in the dataset without artificially modifying the data (Frid-Adar et al., 2017). Patch-based super-resolution (PBSR) techniques learn a mapping function from given corresponding pairs of high resolution and low resolution image patches (Yang et al., 2014).
This study proposes a PBSR reconstruction that addresses the lack of large abdominal training datasets. This research intends to improve deep learning based super-resolution of dynamic MR images by incorporating prior images (planning scan). The network was trained on a publicly available abdominal dataset of 40 subjects, acquired using sequences different from the dynamic MRI that is to be reconstructed. After that, the network was fine-tuned using a high resolution prior planning scan of the same subject as the dynamic acquisition.
1.1. Related works
Super-resolution approaches have been employed for a wide
variety of tasks, such as computer vision (Shi et al., 2016;
Dong et al., 2016; Sajjadi et al., 2017), remote sensing (Zhang
et al., 2014; Ran et al., 2020), face-related tasks (Tappen and
Liu, 2012; Yu et al., 2018) and medical applications (Isaac and
Kulkarni, 2015; Huang et al., 2017). Deep learning based meth-
ods have been widely used in recent times for performing super-
resolution (Dong et al., 2014; Zhu et al., 2014; Dong et al.,
2016; Ran et al., 2020). Moreover, deep learning based tech-
niques have been proven to be a successful tool for numer-
ous applications in the field of MRI, including for perform-
ing MR reconstruction (Wang et al., 2016; Hyun et al., 2018;
Hammernik et al., 2018; Chatterjee et al., 2019) and for SR-
MRI (Zeng et al., 2018; Liu et al., 2018; Chaudhari et al.,
2018; He et al., 2020). Different deep learning based SR-
MRI ideas have been proposed for static brain MRI (Huang
et al., 2017; Tanno et al., 2017; Pham et al., 2017; Zeng et al.,
2018; Liu et al., 2018; Chen et al., 2018; Deka et al., 2020; Gu
et al., 2020). Furthermore, deep learning based methods have been shown to tackle the spatio-temporal trade-off (Liang et al., 2020), also for dynamic cardiac MR reconstruction (Qin et al., 2018; Lyu et al., 2020).
Single-image super-resolution techniques are classified into
the groups of prediction-based, edge-based, image statistical
and patch-based methods (Yang et al., 2014). PBSR can overcome the need for large training datasets, as the actual training is done on patches rather than on whole images. PBSR methods have been applied to different tasks, including applications in medical imaging (Manjón et al., 2010; Rousseau et al., 2010; Zhang et al., 2012; Coupé et al., 2013; Jain et al., 2017). By employing PBSR, the reconstruction procedure can cope with the limited availability of abdominal MR training data (Tang and Shao, 2016; Misra et al., 2020).
The U-Net model (Ronneberger et al., 2015), originally proposed for image segmentation, has over the past few years been shown to solve various inverse problems as well (Hyun et al., 2018; Iqbal et al., 2019; Ghodrati et al., 2019). Iqbal et al. (2019) developed a U-Net based architecture for SR reconstruction of MR spectroscopic images. Hyun et al. (2018) reconstructed MRI with a 2D U-Net from zero-filled data, undersampled using uniform Cartesian sampling (GRAPPA-like) with a dense k-space centre. Ghodrati et al. (2019) employed a U-Net model to test the performance of this network structure for cardiac MRI reconstruction. Owing to the promising results shown in the papers mentioned earlier, the current paper proposes a 3D U-Net (Çiçek et al., 2016) based architecture for performing SR-MRI on abdominal dynamic images.
Transfer learning is a technique for re-purposing or adapting a pre-trained model with fine-tuning (Bengio et al., 2017). With transfer learning, the network weights learned from one task can be used as pre-trained weights for another task, and the network is then trained (fine-tuned) for the new task. It has been widely used in data mining and machine learning (Dai et al., 2007; Li et al., 2009; Choi et al., 2017; Lee et al., 2018). Transfer learning can address the issue of having insufficient training data (Zhao, 2017; Kim et al., 2020). The fine-tuning process is known to improve the network's performance and can help the network converge in fewer training epochs with smaller datasets (Pan and Yang, 2009). One of the main research questions when applying transfer learning is "what to transfer". This study therefore utilises specific prior knowledge from a static planning image, which is usually acquired before an interventional procedure. The incorporation of priors is meant to constrain anatomical structures in the fine-tuning process and to improve the data fidelity term in the regularisation process.
The selection of a loss function, which quantifies the reconstruction error between the model's prediction and the corresponding ground-truth images, is crucial when training a deep learning model. Pixel-based loss functions such as the mean squared error (L2 loss) are commonly used for SR; however, in terms of perceptual quality, they often generate overly smooth results owing to the loss of high-frequency details (Wang et al., 2003, 2004; Johnson et al., 2016; Ledig et al., 2017). Perceptual loss has shown potential to achieve high-quality images for image transformation tasks such as style transfer (Johnson et al., 2016; Gatys et al., 2016). For MRI reconstruction, Ghodrati et al. (2019) presented a comparative study of loss functions, such as perceptual loss (using VGG-16 as the perceptual loss network), pixel-based losses (L1 and L2) and patch-wise structural dissimilarity (DSSIM), for deep learning based cardiac MR reconstruction. They found that the perceptual loss outperformed the other loss functions. Hence, in this work a combination of a perceptual loss network with the mean absolute error (MAE) was used as the loss function, as explained in Section 2.3.1.
1.2. Problem statement
Given a low resolution image $I_{LR}$ and a corresponding high resolution image $I_{HR}$, the reconstructed high resolution image $\hat{I}_{HR}$ can be recovered by a super-resolution reconstruction using the following equation (Wang et al., 2020):

$$\hat{I}_{HR} = z(I_{LR}; \theta) \quad (1)$$

where $z$ denotes the super-resolution model that maps the image counterparts and $\theta$ denotes the parameters of $z$. The SR image reconstruction is an ill-posed problem; a network model can be trained to solve the objective function:

$$\hat{\theta} = \arg\min_{\theta} \mathcal{L}(\hat{I}_{HR}, I_{HR}) + \lambda R(\theta) \quad (2)$$

where $\mathcal{L}(\hat{I}_{HR}, I_{HR})$ denotes the loss function between the approximated HR image $\hat{I}_{HR}$ and the ground-truth image $I_{HR}$, $R(\theta)$ is a regularisation term and $\lambda$ denotes the trade-off parameter.
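The objective in Eq. (2) maps directly onto a standard deep learning training step. The following is a minimal sketch, assuming a placeholder model for $z$ and L2 weight decay as the regulariser $R(\theta)$; the paper does not specify $R$:

```python
import torch

# Stand-in for the SR model z; the actual network is the 3D U-Net of Section 2.2.
model = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1)
# weight_decay realises lambda * R(theta) with R as the L2 norm (an assumption).
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
loss_fn = torch.nn.L1Loss()

def training_step(i_lr, i_hr):
    """One optimisation step of arg min L(z(I_LR; theta), I_HR) + lambda R(theta)."""
    optimiser.zero_grad()
    i_sr = model(i_lr)           # Eq. (1): estimate of I_HR from I_LR
    loss = loss_fn(i_sr, i_hr)   # L(I_HR_hat, I_HR)
    loss.backward()
    optimiser.step()
    return loss.item()
```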
1.3. Contributions
This paper presents a method to incorporate prior knowledge
in deep-learning based super-resolution and its application in
dynamic MRI. The main contributions are as follows:
• This paper addresses the trade-off between the spatial and temporal resolution of dynamic MRI by incorporating a static high resolution scan as prior knowledge while performing spatial super-resolution on low-resolution images, effectively reducing the required scan-time per volume, which in turn improves the temporal resolution.
• A 3D U-Net model was first trained for the task of SR-MRI on a benchmark dataset and was fine-tuned using a subject-specific prior planning scan.
• This paper further tackles the problem of the lack of high-resolution dynamic MRI for training in two ways:
  – By using a static benchmark dataset for training, having different contrasts and resolutions than the target dynamic MRI, followed by fine-tuning using one static planning scan
  – By patch-based super-resolution training and fine-tuning
• To achieve realistic super-resolved images, perceptual loss was used as the loss function for training and fine-tuning the model, calculated using a 3D perceptual loss network pre-trained on MR images.
2. Methodology
This paper proposes a framework of patch-based MR super-
resolution based on U-Net, incorporating prior knowledge. The
framework can be divided into three stages: main training, fine-tuning and inference. The U-Net model was initially trained with a benchmark dataset during the main training and then fine-tuned using a subject-specific prior static scan. Finally, in the inference stage, high resolution dynamic MRIs were reconstructed from low resolution scans. This section starts with a description of the datasets used in this research, then explains the network architecture, followed by the implementation and training, and finally the metrics used for evaluation.
2.1. Data
In this work, 3D abdominal MR volumes were artificially
downsampled in-plane using MRUnder (Chatterjee, 2020)1
pipeline to simulate low resolution datasets. The low reso-
lution data was generated by performing undersampling in-
plane (phase-encoding and read-out direction) by taking the
centre of k-space without zero-padding. The CHAOS challenge
dataset (Kavur et al., 2020) (T1-Dual; in- and opposed-phase
images) was used for the main training. During this phase,
the dataset was split into training and validation set, with the
ratio of 70:30. High resolution 3D static (breath-hold) and
3D ”pseudo”-dynamic (free-breathing) scans for 10 time-points
(TP) using T1w FLASH sequence were acquired for fine-tuning
and inference respectively. Each time-point of the dynamic ac-
quisition was treated as separate 3D Volume. Three healthy
subjects were scanned with the same sequence but with differ-
ent parameters on a 3T MRI (Siemens Magnetom Skyra). This
aims to test the generalisation of the network. For each subject,
the 3D static and the 3D dynamic scans were acquired in dif-
ferent sessions using the same sequences and parameters. The
sequence parameters of the various datasets have been listed in
Table 1. The CHAOS dataset (for main training), the 3D static
scans (for fine-tuning) and the 3D dynamic scans (for inference)
were artificially downsampled for three different levels of im-
age resolution, by taking 25%, 10% and 6.25% of the k-space
centre, resulting in MR acceleration factor of 2, 3 and 4 respec-
tively (considering undersampling only in the phase-encoding
direction). This can be accelerated theoretically to a factor of 4,
9 and 16 respectively considering the amount of data used for
the SR reconstruction. The effective resolutions and the roughly
estimated acquisition times of the low resolution images were
calculated from the corresponding high-resolution images, are
reported in Table 2. The acquisition times were calculated as:
$$AcqTime = PE_n \times TR \times S_m \quad (3)$$

where $PE_n$ is the number of phase-encoding lines, which equals the matrix dimension in this direction, $TR$ is the repetition time (a parameter of an MRI pulse sequence that defines the time between two consecutive radio-frequency pulses) and $S_m$ is the number of slices. Phase/slice oversampling, phase/slice resolutions and the GRAPPA factor were taken into account while calculating $PE_n$.
as the input to the network and were compared against the high
resolution ground-truth images.
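The in-plane downsampling described above can be approximated in a few lines of NumPy. The following is a simplified sketch of retaining a central fraction of k-space without zero-padding; the actual MRUnder pipeline may differ in detail:

```python
import numpy as np

def kspace_centre_downsample(volume, fraction):
    """Simulate a low resolution volume by keeping only the central
    `fraction` of the in-plane k-space (no zero-padding), slice by slice.
    volume: 3D array (x, y, slices); fraction: e.g. 0.25, 0.10 or 0.0625."""
    nx, ny, ns = volume.shape
    # 6.25% of the in-plane k-space corresponds to 25% of the samples
    # along each in-plane axis, hence the square root.
    keep_x = int(round(nx * np.sqrt(fraction)))
    keep_y = int(round(ny * np.sqrt(fraction)))
    x0, y0 = (nx - keep_x) // 2, (ny - keep_y) // 2
    low_res = np.empty((keep_x, keep_y, ns))
    for s in range(ns):
        k = np.fft.fftshift(np.fft.fft2(volume[..., s]))
        centre = k[x0:x0 + keep_x, y0:y0 + keep_y]
        low_res[..., s] = np.abs(np.fft.ifft2(np.fft.ifftshift(centre)))
    return low_res
```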
2.2. Network Architecture
Fig 1 portrays the proposed network architecture. In this
work, a 3D U-Net based model (Ronneberger et al., 2015;
1MRUnder on Github: https://github.com/soumickmj/MRUnder
Table 1: MRI acquisition parameters of the CHAOS dataset and the subject-wise 3D dynamic scans. Static scans were performed using the same subject-wise sequence parameters as the dynamic scans for one time-point (TP), acquired in a different session.

| Parameter | CHAOS (40 subjects) | Subject 1 | Subject 2 | Subject 3 |
| Sequence | T1 Dual In-Phase & Opposed-Phase | T1w FLASH 3D | T1w FLASH 3D | T1w FLASH 3D |
| Resolution | 1.44 x 1.44 x 5 - 2.03 x 2.03 x 8 mm³ | 1.09 x 1.09 x 4 mm³ | 1.09 x 1.09 x 4 mm³ | 1.36 x 1.36 x 4 mm³ |
| FOV (x, y, z) | 315 x 315 x 240 - 520 x 520 x 280 mm³ | 280 x 210 x 160 mm³ | 280 x 210 x 160 mm³ | 350 x 262 x 176 mm³ |
| Encoding matrix | 256 x 256 x 26 - 400 x 400 x 50 | 256 x 192 x 40 | 256 x 192 x 40 | 256 x 192 x 44 |
| Phase/slice oversampling | - | 10/0 % | 10/0 % | 10/0 % |
| TR/TE | 110.17-255.54 ms / 4.60-4.64 ms (in-phase), 2.30 ms (opposed-phase) | 2.34/0.93 ms | 2.34/0.93 ms | 2.23/0.93 ms |
| Flip angle | 80° | 8° | 8° | 8° |
| Bandwidth | - | 975 Hz/Px | 975 Hz/Px | 975 Hz/Px |
| GRAPPA factor | None | 2 | None | None |
| Phase/slice partial Fourier | - | Off/Off | Off/Off | Off/Off |
| Phase/slice resolution | - | 75/65 % | 75/65 % | 50/64 % |
| Fat suppression | - | None | On | On |
| Time per TP | - | 5.53 sec | 11.76 sec | 8.01 sec |
Table 2: Effective resolutions and estimated acquisition times (per TP) of the dynamic and static datasets after performing different levels of artificial undersampling.

| Data | Subject 1: Resolution (mm³) / Acq. time (sec) | Subject 2: Resolution (mm³) / Acq. time (sec) | Subject 3: Resolution (mm³) / Acq. time (sec) |
| High resolution ground-truth | 1.09 x 1.09 x 4 / 4.81 | 1.09 x 1.09 x 4 / 9.61 | 1.36 x 1.36 x 4 / 6.62 |
| 25% of k-space | 2.19 x 2.19 x 4 / 1.22 | 2.19 x 2.19 x 4 / 2.43 | 2.73 x 2.73 x 4 / 1.65 |
| 10% of k-space | 3.50 x 3.50 x 4 / 0.47 | 3.50 x 3.50 x 4 / 0.94 | 4.38 x 4.38 x 4 / 0.66 |
| 6.25% of k-space | 4.38 x 4.38 x 4 / 0.28 | 4.38 x 4.38 x 4 / 0.56 | 5.47 x 5.47 x 4 / 0.42 |
Figure 1: The proposed network architecture.
Figure 2: Method overview.
The U-Net architecture consists of two main paths: contracting (encoding) and expanding (decoding). The contracting path consists of three blocks, each comprising two convolutional layers and ReLU activation functions. The expanding path also consists of three blocks, but with convolutional transpose layers used instead of the convolutional layers. The training was performed using 3D patches of the volumes, with a patch size of 24³.
The U-Net model requires the same matrix size for input and output; therefore, the input low-resolution image patches were interpolated using trilinear interpolation before supplying them to the network. The interpolation factor was determined based on the undersampling factor, so that the low-resolution input is brought to the same matrix size as the target high-resolution ground-truth.
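In PyTorch, this pre-interpolation can be performed with torch.nn.functional.interpolate; a minimal sketch, assuming single-channel patches in NCDHW layout:

```python
import torch
import torch.nn.functional as F

# Example low resolution patch in NCDHW layout (batch, channel, d, h, w).
lr_patch = torch.randn(1, 1, 12, 12, 12)

# Bring the patch to the 24^3 matrix size of the high resolution target
# before feeding it to the U-Net.
lr_upscaled = F.interpolate(lr_patch, size=(24, 24, 24),
                            mode='trilinear', align_corners=False)
```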
Patch-based training may result in patching artefacts during inference. To remove these artefacts, the inference was performed with a stride of one and the overlapping portions were averaged after reconstruction.
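This stride-one sliding window with overlap averaging can be sketched as follows (a simplified illustration, not necessarily the authors' exact implementation):

```python
import torch

def sliding_window_sr(model, volume, patch=24):
    """Patch-wise inference with overlap averaging to suppress patching
    artefacts. volume: tensor of shape (1, 1, D, H, W)."""
    out = torch.zeros_like(volume)
    count = torch.zeros_like(volume)
    _, _, D, H, W = volume.shape
    with torch.no_grad():
        for z in range(0, D - patch + 1):          # stride of one
            for y in range(0, H - patch + 1):
                for x in range(0, W - patch + 1):
                    sl = (..., slice(z, z + patch),
                          slice(y, y + patch), slice(x, x + patch))
                    out[sl] += model(volume[sl])   # accumulate predictions
                    count[sl] += 1                 # count overlaps
    return out / count.clamp(min=1)                # average overlapped voxels
```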
2.3. Model Implementation and Network Training
Fig 2 shows the method overview. The main training was performed using 3D patches of the 80 volumes (40 subjects, in-phase and opposed-phase for each subject), with a patch size of 24³ and a stride of 6 in the slice dimension and 12 in the other dimensions. After that, the network was fine-tuned using a single 3D static scan of the same subject from an earlier session, labelled as x,y,z,t in Fig 2. This static scan has the same resolution, contrast and volume coverage as the high resolution dynamic scan. The static and the dynamic scans were not co-registered, to remain similar to the real-life scenario and to keep the inference fast. Fine-tuning and evaluations were performed with a patch size of 24³ and a stride of one. The implementation was done using PyTorch (Paszke et al., 2019), and the models were trained using Nvidia Tesla V100 GPUs. The loss was minimised using the Adam optimiser with a learning rate of 1e-4. The main training was performed for 200 epochs. The network was fine-tuned for only one epoch, using the planning scan with a lower learning rate (1e-6).
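A sketch of the patch extraction and the two optimiser configurations described above (all names are illustrative; the authors' code may differ):

```python
import torch

def extract_patches(volume, patch=24, strides=(12, 12, 6)):
    """Split a 3D volume into overlapping 24^3 training patches, with a
    stride of 6 along the slice axis (assumed last) and 12 in-plane."""
    sx, sy, sz = strides
    X, Y, Z = volume.shape
    patches = [volume[x:x + patch, y:y + patch, z:z + patch]
               for x in range(0, X - patch + 1, sx)
               for y in range(0, Y - patch + 1, sy)
               for z in range(0, Z - patch + 1, sz)]
    return torch.stack(patches)

model = torch.nn.Conv3d(1, 1, 3, padding=1)  # placeholder for the 3D U-Net
# Main training: Adam with lr 1e-4 for 200 epochs on the CHAOS patches.
main_optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
# Fine-tuning: a single epoch on the planning scan with a lower lr.
finetune_optimiser = torch.optim.Adam(model.parameters(), lr=1e-6)
```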
2.3.1. Perceptual Loss
The loss during the training and fine-tuning of the network was calculated with the help of perceptual loss (Johnson et al., 2016). The first three levels of the contraction path of a pre-trained (on 7T MRA scans, for vessel segmentation) frozen U-Net MSS model (Chatterjee et al., 2020) were used as the perceptual loss network (PLN) to extract features from the final super-resolved output of the model and from the ground-truth images (refer to Fig 1). Typically, a VGG-16 trained on three-channel RGB non-medical images (the ImageNet dataset) is used as the PLN, even when working with medical images (Ghodrati et al., 2019), as the PLN does not have to be trained on a similar dataset. In this research, the pre-trained network was chosen because it was originally trained on single-channel medical images, albeit of a different contrast and organ; it was hypothesised that such a network would be more suitable than one trained on three-channel images. The extracted features from the model's output and from the ground-truth images were compared using the mean absolute error (L1 loss). The losses obtained at each level for each feature were then added together and backpropagated.
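The described loss can be sketched as follows, where pln_blocks is assumed to hold the three frozen encoder blocks of the pre-trained U-Net MSS model (a hypothetical handle, for illustration only):

```python
import torch

l1 = torch.nn.L1Loss()

def perceptual_loss(pln_blocks, sr, gt):
    """Sum of L1 distances between features of the super-resolved output
    and the ground-truth, extracted at the first three encoder levels of
    the frozen perceptual loss network. Gradients flow through the SR
    branch; the PLN weights themselves stay fixed."""
    loss = 0.0
    f_sr, f_gt = sr, gt
    for block in pln_blocks:        # three frozen contraction blocks
        f_sr = block(f_sr)
        f_gt = block(f_gt)
        loss = loss + l1(f_sr, f_gt)
    return loss
```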
2.4. Evaluation Metrics
To evaluate the quality of the reconstructed images against the ground-truth HR images, two of the most commonly used metrics for evaluating image quality were selected, namely the structural similarity index (SSIM) (Wang et al., 2004) and the peak signal-to-noise ratio (PSNR).
For perceptual quality assessment, the accuracy of the reconstructed images was compared to the ground truth using SSIM, which is based on the computation of luminance, contrast and structure terms between images $x$ and $y$:

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \quad (4)$$

where $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ are the local means, standard deviations and cross-covariance for images $x$ and $y$, respectively; $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, where $L$ is the dynamic range of the pixel values, $k_1 = 0.01$ and $k_2 = 0.03$.

Additionally, the performance of the model was measured statistically with PSNR, which is calculated via the mean squared error (MSE) as:

$$PSNR = 10 \log_{10}\left(\frac{R^2}{MSE}\right) \quad (5)$$

where $R$ is the maximum fluctuation in the input image.
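Both metrics are available in scikit-image; a minimal sketch for evaluating a reconstructed 3D volume against its ground-truth:

```python
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(recon, gt):
    """SSIM and PSNR of a reconstructed volume against the ground-truth;
    both arguments are NumPy arrays of the same shape."""
    rng = gt.max() - gt.min()   # dynamic range L in Eq. (4), resp. R in Eq. (5)
    ssim = structural_similarity(gt, recon, data_range=rng)
    psnr = peak_signal_noise_ratio(gt, recon, data_range=rng)
    return ssim, psnr
```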
3. Results and Discussion
The performance of the model was evaluated for three different levels of undersampling: taking 25%, 10% and 6.25% of the k-space centre. The network was tested before and after fine-tuning using 3D dynamic MRI. The proposed approach was compared against the low resolution input, traditional trilinear interpolation, area interpolation (based on adaptive average pooling, as implemented in PyTorch²) and finally against the most widely used technique in clinical MRI: Fourier interpolation of the input (zero-padded k-space, also known as sinc interpolation).

² PyTorch interpolation: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html
There was a noticeable qualitative and quantitative improvement when reconstructing low resolution data using the proposed method, even with only 6.25% of the k-space. Fig 3 shows the qualitative comparison for the low resolution images obtained by taking 25%, 10% and 6.25% of the k-space. Fig 4 portrays the comparison of the low resolution input for 6.25% of the k-space, the lowest resolution investigated during this study, with the SR result after fine-tuning over different time points. The SSIM maps were calculated against the high resolution ground-truth; the respective SSIM values can be found on top of the images. Fig 5 illustrates the deviations of an example result from its corresponding ground-truth for two different regions of interest. It can be observed that the SR results after fine-tuning could alleviate the undersampling artefacts, which are still present in the SR results of the main training, even for relatively low resolution images such as 10% and 6.25% of the k-space. Consequently, the visibility of small details is improved.
Additionally, for quantitative analysis, Table 3 displays the average and standard deviation (SD) of SSIM, PSNR and the SD of the subtracted images over all time points of the dynamic datasets. Here, each time-point has been considered independently, as a separate 3D volume. Fig 6 shows the distribution of the resultant metrics over all resolutions and subjects. The first row portrays the SSIM values and the second row shows the PSNR values. The columns denote 25%, 10% and 6.25% of the k-space, respectively. The blue, orange, green, red and violet lines represent the reconstruction results of trilinear interpolation, area interpolation, zero-padding (sinc interpolation), SR main training and SR after fine-tuning, while the thickness of the lines represents the standard deviation over time-points. It can be observed that the proposed method (SR after fine-tuning) significantly outperformed all the baseline methods evaluated here.
Fine-tuning with the planning scan helped to obtain sharper images and achieve better edge delineation. Furthermore, the statistical significance of the improvements in terms of SSIM achieved by the model after fine-tuning was evaluated using a paired t-test and the Wilcoxon signed-rank test. Separate tests were performed considering all the different resolutions together, as well as each resolution separately. It was observed that the improvement was statistically significant in every evaluated scenario (p-values were always less than 0.001).
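Such paired tests can be run with SciPy; a sketch with hypothetical per-time-point SSIM values before and after fine-tuning:

```python
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical paired SSIM values, one entry per reconstructed time-point.
ssim_main = [0.938, 0.941, 0.937, 0.940, 0.936, 0.942]
ssim_ft = [0.956, 0.958, 0.955, 0.959, 0.954, 0.957]

t_stat, p_t = ttest_rel(ssim_ft, ssim_main)
w_stat, p_w = wilcoxon(ssim_ft, ssim_main)
print(f"paired t-test p={p_t:.4g}, Wilcoxon signed-rank p={p_w:.4g}")
```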
The acquisition time of the high resolution 3D "pseudo"-dynamic reference data in this study was approximately ten seconds per volume without parallel acquisition and five seconds with a GRAPPA factor of two (Table 2). This is not sufficient for real-time or near real-time applications and might lead to blurring in free-breathing subjects. This research shows the potential to acquire such a volume with only minimal loss of spatial information in less than half a second.
The fine-tuning process took approximately eight hours per subject using the setup mentioned earlier (Section 2.3). Super-resolving each time-point took only a fraction of a second. The time required for fine-tuning and inference can be reduced further by reducing the patch overlap (stride), though this might reduce the quality of the resultant super-resolved images. Notably, the network was able to produce results highly similar to the ground-truth (SSIM of 0.957) even while super-resolving from 6.25% of the k-space, which can make the acquisition 16 times faster. Combining this fast acquisition speed with the inference speed of the method, this study can be extended for real-time or near real-time MRI during interventions.
In the current study, only the centre of the k-space was used
during undersampling, which results in loss of resolution with-
out creating explicit image artefacts. Other undersampling pat-
terns, such as variable density or GRAPPA-like uniform under-
sampling of higher spatial frequencies may be investigated in
the future.
It should be noted that the static planning scans and the actual dynamic scans during interventions are typically acquired with different sequences, with planning scans having higher contrast and resolution than the dynamic scans. This study was conducted using the same sequence for static and dynamic scans, but with different resolutions and positions (different scan sessions). An additional experiment was performed by fine-tuning using a volumetric interpolated breath-hold examination (VIBE) sequence as the planning scan for one subject. Super-resolving the dynamic low-resolution images from 6.25% of the k-space resulted in a 0.032 lower SSIM than using the identical sequence with higher resolution for fine-tuning. This may be a limitation of the current approach but requires further investigation.
Figure 3: Comparative results of low resolution (25%, 10% and 6.25% of k-space) 3D dynamic data of the same slice. From left to right: low resolution images (scaled-up), interpolated input (trilinear), super-resolution results of the main training (SR Results Main Training), super-resolution results after fine-tuning (SR After Fine-Tuning) and ground-truth images.
Figure 4: An example comparison of the low resolution input from 6.25% of the k-space with the super-resolution (SR) result after fine-tuning over three different time points, compared against the high resolution ground-truth using SSIM maps.
Table 3: The average and standard deviation of SSIM, PSNR and the SD of the difference images with respect to the ground-truth, for the different resolutions.

| Method | 25% of k-space: SSIM / PSNR / diff SD | 10% of k-space: SSIM / PSNR / diff SD | 6.25% of k-space: SSIM / PSNR / diff SD |
| Area interpolation | 0.911±0.011 / 29.721±1.948 / 0.068±0.011 | 0.814±0.018 / 26.250±1.867 / 0.071±0.013 | 0.723±0.031 / 24.092±1.964 / 0.080±0.013 |
| Trilinear interpolation | 0.964±0.005 / 37.680±1.770 / 0.013±0.002 | 0.906±0.007 / 33.148±1.780 / 0.022±0.004 | 0.872±0.011 / 31.504±1.786 / 0.026±0.005 |
| Zero-padded | 0.977±0.013 / 37.980±4.078 / 0.064±0.011 | 0.926±0.009 / 31.844±2.260 / 0.067±0.013 | 0.888±0.012 / 29.803±2.147 / 0.069±0.015 |
| SR main training | 0.986±0.007 / 42.781±2.424 / 0.009±0.002 | 0.961±0.009 / 36.710±1.086 / 0.014±0.002 | 0.939±0.008 / 35.377±1.653 / 0.017±0.003 |
| SR fine-tuning | 0.993±0.004 / 45.706±2.169 / 0.005±0.002 | 0.973±0.005 / 39.433±2.144 / 0.007±0.001 | 0.957±0.006 / 37.306±2.357 / 0.014±0.004 |
Figure 5: An example from the reconstructed results, compared against its ground-truth (GT) for low resolution images from 6.25% of the k-space. From left to right, upper to lower: ground-truth, trilinear interpolation (input of the network), SR result of the main training and SR result after fine-tuning. For the yellow ROI, (a-b): trilinear interpolation and the difference image from GT, (e-f): SR result of the main training and the difference image from GT, and (i-j): SR result after fine-tuning and the difference image from GT. The images on the right are analogous examples for the red ROI.
Figure 6: Line plot showing the mean and 95% confidence interval of the resultant SSIM and PSNR over the different time-points for each subject. The blue,
orange, green, red, and violet lines represent the reconstruction results of trilinear interpolation, area interpolation, zero-padding (sinc interpolation), SR main
training, and SR after fine-tuning, respectively. The upper row shows the SSIM values and the lower row shows the PSNR values.
4. Conclusion and Future Work
This research shows that fine-tuning with a subject-specific prior static scan can significantly improve the results of deep learning based super-resolution (SR) reconstruction. A 3D U-Net based model was trained with the help of perceptual loss to estimate the reconstruction error. The network model was initially trained using the CHAOS abdominal benchmark dataset and was then fine-tuned using a static high resolution prior scan. The model was used to obtain super-resolved high resolution 3D abdominal dynamic MRI from the corresponding low resolution images. Even though the network was trained using MRI sequences different from those of the reconstructed dynamic MRI, the SR results after fine-tuning showed high similarity with the ground-truth images. The proposed method could overcome the spatio-temporal trade-off by improving the spatial resolution of the images without compromising the speed of acquisition. This approach could be applied to real-time dynamic acquisitions, such as interventional MRI, because of the high inference speed of deep learning models.
In the presented approach, a 3D U-Net was used as the network model, which needs interpolation as a pre-processing step. The reconstructed images may therefore suffer from interpolation errors. As future work, network models such as SRCNN, which do not need interpolation, will be studied. In addition, image resolutions lower than those already investigated will be studied to probe the network's limitations. Moreover, clinical interventions are performed with devices, such as needles, which are not present in the planning scan. The authors plan to extend this research in the future by evaluating images containing such devices.
Acknowledgements
This work was conducted within the context of the Inter-
national Graduate School MEMoRIAL at Otto von Guericke
University (OVGU) Magdeburg, Germany, kindly supported by
the European Structural and Investment Funds (ESF) under the programme "Sachsen-Anhalt WISSENSCHAFT Internationalisierung" (project no. ZS/2016/08/80646).
References
Bengio, Y., Goodfellow, I., Courville, A., 2017. Deep learning. volume 1. MIT Press, Massachusetts, USA.
Bernstein, M.A., King, K.F., Zhou, X.J., 2004. Handbook of MRI pulse se-
quences. Elsevier.
Chatterjee, S., 2020. soumickmj/MRUnder: Initial release. doi:10.5281/zenodo.3901455.
Chatterjee, S., Breitkopf, M., Sarasaen, C., Rose, G., Nürnberger, A., Speck, O., 2019. A deep learning approach for reconstruction of undersampled cartesian and radial data, in: ESMRMB 2019.
Chatterjee, S., Prabhu, K., Pattadkal, M., Bortsova, G., Sarasaen, C., Dubost, F., Mattern, H., de Bruijne, M., Speck, O., Nürnberger, A., 2020. DS6, deformation-aware semi-supervised learning: Application to small vessel segmentation with noisy training data. arXiv preprint arXiv:2006.10802.
Chaudhari, A.S., Fang, Z., Kogan, F., Wood, J., Stevens, K.J., Gibbons, E.K.,
Lee, J.H., Gold, G.E., Hargreaves, B.A., 2018. Super-resolution muscu-
loskeletal mri using deep learning. Magnetic resonance in medicine 80,
2139–2154.
Chen, Y., Shi, F., Christodoulou, A.G., Xie, Y., Zhou, Z., Li, D., 2018. Efficient
and accurate mri super-resolution using a generative adversarial network and
3d multi-level densely connected network, in: International Conference on
Medical Image Computing and Computer-Assisted Intervention, Springer.
pp. 91–99.
Choi, K., Fazekas, G., Sandler, M., Cho, K., 2017. Transfer learning for music
classification and regression tasks. arXiv preprint arXiv:1703.09179 .
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016. 3d u-net: learning dense volumetric segmentation from sparse annotation, in: International conference on medical image computing and computer-assisted intervention, Springer. pp. 424–432.
Coupé, P., Manjón, J.V., Chamberland, M., Descoteaux, M., Hiba, B., 2013. Collaborative patch-based super-resolution for diffusion-weighted images. NeuroImage 83, 245–261.
Dai, W., Xue, G.R., Yang, Q., Yu, Y., 2007. Transferring naive bayes classifiers
for text classification, in: AAAI, pp. 540–545.
Deka, B., Mullah, H.U., Datta, S., Lakshmi, V., Ganesan, R., 2020. Sparse rep-
resentation based super-resolution of mri images with non-local total varia-
tion regularization. SN Computer Science 1, 1–13.
Dong, C., Loy, C.C., He, K., Tang, X., 2014. Learning a deep convolutional
network for image super-resolution, in: European conference on computer
vision, Springer. pp. 184–199.
Dong, C., Loy, C.C., Tang, X., 2016. Accelerating the super-resolution con-
volutional neural network, in: European conference on computer vision,
Springer. pp. 391–407.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan,
H., 2017. Modeling the intra-class variability for liver lesion detection using
a multi-class patch-based cnn, in: International Workshop on Patch-based
Techniques in Medical Imaging, Springer. pp. 129–137.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan,
H., 2018. Gan-based synthetic medical image augmentation for increased
cnn performance in liver lesion classification. Neurocomputing 321, 321–
331.
Gatys, L.A., Ecker, A.S., Bethge, M., 2016. Image style transfer using convolu-
tional neural networks, in: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp. 2414–2423.
Ghodrati, V., Shao, J., Bydder, M., Zhou, Z., Yin, W., Nguyen, K.L., Yang, Y.,
Hu, P., 2019. Mr image reconstruction using deep learning: evaluation of
network structure and loss functions. Quantitative imaging in medicine and
surgery 9, 1516–1527.
Gu, Y., Zeng, Z., Chen, H., Wei, J., Zhang, Y., Chen, B., Li, Y., Qin, Y., Xie,
Q., Jiang, Z., et al., 2020. Medsrgan: medical images super-resolution using
generative adversarial networks. Multimedia Tools and Applications 79,
21815–21840.
Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock,
T., Knoll, F., 2018. Learning a variational network for reconstruction of
accelerated mri data. Magnetic resonance in medicine 79, 3055–3071.
He, X., Lei, Y., Fu, Y., Mao, H., Curran, W.J., Liu, T., Yang, X., 2020. Super-
resolution magnetic resonance imaging reconstruction using deep attention
networks, in: Medical Imaging 2020: Image Processing, International Soci-
ety for Optics and Photonics. p. 113132J.
Huang, Y., Shao, L., Frangi, A.F., 2017. Simultaneous super-resolution and
cross-modality synthesis of 3d medical images using weakly-supervised
joint convolutional sparse coding, in: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 6070–6079.
Hyun, C.M., Kim, H.P., Lee, S.M., Lee, S., Seo, J.K., 2018. Deep learning
for undersampled mri reconstruction. Physics in Medicine & Biology 63,
135007.
Iqbal, Z., Nguyen, D., Hangel, G., Motyka, S., Bogner, W., Jiang, S., 2019.
Super-resolution 1h magnetic resonance spectroscopic imaging utilizing
deep learning. Frontiers in oncology 9.
Isaac, J.S., Kulkarni, R., 2015. Super resolution techniques for medical image
processing, in: 2015 International Conference on Technologies for Sustain-
able Development (ICTSD), IEEE. pp. 1–6.
Jain, S., Sima, D.M., Sanaei Nezhad, F., Hangel, G., Bogner, W., Williams, S., Van Huffel, S., Maes, F., Smeets, D., 2017. Patch-based super-resolution of mr spectroscopic images: application to multiple sclerosis. Frontiers in neuroscience 11, 13.
Johnson, J., Alahi, A., Fei-Fei, L., 2016. Perceptual losses for real-time style
transfer and super-resolution, in: European conference on computer vision,
Springer. pp. 694–711.
Jung, H., Sung, K., Nayak, K.S., Kim, E.Y., Ye, J.C., 2009. k-t focuss: a general
compressed sensing framework for high resolution dynamic mri. Magnetic
Resonance in Medicine: An Official Journal of the International Society for
Magnetic Resonance in Medicine 61, 103–116.
Kavur, A.E., Gezer, N.S., Barış, M., Aslan, S., Conze, P.H., Groza, V., Pham, D.D., Chatterjee, S., Ernst, P., Özkan, S., et al., 2020. Chaos challenge - combined (ct-mr) healthy abdominal organ segmentation. Medical Image Analysis, 101950.
Kim, Y.G., Kim, S., Cho, C.E., Song, I.H., Lee, H.J., Ahn, S., Park, S.Y.,
Gong, G., Kim, N., 2020. Effectiveness of transfer learning for enhancing
tumor classification with a convolutional neural network on frozen sections.
Scientific Reports 10, 1–9.
Lateh, M.A., Muda, A.K., Yusof, Z.I.M., Muda, N.A., Azmi, M.S., 2017. Han-
dling a small dataset problem in prediction model by employ artificial data
generation approach: A review, in: Journal of Physics: Conference Series,
p. 012016.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al., 2017. Photo-realistic single image super-resolution using a generative adversarial network, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690.
Lee, K.H., He, X., Zhang, L., Yang, L., 2018. Cleannet: Transfer learning for
scalable image classifier training with label noise, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5447–
5456.
Li, B., Yang, Q., Xue, X., 2009. Transfer learning for collaborative filtering
via a rating-matrix generative model, in: Proceedings of the 26th annual
international conference on machine learning, pp. 617–624.
Liang, M., Du, J., Li, L., Xue, Z., Wang, X., Kou, F., Wang, X., 2020. Video
super-resolution reconstruction based on deep learning and spatio-temporal
feature self-similarity. IEEE Transactions on Knowledge and Data Engi-
neering .
Liu, C., Wu, X., Yu, X., Tang, Y., Zhang, J., Zhou, J., 2018. Fusing multi-scale
information in convolution network for mr image super-resolution recon-
struction. Biomedical engineering online 17, 1–23.
Lustig, M., Donoho, D., Pauly, J.M., 2007. Sparse mri: The application of com-
pressed sensing for rapid mr imaging. Magnetic Resonance in Medicine:
An Official Journal of the International Society for Magnetic Resonance in
Medicine 58, 1182–1195.
Lustig, M., Santos, J.M., Donoho, D.L., Pauly, J.M., 2006. kt sparse: High
frame rate dynamic mri exploiting spatio-temporal sparsity, in: Proceedings
of the 13th annual meeting of ISMRM, Seattle.
Lyu, Q., Shan, H., Xie, Y., Li, D., Wang, G., 2020. Cine cardiac mri mo-
tion artifact reduction using a recurrent neural network. arXiv preprint
arXiv:2006.12700 .
Mahnken, A.H., Ricke, J., Wilhelm, K.E., 2009. CT-and MR-guided Interven-
tions in Radiology. volume 22. Springer.
Manjón, J.V., Coupé, P., Buades, A., Fonov, V., Collins, D.L., Robles, M., 2010. Non-local mri upsampling. Medical image analysis 14, 784–792.
Misra, D., Crispim-Junior, C., Tougne, L., 2020. Patch-based cnn evaluation for
bark classification, in: European Conference on Computer Vision, Springer.
pp. 197–212.
Pan, S.J., Yang, Q., 2009. A survey on transfer learning. IEEE Transactions on
knowledge and data engineering 22, 1345–1359.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S., 2019. Pytorch: An imperative style, high-performance deep learning library, in: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 32. Curran Associates, Inc., pp. 8024–8035.
Perez, L., Wang, J., 2017. The effectiveness of data augmentation in image
classification using deep learning. arXiv preprint arXiv:1712.04621 .
Pham, C.H., Ducournau, A., Fablet, R., Rousseau, F., 2017. Brain mri super-
resolution using deep 3d convolutional networks, in: 2017 IEEE 14th Inter-
national Symposium on Biomedical Imaging (ISBI 2017), IEEE. pp. 197–
200.
Plenge, E., Poot, D.H., Bernsen, M., Kotek, G., Houston, G., Wielopolski, P., van der Weerd, L., Niessen, W.J., Meijering, E., 2012. Super-resolution methods in mri: can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time? Magnetic resonance in medicine 68, 1983–1993.
Qin, C., Schlemper, J., Caballero, J., Price, A.N., Hajnal, J.V., Rueckert, D.,
2018. Convolutional recurrent neural networks for dynamic mr image re-
construction. IEEE transactions on medical imaging 38, 280–290.
Ran, Q., Xu, X., Zhao, S., Li, W., Du, Q., 2020. Remote sensing images
super-resolution with deep convolution networks. Multimedia Tools and
Applications 79, 8985–9001.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks
for biomedical image segmentation, in: International Conference on Medi-
cal image computing and computer-assisted intervention, Springer. pp. 234–
241.
Rousseau, F., Initiative, A.D.N., et al., 2010. A non-local approach for im-
age super-resolution using intermodality priors. Medical image analysis 14,
594–605.
Sajjadi, M.S., Scholkopf, B., Hirsch, M., 2017. Enhancenet: Single image
super-resolution through automated texture synthesis, in: Proceedings of the
IEEE International Conference on Computer Vision, pp. 4491–4500.
Sarasaen, C., Chatterjee, S., Nürnberger, A., Speck, O., 2020. Super resolution of dynamic mri using deep learning, enhanced by prior-knowledge, in: 37th Annual Scientific Meeting Congress of the European Society for Magnetic Resonance in Medicine and Biology, 33(Supplement 1): S03.04, S28-S29, Springer. doi:10.1007/s10334-020-00874-0.
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z., 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1874–1883.
Tang, Y., Shao, L., 2016. Pairwise operator learning for patch-based single-
image super-resolution. IEEE Transactions on Image Processing 26, 994–
1003.
Tanno, R., Worrall, D.E., Ghosh, A., Kaden, E., Sotiropoulos, S.N., Criminisi,
A., Alexander, D.C., 2017. Bayesian image quality transfer with cnns: ex-
ploring uncertainty in dmri super-resolution, in: International Conference on
Medical Image Computing and Computer-Assisted Intervention, Springer.
pp. 611–619.
Tappen, M.F., Liu, C., 2012. A bayesian approach to alignment-based image
hallucination, in: European conference on computer vision, Springer. pp.
236–249.
Tsao, J., Boesiger, P., Pruessmann, K.P., 2003. k-t blast and k-t sense: dynamic
mri with high frame rate exploiting spatiotemporal correlations. Magnetic
Resonance in Medicine: An Official Journal of the International Society for
Magnetic Resonance in Medicine 50, 1031–1042.
Van Reeth, E., Tham, I.W., Tan, C.H., Poh, C.L., 2012. Super-resolution in
magnetic resonance imaging: a review. Concepts in Magnetic Resonance
Part A 40, 306–325.
Wang, M., Deng, W., 2018. Deep visual domain adaptation: A survey. Neuro-
computing 312, 135–153.
Wang, S., Su, Z., Ying, L., Peng, X., Zhu, S., Liang, F., Feng, D., Liang, D.,
2016. Accelerating magnetic resonance imaging via deep learning, in: 2016
IEEE 13th International Symposium on Biomedical Imaging (ISBI), IEEE.
pp. 514–517.
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., 2004. Image quality
assessment: from error visibility to structural similarity. IEEE transactions
on image processing 13, 600–612.
Wang, Z., Chen, J., Hoi, S.C., 2020. Deep learning for image super-resolution:
A survey. IEEE transactions on pattern analysis and machine intelligence .
Wang, Z., Simoncelli, E.P., Bovik, A.C., 2003. Multiscale structural similarity for image quality assessment, in: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, IEEE. pp. 1398–1402.
Wilson, G., Cook, D.J., 2020. A survey of unsupervised deep domain adapta-
tion. ACM Transactions on Intelligent Systems and Technology (TIST) 11,
1–46.
Yang, C.Y., Ma, C., Yang, M.H., 2014. Single-image super-resolution: A
benchmark, in: European Conference on Computer Vision, Springer. pp.
372–386.
Yu, X., Fernando, B., Ghanem, B., Porikli, F., Hartley, R., 2018. Face super-resolution guided by facial component heatmaps, in: Proceedings of the European Conference on Computer Vision (ECCV), pp. 217–233.
Zeng, K., Zheng, H., Cai, C., Yang, Y., Zhang, K., Chen, Z., 2018. Simultaneous single- and multi-contrast super-resolution for brain mri images based on a convolutional neural network. Computers in biology and medicine 99, 133–141.
Zhang, H., Yang, Z., Zhang, L., Shen, H., 2014. Super-resolution reconstruction
for multi-angle remote sensing images considering resolution differences.
Remote Sensing 6, 637–657.
Zhang, Y., Wu, G., Yap, P.T., Feng, Q., Lian, J., Chen, W., Shen, D., 2012.
Reconstruction of super-resolution lung 4d-ct using patch-based sparse rep-
resentation, in: 2012 IEEE Conference on Computer Vision and Pattern
Recognition, IEEE. pp. 925–931.
Zhao, W., 2017. Research on the deep learning of the small sample data based
on transfer learning, in: AIP Conference Proceedings, AIP Publishing LLC.
p. 020018.
Zhu, Y., Zhang, Y., Yuille, A.L., 2014. Single image super-resolution using
deformable patches, in: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 2917–2924.