Fine-tuning deep learning model parameters for improved super-resolution of dynamic MRI with prior-knowledge

Chompunuch Sarasaen (a,b,c,†), Soumick Chatterjee (a,c,d,e,†), Mario Breitkopf (a,c), Georg Rose (b,c), Andreas Nürnberger (d,e,g), Oliver Speck (a,c,f,g,h)

(a) Biomedical Magnetic Resonance, Otto von Guericke University Magdeburg, Germany
(b) Institute for Medical Engineering, Otto von Guericke University Magdeburg, Germany
(c) Research Campus STIMULATE, Otto von Guericke University Magdeburg, Germany
(d) Faculty of Computer Science, Otto von Guericke University Magdeburg, Germany
(e) Data and Knowledge Engineering Group, Otto von Guericke University Magdeburg, Germany
(f) German Center for Neurodegenerative Disease, Magdeburg, Germany
(g) Center for Behavioral Brain Sciences, Magdeburg, Germany
(h) Leibniz Institute for Neurobiology, Magdeburg, Germany

† C. Sarasaen and S. Chatterjee contributed equally to this work.
Abstract

Dynamic imaging is a beneficial tool for interventions to assess physiological changes. Nonetheless, during dynamic MRI, while achieving a high temporal resolution, the spatial resolution is compromised. To overcome this spatio-temporal trade-off, this research presents a super-resolution (SR) MRI reconstruction with prior knowledge based fine-tuning to maximise spatial information while reducing the required scan-time for dynamic MRIs. A U-Net based network with perceptual loss is trained on a benchmark dataset and fine-tuned using one subject-specific static high resolution MRI as prior knowledge to obtain high resolution dynamic images during the inference stage. 3D dynamic data for three subjects were acquired with different parameters to test the generalisation capabilities of the network. The method was tested for different levels of in-plane undersampling for dynamic MRI. The reconstructed dynamic SR results after fine-tuning showed higher similarity with the high resolution ground-truth, while quantitatively achieving statistically significant improvement. The average SSIM values for the lowest resolution investigated in this research (6.25% of the k-space) before and after fine-tuning were 0.939 ± 0.008 and 0.957 ± 0.006, respectively. This could theoretically result in an acceleration factor of 16, which can potentially be acquired in less than half a second. The proposed approach shows that super-resolution MRI reconstruction with prior information can alleviate the spatio-temporal trade-off in dynamic MRI, even for high acceleration factors.

Keywords: super-resolution, dynamic MRI, prior knowledge, fine-tuning, patch-based super-resolution, deep learning
1. Introduction
Magnetic Resonance Imaging (MRI) has been in clinical use for a few decades, with the advantages of non-ionising radiation, non-invasiveness and an excellent soft tissue contrast. Considering
the clear visibility of tumours because of its high soft tissue
contrast, together with real-time supervision (e.g. thermome-
try), MRI is a promising tool for interventions. The visualisation of lesions, as well as of the needle paths, has to be acquired prior to any interventional procedure in a so-called planning scan, or preinterventional MR imaging (Mahnken et al., 2009). Furthermore, in MR-guided interventions, such as liver
biopsy, it is necessary to continuously acquire data and recon-
struct a series of images during interventions, in order to ex-
amine dynamic movements of internal organs (Bernstein et al.,
2004). A clear interpretable visualisation of the target lesion,
surrounding tissues including risk structures is crucial during
interventions. In order to achieve a high temporal resolution during dynamic MRIs, the amount of data to be acquired has to be reduced because of the inherently slow speed of image acquisition, which may result in a loss of spatial resolution.
Although there are a number of techniques dealing with this
spatio-temporal trade-off (Tsao et al., 2003; Lustig et al., 2006,
2007; Jung et al., 2009), their speed of reconstruction creates
a hindrance for real-time or near real-time imaging. Therefore,
a compromise between spatial and temporal resolution is in-
evitable during real-time MRIs and needs to be mitigated.
The so-called super-resolution (SR) algorithms aim to restore
images with high spatial resolution from the corresponding low
resolution images. SR approaches have been widely used for
various applications (Zhang et al., 2014; Sajjadi et al., 2017),
including for super-resolution of MRIs (SR-MRI) (Van Reeth
et al., 2012; Plenge et al., 2012; Isaac and Kulkarni, 2015).
Furthermore, deep learning based super-resolution reconstruction has in recent times been shown to be a successful tool for SR-MRI (Zeng et al., 2018; He et al., 2020), including
for dynamic MRIs (Qin et al., 2018; Lyu et al., 2020). How-
ever, most deep learning based methods need large training datasets, and finding such training data – matching the data of the real-time acquisition that needs to be reconstructed in terms
of contrast and sequence – can be a challenging task. Using
a training set significantly different from the test set can pro-
duce results of poor quality (Wang and Deng, 2018; Wilson
and Cook, 2020). Several techniques have been used to deal
with the problem of small datasets in deep learning, such as
data augmentation (Perez and Wang, 2017) and synthetic data
generation (Lateh et al., 2017; Frid-Adar et al., 2018). How-
ever, these methods rely on artificially modifying the data to
increase the size of the dataset. Patch-based training can also
help cope with the small dataset problem by splitting each data volume into smaller patches. This can effectively increase the num-
ber of samples in the dataset without artificially modifying the
data (Frid-Adar et al., 2017). The patch-based super-resolution
(PBSR) techniques learn the mapping function from given cor-
responding pairs of high resolution and low resolution image
patches (Yang et al., 2014).
This study proposes a PBSR reconstruction, aiming at ad-
dressing the problem of the lack of large abdominal datasets
for training. This research intends to improve deep learning
based super-resolution of dynamic MR images by incorporat-
ing prior images (planning scan). The network was trained on
a publicly available abdominal dataset of 40 subjects, acquired
using different sequences from the dynamic MR that is to be
reconstructed. After that, the network was fine-tuned using a
high resolution prior planning scan of the same subject as the
dynamic acquisition.
1.1. Related works
Super-resolution approaches have been employed for a wide
variety of tasks, such as computer vision (Shi et al., 2016;
Dong et al., 2016; Sajjadi et al., 2017), remote sensing (Zhang
et al., 2014; Ran et al., 2020), face-related tasks (Tappen and
Liu, 2012; Yu et al., 2018) and medical applications (Isaac and
Kulkarni, 2015; Huang et al., 2017). Deep learning based meth-
ods have been widely used in recent times for performing super-
resolution (Dong et al., 2014; Zhu et al., 2014; Dong et al.,
2016; Ran et al., 2020). Moreover, deep learning based tech-
niques have been proven to be a successful tool for numer-
ous applications in the field of MRI, including for perform-
ing MR reconstruction (Wang et al., 2016; Hyun et al., 2018;
Hammernik et al., 2018; Chatterjee et al., 2019) and for SR-
MRI (Zeng et al., 2018; Liu et al., 2018; Chaudhari et al.,
2018; He et al., 2020). Different deep learning based SR-
MRI ideas have been proposed for static brain MRI (Huang
et al., 2017; Tanno et al., 2017; Pham et al., 2017; Zeng et al.,
2018; Liu et al., 2018; Chen et al., 2018; Deka et al., 2020; Gu
et al., 2020). Furthermore, deep learning based methods have
additionally been shown to tackle the spatio-temporal trade-
off (Liang et al., 2020), also for dynamic cardiac MR recon-
struction (Qin et al., 2018; Lyu et al., 2020).
Single-image super-resolution techniques are classified into
the groups of prediction-based, edge-based, image statistical
and patch-based methods (Yang et al., 2014). PBSR can overcome the need for large training datasets, as the actual training is done on patches rather than on whole images. The PBSR methods have been applied to different tasks, including applications in medical imaging (Manjón et al., 2010; Rousseau et al., 2010; Zhang et al., 2012; Coupé et al., 2013; Jain et al.,
2017). By employing PBSR, the reconstruction procedure can cope with the limited availability of abdominal MR training data (Tang and Shao, 2016; Misra et al., 2020).
The U-Net (Ronneberger et al., 2015) model, which was originally proposed for image segmentation, has over the past few years been proven to solve various inverse problems as well (Hyun et al., 2018; Iqbal et al., 2019; Ghodrati et al., 2019).
Iqbal et al. (2019) developed a U-Net based architecture for SR
reconstruction of MR spectroscopic images. Hyun et al. (2018)
reconstructed MRI utilising a 2D U-Net from zero-filled data,
undersampled using uniform Cartesian sampling (GRAPPA-
like) with a dense k-space centre. Ghodrati et al. (2019) em-
ployed a U-Net model to test the performance of this network
structure for cardiac MRI reconstruction. Due to the promising
results shown in the papers mentioned earlier, the current paper proposes a 3D U-Net (Çiçek et al., 2016) based architecture for performing SR-MRI on abdominal dynamic images.
Transfer learning is a technique for re-purposing or adapting a pre-trained model with fine-tuning (Bengio et al., 2017). With
transfer learning, the network weights learned from one task
can be used as pre-trained weights for another task, and then
the network is trained (fine-tuned) for the new task. It has been
widely used in data mining and machine learning (Dai et al.,
2007; Li et al., 2009; Choi et al., 2017; Lee et al., 2018). Trans-
fer learning can address the issue of having insufficient training data (Zhao, 2017; Kim et al., 2020). The fine-tuning process is known to improve the network's performance and can help the model converge in fewer training epochs with smaller datasets (Pan
and Yang, 2009). One of the main research questions when applying transfer learning is "what to transfer". This study therefore utilises the specific prior knowledge from a static planning image, which is usually acquired prior to an interventional procedure. The incorporation of priors is meant to constrain the anatomical structures in the fine-tuning process and to improve the data fidelity term in the regularisation process.
The selection of a loss function is crucial for determining the reconstruction error between the model's prediction and the corresponding ground-truth images during training. Pixel-based loss functions such as the mean squared error (L2 loss) are commonly used for SR; however, in terms of perceptual quality, they often generate overly smooth results, caused by the loss of high-frequency details (Wang et al., 2003, 2004; Johnson et al., 2016; Ledig et al., 2017). Perceptual
loss has shown potential to achieve high-quality images for im-
age transformation tasks such as style transfer (Johnson et al.,
2016; Gatys et al., 2016). For MRI reconstruction, Ghodrati
et al. (2019) have shown a comparative study of loss functions,
such as perceptual loss (using VGG-16 as perceptual loss net-
work), pixel-based loss (L1 and L2) and patch-wise structural
dissimilarity (Dssim), for deep learning based cardiac MR re-
construction. They found that the results of the perceptual loss
outperformed the other loss functions. Hence, in this work a perceptual loss network combined with the mean absolute error (MAE) was used as the loss function, as explained in Section 2.3.1.
1.2. Problem statement

Given a low resolution image I_LR and a corresponding high resolution image I_HR, the reconstructed high resolution image \hat{I}_{HR} can be recovered from a super-resolution reconstruction using the following equation (Wang et al., 2020):

    \hat{I}_{HR} = z(I_{LR}; \theta)    (1)

where z denotes the super-resolution model that maps the image counterparts and \theta denotes the parameters of z. The SR image reconstruction is an ill-posed problem; a network model can be trained to solve the objective function:

    \hat{\theta} = \arg\min_{\theta} L(\hat{I}_{HR}, I_{HR}) + \lambda R(\theta)    (2)

where L(\hat{I}_{HR}, I_{HR}) denotes the loss function between the approximated HR image \hat{I}_{HR} and the ground-truth image I_{HR}, R(\theta) is a regularisation term and \lambda denotes the trade-off parameter.
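As an illustration, one optimisation step for this objective can be sketched in PyTorch as follows. All names are placeholders, and weight decay is used here merely as an example of a regulariser R(θ); the paper does not prescribe a specific choice.

```python
import torch

def sr_training_step(model, loss_fn, i_lr, i_hr, optimiser):
    """One optimisation step for Eq. (2). The optimiser's weight_decay term
    plays the role of lambda * R(theta) with R(theta) ~ ||theta||^2 -- an
    illustrative choice of regulariser, not prescribed by the paper."""
    i_hr_hat = model(i_lr)            # I_hat_HR = z(I_LR; theta), Eq. (1)
    loss = loss_fn(i_hr_hat, i_hr)    # L(I_hat_HR, I_HR)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

# Hypothetical usage:
# optimiser = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
# sr_training_step(model, loss_fn, i_lr, i_hr, optimiser)
```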
1.3. Contributions
This paper presents a method to incorporate prior knowledge in deep-learning based super-resolution and its application in dynamic MRI. The main contributions are as follows:

- This paper addresses the trade-off between the spatial and temporal resolution of dynamic MRI by incorporating a static high resolution scan as prior knowledge while performing spatial super-resolution on low-resolution images, effectively reducing the required scan-time per volume, which in turn will improve the temporal resolution.

- A 3D U-Net model was first trained for the task of SR-MRI on a benchmark dataset and was fine-tuned using a subject-specific prior planning scan.

- This paper further tackles the problem of the lack of high-resolution dynamic MRI for training in two ways:
  - By using a static benchmark dataset for training, having different contrasts and resolutions than the target dynamic MRI, followed by fine-tuning using one static planning scan
  - By patch-based super-resolution training and fine-tuning

- To achieve realistic super-resolved images, perceptual loss was used as the loss function for training and fine-tuning the model, which was calculated using a 3D perceptual loss network pre-trained on MR images.
2. Methodology
This paper proposes a framework of patch-based MR super-
resolution based on U-Net, incorporating prior knowledge. The
framework can be divided into three stages: main training, fine-
tuning and inference. The U-Net model was initially trained
with a benchmark dataset for main-training and then was fine-
tuned using a subject-specific prior static scan. Finally in the
inference stage, high resolution dynamic MRIs were recon-
structed from low resolution scans. This section starts with a description of the various datasets used in this research, then explains the network architecture, followed by the implementation and training, and finally explains the metrics used for evaluation.
2.1. Data
In this work, 3D abdominal MR volumes were artificially
downsampled in-plane using the MRUnder (Chatterjee, 2020)¹
pipeline to simulate low resolution datasets. The low reso-
lution data was generated by performing undersampling in-
plane (phase-encoding and read-out direction) by taking the
centre of k-space without zero-padding. The CHAOS challenge
dataset (Kavur et al., 2020) (T1-Dual; in- and opposed-phase
images) was used for the main training. During this phase,
the dataset was split into training and validation set, with the
ratio of 70:30. High resolution 3D static (breath-hold) and
3D ”pseudo”-dynamic (free-breathing) scans for 10 time-points
(TP) using T1w FLASH sequence were acquired for fine-tuning
and inference respectively. Each time-point of the dynamic ac-
quisition was treated as a separate 3D volume. Three healthy
subjects were scanned with the same sequence but with differ-
ent parameters on a 3T MRI (Siemens Magnetom Skyra). This
aims to test the generalisation of the network. For each subject,
the 3D static and the 3D dynamic scans were acquired in dif-
ferent sessions using the same sequences and parameters. The
sequence parameters of the various datasets have been listed in
Table 1. The CHAOS dataset (for main training), the 3D static
scans (for fine-tuning) and the 3D dynamic scans (for inference)
were artificially downsampled to three different levels of image resolution, by taking 25%, 10% and 6.25% of the k-space centre, resulting in MR acceleration factors of 2, 3 and 4 respectively (considering undersampling only in the phase-encoding direction). This can theoretically be accelerated to factors of 4, 9 and 16 respectively, considering the total amount of data used for the SR reconstruction. The effective resolutions and the roughly
estimated acquisition times of the low resolution images were calculated from the corresponding high-resolution images and are reported in Table 2. The acquisition times were calculated as:

    AcqTime = PE_n \times TR \times S_m    (3)

where PE_n is the number of phase-encoding lines, which equals the matrix dimension in this direction, TR is the repetition time, the parameter of an MRI pulse sequence that defines the time between two consecutive radio-frequency pulses, and S_m is the number of slices. Phase/slice oversampling, phase/slice resolutions and the GRAPPA factor were taken into account while calculating PE_n. The low resolution images served
as the input to the network and were compared against the high
resolution ground-truth images.
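For illustration, a minimal NumPy sketch of this low-resolution simulation and of equation (3) is given below. The function names are illustrative and this is not the MRUnder implementation; it assumes the two in-plane axes come first and that the kept fraction is split equally between them (e.g. 25% of k-space keeps half of the lines along each in-plane axis).

```python
import numpy as np

def undersample_centre(volume, fraction):
    """Simulate a low-resolution acquisition by keeping only the central
    `fraction` of in-plane k-space, without zero-padding.
    volume: 3D array with the two in-plane axes first (an assumption)."""
    k = np.fft.fftshift(np.fft.fft2(volume, axes=(0, 1)), axes=(0, 1))
    ny, nx = volume.shape[:2]
    keep_y = max(1, int(round(ny * fraction ** 0.5)))  # e.g. 25% -> 1/2 per axis
    keep_x = max(1, int(round(nx * fraction ** 0.5)))
    cy, cx = ny // 2, nx // 2
    centre = k[cy - keep_y // 2:cy - keep_y // 2 + keep_y,
               cx - keep_x // 2:cx - keep_x // 2 + keep_x]
    return np.abs(np.fft.ifft2(np.fft.ifftshift(centre, axes=(0, 1)), axes=(0, 1)))

def acq_time(pe_lines, tr_seconds, n_slices):
    """Equation (3): acquisition time per volume."""
    return pe_lines * tr_seconds * n_slices

# Hypothetical usage:
# low_res = undersample_centre(hr_volume, 0.0625)   # 6.25% of k-space
# seconds = acq_time(pe_lines, tr_seconds, n_slices)
```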
¹ MRUnder on GitHub: https://github.com/soumickmj/MRUnder
Table 1: MRI acquisition parameters of the CHAOS dataset and the subject-wise 3D dynamic scans. Static scans were performed using the same subject-wise sequence parameters as the dynamic scans for one time-point (TP), acquired at a different session.

| Parameter | CHAOS (40 subjects) | Subject 1 | Subject 2 | Subject 3 |
|---|---|---|---|---|
| Sequence | T1 Dual In-Phase & Opposed-Phase | T1w FLASH 3D | T1w FLASH 3D | T1w FLASH 3D |
| Resolution | 1.44 x 1.44 x 5 - 2.03 x 2.03 x 8 mm³ | 1.09 x 1.09 x 4 mm³ | 1.09 x 1.09 x 4 mm³ | 1.36 x 1.36 x 4 mm³ |
| FOV (x, y, z) | 315 x 315 x 240 - 520 x 520 x 280 mm³ | 280 x 210 x 160 mm³ | 280 x 210 x 160 mm³ | 350 x 262 x 176 mm³ |
| Encoding matrix | 256 x 256 x 26 - 400 x 400 x 50 | 256 x 192 x 40 | 256 x 192 x 40 | 256 x 192 x 44 |
| Phase/slice oversampling | - | 10/0 % | 10/0 % | 10/0 % |
| TR/TE | 110.17-255.54 ms / 4.60-4.64 ms (In-Phase), 2.30 ms (Opposed-Phase) | 2.34/0.93 ms | 2.34/0.93 ms | 2.23/0.93 ms |
| Flip angle | 80° | 8° | 8° | 8° |
| Bandwidth | - | 975 Hz/Px | 975 Hz/Px | 975 Hz/Px |
| GRAPPA factor | None | 2 | None | None |
| Phase/slice partial Fourier | - | Off/Off | Off/Off | Off/Off |
| Phase/slice resolution | - | 75/65 % | 75/65 % | 50/64 % |
| Fat suppression | - | None | On | On |
| Time per TP | - | 5.53 sec | 11.76 sec | 8.01 sec |
Table 2: Effective resolutions and estimated acquisition times (per TP) of the dynamic and static datasets after performing different levels of artificial undersampling.

| Data | Subject 1: Resolution (mm³) | Acq. time (sec) | Subject 2: Resolution (mm³) | Acq. time (sec) | Subject 3: Resolution (mm³) | Acq. time (sec) |
|---|---|---|---|---|---|---|
| High resolution ground-truth | 1.09 x 1.09 x 4 | 4.81 | 1.09 x 1.09 x 4 | 9.61 | 1.36 x 1.36 x 4 | 6.62 |
| 25% of k-space | 2.19 x 2.19 x 4 | 1.22 | 2.19 x 2.19 x 4 | 2.43 | 2.73 x 2.73 x 4 | 1.65 |
| 10% of k-space | 3.50 x 3.50 x 4 | 0.47 | 3.50 x 3.50 x 4 | 0.94 | 4.38 x 4.38 x 4 | 0.66 |
| 6.25% of k-space | 4.38 x 4.38 x 4 | 0.28 | 4.38 x 4.38 x 4 | 0.56 | 5.47 x 5.47 x 4 | 0.42 |
Figure 1: The proposed network architecture.

Figure 2: Method overview.
2.2. Network Architecture

Fig 1 portrays the proposed network architecture. In this work, a 3D U-Net based model (Ronneberger et al., 2015; Çiçek et al., 2016; Sarasaen et al., 2020) with a perceptual loss network (Chatterjee et al., 2020) was employed for super-resolution reconstruction. The U-Net architecture consists of
two main paths: contracting (encoding) and expanding (decoding). The contracting path consists of three blocks, each comprising two convolutional layers and a ReLU activation func-
tion. The expanding path also consists of three blocks, but con-
volutional transpose layers were used instead of the convolu-
tional layers. The training was performed using 3D patches of
the volumes, with a patch size of 24³ (24 x 24 x 24 voxels).
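A minimal PyTorch sketch of such an architecture is given below. The channel widths, the max-pooling between encoder blocks and the trilinear upsampling before each skip concatenation are illustrative assumptions; the paper only specifies three contracting blocks (two convolutions plus ReLU each) and three expanding blocks with transposed convolutions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One contracting-path block: two 3D convolutions, each followed by ReLU
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

def up_block(in_ch, out_ch):
    # Expanding-path block: transposed convolutions instead of convolutions
    return nn.Sequential(
        nn.ConvTranspose3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose3d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class UNet3D(nn.Module):
    def __init__(self, ch=(32, 64, 128)):       # illustrative channel widths
        super().__init__()
        self.enc1 = conv_block(1, ch[0])
        self.enc2 = conv_block(ch[0], ch[1])
        self.enc3 = conv_block(ch[1], ch[2])
        self.pool = nn.MaxPool3d(2)
        self.up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
        self.dec2 = up_block(ch[2] + ch[1], ch[1])
        self.dec1 = up_block(ch[1] + ch[0], ch[0])
        self.out = nn.Conv3d(ch[0], 1, kernel_size=1)

    def forward(self, x):                        # x: (B, 1, 24, 24, 24) patches
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return self.out(d1)                      # same matrix size as the input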
The U-Net model requires the same matrix size for input and
output, therefore, the input low-resolution image patches were
interpolated using trilinear interpolation before supplying them
to the network. The interpolation factor was determined based
on the undersampling factor - thus the low-resolution input is
modified to the same matrix size as the target high-resolution
ground-truth.
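A minimal sketch of this pre-processing step, assuming tensors in PyTorch's (batch, channel, depth, height, width) layout; the function name is illustrative.

```python
import torch.nn.functional as F

def upsample_to_target(lr_volume, target_shape):
    """Trilinear interpolation of the low-resolution input to the matrix size
    of the high-resolution target, since the U-Net requires the same matrix
    size for input and output. lr_volume: (B, 1, D, H, W); target_shape: (D, H, W)."""
    return F.interpolate(lr_volume, size=target_shape,
                         mode="trilinear", align_corners=False)
```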
Patch-based training may result in patching artefacts during
inference. To remove these artefacts, the inference was per-
formed with a stride of one and the overlapped portions were
averaged after reconstruction.
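The overlap-averaging inference can be sketched as follows. This is a naive, patch-at-a-time illustration with hypothetical names; in practice the patches would be batched for speed.

```python
import torch

def sliding_window_sr(model, volume, patch=24, stride=1):
    """Patch-based inference: super-resolve overlapping 24^3 patches with the
    given stride and average the overlapping regions to suppress patching
    artefacts. volume: (1, 1, D, H, W) tensor, already trilinearly upsampled."""
    out = torch.zeros_like(volume)
    weight = torch.zeros_like(volume)
    _, _, D, H, W = volume.shape
    with torch.no_grad():
        for z in range(0, D - patch + 1, stride):
            for y in range(0, H - patch + 1, stride):
                for x in range(0, W - patch + 1, stride):
                    p = volume[..., z:z+patch, y:y+patch, x:x+patch]
                    out[..., z:z+patch, y:y+patch, x:x+patch] += model(p)
                    weight[..., z:z+patch, y:y+patch, x:x+patch] += 1
    return out / weight.clamp(min=1)   # average the overlapped portions
```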
2.3. Model Implementation and Network Training
Fig 2 shows the method overview. The main training was
performed using 3D patches of the 80 volumes (40 subjects, in-
phase and opposed-phase for each subject), with a patch size
of 24³ and a stride of 6 for the slice dimension and 12 for the
other dimensions. After that, the network was fine-tuned using
a single 3D static scan of the same subject from an earlier ses-
sion, labelled as x,y,z,t in Fig 2. This static scan has the same
resolution, contrast and volume coverage as the high resolution
dynamic scan. The static and the dynamic scans were not co-
registered, to keep the setup similar to the real-life scenario and to keep
the speed of inference fast. Fine-tuning and evaluations were
performed with a patch size of 243and a stride of one. The im-
plementation was done using PyTorch (Paszke et al., 2019) and
was trained using Nvidia Tesla V100 GPUs. The loss was min-
imised using the Adam optimiser with a learning rate of 1e-4.
The main training was performed for 200 epochs. The network
was fine-tuned for only one epoch, using the planning scan with
a lower learning rate (1e-6).
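The two training stages can be sketched with a single generic loop, shown below with the reported hyperparameters (Adam; 200 epochs at a learning rate of 1e-4 for the main training; one fine-tuning epoch at 1e-6). The loader names are hypothetical placeholders.

```python
import torch

def run_training(model, loader, loss_fn, lr, epochs):
    """Generic loop used twice: main training and subject-specific fine-tuning.
    Each batch pairs an interpolated low-resolution patch with its HR target."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for lr_patch, hr_patch in loader:
            optimiser.zero_grad()
            loss_fn(model(lr_patch), hr_patch).backward()
            optimiser.step()

# Hypothetical usage: main training on CHAOS patches, then fine-tuning on
# patches of the subject-specific static planning scan.
# run_training(model, chaos_patch_loader, perceptual_loss, lr=1e-4, epochs=200)
# run_training(model, planning_patch_loader, perceptual_loss, lr=1e-6, epochs=1)
```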
2.3.1. Perceptual Loss
Loss during the training and fine-tuning of the network was
calculated with the help of perceptual loss (Johnson et al.,
2016). The first three levels of the contraction path of a pre-
trained (on 7T MRA scans, for vessel segmentation) frozen
U-Net MSS model (Chatterjee et al., 2020) were used as the
perceptual loss network (PLN) to extract features from the fi-
nal super-resolved output of the model and the ground-truth
images (refer to Fig.1). Typically VGG-16 trained on three-
channel RGB non-medical images (ImageNet Dataset) is used
as PLN, even while working with medical images (Ghodrati
et al., 2019) as the PLN doesn’t have to be trained on a similar
dataset. In this research, the pre-trained network was chosen
because it was originally trained on single-channel medical im-
ages, but of a different contrast and organ; it was hypothesised that a network trained as such would be more suitable than a network trained on three-channel images. The extracted
features from the model’s output and from the ground-truth im-
ages were compared using mean absolute error (L1 loss). The
losses obtained at each level for each feature were then added
together and backpropagated.
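A minimal sketch of such a feature-space loss is given below, assuming each element of `levels` is one frozen contracting block (including its down-sampling) of the pre-trained U-Net MSS model; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerceptualLoss(nn.Module):
    """L1 (mean absolute error) between features extracted by the first three
    frozen contracting levels of a pre-trained network, summed over levels."""
    def __init__(self, levels):
        super().__init__()
        self.levels = nn.ModuleList(levels)
        for p in self.parameters():
            p.requires_grad_(False)       # the perceptual loss network stays frozen

    def forward(self, prediction, target):
        loss = prediction.new_zeros(())
        f_p, f_t = prediction, target
        for level in self.levels:
            f_p, f_t = level(f_p), level(f_t)   # features of output and ground-truth
            loss = loss + F.l1_loss(f_p, f_t)   # compared with mean absolute error
        return loss                             # summed over levels, then backpropagated
```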
2.4. Evaluation Metrics
To evaluate the quality of the reconstructed images against
the ground-truth HR images, two of the most commonly used
metrics for evaluating image quality were selected, namely the
structural similarity index (SSIM) (Wang et al., 2004) and the
peak signal-to-noise ratio (PSNR).
For perceptual quality assessment, the accuracy of the recon-
structed images was compared to the ground truth using SSIM,
which is based on the computation of luminance, contrast and structure terms between images x and y:

    SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}    (4)

where \mu_x, \mu_y, \sigma_x, \sigma_y and \sigma_{xy} are the local means, standard deviations and cross-covariance for images x and y, respectively; c_1 = (k_1 L)^2 and c_2 = (k_2 L)^2, where L is the dynamic range of the pixel values, k_1 = 0.01 and k_2 = 0.03.
Additionally, the performance of the model was measured statistically with the PSNR, which is calculated via the mean squared error (MSE) as:

    PSNR = 10 \log_{10}\left(\frac{R^2}{MSE}\right)    (5)

where R is the maximum fluctuation in the input image.
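As an illustration, both metrics can be computed with scikit-image; the helper below is a sketch, not the evaluation code used in the paper (scikit-image's defaults k1 = 0.01 and k2 = 0.03 match equation (4)).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(reconstruction, ground_truth):
    """SSIM (Eq. 4) and PSNR (Eq. 5) of a reconstructed volume against its
    high-resolution ground-truth, both as NumPy arrays of the same shape."""
    data_range = float(ground_truth.max() - ground_truth.min())
    ssim = structural_similarity(ground_truth, reconstruction, data_range=data_range)
    psnr = peak_signal_noise_ratio(ground_truth, reconstruction, data_range=data_range)
    return ssim, psnr
```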
3. Results and Discussion
Performance of the model was evaluated for three different
levels of undersampling: by taking 25%, 10% and 6.25% of the
k-space centre. The network was tested before and after fine-
tuning using 3D dynamic MRI. The proposed approach was
compared against the low resolution input, the traditional trilin-
ear interpolation, area interpolation (based on adaptive average
pooling, as implemented in PyTorch2) and finally against the
most widely used technique in clinical MRIs - Fourier interpo-
lation of the input (zero-padded k-space, also known as the sinc
interpolation).
There was a noticeable improvement qualitatively and quan-
titatively while reconstructing low resolution data using the pro-
posed method, even for only 6.25% of the k-space. Fig 3 shows
the comparison qualitatively for the low resolution images by
taking 25%, 10% and 6.25% of the k-space. Fig 4 portrays
the comparison of the low resolution input for 6.25% of the k-
space, the lowest resolution investigated during this study, with
the SR result after fine-tuning over different time points. The SSIM maps were calculated against the high resolution ground-truth, with the respective SSIM value shown on top of each map. Fig 5 illustrates the deviations of an example result from its corresponding ground-truth for two different re-
gions of interest. It can be observed that SR results after fine-
tuning could alleviate the undersampling artefacts, which are
still present in the SR results of the main training, even for rela-
tively low resolution images like 10% and 6.25% of the k-space.
Consequently, the visibility of small details is improved.
Additionally, for quantitative analysis, Table 3 displays the
average and standard deviation (SD) of SSIM, PSNR and the
SD of subtracted images for all time points for the dynamic
datasets. Here, each time-point has been considered independently, as a separate 3D volume. Fig 6 shows the dis-
tribution of the resultant metrics over all resolutions and sub-
jects. The first row portrays the SSIM values and the second
row shows the PSNR values. The columns denote 25%, 10%,
and 6.25% of the k-space, respectively. The blue, orange, green,
red, and violet lines represent the reconstruction results of tri-
linear interpolation, area interpolation, zero-padding (sinc inter-
polation), SR main training, and SR after fine-tuning, whereas
the thickness of the lines represents the standard deviation over
time-points. It can be observed that the proposed method (SR
after fine-tuning) significantly outperformed all the baseline
methods experimented here.
Fine-tuning with the planning scan helped in obtaining
sharper images and achieving a better edge delineation. Fur-
thermore, the statistical significance of the improvements in
terms of SSIM achieved by the model after fine-tuning was
evaluated using paired t-test and Wilcoxon signed-rank test.
Separate tests were performed considering all the different res-
olutions together and also by considering each resolution sep-
arately. It was observed that the improvement was statistically
significant in every evaluated scenario (p-values were always
less than 0.001).
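A sketch of these tests with SciPy is given below; the function and variable names are hypothetical, and the inputs are assumed to be paired per-volume SSIM values (same ordering before and after fine-tuning).

```python
from scipy import stats

def significance(ssim_before, ssim_after):
    """Paired t-test and Wilcoxon signed-rank test on per-volume SSIM values
    before vs. after fine-tuning."""
    t_stat, p_t = stats.ttest_rel(ssim_after, ssim_before)
    w_stat, p_w = stats.wilcoxon(ssim_after, ssim_before)
    return p_t, p_w   # both reported as < 0.001 in every evaluated scenario
```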
The acquisition time of high resolution 3D ”pseudo”-
dynamic reference data without parallel acquisition in this study
was ten seconds, and five seconds with a GRAPPA factor of two (Table 2). These are not sufficient for real-time or near real-time
applications and might lead to blurring in free-breathing sub-
jects. This research shows the potential to acquire such volume
with only minimal loss of spatial information in less than half a
second.
The fine-tuning process took approximately eight hours to
finish for each subject using the earlier mentioned setup (Sec-
tion 2.3). Super-resolving each time-point took only a fraction
of a second. The required time for fine-tuning and inference
can further be reduced by reducing the patch-overlap (stride),
though that might reduce the quality of the resultant super-
resolved images. It can further be observed that the network
was able to produce results highly similar to the ground-truth
(SSIM of 0.957) even while super-resolving 6.25% of k-space,
which can make the acquisition 16 times faster. Combining this
fast acquisition speed with the inference speed of the method,
this study can be extended to be used for real-time or near real-
time MRIs during interventions.
In the current study, only the centre of the k-space was used
during undersampling, which results in loss of resolution with-
out creating explicit image artefacts. Other undersampling pat-
terns, such as variable density or GRAPPA-like uniform under-
sampling of higher spatial frequencies may be investigated in
the future.
It should be noted that the static planning scans and the actual
dynamic scans during interventions are typically acquired with
different sequences, with planning scans having higher contrast
and resolution than the dynamic scans. This study was con-
ducted using the same sequence for static and dynamic scans,
but different resolutions and positions (different scan sessions).
An additional experiment was performed by fine-tuning using
a volumetric interpolated breath-hold examination (VIBE) sequence as the planning scan for one subject. Super-resolving the dynamic low-resolution images from 6.25% of the k-space resulted in a 0.032 lower SSIM than using the identical sequence with higher resolution for fine-tuning. This may be a limitation of the current approach but requires further investigation.
Figure 3: Comparative results of low resolution (25%, 10% and 6.25% of k-space) 3D dynamic data of the same slice. From left to right: low resolution images (scaled-up), interpolated input (Trilinear), super-resolution results of the main training (SR Results Main Training), super-resolution results after fine-tuning (SR After Fine-Tuning) and ground-truth images.
Figure 4: An example comparison of the low resolution input of 6.25% of k-space with the super-resolution (SR) result after fine-tuning over three different time points, compared against the high resolution ground-truth using SSIM maps.
Table 3: The average and the standard deviation of SSIM, PSNR and the SD of the difference images with respect to the ground-truth, for the different resolutions.

25% of k-space:
| Data | SSIM | PSNR | diff. SD |
|---|---|---|---|
| Area Interpolation | 0.911 ± 0.011 | 29.721 ± 1.948 | 0.068 ± 0.011 |
| Trilinear Interpolation | 0.964 ± 0.005 | 37.680 ± 1.770 | 0.013 ± 0.002 |
| Zero-padded | 0.977 ± 0.013 | 37.980 ± 4.078 | 0.064 ± 0.011 |
| SR Main Training | 0.986 ± 0.007 | 42.781 ± 2.424 | 0.009 ± 0.002 |
| SR Fine-tuning | 0.993 ± 0.004 | 45.706 ± 2.169 | 0.005 ± 0.002 |

10% of k-space:
| Data | SSIM | PSNR | diff. SD |
|---|---|---|---|
| Area Interpolation | 0.814 ± 0.018 | 26.250 ± 1.867 | 0.071 ± 0.013 |
| Trilinear Interpolation | 0.906 ± 0.007 | 33.148 ± 1.780 | 0.022 ± 0.004 |
| Zero-padded | 0.926 ± 0.009 | 31.844 ± 2.260 | 0.067 ± 0.013 |
| SR Main Training | 0.961 ± 0.009 | 36.710 ± 1.086 | 0.014 ± 0.002 |
| SR Fine-tuning | 0.973 ± 0.005 | 39.433 ± 2.144 | 0.007 ± 0.001 |

6.25% of k-space:
| Data | SSIM | PSNR | diff. SD |
|---|---|---|---|
| Area Interpolation | 0.723 ± 0.031 | 24.092 ± 1.964 | 0.080 ± 0.013 |
| Trilinear Interpolation | 0.872 ± 0.011 | 31.504 ± 1.786 | 0.026 ± 0.005 |
| Zero-padded | 0.888 ± 0.012 | 29.803 ± 2.147 | 0.069 ± 0.015 |
| SR Main Training | 0.939 ± 0.008 | 35.377 ± 1.653 | 0.017 ± 0.003 |
| SR Fine-tuning | 0.957 ± 0.006 | 37.306 ± 2.357 | 0.014 ± 0.004 |
Figure 5: An example from the reconstructed results, compared against its ground-truth (GT) for low resolution images from 6.25% of k-space. From left to right, upper to lower: ground-truth, trilinear interpolation (input of the network), SR result of main training and SR result after fine-tuning. For the yellow ROI, (a-b): trilinear interpolation and the difference image from GT, (e-f): SR result of the main training and the difference image from GT and (i-j): SR result after fine-tuning and the difference image from GT. The images on the right are analogous examples for the red ROI.
Figure 6: Line plot showing the mean and 95% confidence interval of the resultant SSIM and PSNR over the different time-points for each subject. The blue, orange, green, red, and violet lines represent the reconstruction results of trilinear interpolation, area interpolation, zero-padding (sinc interpolation), SR main training, and SR after fine-tuning, respectively. The upper row shows the SSIM values and the lower row shows the PSNR values.
4. Conclusion and Future Work
This research shows that fine-tuning with a subject-specific
prior static scan can significantly improve the results of deep
learning based super-resolution (SR) reconstruction. A 3D U-
Net based model was trained with the help of perceptual loss to
estimate the reconstruction error. The network model was ini-
tially trained using the CHAOS abdominal benchmark dataset
and was then fine-tuned using a static high resolution prior
scan. The model was used to obtain super-resolved high res-
olution 3D abdominal dynamic MRI from their corresponding
low resolution images. Even though the network was trained
using MRI sequences different from the reconstructed dynamic
MRI, the SR results after fine-tuning showed higher similar-
ity with the ground-truth images. The proposed method could
overcome the spatio-temporal trade-off by improving the spa-
tial resolution of the images without compromising the speed
of acquisition. This approach could be applied to real-time dy-
namic acquisitions, such as for interventional MRIs, because of
the high speed of inference of deep learning models.
In the presented approach, a 3D U-Net was used as the net-
work model, which needs interpolation as a pre-processing step.
Therefore, the reconstructed images could suffer from in-
terpolation errors. As future work, network models such as
SRCNN, which do not need interpolation, will be studied. In
addition, image resolutions lower than the already investigated
ones will be studied to check the network’s limitations. More-
over, clinical interventions are performed with devices, such as
needles, which are not present in the planning scan. The authors
plan to extend this research in the future by evaluating it on images with such devices.
Acknowledgements
This work was conducted within the context of the Inter-
national Graduate School MEMoRIAL at Otto von Guericke
University (OVGU) Magdeburg, Germany, kindly supported by
the European Structural and Investment Funds (ESF) under the
programme "Sachsen-Anhalt WISSENSCHAFT Internationalisierung" (project no. ZS/2016/08/80646).
References
Bengio, Y., Goodfellow, I., Courville, A., 2017. Deep Learning. volume 1. MIT Press, Massachusetts, USA.
Bernstein, M.A., King, K.F., Zhou, X.J., 2004. Handbook of MRI pulse se-
quences. Elsevier.
Chatterjee, S., 2020. soumickmj/mrunder: Initial release. doi:10.5281/zenodo.3901455.
Chatterjee, S., Breitkopf, M., Sarasaen, C., Rose, G., Nürnberger, A., Speck, O., 2019. A deep learning approach for reconstruction of undersampled cartesian and radial data, in: ESMRMB 2019.
Chatterjee, S., Prabhu, K., Pattadkal, M., Bortsova, G., Sarasaen, C., Dubost, F., Mattern, H., de Bruijne, M., Speck, O., Nürnberger, A., 2020. DS6, deformation-aware semi-supervised learning: Application to small vessel segmentation with noisy training data. arXiv preprint arXiv:2006.10802.
Chaudhari, A.S., Fang, Z., Kogan, F., Wood, J., Stevens, K.J., Gibbons, E.K.,
Lee, J.H., Gold, G.E., Hargreaves, B.A., 2018. Super-resolution muscu-
loskeletal mri using deep learning. Magnetic resonance in medicine 80,
2139–2154.
Chen, Y., Shi, F., Christodoulou, A.G., Xie, Y., Zhou, Z., Li, D., 2018. Efficient
and accurate mri super-resolution using a generative adversarial network and
3d multi-level densely connected network, in: International Conference on
Medical Image Computing and Computer-Assisted Intervention, Springer.
pp. 91–99.
Choi, K., Fazekas, G., Sandler, M., Cho, K., 2017. Transfer learning for music
classification and regression tasks. arXiv preprint arXiv:1703.09179 .
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016.
3d u-net: learning dense volumetric segmentation from sparse annotation,
in: International conference on medical image computing and computer-
assisted intervention, Springer. pp. 424–432.
Coupé, P., Manjón, J.V., Chamberland, M., Descoteaux, M., Hiba, B., 2013. Collaborative patch-based super-resolution for diffusion-weighted images. NeuroImage 83, 245–261.
Dai, W., Xue, G.R., Yang, Q., Yu, Y., 2007. Transferring naive bayes classifiers
for text classification, in: AAAI, pp. 540–545.
Deka, B., Mullah, H.U., Datta, S., Lakshmi, V., Ganesan, R., 2020. Sparse rep-
resentation based super-resolution of mri images with non-local total varia-
tion regularization. SN Computer Science 1, 1–13.
Dong, C., Loy, C.C., He, K., Tang, X., 2014. Learning a deep convolutional
network for image super-resolution, in: European conference on computer
vision, Springer. pp. 184–199.
Dong, C., Loy, C.C., Tang, X., 2016. Accelerating the super-resolution con-
volutional neural network, in: European conference on computer vision,
Springer. pp. 391–407.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan,
H., 2017. Modeling the intra-class variability for liver lesion detection using
a multi-class patch-based cnn, in: International Workshop on Patch-based
Techniques in Medical Imaging, Springer. pp. 129–137.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan,
H., 2018. Gan-based synthetic medical image augmentation for increased
cnn performance in liver lesion classification. Neurocomputing 321, 321–
331.
Gatys, L.A., Ecker, A.S., Bethge, M., 2016. Image style transfer using convolu-
tional neural networks, in: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp. 2414–2423.
Ghodrati, V., Shao, J., Bydder, M., Zhou, Z., Yin, W., Nguyen, K.L., Yang, Y.,
Hu, P., 2019. Mr image reconstruction using deep learning: evaluation of
network structure and loss functions. Quantitative imaging in medicine and
surgery 9, 1516–1527.
Gu, Y., Zeng, Z., Chen, H., Wei, J., Zhang, Y., Chen, B., Li, Y., Qin, Y., Xie,
Q., Jiang, Z., et al., 2020. Medsrgan: medical images super-resolution using
generative adversarial networks. Multimedia Tools and Applications 79,
21815–21840.
Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock,
T., Knoll, F., 2018. Learning a variational network for reconstruction of
accelerated mri data. Magnetic resonance in medicine 79, 3055–3071.
He, X., Lei, Y., Fu, Y., Mao, H., Curran, W.J., Liu, T., Yang, X., 2020. Super-
resolution magnetic resonance imaging reconstruction using deep attention
networks, in: Medical Imaging 2020: Image Processing, International Soci-
ety for Optics and Photonics. p. 113132J.
Huang, Y., Shao, L., Frangi, A.F., 2017. Simultaneous super-resolution and
cross-modality synthesis of 3d medical images using weakly-supervised
joint convolutional sparse coding, in: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 6070–6079.
Hyun, C.M., Kim, H.P., Lee, S.M., Lee, S., Seo, J.K., 2018. Deep learning
for undersampled mri reconstruction. Physics in Medicine & Biology 63,
135007.
Iqbal, Z., Nguyen, D., Hangel, G., Motyka, S., Bogner, W., Jiang, S., 2019.
Super-resolution 1h magnetic resonance spectroscopic imaging utilizing
deep learning. Frontiers in oncology 9.
Isaac, J.S., Kulkarni, R., 2015. Super resolution techniques for medical image
processing, in: 2015 International Conference on Technologies for Sustain-
able Development (ICTSD), IEEE. pp. 1–6.
Jain, S., Sima, D.M., Sanaei Nezhad, F., Hangel, G., Bogner, W., Williams, S., Van Huffel, S., Maes, F., Smeets, D., 2017. Patch-based super-resolution of mr spectroscopic images: application to multiple sclerosis. Frontiers in neuroscience 11, 13.
Johnson, J., Alahi, A., Fei-Fei, L., 2016. Perceptual losses for real-time style
transfer and super-resolution, in: European conference on computer vision,
Springer. pp. 694–711.
Jung, H., Sung, K., Nayak, K.S., Kim, E.Y., Ye, J.C., 2009. k-t focuss: a general
compressed sensing framework for high resolution dynamic mri. Magnetic
Resonance in Medicine: An Official Journal of the International Society for
Magnetic Resonance in Medicine 61, 103–116.
Kavur, A.E., Gezer, N.S., Barış, M., Aslan, S., Conze, P.H., Groza, V., Pham, D.D., Chatterjee, S., Ernst, P., Özkan, S., et al., 2020. Chaos challenge - combined (ct-mr) healthy abdominal organ segmentation. Medical Image Analysis, 101950.
Kim, Y.G., Kim, S., Cho, C.E., Song, I.H., Lee, H.J., Ahn, S., Park, S.Y.,
Gong, G., Kim, N., 2020. Effectiveness of transfer learning for enhancing
tumor classification with a convolutional neural network on frozen sections.
Scientific Reports 10, 1–9.
Lateh, M.A., Muda, A.K., Yusof, Z.I.M., Muda, N.A., Azmi, M.S., 2017. Han-
dling a small dataset problem in prediction model by employ artificial data
generation approach: A review, in: Journal of Physics: Conference Series,
p. 012016.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A.,
Aitken, A., Tejani, A., Totz, J., Wang, Z., et al., 2017. Photo-realistic single
image super-resolution using a generative adversarial network, in: Proceed-
ings of the IEEE conference on computer vision and pattern recognition, pp.
4681–4690.
Lee, K.H., He, X., Zhang, L., Yang, L., 2018. Cleannet: Transfer learning for
scalable image classifier training with label noise, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5447–
5456.
Li, B., Yang, Q., Xue, X., 2009. Transfer learning for collaborative filtering
via a rating-matrix generative model, in: Proceedings of the 26th annual
international conference on machine learning, pp. 617–624.
Liang, M., Du, J., Li, L., Xue, Z., Wang, X., Kou, F., Wang, X., 2020. Video
super-resolution reconstruction based on deep learning and spatio-temporal
feature self-similarity. IEEE Transactions on Knowledge and Data Engi-
neering .
Liu, C., Wu, X., Yu, X., Tang, Y., Zhang, J., Zhou, J., 2018. Fusing multi-scale
information in convolution network for mr image super-resolution recon-
struction. Biomedical engineering online 17, 1–23.
Lustig, M., Donoho, D., Pauly, J.M., 2007. Sparse mri: The application of com-
pressed sensing for rapid mr imaging. Magnetic Resonance in Medicine:
An Official Journal of the International Society for Magnetic Resonance in
Medicine 58, 1182–1195.
Lustig, M., Santos, J.M., Donoho, D.L., Pauly, J.M., 2006. kt sparse: High
frame rate dynamic mri exploiting spatio-temporal sparsity, in: Proceedings
of the 13th annual meeting of ISMRM, Seattle.
Lyu, Q., Shan, H., Xie, Y., Li, D., Wang, G., 2020. Cine cardiac mri mo-
tion artifact reduction using a recurrent neural network. arXiv preprint
arXiv:2006.12700 .
Mahnken, A.H., Ricke, J., Wilhelm, K.E., 2009. CT-and MR-guided Interven-
tions in Radiology. volume 22. Springer.
Manjón, J.V., Coupé, P., Buades, A., Fonov, V., Collins, D.L., Robles, M., 2010.
Non-local mri upsampling. Medical image analysis 14, 784–792.
Misra, D., Crispim-Junior, C., Tougne, L., 2020. Patch-based cnn evaluation for
bark classification, in: European Conference on Computer Vision, Springer.
pp. 197–212.
Pan, S.J., Yang, Q., 2009. A survey on transfer learning. IEEE Transactions on
knowledge and data engineering 22, 1345–1359.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T.,
Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-
Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai,
J., Chintala, S., 2019. Pytorch: An imperative style, high-performance deep
learning library, in: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (Eds.), Advances in Neural Information Pro-
cessing Systems 32. Curran Associates, Inc., pp. 8024–8035.
Perez, L., Wang, J., 2017. The effectiveness of data augmentation in image
classification using deep learning. arXiv preprint arXiv:1712.04621 .
Pham, C.H., Ducournau, A., Fablet, R., Rousseau, F., 2017. Brain mri super-
resolution using deep 3d convolutional networks, in: 2017 IEEE 14th Inter-
national Symposium on Biomedical Imaging (ISBI 2017), IEEE. pp. 197–
200.
Plenge, E., Poot, D.H., Bernsen, M., Kotek, G., Houston, G., Wielopolski, P.,
van der Weerd, L., Niessen, W.J., Meijering, E., 2012. Super-resolution
methods in mri: can they improve the trade-off between resolution, signal-
to-noise ratio, and acquisition time? Magnetic resonance in medicine 68,
1983–1993.
Qin, C., Schlemper, J., Caballero, J., Price, A.N., Hajnal, J.V., Rueckert, D.,
2018. Convolutional recurrent neural networks for dynamic mr image re-
construction. IEEE transactions on medical imaging 38, 280–290.
Ran, Q., Xu, X., Zhao, S., Li, W., Du, Q., 2020. Remote sensing images
super-resolution with deep convolution networks. Multimedia Tools and
Applications 79, 8985–9001.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks
for biomedical image segmentation, in: International Conference on Medi-
cal image computing and computer-assisted intervention, Springer. pp. 234–
241.
Rousseau, F., Initiative, A.D.N., et al., 2010. A non-local approach for im-
age super-resolution using intermodality priors. Medical image analysis 14,
594–605.
Sajjadi, M.S., Scholkopf, B., Hirsch, M., 2017. Enhancenet: Single image
super-resolution through automated texture synthesis, in: Proceedings of the
IEEE International Conference on Computer Vision, pp. 4491–4500.
Sarasaen, C., Chatterjee, S., Nürnberger, A., Speck, O., 2020. Super resolution
of dynamic mri using deep learning, enhanced by prior-knowledge, in: 37th
Annual Scientific Meeting Congress of the European Society for Magnetic
Resonance in Medicine and Biology, 33(Supplement 1): S03.04, S28-S29,
Springer. doi:10.1007/s10334-020-00874-0.
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert,
D., Wang, Z., 2016. Real-time single image and video super-resolution us-
ing an efficient sub-pixel convolutional neural network, in: Proceedings of
the IEEE conference on computer vision and pattern recognition, pp. 1874–
1883.
Tang, Y., Shao, L., 2016. Pairwise operator learning for patch-based single-
image super-resolution. IEEE Transactions on Image Processing 26, 994–
1003.
Tanno, R., Worrall, D.E., Ghosh, A., Kaden, E., Sotiropoulos, S.N., Criminisi,
A., Alexander, D.C., 2017. Bayesian image quality transfer with cnns: ex-
ploring uncertainty in dmri super-resolution, in: International Conference on
Medical Image Computing and Computer-Assisted Intervention, Springer.
pp. 611–619.
Tappen, M.F., Liu, C., 2012. A bayesian approach to alignment-based image
hallucination, in: European conference on computer vision, Springer. pp.
236–249.
Tsao, J., Boesiger, P., Pruessmann, K.P., 2003. k-t blast and k-t sense: dynamic
mri with high frame rate exploiting spatiotemporal correlations. Magnetic
Resonance in Medicine: An Official Journal of the International Society for
Magnetic Resonance in Medicine 50, 1031–1042.
Van Reeth, E., Tham, I.W., Tan, C.H., Poh, C.L., 2012. Super-resolution in
magnetic resonance imaging: a review. Concepts in Magnetic Resonance
Part A 40, 306–325.
Wang, M., Deng, W., 2018. Deep visual domain adaptation: A survey. Neuro-
computing 312, 135–153.
Wang, S., Su, Z., Ying, L., Peng, X., Zhu, S., Liang, F., Feng, D., Liang, D.,
2016. Accelerating magnetic resonance imaging via deep learning, in: 2016
IEEE 13th International Symposium on Biomedical Imaging (ISBI), IEEE.
pp. 514–517.
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., 2004. Image quality
assessment: from error visibility to structural similarity. IEEE transactions
on image processing 13, 600–612.
Wang, Z., Chen, J., Hoi, S.C., 2020. Deep learning for image super-resolution:
A survey. IEEE transactions on pattern analysis and machine intelligence .
Wang, Z., Simoncelli, E.P., Bovik, A.C., 2003. Multiscale structural similarity
for image quality assessment, in: The Thrity-Seventh Asilomar Conference
on Signals, Systems & Computers, 2003, IEEE. pp. 1398–1402.
Wilson, G., Cook, D.J., 2020. A survey of unsupervised deep domain adapta-
tion. ACM Transactions on Intelligent Systems and Technology (TIST) 11,
1–46.
Yang, C.Y., Ma, C., Yang, M.H., 2014. Single-image super-resolution: A
benchmark, in: European Conference on Computer Vision, Springer. pp.
372–386.
Yu, X., Fernando, B., Ghanem, B., Porikli, F., Hartley, R., 2018. Face super-
resolution guided by facial component heatmaps, in: Proceedings of the
European Conference on Computer Vision (ECCV), pp. 217–233.
Zeng, K., Zheng, H., Cai, C., Yang, Y., Zhang, K., Chen, Z., 2018. Simulta-
neous single-and multi-contrast super-resolution for brain mri images based
on a convolutional neural network. Computers in biology and medicine 99,
133–141.
Zhang, H., Yang, Z., Zhang, L., Shen, H., 2014. Super-resolution reconstruction
for multi-angle remote sensing images considering resolution dierences.
Remote Sensing 6, 637–657.
Zhang, Y., Wu, G., Yap, P.T., Feng, Q., Lian, J., Chen, W., Shen, D., 2012.
Reconstruction of super-resolution lung 4d-ct using patch-based sparse rep-
resentation, in: 2012 IEEE Conference on Computer Vision and Pattern
Recognition, IEEE. pp. 925–931.
Zhao, W., 2017. Research on the deep learning of the small sample data based
on transfer learning, in: AIP Conference Proceedings, AIP Publishing LLC.
p. 020018.
Zhu, Y., Zhang, Y., Yuille, A.L., 2014. Single image super-resolution using
deformable patches, in: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 2917–2924.
... Multi-channel training has been used across numerous applications including image recognition [24], speech recognition [25,26], audio classification [27], natural language processing [28], etc. This paper extends the previous work into the temporal domain [29] by exploiting dual-channel inputs (prior image and low-resolution image) in the deep learning model -to learn the temporal relationship between timepoints, while also learning the spatial relationship between low-and high-resolution images, to perform SISR, using the proposed DDoS (Dynamic Dual-channel of Superresolution) approach. ...
... Originally proposed for image segmentation, different flavours of UNet have been developed and deployed in plenty of applications such as image segmentation [32,33,34,35], audio source separation [36,37,38] and image reconstruction [39,40]. 3D UNet and its variants have been used for MR super-resolution as well [29,41,42]. Furthermore, UNet has been extended to multi-channel and dual-branch to incorporate prior information [22]. ...
... Previous work attempting to super-resolve 3D dynamic MRIs treats each timepoint as a single 3D volume and then super-resolves them individually [29]. But in this way, the inherent relationship between the different timepoints of the dynamic MRI is not utilised, which might be possible to exploit to improve the super-resolution performance. ...
Article
Full-text available
Magnetic resonance imaging (MRI) provides high spatial resolution and excellent soft-tissue contrast without using harmful ionising radiation. Dynamic MRI is an essential tool for interventions to visualise movements or changes of the target organ. However, such MRI acquisitions with high temporal resolution suffer from limited spatial resolution - also known as the spatio-temporal trade-off of dynamic MRI. Several approaches, including deep learning based super-resolution approaches by treating each timepoint as individual volumes. This research addresses this issue by creating a deep learning model which attempts to learn both spatial and temporal relationships. A modified 3D UNet model, DDoS-UNet, is proposed - which takes the low-resolution volume of the current timepoint along with a prior image volume. Initially, the network is supplied with a static high-resolution planning scan as the prior image along with the low-resolution input to super-resolve the first timepoint. Then it continues step-wise by using the super-resolved timepoints as the prior image while super-resolving the subsequent timepoints. The model performance was tested with 3D dynamic data that was undersampled to different in-plane levels and achieved an average SSIM value of 0.951±0.017 while reconstructing only 4% of the k-space - which could result in a theoretical acceleration factor of 25. The proposed approach can be used to reduce the required scan-time while achieving high spatial resolution - consequently alleviating the spatio-temporal trade-off of dynamic MRI, by incorporating prior knowledge of spatio-temporal information from the available high-resolution planning scan and the existing temporal redundancy of time-series images into the network model.
... Recently, transfer learning has been incorporated to address the time-consuming training for each patient and overcome domain shifting (Gulamhussene et al. 2023). Sarasaen et al. (2021) proposed a U-Net-based super-resolution model with fine-tuning using subjectspecific static high-resolution MRI, resulting in high-resolution dynamic images. Terpstra et al. (2023) introduced MODEST, which uses low-dimensional subnetworks to reconstruct 4D-MRI by registering the exhale phase to every other respiratory phase using undersampled 4D-MRI and computed DVFs as input. ...
Article
Full-text available
Four-dimensional imaging (4D-imaging) plays a critical role in achieving precise motion management in radiation therapy. However, challenges remain in 4D-imaging such as a long imaging time, suboptimal image quality, and inaccurate motion estimation. With the tremendous success of artificial intelligence (AI) in the image domain, particularly deep learning, there is great potential in overcoming these challenges and improving the accuracy and efficiency of 4D-imaging without the need for hardware modifications. In this review, we provide a comprehensive overview of how these AI-based methods could drive the evolution of 4D-imaging for motion management. We discuss the inherent issues associated with multiple 4D modalities and explore the current research progress of AI in 4D-imaging. Furthermore, we delve into the unresolved challenges and limitations in 4D-imaging and provide insights into the future direction of this field.
... Küstner et al. [27] performed 3D cardiac cine MRI reconstruction based on 4D spatiotemporal convolution, but it suffered from a high computational cost. Sarasaen et al. [28] fine-tuned their network using static high-resolution MRI as prior knowledge. ...
Preprint
Cardiac cine magnetic resonance imaging (MRI) is one of the important means to assess cardiac functions and vascular abnormalities. Mitigating artifacts arising during image reconstruction and accelerating cardiac cine MRI acquisition to obtain high-quality images is important. A novel end-to-end deep learning network is developed to improve cardiac cine MRI reconstruction. First, a U-Net is adopted to obtain the initial reconstructed images in k-space. Further to remove the motion artifacts, the motion-guided deformable alignment (MGDA) module with second-order bidirectional propagation is introduced to align the adjacent cine MRI frames by maximizing spatial-temporal information to alleviate motion artifacts. Finally, the multi-resolution fusion (MRF) module is designed to correct the blur and artifacts generated from alignment operation and obtain the last high-quality reconstructed cardiac images. At an 8×\times acceleration rate, the numerical measurements on the ACDC dataset are structural similarity index (SSIM) of 78.40%±\pm.57%, peak signal-to-noise ratio (PSNR) of 30.46±\pm1.22dB, and normalized mean squared error (NMSE) of 0.0468±\pm0.0075. On the ACMRI dataset, the results are SSIM of 87.65%±\pm4.20%, PSNR of 30.04±\pm1.18dB, and NMSE of 0.0473±\pm0.0072. The proposed method exhibits high-quality results with richer details and fewer artifacts for cardiac cine MRI reconstruction on different accelerations.
... On the other hand, MRIs with low spatial resolution can be improved by treating the problem as a super-resolution task. Deep-learning-based techniques to improve the image quality of MRIs have been proposed for both artefact reduction [5,6] and super-resolution [7][8][9]. The focus of this research is on the latter: improving the image quality of low-resolution MRI by treating it as an SISR problem. ...
Article
Full-text available
High-spatial resolution MRI produces abundant structural information, enabling highly accurate clinical diagnosis and image-guided therapeutics. However, the acquisition of high-spatial resolution MRI data typically comes at the expense of less spatial coverage, lower signal-to-noise ratio (SNR), and longer scan time due to physical, physiological and hardware limitations. In order to overcome these limitations, super-resolution MRI deep-learning-based techniques can be utilised. In this work, different state-of-the-art 3D convolutional neural network models for super-resolution (RRDB, SPSR, UNet, UNet-MSS and ShuffleUNet) were compared for the super-resolution task with the goal of finding the best model in terms of performance and robustness. The public IXI dataset (only structural images) was used. Data were artificially downsampled to obtain MRIs of lower spatial resolution (downsampling factor varying from 8 to 64). When assessing performance using the SSIM metric on the test set, all models performed well. In particular, regardless of the downsampling factor, the UNet consistently obtained the top results. On the other hand, the SPSR model consistently performed worse. In conclusion, UNet and UNet-MSS achieved overall top performances while RRDB performed relatively poorly compared to the other models.
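As a sketch of how such lower-resolution training data can be simulated, the snippet below keeps only the centre of k-space, one common way of emulating a low-resolution acquisition; the exact downsampling pipeline of the cited study may differ.

```python
import numpy as np

def downsample_inplane(volume, factor=2):
    """Keep only the central 1/factor of k-space along both in-plane axes,
    emulating an in-plane low-resolution acquisition of a 3D volume."""
    kspace = np.fft.fftshift(np.fft.fftn(volume))
    d, h, w = kspace.shape
    ch, cw = h // (2 * factor), w // (2 * factor)
    centre = kspace[:, h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw]
    # Inverse transform of the truncated k-space gives the smaller,
    # lower-resolution magnitude image.
    return np.abs(np.fft.ifftn(np.fft.ifftshift(centre)))
```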
... Pre-trained models are often trained by well-resourced and experienced teams with large amounts of clean data (Yin et al. 2023a). Exceptional pre-trained models can help hardware- and data-limited teams save substantial training costs and train well-performing deep models on new tasks (Sarasaen et al. 2021; Amisse, Jijón-Palma, and Centeno 2021; Too et al. 2019; Käding et al. 2016). ...
Preprint
Pre-training & fine-tuning can enhance the transferring efficiency and performance in visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning art fails to exceed the upper limit of full fine-tuning on challenging tasks like object detection and segmentation. To find a competitive alternative to full fine-tuning, we propose the Multi-cognitive Visual Adapter (Mona) tuning, a novel adapter-based tuning method. First, we introduce multiple vision-friendly filters into the adapter to enhance its ability to process visual signals, while previous methods mainly rely on language-friendly linear filters. Second, we add the scaled normalization layer in the adapter to regulate the distribution of input features for visual filters. To fully demonstrate the practicality and generality of Mona, we conduct experiments on multiple representative visual tasks, including instance segmentation on COCO, semantic segmentation on ADE20K, object detection on Pascal VOC, oriented object detection on DOTA/STAR, and image classification on three common datasets. Exciting results illustrate that Mona surpasses full fine-tuning on all these tasks, and is the only delta-tuning method outperforming full fine-tuning on the above various tasks. For example, Mona achieves 1% performance gain on the COCO dataset compared to full fine-tuning. Comprehensive results suggest that Mona-tuning is more suitable for retaining and utilizing the capabilities of pre-trained models than full fine-tuning. We will make the code publicly available.
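The following is a conceptual sketch, not the authors' code, of an adapter in the spirit of the description above: a small bottleneck whose input is normalised and scaled, processed by depth-wise convolutional (vision-oriented) filters, and added back residually to a frozen backbone block. All dimensions and layer choices are assumptions for illustration.

```python
import torch
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Illustrative adapter with scaled normalisation and conv filters."""

    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.scale = nn.Parameter(torch.ones(1))   # scaled normalisation
        self.down = nn.Linear(dim, bottleneck)
        # Depth-wise convolutions of several kernel sizes act as simple
        # "vision-friendly" filters on the token map.
        self.convs = nn.ModuleList([
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in (3, 5, 7)
        ])
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x, hw):                      # x: (B, N, dim) tokens
        h, w = hw                                  # with N == h * w
        z = self.down(self.scale * self.norm(x))
        z2d = z.transpose(1, 2).reshape(z.size(0), -1, h, w)
        z2d = sum(conv(z2d) for conv in self.convs) / len(self.convs)
        z = z2d.flatten(2).transpose(1, 2)
        return x + self.up(z)                      # residual update
```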
... The diagnostic performance of the AI model for TMJ effusion was acceptable, with an excellent discrimination level. In deep learning, fine-tuning is an approach in which the weights of a pre-trained model are further trained on new data 62. The fine-tuning model had better prediction performance for TMJ effusion than the from-scratch and freeze models. ...
Article
Full-text available
This study investigated the usefulness of deep learning-based automatic detection of temporomandibular joint (TMJ) effusion using magnetic resonance imaging (MRI) in patients with temporomandibular disorder and whether the diagnostic accuracy of the model improved when patients’ clinical information was provided in addition to MRI images. The sagittal MR images of 2948 TMJs were collected from 1017 women and 457 men (mean age 37.19 ± 18.64 years). The TMJ effusion diagnostic performances of three convolutional neural networks (scratch, fine-tuning, and freeze schemes) were compared with those of human experts based on areas under the curve (AUCs) and diagnosis accuracies. The fine-tuning model with proton density (PD) images showed acceptable prediction performance (AUC = 0.7895), and the from-scratch (0.6193) and freeze (0.6149) models showed lower performances (p < 0.05). The fine-tuning model had excellent specificity compared to the human experts (87.25% vs. 58.17%). However, the human experts were superior in sensitivity (80.00% vs. 57.43%) (all p < 0.001). In gradient-weighted class activation mapping (Grad-CAM) visualizations, the fine-tuning scheme focused more on effusion than on other structures of the TMJ, and the sparsity was higher than that of the from-scratch scheme (82.40% vs. 49.83%, p < 0.05). The Grad-CAM visualizations confirmed that the model learned from important features in the TMJ area, particularly around the articular disc. Two fine-tuning models on PD and T2-weighted images showed that the diagnostic performance did not improve compared with using PD alone (p < 0.05). Diverse AUCs were observed across groups when the patients were divided according to age (0.7083–0.8375) and sex (male: 0.7576, female: 0.7083). The prediction accuracy of the ensemble model was higher than that of the human experts when all the data were used (74.21% vs. 67.71%, p < 0.05). A deep neural network (DNN) was developed to process multimodal data, including MRI and patient clinical data. Analysis of four age groups with the DNN model showed that the 41–60 age group had the best performance (AUC = 0.8258). The fine-tuning model and DNN were optimal for judging TMJ effusion and may be used to prevent false negative cases and aid in human diagnostic performance. Assistive automated diagnostic methods have the potential to increase clinicians’ diagnostic accuracy.
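A minimal sketch of the three training schemes compared above (from-scratch, fine-tuning, freeze), with a torchvision ResNet standing in for the actual networks used in the study:

```python
import torch.nn as nn
from torchvision import models

def build(scheme, num_classes=2):
    """scheme: 'scratch', 'fine_tuning', or 'freeze'."""
    pretrained = scheme in ("fine_tuning", "freeze")
    net = models.resnet50(weights="IMAGENET1K_V1" if pretrained else None)
    if scheme == "freeze":
        # Freeze the pre-trained backbone; only the new head will train.
        for p in net.parameters():
            p.requires_grad = False
    # The replacement classifier is always freshly initialised and trainable.
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net
```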
... Küstner et al. [27] performed 3D cardiac cine MRI reconstruction based on 4D spatiotemporal convolution, but it suffered from a high computational cost. Sarasaen et al. [28] fine-tuned their network using static high-resolution MRI as prior knowledge. Additionally, some studies [29,30] consider motion information as a crucial prior in cine MRI to improve the reconstruction quality. ...
Article
Full-text available
Cardiac cine magnetic resonance imaging (MRI) is one of the important means to assess cardiac functions and vascular abnormalities. Mitigating artifacts arising during image reconstruction and accelerating cardiac cine MRI acquisition to obtain high‐quality images is important. A novel end‐to‐end deep learning network is developed to improve cardiac cine MRI reconstruction. First, a U‐Net is adopted to obtain the initial reconstructed images in k‐space. Further to remove the motion artifacts, the motion‐guided deformable alignment (MGDA) module with second‐order bidirectional propagation is introduced to align the adjacent cine MRI frames by maximizing spatial–temporal information to alleviate motion artifacts. Finally, the multi‐resolution fusion (MRF) module is designed to correct the blur and artifacts generated from alignment operation and obtain the last high‐quality reconstructed cardiac images. At an 8× acceleration rate, the numerical measurements on the ACDC dataset are structural similarity index (SSIM) of 78.40% ± 4.57%, peak signal‐to‐noise ratio (PSNR) of 30.46 ± 1.22 dB, and normalized mean squared error (NMSE) of 0.0468 ± 0.0075. On the ACMRI dataset, the results are SSIM of 87.65% ± 4.20%, PSNR of 30.04 ± 1.18 dB, and NMSE of 0.0473 ± 0.0072. The proposed method exhibits high‐quality results with richer details and fewer artifacts for cardiac cine MRI reconstruction on different accelerations.
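For reference, the quantities reported above can be computed roughly as below for a single 2D frame; scikit-image's implementations are assumed, and details such as the SSIM window may differ from the cited evaluation.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(reference, reconstruction):
    """SSIM, PSNR and NMSE between a ground-truth frame and a reconstruction."""
    data_range = reference.max() - reference.min()
    ssim = structural_similarity(reference, reconstruction, data_range=data_range)
    psnr = peak_signal_noise_ratio(reference, reconstruction, data_range=data_range)
    # Normalised mean squared error relative to the reference energy.
    nmse = np.sum((reference - reconstruction) ** 2) / np.sum(reference ** 2)
    return ssim, psnr, nmse
```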
Article
Background Late gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging enables imaging of scar/fibrosis and is a cornerstone of most CMR imaging protocols. CMR imaging can benefit from image acceleration; however, image acceleration in LGE remains challenging due to its limited signal-to-noise ratio. In this study, we sought to evaluate a rapid two-dimensional (2D) LGE imaging protocol using a generative artificial intelligence (AI) algorithm with inline reconstruction. Methods A generative AI-based image enhancement was used to improve the sharpness of 2D LGE images acquired with low spatial resolution in the phase-encode direction. The generative AI model is an image enhancement technique built on the enhanced super-resolution generative adversarial network. The model was trained using balanced steady-state free-precession cine images, readily used for LGE without additional training. The model was implemented inline, allowing the reconstruction of images on the scanner console. We prospectively enrolled 100 patients (55 ± 14 years, 72 males) referred for clinical CMR at 3T. We collected three sets of LGE images in each subject, with in-plane spatial resolutions of 1.5 × 1.5-3-6 mm². The generative AI model enhanced in-plane resolution to 1.5 × 1.5 mm² from the low-resolution counterparts. Images were compared using a blur metric, quantifying the perceived image sharpness (0 = sharpest, 1 = blurriest). LGE image sharpness (using a 5-point scale) was assessed by three independent readers. Results The scan times for the three imaging sets were 15 ± 3, 9 ± 2, and 6 ± 1 s, with inline generative AI-based images reconstructed time of ∼37 ms. The generative AI-based model improved visual image sharpness, resulting in lower blur metric compared to low-resolution counterparts (AI-enhanced from 1.5 × 3 mm² resolution: 0.3 ± 0.03 vs 0.35 ± 0.03, P < 0.01). Meanwhile, AI-enhanced images from 1.5 × 3 mm² resolution and original LGE images showed similar blur metric (0.30 ± 0.03 vs 0.31 ± 0.03, P = 1.0) Additionally, there was an overall 18% improvement in image sharpness between AI-enhanced images from 1.5 × 3 mm² resolution and original LGE images in the subjective blurriness score (P < 0.01). Conclusion The generative AI-based model enhances the image quality of 2D LGE images while reducing the scan time and preserving imaging sharpness. Further evaluation in a large cohort is needed to assess the clinical utility of AI-enhanced LGE images for scar evaluation, as this proof-of-concept study does not provide evidence of an impact on diagnosis.
Article
Full-text available
Blood vessels of the brain provide the human brain with the required nutrients and oxygen. As a vulnerable part of the cerebral blood supply, pathology of small vessels can cause serious problems such as Cerebral Small Vessel Diseases (CSVD). It has also been shown that CSVD is related to neurodegeneration, such as Alzheimer’s disease. With the advancement of 7 Tesla MRI systems, higher spatial image resolution can be achieved, enabling the depiction of very small vessels in the brain. Non-Deep Learning-based approaches for vessel segmentation, e.g., Frangi’s vessel enhancement with subsequent thresholding, are capable of segmenting medium to large vessels but often fail to segment small vessels. The sensitivity of these methods to small vessels can be increased by extensive parameter tuning or by manual corrections, albeit making them time-consuming, laborious, and not feasible for larger datasets. This paper proposes a deep learning architecture to automatically segment small vessels in 7 Tesla 3D Time-of-Flight (ToF) Magnetic Resonance Angiography (MRA) data. The algorithm was trained and evaluated on a small imperfect semi-automatically segmented dataset of only 11 subjects, using six for training, two for validation, and three for testing. The deep learning model based on U-Net Multi-Scale Supervision was trained using the training subset and was made equivariant to elastic deformations in a self-supervised manner using deformation-aware learning to improve the generalisation performance. The proposed technique was evaluated quantitatively and qualitatively against the test set and achieved a Dice score of 80.44 ± 0.83. Furthermore, the result of the proposed method was compared against a selected manually segmented region (resulting in a Dice score of 62.07) and showed a considerable improvement (18.98%) with deformation-aware learning.
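The Dice score used above can be sketched for binary segmentation masks as follows (the cited work reports it in percent):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    return 2.0 * np.logical_and(pred, target).sum() / (pred.sum() + target.sum() + eps)
```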
Chapter
Full-text available
The identification of tree species from bark images is a challenging computer vision problem. However, even in today’s era of deep learning, bark recognition continues to be explored by traditional methods using time-consuming handcrafted features, mainly due to the problem of limited data. In this work, we implement a patch-based convolutional neural network alternative for analyzing the challenging bark dataset Bark-101, comprising 2587 images from 101 classes. We propose to apply image re-scaling during the patch extraction process to compensate for the lack of sufficient data. Individual patch-level predictions from fine-tuned CNNs are then combined by classical majority voting to obtain image-level decisions. Since ties can often occur in the voting process, we investigate various tie-breaking strategies from ensemble-based classifiers. Our approach outperforms the classification accuracy achieved by traditional methods applied to Bark-101, thus demonstrating the feasibility of applying patch-based CNNs to such challenging datasets.
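A sketch of the image-level decision step described above, with one plausible tie-breaking rule (largest summed softmax confidence among the tied classes); the chapter itself compares several such strategies.

```python
import numpy as np

def image_label(patch_probs):
    """patch_probs: (n_patches, n_classes) softmax outputs for one image."""
    # Majority vote over the per-patch hard predictions.
    votes = np.bincount(patch_probs.argmax(axis=1), minlength=patch_probs.shape[1])
    winners = np.flatnonzero(votes == votes.max())
    if len(winners) == 1:
        return winners[0]
    # Tie-break among the tied classes by total softmax confidence.
    return winners[np.argmax(patch_probs[:, winners].sum(axis=0))]
```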
Article
Full-text available
Segmentation of abdominal organs has been a comprehensive, yet unresolved, research field for many years. In the last decade, intensive developments in deep learning (DL) introduced new state-of-the-art segmentation systems. Despite outperforming the overall accuracy of existing systems, the effects of DL model properties and parameters on performance are hard to interpret. This makes comparative analysis a necessary tool towards interpretable studies and systems. Moreover, the performance of DL for emerging learning approaches such as cross-modality and multi-modal semantic segmentation tasks has rarely been discussed. In order to expand the knowledge on these topics, the CHAOS – Combined (CT-MR) Healthy Abdominal Organ Segmentation challenge was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI), 2019, in Venice, Italy. Abdominal organ segmentation from routine acquisitions plays an important role in several clinical applications, such as pre-surgical planning or morphological and volumetric follow-ups for various diseases. These applications require a certain level of performance on a diverse set of metrics, such as maximum symmetric surface distance (MSSD) to determine surgical error margins, or overlap errors for tracking size and shape differences. Previous abdomen-related challenges mainly focused on tumor/lesion detection and/or classification with a single modality. Conversely, CHAOS provides both abdominal CT and MR data from healthy subjects for single and multiple abdominal organ segmentation. Five different but complementary tasks were designed to analyze the capabilities of participating approaches from multiple perspectives. The results were investigated thoroughly and compared with manual annotations and interactive methods. The analysis shows that DL models for a single modality (CT / MR) can deliver reliable volumetric analysis performance (DICE: 0.98 ± 0.00 / 0.95 ± 0.01), but the best MSSD performance remains limited (21.89 ± 13.94 / 20.85 ± 10.63 mm). The performances of participating models decrease dramatically for cross-modality tasks, for example for the liver (DICE: 0.88 ± 0.15, MSSD: 36.33 ± 21.97 mm). Despite contrary examples in different applications, multi-tasking DL models designed to segment all organs were observed to perform worse than organ-specific ones (performance drop of around 5%). Nevertheless, some of the successful models show better performance with their multi-organ versions. We conclude that the exploration of these pros and cons in both single- vs multi-organ and cross-modality segmentation is poised to have an impact on further research towards effective algorithms that would support real-world clinical applications. Finally, with more than 1500 participants and more than 550 submissions, another important contribution of this study is the analysis of shortcomings of challenge organizations, such as the effects of multiple submissions and the peeking phenomenon.
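As an illustration of the MSSD metric reported above, a minimal distance-transform-based sketch is shown below; surface extraction and voxel-spacing handling vary between implementations, so this approximates rather than reproduces the challenge's reference evaluation.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def mssd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Maximum symmetric surface distance between two boolean 3D masks."""
    sa = a & ~binary_erosion(a)            # surface voxels of mask A
    sb = b & ~binary_erosion(b)            # surface voxels of mask B
    # Distance of every voxel to the nearest surface voxel of each mask.
    da = distance_transform_edt(~sa, sampling=spacing)
    db = distance_transform_edt(~sb, sampling=spacing)
    # Largest distance in either direction between the two surfaces.
    return max(db[sa].max(), da[sb].max())
```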
Article
Full-text available
Fast and accurate confirmation of metastasis on the frozen tissue section of intraoperative sentinel lymph node biopsy is an essential tool for critical surgical decisions. However, accurate diagnosis by pathologists is difficult within the time limitations. Training a robust and accurate deep learning model is also difficult owing to the limited number of frozen datasets with high quality labels. To overcome these issues, we validated the effectiveness of transfer learning from CAMELYON16 to improve performance of the convolutional neural network (CNN)-based classification model on our frozen dataset (N = 297) from Asan Medical Center (AMC). Among the 297 whole slide images (WSIs), 157 and 40 WSIs were used to train deep learning models with different dataset ratios at 2, 4, 8, 20, 40, and 100%. The remaining, i.e., 100 WSIs, were used to validate model performance in terms of patch- and slide-level classification. An additional 228 WSIs from Seoul National University Bundang Hospital (SNUBH) were used as an external validation. Three initial weights, i.e., scratch-based (random initialization), ImageNet-based, and CAMELYON16-based models, were used to validate their effectiveness in external validation. In the patch-level classification results on the AMC dataset, CAMELYON16-based models trained with a small dataset (up to 40%, i.e., 62 WSIs) showed a significantly higher area under the curve (AUC) of 0.929 than those of the scratch- and ImageNet-based models at 0.897 and 0.919, respectively, while CAMELYON16-based and ImageNet-based models trained with 100% of the training dataset showed comparable AUCs at 0.944 and 0.943, respectively. For the external validation, CAMELYON16-based models showed higher AUCs than those of the scratch- and ImageNet-based models. The feasibility of transfer learning to enhance model performance was thus validated for frozen section datasets with limited numbers.
Conference Paper
Full-text available
Introduction Dynamic MRI suffers from the spatio-temporal resolution trade-off of MRI, making the acquisition of 3D dynamic MRI a challenging task. Deep learning has been proven to be a successful tool for performing super-resolution on MRIs. The fast inference speed of deep learning based models makes them well suited for real-time dynamic MRI for interventional purposes. Typically, however, deep learning based methods need large training sets similar to the actual real-time acquisition; using a training set significantly different from the test set can produce results of poor quality. This research addresses this problem by training on a publicly available dataset, acquired using completely different sequences than the dynamic MRI used here for testing, and then fine-tuning using a prior static scan of the same subject. Such scans are taken for planning before any intervention. During the intervention, the fine-tuned model can be used to perform super-resolution reconstruction of the real-time dynamic MRI.
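A minimal sketch of that subject-specific fine-tuning step, assuming patch pairs extracted from the static planning scan and a synthetically degraded low-resolution counterpart; the loss, optimiser, and number of steps are illustrative assumptions rather than the published configuration.

```python
import torch

def finetune_on_prior(model, lr_patches, hr_patches, steps=100):
    """Briefly re-train a pre-trained SR network on one subject's
    planning-scan patches before inference on the dynamic series."""
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.L1Loss()
    model.train()
    for step in range(steps):
        i = step % len(lr_patches)          # cycle through the patch pairs
        loss = loss_fn(model(lr_patches[i]), hr_patches[i])
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    return model
```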
Article
Full-text available
Diffusion-weighted (DW) and spectroscopic MR (MRS) images are found to be very helpful for diagnostic purposes as they provide complementary information to that provided by conventional MRI. They can also be acquired at a faster rate, but with a low signal-to-noise ratio. This limitation can be overcome by applying image super-resolution techniques. In this paper, we propose a single-image super-resolution (SISR) technique for DW and MRS images based on sparse representation over a learned overcomplete dictionary. The proposed SISR method incorporates a patch-wise sparsity constraint based on external HR information together with non-local total variation (NLTV) as internal information to make the regularization problem more robust. Experiments are conducted for both DW and MRS test images and results are compared with some recent methods. Results indicate the potential of the proposed method for clinical MRI applications.
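The coupled-dictionary core of such an approach can be sketched as follows, with scikit-learn's OMP solver standing in for the sparse coding step; the NLTV regularisation of the cited method is omitted, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def super_resolve_patch(lr_patch, dict_lr, dict_hr, n_nonzero=5):
    """dict_lr: (lr_dim, n_atoms); dict_hr: (hr_dim, n_atoms), jointly learned
    so that coupled atoms describe the same underlying structure."""
    # Sparse code of the LR patch over the LR dictionary.
    alpha = orthogonal_mp(dict_lr, lr_patch.ravel(), n_nonzero_coefs=n_nonzero)
    # The same code applied to the HR dictionary yields the HR estimate.
    return dict_hr @ alpha
```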
Article
Cine cardiac magnetic resonance imaging (MRI) is widely used for the diagnosis of cardiac diseases thanks to its ability to present cardiovascular features in excellent contrast. Compared to computed tomography (CT), however, MRI requires a long scan time, which inevitably induces motion artifacts and causes patient discomfort. Thus, there has been a strong clinical motivation to develop techniques to reduce both the scan time and motion artifacts. Given its successful applications in other medical imaging tasks such as MRI super-resolution and CT metal artifact reduction, deep learning is a promising approach for cardiac MRI motion artifact reduction. In this paper, we propose a novel recurrent generative adversarial network model for cardiac MRI motion artifact reduction. This model utilizes bi-directional convolutional long short-term memory (ConvLSTM) and multi-scale convolutions to improve the performance of the proposed network, in which bi-directional ConvLSTMs handle long-range temporal features while multi-scale convolutions gather both local and global features. We demonstrate a decent generalizability of the proposed method thanks to the novel architecture of our deep network that captures the essential relationship of cardiovascular dynamics. Indeed, our extensive experiments show that our method achieves better image quality for cine cardiac MRI images than existing state-of-the-art methods. In addition, our method can generate reliable missing intermediate frames based on their adjacent frames, improving the temporal resolution of cine cardiac MRI sequences.
Article
To address problems in existing video super-resolution methods, such as noise, over-smoothing, and visual artifacts, which are caused by reliance on limited external training data or mismatched internal similarity instances, this study proposes a video super-resolution reconstruction algorithm based on deep learning and spatio-temporal feature similarity (DLSS-VSR). A video super-resolution reconstruction mechanism with joint internal and external constraints is established, utilizing both external deep correlation mapping learning and an internal spatio-temporal nonlocal self-similarity prior constraint. A deep learning model based on a deep convolutional neural network is constructed to learn the nonlinear correlation mapping between low-resolution and high-resolution video frame patches. A spatio-temporal feature similarity calculation method is proposed, which considers both internal video spatio-temporal self-similarity and external clean nonlocal similarity. For the internal spatio-temporal feature self-similarity, we improve the accuracy and robustness of similarity matching by proposing a similarity measure strategy based on spatio-temporal moment feature similarity and structural similarity. The external nonlocal similarity prior constraint is learned by a patch group-based Gaussian mixture model. The time efficiency of spatio-temporal similarity matching is further improved through saliency detection and a region correlation judgment strategy. Experimental results demonstrate that DLSS-VSR achieves competitive super-resolution quality compared to other state-of-the-art algorithms.
Article
Deep learning has produced state-of-the-art results for a variety of tasks. While such approaches for supervised learning have performed well, they assume that training and testing data are drawn from the same distribution, which may not always be the case. To address this challenge, single-source unsupervised domain adaptation can handle situations where a network is trained on labeled data from a source domain and unlabeled data from a related but different target domain, with the goal of performing well at test time on the target domain. Many single-source and typically homogeneous unsupervised deep domain adaptation approaches have thus been developed, combining the powerful, hierarchical representations from deep learning with domain adaptation to reduce reliance on potentially costly target data labels. This survey compares these approaches by examining alternative methods, the unique and common elements, results, and theoretical insights. We follow this with a look at application areas and open research directions.