K-SVD for HARDI denoising

Conference paper in Proceedings / IEEE International Symposium on Biomedical Imaging: from nano to macro. IEEE International Symposium on Biomedical Imaging · May 2011
DOI: 10.1109/ISBI.2011.5872757 · Source: IEEE Xplore
Conference: Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on
Noise is an important concern in high-angular resolution diffusion imaging studies because it can lead to errors in downstream analyses of white matter structure. To address this issue, we investigate a new approach for denoising diffusion-weighted data sets based on the K-SVD algorithm. We analyze its characteristics using both simulated and biological data and compare its performance with existing methods. Our results show that K-SVD provides robust and effective noise reduction and is practical for use in high-volume applications.


Vishal Patel, Yonggang Shi, Paul M. Thompson, Arthur W. Toga
Laboratory of Neuro Imaging, University of California, Los Angeles
Index Terms: Magnetic resonance imaging, diffusion tensor imaging, noise reduction, algorithms, brain
High-angular resolution diffusion imaging (HARDI) involves the analysis of multiple diffusion-weighted images (DWIs) to reconstruct complex white matter structure. As with all MR images, these DWIs are corrupted by noise from biological, electronic, and various other sources. This noise can lead to inaccurate DWI registration, erroneous orientation distribution function (ODF) estimation, and subsequent tractography errors. In this report, we consider HARDI denoising as an independent processing stage. This approach has the potential to improve not only ODF estimation and tractography, but also DWI registration, which is often important for population studies of white matter structure.
Many previously proposed methods for MR noise reduction extend approaches for 2-D image denoising; [1] provides a comprehensive review of common techniques. Here, we briefly mention methods specifically directed at denoising DWIs for HARDI analysis. Some have examined anisotropic filtering for diffusion MRI using the Perona-Malik scheme [2] and anisotropic Gaussian kernels [3]. Other methods rely on total variation minimization: [4] proposed smoothing the spherical signal with a finite-element method prior to minimizing its 3-D total variation, while [5] presented a similar approach that operates on the spherical apparent diffusion coefficient. The popular non-local means algorithm has also been evaluated for this purpose in [6] and [7]. Still others have proposed restoring DWIs with a linear minimum mean square error (LMMSE) estimator based on a Rician noise model [8].
In this paper, we adapt a recently developed denoising algorithm, K-SVD [9], for the task of noise reduction in HARDI data. Below, we present its formulation and evaluate its performance relative to other denoising methods.

The K-SVD algorithm [9] is a dictionary learning method for sparse signal representation, a problem that has recently attracted much attention. A full review of this topic is beyond the scope of this report; instead, we provide here a focused description of the K-SVD algorithm with the specific intent of denoising HARDI data sets.
K-SVD seeks an efficient decomposition of a set of signals into a sparse coding $X$ over a dictionary $D$. Given a HARDI data set composed of $M$ DWIs and $N$ voxels, we denote the $M \times N$ matrix of DWI signal attenuations as $Y$. For computational efficiency, we train our $K$-atom dictionary ($D \in \mathbb{R}^{M \times K}$) on a random sample of $P < N$ in-brain voxels, collected in the $M \times P$ training set $W$. Moreover, we require that each of the $K$-length coding vectors in $X$ satisfy a sparsity threshold $T_0$:

$$\min_{D,X} \|W - DX\|_F^2 \quad \text{s.t.} \quad \forall p,\ \|x_p\|_0 \le T_0 \tag{1}$$
In (1) and throughout this paper, we use lowercase symbols to represent vector components of the corresponding matrices, with subscripts and superscripts indicating column and row vectors, respectively. K-SVD optimizes $D$ and $X$ through a number of training iterations (in this work, we use $I = 40$ iterations, which we find empirically to be sufficient for convergence). Each iteration consists of a sparse coding stage that optimizes the coefficients in $X$ and a dictionary update stage that improves the atoms in $D$.
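To make the notation concrete, the following minimal Python/NumPy sketch (the helper name `ksvd_objective` is hypothetical) evaluates the objective of (1) and checks its sparsity constraint. It stores column $x_p$ as `X[:, p]` and row $x^k$ as `X[k, :]`, matching the subscript/superscript convention above.

```python
import numpy as np


def ksvd_objective(W, D, X, T0):
    """Evaluate the training objective of Eq. (1) and its constraint.

    W : (M, P) training signals; column W[:, p] is w_p
    D : (M, K) dictionary; column D[:, k] is atom d_k
    X : (K, P) codes; column X[:, p] is x_p, row X[k, :] is x^k
    Returns ||W - D X||_F^2 and whether every ||x_p||_0 <= T0.
    """
    residual = float(np.linalg.norm(W - D @ X, ord="fro") ** 2)
    nnz_per_column = np.count_nonzero(X, axis=0)  # ||x_p||_0 for each p
    return residual, bool(np.all(nnz_per_column <= T0))


# Tiny check: an identity dictionary codes a diagonal W exactly with T0 = 1
W = np.array([[1.0, 0.0], [0.0, 2.0]])
D = np.eye(2)
X = np.array([[1.0, 0.0], [0.0, 2.0]])
print(ksvd_objective(W, D, X, T0=1))  # → (0.0, True)
```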
During the sparse coding stage, $D$ is held fixed while each coefficient vector $x_p$ is optimized through the minimization:

$$\min_{x_p} \|w_p - D x_p\|_2^2 \quad \text{s.t.} \quad \|x_p\|_0 \le T_0 \tag{2}$$
Problems of the form (2) have been widely studied in the context of "compressed sensing"; solution methods include basis pursuit, matching pursuit, and FOCUSS, among others. In this work, we use an orthogonal matching pursuit variant, Batch-OMP [10], to solve (2) efficiently, but any suitable minimization technique can be substituted to compute $x_p$.
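As an illustration of the sparse coding stage, the sketch below solves (2) with plain greedy orthogonal matching pursuit in Python/NumPy. It is not Batch-OMP [10], which additionally precomputes the Gram matrix $D^T D$ so that many signals can be coded cheaply; the function name `omp` and the assumption of unit-norm atoms are ours.

```python
import numpy as np


def omp(D, w, T0, tol=1e-10):
    """Greedy orthogonal matching pursuit for Eq. (2).

    Approximately solves  min_x ||w - D x||_2^2  s.t.  ||x||_0 <= T0,
    assuming every column (atom) of D has unit l2 norm.
    """
    w = np.asarray(w, dtype=float)
    x = np.zeros(D.shape[1])
    support = []
    coef = np.zeros(0)
    residual = w.copy()
    for _ in range(T0):
        if np.linalg.norm(residual) <= tol:
            break  # signal is already represented (numerically) exactly
        # Greedy step: atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # degenerate case: no new atom improves the fit
        support.append(k)
        # Orthogonal step: least-squares refit on the whole support
        coef, *_ = np.linalg.lstsq(D[:, support], w, rcond=None)
        residual = w - D[:, support] @ coef
    if support:
        x[support] = coef
    return x


# Example: with an identity dictionary, T0 = 1 keeps only the largest entry
D = np.eye(3)
w = np.array([3.0, 0.0, -1.0])
print(omp(D, w, T0=1))
```

Because each step refits all selected coefficients by least squares, the residual stays orthogonal to the chosen atoms, which is what distinguishes OMP from plain matching pursuit.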
During the dictionary update stage, each atom $d_k$ is improved sequentially, along with the coding vectors in $X$ that use that atom. This update process is the key insight of K-SVD, which accelerates the convergence of (1) while maintaining the sparsity requirement [9]. To optimally replace the $k$th atom, we first set $d_k = 0$ and compute the reconstruction error $\tilde{E}$ for those signals that use $d_k$ (i.e., those signals for which $x^k$ is non-zero). The ideal replacement atom and coding then satisfy $\tilde{E} \approx d_k x^k$, restricted to those signals. Since the right-hand side is the outer product of two vectors, this reduces to finding the closest rank-1 approximation to $\tilde{E}$, which we obtain by truncating its singular value decomposition $\tilde{E} = U S V^T$. Atom $d_k$ is thus replaced by the first left singular vector $u_1$, while the non-zero values in $x^k$ are set to the product of the first singular value and the first right singular vector, $s_{11} v_1$.

978-1-4244-4128-0/11/$25.00 ©2011 IEEE ISBI 2011

Algorithm 1 K-SVD for HARDI (see text for variable definitions)
Z ← 0 // initialize sparse-coded image
for r = 1 to R do
    select the P columns of W randomly from Y // training set
    select the K columns of D randomly from W // initialize dictionary
    for i = 1 to I do // training iterations
        for p = 1 to P do // sparse coding stage
            x_p ← argmin_x ‖w_p − D x‖_2^2 s.t. ‖x‖_0 ≤ T_0
        end for
        for k = 1 to K do // dictionary update stage
            Ẽ ← [ẽ_p | x^k(p) ≠ 0] // residual (with d_k = 0) on columns which use d_k
            d_k ← u_1 // updated atom, where Ẽ = U S Vᵀ
            x^k ← s_11 v_1 // updated codings (non-zero entries only)
        end for
    end for
    for n = 1 to N do // encode all voxels by OMP
        x_n ← argmin_x ‖y_n − D x‖_2^2 s.t. ‖x‖_0 ≤ T_0
    end for
    Z ← Z + DX / R // accumulate the R-round average
end for
Ŷ ← (λY + Z) / (λ + 1) // denoised image
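The rank-1 atom update described above can be sketched in Python/NumPy as follows. This is a minimal illustration (the function name `ksvd_update_atom` is hypothetical), not the implementation used in this work; atoms that no signal uses are simply left unchanged in this sketch.

```python
import numpy as np


def ksvd_update_atom(W, D, X, k):
    """K-SVD dictionary update for atom k (modifies D and X in place).

    Only signals whose code uses atom k (x^k(p) != 0) are touched, so the
    sparsity pattern of X is preserved.
    """
    users = np.flatnonzero(X[k, :])  # columns p with x^k(p) != 0
    if users.size == 0:
        return  # unused atom: left unchanged in this sketch
    # Residual of those signals with atom k removed (d_k set to 0)
    E = W[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]              # updated atom d_k = u_1
    X[k, users] = s[0] * Vt[0, :]  # updated codings s_11 v_1


# Usage: one dictionary update stage sweeps over all K atoms
rng = np.random.default_rng(1)
W = rng.standard_normal((6, 20))
D = rng.standard_normal((6, 4))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((4, 20)) * (rng.random((4, 20)) < 0.4)
pattern = X != 0
err_before = np.linalg.norm(W - D @ X)
for k in range(D.shape[1]):
    ksvd_update_atom(W, D, X, k)
err_after = np.linalg.norm(W - D @ X)
```

Since the truncated SVD gives the best rank-1 fit to $\tilde{E}$, each atom update can only lower (or keep) the total residual, which is why the sweep is safe to repeat every iteration.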
After $I$ iterations of sparse coding and dictionary update stages, $D$ is optimized to span the range of signals contained in the training set $W$. We then encode the entire volume $Y$ by Batch-OMP following (2) and recover the sparse-coded result $Z = DX$. In practice, for reasons discussed in Section 3.4, we repeat this entire process for $R$ rounds (we use $R = 10$ unless otherwise specified) and average the results in $Z$. The denoising problem can then be written as a simple quadratic minimization between a data-fidelity term and the sparse-coded result:

$$\min_{\hat{Y}} \; \lambda \|Y - \hat{Y}\|_F^2 + \|Z - \hat{Y}\|_F^2$$

where the parameter $\lambda$ controls the relative weighting. This form ensures that any elements of $Y$ that cannot be well represented by the $W$-trained dictionary are still reasonably preserved. Empirically, we find that with a known or estimated SNR $\sigma$ for the raw data, $\lambda = 10^{0.1\sigma - 2}$ is a useful heuristic, which increases the weight of the data-fidelity term as $Y$ becomes more reliable. Setting the gradient to zero, $\lambda(\hat{Y} - Y) + (\hat{Y} - Z) = 0$, gives the closed-form solution $\hat{Y} = (\lambda Y + Z)/(\lambda + 1)$, which is the denoised image. The full procedure is summarized in Algorithm 1.
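The final data-fidelity step reduces to a weighted average of $Y$ and $Z$. The sketch below assumes the heuristic reads $\lambda = 10^{0.1\sigma - 2}$ as given above; `fuse` is a hypothetical name, and the random arrays stand in for real data.

```python
import numpy as np


def fuse(Y, Z, snr):
    """Closed-form minimizer of lam*||Y - Yhat||_F^2 + ||Z - Yhat||_F^2.

    Y   : noisy DWI attenuations, shape (M, N)
    Z   : average of the R sparse-coded reconstructions D X, shape (M, N)
    snr : known or estimated SNR of the raw data (sigma in the text)
    """
    lam = 10.0 ** (0.1 * snr - 2.0)  # heuristic weight; grows with SNR
    Y_hat = (lam * Y + Z) / (lam + 1.0)  # zero of the gradient
    return Y_hat, lam


rng = np.random.default_rng(0)
Y = rng.random((64, 5))
Z = rng.random((64, 5))
Y_hat, lam = fuse(Y, Z, snr=10)  # lam = 10**(-1) = 0.1 at SNR 10
```

At SNR = 10 this gives λ = 0.1, so the sparse-coded image dominates; at SNR = 20, λ = 1 and the two terms are weighted equally.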
We note here a unique property of the K-SVD denoising method for HARDI: the algorithm is driven solely by the statistical properties of the training data. In contrast to most conventional denoising methods, it makes no implicit or explicit assumptions about voxel neighborhoods, spatial continuity, or gradients. Indeed, the method does not consider the physical positions or adjacency of the signals in $Y$ at all. The behavior of K-SVD can be understood intuitively by realizing that sparse coding forces the signals into a smaller subspace, so that they become more similar than their noisy variants. The image is thus denoised without imposing unnecessary smoothness constraints, so that true details are preserved. Note, though, that if neighborhood information is expected to improve denoising, it can readily be incorporated into this process through a simple image tiling scheme [9].

Fig. 1. K-SVD parameter tuning on a simulated volume: $10^6$ voxels, 64 gradient directions, SNR = 10. K-SVD was performed over a range of values for dictionary size ($K$) and sparsity ($T_0$). Error between the denoised result and the ideal noise-free simulation is quantified as root-mean-square error over the 4-D image (left) and the Fisher-Rao measure between reconstructed ODFs (right).
To understand the properties of K-SVD denoising, we evaluate the optimal parameter choices and perform quantitative and qualitative comparisons with other denoising approaches using both simulated and biological data sets.
3.1. Parameter Tuning
The main parameters to evaluate in the K-SVD process are the dictionary size $K$ and the sparsity limit $T_0$. We study the effect of varying these parameters using a simulation with $N = 10^6$ voxels and 64 evenly distributed diffusion-weighting gradient directions $g_m$. The signal in each voxel arises from 1–3 randomly oriented fibers simulated by the multi-tensor model

$$S_m = \frac{S_0}{Q} \sum_{q=1}^{Q} e^{-b\, g_m^T D_q g_m}, \qquad Q \in [1, 3],$$

where $b = 1000$ s/mm², and each diffusion tensor $D_q$ has eigenvalues $\lambda_1 > \lambda_2 = \lambda_3 = 0.2 \times 10^{-3}$ mm²/s, with the primary eigenvector directed along the $q$th fiber direction. We degrade this ideal data set with Rician noise to produce a simulated volume with SNR = 10, which we then denoise with the K-SVD procedure using $P = 2000$ voxels for training and a range of values for $K \in [10, 316]$ and $T_0 \in [1, 10]$. In Fig. 1, we show the error between the denoised reconstruction and the ideal, noise-free data set, quantified as: 1) the root-mean-square error (RMSE) using the DWIs themselves, and 2) the mean Fisher-Rao metric [11] between unregularized ODFs estimated from these DWIs.

Fig. 2. Comparison of K-SVD with TV and NLM. Reconstructed ODFs are shown for a 4 × 4 region from a $10^6$-voxel digital phantom. Left to right: ground truth simulation, noise-corrupted simulation, and TV, NLM, and K-SVD denoising results. Below: over all $10^6$ voxels, the mean Fisher-Rao (FR) distance between ground truth ODFs and those from the corresponding panel, and the computational run time (RT).
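The phantom described in Section 3.1 can be sketched roughly as follows. Several details are assumptions not stated in the text: $S_0 = 1$; $\lambda_1 = 1.7 \times 10^{-3}$ mm²/s (a commonly used white-matter value); the number of fibers $Q$ drawn uniformly from {1, 2, 3} with equal weights $1/Q$; a Fibonacci-spiral scheme for the 64 "evenly distributed" directions; and SNR defined as $S_0$ divided by the standard deviation of the Gaussian noise in each channel before taking the magnitude. All function names are hypothetical.

```python
import numpy as np


def fibonacci_sphere(n):
    """n roughly evenly distributed unit vectors (assumed scheme)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (1.0 + 5.0 ** 0.5) * i
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def random_unit_vectors(n, rng):
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def simulate_voxel(g, rng, b=1000.0, S0=1.0, lam1=1.7e-3, lam23=0.2e-3):
    """Noise-free multi-tensor signal S_m for 1-3 random fibers.

    lam1 = 1.7e-3 mm^2/s is an assumed value; lam23 matches the text.
    """
    Q = int(rng.integers(1, 4))  # number of fibers, 1..3
    S = np.zeros(len(g))
    for e1 in random_unit_vectors(Q, rng):
        # Cylindrically symmetric tensor with primary eigenvector e1
        Dq = lam23 * np.eye(3) + (lam1 - lam23) * np.outer(e1, e1)
        S += np.exp(-b * np.einsum("mi,ij,mj->m", g, Dq, g))
    return S0 * S / Q


def add_rician_noise(S, snr, rng, S0=1.0):
    """Magnitude of signal plus complex Gaussian noise (std = S0 / snr)."""
    sigma_noise = S0 / snr
    re = S + sigma_noise * rng.standard_normal(S.shape)
    im = sigma_noise * rng.standard_normal(S.shape)
    return np.hypot(re, im)


rng = np.random.default_rng(0)
g = fibonacci_sphere(64)
S = np.array([simulate_voxel(g, rng) for _ in range(100)])  # 100 voxels
noisy = add_rician_noise(S, snr=10, rng=rng)
```

Taking the magnitude of a complex Gaussian-corrupted signal is what makes the noise Rician rather than Gaussian; the sketch generates only 100 voxels rather than $10^6$ to keep it quick.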
Both error measures reveal several important properties of K-SVD. For very small dictionaries or low sparsity thresholds, the error between the denoised result and the ideal noise-free case increases, indicating that the parameters are too restrictive to permit effective coding of the full range of signals present in the test volume. The reconstruction error also increases if the dictionary size is made too large or the sparsity constraint too lax, suggesting that expansive dictionaries or dense codings allow the K-SVD result to reproduce some of the noise in the input. Optimal denoising performance is obtained for intermediate values of dictionary size ($K \approx 20$–$100$) and sparsity ($T_0 \approx 2$–$5$), which permit enough representational capacity to capture the true signal variability, but not enough to reproduce most of the noise. We also note that over the full range of $K$ and $T_0$ we examined, both error measures are lower than for the case in which no denoising is performed (not depicted: RMSE = 0.099, Fisher-Rao = 0.079), indicating that K-SVD is unlikely to have a detrimental effect across a broad range of parameter values.
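The two error measures used in Sections 3.1 and 3.2 can be sketched as follows. The Fisher-Rao distance here uses the square-root representation of [11], applied to ODFs sampled at discrete directions and normalized to sum to one; that uniform-weight discretization and the arccos convention (some authors use twice this value) are assumptions, as are the helper names.

```python
import numpy as np


def rmse(A, B):
    """Root-mean-square error over all entries (e.g. a 4-D DWI volume)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return float(np.sqrt(np.mean((A - B) ** 2)))


def fisher_rao(p, q):
    """Fisher-Rao geodesic distance between two discretized ODFs.

    With psi = sqrt(p), normalized ODFs lie on a unit hypersphere and the
    geodesic distance is the arc length arccos(<psi_p, psi_q>).
    """
    p = np.clip(np.asarray(p, float), 0.0, None)  # guard tiny negatives
    q = np.clip(np.asarray(q, float), 0.0, None)
    p, q = p / p.sum(), q / q.sum()
    inner = np.sum(np.sqrt(p * q))
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))


p = np.array([0.2, 0.3, 0.5])
d_same = fisher_rao(p, p)                  # identical ODFs: distance ~0
d_disjoint = fisher_rao([1, 0], [0, 1])    # disjoint support: pi / 2
```

The mean Fisher-Rao figure reported above would then be the average of `fisher_rao` over all voxels' ODF pairs.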
3.2. Comparison with Other Denoising Methods
We next compare the performance of K-SVD with two other denoising approaches that have recently been considered for use in HARDI: total variation (TV) and non-local means (NLM). Briefly, for TV, we minimize a functional involving the total variation of the 3-D spherical apparent diffusion coefficient as in [5]. For NLM, which replaces each voxel with a weighted average of itself and “similar” voxels in some search neighborhood, we apply the method to each DWI independently, which [7] found to work best. We adjust the tuning parameters for both TV and NLM manually to obtain optimal denoising, and for K-SVD, we train for $R = 10$ rounds on $P = 2000$ voxels and take the conservative estimates $K = 100$ and $T_0 = 4$ from Section 3.1.
We generate a simulated data set as in Section 3.1 with one difference: to ensure fair testing for TV and NLM, which rely on spatial information, our new digital phantom consists of large areas with smoothly varying fiber orientation separated by sharp boundaries, as might be encountered in biological data (e.g., Fig. 2, left). For quantitative comparison, we compute ODFs from the denoised DWIs and, as before, use the Fisher-Rao distance between the recovered ODFs and the ground truth ODFs as an error measure. We also track the computation time required to denoise the simulated data set for each method using a single 2.4 GHz CPU.
Fig. 2 contains the results. At top, to illustrate the qualitative performance of the denoising methods, we show ODFs for a small edge-containing region of the simulated volume. Inspection reveals that the NLM and K-SVD results most closely match the ground truth, with the K-SVD ODFs being slightly more faithful. These observations are confirmed by the quantitative analysis: on average, ODFs reconstructed from the K-SVD denoised DWIs are closer (in a Riemannian sense) to the ground truth (mean Fisher-Rao distance = 0.028) than those from TV (0.069) or NLM (0.044). With respect to computational run time, K-SVD is more than an order of magnitude faster than both TV (which requires an expensive gradient descent) and NLM (which has a well-known computational cost for computing window similarities). These results are for unoptimized implementations of the algorithms; the important conclusion is that K-SVD denoising is fast enough to be of practical use in high-volume applications.
3.3. Qualitative Results from Biological Data
We next verify these findings in a biological data set acquired from a healthy adult volunteer. Using a 4 T Bruker Medspec unit with a single-shot echo planar technique and a twice-refocused spin echo sequence, we collected 94 DWIs with b-value 1159 s/mm² and 11 $b_0$ images. Image dimensions were 128 × 128 × 55 voxels, with voxel size 1.8 × 1.8 × 2.0 mm. Total acquisition time was 14.5 min.
Fig. 3 shows a randomly selected directional DWI from the original noisy data set and denoised versions generated by TV, NLM, and K-SVD. Denoising parameters are the same as in Section 3.2. We observe that the K-SVD image appears more uniform than those obtained through TV and NLM. We also note that the K-SVD image reveals details not clearly distinguished by the other methods (e.g., the cortical ribbon just anterior to the callosal genu). With the usual caveats regarding the lack of ground truth for biological data, Fig. 3 suggests that the performance of K-SVD on real human brain data is similar to that observed in our digital phantom experiments.

Fig. 3. Qualitative denoising comparison on biological data. Left to right: original noisy image and denoising results generated by TV, NLM, and K-SVD for one DWI from a 94-direction acquisition.
3.4. Reproducibility
Finally, we address the non-deterministic nature of K-SVD and the need for multiple-round averaging. In the absence of relevant prior information, it makes most sense to initialize the training set $W$ and dictionary $D$ randomly from the data, as indicated in Algorithm 1. Naturally, the resulting optimized dictionary, and consequently the denoised result, will depend somewhat on these choices. For single-round K-SVD, results for two separate denoising runs on the biological data set from Section 3.3 are not identical, as shown in the top row of Fig. 4. Though the discrepancy is small (comparing voxel intensities between runs, mean percent error = 2.95%), it may be desirable to minimize this behavior for certain applications. This can be achieved by employing the simple averaging method we have used throughout this report: the bottom row of Fig. 4 shows that averaging $Z$ over $R = 10$ rounds improves reproducibility (mean percent error = 1.24%).
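The reproducibility measure is not defined precisely above. The sketch below assumes mean percent error is $100 \cdot \mathrm{mean}(|a - b|/|b|)$ over voxels, and uses a hypothetical toy round (a fixed truth plus random perturbation) in place of a full K-SVD round, purely to show why averaging $R$ independently initialized rounds reduces run-to-run differences.

```python
import numpy as np


def mean_percent_error(run_a, run_b, eps=1e-12):
    """Assumed definition: 100 * mean(|a - b| / |b|) over voxels with |b| > eps."""
    a = np.asarray(run_a, float).ravel()
    b = np.asarray(run_b, float).ravel()
    mask = np.abs(b) > eps
    return float(100.0 * np.mean(np.abs(a[mask] - b[mask]) / np.abs(b[mask])))


def averaged_reconstruction(single_round, R, seed):
    """Average R randomly initialized single-round results (the Z of Alg. 1)."""
    rng = np.random.default_rng(seed)
    return sum(single_round(rng) for _ in range(R)) / R


# Hypothetical stand-in for one randomly initialized K-SVD round
truth = np.full(1000, 2.0)


def toy_round(rng):
    return truth + 0.05 * rng.standard_normal(truth.shape)


single = mean_percent_error(toy_round(np.random.default_rng(1)),
                            toy_round(np.random.default_rng(2)))
averaged = mean_percent_error(averaged_reconstruction(toy_round, 10, 1),
                              averaged_reconstruction(toy_round, 10, 2))
```

For fully independent rounds the spread of the average shrinks roughly as $1/\sqrt{R}$; real K-SVD rounds share the same input data, so the reduction reported above need not follow that rate exactly.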
We have presented a new method for HARDI denoising based on K-SVD and characterized its performance using both simulated and biological data. The results suggest that K-SVD outperforms the TV and NLM denoising methods evaluated here with respect to both recovered image quality and computational cost. We have also shown that the reproducibility of the method can be improved through multiple-round averaging. K-SVD thus provides a practical denoising solution with potential downstream benefits for ODF estimation and DWI registration. Future studies should investigate the potential for reusing dictionaries between data sets and the effects on fiber tractography and angular resolution.
Fig. 4. K-SVD reproducibility can be improved through multiple-round averaging. Top: single-round runs of K-SVD are effective for denoising (cf. original image, Fig. 3), but random initialization values lead to discrepant results. Bottom: 10-round averaging reduces differences across runs.

[1] A. Buades, B. Coll, and J. M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Model Simul, vol. 4, pp. 490–530, 2005.
[2] G. J. M. Parker, J. A. Schnabel, M. R. Symms, D. J. Werring, and G. J. Barker, “Nonlinear smoothing for reduction of systematic and random errors in diffusion tensor imaging,” J Magn Reson Imag, vol. 11, no. 6, pp. 702–710, 2000.
[3] J. E. Lee, M. K. Chung, and A. L. Alex, “Evaluation of anisotropic filters for diffusion tensor imaging,” in IEEE Symposium on Biomedical Imaging: Macro to Nano, 2006, pp. 77–
[4] T. McGraw, B. C. Vemuri, E. Ozarslan, Y. Chen, and T. Mareci, “Variational denoising of diffusion weighted MRI,” Inv Prob Imag, vol. 3, no. 3, pp. 625–649, 2009.
[5] Y. Kim, P. M. Thompson, A. W. Toga, L. Vese, and L. Zhan, “HARDI denoising: variational regularization of the spherical apparent diffusion coefficient sADC,” Inf Proc Med Imag, vol. 21, pp. 515–527, 2009.
[6] M. Descoteaux, N. Wiest-Daesslé, S. Prima, C. Barillot, and R. Deriche, “Impact of Rician adapted non-local means filtering on HARDI,” Med Imag Comput Comput Assist Interv, vol. 11, pp. 122–130, 2008.
[7] N. Wiest-Daesslé, S. Prima, P. Coupé, S. P. Morrissey, and C. Barillot, “Non-local means variants for denoising of diffusion-weighted and diffusion tensor MRI,” Med Imag Comput Comput Assist Interv, vol. 10, pp. 344–351, 2007.
[8] S. Aja-Fernandez, M. Niethammer, M. Kubicki, M. E. Shenton, and C.-F. Westin, “Restoration of DWI data using a Rician LMMSE estimator,” IEEE Trans Med Imag, vol. 27, no. 10, pp. 1389–1403, 2008.
[9] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans Image Proc, vol. 15, no. 12, pp. 3736–3745, 2006.
[10] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” Tech. Rep., CS Technion, 2008.
[11] A. Goh, C. Lenglet, P. M. Thompson, and R. Vidal, “A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI),” in Comp Vis Pat Recog, 2009, pp. 2496–2503.