K-SVD for HARDI denoising
ABSTRACT Noise is an important concern in high-angular resolution diffusion imaging studies because it can lead to errors in downstream analyses of white matter structure. To address this issue, we investigate a new approach for denoising diffusion-weighted data sets based on the K-SVD algorithm. We analyze its characteristics using both simulated and biological data and compare its performance with existing methods. Our results show that K-SVD provides robust and effective noise reduction and is practical for use in high-volume applications.
Vishal Patel, Yonggang Shi, Paul M. Thompson, Arthur W. Toga
Laboratory of Neuro Imaging, University of California, Los Angeles
Index Terms— Magnetic resonance imaging, diffusion tensor
imaging, noise reduction, algorithms, brain
1. INTRODUCTION
High-angular resolution diffusion imaging (HARDI) involves
the analysis of multiple diffusion-weighted images (DWIs) to
reconstruct complex white matter structure. As with all MR
images, these DWIs are corrupted by noise from biological,
electronic, and various other sources. This noise can lead to
inaccurate DWI registration, erroneous orientation distribu-
tion function (ODF) estimation, and subsequent tractography
errors. In this report, we consider HARDI denoising as an in-
dependent processing stage—this approach has the potential
to improve not only ODF estimation and tractography, but
also DWI registration which is often important for population
studies of white matter structure.
Many previously proposed methods for MR noise reduction extend approaches for 2-D image denoising, and [1] provides a comprehensive review of common techniques. Here, we briefly mention methods specifically directed at denoising DWIs for HARDI analysis. Some have examined anisotropic filtering for diffusion MRI using the Perona-Malik scheme [2] and anisotropic Gaussian kernels [3]. Other methods rely on total variation minimization: [4] proposed smoothing the spherical signal with a finite-element method prior to minimizing its 3-D total variation, while [5] have presented a similar approach which operates on the spherical apparent diffusion coefficient. The popular non-local means algorithm has also been evaluated for this purpose in [6] and [7]. Still others have proposed restoring DWIs with a linear minimum mean square error estimator through a Rician noise model [8].
In this paper, we adapt a recently developed denoising algorithm, K-SVD [9], for the task of noise reduction in HARDI data. Below, we present its formulation and evaluate its performance relative to other denoising methods.
2. K-SVD FOR HARDI
The K-SVD algorithm was introduced by [9] as a method for sparse signal representation, a problem which has recently attracted much attention. A full review of this topic is beyond
the scope of this report; instead, we provide here a focused
description of the K-SVD algorithm with the specific intent
of denoising HARDI data sets.
K-SVD is designed to seek an efficient decomposition of
a set of signals into a sparse coding X from a dictionary D.
Given a HARDI data set comprising M DWIs and N voxels, we denote the M × N matrix of DWI signal attenuations as Y. For computational efficiency, we train our K-atom dictionary ($D \in \mathbb{R}^{M \times K}$) on a random sample of P < N in-brain voxels collected in the M × P training set W. Moreover, we desire that each of the K-length coding vectors in X satisfies a sparsity threshold $T_0$:
$$\min_{D,X} \|W - DX\|_2^2 \quad \text{subject to} \quad \forall p,\ \|x_p\|_0 \le T_0 \qquad (1)$$
In (1) and throughout this paper, we use lowercase symbols to
represent vector components of the corresponding matrices,
with subscripts and superscripts indicating column and row
vectors, respectively. K-SVD optimizes D and X through a
number of training iterations (in this work, we use I = 40
iterations, which we find empirically to be sufficient for solu-
tion convergence). Each iteration consists of a sparse coding
stage that optimizes the coefficients in X and a dictionary up-
date stage that improves the atoms in D.
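During the sparse coding stage, D is held fixed and each coefficient vector $x_p$ is optimized through the minimization:
$$x_p = \arg\min_{x} \|w_p - Dx\|_2^2 \quad \text{subject to} \quad \|x\|_0 \le T_0 \qquad (2)$$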
Problems of the form (2) have been widely studied for “com-
pressed sensing”; solution methods include basis pursuit,
matching pursuit, FOCUSS, etc. In this work, we utilize
an orthogonal matching pursuit variant, Batch-OMP [10], to
solve (2) efficiently, but any suitable minimization technique
can be substituted to compute xp.
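The sparse coding step can be illustrated with a minimal sketch of plain orthogonal matching pursuit. This is not the Batch-OMP variant used in this work, only the basic greedy scheme it accelerates; the function name `omp` and its arguments are hypothetical, and D is assumed to have unit-norm columns.

```python
import numpy as np

def omp(D, y, T0):
    """Greedy OMP sketch: approximate y using at most T0 atoms (columns) of D."""
    x = np.zeros(D.shape[1])
    support = []                      # indices of the atoms selected so far
    coef = np.zeros(0)
    residual = y.astype(float).copy()
    for _ in range(T0):
        # choose the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break                     # no new atom helps; stop early
        support.append(k)
        # re-fit all selected coefficients jointly by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-12:
            break                     # y is already represented exactly
    if support:
        x[support] = coef
    return x
```

Calling `omp(D, W[:, p], T0)` for each training column p would fill the columns of X one at a time; Batch-OMP obtains the same kind of solution while sharing precomputed quantities across signals.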
During the dictionary update stage, each atom $d_k$ is improved in turn, together with the coefficients of the signals that utilize that atom. This update process is the key insight of K-SVD, which accelerates the convergence of (1) while maintaining the sparsity requirement [9]. To optimally replace the kth atom, we first set $d_k = 0$ and compute the reconstruction error $\tilde{E}$ for those signals which use $d_k$ (i.e., those signals for which $x^k$ is non-zero). The ideal replacement atom and coding then satisfy $\tilde{E} = d_k x^k$; however, since the right-hand side is the product of two vectors, this reduces to finding the closest rank-1 approximation to $\tilde{E}$, which we obtain by truncating the singular value decomposition $\tilde{E} = USV^T$. Atom $d_k$ is thus replaced by the first output basis vector $u_1$, while the non-zero values in $x^k$ are adjusted to the product of the first singular value and the first input basis vector, $s_{11} v_1$.
978-1-4244-4128-0/11/$25.00 ©2011 IEEE. ISBI 2011.
Algorithm 1 K-SVD for HARDI (see text for variable definitions)
Z ← 0 // initialize sparse-coded image
for r = 1 to R do
    select the P columns of W randomly from Y // training set
    select the K columns of D randomly from W // initialize dictionary
    for i = 1 to I do // training iterations
        for p = 1 to P do // sparse coding stage
            x_p ← solution of (2) by Batch-OMP
        for k = 1 to K do // dictionary update stage
            d_k ← 0
            E ← W − DX
            Ẽ ← {e_p | x^k(p) ≠ 0} // columns which use d_k
            U S V^T ← SVD of Ẽ
            d_k ← u_1 // updated atom
            x^k{p | x^k(p) ≠ 0} ← s_11 v_1 // updated codings
    for n = 1 to N do // encode all voxels by OMP
        x_n ← solution of (2) for y_n
        z_n ← z_n + D x_n / R
Ĉ ← (λY + Z)/(λ + 1)
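The rank-1 replacement of a single atom can be sketched as follows. This is an illustrative NumPy version written from the description above, not the implementation used in this work; the name `update_atom` is hypothetical, and leaving unused atoms unchanged is an assumption of the sketch.

```python
import numpy as np

def update_atom(W, D, X, k):
    """Replace atom k of D and row k of X by the best rank-1 fit (in place)."""
    users = np.flatnonzero(X[k, :])            # signals whose coding uses atom k
    if users.size == 0:
        return                                 # assumption: unused atoms are left as-is
    D[:, k] = 0.0                              # remove atom k's contribution
    E_tilde = W[:, users] - D @ X[:, users]    # error restricted to those signals
    U, s, Vt = np.linalg.svd(E_tilde, full_matrices=False)
    D[:, k] = U[:, 0]                          # updated atom u_1 (unit norm)
    X[k, users] = s[0] * Vt[0, :]              # updated codings s_11 * v_1
```

Because only the entries of row k that were already non-zero are rewritten, the number of non-zero coefficients per signal cannot grow, which is how the update respects the sparsity limit.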
After I iterations of sparse coding and dictionary update
stages, D is optimized to span the range of signals contained
in the training set W. We then encode the entire volume Y
by Batch-OMP following (2) and recover the sparse-coded
result Z = DX. In practice, for reasons discussed in Sec-
tion 3.4, we repeat this entire process for R rounds (we use
R = 10 unless otherwise specified), and average the results
in Z. The denoising problem can then be written as a simple quadratic minimization between a data-fidelity term and the sparse-coded result:
$$\hat{C} = \arg\min_{C}\ \lambda\|Y - C\|_2^2 + \|Z - C\|_2^2 \qquad (3)$$
where the parameter λ controls the relative weighting. This form ensures that any elements of Y which cannot be well represented by the W-trained dictionary are still reasonably preserved. Empirically, we find that with known or estimated SNR σ for the raw data, $\lambda = 10^{0.1\sigma - 2}$ is a useful heuristic which increases the weight of the data-fidelity term as Y becomes more reliable. The closed-form solution $\hat{C} = (\lambda Y + Z)/(\lambda + 1)$ gives the denoised image. The full procedure is summarized in Algorithm 1.
We note here a unique property of the K-SVD denoising method for HARDI: the algorithm is driven solely by the statistical properties of the training data. In contrast to most conventional denoising methods, it makes no implicit or explicit assumptions about voxel neighborhoods, spatial continuity, or gradients. Indeed, the method does not consider the physical positions or adjacency of the signals in Y at all. The behavior of K-SVD can be understood intuitively by realizing that the sparse coding forces the signals into a smaller subspace so that they become more similar than their noisy variants. The image is thus denoised without imposing unnecessary smoothness constraints, so that true details are preserved. Note, though, that if neighborhood information is expected to improve denoising, it can be incorporated into this process through a simple image tiling scheme [9].
Fig. 1. K-SVD parameter tuning on a simulated volume: $10^6$ voxels, 64 gradient directions, SNR = 10. K-SVD was performed over a range of values for dictionary size (K) and sparsity (T0). Error between the denoised result and the ideal noise-free simulation is quantified as root-mean-square error over the 4-D image (left) and the Fisher-Rao measure between reconstructed ODFs (right).
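The closed-form solution for the denoised image follows from setting the gradient of the quadratic objective to zero: λ(Y − C) + (Z − C) = 0, so C = (λY + Z)/(λ + 1). A small sketch of this final blending step is given below; the function names are hypothetical, and the heuristic is taken to read λ = 10^(0.1σ − 2), which is an assumption about the exponent.

```python
import numpy as np

def fidelity_weight(snr):
    """Heuristic data-fidelity weight, assumed to be lambda = 10**(0.1*snr - 2)."""
    return 10.0 ** (0.1 * snr - 2.0)

def blend(Y, Z, lam):
    """Minimizer over C of lam*||Y - C||^2 + ||Z - C||^2."""
    return (lam * Y + Z) / (lam + 1.0)
```

Under this reading, SNR = 10 gives λ = 0.1, so the result leans strongly on the sparse-coded image Z, while SNR = 20 gives λ = 1 and an equal average of Y and Z.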
3. EXPERIMENTS AND RESULTS
To understand the properties of K-SVD denoising, we eval-
uate the optimal parameter choices and perform quantitative
and qualitative comparisons with other denoising approaches
using both simulated and biological data sets.
3.1. Parameter Tuning
The main parameters to evaluate in the K-SVD process are
the dictionary size K and the sparsity limit T0. We study the effect of varying these parameters using a simulation with N = $10^6$ voxels and 64 evenly distributed diffusion-weighting gradient directions $g_m$. The signal in each voxel arises from 1–3 randomly oriented fibers simulated by the multi-tensor model:
$$S_m = S_0 \sum_{q=1}^{Q} f_q \exp\left(-b\, g_m^T D_q\, g_m\right)$$
where Q ∈ {1, 2, 3} is the number of fibers in the voxel, $f_q$ is the volume fraction of the qth fiber, b = 1000 s/mm², and diffusion tensor $D_q$ has eigenvalues $\lambda_1 = 1.7 \times 10^{-3}$, $\lambda_2 = \lambda_3 = 0.2 \times 10^{-3}$ mm²/s with the primary eigenvector directed along the qth fiber direction. We degrade this ideal data set with Rician noise to produce a simulated volume with SNR = 10, which we then denoise with the K-SVD procedure using P = 2000 voxels for training and a range of values for K ∈ [10, 316] and T0 ∈ [1, 10]. In Fig. 1, we show the error between the denoised reconstruction and the ideal, noise-free data set, quantified as: 1) the root-mean-square error (RMSE) using the DWIs themselves, and 2) the mean Fisher-Rao metric [11] between unregularized ODFs estimated from these DWIs.
Fig. 2. Comparison of K-SVD with TV and NLM. Reconstructed ODFs are shown for a 4 × 4 region from a $10^6$-voxel digital phantom. Left to right: ground truth simulation, noise-corrupted simulation, and TV, NLM, and K-SVD denoising results. Below: over all $10^6$ voxels, the mean Fisher-Rao (FR) distance between ground truth ODFs and those from the corresponding panel, and the computational run time (RT).
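A single simulated voxel of the kind described in this section could be generated roughly as follows. This is a sketch under stated assumptions rather than the simulation code used here: the fibers are weighted equally, SNR is taken as S0 divided by the noise standard deviation, and all function names are hypothetical.

```python
import numpy as np

def tensor_from_direction(v, l1=1.7e-3, l2=0.2e-3):
    """Axially symmetric tensor (mm^2/s) with principal eigenvector along v."""
    v = v / np.linalg.norm(v)
    return l2 * np.eye(3) + (l1 - l2) * np.outer(v, v)

def multi_tensor_signal(G, fibers, b=1000.0, S0=1.0):
    """Multi-tensor signal for unit gradient directions G (M x 3), equal weights assumed."""
    S = np.zeros(G.shape[0])
    for v in fibers:
        Dq = tensor_from_direction(v)
        # g_m^T D_q g_m evaluated for every gradient direction at once
        S += np.exp(-b * np.einsum("mi,ij,mj->m", G, Dq, G))
    return S0 * S / len(fibers)

def add_rician_noise(S, snr, rng, S0=1.0):
    """Magnitude of the signal after adding complex Gaussian noise, sigma = S0/snr."""
    sigma = S0 / snr
    real = S + rng.normal(0.0, sigma, S.shape)
    imag = rng.normal(0.0, sigma, S.shape)
    return np.sqrt(real**2 + imag**2)
```

Stacking the noisy signals of all voxels as columns would give a matrix playing the role of Y, with the noise-free signals kept aside as ground truth for the error measures.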
Both error measures reveal several important properties
of K-SVD. For very small dictionaries or low sparsity thresholds, the error between the denoised result and the ideal noise-free case increases, indicating that the parameters are too restrictive to permit effective coding of the full range of signals present in the test volume. The reconstruction error also increases if the dictionary size is made too large or the sparsity
constraint is too lax, suggesting that expansive dictionaries
or dense codings allow the K-SVD result to reproduce some
of the noise in the input. Optimal denoising performance is
obtained for intermediate values of dictionary size (K ≈ 20–
100) and sparsity (T0≈ 2–5), which permit enough entropy
to capture the true signal variability, but not enough to re-
produce most of the noise. We also note that over the full
range of K and T0 we have examined, error measures are less
than for the case in which no denoising is performed (not de-
picted: RMSE = 0.099, Fisher-Rao = 0.079), indicating that
K-SVD is unlikely to have a detrimental effect across a broad
range of parameter values.
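The two error measures can be sketched in a few lines. The RMSE is standard; for the Fisher-Rao distance, the sketch assumes each ODF is sampled at a fixed set of directions, is non-negative, and is normalized to sum to one, in which case the distance under the square-root representation is the arc cosine of the inner product of the square-root ODFs. Whether this exact discretization matches the one used in the experiments is an assumption, and the function names are hypothetical.

```python
import numpy as np

def rmse(A, B):
    """Root-mean-square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((A - B) ** 2)))

def fisher_rao(p, q):
    """Fisher-Rao distance between two non-negative discretized ODFs."""
    p = p / p.sum()
    q = q / q.sum()
    inner = np.sum(np.sqrt(p * q))     # inner product of the square-root ODFs
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))
```

Identical ODFs give a distance of zero, and ODFs with disjoint support give the maximum value of π/2.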
3.2. Comparison with Other Denoising Methods
We next compare the performance of K-SVD with other
denoising approaches which have recently been considered
for use in HARDI: total variation (TV) and non-local means
(NLM). Briefly, for TV, we minimize a functional involving
the total variation of the 3-D spherical apparent diffusion
coefficient as in [5]. For NLM, which replaces each voxel
with a weighted average of itself and “similar” voxels in
some search locality, we apply the method to each DWI in-
dependently, as found to be best by . We adjust tuning
parameters for both TV and NLM manually to obtain optimal
denoising, and for K-SVD, we train for R = 10 rounds on
P = 2000 voxels and take conservative estimates K = 100
and T0 = 4 from Section 3.1.
We generate a simulated data set as in Section 3.1 with
one difference: to ensure fair testing for TV and NLM which
rely on spatial information, our new digital phantom consists
of large areas with smoothly-varying fiber orientation sepa-
rated by sharp boundaries as might be encountered in biolog-
ical data (e.g. Fig. 2, left). For quantitative comparison, we
compute ODFs from the denoised DWIs, and as before, we
use the Fisher-Rao distance between the recovered ODFs and
the ground truth ODFs as an error measure. We also track the
computation time required to denoise the simulated data set
for each method using a single 2.4 GHz CPU.
Fig. 2 contains the results. At top, to illustrate the quali-
tative performance of the denoising methods, we show ODFs
for a small edge-containing region of the simulated volume.
Inspection reveals that the NLM and K-SVD results most
closely match the ground truth, with the K-SVD ODFs be-
ing slightly more faithful. These observations are confirmed
by the quantitative analysis: on average, ODFs reconstructed
from the K-SVD denoised DWIs are closer (in a Riemannian
sense) to the ground truth (mean Fisher-Rao distance = 0.028)
than those from TV (0.069) or NLM (0.044). With respect to
computational run time, we see that K-SVD is more than an
order of magnitude faster than both TV (which requires an ex-
pensive gradient descent) and NLM (which has a well-known
cost for computing window similarities). These results are for
unoptimized implementations of the algorithms—the impor-
tant conclusion is that K-SVD denoising is fast enough to be
of practical use in high-volume applications.
3.3. Qualitative Results from Biological Data
We next verify these findings in a biological data set acquired
from a healthy adult volunteer. Using a 4 T Bruker Medspec unit with a single-shot echo planar technique and twice-refocused spin echo sequence, we collected 94 DWIs with b-value 1159 s/mm² and 11 b0 images. Image dimensions were
128 × 128 × 55 voxels, with voxel size 1.8 × 1.8 × 2.0 mm.
Total acquisition time was 14.5 min.
Fig. 3 shows a randomly-selected directional DWI from
the original noisy data set and denoised versions generated by
TV, NLM, and K-SVD. Denoising parameters are the same
as in Section 3.2. We observe that the K-SVD image appears more uniform than those obtained through TV and NLM. We also note that the K-SVD image reveals details not clearly distinguished by other methods (e.g., the cortical ribbon just anterior to the callosal genu). With the usual caveats regarding qualitative assessment, these observations suggest that the performance of K-SVD on real human brain data is similar to that observed in our digital phantom experiments.
Fig. 3. Qualitative denoising comparison on biological data. Left to right: original noisy image and denoising results generated by TV, NLM, and K-SVD for one DWI from a 94-direction acquisition.
3.4. Reproducibility
Finally, we address the non-deterministic nature of K-SVD
and the need for multiple-round averaging. In the absence
of relevant prior information, it makes most sense to initial-
ize the training set W and dictionary D randomly from the
data as indicated in Algorithm 1. Naturally, the resulting op-
timized dictionary and consequently the denoised result will
depend somewhat on these choices. For single-round K-SVD,
results for two separate denoising runs on the biological data
set from Section 3.3 are not identical as shown in the top
row of Fig. 4. Though the discrepancy is small (comparing
voxel intensities between runs, mean percent error = 2.95%),
it may be desirable to minimize this behavior for certain ap-
plications. This can be achieved by employing the simple av-
eraging method we have used throughout this report: the bot-
tom row of Fig. 4 shows that averaging Z for R = 10 rounds
improves reproducibility (mean percent error = 1.24%).
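The effect of multiple-round averaging on run-to-run discrepancy can be illustrated with a toy sketch. The definition of mean percent error below (mean absolute voxelwise difference relative to the first run) is an assumption, and `single_run` is a hypothetical stand-in for one randomly initialized round; in the test case it is simply a fixed image plus random perturbations, not K-SVD itself.

```python
import numpy as np

def mean_percent_error(A, B, eps=1e-12):
    """Mean absolute voxelwise difference between two runs, relative to A, in percent."""
    return float(100.0 * np.mean(np.abs(A - B) / (np.abs(A) + eps)))

def averaged_run(single_run, R, rng):
    """Average R independent single-round results, mirroring the averaging of Z."""
    return sum(single_run(rng) for _ in range(R)) / R
```

Comparing `mean_percent_error` for two single rounds against two R-round averages shows the discrepancy shrinking as R grows, at the cost of R times the computation.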
4. CONCLUSION
We have presented a new method for HARDI denoising based
on K-SVD and characterized its performance using both sim-
ulated and biological data. The results suggest that K-SVD
outperforms existing denoising methods with respect to both
recovered image quality and computational cost. We have
also shown that the reproducibility of the method can be im-
proved through multiple-round averaging. K-SVD thus pro-
vides a practical denoising solution with downstream bene-
fits for ODF estimation and DWI registration. Future studies
should investigate the potential for reusing dictionaries be-
tween data sets and the effects on fiber tractography and an-
Fig. 4. K-SVD reproducibility can be improved through multiple-round averaging. Top: Single-round runs of K-SVD are effective for denoising (cf. original image, Fig. 3), but random initialization values lead to discrepant results. Bottom: 10-round averaging reduces differences across runs.
5. REFERENCES
[1] A. Buades, B. Coll, and J. M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Model Simul, vol. 4, pp. 490–530, 2005.
[2] G. J. M. Parker, J. A. Schnabel, M. R. Symms, D. J. Werring, and G. J. Barker, “Nonlinear smoothing for reduction of systematic and random errors in diffusion tensor imaging,” J Magn Reson Imag, vol. 11, no. 6, pp. 702–710, 2000.
[3] J. E. Lee, M. K. Chung, and A. L. Alexander, “Evaluation of anisotropic filters for diffusion tensor imaging,” in IEEE Symposium on Biomedical Imaging: Macro to Nano, 2006, pp. 77–
[4] T. McGraw, B. C. Vemuri, E. Ozarslan, Y. Chen, and T. Mareci, “Variational denoising of diffusion weighted MRI,” Inv Prob Imag, vol. 3, no. 3, pp. 625–649, 2009.
[5] Y. Kim, P. M. Thompson, A. W. Toga, L. Vese, and L. Zhan, “HARDI denoising: variational regularization of the spherical apparent diffusion coefficient sADC,” Inf Proc Med Imag, vol. 21, pp. 515–527, 2009.
[6] M. Descoteaux, N. Wiest-Daesslé, S. Prima, C. Barillot, and R. Deriche, “Impact of Rician adapted non-local means filtering on HARDI,” Med Imag Comput Comput Assist Interv, vol. 11, pp. 122–130, 2008.
[7] N. Wiest-Daesslé, S. Prima, P. Coupé, S. P. Morrissey, and C. Barillot, “Non-local means variants for denoising of diffusion-weighted and diffusion tensor MRI,” Med Imag Comput Comput Assist Interv, vol. 10, pp. 344–351, 2007.
[8] S. Aja-Fernandez, M. Niethammer, M. Kubicki, M. E. Shenton, and C.-F. Westin, “Restoration of DWI data using a Rician LMMSE estimator,” IEEE Trans Med Imag, vol. 27, no. 10, pp. 1389–1403, 2008.
[9] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans Image Proc, vol. 15, no. 12, pp. 3736–3745, 2006.
[10] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” Tech. Rep., CS Technion, 2008.
[11] A. Goh, C. Lenglet, P. M. Thompson, and R. Vidal, “A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI),” in Comp Vis Pat Recog, 2009, pp. 2496–2503.