# K-SVD for HARDI denoising

**Abstract**

Noise is an important concern in high-angular resolution diffusion imaging studies because it can lead to errors in downstream analyses of white matter structure. To address this issue, we investigate a new approach for denoising diffusion-weighted data sets based on the K-SVD algorithm. We analyze its characteristics using both simulated and biological data and compare its performance with existing methods. Our results show that K-SVD provides robust and effective noise reduction and is practical for use in high-volume applications.


Vishal Patel, Yonggang Shi, Paul M. Thompson, Arthur W. Toga

Laboratory of Neuro Imaging, University of California, Los Angeles


Index Terms: Magnetic resonance imaging, diffusion tensor imaging, noise reduction, algorithms, brain

1. INTRODUCTION

High-angular resolution diffusion imaging (HARDI) involves the analysis of multiple diffusion-weighted images (DWIs) to reconstruct complex white matter structure. As with all MR images, these DWIs are corrupted by noise from biological, electronic, and various other sources. This noise can lead to inaccurate DWI registration, erroneous orientation distribution function (ODF) estimation, and subsequent tractography errors. In this report, we consider HARDI denoising as an independent processing stage. This approach has the potential to improve not only ODF estimation and tractography, but also DWI registration, which is often important for population studies of white matter structure.

Many previously proposed methods for MR noise reduction extend approaches for 2-D image denoising, and [1] provides a comprehensive review of common techniques. Here, we briefly mention methods specifically directed at denoising DWIs for HARDI analysis. Some have examined anisotropic filtering for diffusion MRI using the Perona-Malik scheme [2] and anisotropic Gaussian kernels [3]. Other methods rely on total variation minimization: [4] proposed smoothing the spherical signal with a finite-element method prior to minimizing its 3-D total variation, while [5] presented a similar approach which operates on the spherical apparent diffusion coefficient. The popular non-local means algorithm has also been evaluated for this purpose in [6] and [7]. Still others have proposed restoring DWIs with a linear minimum mean square error (LMMSE) estimator based on a Rician noise model [8].

In this paper, we adapt a recently developed denoising algorithm, K-SVD [9], for the task of noise reduction in HARDI data. Below, we present its formulation and evaluate its performance relative to other denoising methods.

2. K-SVD FOR HARDI

The K-SVD algorithm was introduced by [9] as a method for sparse signal representation, a problem which has recently attracted much attention. A full review of this topic is beyond the scope of this report; instead, we provide here a focused description of the K-SVD algorithm with the specific intent of denoising HARDI data sets.

K-SVD is designed to seek an efficient decomposition of a set of signals into a sparse coding $X$ from a dictionary $D$. Given a HARDI data set comprising $M$ DWIs and $N$ voxels, we denote the $M \times N$ matrix of DWI signal attenuations as $Y$. For computational efficiency, we train our $K$-atom dictionary ($D \in \mathbb{R}^{M \times K}$) on a random sampling of $P < N$ in-brain voxels collected in the $M \times P$ training set $W$. Moreover, we desire that each of the $K$-length coding vectors in $X$ satisfies a sparsity threshold $T_0$:

$$\underset{D,\,X}{\arg\min}\ \|W - DX\|_F^2 \quad \text{s.t.} \quad \forall p,\ \|x_p\|_0 \le T_0 \tag{1}$$

In (1) and throughout this paper, we use lowercase symbols to represent vector components of the corresponding matrices, with subscripts and superscripts indicating column and row vectors, respectively. K-SVD optimizes $D$ and $X$ through a number of training iterations (in this work, we use $I = 40$ iterations, which we find empirically to be sufficient for solution convergence). Each iteration consists of a sparse coding stage that optimizes the coefficients in $X$ and a dictionary update stage that improves the atoms in $D$.

During the sparse coding stage, $D$ is held fixed while each coefficient vector $x_p$ is optimized through the minimization:

$$\underset{x_p}{\arg\min}\ \|w_p - D x_p\|_2^2 \quad \text{s.t.} \quad \|x_p\|_0 \le T_0 \tag{2}$$

Problems of the form (2) have been widely studied for “compressed sensing”; solution methods include basis pursuit, matching pursuit, FOCUSS, etc. In this work, we utilize an orthogonal matching pursuit variant, Batch-OMP [10], to solve (2) efficiently, but any suitable minimization technique can be substituted to compute $x_p$.
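As a rough illustration of the sparse coding stage, the following minimal sketch solves (2) for one signal with plain orthogonal matching pursuit in NumPy. It is not the Batch-OMP implementation of [10]; the function name `omp`, the tolerance `tol`, and the assumption that the atoms (columns of `D`) have unit norm are introduced here only for illustration.

```python
import numpy as np

def omp(D, w, T0, tol=1e-12):
    """Greedy OMP: approximate w using at most T0 atoms (columns) of D.

    Assumption: the columns of D are normalized to unit L2 norm.
    Returns the K-length coefficient vector x with at most T0 non-zeros.
    """
    x = np.zeros(D.shape[1])
    residual = np.asarray(w, dtype=float).copy()
    support = []          # indices of the atoms selected so far
    coef = np.zeros(0)
    for _ in range(T0):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0   # never reselect an atom
        k = int(np.argmax(corr))
        if corr[k] < tol:
            break                 # residual is already (numerically) zero
        support.append(k)
        # Least-squares refit of w on all selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], w, rcond=None)
        residual = w - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

# Usage: a 2-sparse signal over a trivial orthonormal dictionary.
D = np.eye(8)
w = 2.0 * D[:, 3] - 1.5 * D[:, 7]
x = omp(D, w, T0=2)
```

In this sketch the least-squares refit is recomputed from scratch at every step, which is simple but slow; Batch-OMP [10] avoids that cost when many signals are coded against the same dictionary.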

During the dictionary update stage, each atom $d_k$ is improved sequentially, along with the coding vectors in $X$ which utilize that atom. This update process is the key insight of K-SVD which accelerates the convergence of (1) while maintaining the sparsity requirement [9]. To optimally replace the $k$th atom, we first set $d_k = 0$ and compute the reconstruction error $\tilde{E}$ for those signals which use $d_k$ (i.e., those signals for which $x^k$ is non-zero). The ideal replacement atom and coding then satisfy $\tilde{E} = d_k x^k$; however, since the right-hand side is the product of two vectors, this reduces to finding the closest rank-1 approximation to $\tilde{E}$, which we obtain by truncating the singular value decomposition $\tilde{E} = U S V^T$. Atom $d_k$ is thus replaced by the first output basis vector $u_1$, while the non-zero values in $x^k$ are adjusted to the product of the first singular value and the first input basis vector, $s_{11} v_1$.

978-1-4244-4128-0/11/$25.00 ©2011 IEEE, ISBI 2011

Algorithm 1. K-SVD for HARDI (see text for variable definitions)

    Z ← 0                                            // initialize sparse-coded image
    for r = 1 to R do
        select the P columns of W randomly from Y    // training set
        select the K columns of D randomly from W    // initialize dictionary
        for i = 1 to I do                            // training iterations
            for p = 1 to P do                        // sparse coding stage
                x_p ← argmin_{x_p} ‖w_p − D x_p‖₂²  s.t.  ‖x_p‖₀ ≤ T₀
            end for
            for k = 1 to K do                        // dictionary update stage
                d_k ← 0
                E ← W − DX
                Ẽ ← {e_p | x^k(p) ≠ 0}               // columns which use d_k
                U, S, Vᵀ ← SVD(Ẽ)
                d_k ← u_1                            // updated atom
                x^k_{p | x^k(p) ≠ 0} ← s_11 v_1      // updated codings
            end for
        end for
        for n = 1 to N do                            // encode all voxels by OMP
            x_n ← argmin_{x_n} ‖y_n − D x_n‖₂²  s.t.  ‖x_n‖₀ ≤ T₀
            z_n ← z_n + D x_n / R
        end for
    end for
    Ĉ ← (λY + Z) / (λ + 1)
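The dictionary update for a single atom can be sketched in NumPy as follows. This is a minimal illustration of the rank-1 SVD replacement described in the text, not the authors' implementation; the function name `update_atom`, the in-place update convention, and the choice to leave unused atoms untouched are assumptions made for this example.

```python
import numpy as np

def update_atom(D, X, W, k):
    """Update atom k of D and the non-zero entries of row k of X, in place.

    D: (M, K) dictionary, X: (K, P) sparse codes, W: (M, P) training set.
    """
    users = np.flatnonzero(X[k, :])   # training signals that use atom k
    if users.size == 0:
        return                        # assumption: unused atoms are left as-is
    D[:, k] = 0.0                     # remove atom k's contribution
    # Reconstruction error restricted to the signals that use atom k.
    E = W[:, users] - D @ X[:, users]
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                 # updated atom (unit norm)
    X[k, users] = s[0] * Vt[0]        # updated codes; zero entries stay zero

# Usage on small random data (values are arbitrary).
rng = np.random.default_rng(0)
D = rng.standard_normal((6, 4))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((4, 20)) * (rng.random((4, 20)) < 0.4)
W = rng.standard_normal((6, 20))
update_atom(D, X, W, k=1)
```

Because only the entries of row `k` that were already non-zero are rewritten, the sparsity pattern produced by the sparse coding stage is preserved, and the Frobenius error in (1) cannot increase after the update.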

After $I$ iterations of sparse coding and dictionary update stages, $D$ is optimized to span the range of signals contained in the training set $W$. We then encode the entire volume $Y$ by Batch-OMP following (2) and recover the sparse-coded result $Z = DX$. In practice, for reasons discussed in Section 3.4, we repeat this entire process for $R$ rounds (we use $R = 10$ unless otherwise specified), and average the results in $Z$. The denoising problem can then be written as a simple quadratic minimization between a data-fidelity term and the sparse-coded result:

$$\hat{C} = \underset{C}{\arg\min}\ \lambda \|Y - C\|_2^2 + \|Z - C\|_2^2,$$

where the parameter $\lambda$ controls the relative weighting. This form ensures that any elements of $Y$ which cannot be well-represented by the $W$-trained dictionary are still reasonably preserved. Empirically, we find that with known or estimated SNR $\sigma$ for the raw data, $\lambda = 10^{0.1\sigma - 2}$ is a useful heuristic which increases the weight of the data-fidelity term as $Y$ becomes more reliable. The closed-form solution $\hat{C} = (\lambda Y + Z)/(\lambda + 1)$ gives the denoised image. The full procedure is summarized in Algorithm 1.
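For completeness, the closed-form solution follows directly from setting the gradient of the quadratic objective to zero. The intermediate symbol $J(C)$ is introduced here only to name the objective:

```latex
\begin{aligned}
J(C) &= \lambda \|Y - C\|_2^2 + \|Z - C\|_2^2 \\
\frac{\partial J}{\partial C} &= -2\lambda\,(Y - C) - 2\,(Z - C) = 0 \\
(\lambda + 1)\,C &= \lambda Y + Z \\
\hat{C} &= \frac{\lambda Y + Z}{\lambda + 1}
\end{aligned}
```

The result is a per-element weighted average of the raw and sparse-coded images: as $\lambda \to 0$ it tends to $Z$, as $\lambda \to \infty$ it tends to $Y$, and $\lambda = 1$ gives their plain mean.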

We note here a unique property of the K-SVD denoising method for HARDI: the algorithm is driven solely by the statistical properties of the training data. In contrast to most conventional denoising methods, it makes no implicit or explicit assumptions about voxel neighborhoods, spatial continuity, or gradients. Indeed, the method does not consider the physical positions or adjacency of the signals in $Y$ at all. The behavior of K-SVD can be understood intuitively by realizing that the sparse coding forces the signals into a smaller subspace so that they become more similar than their noisy variants. The image is thus denoised without imposing unnecessary smoothness constraints, so that true details are preserved. Note, though, that if neighborhood information is expected to improve denoising, it can be trivially incorporated into this process through a simple image tiling scheme [9].

Fig. 1. K-SVD parameter tuning on a simulated volume: $10^6$ voxels, 64 gradient directions, SNR = 10. K-SVD was performed over a range of values for dictionary size ($K$) and sparsity ($T_0$). Error between the denoised result and the ideal noise-free simulation is quantified as root-mean-square error over the 4-D image (left) and the Fisher-Rao measure between reconstructed ODFs (right).

3. EXPERIMENTS AND RESULTS

To understand the properties of K-SVD denoising, we evaluate the optimal parameter choices and perform quantitative and qualitative comparisons with other denoising approaches using both simulated and biological data sets.

3.1. Parameter Tuning

The main parameters to evaluate in the K-SVD process are the dictionary size $K$ and the sparsity limit $T_0$. We study the effect of varying these parameters using a simulation with $N = 10^6$ voxels and 64 evenly-distributed diffusion-weighting gradient directions $g_m$. The signal in each voxel arises from 1–3 randomly-oriented fibers simulated by the multi-tensor model: $S_m = S_0 \sum_{q \in [1,3]} e^{-b\, g_m^T D_q g_m} / q$, where $b = 1000\ \text{s/mm}^2$, and diffusion tensor $D_q$ has eigenvalues $\lambda_1 = 1.7 \times 10^{-3}$, $\lambda_2 = \lambda_3 = 0.2 \times 10^{-3}\ \text{mm}^2/\text{s}$ with the primary eigenvector directed along the $q$th fiber direction. We degrade this ideal data set with Rician noise to produce a simulated volume with SNR = 10, which we then denoise with the K-SVD procedure using $P = 2000$ voxels for training and a range of values for $K \in [10, 316]$ and $T_0 \in [1, 10]$. In Fig. 1, we show the error between the denoised reconstruction and the ideal, noise-free data set quantified as: 1) the root-mean-square error (RMSE) using the DWIs themselves, and 2) the mean Fisher-Rao metric [11] between unregularized ODFs estimated from these DWIs.

Fig. 2. Comparison of K-SVD with TV and NLM. Reconstructed ODFs are shown for a 4 × 4 region from a $10^6$-voxel digital phantom. Left-to-right: ground truth simulation, noise-corrupted simulation, and TV, NLM, and K-SVD denoising results. Below: Over all $10^6$ voxels, the mean Fisher-Rao (FR) distance between ground truth ODFs and those from the corresponding panel, and the computational run time (RT).
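The single-voxel simulation described in Section 3.1 can be sketched as below. This is an illustrative sketch under stated assumptions, not the code used for the experiments: it weights the fibers in a voxel equally, defines the noise standard deviation as $S_0$/SNR in each of the real and imaginary channels, and uses the hypothetical function names `multi_tensor_signal` and `add_rician_noise`. The gradient directions are supplied by the caller.

```python
import numpy as np

def multi_tensor_signal(gradients, fiber_dirs, b=1000.0,
                        lam1=1.7e-3, lam2=0.2e-3, S0=1.0):
    """Noise-free DWI signal for one voxel.

    gradients: (M, 3) array of unit gradient directions g_m.
    fiber_dirs: sequence of 1-3 direction vectors (one per fiber).
    Assumption: fibers contribute with equal weight.
    """
    gradients = np.asarray(gradients, dtype=float)
    signal = np.zeros(len(gradients))
    for v in fiber_dirs:
        v = np.asarray(v, dtype=float)
        v = v / np.linalg.norm(v)
        # Axially symmetric tensor with eigenvalues (lam1, lam2, lam2)
        # and primary eigenvector v.
        Dq = lam2 * np.eye(3) + (lam1 - lam2) * np.outer(v, v)
        # Quadratic form g_m^T Dq g_m for every gradient direction.
        quad = np.einsum("mi,ij,mj->m", gradients, Dq, gradients)
        signal += np.exp(-b * quad)
    return S0 * signal / len(fiber_dirs)

def add_rician_noise(signal, snr, rng, S0=1.0):
    """Magnitude of the signal after adding complex Gaussian noise.

    Assumption: sigma = S0 / snr in each channel.
    """
    sigma = S0 / snr
    real = signal + rng.normal(0.0, sigma, signal.shape)
    imag = rng.normal(0.0, sigma, signal.shape)
    return np.hypot(real, imag)

# Usage: one fiber along x, gradients along x and y, SNR = 10.
g = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
clean = multi_tensor_signal(g, [[1.0, 0.0, 0.0]])
noisy = add_rician_noise(clean, snr=10, rng=np.random.default_rng(0))
```

With a single fiber, the clean signal is $e^{-1.7}$ for a gradient parallel to the fiber and $e^{-0.2}$ for a perpendicular one, which is a quick sanity check on the tensor construction.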

Both error measures reveal several important properties of K-SVD. For very small dictionaries or low sparsity thresholds, the error between the denoised result and the ideal noise-free case increases, indicating that the parameters are too restrictive to permit effective coding of the full range of signals present in the test volume. The reconstruction error also increases if the dictionary size is made too large or the sparsity constraint is too lax, suggesting that expansive dictionaries or dense codings allow the K-SVD result to reproduce some of the noise in the input. Optimal denoising performance is obtained for intermediate values of dictionary size ($K \approx 20$–$100$) and sparsity ($T_0 \approx 2$–$5$), which permit enough entropy to capture the true signal variability, but not enough to reproduce most of the noise. We also note that over the full range of $K$ and $T_0$ we have examined, the error measures are lower than for the case in which no denoising is performed (not depicted: RMSE = 0.099, Fisher-Rao = 0.079), indicating that K-SVD is unlikely to have a detrimental effect across a broad range of parameter values.

3.2. Comparison with Other Denoising Methods

We next compare the performance of K-SVD with other denoising approaches which have recently been considered for use in HARDI: total variation (TV) and non-local means (NLM). Briefly, for TV, we minimize a functional involving the total variation of the 3-D spherical apparent diffusion coefficient as in [5]. For NLM, which replaces each voxel with a weighted average of itself and “similar” voxels in some search locality, we apply the method to each DWI independently, as found to be best by [7]. We adjust tuning parameters for both TV and NLM manually to obtain optimal denoising, and for K-SVD, we train for $R = 10$ rounds on $P = 2000$ voxels and take conservative estimates $K = 100$ and $T_0 = 4$ from Section 3.1.

We generate a simulated data set as in Section 3.1 with one difference: to ensure fair testing for TV and NLM, which rely on spatial information, our new digital phantom consists of large areas with smoothly-varying fiber orientation separated by sharp boundaries as might be encountered in biological data (e.g., Fig. 2, left). For quantitative comparison, we compute ODFs from the denoised DWIs, and as before, we use the Fisher-Rao distance between the recovered ODFs and the ground truth ODFs as an error measure. We also track the computation time required to denoise the simulated data set for each method using a single 2.4 GHz CPU.

Fig. 2 contains the results. At top, to illustrate the qualitative performance of the denoising methods, we show ODFs for a small edge-containing region of the simulated volume. Inspection reveals that the NLM and K-SVD results most closely match the ground truth, with the K-SVD ODFs being slightly more faithful. These observations are confirmed by the quantitative analysis: on average, ODFs reconstructed from the K-SVD denoised DWIs are closer (in a Riemannian sense) to the ground truth (mean Fisher-Rao distance = 0.028) than those from TV (0.069) or NLM (0.044). With respect to computational run time, we see that K-SVD is more than an order of magnitude faster than both TV (which requires an expensive gradient descent) and NLM (which has a well-known cost for computing window similarities). These results are for unoptimized implementations of the algorithms; the important conclusion is that K-SVD denoising is fast enough to be of practical use in high-volume applications.

3.3. Qualitative Results from Biological Data

We next verify these findings in a biological data set acquired from a healthy adult volunteer. Using a 4 T Bruker Medspec unit with a single-shot echo planar technique and twice-refocused spin echo sequence, we collected 94 DWIs with b-value 1159 s/mm$^2$ and 11 $b_0$ images. Image dimensions were 128 × 128 × 55 voxels, with voxel size 1.8 × 1.8 × 2.0 mm. Total acquisition time was 14.5 min.

Fig. 3 shows a randomly-selected directional DWI from the original noisy data set and denoised versions generated by TV, NLM, and K-SVD. Denoising parameters are the same as in Section 3.2. We observe that the K-SVD image appears more uniform than those obtained through TV and NLM. We also note that the K-SVD image reveals details not clearly distinguished by other methods (e.g., the cortical ribbon just anterior to the callosal genu). With the usual caveats regarding the lack of ground truth for biological data, Fig. 3 suggests that the performance of K-SVD on real human brain data is similar to that observed in our digital phantom experiments.

Fig. 3. Qualitative denoising comparison on biological data. Left to right: Original noisy image and denoising results generated by TV, NLM, and K-SVD for one DWI from a 94-direction acquisition.

3.4. Reproducibility

Finally, we address the non-deterministic nature of K-SVD and the need for multiple-round averaging. In the absence of relevant prior information, it makes most sense to initialize the training set $W$ and dictionary $D$ randomly from the data as indicated in Algorithm 1. Naturally, the resulting optimized dictionary and consequently the denoised result will depend somewhat on these choices. For single-round K-SVD, results for two separate denoising runs on the biological data set from Section 3.3 are not identical, as shown in the top row of Fig. 4. Though the discrepancy is small (comparing voxel intensities between runs, mean percent error = 2.95%), it may be desirable to minimize this behavior for certain applications. This can be achieved by employing the simple averaging method we have used throughout this report: the bottom row of Fig. 4 shows that averaging $Z$ for $R = 10$ rounds improves reproducibility (mean percent error = 1.24%).

4. CONCLUSIONS

We have presented a new method for HARDI denoising based on K-SVD and characterized its performance using both simulated and biological data. The results suggest that K-SVD outperforms existing denoising methods with respect to both recovered image quality and computational cost. We have also shown that the reproducibility of the method can be improved through multiple-round averaging. K-SVD thus provides a practical denoising solution with downstream benefits for ODF estimation and DWI registration. Future studies should investigate the potential for reusing dictionaries between data sets and the effects on fiber tractography and angular resolution.

Fig. 4. K-SVD reproducibility can be improved through multiple-round averaging. Top: Single-round runs of K-SVD are effective for denoising (cf. original image, Fig. 3), but random initialization values lead to discrepant results. Bottom: 10-round averaging reduces differences across runs.

5. REFERENCES

[1] A. Buades, B. Coll, and J. M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Model Simul, vol. 4, pp. 490–530, 2005.

[2] G. J. M. Parker, J. A. Schnabel, M. R. Symms, D. J. Werring, and G. J. Barker, “Nonlinear smoothing for reduction of systematic and random errors in diffusion tensor imaging,” J Magn Reson Imag, vol. 11, no. 6, pp. 702–710, 2000.

[3] J. E. Lee, M. K. Chung, and A. L. Alex, “Evaluation of anisotropic filters for diffusion tensor imaging,” in IEEE Symposium on Biomedical Imaging: Macro to Nano, 2006, pp. 77–78.

[4] T. McGraw, B. C. Vemuri, E. Ozarslan, Y. Chen, and T. Mareci, “Variational denoising of diffusion weighted MRI,” Inv Prob Imag, vol. 3, no. 3, pp. 625–649, 2009.

[5] Y. Kim, P. M. Thompson, A. W. Toga, L. Vese, and L. Zhan, “HARDI denoising: variational regularization of the spherical apparent diffusion coefficient sADC,” Inf Proc Med Imag, vol. 21, pp. 515–527, 2009.

[6] M. Descoteaux, N. Wiest-Daesslé, S. Prima, C. Barillot, and R. Deriche, “Impact of Rician adapted non-local means filtering on HARDI,” Med Imag Comput Comput Assist Interv, vol. 11, pp. 122–130, 2008.

[7] N. Wiest-Daesslé, S. Prima, P. Coupé, S. P. Morrissey, and C. Barillot, “Non-local means variants for denoising of diffusion-weighted and diffusion tensor MRI,” Med Imag Comput Comput Assist Interv, vol. 10, pp. 344–351, 2007.

[8] S. Aja-Fernandez, M. Niethammer, M. Kubicki, M. E. Shenton, and C.-F. Westin, “Restoration of DWI data using a Rician LMMSE estimator,” IEEE Trans Med Imag, vol. 27, no. 10, pp. 1389–1403, 2008.

[9] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans Image Proc, vol. 15, no. 12, pp. 3736–3745, 2006.

[10] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” Tech. Rep., CS Technion, 2008.

[11] A. Goh, C. Lenglet, P. M. Thompson, and R. Vidal, “A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI),” in Comp Vis Pat Recog, 2009, pp. 2496–2503.
