
OPTIMAL SENSOR CONFIGURATION IN

REMOTE IMAGE FORMATION

Behzad Sharif, Student Member, IEEE, and Farzad Kamalabadi, Member, IEEE

EDICS Category: GEO-SENS

Abstract

Determination of the optimal sensor configuration is an important issue in many remote imaging modalities such as tomographic and interferometric imaging. In this paper, a statistical optimality criterion is defined and a search is performed over the space of candidate sensor locations to determine the configuration that optimizes the criterion over all candidates. To make the search process computationally feasible, a modified version of a previously proposed suboptimal backward greedy algorithm is used. A statistical framework is developed that allows for the inclusion of several widely used image constraints. The computational complexity of the proposed algorithm is discussed and a fast implementation is described. Furthermore, upper bounds on the sum of squared errors of the proposed algorithm are derived. Connections of the method to the deterministic backward greedy algorithm for the subset selection problem are presented, and two application examples are described. Four compelling optimality criteria are considered and their performance is investigated through numerical experiments for a tomographic imaging scenario. In all cases, it is verified that the configuration chosen by the proposed algorithm performs better than wisely chosen alternatives.

I. INTRODUCTION

In many image formation scenarios involving multiple sensors, where the relationship between the set of observations and the unknown field can be adequately characterized by a linear observation model, the image reconstruction problem can be formulated by the Fredholm integral equation of the first kind [1]:

Y(r,s) = \iint_Ω a(r,s; r',s') X(r',s') dr' ds'    (1)

The authors are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: {sharif, farzadk}@uiuc.edu).


where a two-dimensional observation geometry is assumed with r and s denoting spatial variables, and Ω ⊂ R² is the region of support. Also, Y(r,s) and X(r,s) are the measured data and the unknown field, respectively. The observation kernel is denoted by a(r,s;r',s'). In practice, the observations are often a discrete sequence of measured data, {y_i}_{i=1}^{m}. Furthermore, for a nonanalytical solution, the unknown field X(r,s) must be discretized. In what follows, it is assumed that the unknown field can be sufficiently represented by a weighted sum of n basis functions {φ_j(r,s)}_{j=1}^{n} as follows:

X(r,s) = \sum_{j=1}^{n} x_j φ_j(r,s)    (2)

For instance, {φ_j(r,s)}_{j=1}^{n} are often chosen to be the set of unit-height boxes corresponding to a two-dimensional array of square pixels. In that case, if a g × f array of square pixels is used, then n = g·f and the discretized field is completely described by the set of coefficients {x_j}_{j=1}^{n}, corresponding to the pixel values. Collecting all the observations into a vector y of length m, and the unknown image coefficients into a vector x of length n, results in the following observation model in the form of a matrix equation:

y = Ax + w    (3)

where w is the additive measurement noise, assumed to be independent of x, and A ∈ C^{m×n} with m ≥ n is the observation matrix, comprised of inner products of the basis functions with the corresponding observation kernels:

(A)_{ij} = \iint_Ω a_i(r',s') φ_j(r',s') dr' ds',   1 ≤ i ≤ m, 1 ≤ j ≤ n    (4)

where a_i(r',s') = a(r_i,s_i;r',s') denotes the kernel function corresponding to the i-th observation. In the case of image reconstruction from projection data, a ray is defined as a line running through the image plane in the absence of diffraction and scattering effects. Let y_i be the line integral measured by the i-th ray. Then (A)_{ij} is the weighting factor that represents the contribution of the j-th cell to the i-th ray integral.

It is known that the placement of sensors affects the information content of the dataset and has implications for the reconstruction quality. In [2], [3], this relationship is studied by considering the distribution of the singular values of A. In this paper, we approach the problem by developing a statistical framework for finding the optimal sensor configuration in remote imaging. Related works include [4] for optimal sampling in parallel magnetic resonance imaging and [5] for finding the optimal dithering pattern in a rectangular-grid array utilized for image formation from periodic nonuniform samples.

The task is to search over a set of candidate locations for sensors to find the optimal or close-to-optimal subset. Initially, it is assumed that there is a sensor located at each candidate location. It is assumed that there are p candidate locations and each sensor takes d measurements. Since there are possibly hundreds of candidate locations, the initial observation matrix, denoted by A_0 ∈ C^{m×n} where m = p·d, is typically very large. Selecting a subset of candidate locations is equivalent to choosing a subset of rows of A_0. An optimality criterion is needed for the choice of rows, and subsequently the resulting combinatorial optimization problem must be tackled. The optimality criterion is designed such that it only depends on the statistics of x and w and the observation kernel. Assuming the statistics of the problem are invariant under different choices of sensor configuration, the cost function will only be a function of the observation kernel. Denoting the set of all candidate locations by S and the observation matrix corresponding to U ⊆ S by A(U), the following optimization problem is reached:

S* = arg min_{U ⊆ S, |U| = q} Cost(A(U))    (5)

where |·| denotes the cardinality, q is the desired number of sensors, and Cost(A(U)) is the cost of choosing subset U as the locations for the q sensors. The best subset of rows of A_0 can be found by exhaustive search over all \binom{p}{q} possible combinations of rows. But this grows exponentially in p and is impractical even for a moderate number of candidate locations. It is possible to exploit the structure of the problem to reduce the size of the search space by means of Branch and Bound-type algorithms [6]. Again, for practical situations, even the restricted search space of Branch and Bound methods becomes too large to handle. In order to avoid the exhaustive search, one has to resort to suboptimal heuristic techniques [7]. In general, in all subset selection problems of this type there is an inherent complexity/performance trade-off [8].

In order to make the search computationally feasible, a modified version of the sequential backward selection (SBS) algorithm [9] is used. The original SBS algorithm eliminates the least important row at each step and stops when the desired number of rows remain. Although the approach is suboptimal, it eliminates the exhaustive search required for optimally solving the subset selection problem and has performed consistently well in all the examples tried in the existing literature [6], [9]. Note that our problem is slightly different from the one that SBS aims to solve: each candidate sensor location contributes several rows to the A_0 matrix. Hence, instead of considering each row of A_0 independently and deciding whether to eliminate it (as done in SBS), the group of rows contributed by each sensor is considered at each step. At every iteration, the cost of removing each group of rows is calculated, and the group that incurs the least increase in the cost is eliminated together with its corresponding sensor. The algorithm stops when the number of sensors has been reduced to the predesigned value. The proposed algorithm is referred to as the clustered SBS (CSBS) algorithm.
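As a concrete illustration, the clustered elimination loop described above can be sketched in a few lines (a minimal NumPy sketch, not the authors' implementation; the SSE cost tr(Σ_e) of Section III is used, and the problem sizes, the identity prior, and the row groupings are made-up assumptions):

```python
import numpy as np

def sse_cost(A, noise_var, prior_prec):
    """SSE criterion: trace of the error covariance (A^H Sw^-1 A + Sx^-1)^-1."""
    info = (A.conj().T @ A) / noise_var + prior_prec
    return np.trace(np.linalg.inv(info)).real

def csbs(A0, groups, q, noise_var=1.0, prior_prec=None):
    """Clustered sequential backward selection: repeatedly remove the sensor
    (group of rows of A0) whose removal least increases the SSE cost,
    until q sensors remain.  groups[k] holds the row indices of sensor k."""
    n = A0.shape[1]
    if prior_prec is None:
        prior_prec = np.eye(n)              # illustrative prior: Sigma_x = I
    remaining = list(groups)
    while len(remaining) > q:
        def cost_without(k):
            keep = np.concatenate([groups[j] for j in remaining if j != k])
            return sse_cost(A0[keep], noise_var, prior_prec)
        remaining.remove(min(remaining, key=cost_without))
    return remaining

# toy instance: p = 6 candidate sensors, d = 3 rows each, n = 8 unknowns
rng = np.random.default_rng(0)
p, d, n = 6, 3, 8
A0 = rng.standard_normal((p * d, n))
groups = {k: np.arange(k * d, (k + 1) * d) for k in range(p)}
chosen = csbs(A0, groups, q=4)
print(sorted(chosen))
```

Each pass of the while-loop recomputes the cost from scratch; the fast implementation of Section IV-A avoids exactly this recomputation.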


The paper is organized as follows. Section II develops a statistical framework which will be used in subsequent sections to formulate the problem. Section III introduces several optimality criteria. In Section IV, we formally introduce the CSBS algorithm and elaborate on its complexity, performance, and optimality. In Section V, two important imaging modalities are described as examples that motivate the application of the proposed method. Section VI contains the simulation results. Finally, Section VII concludes the paper.

II. STATISTICAL FORMULATION

In a typical remote image formation application, due to limitations in the observation geometry, the corresponding observation matrix is highly ill-conditioned and the resulting inverse problem formulated in (3) is ill-posed. Tikhonov (quadratic) regularization [10] is one of the more common techniques used to overcome this issue and is equivalent to maximum a posteriori (MAP) estimation assuming Gaussian statistics for both the unknown image and noise [11]. Assuming w ∼ CN(0, Σ_w) and x ∼ CN(x_0, Σ_x), where CN(µ, Σ) represents the complex normal distribution with mean µ and covariance Σ, the MAP estimate is

x̂_MAP = arg min_{x ∈ C^n} [ −log p(y|x) − log p(x) ]
       = arg min_{x ∈ C^n} [ ‖y − Ax‖²_{Σ_w^{-1}} + ‖x − x_0‖²_{Σ_x^{-1}} ]    (6)
       = x_0 + (A^H Σ_w^{-1} A + Σ_x^{-1})^{-1} A^H Σ_w^{-1} (y − A x_0)

Assuming independent identically distributed (IID) Gaussian noise and taking Σ_x = (1/γ²)(L^T L)^{-1}, we arrive at the well-known Tikhonov regularization functional:

x̂_Tik = arg min_{x ∈ C^n} [ (1/σ_w²) ‖y − Ax‖²₂ + γ² ‖L(x − x_0)‖²₂ ]
       = arg min_{x ∈ C^n} [ ‖y − Ax‖²₂ + λ ‖L(x − x_0)‖²₂ ]    (7)
       = x_0 + ( (1/σ_w²) A^H A + γ² L^T L )^{-1} (1/σ_w²) A^H (y − A x_0)

where L is the positive definite regularization matrix and λ = (γσ_w)², with σ_w² the variance of the noise samples. A special case is L = I, which results in λ being the inverse of the signal-to-noise ratio. Although we assumed IID noise, more general forms of noise covariance have been applied in remote sensing applications [12], [13].
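The closed-form solution in (7) can be checked numerically by verifying that it minimizes the Tikhonov functional (a small real-valued NumPy sketch; the sizes, noise level, and L = I are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 10
sigma_w, gam = 0.1, 2.0          # noise std and prior precision scale
A = rng.standard_normal((m, n))
L = np.eye(n)                    # the L = I special case of (7)
x0 = np.zeros(n)
x_true = rng.standard_normal(n)
y = A @ x_true + sigma_w * rng.standard_normal(m)

# closed form of (7): x0 + ((1/s^2) A^T A + g^2 L^T L)^-1 (1/s^2) A^T (y - A x0)
info = (A.T @ A) / sigma_w**2 + gam**2 * (L.T @ L)
x_hat = x0 + np.linalg.solve(info, (A.T @ (y - A @ x0)) / sigma_w**2)

def tik_cost(x):
    """The Tikhonov functional being minimized in (7) (first form)."""
    return (np.sum((y - A @ x)**2) / sigma_w**2
            + gam**2 * np.sum((L @ (x - x0))**2))

print(tik_cost(x_hat))
```

Since the functional is strictly convex, the cost at x_hat should be no larger than at any perturbed point.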


The statistical framework allows for a closed-form measure of estimation uncertainty through the error covariance matrix Σ_e = E[ee^H], where e = x − x̂_Tik. For the MAP estimate, the error covariance is the inverse of the corresponding Fisher information matrix and is given by

Σ_e = (A^H Σ_w^{-1} A + Σ_x^{-1})^{-1}    (8)

where the expected squared error for the i-th element of x is the (i,i)-th element of Σ_e, denoted by (Σ_e)_{ii}. Consequently, the error covariance for Tikhonov regularized reconstruction is Σ_e = ( (1/σ_w²) A^H A + γ² L^T L )^{-1}. It should be noted that with no assumption on the distribution of the image (i.e., non-Gaussian statistics), the estimator in (7) is the linear minimum mean square error (LMMSE) estimator for x, which minimizes E[‖x − x̂‖²₂]. Therefore, the results of this paper will in general be valid in the context of LMMSE estimation.
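Under the stated Gaussian assumptions, the general error covariance (8) and its Tikhonov specialization coincide, which is easy to confirm numerically (a NumPy sketch with illustrative sizes, Σ_w = σ²I and Σ_x = (1/γ²)(LᵀL)⁻¹):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 15, 6
sigma_w, gam = 0.5, 1.5
A = rng.standard_normal((m, n))
L = np.eye(n) + np.diag(0.1 * np.ones(n - 1), 1)   # an arbitrary invertible L

Sigma_w = sigma_w**2 * np.eye(m)                   # IID noise covariance
Sigma_x = np.linalg.inv(L.T @ L) / gam**2          # prior covariance from (7)

# general form (8) and its Tikhonov specialization
Sigma_e_map = np.linalg.inv(A.T @ np.linalg.inv(Sigma_w) @ A
                            + np.linalg.inv(Sigma_x))
Sigma_e_tik = np.linalg.inv((A.T @ A) / sigma_w**2 + gam**2 * (L.T @ L))

per_pixel_error = np.diag(Sigma_e_map)   # expected squared error per pixel
print(per_pixel_error.sum())             # the SSE criterion of Section III
```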

As we will see in the next section, the selection of optimality criteria requires considering costs that are functions of Σ_e. As shown in (8), this in turn requires estimating Σ_x, or equivalently the regularization matrix L. In what follows, we show how to incorporate four different forms of constraints into the statistical framework established in this section in order to estimate Σ_x.

1) Smoothness Constraint: The L matrix in (7) is typically taken to be a discrete approximation to the gradient operator in order to formulate smoothness of the unknown image. The discrete approximation to the first-order horizontal derivative has the following form:

    ⎡ 1  −1   0  ···   0 ⎤
L = ⎢ 0   1  −1  ···   0 ⎥    (9)
    ⎢ ⋮        ⋱   ⋱   ⋮ ⎥
    ⎣ 0  ···   0   1  −1 ⎦

Using this regularization matrix is equivalent to assuming x to be a Brownian motion [11]. It is possible to use higher order derivatives or to adapt these matrices to work on vertical neighbors [14]. An alternative class of regularization matrices are discretizations of the two-dimensional Laplacian operator, namely the Fried and Hudgin discrete Laplacians [15], [16]. In some applications, multiple regularization matrices are used. One example is when it is desired to weight smoothness in the horizontal direction differently from the vertical direction:

x̂ = arg min_{x ∈ R^n} [ ‖y − Ax‖²₂ + λ_h ‖L_h x‖²₂ + λ_v ‖L_v x‖²₂ ]    (10)

This can be written in the form of (7) if L^T L = λ_h L_h^T L_h + λ_v L_v^T L_v (assuming λ = 1 and x_0 = 0). Since the right-hand side is symmetric positive definite, the equivalent L matrix exists and is equal to the positive definite square root of the right-hand side.
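The combination step at the end of item 1 amounts to taking a positive definite matrix square root, which can be sketched as follows (NumPy; the first-difference L_h, the stand-in L_v, and the weights are illustrative assumptions):

```python
import numpy as np

def first_diff(n):
    """(n-1) x n first-order difference matrix as in (9)."""
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = 1.0, -1.0
    return D

n = 8
Lh = first_diff(n)                  # horizontal first-difference operator
Lv = np.eye(n)                      # stand-in for a vertical-difference operator
lam_h, lam_v = 2.0, 0.5

# equivalent single regularizer: L^T L = lam_h Lh^T Lh + lam_v Lv^T Lv
M = lam_h * (Lh.T @ Lh) + lam_v * (Lv.T @ Lv)

# positive definite square root via the eigendecomposition of M
w, V = np.linalg.eigh(M)
L_eq = V @ np.diag(np.sqrt(w)) @ V.T
print(np.allclose(L_eq @ L_eq, M))
```

Note that Lh^T Lh alone is singular; the lam_v term is what makes M strictly positive definite here, so the square root is well defined.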


2) Support Constraint: On some occasions, the unknown image x ∈ C^n is expected to vanish outside a region H. It is possible to design a matrix B such that:

h(x) = \iint_{(r,s) ∈ H} |X(r,s)|² dr ds ≈ ‖Bx‖²₂    (11)

One such construction is a diagonal matrix with (B)_{ii} being one if x_i lies inside H and zero otherwise. Having B, one can formulate the support constraint by adding the term λ_s h(x) to penalize the functional in (7). As described in item 1, the added penalty term can be combined with the regularization term to give a new regularization matrix equal to the positive definite square root of λL^T L + λ_s B^T B (assuming x_0 = 0).

3) Reference Constraint: In some applications, it is known that x ∈ C where

C = {x' ∈ C^n : ‖x' − x_0‖₂ < ρ}    (12)

for some reference image x_0 and radius ρ > 0. This is a convex constraint and is referred to as the reference (prototype) image constraint [17]. If no knowledge of the statistics of x exists, using the maximum likelihood framework together with the reference constraint leads to a constrained maximum likelihood (CML) estimation problem [18]. The CML estimation is equivalent to an unconstrained MAP estimation which, assuming Gaussian noise, reduces to Tikhonov regularization:

x̂_CML = arg min_{x ∈ C} [ −log p(y|x) ]
       = arg min_{x ∈ C^n} [ −log p(y|x) + ν(ρ,y) ‖x − x_0‖²₂ ]
       = arg min_{x ∈ C^n} [ ‖y − Ax‖²₂ + ν(ρ,y) ‖x − x_0‖²₂ ]

where ν(ρ,y) ∈ R⁺ is the associated Lagrange multiplier (see Theorem 1, p. 217 of [19]). As can be seen, the CML estimation has the same form as the Tikhonov functional with L = I_{n×n} and λ = ν(ρ,y). Hence (8) applies here as well.

4) Energy Constraint: The energy constraint is a special case of the reference constraint with x_0 = 0 and ρ = √E_0, where E_0 is the energy of the image. Hence, all the results in item 3 apply.

III. OPTIMALITY CRITERIA

In this section, based on the framework developed in the last section, four compelling choices for the Cost(A) function in (5) are introduced:

1) Sum of Squared Errors (SSE): The expected value of the squared estimation error is given by

Cost_1(A) = E[e^H e] = tr(Σ_e) = \sum_{i=1}^{n} (Σ_e)_{ii}    (13)

It is worth mentioning that if no assumptions about the statistics of the signal are made and only the noise is assumed to be white Gaussian, the SSE criterion takes the form tr{(A^H A)^{-1}}, where the inverse operator is usually replaced by the pseudo-inverse to avoid instability due to the ill-posedness of the problem [6].

2) Weighted Sum of Squared Errors (WSSE): In some applications it is desirable to have a small error in one specific area of the reconstructed image while larger errors can be tolerated in other areas. A natural approach to designing the cost function for this setting is to weight Cost_1(A) as follows:

Cost_2(A) = \sum_{i=1}^{n} W_i (Σ_e)_{ii}    (14)

where W_i ∈ R⁺ are the weighting coefficients.

3) Uniformity of Squared Errors (USE): In some applications, if the location of the feature of interest is not known beforehand, it is desirable to place the sensors such that the error is distributed evenly over all pixels. Intuitively, this will minimize the cost of the worst-case scenario. This goal implies a different cost function:

Cost_3(A) = STD{ (Σ_e)_{ii} }_{i=1}^{n}    (15)

where STD{c_i}_{i=1}^{n} = sqrt( (1/n) \sum_{i=1}^{n} ( c_i − (1/n) \sum_{j=1}^{n} c_j )² ).

4) Detection Performance: In particular applications, the goal is to reliably detect the presence of a

feature in the image rather than accurate reconstruction of the image. One example is the detection of the presence of a plasma depletion in tomographic imaging of the ionosphere. In [4], the authors formulate the problem as a binary hypothesis testing problem as follows. Assume the problem is to decide whether a feature b is present in the background x. Both the feature and the background are unknown and are modeled as uncorrelated Gaussian random vectors distributed as CN(b_0, Σ_b) and CN(x_0, Σ_x), respectively. Also, the noise vector w is Gaussian distributed as CN(0, Σ_w) and is independent of both b and x. Using the imaging equation (3), the detection problem can be modeled as the following binary hypothesis testing problem:

H_0: y = Ax + w
H_1: y = A(x + b) + w

Assuming A ∈ R^{m×n}, the observation vector y is also a real Gaussian random vector. The covariance matrix of y under hypothesis H_0 is

Σ_{y|H_0} = A Σ_x A^H + Σ_w    (16)

and under H_1 is

Σ_{y|H_1} = A (Σ_x + Σ_b) A^H + Σ_w    (17)

The optimal choice of receiver configuration would be the one that minimizes the probability of error for the above hypothesis testing problem. Unfortunately, the corresponding optimization problem is analytically intractable. One way of tackling this issue is to maximize the divergence between the observation distributions under H_0 and H_1. In [4], the use of the Bhattacharyya distance [20] for this purpose is proposed, as given by

Bhat(A) = (1/8)(m_1 − m_0)^H Σ̄^{-1} (m_1 − m_0) + (1/2) ln( |Σ̄| / sqrt(|Σ_{y|H_0}| |Σ_{y|H_1}|) )    (18)

where |·| denotes the determinant operator, m_i = E[y|H_i], and Σ̄ = (1/2)(Σ_{y|H_0} + Σ_{y|H_1}). For the above setting, m_0 = A x_0 and m_1 = A(x_0 + b_0). However, in most remote imaging applications the matrices are very large and computation of the determinant in (18) is not practical due to limited numerical precision. We propose using the so-called J-divergence [20], which does not include any determinant terms and is defined as

Jdiv(A) = (1/2) tr{ (Σ_{y|H_1} − Σ_{y|H_0}) (Σ_{y|H_0}^{-1} − Σ_{y|H_1}^{-1}) } + (1/2) tr{ (Σ_{y|H_1}^{-1} + Σ_{y|H_0}^{-1}) (m_1 − m_0)(m_1 − m_0)^H }    (19)

Therefore, the cost function corresponding to the detection performance criterion is

Cost_4(A) = Jdiv(A)    (20)
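For reference, all four criteria can be evaluated directly from the error covariance and the hypothesis covariances (a NumPy sketch with real-valued, made-up matrices; x_0 = 0 is assumed in the J-divergence mean term):

```python
import numpy as np

def error_cov(A, Sigma_w, Sigma_x):
    """Error covariance (8)."""
    return np.linalg.inv(A.T @ np.linalg.inv(Sigma_w) @ A
                         + np.linalg.inv(Sigma_x))

def cost_sse(Se):                       # (13)
    return np.trace(Se)

def cost_wsse(Se, W):                   # (14)
    return np.sum(W * np.diag(Se))

def cost_use(Se):                       # (15): spread of the per-pixel errors
    d = np.diag(Se)
    return np.sqrt(np.mean((d - d.mean())**2))

def cost_jdiv(A, Sigma_w, Sigma_x, Sigma_b, b0):   # (19), with x0 = 0
    S0 = A @ Sigma_x @ A.T + Sigma_w               # (16)
    S1 = A @ (Sigma_x + Sigma_b) @ A.T + Sigma_w   # (17)
    S0i, S1i = np.linalg.inv(S0), np.linalg.inv(S1)
    dm = A @ b0                                    # m1 - m0
    return (0.5 * np.trace((S1 - S0) @ (S0i - S1i))
            + 0.5 * dm @ (S0i + S1i) @ dm)

rng = np.random.default_rng(3)
m, n = 12, 5
A = rng.standard_normal((m, n))
Sigma_w, Sigma_x = 0.1 * np.eye(m), np.eye(n)
Se = error_cov(A, Sigma_w, Sigma_x)
print(cost_sse(Se), cost_use(Se))
```

With unit weights, the WSSE reduces to the SSE, and the J-divergence is nonnegative by construction.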

IV. THE CSBS ALGORITHM

Denote by Γ the set of available sensor locations. Initially, Γ contains all of the candidate sensor locations, i.e., A(Γ) = A_0. The CSBS algorithm can be formally written as:

Γ ← Γ \ {k*} :  k* = arg min_{k ∈ Γ} Cost(A(Γ \ {k}))    (21)

with the stopping criterion being |Γ| = q.

In Subsection IV-A, we first analyze the computational complexity of the general CSBS algorithm. Next, based on the results in [9], we introduce a fast implementation of the CSBS algorithm with the SSE criterion. In Subsection IV-B, we develop upper bounds for the sum of squared errors of the CSBS algorithm. This provides a measure of performance guarantee and insight into the behavior of the proposed algorithm.


Finally in Subsection IV-C, we establish a connection between CSBS and the deterministic backward

greedy algorithm and suggest a conjecture on the optimality of the SBS and CSBS algorithms.

A. Computational Complexity and Fast Implementation

In the general CSBS algorithm stated in (21), there can be at most p − 1 iterations (because q > 0), and at the i-th iteration we need to compute the cost function for the existing p − i + 1 sensors. Therefore, in the worst case we need to compute the cost function \sum_{i=2}^{p} i = p(p+1)/2 − 1 times. Consider the SSE criterion. We need to compute the error covariance matrix in (8). Assuming that the constant matrices are computed and stored beforehand, straightforward computation of A^H Σ_w^{-1} A is of O(mn² + nm²), and we need an additional O(n³) for inversion of the sum. Having m ≥ n, the complexity of computing the cost function is O(nm²). Therefore, the overall complexity is O(nm²p²). The same result holds for the other two criteria.

Using the Sherman-Morrison matrix inversion formula [21], Reeves et al. in [9] have developed an efficient implementation of the SBS algorithm. Here, we first restate Reeves' method for SBS and, based on that result, improve upon the computational complexity of the CSBS algorithm in the case of the SSE criterion. If we eliminate the i-th row of A, denoted by a_i, the SSE corresponding to the modified matrix is given by [9]

Cost_1(A) + ( a_i Σ_e² a_i^H ) / ( 1 − a_i Σ_e a_i^H )

where Σ_e is as given in (8). Therefore, SBS can be implemented by choosing the row that minimizes the second term of the right-hand side. Based on this fact, a fast implementation of CSBS can be derived as follows:

Γ ← Γ \ {k*} :  k* = arg min_{k ∈ Γ} \sum_{i ∈ Π_k} ( a_i Σ_e² a_i^H ) / ( 1 − a_i Σ_e a_i^H )    (22)

where Π_k is the set of indices of rows measured by the k-th sensor and Σ_e = (A(Γ)^H Σ_w^{-1} A(Γ) + Σ_x^{-1})^{-1}. Note that at each iteration of the algorithm in (22), we only need to compute and store Σ_e once, which has complexity O(nm²) as discussed above. For each iteration, i.e., the elimination of one sensor, the summands in (22) can be computed in O(n²p). So each iteration is of O(nm² + n²p), which is of O(nm²) since m ≥ n and m ≥ p. This results in an overall complexity of O(nm²p). Therefore, we have reduced the complexity by a factor of p, which can be significant in practical applications.
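The selection rule (22) can be sketched as follows (a NumPy sketch assuming IID unit-variance noise so that no whitening of the rows is needed; the sizes and sensor groupings are made up):

```python
import numpy as np

def csbs_fast_step(A, groups, Sigma_x_inv):
    """One CSBS elimination using the Sherman-Morrison-based scores of (22):
    returns the sensor whose group of rows has the smallest summed score."""
    Se = np.linalg.inv(A.T @ A + Sigma_x_inv)   # error covariance (8), Sw = I
    Se2 = Se @ Se
    def score(rows):
        s = 0.0
        for i in rows:
            a = A[i]
            s += (a @ Se2 @ a) / (1.0 - a @ Se @ a)
        return s
    return min(groups, key=lambda k: score(groups[k]))

rng = np.random.default_rng(4)
p, d, n = 5, 2, 6
A = rng.standard_normal((p * d, n))
groups = {k: range(k * d, (k + 1) * d) for k in range(p)}
k_star = csbs_fast_step(A, groups, np.eye(n))
print(k_star)
```

As in (22), the group score is the sum of single-row increments; for a sensor with a single row (d = 1) the score is exact, so the choice coincides with a brute-force evaluation of the SSE after each removal.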


B. Upper bounds on performance of CSBS

The CSBS algorithm is greedy and therefore suboptimal. It is known that a greedy observation selection algorithm can give the worst possible combination of rows for some applications [22]. In this section, we provide an upper bound for the SSE of the CSBS algorithm (with the SSE criterion) that is valid under a certain condition. If the condition is met, the upper bound provides a performance guarantee for the CSBS algorithm. Throughout this section, the A matrices involved are the initial matrix A_0, but for notational simplicity we drop the subscript. To proceed, we need the following result from [9].

Lemma 1 (Upper Bound for SBS): If the initial number of rows is d and the final number after applying SBS is k, the final SSE is bounded above by

[ ( d − n + 1 + tr{(Á^H Á + Ḱ)^{-1} Ḱ} ) / ( k − n + 1 + tr{(Á^H Á + Ḱ)^{-1} Ḱ} ) ] tr{(Á^H Á + Ḱ)^{-1}}

where Á ∈ C^{d×n} is the initial matrix.

Proof: See Theorem 1 in [9] for the proof.

Since Σ_w^{-1} is positive definite Hermitian, it has a positive definite square root, namely √(Σ_w^{-1}). With Ḱ = Σ_x^{-1} and Á = √(Σ_w^{-1}) A, Lemma 1 provides a performance guarantee for the SBS algorithm [9]. Considering the CSBS algorithm and defining Ã = √(Σ_w^{-1}) A, the error covariance in (8) can be written as:

Σ_e = (Ã^H Ã + Σ_x^{-1})^{-1}    (23)

Denote by A_i the matrix formed by all rows of A with indices in Π_i. In other words, A_i ∈ C^{d×n} is the collection of all measurements by the i-th sensor. According to (23), the SSE can be written as:

Cost_1(A) = tr{(Ã^H Ã + K)^{-1}} = tr{(Ã_i^H Ã_i + P + K)^{-1}}

where Ã_i = √(Σ_w^{-1}) A_i, P = \sum_{j≠i} Ã_j^H Ã_j, and K = Σ_x^{-1}.

Now we are in a position to obtain a closed-form expression for the SSE after the i-th sensor is removed. By assigning Á = Ã_i and Ḱ = P + K, Lemma 1 gives the upper bound for the SSE if d − k rows of Ã_i are removed. But eliminating a sensor means that all the corresponding rows should be removed. Therefore, k is zero, which gives the following SSE bound for a fixed i:

[ ( d − n + 1 + tr{(Ã^H Ã + K)^{-1}(P + K)} ) / ( −n + 1 + tr{(Ã^H Ã + K)^{-1}(P + K)} ) ] tr{(Ã^H Ã + K)^{-1}}    (24)


where we used Ã_i^H Ã_i + P = \sum_{j=1}^{p} Ã_j^H Ã_j = Ã^H Ã. The expression in (24) is in effect an upper bound for tr{(P + K)^{-1}}, since no rows of Ã_i are left after removing sensor i. Noting that

tr{(Ã^H Ã + K)^{-1}(P + K)} = tr{(Ã^H Ã + K)^{-1}(Ã^H Ã + K − Ã_i^H Ã_i)} = n − tr{Ã_i^H Ã_i (Ã^H Ã + K)^{-1}}

we can write (24) as

[ ( d + 1 − tr{Ã_i^H Ã_i (Ã^H Ã + K)^{-1}} ) / ( 1 − tr{Ã_i^H Ã_i (Ã^H Ã + K)^{-1}} ) ] tr{(Ã^H Ã + K)^{-1}}    (25)

It should be noted that the bound in Lemma 1 is valid only if the denominator is positive. In fact, considering all 1 ≤ i ≤ p, the expression in (25) is a valid upper bound for the SSE if

tr{Ã_i^H Ã_i (Ã^H Ã + K)^{-1}} < 1  for 1 ≤ i ≤ p    (C.1)

The following lemma provides an easily verifiable sufficient condition under which (C.1) holds.

Lemma 2: With Σ_x = (1/γ²)(L^T L)^{-1}, a sufficient condition for (C.1) to be satisfied is

γ² ≥ max_{1 ≤ i ≤ p} tr{Ã_i^H Ã_i (L^T L)^{-1}}    (26)

and, assuming IID Gaussian noise, (26) is equivalent to

λ ≥ max_{1 ≤ i ≤ p} tr{A_i^H A_i (L^T L)^{-1}}    (27)

Proof: The proof is provided in Appendix I.

Therefore, if the reconstruction is regularized enough, then (C.1) holds. Under (C.1), the main result of this subsection, stated in the following theorem, provides an upper bound (independent of i) for (25).

Theorem 1 (Upper Bound for CSBS): With A ∈ C^{p·d×n} as the initial matrix, the SSE after removing one sensor using the CSBS algorithm (with the SSE criterion) is bounded above by

[ ( (d+1)p − n + tr{(Ã^H Ã + K)^{-1} K} ) / ( p − n + tr{(Ã^H Ã + K)^{-1} K} ) ] tr{(Ã^H Ã + K)^{-1}} = [ ( (d+1)p − n + tr{Σ_x^{-1} Σ_e} ) / ( p − n + tr{Σ_x^{-1} Σ_e} ) ] tr{Σ_e}

provided that (C.1) holds.

Proof: The proof is by contradiction. Assume that for all 1 ≤ i ≤ p, the bound on the SSE after removing one sensor (stated in (25)) is larger than the bound in Theorem 1. We will show that this results in a contradiction. Therefore, there exists at least one 1 ≤ i ≤ p that violates the assumption. Since CSBS picks the best i (in the SSE sense), it has to violate the assumption, which means that it will satisfy the bound in Theorem 1. To proceed, we assume that for all 1 ≤ i ≤ p,

[ ( d + 1 − tr{Ã_i^H Ã_i (Ã^H Ã + K)^{-1}} ) / ( 1 − tr{Ã_i^H Ã_i (Ã^H Ã + K)^{-1}} ) ] tr{(Ã^H Ã + K)^{-1}}    (28)

is larger than

[ ( (d+1)p − n + tr{(Ã^H Ã + K)^{-1} K} ) / ( p − n + tr{(Ã^H Ã + K)^{-1} K} ) ] tr{(Ã^H Ã + K)^{-1}}    (29)

The common term in the expressions above is positive since it is an SSE. The numerator and denominator of (28) are both positive because of (C.1). The numerator in (29) is positive since d·p = m, which was assumed to be larger than n, and the matrix inside the trace is positive definite since it is a product of two square positive definite matrices. As stated and proved in Appendix II (Lemma 3), the denominator of (29) is also positive under (C.1). Therefore, eliminating the common term and rearranging gives:

( p − n + tr{GK} )( d + 1 − tr{Ã_i^H Ã_i G} ) > ( 1 − tr{Ã_i^H Ã_i G} )( (d+1)p − n + tr{GK} )    (30)

where G = (Ã^H Ã + K)^{-1}. Since (30) holds for all 1 ≤ i ≤ p, its summation should hold as well:

\sum_{i=1}^{p} ( p − n + tr{GK} )( d + 1 − tr{Ã_i^H Ã_i G} ) > ( (d+1)p − n + tr{GK} ) \sum_{i=1}^{p} ( 1 − tr{Ã_i^H Ã_i G} )    (31)

But

\sum_{i=1}^{p} tr{Ã_i^H Ã_i G} = tr{ G \sum_{i=1}^{p} Ã_i^H Ã_i } = tr{G Ã^H Ã}    (32)

and

tr{G Ã^H Ã} = tr{G (Ã^H Ã + K − K)} = tr{(Ã^H Ã + K)^{-1}(Ã^H Ã + K − K)} = n − tr{GK}    (33)

Plugging (32) and (33) into (31) gives:

( p − n + tr{GK} )( (d+1)p − n + tr{GK} ) > ( (d+1)p − n + tr{GK} )( p − n + tr{GK} )

which is a contradiction. Hence, there exists at least one i that violates (30). As explained at the beginning of the proof, this establishes the claim.
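Theorem 1 can be exercised numerically on a small random instance (a NumPy sketch with real matrices, Σ_w = I and L = I, and with γ² chosen by Lemma 2 so that (C.1) holds; the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
p, d, n = 6, 2, 5                      # p sensors, d rows each, m = p*d > n
A = rng.standard_normal((p * d, n))    # Sigma_w = I, so the whitened matrix is A
blocks = [A[k * d:(k + 1) * d] for k in range(p)]

# Lemma 2 with L = I: gamma^2 >= max_i tr{A_i^T A_i} guarantees (C.1)
gam2 = max(np.trace(Ai.T @ Ai) for Ai in blocks)
K = gam2 * np.eye(n)                   # K = Sigma_x^-1 = gamma^2 I
G = np.linalg.inv(A.T @ A + K)         # error covariance Sigma_e
assert all(np.trace(Ai.T @ Ai @ G) < 1 for Ai in blocks)   # condition (C.1)

# SSE actually achieved by CSBS after removing one sensor...
sse_after = min(
    np.trace(np.linalg.inv(A.T @ A - Ai.T @ Ai + K)) for Ai in blocks)
# ...versus the bound of Theorem 1
t = np.trace(G @ K)
bound = ((d + 1) * p - n + t) / (p - n + t) * np.trace(G)
print(sse_after <= bound)
```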


C. Conjecture on Optimality of SBS and CSBS

The arsenal of greedy algorithms for subset selection can be classified into two categories: forward greedy and backward greedy. In a forward greedy algorithm such as matching pursuit [21] and its variations [23], the idea is to start by finding the column of A_0 closest to y and then to proceed by adding, at each step, the column that gives the largest drop in the least squares residual until q columns are selected. In contrast, the deterministic backward greedy (DBG) algorithm [24] starts with all of the rows of A_0 as the set of candidates. At each step, the row that minimizes the increment in the least squares residual is removed. Both the SBS and CSBS algorithms are also backward greedy. The main difference between SBS or CSBS and DBG is that they do not use any knowledge of the measured data y and instead utilize the error covariance matrix computed in (8). In other words, the DBG algorithm works in the Euclidean Hilbert space, whereas the SBS and CSBS algorithms (with the SSE criterion) work in the Hilbert space of random variables. In fact, the SSE criterion in (13) can be written as:

Cost_1(A) = min ‖x − x̂‖²_E    (34)

where ‖z‖²_E = \sum_{i=1}^{n} E[|z_i|²] is the induced norm of the Hilbert space of zero-mean, finite-variance random variables. In [24], it has been shown that the DBG algorithm is guaranteed to select the correct subset of rows of A_0 if the noise level is small enough. This suggests that there exists a similar optimality result for both SBS and CSBS. Thus, we propose the following conjecture:

Conjecture (Optimality of SBS and CSBS algorithms): If the statistical assumptions mentioned in Section II hold (i.e., if (8) applies) and under benign constraints on the condition number of the A_0 matrix, the SBS and CSBS algorithms (with the SSE criterion) select the optimal sensor configuration (in the SSE sense) provided that the noise variance is small enough.

V. EXAMPLES OF IMAGING MODALITIES

A. Ionospheric Tomography

In ionospheric tomography, the goal is to reconstruct the ionospheric electron density from a set of projection data, which are typically total electron content (TEC) in the case of radio tomography or photometric brightness measurements in the case of optical tomography. In ground-based radio tomography, coherent transmissions from a low earth orbit (LEO) satellite are tracked by an array of ground sensors [25]. The goal of this technique is to reconstruct horizontal and vertical structures within the two-dimensional slice of the ionosphere above the ground sensors. In Figure 1, a scenario with three ground sensors and a LEO satellite at a 735-km orbit is shown. Each line in the figure illustrates one


[Figure: geometry of TEC measurements. Axes: location of ground receivers (longitudinal) vs. altitude (km, 0-900); the plot shows the LEO satellite trajectory and the polar discretization grid.]

Fig. 1. A typical scenario for ionospheric radio tomography.

integral path between the location of the satellite and one of the ground sensors at a fixed time. A grid (in polar coordinates) is defined in the vertical plane above the receiver chain. The components of the A matrix correspond to the distance that each radio link travels through each pixel. The corresponding inverse problem can be formulated as a linear set of equations as in (3).
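The entries of A can be approximated by sampling points densely along each ray and accumulating path length per pixel (a crude NumPy sketch on a Cartesian grid with made-up geometry; an exact implementation would instead intersect each ray with the pixel boundaries):

```python
import numpy as np

def ray_matrix(rays, grid_shape, cell=1.0, samples=2000):
    """Approximate (A)_ij: distance traveled by ray i through pixel j.
    Each ray is a (start, end) pair of 2-D points; pixels form a
    grid_shape[0] x grid_shape[1] Cartesian grid with spacing `cell`."""
    gy, gx = grid_shape
    A = np.zeros((len(rays), gy * gx))
    for i, (p0, p1) in enumerate(rays):
        p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
        length = np.linalg.norm(p1 - p0)
        ts = (np.arange(samples) + 0.5) / samples
        pts = p0 + ts[:, None] * (p1 - p0)          # sample points on the ray
        ix = np.floor(pts[:, 0] / cell).astype(int)
        iy = np.floor(pts[:, 1] / cell).astype(int)
        ok = (ix >= 0) & (ix < gx) & (iy >= 0) & (iy < gy)
        # each sample contributes length/samples of path to its pixel
        np.add.at(A[i], iy[ok] * gx + ix[ok], length / samples)
    return A

# one vertical ray through a 4x4 grid of unit pixels
rays = [((1.5, 0.0), (1.5, 4.0))]
A = ray_matrix(rays, (4, 4))
print(A.reshape(4, 4)[:, 1])   # ~1.0 in each traversed pixel
```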

B. Interferometric Imaging in Radio Astronomy and Radar Remote Sensing

Radio interferometric imaging is the common imaging modality in radio astronomy, the most well-known example of which is the Very Large Array astronomical radio observatory [26]. Radar remote sensing of ionospheric plasma irregularities represents another relevant scenario. Huge antenna arrays such as the Jicamarca Radio Observatory near Lima, Peru, are used for these purposes, and interferometric imaging techniques are used for this soft-target detection application [27]. In both applications, the objective is to image the intensity of an unknown two-dimensional sky. In radio astronomy, the antennas are typically fixed and the earth's rotation places them at new effective look positions [28]. In radar imaging, the antenna array beam can be steered to form different look positions (see Ch. 7 of [29]). Assume that the discretized sky is made up of n pixels, x = [x_1 ··· x_n]^T ∈ C^n. Suppose the two-dimensional antenna array has p elements and we take data at d look positions. Denote by φ_j and θ_j the azimuth and zenith angles that the j-th pixel forms relative to the center of the array. Assuming that θ_j and φ_j are small (less than 0.1 radian) so that small-angle approximations to trigonometric functions apply, each antenna sees the signal from the j-th pixel with a phase shift of exp(j2π[r_k(i)θ_j + s_k(i)φ_j]), where (r_k(i), s_k(i)) are the


coordinates of the k-th antenna at the i-th look position. Therefore, the data received at the k-th antenna is:

y_k(i) = \sum_{j=1}^{n} x_j e^{j2π[r_k(i)θ_j + s_k(i)φ_j]} + w_k(i)    (35)

for i = 1,···,d, where w_k(i) represents the sum of the sensor noise and the atmospheric turbulence noise, both assumed to be white, zero-mean complex Gaussian and independent of the sky values. Collecting the data for all of the antennas at look position i into a vector, the imaging equation can be expressed as:

y_i = A_i x + w_i    (36)

where A_i ∈ C^{p×n} is the corresponding observation operator that consists of the complex exponentials in (35). As can be seen, in this modality we have multiple independent measurements and the formulation in Section II needs to be adapted accordingly. The error covariance for the case of d independent measurements in (36) is given by

Σ_e = ( Σ_x^{-1} + \sum_{i=1}^{d} A_i^H Σ_{w_i}^{-1} A_i )^{-1}    (37)
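The observation operators in (35)-(36) and the multi-look error covariance (37) can be sketched as follows (NumPy; the antenna coordinates, look positions, and pixel angles are made-up assumptions, with Σ_wi = σ²I and Σ_x = (1/γ²)I):

```python
import numpy as np

def look_matrix(r, s, theta, phi):
    """A_i of (36): entry (k, j) = exp(j 2*pi [r_k theta_j + s_k phi_j])."""
    return np.exp(2j * np.pi * (np.outer(r, theta) + np.outer(s, phi)))

rng = np.random.default_rng(6)
p, d, n = 8, 3, 6                       # antennas, look positions, sky pixels
theta = 0.05 * rng.random(n)            # zenith angles (< 0.1 rad)
phi = 0.05 * rng.random(n)              # azimuth angles
sigma_w, gam = 0.2, 1.0

A_looks = [look_matrix(rng.random(p), rng.random(p), theta, phi)
           for _ in range(d)]           # antenna coordinates per look

# error covariance (37) with Sigma_wi = sigma^2 I and Sigma_x = (1/gamma^2) I
info = gam**2 * np.eye(n, dtype=complex)
for Ai in A_looks:
    info += (Ai.conj().T @ Ai) / sigma_w**2
Sigma_e = np.linalg.inv(info)
print(np.trace(Sigma_e).real)           # SSE criterion for this configuration
```

Any of the criteria in Section III can then be applied to this Σ_e to compare candidate array configurations.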

Besides, a typical astronomical image consists of sparse point sources on a smooth background. The background can be estimated and subtracted from the measurements, leaving an image which is typically sparse, i.e., only a small percentage of the pixel values are nonzero. Injecting this form of a priori knowledge is known as sparsity regularization [30], [31]. Sparsity regularization is accomplished by adding to the data fidelity term a function, e.g., c(x), that is monotonically increasing with the number of nonzero elements in x (which can also be thought of as the complexity of x) [24]. Recently, it has been shown that using a function of the form c(x) = λ‖x‖₁ performs very well even in the presence of noise [32]. However, in the MAP estimation framework, having the ℓ1-norm as the regularization functional translates into a Laplacian distribution for the unknown image coefficients. Using c(x) = λ‖x‖₁ gives p(x) ∝ e^{−λ‖x‖₁}, where p(x) represents the probability distribution function of the random vector x. But this violates the statistical assumptions in Section II and makes the computation of the error covariance intractable. Nonetheless, one can derive the LMMSE estimator of x and use (8) together with any of the optimality criteria in Section III. The LMMSE estimator is

x̂_LMMSE = x_0 + ( (1/σ_w²) \sum_{i=1}^{d} A_i^H A_i + γ² I )^{-1} ( (1/σ_w²) \sum_{i=1}^{d} A_i^H (y_i − A_i x_0) )

where γ = λ/√2. The corresponding error covariance is Σ_e = ( (1/σ_w²) \sum_{i=1}^{d} A_i^H A_i + γ² I )^{-1}.