Parallel faceted imaging in radio interferometry via proximal splitting (Faceted HyperSARA): when precision meets scalability

Pierre-Antoine Thouvenin,1,2* Abdullah Abdulaziz,1* Ming Jiang,3* Arwa Dabbech,1* Audrey Repetti,1,4* Adrian Jackson,5 Jean-Philippe Thiran,3 and Yves Wiaux1†
1 Institute of Sensors, Signals and Systems, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom
2 Université de Lille, CNRS, Centrale Lille, UMR 9189 – CRIStAL – Centre de Recherche en Informatique, Signal et Automatique de Lille, F-59000 Lille, France
3 Signal Processing Laboratory 5 (LTS5), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
4 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom
5 EPCC, University of Edinburgh, Edinburgh EH8 9BT, United Kingdom

* The first five authors contributed equally to this work.
† E-mail: y.wiaux@hw.ac.uk

18 March 2020
ABSTRACT
Upcoming radio interferometers are aiming to image the sky at new levels of resolution and sensitivity, with wide-band image cubes reaching close to the Petabyte scale for SKA. Modern proximal optimization algorithms have shown a potential to significantly outperform CLEAN thanks to their ability to inject complex image models to regularize the inverse problem for image formation from visibility data. They were also shown to be scalable to large data volumes thanks to a splitting functionality enabling the decomposition of data into blocks, for parallel processing of block-specific data-fidelity terms of the objective function. In this work, the splitting functionality is further exploited to decompose the image cube into spatio-spectral facets, and enable parallel processing of facet-specific regularization terms in the objective. The resulting "Faceted HyperSARA" algorithm is implemented in MATLAB (code available on GitHub). Simulation results on synthetic image cubes confirm that faceting can provide a major increase in scalability at no cost in imaging quality. A proof-of-concept reconstruction of a 15 GB image of Cyg A from 7.4 GB of VLA data, utilizing 496 CPU cores on an HPC system for 68 hours, confirms both scalability and a quantum jump in imaging quality from CLEAN. Assuming a slow spectral slope of Cyg A, we also demonstrate that Faceted HyperSARA can be combined with a dimensionality reduction technique, enabling the Cyg A image to be formed from the same data using only 31 CPU cores for 142 hours, while preserving reconstruction quality. Cyg A reconstructed cubes are available online.
Key words: techniques: image processing, techniques: interferometric.
1 INTRODUCTION
Modern radio interferometers, such as the Karl G. Jansky Very Large Array (VLA) (Perley et al. 2011), the LOw Frequency ARray (LOFAR) (Van Haarlem et al. 2013) and the MeerKAT radio telescope (Jonas et al. 2018), generate extremely large volumes of data, with the aim of producing images of the radio sky at unprecedented resolution and dynamic range over thousands of spectral channels. The upcoming Square Kilometer Array (SKA) (Dewdney et al. 2013) will form wide-band images about 0.56 Petabyte in size (assuming double precision) from even larger visibility data volumes (Scaife 2020). SKA is expected to bring answers to fundamental questions in astronomy1, such as improving our understanding of cosmology and dark energy (Rawlings et al. 2004), investigating the origin and evolution of cosmic magnetism (Gaensler et al. 2004) and probing the early universe where the first stars were formed (Carilli et al. 2004). To achieve the expected scientific goals, it is of paramount importance to design efficient imaging algorithms which meet the capabilities of such powerful instruments.

1 https://www.skatelescope.org/science/
On the one hand, appropriate algorithms need to inject complex prior image models to regularize the inverse problem for image formation from visibility data, which only provide incomplete Fourier sampling. On the other hand, these algorithms need to be highly parallelizable in order to scale with the sheer amount of data and the large size of the wide-band image cubes to be recovered.
A plethora of radio-interferometric (RI) imaging approaches have been proposed in the literature, which can be classified into three main categories. A first class of methods is the celebrated CLEAN family (e.g. Högbom 1974; Schwab & Cotton 1983; Bhatnagar & Cornwell 2004; Cornwell 2008; Rau & Cornwell 2011; Offringa & Smirnov 2017). In particular, Rau & Cornwell (2011) proposed the multi-scale multi-frequency deconvolution algorithm (MS-MFS), leveraging Taylor series and multi-scale CLEAN to promote spectral smoothness of the wide-band image cube. More recently, Offringa & Smirnov (2017) have proposed the Joined Channel CLEAN algorithm (JC-CLEAN), where multi-scale CLEAN components are identified from the integrated residual image (i.e. the sum of the residual images over all the channels). Albeit simple and computationally efficient, CLEAN-based algorithms provide a limited imaging quality in high resolution and high sensitivity acquisition regimes. This shortcoming partly results from their greedy nature and their lack of flexibility in injecting complex prior information to regularize the inverse imaging problem. Moreover, these algorithms often require careful tuning of the associated parameters.
The second class of methods relies on Bayesian inference techniques (e.g. Sutton & Wandelt 2006; Sutter et al. 2014; Junklewitz et al. 2015, 2016; Arras et al. 2019). For instance, Sutter et al. (2014) proposed a monochromatic Bayesian method based on Markov chain Monte Carlo (MCMC) sampling, considering a Gaussian image prior. Since MCMC sampling methods are computationally very expensive, an efficient variant was proposed in Junklewitz et al. (2016); Arras et al. (2019) to perform approximate Bayesian inference, formulated in the framework of information theory. Importantly, Bayesian methods naturally enable the quantification of uncertainty about the image estimate. However, this type of approach cannot currently scale to the data regime expected from modern telescopes.
The third class of approaches leverages optimization methods allowing sophisticated prior information to be considered, such as sparsity in an appropriate transform domain, smoothness, etc. (e.g. Wiaux et al. 2009; Li, F. et al. 2011; Dabbech et al. 2012; Carrillo et al. 2012; Wenger & Magnor 2014; Garsden et al. 2015; Dabbech et al. 2015; Girard et al. 2015; Ferrari et al. 2015; Abdulaziz et al. 2016; Jiang et al. 2017; Abdulaziz et al. 2019b). From the perspective of optimization theory, the inverse imaging problem is approached by defining an objective function, consisting in a sum of a data-fidelity term and a regularization term promoting a prior image model to compensate for the incompleteness of the visibility data. The sought image is estimated as a minimizer of this objective function, and is computed through iterative algorithms, which benefit from well-established convergence guarantees. For instance, Ferrari et al. (2015) promote spatial sparsity in a redundant wavelet domain and spectral sparsity in a Discrete Cosine Transform. Wenger & Magnor (2014) promote spectra composed of a smooth contribution affected by local sparse deviations. In the last decade, Wiaux and collaborators proposed advanced image models: the average sparsity prior in monochromatic imaging (SARA) (Carrillo et al. 2012, 2013, 2014; Onose et al. 2016a,b, 2017; Pratley et al. 2017; Dabbech et al. 2018), the low-rankness and joint average sparsity priors for wide-band imaging (HyperSARA) (Abdulaziz et al. 2016, 2017, 2019b), and the polarization constraint for polarized imaging (Polarized SARA) (Birdi et al. 2018)2. These models have been reported to result in significant improvements in the reconstruction quality in comparison with state-of-the-art CLEAN-based imaging methods, at the expense of an increased computation cost.
Note that, from a Bayesian perspective, the objective function can be seen as the negative logarithm of a posterior distribution, with the minimizer corresponding to a Maximum A Posteriori (MAP) estimate. Methods for uncertainty quantification by convex optimization have also been tailored recently, which enable assessing the degree of confidence in specific structures appearing in the MAP estimate (Repetti et al. 2018, 2019; Abdulaziz et al. 2019a). In this work we focus solely on image estimation.
Convex optimization offers intrinsically parallel algorithmic structures, such as proximal splitting methods (Combettes & Pesquet 2011; Komodakis & Pesquet 2015). In such algorithmic structures, multi-term objective functions can be minimized, with all terms handled in parallel at each iteration. Each term is involved via its so-called proximal operator, which acts as a simple denoising operator (e.g. a sparsity regularization term will induce a thresholding operator). The algorithms of the SARA family are all powered by an advanced proximal splitting method known as the primal-dual forward-backward (PDFB) algorithm (Condat 2013; Vũ 2013; Pesquet & Repetti 2015). The splitting functionality of PDFB is utilized in these approaches to enable the decomposition of data into blocks and parallel processing of the block-specific data-fidelity terms of the objective function, which provides scalability to large data volumes. The SARA family however models the image as a single variable, and the computational and storage requirements induced by complex regularization terms can be prohibitive for very large image sizes, in particular for wide-band imaging.
We address this bottleneck in the present work. We propose to decompose the target image cube into regular, content-agnostic, spatially overlapping spatio-spectral facets, with which are associated facet-specific regularization terms in the objective function. We further exploit the splitting functionality of PDFB to enable parallel processing of the regularization terms and ultimately provide further scalability.
Note that faceting is not a novel paradigm in RI imaging: it has often been considered for calibration purposes in the context of wide-field imaging, assuming piecewise constant direction-dependent effects. For instance, van Haarlem, M. P. et al. (2013) proposed an image tessellation scheme for LOFAR wide-field images, which has been leveraged by Tasse et al. (2018) in the context of wide-field wide-band calibration and imaging. However, except for (Naghibzedeh et al. 2018) and to the best of the authors' knowledge, facet imaging has hitherto been essentially addressed with CLEAN-based algorithms. This class of approaches not only lacks theoretical convergence guarantees, but also does not offer much flexibility to accommodate advanced regularization terms. In contrast with (Naghibzedeh et al. 2018), the proposed faceting approach does not need to be tailored to the content of the image, and thus offers more flexibility to design balanced facets exclusively based on computational considerations.

2 Associated software on the Puri-Psi webpage: https://basp-group.github.io/Puri-Psi/
The reconstruction performance of Faceted HyperSARA is evaluated against HyperSARA and SARA on synthetic data. We further validate the performance and scalability potential of our approach through the reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA observations across 480 channels. Our results confirm the recent discovery of Cyg A2, a second super-massive black hole in Cyg A (Perley et al. 2017). Finally, we combine Faceted HyperSARA with a joint image and data dimensionality reduction technique in order to provide further scalability. In practice, acknowledging the slow spectral slope of Cyg A in the frequency range of interest, we target a 16-fold reduction in spectral resolution. We also apply a data dimensionality reduction technique relying on visibility gridding, offering here a 33-fold reduction in the data volume. We validate the stable performance of Faceted HyperSARA on the reduced-size inverse problem in comparison to the approach without dimensionality reduction.
The remainder of the article is organized as follows. Section 2 introduces the proposed faceted prior model and associated objective function underpinning Faceted HyperSARA. The associated algorithm is described in Section 3, along with the different levels of parallelization exploited in the proposed MATLAB implementation. Performance validation is first conducted on synthetic data in Section 4. We successively evaluate the influence of spectral and spatial faceting for a varying number of facets and spatial overlap, both in terms of reconstruction quality and computing time. Section 5 is focused on the validation of the proposed approach on real VLA observations in terms of precision and scalability. Section 6 illustrates the potential of combining Faceted HyperSARA and dimensionality reduction for further scalability. Conclusions and perspectives are reported in Section 7.
2 PRIOR MODEL AND OBJECTIVE FUNCTION

In this section, focusing on optimization-based approaches, we first recall the discrete version of the inverse problem for RI image formation from visibility data. We then formulate the general structure of the objective function for the state-of-the-art SARA and HyperSARA approaches. Finally, we introduce spatio-spectral facets and the associated prior model, leading to the formulation of the objective function for the proposed Faceted HyperSARA approach.
(a) Full data cube (b) Per channel data (c) Data blocks

Figure 1. Illustration of the data blocking strategy. Starting from the full data cube (a), each of the L channels represented in (b) is decomposed into B data blocks (c). The data blocks for each channel are associated with separate data-fidelity terms in the objective function, processed by independent workers.
2.1 Wide-band inverse problem
Wide-band RI imaging consists in estimating unknown radio images of the sky over L frequency channels. Focusing on intensity imaging, and assuming a small field of view on the celestial sphere, each pair of antennae probes, at each observation frequency, a Fourier component of the sky surface brightness. The Fourier mode is given by the projection of the corresponding baseline in the plane perpendicular to the line of sight, in units of the observation wavelength (Thompson et al. 2007). The collection of data (called visibilities) from all baselines accumulated over the whole duration of observation provides an incomplete coverage of the 2D Fourier plane (also called uv-plane) of the image of interest. The RI measurements can be modeled for each frequency channel index l ∈ {1, ..., L} as (Abdulaziz et al. 2016, 2019b)

y_l = Φ_l x_l + n_l,  with  Φ_l = Θ_l G_l F Z,   (1)

where y_l ∈ C^{M_l} is the vector of M_l visibilities acquired in the channel l ∈ {1, ..., L}, weighted with the diagonal noise-whitening matrix Θ_l ∈ R^{M_l×M_l}, and x_l ∈ R^N_+ is the underlying image. The vector n_l ∈ C^{M_l} represents measurement noise, modeled as a realization of a complex white Gaussian noise. The measurement operator Φ_l is composed of a zero-padding and scaling operator Z ∈ R^{K×N}, the 2D Discrete Fourier Transform represented by the matrix F ∈ C^{K×K}, and a non-uniform Fourier transform interpolation matrix G_l ∈ C^{M_l×K}. Each row of G_l contains a compact-support interpolation kernel centered at the corresponding uv-point (Fessler & Sutton 2003), enabling the computation of the Fourier mode associated with each visibility from surrounding discrete Fourier points. Note that at the sensitivity of interest to the new generation of radio telescopes, direction-dependent effects (DDEs), of either atmospheric or instrumental origin, complicate the RI measurement equation. For each visibility, the sky surface brightness is pre-modulated by the product of a DDE pattern specific to each antenna. The DDEs are often unknown and need to be calibrated jointly with the imaging process (Repetti et al. 2017; Repetti & Wiaux 2017; Thouvenin et al. 2018; Birdi et al. 2019). Focussing here on the imaging problem, i.e. assuming DDEs are known, they can simply be integrated into the forward model (1) by building extended interpolation kernels into each row of G_l, resulting from the convolution of the non-uniform Fourier transform kernel with a compact-support representation of the Fourier transform of the involved DDEs. Finally, the matrix Θ_l contains on its diagonal the inverse of the noise standard deviation associated with each original measurement. This assumes that the original visibility vector was multiplied by Θ_l to produce a measurement vector y_l affected by a random independent and identically distributed (i.i.d., or white) Gaussian noise. This noise-whitening operation corresponds to what is known as natural weighting in RI imaging.
When addressing the model (1), a first bottleneck arises from the sheer volume of the data. To address this issue, Onose et al. (2016b, 2017) have proposed a data blocking strategy, which has been exploited in the context of wide-band imaging by Abdulaziz et al. (2019b). The visibility vectors y_l are decomposed into B blocks (y_{l,b})_{1≤b≤B}, which can be handled in parallel by advanced imaging algorithms. The data model (1) can thus be formulated for any (l,b) ∈ {1, ..., L}×{1, ..., B} as

y_{l,b} = Φ_{l,b} x_l + n_{l,b},  with  Φ_{l,b} = Θ_{l,b} G_{l,b} F Z,   (2)

where y_{l,b} ∈ C^{M_{l,b}} is the vector of M_{l,b} visibilities associated with the b-th block in the channel l. Different blocking strategies can be adopted, e.g. based on a tessellation of the uv-space into balanced sets of visibilities (Onose et al. 2016b), or on a decomposition of the data into groups of snapshots (Dabbech et al. 2018) (see Figure 1).
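To make the structure of the measurement operators in (1)-(2) concrete, the following minimal Python/NumPy sketch builds a toy monochromatic operator y = Θ G F Z x, with a simple nearest-neighbour gridding matrix G standing in for the compact-support interpolation kernel of Fessler & Sutton (2003); all names and sizes are illustrative, and this is not the authors' MATLAB implementation. Splitting the rows of G (and the corresponding entries of y and Θ) into B groups directly yields the data blocks of (2).

import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
N_side, pad = 64, 2                      # image side, Fourier oversampling factor
K_side = pad * N_side                    # oversampled grid side (K = K_side**2 points)
M = 500                                  # number of visibilities

x = rng.random((N_side, N_side))         # toy non-negative sky image

def Z(img):
    # zero-pad the image to the oversampled Fourier grid (scaling omitted)
    out = np.zeros((K_side, K_side), dtype=complex)
    out[:N_side, :N_side] = img
    return out

def F(padded):
    # 2D discrete Fourier transform on the oversampled grid
    return np.fft.fft2(padded)

# continuous uv-points mapped to their nearest grid index to build a sparse G
uv = rng.uniform(0, K_side, size=(M, 2))
idx = (np.round(uv[:, 0]).astype(int) % K_side) * K_side \
    + (np.round(uv[:, 1]).astype(int) % K_side)
G = sp.csr_matrix((np.ones(M), (np.arange(M), idx)), shape=(M, K_side**2))

theta = 1.0 / rng.uniform(0.5, 2.0, M)   # inverse noise std: natural weighting

def Phi(img):
    return theta * (G @ F(Z(img)).ravel())

noise = 1e-3 * (rng.normal(size=M) + 1j * rng.normal(size=M))
y = Phi(x) + noise                       # noise-whitened visibilities, as in model (1)
print(y.shape)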
2.2 General form of the objective function

Estimating the underlying wide-band sky image X = (x_l)_{1≤l≤L} from incomplete Fourier measurements is a severely ill-posed inverse problem, which calls for powerful regularization terms to encode a prior image model. In this context, wide-band RI imaging can be formulated as the following constrained optimization problem

minimize_{X = (x_l)_{1≤l≤L} ∈ R^{N×L}_+}   Σ_{l=1}^{L} Σ_{b=1}^{B} ι_{B(y_{l,b}, ε_{l,b})}(Φ_{l,b} x_l) + r(X),   (3)

where the indices (l,b) ∈ {1, ..., L}×{1, ..., B} refer to a data block b of channel l, B(y_{l,b}, ε_{l,b}) = {z ∈ C^{M_{l,b}} | ‖z − y_{l,b}‖_2 ≤ ε_{l,b}} denotes the ℓ2-ball centred in y_{l,b} of radius ε_{l,b} > 0, and ε_{l,b} reflects the noise statistics. The notation ι_{B(y_{l,b}, ε_{l,b})} denotes the indicator function of the ℓ2 ball B(y_{l,b}, ε_{l,b}). Specifically, let C be a non-empty, closed, convex subset of C^N; then ι_C denotes the indicator function of C, defined by ι_C(z) = 0 if z ∈ C and +∞ otherwise. On the one hand, the indicator functions ι_{B(y_{l,b}, ε_{l,b})} act as data-fidelity terms, in that they ensure the consistency of the modeled data with the measurements and reflect the white Gaussian nature of the noise (Carrillo et al. 2012). On the other hand, the function r encodes a prior model of the unknown image cube. The priors characterizing the state-of-the-art SARA and HyperSARA approaches, as well as the proposed Faceted HyperSARA approach, are discussed in what follows. Finally, note that an additional non-negativity prior is imposed in all approaches of the SARA family focusing on intensity imaging, with the aim to preserve the physical consistency of the estimated surface brightness. This generalizes to the polarization constraint when solving for all the Stokes parameters.
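As a small illustration of the data-fidelity terms in (3), the following Python sketch evaluates the indicator function of an ℓ2 ball on a toy data block: it returns 0 when the modeled visibilities fit the data within the noise-driven radius ε, and +∞ otherwise. All values are arbitrary placeholders.

import numpy as np

def indicator_l2_ball(z, y, eps):
    # 0 inside the ball B(y, eps), +inf outside
    return 0.0 if np.linalg.norm(z - y) <= eps else np.inf

rng = np.random.default_rng(1)
y = rng.normal(size=100) + 1j * rng.normal(size=100)    # toy data block y_{l,b}
z = y + 1e-3 * rng.normal(size=100)                     # modeled visibilities Phi_{l,b} x_l
print(indicator_l2_ball(z, y, eps=1.0))                 # 0.0: constraint satisfied
print(indicator_l2_ball(z, y, eps=1e-6))                # inf: constraint violated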
2.3 State-of-the-art average sparsity priors
Sparsity-based priors combined with optimization techniques have proved to be very efficient for astronomical imaging, in particular in the context of radio interferometry (Wiaux et al. 2009; Carrillo et al. 2012; Garsden et al. 2015; Onose et al. 2016b). These techniques aim to solve the underlying image recovery problem by enforcing sparsity of the estimated image in an appropriate domain.

In this context, the prior of choice is the ℓ0 pseudo-norm, which counts the number of non-zero coefficients of its argument (Donoho 2006). However, minimizing this function, which is neither convex nor smooth, is an NP-hard problem. A common alternative consists in replacing it by its convex envelope, the ℓ1 norm (Donoho & Stark 1989; Donoho & Logan 1992). When combined with other convex terms, e.g. the non-negativity constraint and the ℓ2-ball data-fidelity constraint in (3), ℓ1-norm priors form a convex objective function, which can be efficiently minimized by powerful iterative algorithms under well-established guarantees on the convergence of the iterates towards a global minimum.

Although the ℓ1 prior has been widely used over the last decades to promote sparsity, it induces an undesirable dependence on the coefficients' magnitude. Indeed, unlike the ℓ0 prior, the ℓ1 norm penalizes larger coefficients more than smaller ones. To address this imbalance, a log-sum prior can be used. In particular, Candès & Boyd (2008); Candès et al. (2009) proposed to use a majorization-minimization framework (Hunter & Lange 2004) to minimize objective functions with a log-sum prior, leading to a reweighted-ℓ1 approach consisting in minimizing a sequence of convex objectives with weighted ℓ1-norm priors, acting as convex relaxations of the log-sum prior. In practice, sequentially minimizing convex problems with weighted-ℓ1 priors is indeed much simpler than minimizing a non-convex problem with a log-sum prior. From a convergence point of view, multiple works have recently shown that the set of minimizers resulting from a reweighted-ℓ1 procedure coincides with the one obtained by minimizing a problem with a log-sum prior (Ochs et al. 2015; Geiping & Moeller 2018; Ochs et al. 2019; Repetti & Wiaux 2019). In the following paragraphs, the SARA (Carrillo et al. 2012) and HyperSARA (Abdulaziz et al. 2019b) log-sum priors are presented, considered as benchmarks to assess the proposed spatio-spectral faceted prior.
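The following schematic Python sketch illustrates the reweighting idea on a toy sparse denoising problem: each pass minimizes a weighted-ℓ1 surrogate of the log-sum prior, with weights set from the previous estimate so that large coefficients are penalized less. A simple soft-thresholding step stands in here for the full convex solve; names and values are illustrative only.

import numpy as np

rng = np.random.default_rng(2)
x_true = np.zeros(200)
x_true[rng.choice(200, size=10, replace=False)] = rng.uniform(1.0, 5.0, 10)
y = x_true + 0.05 * rng.normal(size=200)        # noisy observation of a sparse signal

upsilon, lam = 1e-3, 0.1
x = y.copy()
for _ in range(5):                               # reweighting passes
    w = 1.0 / (np.abs(x) + upsilon)              # weights from the current estimate
    # weighted soft-thresholding: prox of the weighted l1 norm at y
    x = np.sign(y) * np.maximum(np.abs(y) - lam * w, 0.0)

print(np.count_nonzero(np.abs(x) > 1e-6))        # close to the true sparsity (10)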
2.3.1 SARA prior
The image prior underpinning the Sparse Averaging Reweighted Analysis (SARA) has proved efficient for astronomical imaging, and in particular for RI imaging (Carrillo et al. 2012; Onose et al. 2016b; Abdulaziz et al. 2019b). It promotes sparsity by minimizing a log-sum prior, considering a highly redundant transformed domain Ψ† ∈ R^{I×N} defined as the concatenation of wavelet bases (first eight Daubechies wavelets and the Dirac basis), leading to the notion of average sparsity over the bases of interest.

(a) Full image cube (b) Spectral sub-cubes (c) Facets & weights

Figure 2. Illustration of the proposed faceting scheme, using a 2-fold spectral interleaving process and a 9-fold spatial tiling process. The full image cube variable (a) is divided into two spectral sub-cubes (b) with interleaved channels (for a 2-fold interleaving, even and odd channels respectively define a sub-cube). Each sub-cube is spatially faceted. A regular tessellation (dashed red lines) is used to define spatio-spectral tiles. The spatio-spectral facets result from the augmentation of each tile to produce an overlap between facets (solid red lines). Panel (c) shows a single facet (left), as well as the spatial weighting scheme (right) with linearly decreasing weights in the overlap region. Note that, though the same tiling process underpins the nuclear norm and ℓ21 norm regularization terms, the definition of the appropriate overlap region is specific to each of these terms (via the selection operators S_q and S̃_q in (9)).
The log-sum prior addressed by SARA is of the form

r(X) = μ̃ Σ_{l=1}^{L} Σ_{i=1}^{I} log( |[Ψ†X]_{i,l}| + υ ),   (4)

where μ̃ > 0 and υ > 0 are regularization parameters, and [Ψ†X]_{i,l} denotes the (i,l)-th coefficient of Ψ†X.
This prior is fully separable with respect to the spectral channels, similarly to the data-fidelity term in (3). In this setting, the wide-band objective function naturally separates into L independent, single-channel objective functions underpinning the monochromatic SARA approach defined by Carrillo et al. (2012); Onose et al. (2016b). This approach is therefore highly parallelizable, and will be taken as a reference in terms of computing time.

In practice, each single-channel term in the objective (3) is solved with a reweighting approach leveraging PDFB. The splitting functionality of this advanced proximal algorithmic structure is in particular utilized to enable the parallel processing of the block-specific data-fidelity terms for scalability. Note that the parameter μ̃ does not affect the minimizers of the objective. Onose et al. (2016b) suggested that setting μ̃ in the range [10^{-5}, 10^{-3}] provides good convergence speed in practice.
2.3.2 HyperSARA prior
Unlike the SARA prior, the recently proposed HyperSARA prior (Abdulaziz et al. 2019b) aims to explicitly promote spectral correlations. In particular, it promotes low-rankness of X resulting from the correlation of its channels, as well as average spatial sparsity over all frequency channels. Specifically, the log-sum prior of interest is of the form

r(X) = μ̄ Σ_{j=1}^{J} log( |σ_j(X)| + υ ) + μ Σ_{i=1}^{I} log( ‖[Ψ†X]_i‖_2 + υ ),   (5)

where (μ̄, μ, υ) ∈ ]0,+∞[^3 are regularization parameters, J ≤ min{N, L} is the rank of X, (σ_j(X))_{1≤j≤J} are the singular values of X, and [Ψ†X]_i denotes the i-th row of Ψ†X.

HyperSARA has been shown to produce image cubes with superior quality when compared to SARA and the wide-band CLEAN-based approach JC-CLEAN (Abdulaziz et al. 2019b). The fundamental reason for this is that the number of degrees of freedom to be reconstructed when adding frequency channels increases more slowly than the amount of data, due to the correlation between the channels. Also, because the magnitude of the spatial frequency probed by an antenna pair is proportional to the observation frequency, the uv-coverage at a higher frequency channel is a dilated version of the uv-coverage at a lower frequency, with the dilation parameter between two channels given by the frequency ratio. Consequently, the data at higher frequency channels provide higher spatial frequency information for the lower frequency channels, thus contributing to better precision of the image reconstruction process, both in terms of resolution and dynamic range. HyperSARA will thus be taken as a reference in terms of imaging quality in Section 4.

In practice, the wide-band objective (3) is also solved with a reweighting procedure relying on PDFB, with the splitting functionality again utilized to enable the parallel processing of the block-specific data-fidelity terms for scalability. The regularization terms at the core of HyperSARA are however not separable, with the full image cube modeled as a single variable. This entails memory and computing requirements scaling with the size of the full image cube. The gist of the present contribution is to address this bottleneck by introducing spatio-spectral image facets. Note that Abdulaziz et al. (2019b) have shown that the regularization parameters can be set as μ̄ = 1 and μ = ‖X_dirty‖_* / ‖Ψ†X_dirty‖_{2,1}, where X_dirty denotes the dirty image cube. Again, in theory, only the ratio of these parameters affects the minimizers of the objective.
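As a concrete illustration, the short Python sketch below evaluates the two log-sum terms of the HyperSARA prior (5) on a toy cube X, using the identity as a stand-in for the SARA analysis operator Ψ† (the actual prior uses the concatenation of eight Daubechies wavelet bases and the Dirac basis); all sizes and parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
N, L = 256, 10                            # pixels per channel, number of channels
X = rng.random((N, L)) * np.linspace(1.0, 0.8, L)    # toy cube with correlated channels

upsilon = 1e-3                            # floor parameter of the log-sum terms
mu_bar, mu = 1.0, 1e-2                    # regularization parameters

Psi_t = lambda M: M                       # stand-in analysis operator (I = N here)

sigma = np.linalg.svd(X, compute_uv=False)            # singular values of X
low_rankness_term = mu_bar * np.sum(np.log(sigma + upsilon))

row_norms = np.linalg.norm(Psi_t(X), axis=1)          # l2 norm of each row of Psi^dagger X
joint_sparsity_term = mu * np.sum(np.log(row_norms + upsilon))

print(low_rankness_term + joint_sparsity_term)        # value of the prior r(X) in (5)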
2.4 Faceting and Faceted HyperSARA prior
The proposed Faceted HyperSARA prior builds on the HyperSARA prior, distributing both the average sparsity and the low-rankness priors over multiple spatio-spectral facets to alleviate the computing and storage requirements inherent to HyperSARA. In particular, we propose to decompose the 3D image cube into Q × C spatio-spectral facets, as illustrated in Figure 2 and detailed below.
2.4.1 Spectral faceting
The wide-band image cube can first be decomposed into separate image sub-cubes composed of a subset of the frequency channels, with a separate prior for each sub-cube. Since the data-fidelity terms are channel-specific, the overall objective function (3) reduces to the sum of independent objectives for each sub-cube. The smaller-size wide-band imaging sub-problems (smaller data sets, and smaller image volumes) can thus be solved independently in parallel, offering scalability. Taken to the extreme, this simple spectral faceting can be used to separate all channels and proceed with single-channel reconstructions (leading to SARA), at the price, however, of completely losing the advantage of correlations between frames to improve image precision. The key point is to keep an appropriate number of frames per sub-cube in order to optimally take advantage of this correlation. Also, given that the data at higher frequency channels provide higher spatial frequency information for the lower frequency channels, it is of critical importance that the whole extent of the frequency band of observation be exploited in each channel reconstruction. In this context, we propose to decompose the cube into channel-interleaved spectral sub-cubes, each of which results from a uniform sub-sampling of the whole frequency band (see Figure 2(b)). We thus decompose the original inverse problem (1) into C independent, channel-interleaved sub-problems, each considering L_c channels from the original data cube, with L = L_1 + ... + L_C. For each sub-cube c ∈ {1, ..., C}, y_{c,l,b} ∈ C^{M_{c,l,b}} denotes the vector of M_{c,l,b} visibilities associated with the channel l ∈ {1, ..., L_c} and data-block b ∈ {1, ..., B}, and we denote by Φ_{c,l,b} and ε_{c,l,b} the associated measurement operator and ℓ2 ball radius, respectively. The initial minimization problem (3) is thus reformulated as

minimize_{X ∈ R^{N×L}_+}   Σ_{c=1}^{C} [ Σ_{l=1}^{L_c} Σ_{b=1}^{B} ι_{B(y_{c,l,b}, ε_{c,l,b})}(Φ_{c,l,b} x_{c,l}) + r_c(X_c) ],   (6)

where, for every c ∈ {1, ..., C}, X_c = (x_{c,l})_{1≤l≤L_c} ∈ R^{N×L_c} is the c-th sub-cube of the full image cube X, with x_{c,l} ∈ R^N the l-th image of the sub-cube X_c, and r_c : R^{N×L_c} → ]−∞,+∞] is a sub-part of the regularization term, only acting on the c-th sub-cube.
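A toy Python sketch of the channel-interleaved decomposition described above: sub-cube c collects every C-th channel, so each sub-cube uniformly samples the whole band. Names and sizes are illustrative only.

import numpy as np

L, C = 20, 4                               # total channels, number of spectral sub-cubes
N = 64 * 64
X = np.tile(np.arange(L), (N, 1)).astype(float)   # toy cube; pixel value = channel index

# sub-cube c collects channels c, c + C, c + 2C, ... (uniform sub-sampling of the band)
subcubes = [X[:, c::C] for c in range(C)]
for c, Xc in enumerate(subcubes):
    print(f"sub-cube {c}: channels {list(range(c, L, C))}, shape {Xc.shape}")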
2.4.2 Spatial faceting
Faceting can also be performed in the spatial domain by decomposing the regularization term for each spectral sub-cube into a sum of terms acting only locally in the spatial domain (see Figure 2(c)). In this context, the resulting facets need to overlap in order to avoid edge effects, so that the overall objective function (6) takes the form of a sum of inter-dependent facet-specific objectives. This inter-dependence precludes separating the imaging problem into facet problems. However, the splitting functionality of PDFB can be exploited to enable parallel processing of the facet-specific regularization terms and ensure further scalability (see Section 3).

On the one hand, we propose to split the average sparsity dictionary Ψ† into Q smaller wavelet decompositions, leveraging the wavelet splitting technique introduced in Pruša (2012, Chapter 4). Pruša (2012) proposed an exact implementation of the discrete wavelet transform distributed over multiple facets. In this context, the Daubechies wavelet bases are decomposed into a collection of facet-based operators Ψ†_q ∈ R^{I_q×N_q} acting only on the q-th facet of size N_q, with I = I_1 + ... + I_Q. The overlap needed to ensure an exact faceted implementation of the wavelet transforms is composed of a number of pixels between 15(2^s − 2) and 15(2^s − 1) in each spatial direction (Pruša 2012, Section 4.1.4), with s being the level of decomposition. In practice, the overlap ensures that each facet contains all the information needed to compute the convolutions underlying the discrete wavelet transforms locally.

On the other hand, we consider a faceted low-rank prior enforced by the sum of nuclear norm priors on essentially the same overlapping facets as those introduced for the wavelet decomposition. This provides a more tractable alternative to the global low-rank prior encoded by the nuclear norm of HyperSARA. Unlike the wavelet decomposition, there is no equivalent faceted implementation of the eigenvalue decomposition. To mitigate reconstruction artifacts possibly resulting from the faceting of the 3D image cube, for each facet q ∈ {1, ..., Q}, of size Ñ_q, we propose to introduce a diagonal matrix D_q ∈ ]0,+∞[^{Ñ_q×Ñ_q} ensuring a smooth transition from the borders of one facet to its neighbours. A natural choice consists in down-weighting the contribution of pixels involved in multiple facets. A tapering window decaying in the overlapping regions is considered, while ensuring that the sum of all the weights associated with each pixel is equal to unity. In this work, we consider weights in the form of a 2D triangular apodization window as considered by Murya et al. (2017) (see Figure 2(c)). The size of the overlap for this term is taken as an adjustable parameter of the Faceted HyperSARA approach to further promote local correlations. Its influence is investigated in Section 4. In practice, a larger overlap region than the one taken for the faceted wavelet transform is considered, taking advantage of the overlap already imposed by the faceted implementation of the wavelet decomposition and the associated ℓ2,1 norm priors.
The spatial faceting procedure therefore results in splitting the original log-sum priors of HyperSARA in (5) into a sum of inter-dependent facet-specific log-sum priors, defining the Faceted HyperSARA prior:

r_c(X_c) = Σ_{q=1}^{Q} [ μ̄_c Σ_{j=1}^{J_{c,q}} log( |σ_j(D_q S̃_q X_c)| + υ ) + μ_c Σ_{i=1}^{I_q} log( ‖[Ψ†_q S_q X_c]_i‖_2 + υ ) ].   (7)

In (7), (μ̄_c, μ_c, υ) ∈ ]0,+∞[^3 are regularization parameters and, for every q ∈ {1, ..., Q}, J_{c,q} ≤ min(Ñ_q, L_c) is the rank of D_q S̃_q X_c, and S̃_q ∈ R^{Ñ_q×N} and S_q ∈ R^{N_q×N} extract spatially overlapping spatio-spectral facets from the full image cube for the low-rankness prior and the average sparsity prior, respectively. These two operators only differ in the amount of overlapping pixels considered, which is defined as an adjustable parameter for S̃_q, and prescribed by Pruša (2012) for S_q (Figure 2). Each facet relies on a spatial decomposition of the image into non-overlapping tiles (see Figure 2(b), delineated by dashed red lines), each overlapping with its top and left spatial neighbour. In the following, the overlapping regions will be referred to as the borders of a facet, in contrast with its underlying tile (see Figure 2). An edge facet, i.e. one which does not admit a neighbour in one of the two spatial dimensions, has the same dimension as the underlying tile in the direction where it does not admit a neighbour (e.g. corner facets have the same dimension as the underlying tile). Note that HyperSARA corresponds to the case Q = C = 1.

The reweighting approach utilized to minimize the objective (6) with the log-sum priors (7) via convex relaxations powered by PDFB is described in Section 3. Crucially, the splitting functionality of PDFB will be exploited to enable parallel processing of these facet-specific priors, more specifically their convex relaxations. Note that the appropriate values for the parameters μ̄_c and μ_c will be investigated via simulation and in relation to the corresponding parameter values for HyperSARA in Section 4.
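The following minimal Python sketch illustrates the spatial faceting geometry: overlapping facets are extracted from an image (each tile extended towards its top/left neighbour), and a simplified triangular apodization window mimics the weighting D_q. It is only a toy sketch: the actual scheme in the paper additionally ensures that the weights associated with each pixel sum to unity across facets, which is not enforced here.

import numpy as np

def facet_slices(n, q, overlap):
    # start/stop indices of q facets along one dimension of size n; each tile is
    # extended by `overlap` pixels towards its top/left neighbour (except the first)
    edges = np.linspace(0, n, q + 1).astype(int)
    return [(max(edges[i] - (overlap if i > 0 else 0), 0), edges[i + 1])
            for i in range(q)]

def triangular_weights(h, w, overlap_y, overlap_x):
    # simplified 2D apodization: linear ramp over the overlap region, 1 elsewhere
    wy, wx = np.ones(h), np.ones(w)
    if overlap_y > 0:
        wy[:overlap_y] = np.linspace(1.0 / overlap_y, 1.0, overlap_y)
    if overlap_x > 0:
        wx[:overlap_x] = np.linspace(1.0 / overlap_x, 1.0, overlap_x)
    return np.outer(wy, wx)

image = np.random.default_rng(4).random((256, 256))
Q_side, overlap = 2, 32                   # 2 x 2 = 4 facets, 32-pixel overlap

for sy in facet_slices(image.shape[0], Q_side, overlap):
    for sx in facet_slices(image.shape[1], Q_side, overlap):
        facet = image[sy[0]:sy[1], sx[0]:sx[1]]           # analogue of S_q x
        D = triangular_weights(facet.shape[0], facet.shape[1],
                               overlap if sy[0] > 0 else 0,
                               overlap if sx[0] > 0 else 0)
        print(facet.shape, D.shape)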
3 FACETED HYPERSARA
The parallel algorithmic structure of Faceted HyperSARA is described in this section, leveraging PDFB within a reweighting approach to handle the log-sum priors. Implementation details are also discussed.

3.1 Outer reweighting algorithm

To efficiently address the log-sum prior underpinning the Faceted HyperSARA prior, we resort to a majorize-minimize algorithm similar to the one proposed by Candès et al. (2009), leading to the reweighting approach described in Algorithm 1.
At each iteration p ∈ N of Algorithm 1, problem (6) is majorized at the local estimate X^(p) by a convex approximation, and then minimized using the PDFB algorithm described in Algorithm 2 (see Algorithm 1, line 6). For each sub-cube c ∈ {1, ..., C}, the convex approximated minimization problem is of the form

minimize_{X_c ∈ R^{N×L_c}_+}   Σ_{l=1}^{L_c} Σ_{b=1}^{B} ι_{B(y_{c,l,b}, ε_{c,l,b})}(Φ_{c,l,b} x_{c,l}) + r̃_c(X_c, X_c^(p)),   (8)

where r̃_c(·, X_c^(p)) is a convex local majorant function of r_c at X_c^(p), corresponding to the weighted hybrid norm prior

r̃_c(X_c, X_c^(p)) = Σ_{q=1}^{Q} [ μ̄_c ‖D_q S̃_q X_c‖_{*, ω̄_q(X_c^(p))} + μ_c ‖Ψ†_q S_q X_c‖_{2,1, ω_q(X_c^(p))} ],   (9)

where, for every q ∈ {1, ..., Q}, the weights ω̄_q(X_c^(p)) = (ω̄_{q,j}(X_c^(p)))_{1≤j≤J_{c,q}} and ω_q(X_c^(p)) = (ω_{q,i}(X_c^(p)))_{1≤i≤I_q} are given by

ω̄_{q,j}(X_c^(p)) = ( σ_j(D_q S̃_q X_c^(p)) + υ )^{-1},   (10)
ω_{q,i}(X_c^(p)) = ( ‖[Ψ†_q S_q X_c^(p)]_i‖_2 + υ )^{-1}.   (11)

At the beginning of the algorithm, the weights are initialized to one (see Algorithm 1, line 3, where the notation 1_{J_c} stands for the vector of size J_c with all coefficients equal to 1, and J_c = J_{c,1} + ... + J_{c,Q}). Note that the weights defined in (10)-(11) are multiplied by the regularization parameter υ in Algorithm 1, which is equivalent to re-scaling the sub-problem (8) by υ. This does not affect the set of minimizers of the global problem3.
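The short Python sketch below illustrates the weight updates (10)-(11) on a toy facet: weights are the inverse of (current magnitude + υ), so strong components are penalized less at the next PDFB stage. Stand-in operators (identity for Ψ†_q, no apodization D_q) replace the faceted ones; this is illustrative only.

import numpy as np

rng = np.random.default_rng(5)
Xc_p = rng.random((100, 8))              # current estimate of one facet (N_q x L_c)
upsilon = 1e-3

# weights (10): inverse of (singular value + upsilon) of the weighted facet
sigma = np.linalg.svd(Xc_p, compute_uv=False)
omega_bar = 1.0 / (sigma + upsilon)

# weights (11): inverse of (row-wise l2 norm + upsilon) of the analysis coefficients
row_norms = np.linalg.norm(Xc_p, axis=1)
omega = 1.0 / (row_norms + upsilon)

print(omega_bar[:3], omega[:3])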
A complete description of the PDFB-powered algorithm
used to solve the sub-problems (8) is provided in the next
section.
Algorithm 1: Outer reweighting algorithm.
Input: X^(0) = (X_c^(0))_c, P^(0) = (P_c^(0))_c, W^(0) = (W_c^(0))_c, v^(0) = (v_c^(0))_c
1   p ← 0;
    // Initialization of the weights
2   for c = 1 to C do
3     θ_c^(0) = (θ_{c,q}^(0))_{1≤q≤Q} = 1_I;  θ̄_c^(0) = (θ̄_{c,q}^(0))_{1≤q≤Q} = 1_{J_c};
4   while stopping criterion not satisfied do
      // Solve spectral sub-problems in parallel
5     for c = 1 to C do
        // Run Algorithm 2
6       (X_c^(p+1), P_c^(p+1), W_c^(p+1), v_c^(p+1)) = Algorithm2(X_c^(p), P_c^(p), W_c^(p), v_c^(p), θ̄_c^(p), θ_c^(p));
7       for q = 1 to Q do
          // Update weights: low-rankness prior
8         θ̄_{c,q}^(p+1) = υ ω̄_q(X_c^(p+1));   // using (10)
          // Update weights: joint-sparsity prior
9         θ_{c,q}^(p+1) = υ ω_q(X_c^(p+1));   // using (11)
10    p ← p + 1;
Result: X^(p), P^(p), W^(p), v^(p)
3.2 Inner convex optimization algorithm
A primal-dual algorithmic structure such as PDFB works by jointly solving the problem (8), referred to as the primal problem, and its dual formulation in the sense of the Fenchel-Rockafellar duality theory (Bauschke & Combettes 2017). The splitting functionality enables all block-specific data-fidelity terms and facet-specific regularization terms to be updated in parallel via their proximal operators. In this work, we resort to a preconditioned variant of PDFB, which uses proximal operators with respect to non-Euclidean metrics in order to reduce the number of iterations necessary to converge. Let U ∈ R^{n×n} be a symmetric, positive definite matrix. The proximal operator of a proper, convex, lower semi-continuous function f : R^n → ]−∞,+∞] at z ∈ R^n with respect to the metric induced by U is defined by (Moreau 1965; Hiriart-Urruty & Lemaréchal 1993)

prox_f^U(z) = argmin_{x ∈ R^n} { f(x) + (1/2)(x − z)† U (x − z) }.   (12)

The more compact notation prox_f is used when U = I_n, where I_n is the identity matrix in R^{n×n}. In addition, when the function f corresponds to the indicator function of a closed, non-empty, convex set, then its proximal operator reduces to the projection operator onto this set.

3 Previous works from Carrillo et al. (2012); Onose et al. (2016b); Abdulaziz et al. (2019b) suggest that the regularization parameter υ in (10)-(11) should decrease from one iteration p to another by a factor of 80% to improve the convergence rate and the stability of the algorithm. This procedure is also adopted in this article.
Algorithm 2: Inner convex optimisation algorithm for each spectral sub-problem (8), powered by PDFB.
Data: (y_{c,l,b})_{l,b}, l ∈ {1, ..., L_c}, b ∈ {1, ..., B}
Input: X_c^(0), P_c^(0) = (P_{c,q}^(0))_q, W_c^(0) = (W_{c,q}^(0))_q, v_c^(0) = (v_{c,l,b}^(0))_{l,b}, θ̄_c = (θ̄_{c,q})_{1≤q≤Q}, θ_c = (θ_{c,q})_{1≤q≤Q}
Parameters: (D_{c,q})_q, (U_{c,l,b})_{l,b}, (ε_{c,l,b})_{l,b}, μ̄_c, μ_c, τ, ζ, η, κ
1   k ← 0; ξ = +∞; X̌_c^(0) = X_c^(0);
2   while ξ > 10^{-5} do
      // Broadcast auxiliary variables
3     for q = 1 to Q do
4       X̃_{c,q}^(k) = S̃_q X̌_c^(k);  X̌_{c,q}^(k) = S_q X̌_c^(k);
5     for l = 1 to L_c do
6       x̂_{c,l}^(k) = F Z x̌_{c,l}^(k);   // Fourier transforms
7       for b = 1 to B do
8         x̂_{c,l,b}^(k) = M_{c,l,b} x̂_{c,l}^(k);   // send to data cores
      // Update low-rankness variables [facet cores]
9     for q = 1 to Q do
10      P_{c,q}^(k+1) = (I_{J_q} − prox_{ζ^{-1} μ̄_c ‖·‖_{*, θ̄_{c,q}}})(P_{c,q}^(k) + D_{c,q} X̃_{c,q}^(k));
11      P̃_{c,q}^(k+1) = D_q† P_{c,q}^(k+1);
      // Update sparsity variables [facet cores]
12    for q = 1 to Q do
13      W_{c,q}^(k+1) = (I_{I_q} − prox_{κ^{-1} μ_c ‖·‖_{2,1, θ_{c,q}}})(W_{c,q}^(k) + Ψ†_q X̌_{c,q}^(k));
14      W̃_{c,q}^(k+1) = Ψ_q W_{c,q}^(k+1);
      // Update data-fidelity variables [data cores]
15    for (l,b) = (1,1) to (L_c, B) do
16      v_{c,l,b}^(k+1) = U_{c,l,b}(I_{M_{c,l,b}} − prox^{U_{c,l,b}}_{ι_{B(y_{c,l,b}, ε_{c,l,b})}})(U_{c,l,b}^{-1} v_{c,l,b}^(k) + Θ_{c,l,b} G_{c,l,b} x̂_{c,l,b}^(k));
17      ṽ_{c,l,b}^(k+1) = G†_{c,l,b} Θ†_{c,l,b} v_{c,l,b}^(k+1);
      // Inter-node communications
18    for l = 1 to L_c do
19      a_{c,l}^(k) = Σ_{q=1}^{Q} ( ζ S̃_q† p̃_{c,q,l}^(k+1) + κ S_q† w̃_{c,q,l}^(k+1) )
20               + η Z† F† Σ_b M†_{c,l,b} ṽ_{c,l,b}^(k+1);
      // Update image tiles [on facet cores, in parallel]
21    X_c^(k+1) = prox_{ι_{R^{N×L_c}_+}}(X_c^(k) − τ A_c^(k));
22    X̌_c^(k+1) = 2 X_c^(k+1) − X_c^(k);   // communicate facet borders
23    ξ = ‖X_c^(k+1) − X_c^(k)‖_F / ‖X_c^(k)‖_F;
24    k ← k + 1;
Result: X_c^(k), P_c^(k), W_c^(k), v_c^(k)
A graphical illustration of the PDFB-powered algorithm to solve problem (8) is given in Figure 3. A formal description is reported in Algorithm 2. First, the faceted low-rankness prior is handled in lines 9-11 by computing in parallel the proximal operator of the per-facet weighted nuclear norms (see Table 1). Second, the average sparsity prior is addressed in lines 12-14 by computing the proximal operator of the per-facet weighted ℓ2,1 norm in parallel (see Table 1). Third, the data-fidelity terms are handled in parallel in lines 15-17 by computing, for every data block (c,l,b), the projection onto the ℓ2 ball B(y_{c,l,b}, ε_{c,l,b}) with respect to the metric induced by the diagonal matrix U_{c,l,b}, chosen using the preconditioning strategy proposed by Onose et al. (2017); Abdulaziz et al. (2019b). More precisely, their diagonal coefficients are the inverse of the sampling density in the vicinity of the probed Fourier modes. The projections onto the ℓ2 balls for the metric induced by U_{c,l,b} do not admit an analytic expression, and thus need to be approximated numerically through sub-iterations. In this work, we resort to FISTA (Beck & Teboulle 2009), which iteratively approximates this projection by computing Euclidean projections onto the ℓ2 balls B(y_{c,l,b}, ε_{c,l,b}) (see Table 1). Finally, the non-negativity constraint is handled in line 21 by computing the Euclidean projection onto the non-negative orthant (see Table 1).

Table 1. Proximal operators involved in Algorithm 2.

Proximal operator (for α > 0)                                                               Details
prox_{α‖·‖_*}(Z) = U Diag(prox_{α‖·‖_1}(σ)) V†                                             Z = U Σ V† ∈ R^{N×L}, Σ = Diag(σ): singular value decomposition of Z
prox_{α‖·‖_{2,1}}(Z) = ( max{‖z_n†‖_2 − α, 0} z_n† / ‖z_n†‖_2 )_{1≤n≤N}                    Z = [z_1, ..., z_N]† ∈ R^{N×L}
prox_{ι_{B(y_{c,l,b}, ε_{c,l,b})}}(z) = ε_{c,l,b} (z − y_{c,l,b}) / ‖z − y_{c,l,b}‖_2 + y_{c,l,b}    z ∈ C^{M_{c,l,b}}
prox_{ι_{R^{N×L}_+}}(Z) = max{0, ℜ(Z)}                                                     Z ∈ C^{N×L}, ℜ(Z) denotes the real part of Z
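For concreteness, the short Python sketch below gives minimal implementations of the proximal operators of Table 1: soft-thresholding of singular values (nuclear norm), row-wise group soft-thresholding (ℓ21 norm), the ℓ2-ball projection and the projection onto the non-negative orthant. These are illustrative stand-ins, not the reference code; the ball projection is written as the standard Euclidean projection, which leaves points already inside the ball unchanged.

import numpy as np

def prox_nuclear(Z, alpha):
    # soft-thresholding of the singular values (prox of alpha * nuclear norm)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - alpha, 0.0)) @ Vt

def prox_l21(Z, alpha):
    # row-wise group soft-thresholding (prox of alpha * l21 norm)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return np.maximum(norms - alpha, 0.0) / np.maximum(norms, 1e-16) * Z

def project_l2_ball(z, y, eps):
    # Euclidean projection onto B(y, eps)
    d = z - y
    return y + d * min(1.0, eps / max(np.linalg.norm(d), 1e-16))

def project_nonneg(Z):
    # projection onto the non-negative orthant (real part only)
    return np.maximum(np.real(Z), 0.0)

Z = np.random.default_rng(6).normal(size=(50, 6))
print(prox_nuclear(Z, 0.5).shape, prox_l21(Z, 0.5).shape)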
Algorithm 2 is guaranteed to converge to a global solution to problem (8), for a given sub-cube c ∈ {1, ..., C}, provided that the preconditioning matrices (U_{c,l,b})_{c,l,b} and the parameters (τ, ζ, η, κ) satisfy the technical conditions described in (Pesquet & Repetti 2015, Lemma 4.3). In particular, these conditions are satisfied for our choice of parameters: τ = 1/3, ζ = 1, η = 1/‖U_c^{1/2} Φ_c‖_S^2, κ = 1/‖Ψ†‖_S^2, where ‖·‖_S denotes the spectral norm of a linear operator.
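In practice, the squared spectral norms entering η and κ are typically estimated numerically. The sketch below shows the standard power-iteration estimate of ‖A‖_S^2 for a generic linear operator A given through its forward and adjoint actions; the dense matrix M is only a stand-in for operators such as Φ_c or Ψ†.

import numpy as np

def squared_spectral_norm(A, At, x0, n_iter=100):
    # power iteration on A^T A: converges to its largest eigenvalue, i.e. ||A||_S^2
    x = x0 / np.linalg.norm(x0)
    val = 0.0
    for _ in range(n_iter):
        x = At(A(x))
        val = np.linalg.norm(x)
        x = x / val
    return val

rng = np.random.default_rng(7)
M = rng.normal(size=(200, 100))                      # dense stand-in operator
est = squared_spectral_norm(lambda v: M @ v, lambda v: M.T @ v, rng.normal(size=100))
print(est, np.linalg.norm(M, 2)**2)                  # the two values should be close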
Note that PDFB can accommodate randomization in the update of the variables, e.g. by randomly selecting a subset of the data and facet dual variables to be updated at each iteration. This procedure can significantly alleviate the memory load per node (Pesquet & Repetti 2015) at the expense of an increased number of iterations for the algorithm to converge. This feature, which has been specifically investigated for wide-band imaging (Abdulaziz et al. 2017) and facet-based monochromatic imaging (Naghibzedeh et al. 2018), is not leveraged in the implementation of Algorithm 2 used for the experiments reported in Sections 4, 5 and 6.
3.3 Implementation
To solve a spectral sub-problem c ∈ {1, ..., C}, different parallelization strategies can be adopted, depending on the computing resources available and the size of the problem to be addressed. We propose to distribute the variables to be estimated over the two following groups of computing cores.
Figure 3. Illustration of the two groups of cores described in Section 3, with the main steps involved in Algorithm 2 applied to each independent sub-problem c ∈ {1, ..., C}, using Q facets (along the spatial dimension) and B = 1 data block per channel. Data cores handle variables of the size of data blocks (Algorithm 2, lines 15-17), whereas facet cores handle variables of the size of a spatio-spectral facet (Algorithm 2, lines 9-14). Communications between the two groups are represented by colored arrows. Communications between facet cores, induced by the overlap between the spatio-spectral facets, are illustrated in Figure 4.

Figure 4. Illustration of the communication steps involving a facet core (represented by the top-left rectangle in each sub-figure) and a maximum of three of its neighbours. The tile underpinning each facet, located in its bottom-right corner, is delineated in thick black lines. At each iteration, the following two steps are performed sequentially. (a) Facet borders need to be completed before each facet is updated independently in the dual space (Algorithm 2, lines 9-14): values of the tile of each facet are broadcast to the cores handling the neighbouring facets in order to update their borders (Algorithm 2, line 4). (b) Parts of the facet tiles overlapping with borders of nearby facets need to be updated before each tile is updated independently in the primal space (Algorithm 2, line 20): values of the parts of the borders overlapping with the tile of each facet are broadcast by the cores handling neighbouring facets, and averaged.

• Data cores: Each core involved in this group is responsible for the update of several dual variables v_{c,l,b} ∈ C^{M_{c,l,b}} associated with the data-fidelity terms (see Algorithm 2, line 16). These cores produce auxiliary variables ṽ_{c,l,b} ∈ R^N of single-channel image size, each assumed to be held in the memory of a single core (line 17). Note that the Fourier transform computed for each channel l in line 6 is performed once per iteration on the data core (l,1). Each data core (l,b), with b ∈ {2, ..., B}, receives only a few coefficients of the Fourier transform of x_l from the data core (l,1), selected by the operator M_{c,l,b} (line 8);
• Facet cores: Each worker involved in this group, composed of Q cores, is responsible for the update of an image tile (i.e. a portion of the primal variable) and the dual variables P_{c,q} and W_{c,q} associated with the low-rankness and the joint average sparsity priors, respectively (Algorithm 2, lines 10 and 13). Note that the image cube is stored across different facet cores, which are responsible for updating their image tile (line 21). Since the facets underlying the proposed prior overlap, communications involving a maximum of 4 contiguous facet cores are needed to build the facet borders prior to updating the facets independently in the dual space (Algorithm 2, lines 9-14). Values of the tile of each facet are broadcast to cores handling neighbouring facets in order to update their borders (Algorithm 2, line 4, see Figure 4(a)). In a second step, parts of the facet tiles overlapping with borders of nearby facets need to be updated before each tile is updated independently in the primal space (Algorithm 2, line 20). More precisely, values of the parts of the borders overlapping with the tile of each facet are broadcast by the workers handling neighbouring facets, and averaged (see Figure 4(b)).
A MATLAB implementation of Algorithms 1 and 2 is available on the Puri-Psi webpage. Both HyperSARA and Faceted HyperSARA rely on MPI-like MATLAB parallelization features based on the spmd MATLAB function, using composite MATLAB variables to handle parameters distributed across several cores (e.g. for the wide-band image cube). In practice, one process, running either on one CPU core (physical core) or on one hyperthread (logical core), specifically ensures communication synchronization between the data and facet cores. In the following, this process will be referred to as the master process, hosted on a CPU core referred to as the master CPU core.
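As a loose, one-dimensional analogue of the two communication steps of Figure 4 (not the actual MATLAB/spmd code), the toy Python sketch below shows (a) a facet being rebuilt from its own tile plus border pixels received from its left neighbour, and (b) the overlap contribution being sent back and averaged into the neighbouring tile after a dummy facet-wise update.

import numpy as np

n_tile, overlap = 8, 3
x = np.arange(2 * n_tile, dtype=float)                 # 1D "image", two tiles
tiles = [x[:n_tile].copy(), x[n_tile:].copy()]

# step (a): facet 1 is its own tile; facet 2 is its tile extended by `overlap`
# pixels received from its left neighbour (Figure 4(a))
facets = [tiles[0].copy(), np.concatenate([tiles[0][-overlap:], tiles[1]])]

# ... facet-wise dual updates (Algorithm 2, lines 9-14) would happen here ...
updated = [f + 1.0 for f in facets]                    # dummy per-facet update

# step (b): the part of tile 1 lying in facet 2's border receives facet 2's
# contribution, and the two estimates are averaged (Figure 4(b))
contrib = updated[1][:overlap]
tiles[0] = updated[0]
tiles[0][-overlap:] = 0.5 * (updated[0][-overlap:] + contrib)
tiles[1] = updated[1][overlap:]
print(tiles[0], tiles[1])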
4 VALIDATION ON SYNTHETIC DATA
In this section, the impact of spatial faceting is first assessed in terms of both reconstruction quality and computing time for a single spectral sub-problem, using a varying number of facets and a varying size of the overlapping regions. The impact of spectral faceting on the reconstruction performance of Faceted HyperSARA is then quantified for a single underlying facet along the spatial dimension (Q = 1). Results are compared with those of both SARA and HyperSARA.
4.1 Simulation setting
4.1.1 Images and data
Following the procedure described by Abdulaziz et al. (2019b), a wide-band model image composed of L spectral channels is simulated from an image of the W28 supernova remnant of size N, considering B = 1 data block per channel. The measurement operator relies on a realistic VLA uv-coverage, generated within the frequency range [ν_1, ν_L] = [1, 2] GHz with uniformly sampled channels and a total observation time of 6 hours. Note that the uv-coverage associated with each channel l corresponds to the reference uv-coverage at the frequency ν_1 scaled by the factor ν_l/ν_1. The data are corrupted by an additive, zero-mean complex white Gaussian noise of variance σ^2. An input signal-to-noise ratio (iSNR) of 60 dB is considered, which is defined as

iSNR = 10 log_10( ( Σ_l ‖Φ_l x_l‖_2^2 / M_l ) / (L σ^2) ).

Note that, given the larger computational cost of HyperSARA, the size of the data is chosen so that it can be run in a reasonable amount of time for the different simulation scenarios described below.
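The following small Python sketch illustrates how a noise standard deviation can be chosen to reach a target iSNR of 60 dB under the definition above; the norms ‖Φ_l x_l‖_2 and the number of measurements per channel are arbitrary placeholders.

import numpy as np

rng = np.random.default_rng(8)
L_chan, M_l = 20, 100_000
Phi_x_norms = rng.uniform(50.0, 60.0, L_chan)         # placeholder values of ||Phi_l x_l||_2

target_isnr_db = 60.0
mean_power = np.sum(Phi_x_norms**2 / M_l) / L_chan    # (1/L) sum_l ||Phi_l x_l||^2 / M_l
sigma2 = mean_power / 10**(target_isnr_db / 10)       # noise variance giving the target iSNR
print(np.sqrt(sigma2))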
4.1.2 Spatial faceting
The performance of Faceted HyperSARA is first evaluated with C = 1 (number of facets along the spectral dimension) for different parameters of the spatial faceting. Data generated from an N = 1024 × 1024 image composed of L = 20 channels are considered, with M_l = 0.5N measurements per channel. The assessment is conducted with (i) a varying Q (number of facets along the spatial dimensions) and a fixed overlap; (ii) a fixed number of facets and a varying spatial overlap for the nuclear norm regularization. Additional details can be found in the following lines. Regarding the choice of the regularization parameters, we set μ̃ = 10^{-3} for SARA as explained in Section 2.3.1. As prescribed in Section 2.3.2, the regularization parameters of HyperSARA are set as μ̄ = 1 and μ = ‖X_dirty‖_* / ‖Ψ†X_dirty‖_{2,1} = 10^{-3}. For Faceted HyperSARA, we have observed that setting μ̄_c = 1 and μ_c = 10^{-2} ‖X_c^dirty‖_* / ‖Ψ†X_c^dirty‖_{2,1} = 10^{-5} leads to a good trade-off to recover high resolution, high dynamic range model cubes.

• Varying overlap: Reconstruction performance and computing time are evaluated with C = 1 and Q = 16 (4 facets along each spatial dimension) and a varying size of the overlapping region for the faceted nuclear norm (0%, 6%, 20%, 33% and 50% of the spatial size of the facet, corresponding to 0, 16, 64, 128 and 256 pixels respectively) in each of the two spatial dimensions. Note that the overlap for the ℓ2,1 prior is a fixed parameter (Pruša 2012). The comparison is conducted between SARA, HyperSARA, and Faceted HyperSARA.

• Varying number of facets: The reconstruction performance and computing time of Faceted HyperSARA are reported for experiments with Q ∈ {4, 9, 16} (corresponding to 2, 3 and 4 facets along each spatial dimension) with a fixed overlap corresponding to 50% of the spatial size of a facet. The regularization parameters are set to the same values as those considered in the experiment with a varying overlap.
4.1.3 Spectral faceting
The influence of spectral faceting is evaluated in terms of computing time and reconstruction quality from data generated with a ground truth image composed of N = 256 × 256 pixels in L = 100 channels, with M_l = N measurements per channel. The overall reconstruction performance of SARA, HyperSARA and Faceted HyperSARA with a single facet along the spatial dimension (Q = 1) is compared. For Faceted HyperSARA, a channel-interleaving process with a varying number of facets along the spectral dimension C is considered (see Section 2.4 and Figure 2(b)). The simulation scenario involves facets composed of a varying number of channels L_c (L_c ≈ 6, 10, 14, 20, 33 and 50 channels for each sub-problem c ∈ {1, ..., C}) obtained by down-sampling the data cube along the frequency dimension. For the choice of the regularization parameters, we set μ̃ = 10^{-2} for SARA. Our simulations indeed show that increasing the value beyond the range suggested in Onose et al. (2016b) provides better convergence speed. As prescribed in Section 2.3.2, the regularization parameters of HyperSARA are set as μ̄ = 1 and μ = ‖X_dirty‖_* / ‖Ψ†X_dirty‖_{2,1} = 10^{-2}. Similarly, for Faceted HyperSARA we set μ̄_c = 1 and μ_c = ‖X_c^dirty‖_* / ‖Ψ†X_c^dirty‖_{2,1} = 10^{-2}.
4.2 Hardware
All the methods compared in this section have been run on multiple compute nodes of Cirrus, one of the UK's Tier 2 HPC services4. Cirrus is an SGI ICE XA system composed of 280 compute nodes, each with two 2.1 GHz, 18-core, Intel Xeon E5-2695 (Broadwell) series processors. The compute nodes have 256 GB of memory shared between the two processors. The system has a single Infiniband FDR network connecting nodes with a bandwidth of 54.5 GB/s.

The different methods have been applied in the following setting.

4 https://epsrc.ukri.org/research/facilities/hpc/tier2/
Figure 5. Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in log10 scale for channels ν_1 = 1 GHz (first two columns) and ν_20 = 2 GHz (last two columns), for Faceted HyperSARA with Q = 16 and C = 1 (columns 1 and 3), and HyperSARA (i.e. Faceted HyperSARA with Q = C = 1, in columns 2 and 4). From top to bottom are reported the ground truth image, the reconstructed and residual images. The overlap for the faceted nuclear norm regularization corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying the definition of the facets are delineated on the residual images in red dotted lines, with the central facet displayed in continuous lines.
Table 2. Spatial faceting experiment: varying size of the overlap region for the faceted nuclear norm regularization. Reconstruction performance of Faceted HyperSARA with Q = 16 and C = 1, compared to HyperSARA (i.e. Faceted HyperSARA with Q = C = 1) and SARA. The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in dB, with the associated standard deviation), and total number of CPU cores used to reconstruct the full image. The evolution of the aSNRlog, of specific interest for this experiment, is highlighted in bold face.

Method                                     Time (h)   aSNR (dB)             aSNRlog (dB)    CPU cores
SARA                                       5.89       32.78 (±2.76)         -1.74 (±0.83)   240
HyperSARA                                  133.1      38.63 (±0.23)         -0.39 (±0.95)   22
Faceted nuclear norm overlap (0%)          26.26      37.03 (±2.90·10^-3)   5.09 (±1.09)    36
Faceted nuclear norm overlap (6%)          18.01      37.01 (±1.00·10^-3)   4.09 (±0.99)    36
Faceted nuclear norm overlap (20%)         18.11      36.86 (±0.90·10^-3)   4.51 (±1.07)    36
Faceted nuclear norm overlap (33%)         17.94      36.98 (±1.60·10^-3)   6.00 (±1.05)    36
Faceted nuclear norm overlap (50%)         20.75      37.08 (±1.60·10^-3)   7.88 (±0.91)    36
For all the experiments, SARA uses 12 CPU cores to reconstruct each single channel, based on the parallelization strategy proposed by Onose et al. (2016b): 1 master CPU core, 2 CPU cores for the data-fidelity terms and 9 CPU cores to handle the average sparsity terms (associated with the nine bases of the SARA dictionary). HyperSARA and Faceted HyperSARA have been applied in the following configuration, given the different number of visibilities considered in the two simulation scenarios.

• Spatial faceting: HyperSARA addresses the full problem (3)-(5) with 22 CPU cores: 1 master CPU core, 20 CPU cores for the data-fidelity terms (1 CPU core per data term), and 1 CPU core for the regularization term. To address each sub-problem in (6), Faceted HyperSARA uses 20 CPU cores for the data-fidelity terms (1 CPU core per data channel), and 1 CPU core for each of the Q facets. The master process runs on one of the hyperthreads of the node (logical core) to ensure communication synchronizations.
Method                        Time (h)   aSNR (dB)              aSNRlog (dB)     CPU cores
SARA                          6.23       32.78 (±2.76)          -1.74 (±0.83)    240
HyperSARA                     133.08     38.63 (±0.23)          -0.39 (±0.95)    22
Faceted HyperSARA (Q = 4)     42.04      36.58 (±1.80·10^-3)    10.19 (±0.88)    24
Faceted HyperSARA (Q = 9)     21.60      37.00 (±1.70·10^-3)    5.88 (±1.00)     29
Faceted HyperSARA (Q = 16)    17.94      37.08 (±1.60·10^-3)    7.88 (±1.05)     36
Table 3. Spatial faceting experiment: varying number of facets Q along the spatial dimension. Reconstruction performance of Faceted HyperSARA (C = 1, overlap of 50%), compared to HyperSARA (i.e. Faceted HyperSARA with Q = C = 1) and SARA. The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in dB, with the associated standard deviation), and the total number of CPU cores used to reconstruct the full image. The evolution of the computing time is of specific interest for this experiment.
• Spectral faceting: HyperSARA addresses the full problem (3)-(5) with 7 CPU cores: 1 master CPU core, 5 CPU cores for the data-fidelity terms (20 data-fidelity terms handled by each core), and 1 CPU core for the regularization term. To address each sub-problem in (6), Faceted HyperSARA uses 1 master CPU core, 5 CPU cores for the data-fidelity terms, and 1 CPU core per facet (Q facets in total).
Note that for each experiment, the number of cores assigned to each group of cores in Faceted HyperSARA (i.e. data and facet cores) has been chosen to ensure a reasonable balance between the different computing tasks; a minimal sketch of such a worker layout is given below.
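The following MATLAB sketch illustrates how such a split between data and facet workers can be set up with the Parallel Computing Toolbox. It is a simplified illustration under assumed values (L = 20 channels, Q = 16 facets), not the released Faceted HyperSARA implementation; the pool size and worker roles are placeholders, and the client process plays the role of the master.

    % Illustrative worker layout for the spatial faceting experiment (not the released code).
    L = 20;                                % number of channels, i.e. data workers (assumed)
    Q = 16;                                % number of spatial facets, i.e. facet workers (assumed)
    parpool('local', L + Q);               % one worker per data channel and per facet
    spmd
        if labindex <= L
            % Data workers: hold the visibilities of one channel and update the
            % associated data-fidelity variables at each iteration.
            role = sprintf('data worker for channel %d', labindex);
        else
            % Facet workers: hold one spatial facet and update the faceted
            % nuclear norm and average joint-sparsity variables.
            role = sprintf('facet worker for facet %d', labindex - L);
        end
    end
    disp(role{1});                         % roles are returned to the client as a Composite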
4.3 Evaluation metrics
Performance is evaluated in terms of global computing time (elapsed real time) and reconstruction SNR, defined for each channel $l \in \{1, \ldots, L\}$ as
\[
\mathrm{SNR}_l(\mathbf{x}_l) = 20 \log_{10} \left( \frac{\|\overline{\mathbf{x}}_l\|_2}{\|\overline{\mathbf{x}}_l - \mathbf{x}_l\|_2} \right),
\]
where $\overline{\mathbf{x}}_l$ denotes the ground-truth image of channel $l$. Results are reported in terms of the average SNR (aSNR)
\[
\mathrm{aSNR}(\mathbf{X}) = \frac{1}{L} \sum_{l=1}^{L} \mathrm{SNR}_l(\mathbf{x}_l).
\]
Since this criterion poorly reflects the dynamic range, and thus fails to capture improvements in the quality of faint emission, the following criterion is computed over images in log10 scale
\[
\mathrm{SNR}_{\log,l}(\mathbf{x}_l) = 20 \log_{10} \left( \frac{\|\log_{10}(\overline{\mathbf{x}}_l + \epsilon \mathbf{1}_N)\|_2}{\|\log_{10}(\overline{\mathbf{x}}_l + \epsilon \mathbf{1}_N) - \log_{10}(\mathbf{x}_l + \epsilon \mathbf{1}_N)\|_2} \right),
\]
where the $\log_{10}$ function is applied term-wise, $\mathbf{1}_N$ is the vector of ones of size $N$, and $\epsilon$ is an arbitrarily small parameter introduced to avoid numerical issues ($\epsilon$ is set to machine precision). Results are similarly reported in terms of the average log-SNR, defined as $\mathrm{aSNR}_{\log}(\mathbf{X}) = \frac{1}{L} \sum_{l=1}^{L} \mathrm{SNR}_{\log,l}(\mathbf{x}_l)$.
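As an illustration, the two metrics can be computed in MATLAB as follows. This is a minimal sketch, assuming X is the estimated image cube and X0 the ground-truth cube, both stored as N × L matrices; the variable names are hypothetical and not taken from the released code.

    % Minimal sketch of the aSNR and aSNRlog metrics (illustrative only).
    % X:  estimated cube, N x L; X0: ground-truth cube, N x L (assumed variables).
    [N, L] = size(X0);
    e = eps;                                          % machine precision, as in the text
    snr_l    = zeros(L, 1);
    snrlog_l = zeros(L, 1);
    for l = 1:L
        x  = X(:, l);
        x0 = X0(:, l);
        snr_l(l) = 20*log10(norm(x0) / norm(x0 - x));       % per-channel SNR
        lx  = log10(x  + e*ones(N, 1));
        lx0 = log10(x0 + e*ones(N, 1));
        snrlog_l(l) = 20*log10(norm(lx0) / norm(lx0 - lx));  % per-channel log-SNR
    end
    aSNR    = mean(snr_l);                            % average over the L channels
    aSNRlog = mean(snrlog_l);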
4.4 Results and discussion
4.4.1 Spatial faceting
• Varying spatial overlap: The results reported in Table 2 show that spatial faceting gives a good reconstruction of high-intensity pixels (reflected by an aSNR close to that of HyperSARA). Although the performance of the proposed approach does not vary much in terms of aSNR as the overlap for the faceted nuclear norm increases, the aSNRlog improves significantly. This reflects the ability of the proposed prior to enhance the estimation of faint emission and finer details by promoting local correlations. This observation is further confirmed by the reconstructed images, reported in Jy/pixel in Figure 5 for the channels ν1 = 1 GHz and νL = 2 GHz, showing that Faceted HyperSARA reconstructs images with a higher dynamic range (see the zoomed region delineated in white in Figure 5). The associated residual images (last row of Figure 5) are comparable to or better than those of HyperSARA. Note that the regular patterns observed on the residual images do not result from the faceting, as they are not aligned with the facet borders and appear for both approaches. From a computational point of view, Table 2 shows that increasing the overlap size results in a moderate increase in the computing time. Overall, an overlap of 50% gives the best reconstruction SNR for a reasonable computing time, and will thus be used as the default faceting setting for the real data experiments reported in Sections 5 and 6.
• Varying number of facets Q along the spatial dimension: The reconstruction performance and computing time reported in Table 3 show that Faceted HyperSARA gives an almost constant reconstruction performance as the number of facets increases, with an overall computing time approaching that of SARA. The dynamic range of the reconstructed images is notably higher for the faceted approach, as indicated by the aSNRlog values reported in Table 3. These results confirm the potential of the proposed approach to scale to large image sizes by increasing the number of facets along the spatial dimensions, while maintaining a stable reconstruction quality. In particular, the setting Q = 16 ensures a satisfactory reconstruction performance at a significantly reduced computing time.
In both experiments, Faceted HyperSARA has a much lower SNR standard deviation than HyperSARA and SARA (see Tables 2 and 3), i.e. it ensures a more stable recovery quality across channels. This results from the stronger spatio-spectral correlations induced by the proposed faceted regularization, in comparison with both the HyperSARA and SARA priors.
4.4.2 Spectral faceting
The results reported in Table 4 show that Faceted HyperSARA using channel-interleaved facets retains most of the overall reconstruction performance of HyperSARA, ensuring a reconstruction quality significantly better than SARA. As expected, the reconstruction quality of faint emission, reflected by the aSNRlog values, gradually decreases as fewer channels are involved in each facet (i.e. as C increases). This observation is qualitatively confirmed by the images reported in Figure 6 (in Jy/pixel) for facets composed of 10 channels each (see the zoomed regions in Figure 6). The slight loss of dynamic range is likely due to the reduction in the amount of data per spectral sub-cube. Spectral faceting nonetheless remains computationally attractive: it preserves the overall imaging quality of HyperSARA up to an already significant amount of interleaving (see the discussion in Section 2.4.1), while allowing lower-dimensional wide-band imaging sub-problems to be considered (see the discussion in Section 2.4). This strategy offers an increased scalability potential to Faceted HyperSARA over HyperSARA, which may prove of significant interest at extreme image dimensions.
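To make the notion of channel-interleaved spectral facets concrete, the following MATLAB sketch shows one natural way of distributing L channels over C sub-cubes so that each sub-problem still covers the full band. The values of L and C, and the interleaving rule itself, are illustrative assumptions rather than a statement of the exact rule defined in Section 2.4.1.

    % Illustrative channel-interleaved spectral faceting (assumed rule).
    L = 20;                               % total number of channels (assumed)
    C = 2;                                % number of spectral sub-cubes (assumed)
    subcube_channels = cell(C, 1);
    for c = 1:C
        subcube_channels{c} = c:C:L;      % channels assigned to spectral facet c
    end
    % With L = 20 and C = 2, facet 1 handles channels 1,3,...,19 and facet 2
    % handles channels 2,4,...,20, i.e. 10 channels each, spread over the band.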
5 VALIDATION ON REAL DATA
In this section, we illustrate both the precision and scalability potential of Faceted HyperSARA through the reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA data. The algorithm is mapped onto 496 CPU cores of a high-performance computing system, achieving a TeraFLOPS proof of concept. The performance of the proposed approach is evaluated in comparison with the monochromatic imaging approach SARA (Onose et al. 2017) and the CLEAN-based wide-band imaging algorithm JC-CLEAN, available in the WSCLEAN software (Offringa & Smirnov 2017). Note that HyperSARA (Abdulaziz et al. 2019b) is not considered in this study due to its prohibitive computational cost.
5.1 Dataset description and imaging settings
The data analyzed in this section are part of wide-band
VLA observations of the celebrated radio galaxy Cyg A,
acquired over two years (2015-2016) within the frequency
range 2–18 GHz. We consider