Imaging and Uncertainty Quantification in Radio
Astronomy via Convex Optimization: When
Precision Meets Scalability
ABDULLAH ABDULAZIZ
A Thesis Submitted for the Degree of
Doctor of Philosophy (PhD)
in
School of Engineering and Physical Sciences
Heriot-Watt University
Edinburgh, United Kingdom
Under the supervision of
Professor Yves Wiaux
October 2020
The copyright in this thesis is owned by the author. Any quotation from the thesis or use of any
of the information contained in it must acknowledge this thesis as the source of the quotation or
information.
Abstract
Upcoming radio telescopes such as the Square Kilometre Array (SKA) will provide massive amounts
of data, allowing large images of the sky to be reconstructed at an unprecedented resolution and
sensitivity over thousands of frequency channels. In this regard, wideband radio-interferometric
imaging consists in recovering a 3D image of the sky from incomplete and noisy Fourier data, which
is a highly ill-posed inverse problem. To regularize the inverse problem, advanced prior image
models need to be tailored. Moreover, the underlying algorithms should be highly parallelized to
scale with the vast data volumes provided and the Petabyte image cubes to be reconstructed for
SKA. The research developed in this thesis leverages convex optimization techniques to achieve
precise and scalable imaging for wideband radio interferometry and further assess the degree of
confidence in particular 3D structures present in the reconstructed cube.
In the context of image reconstruction, we propose a new approach that decomposes the image
cube into regular spatio-spectral facets, each associated with a sophisticated hybrid prior image
model. The approach is formulated as an optimization problem with a multitude of facet-based
regularization terms and block-specific data-fidelity terms. The underpinning algorithmic structure
benefits from well-established convergence guarantees and exhibits interesting functionalities
such as preconditioning to accelerate convergence. Furthermore, it allows for parallel
processing of all data blocks and image facets over a multiplicity of CPU cores, so that the
bottleneck induced by the size of the image and data cubes is efficiently addressed via
parallelization. The precision and scalability potential of the proposed approach are confirmed
through the reconstruction of a 15 GB image cube of the Cyg A radio galaxy.
In addition, we propose a new method that enables analyzing the degree of confidence in
particular 3D structures appearing in the reconstructed cube. This analysis is crucial due to the
severe ill-posedness of the inverse problem. Moreover, it can help in making scientific decisions on
the structures under scrutiny (e.g., confirming the existence of a second black hole in the Cyg A
galaxy). The proposed method is posed as an optimization problem and solved efficiently with
a modern convex optimization algorithm with preconditioning and splitting functionalities. The
simulation results showcase the potential of the proposed method to scale to big data regimes.
Acknowledgements
I would like to acknowledge the contributions of several people who helped me to achieve this
success.
My immense gratitude goes to my supervisor, Prof Yves Wiaux, the person who believed in me and
kept pushing my limits. It was my honor to be a member of the BASP research group. I am very
grateful to the current and former Postdocs of the BASP group, Dr Alexandru Onose, Dr Audrey
Repetti, Dr Pierre-Antoine Thouvenin, and Dr Ming Jiang for all the help and knowledge they
provided me, and special thanks to Dr Arwa Dabbech, my co-supervisor, for all her help, her
availability, and her patience at all times. I was also lucky to have bright colleagues and friends
during my PhD journey, I would like to thank Dr Marica Pesce, Dr Jasleen Birdi, Dr Roberto
Duarte Coello, Mr Matthieu Terris, Dr Ahmed Karam Eldaly, Mr Quentin Legros, and Mr Amir
Matin for all the scientific discussions and enjoyable times.
I would like to thank my thesis examiners, Prof Jean-Luc Starck and Dr Yoann Altmann, for
their invaluable time in reading my manuscript, critically commenting on it, and providing very
constructive feedback. I would also like to thank Dr Joao Mota for monitoring my online VIVA
and making sure that I was comfortable during the whole process.
I am deeply grateful to Prof Stephen McLaughlin and Dr Yoann Altmann for offering me a
Postdoctoral position to pursue my research journey in a very comfortable and competitive research
environment. Their encouragement and support during the final stage of my PhD and through the
COVID-19 circumstances are deeply appreciated. I am also grateful to Dr Abderrahim Halimi for
his continuous advice and support. He has been a great example for me and a dear friend. I am
indebted to him for his kindness.
I give my sincerest thanks to my parents and my sisters whose belief in me surpasses my own.
Without their constant support and prayers, I would not have reached this point.
Last but not least, I express my heartfelt gratitude to Lolity, my wife and second half. Thank
you for your constant devotion and care. Your love and encouragement when times got rough
are deeply appreciated and duly noted. No success in the world is worth it unless I can share it with
you ♡
Edinburgh, October 2020
Publications Related to the PhD
Thesis
Journal papers
[1] Thouvenin, P.A., Abdulaziz, A., Jiang, M., Dabbech, A., Repetti, A., Jackson, A., Thiran
J.-P. and Wiaux, Y., 2020, “Parallel faceted imaging in radio interferometry via proximal splitting
(Faceted HyperSARA): when precision meets scalability”. arXiv preprint.
[2] Abdulaziz, A., Dabbech, A. and Wiaux, Y., 2019. “Wideband super-resolution imaging in
radio interferometry via low rankness and joint average sparsity models (HyperSARA)”. Monthly
Notices of the Royal Astronomical Society, 489(1), pp.1230-1248.
[3] Dabbech, A., Onose, A., Abdulaziz, A., Perley, R.A., Smirnov, O.M. and Wiaux, Y., 2018.
“Cygnus A super-resolved via convex optimization from VLA data”. Monthly Notices of the Royal
Astronomical Society, 476(3), pp.2853-2866.
Conference papers
[1] Thouvenin, P.A., Abdulaziz, A., Jiang, M., Repetti, A. and Wiaux, Y., 2019, July. “A Faceted
Prior for Scalable Wideband Imaging: Application to Radio Astronomy”. In IEEE International
Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).
[2] Thouvenin, P.A., Abdulaziz, A., Jiang, M., Repetti, A. and Wiaux, Y., “A Faceted Prior for
Scalable Wideband Computational Imaging”. 2019, April. In Signal Processing with Adaptive
Sparse Structured Representations (SPARS) workshop.
[3] Abdulaziz, A., Repetti, A. and Wiaux, Y., “Hyperspectral Uncertainty Quantification by
Optimization”. 2019, April. In Signal Processing with Adaptive Sparse Structured Representations
(SPARS) workshop.
[4] Terris, M., Abdulaziz, A., Dabbech, A., Jiang, M., Repetti, A., Pesquet, J.C. and Wiaux, Y.,
2019, April. “Deep Post-Processing for Sparse Image Deconvolution”. In Signal Processing with
Adaptive Sparse Structured Representations (SPARS) workshop.
[5] Abdulaziz, A., Onose, A., Dabbech, A. and Wiaux, Y., 2017, January. “A distributed algorithm
for wide-band radio-interferometry”. In International Biomedical and Astronomical Signal
Processing (BASP) Frontiers Workshop 2017. (Best contribution)
[6] Abdulaziz, A., Dabbech, A., Onose, A. and Wiaux, Y., 2016, August. “A low-rank and joint-
sparsity model for hyper-spectral radio-interferometric imaging”. In 2016 24th European Signal
Processing Conference (EUSIPCO), IEEE, pp. 388-392. (Best paper)
Notations
General Rules
z: a scalar.
z: a vector.
z_i: the i-th component of the vector z.
Z: a matrix.
Z†: the adjoint of the matrix Z.
z_l = Z_l: the l-th column of the matrix Z.
Z†_n: the n-th row of the matrix Z.
z_{n,l}: the component in the n-th row and the l-th column of the matrix Z.
1_N: the row vector of ones of size N.
I_N: the identity matrix of size N × N.
∥x∥_p: the ℓ_p norm of the vector x.
∥X∥_F: the Frobenius norm of the matrix X.
|x|: the absolute value of the argument x.
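As a concrete illustration of the norms defined above, they can be evaluated numerically; the following is a minimal NumPy sketch for illustration only (the thesis software itself is implemented in MATLAB, cf. Section 5.3.4):

```python
import numpy as np

# A small vector and matrix to illustrate the notations above.
x = np.array([3.0, -4.0])
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The l_p norm of a vector: ||x||_p = (sum_i |x_i|^p)^(1/p).
l1 = np.linalg.norm(x, ord=1)   # |3| + |-4| = 7
l2 = np.linalg.norm(x, ord=2)   # sqrt(9 + 16) = 5

# The Frobenius norm of a matrix: sqrt of the sum of squared entries.
fro = np.linalg.norm(X, ord='fro')  # sqrt(1 + 4 + 9 + 16) = sqrt(30)

# For a real-valued matrix, the adjoint reduces to the transpose.
X_adj = X.T

print(l1, l2, fro)
```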
Acronyms
RI: radio-interferometry or radio-interferometric.
FAST: the Five-hundred-meter Aperture Spherical Telescope.
VLA: the Karl G. Jansky Very Large Array.
LOFAR: the LOw Frequency ARray.
SKA: the Square Kilometre Array.
PSF: point spread function.
DDE: direction-dependent effect.
2D: two-dimensional.
3D: three-dimensional.
GB: gigabyte.
FoV: field of view.
FFT: fast Fourier transform.
DCT: discrete cosine transform.
MAP: maximum a posteriori.
CS: compressive sensing.
prox: proximity operator.
S: soft-thresholding operator.
P: projection operator.
SNR: signal-to-noise ratio.
iSNR: input signal-to-noise ratio.
SR: sampling rate.
SM: similarity metric.
STD: standard deviation.
FB: the forward-backward algorithm [33,58].
FISTA: the fast iterative shrinkage-thresholding algorithm [9].
PD: the primal-dual algorithm [28,37,72,93,123].
PDFB: the primal-dual algorithm with forward-backward iterations [72].
NNLS: the non-negative least-squares algorithm.
JC-CLEAN: the joined-channel CLEAN algorithm [85].
SARA: the sparsity averaging reweighted analysis approach [26].
HyperSARA: the proposed approach in Chapter 4 to solve the wideband RI imaging problem.
Faceted HyperSARA: the faceting-based approach proposed in Chapter 5 for scalable wideband
RI imaging.
HyperSARA-UQ: the proposed approach in Chapter 6 to solve the wideband uncertainty
quantification problem in RI.
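Among the operators listed above, the soft-thresholding operator S is the proximity operator of the ℓ1 norm scaled by a threshold. The following minimal sketch (illustrative only; the threshold value is hypothetical, and the thesis software is implemented in MATLAB) shows its standard componentwise form:

```python
import numpy as np

def soft_threshold(z, tau):
    """Soft-thresholding operator S_tau, the proximity operator of tau * ||.||_1.

    Each component is shrunk towards zero by tau; components whose magnitude
    is below tau are set exactly to zero, which promotes sparsity.
    """
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# Example with an illustrative threshold tau = 1:
# 3 -> 2, -0.5 -> 0 (zeroed), 1.5 -> 0.5, -2 -> -1.
z = np.array([3.0, -0.5, 1.5, -2.0])
out = soft_threshold(z, 1.0)
print(out)
```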
Contents
Abstract I
Acknowledgements II
Publications Related to the PhD Thesis V
List of Figures XIII
List of Tables XXII
List of Algorithms XXIV
1 Introduction 1
1.1 Radio interferometry ................................... 2
1.2 Motivation ........................................ 3
1.3 Thesis outline ....................................... 4
2 Wideband radio interferometry 6
2.1 Introduction ........................................ 6
2.2 Spatial coherence function ................................ 7
2.3 Continuous RI data model ................................ 9
2.3.1 The effect of discrete sampling ......................... 10
2.3.2 The effect of the primary beam ......................... 12
2.3.3 Weighting schemes ................................ 12
2.4 Discrete RI data model ................................. 13
2.5 Radio-interferometric imaging .............................. 14
2.5.1 CLEAN-based algorithms ............................ 15
2.5.2 Bayesian inference techniques .......................... 16
2.5.3 Optimization algorithms ............................. 17
2.6 Conclusions ........................................ 18
3 Sparse representation and convex optimization 19
3.1 Introduction ........................................ 19
3.2 RI imaging problem formulation ............................ 20
3.3 Compressive sensing and sparse representation .................... 21
3.3.1 ℓ1 minimization .................................. 22
3.3.2 Reweighted-ℓ1 minimization ........................... 23
3.3.2.1 SARA for RI imaging ......................... 23
3.4 Convex optimization ................................... 24
3.4.1 Proximal splitting algorithms .......................... 25
3.4.2 Primal-dual .................................... 26
3.4.3 Convex optimization for wideband RI imaging - revisited .......... 28
3.5 Conclusions ........................................ 30
4 Wideband super-resolution imaging in RI 31
4.1 Motivation ........................................ 32
4.2 HyperSARA: optimization problem ........................... 32
4.2.1 Low-rankness and joint sparsity sky model .................. 33
4.2.2 HyperSARA minimization task ......................... 34
4.3 HyperSARA: algorithmic structure ........................... 36
4.3.1 HyperSARA in a nutshell ............................ 37
4.3.2 Underlying primal-dual forward-backward algorithm ............. 37
4.3.3 Adaptive ℓ2 bounds adjustment ......................... 39
4.3.4 Weighting schemes ................................ 40
4.4 Simulations ........................................ 43
4.4.1 Simulations settings ............................... 43
4.4.2 Benchmark algorithms .............................. 44
4.4.3 Imaging quality assessment ........................... 45
4.4.4 Imaging results .................................. 47
4.5 Application to real data ................................. 52
4.5.1 Data and imaging details ............................ 52
4.5.2 Imaging quality assessment ........................... 56
4.5.3 Real imaging results ............................... 57
4.6 Conclusions ........................................ 63
5 Faceted HyperSARA for wideband RI imaging: when precision meets scalability 69
5.1 Motivation ........................................ 70
5.2 Proposed faceting and Faceted HyperSARA approach ................ 71
5.2.0.1 Spectral faceting ............................ 71
5.2.0.2 Spatial faceting ............................ 73
5.3 Algorithm and implementation ............................. 75
5.3.1 Faceted HyperSARA algorithm ......................... 75
5.3.2 Underpinning primal-dual forward-backward algorithm ............ 76
5.3.3 Parallel algorithmic structure .......................... 77
5.3.4 MATLAB implementation ............................ 78
5.4 Validation on synthetic data ............................... 78
5.4.1 Simulation setting ................................ 79
5.4.1.1 Images and data ............................ 79
5.4.1.2 Spatial faceting ............................ 80
5.4.1.3 Spectral faceting ............................ 82
5.4.2 Hardware ..................................... 82
5.4.3 Evaluation metrics ................................ 82
5.4.4 Results and discussion .............................. 83
5.4.4.1 Spatial faceting ............................ 83
5.4.4.2 Spectral faceting ............................ 84
5.5 Validation on real data .................................. 84
5.5.1 Dataset description and imaging settings ................... 90
5.5.2 Hardware ..................................... 91
5.5.3 Evaluation metrics ................................ 91
5.5.4 Results and discussion .............................. 92
5.5.4.1 Imaging quality ............................ 92
5.5.4.2 Computing cost ............................ 94
5.6 Conclusions ........................................ 94
6 Wideband uncertainty quantification by convex optimization 99
6.1 Motivation ........................................ 99
6.2 Wideband uncertainty quantification approach .................... 100
6.2.1 Bayesian hypothesis test ............................. 100
6.2.2 Choice of the set S ................................ 103
6.3 Proposed minimization problem ............................. 104
6.4 Proposed algorithmic structure ............................. 105
6.4.1 Epigraphical splitting .............................. 105
6.4.2 Underpinning primal-dual forward-backward algorithm ............ 106
6.5 Validation on synthetic data ............................... 107
6.5.1 Simulation setting ................................ 107
6.5.2 Uncertainty quantification parameter ...................... 109
6.5.3 Results and discussion .............................. 110
6.6 Conclusions ........................................ 111
7 Conclusions and perspectives 120
7.1 Perspectives ........................................ 121
Appendices 123
.1 Basic definitions in convex optimization ........................ 124
.2 Proximity operators ................................... 125
.3 Overview of the parameters specific to the adaptive PDFB algorithm (Algorithm 3) 128
.4 Randomized PDFB algorithm .............................. 129
.4.1 Simulations and results ............................. 129
Bibliography 131
List of Figures
1.1 The full electromagnetic spectrum. Credit: NASA public domain image, CC-BY-SA
3.0. ............................................. 2
2.1 A distant source at direction s on the celestial sphere observed by an antenna pair
(r1, r2). ........................................... 9
2.2 VLA uv-coverage at the frequency ν = 8 GHz generated using: (a) A configuration.
(b) C configuration. ................................... 10
2.3 VLA uv-coverage generated: (a) at the frequency ν= 8 GHz. (b) at the frequency
ν= 4 GHz. ........................................ 11
2.4 Dirty beams associated with the VLA uv-coverage generated: (a) at the frequency
ν= 8 GHz. (b) at the frequency ν= 4 GHz. ..................... 12
4.1 Schematic diagram at iteration t in the adaptive PDFB, detailed in Algorithm 3.
It showcases the parallelism capabilities and overall computation flow. Intuitively,
each forward-backward step in data, prior and image space can be viewed as a
CLEAN-like iteration. The overall algorithmic structure then intuitively takes the
form of an interlaced and parallel multi-space version of CLEAN. ......... 41
4.2 Simulations using realistic VLA uv-coverage: (a) The realistic VLA uv-coverages of
all the channels projected onto one plane. (b) Channel ν1 of the simulated wideband
model cube, a 256 ×256 region of the W28 supernova remnant, shown in log10 scale. 44
4.3 Simulations using random sampling with a Gaussian density profile: aSNR results
for the proposed approach HyperSARA and the benchmark methods LRJAS, JAS,
LR and the monochromatic approach SARA. The aSNR values of the estimated
model cubes (y-axis) are plotted as a function of the sampling rate (SR) (x-axis).
Each point corresponds to the mean value of 5 noise realizations. The results are
displayed for different model cubes varying the number of channels L and the input
signal-to-noise ratio iSNR. (a) L = 60 channels and iSNR = 40 dB. (b) L = 15
channels and iSNR = 40 dB. (c) L = 60 channels and iSNR = 20 dB. (d) L = 15
channels and iSNR = 20 dB. .............................. 48
4.4 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν1=
1.4 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of HyperSARA (aSNR = 30.13 dB), LRJAS
(aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and LR (aSNR = 26.75 dB). From
top to bottom: the estimated model images in log10 scale, the absolute value of the
error images in log10 scale and the naturally-weighted residual images in linear scale. 49
4.5 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν60 =
2.78 GHz obtained by imaging the cube with L= 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of HyperSARA (aSNR = 30.13 dB), LRJAS
(aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and LR (aSNR = 26.75 dB). From
top to bottom: the estimated model images in log10 scale, the absolute value of the
error images in log10 scale and the naturally-weighted residual images in linear scale. 50
4.6 Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected
pixels obtained by imaging the cube with L= 60 channels, SR = 1 and iSNR = 60
dB. The results are shown for: (b) the proposed approach HyperSARA, (c) LRJAS,
(d) JAS and (e) LR, compared with the ground-truth. Each considered pixel is
highlighted with a colored circle in the ground-truth image x1 displayed in (a). . 51
4.7 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν1=
1.4 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of the proposed approach HyperSARA (aSNR =
30.13 dB), the monochromatic approach SARA (aSNR = 23.46 dB) and JC-CLEAN
(aSNR = 9.39 dB). From top to bottom (first and second columns): the estimated
model images in log10 scale, the absolute value of the error images in log10 scale and
the naturally-weighted residual images in linear scale. From top to bottom (third
column): the estimated restored images in log10 scale, the absolute value of the error
images in log10 scale and the Briggs-weighted residual images in linear scale. . . . 53
4.8 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν60 =
2.78 GHz obtained by imaging the cube with L= 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of the proposed approach HyperSARA (aSNR =
30.13 dB), the monochromatic approach SARA (aSNR = 23.46 dB) and JC-CLEAN
(aSNR = 9.39 dB). From top to bottom (first and second columns): the estimated
model images in log10 scale, the absolute value of the error images in log10 scale and
the naturally-weighted residual images in linear scale. From top to bottom (third
column): the estimated restored images in log10 scale, the absolute value of the error
images in log10 scale and the Briggs-weighted residual images in linear scale. . . . 54
4.9 Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected
pixels obtained by imaging the cube with L= 60 channels, SR = 1 and iSNR = 60
dB. The results are shown for: (b) the proposed approach HyperSARA, (c) the
monochromatic approach SARA and (d) JC-CLEAN, compared with the ground-
truth. Each considered pixel is highlighted with a colored circle in the ground-truth
image x1 displayed in (a). ............................... 55
4.10 Cyg A: recovered images of channel ν1 = 2.04 GHz at 2.5 times the nominal
resolution at the highest frequency νL. From top to bottom: estimated model images of
the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs weighting.
The full images are displayed in log10 scale (first column) as well as zooms on
the east jet hotspot (second column) and the west jet hotspot (third column). . . 59
4.11 Cyg A: recovered images of channel ν32 = 5.96 GHz at 2.5 times the nominal
resolution at the highest frequency νL. From top to bottom: estimated model images of
the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs weighting.
The full images are displayed in log10 scale (first column) as well as zooms on
the east jet hotspot (second column) and the west jet hotspot (third column). . . 60
4.12 Cyg A: naturally-weighted residual images obtained by the proposed approach Hy-
perSARA (left) and the monochromatic approach SARA (right). (a) Channel
ν1 = 2.04 GHz, and (b) Channel ν32 = 5.96 GHz. The aSTD values are 1.19 × 10^−2
and 8.7 × 10^−3, respectively. .............................. 61
4.13 Cyg A: Briggs-weighted residual images obtained by the proposed approach Hyper-
SARA (left) and JC-CLEAN (right). (a) Channel ν1= 2.04 GHz, and (b) Channel
ν32 = 5.96 GHz. The aSTD values are 4.1 × 10^−3 and 2.1 × 10^−3, respectively. . . 61
4.14 Cyg A: reconstructed spectra of selected pixels and point-like sources obtained by
the different approaches. Each considered pixel (P) or source (S) is highlighted with
a red circle on the estimated model image of HyperSARA at channel ν32 = 5.96
GHz displayed in (a). .................................. 62
4.15 G055.7+3.4: recovered images of channel ν1 = 1.444 GHz at 2 times the nominal
resolution at the highest frequency νL. From top to bottom: estimated model images
of the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs
weighting. The full images are displayed in log10 scale (first column) as well as zoom
on the central region (second column). ......................... 64
4.16 G055.7+3.4: recovered images of channel ν30 = 1.89 GHz at 2 times the nominal
resolution at the highest frequency νL. From top to bottom: estimated model images
of the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs
weighting. The full images are displayed in log10 scale (first column) as well as zoom
on the central region (second column). ......................... 65
4.17 G055.7+3.4: naturally-weighted residual images obtained by the proposed approach
HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel
ν1 = 1.444 GHz, and (b) Channel ν30 = 1.89 GHz. The aSTD values are 6.55 × 10^−5
and 8.37 × 10^−5, respectively. .............................. 66
4.18 G055.7+3.4: Briggs-weighted residual images obtained by the proposed approach
HyperSARA (left) and JC-CLEAN (right). (a) Channel ν1= 1.444 GHz, and (b)
Channel ν30 = 1.89 GHz. The aSTD values are 1.12 × 10^−4 and 7.75 × 10^−5,
respectively. ........................................ 67
4.19 G055.7+3.4: reconstructed spectra of selected pixels and point-like sources obtained
by the different approaches. Each considered pixel (P) or source (S) is highlighted
with a red circle on the estimated model image of HyperSARA at channel ν30 = 1.89
GHz (first row). ......................................
5.1 Illustration of the proposed faceting scheme, using a 2-fold spectral interleaving
process and 9-fold spatial tiling process. The full image cube variable (a) is divided
into two spectral sub-cubes (b) with interleaved channels (for a 2-fold interleaving,
even and odd channels respectively define a sub-cube). Each sub-cube is spatially
faceted. A regular tessellation (dashed red lines) is used to define spatio-spectral
tiles. The spatio-spectral facets result from the augmentation of each tile to produce
an overlap between facets (solid red lines). Panel (c) shows a single facet (left), as
well as the spatial weighting scheme (right) with linearly decreasing weights in the
overlap region. Note that, though the same tiling process is underpinning the nuclear
norm and ℓ21 norm regularization terms, the definition of the appropriate overlap
region is specific to each of these terms (via the selection operators Sq and S̃q in
(5.3)). ........................................... 72
5.2 Illustration of the communication steps involving a facet core (represented by the
top-left rectangle in each sub-figure) and a maximum of three of its neighbours. The
tile underpinning each facet, located in its bottom-right corner, is delineated in thick
black lines. At each iteration, the following two steps are performed sequentially.
(a) Facet borders need to be completed before each facet is updated independently
in the dual space (Algorithm 5 lines 11–17): values of the tile of each facet (top left)
are broadcast to cores handling the neighbouring facets in order to update their
borders (Algorithm 5 line 5). (b) Parts of the facet tiles overlapping with borders of
nearby facets need to be updated before each tile is updated independently in the
primal space (Algorithm 5 line 24): values of the parts of the borders overlapping
with the tile of each facet are broadcast by the cores handling neighbouring facets,
and averaged. ...................................... 78
5.3 Illustration of the two groups of cores described in Section 5.3 with the main steps
involved in PDFB (Algorithm 5) applied to each independent sub-problem c ∈
{1, . . . , C}, considering Q facets (along the spatial dimension) and B = 1 data block
per channel. Data cores handle variables of the size of data blocks (Algorithm 5
lines 19–21), whereas facet cores handle variables of the size of a spatio-spectral facet
(Algorithm 5 lines 11–17), respectively. Communications between the two groups
are represented by colored arrows. Communications between facet cores, induced
by the overlap between the spatio-spectral facets, are illustrated in Figure 5.2. . . . 79
5.4 Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) re-
ported in log10 scale for channel ν1= 1 GHz for Faceted HyperSARA with Q= 16
and C= 1 (left), and HyperSARA (i.e. Faceted HyperSARA with Q=C= 1,
right). From top to bottom are reported the ground truth image, the reconstructed
and residual images. The overlap for the faceted nuclear norm regularization corre-
sponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying
the definition of the facets are delineated on the residual images in red dotted lines,
with the central facet displayed in continuous lines. .................. 86
5.5 Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) re-
ported in log10 scale for channel ν20 = 2 GHz for Faceted HyperSARA with Q= 16
and C= 1 (left), and HyperSARA (i.e. Faceted HyperSARA with Q=C= 1,
right). From top to bottom are reported the ground truth image, the reconstructed
and residual images. The overlap for the faceted nuclear norm regularization corre-
sponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying
the definition of the facets are delineated on the residual images in red dotted lines,
with the central facet displayed in continuous lines. .................. 87
5.6 Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel)
reported in log10 scale for channel ν1= 1 GHz with Faceted HyperSARA for C= 10
and Q= 1 (left) and HyperSARA (i.e. Faceted HyperSARA with Q=C= 1, right).
Each sub-cube is composed of 10 out of the L= 100 channels. From top to bottom:
ground truth image, estimated model images and residual images. ......... 88
5.7 Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel)
reported in log10 scale for channel ν100 = 2 GHz with Faceted HyperSARA for C=
10 and Q= 1 (left) and HyperSARA (i.e. Faceted HyperSARA with Q=C= 1,
right). Each sub-cube is composed of 10 out of the L= 100 channels. From top to
bottom: ground truth image, estimated model images and residual images. . . . . 89
5.8 Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Imaging results
of channel ν1= 3.979 GHz. Estimated images at the angular resolution 0.06′′ (3.53
times the observations spatial bandwidth). From top to bottom: the respective
estimated model images of the proposed Faceted HyperSARA (Q= 15,C= 16)
and SARA, both in units of Jy/pixel, and restored image of JC-CLEAN in units
of Jy/beam. The associated synthesized beam is of size 0.37′′ × 0.35′′ and its flux
is 42.18 Jy. The full FoV images (log10 scale) are overlaid with the residual images
(bottom right, linear scale) and zooms on selected regions in Cyg A (top left, log10
scale). These correspond to the west hotspot (left) and the inner core of Cyg A
(right). The zoomed regions are displayed with different value ranges for contrast
visualization purposes and are highlighted with white boxes in the full images. Cyg
A-2 location is highlighted with a white dashed circle. Negative pixel values of JC-
CLEAN restored image and associated zooms are set to 0 for visualization purposes.
Full image cubes are available online [117]. ...................... 95
5.9 Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Reconstruction
results of channel ν480 = 8.019 GHz. Estimated images at the angular resolution
0.06′′ (1.75 times the observations spatial bandwidths). From top to bottom: the
respective estimated model images of the proposed Faceted HyperSARA (Q= 15,
C= 16) and SARA, both in units of Jy/pixel, and restored image of JC-CLEAN in
units of Jy/beam. The associated synthesized beam is of size 0.17′′ × 0.15′′ and its
flux is 8.32 Jy. The full FoV images (log10 scale) are overlaid with the residual images
(bottom right, linear scale) and zooms on selected regions in Cyg A (top left, log10
scale). These correspond to the west hotspot (left) and the inner core of Cyg A
(right). The zoomed regions are displayed with different value ranges for contrast
visualization purposes and are highlighted with white boxes in the full images. Cyg
A-2 location is highlighted with a white dashed circle. Negative pixel values of JC-
CLEAN restored image and associated zooms are set to 0 for visualization purposes.
Full image cubes are available online [117]. ...................... 96
5.10 Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Average
estimated images, computed as the mean along the spectral dimension. From top
to bottom: the respective estimated average model images of the proposed Faceted
HyperSARA (Q = 15, C = 16) and SARA, and the average restored image of JC-
CLEAN (obtained as the mean of the restored images normalized by the flux of
their associated synthesized beam). The full FoV images (log10 scale) are overlaid
with the residual images (bottom right, linear scale) and zooms on selected regions
in Cyg A (top left, log10 scale). These correspond to the west hotspot (left) and the
inner core of Cyg A (right). The zoomed regions are displayed with different value
ranges for contrast visualization purposes and are highlighted with white boxes in the
full images. The Cyg A-2 location is highlighted with a white dashed circle. Negative
pixel values of the JC-CLEAN restored image and associated zooms are set to 0 for
visualization purposes. ................................. 97
6.1 1D illustration of the exact HPD region C∗α and the approximated one C̃α. Notice
that C∗α ⊂ C̃α. ........................................... 102
6.2 Illustration of the proposed method for the two different scenarios. Our approach simply
consists in examining the Euclidean distance between the two sets S and C̃α. Left: there
is no intersection between the two sets, thus H0 is rejected at level α. Right: the two sets
intersect, thus one cannot reject H0, i.e., one cannot conclude whether the 3D structure
exists in the true image cube or not. ................................ 104
6.3 Simulations with realistic uv-coverage: (c) Curves representing the values of ρα in percent-
age (y-axis) as a function of the sampling rate SR = Ml/N (x-axis), in log10 scale, for the
3D structures of interest. The considered 3D structures are highlighted with rectangles on
channel ν1 = 1 GHz (a) and channel ν15 = 2 GHz (b) of the ground-truth image cube, in
log10 scale. Each point corresponds to the mean value of 5 tests with different antenna
positions and noise realizations, and the vertical bars represent the standard deviation of
the 5 tests. ......................................... 113
6.4 Uncertainty quantification of 3D Structure 1: results, reported for channel ν1 =
1 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X†]1 and the uncertainty
quantification results [X‡C̃α]1 and [X‡S]1. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 64.53%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 1. ............................. 114
6.5 Uncertainty quantification of 3D Structure 1: results, reported for channel ν15 =
2 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X†]15 and the uncertainty
quantification results [X‡C̃α]15 and [X‡S]15. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 64.53%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 1. ............................. 115
6.6 Uncertainty quantification of 3D Structure 2: results, reported for channel ν1 =
1 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X†]1 and the uncertainty
quantification results [X‡C̃α]1 and [X‡S]1. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 54.91%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 45.1%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 2. ............................. 116
6.7 Uncertainty quantification of 3D Structure 2: results, reported for channel ν15 =
2 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X†]15 and the uncertainty
quantification results [X‡C̃α]15 and [X‡S]15. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 64.53%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 2. ............................. 117
6.8 Uncertainty quantification of 3D Structure 3: results, reported for channel ν1 =
1 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X†]1 and the uncertainty
quantification results [X‡C̃α]1 and [X‡S]1. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 97.43%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 96.95%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 3. ............................. 118
6.9 Uncertainty quantification of 3D Structure 3: results, reported for channel ν15 =
2 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X†]15 and the uncertainty
quantification results [X‡C̃α]15 and [X‡S]15. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 97.43%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 96.95%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 3. ............................. 119
1Simulations with VLA uv-coverage: (a) The ground-truth image at the reference frequency
x1. (b) Curves representing the evolution of aSNR (y-axis) as a function of the number of
iterations (x-axis), for the different methods: LRJAS, LRJAS-R (LRJAS with randomized
updates) and WDCT. ................................... 130
List of Tables
5.1 Spatial faceting experiment: varying size of the overlap region for the faceted nuclear
norm regularization. Reconstruction performance of Faceted HyperSARA with Q=
16 and C= 1, compared to HyperSARA (i.e. Faceted HyperSARA with Q=C= 1)
and SARA. The results are reported in terms of reconstruction time, aSNR and
aSNRlog (both in dB with the associated standard deviation), and total number of
CPU cores used to reconstruct the full image. The evolution of the aSNRlog, of
specific interest for this experiment, is highlighted in bold face. ........... 84
5.2 Spatial faceting experiment: varying number of facets along the spatial dimension
Q. Reconstruction performance of Faceted HyperSARA (C= 1, overlap of 50%),
compared to HyperSARA (i.e. Faceted HyperSARA with Q=C= 1) and SARA.
The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in
dB with the associated standard deviation), and total number of CPU cores used to
reconstruct the full image. The evolution of the computing time, of specific interest
for this experiment, is highlighted in bold face. .................... 85
5.3 Spectral faceting experiment: reconstruction performance of Faceted HyperSARA
with a varying number of spectral sub-problems Cand Q= 1, compared to Hy-
perSARA (i.e. Faceted HyperSARA with Q=C= 1) and SARA. The results are
reported in terms of reconstruction time, aSNR and aSNRlog (both in dB with the
associated standard deviation) and total number of CPU cores. The reconstruction
performance of Faceted HyperSARA, specifically investigated in this experiment, is
highlighted in bold face. ................................. 85
5.4 Computing cost of Cyg A imaging at the spectral resolution 8 MHz from 7.4 GB
of data. Results are reported for Faceted HyperSARA, SARA, and JC-CLEAN in
terms of reconstruction time, number of CPU cores and overall CPU time (high-
lighted in bold face). ................................... 94
1 Overview of the variables employed in the adaptive procedure incorporated in Al-
gorithm 3. ......................................... 128
List of Algorithms
1 Forward-backward primal-dual (PDFB) ......................... 28
2 HyperSARA approach ................................... 40
3 The adaptive PDFB algorithm underpinning HyperSARA ............... 42
4 Faceted HyperSARA approach .............................. 80
5 The PDFB algorithm underpinning Faceted HyperSARA ............... 81
6 Wideband uncertainty quantication by convex optimization ............. 108
Chapter 1
Introduction
Contents
1.1 Radio interferometry .............................. 2
1.2 Motivation .................................... 3
1.3 Thesis outline .................................. 4
Astronomy, perhaps the oldest of the sciences, is the study of the universe: it concerns
celestial objects such as planets, stars and galaxies, and phenomena that occur outside
the Earth's atmosphere. These celestial objects emit radiation across the electromagnetic
spectrum (Figure 1.1). In the optical band (400–700 nm), the dominant emitters are
thermal sources with temperatures in the range 10³–10⁴ K. Thermal sources outside this range
and non-thermal sources do not emit radiation in the optical band but can be strong emitters in
other bands (e.g., cold sources emit in the infrared band, and very hot objects are strong emitters
in the X-ray band). The radio band, characterized by long wavelengths and low frequencies, is of
paramount interest. As opposed to the optical band, it covers a vast range of the electromagnetic
spectrum, between around 3 kHz and 300 GHz, or equivalently a wavelength range of approximately
100 km to 1 mm. Radio emission produced by a variety of radio sources is non-thermal and is called
synchrotron radiation. Unlike thermal emission, where the flux density increases with frequency,
the flux density of synchrotron emission increases with wavelength, making the radio band the best
candidate for studying this active type of source. The dominant sources at radio frequencies are
the sun, radio galaxies (the strongest being the Cyg A galaxy), supernova remnants and pulsars.
The long-wavelength character of radio waves reduces their absorption and scattering
and lets them pass through large clouds of gas and cosmic dust, allowing the detection of star
formation obscured by such material, as well as the discovery of very distant galaxies.
Furthermore, this allows the detection of the hydrogen spectral line (HI line) at a wavelength of 21 cm.
Since hydrogen is an extremely abundant element in the universe, the HI line has been extensively
studied to map the structure of nearby galaxies and their kinematics.
Figure 1.1: The full electromagnetic spectrum. Credit: NASA public domain image, CC-BY-SA 3.0.
1.1 Radio interferometry
Astronomers use telescopes to observe the universe. The angular resolution θ (in radians) of a
telescope is given by [14]

θ = 1.22 λ/D,    (1.1)

where λ is the observation wavelength and D is the diameter of the telescope, both in units of
length. Since the wavelengths in radio astronomy are so large, the angular resolution of radio
telescopes is poor even for enormous aperture sizes. For example, the human eye has an angular
resolution of ∼20′′ while the largest single-dish radio telescope on Earth, namely the Five hundred
meter Aperture Spherical Telescope (FAST) in China, with a 500 m dish, can only provide an
angular resolution of ∼3′ for the wavelength range 30–15 cm [73]. To achieve higher angular
resolutions, we would need radio telescopes of much larger aperture size, which are impractical to
build. Instead, scientists leveraged interferometry, a pioneering technique that led to a Nobel Prize
in Physics in 1974. A radio interferometer is an array of spatially separated antennas which
collectively simulate a single telescope of huge aperture size. In this setting, the resolution of an
interferometer is defined by the maximum distance separating two antennas in the array. Radio
interferometry has opened the door to probe new regimes of radio
emission with extreme resolution and sensitivity, and to map large areas of the radio sky, deepening
our knowledge in cosmology and astrophysics.
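Equation (1.1) is easy to exercise numerically. The short Python sketch below reproduces the resolution numbers quoted above; the 36 km maximum baseline is an illustrative, VLA-A-configuration-scale value introduced here for comparison, not a figure from the text:

```python
import math

def angular_resolution_arcsec(wavelength_m, diameter_m):
    """Diffraction-limited angular resolution theta = 1.22 * lambda / D,
    converted from radians to arcseconds."""
    theta_rad = 1.22 * wavelength_m / diameter_m
    return math.degrees(theta_rad) * 3600.0

# FAST: a 500 m dish observing at a 30 cm wavelength (~1 GHz)
fast = angular_resolution_arcsec(0.30, 500.0)         # ~151'', i.e. ~2.5'

# An interferometer replaces D by its longest baseline, e.g. an
# (assumed) 36 km maximum baseline at the same wavelength:
vla_like = angular_resolution_arcsec(0.30, 36_000.0)  # ~2.1''
```

The two orders of magnitude separating the two numbers is precisely the gain that motivates aperture synthesis.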
1.2 Motivation
Modern radio interferometers, such as the Karl G. Jansky Very Large Array (VLA) [92], the LOw
Frequency ARray (LOFAR) [122] and the MeerKAT radio telescope [68] provide massive volumes
of data, allowing large images of the sky to be reconstructed at an unprecedented resolution and
dynamic range. In particular, once completed, the upcoming Square Kilometre Array (SKA) [48]
will be the world's largest radio telescope and will provide data at a rate 10 times larger than today's
global Internet traffic. It will form wideband images about 0.56 Petabyte in size (assuming double
precision) from even larger data volumes [108]. SKA is expected to bring answers to fundamental
questions in astronomy1, such as improving our understanding of cosmology and dark energy [97],
investigating the origin and evolution of cosmic magnetism [56] and probing the early universe
where the first stars were formed [24]. Since the sky looks very different at different wavelengths,
it is imperative to use wideband astronomy to be able to understand all the physical processes in
the universe and achieve the expected scientific goals.
Wideband radio-interferometric (RI) imaging consists in forming a 3D image of the radio sky
from under-sampled Fourier data across a whole frequency band. Given the fact that the range
of spatial frequencies covered by a radio interferometer increases as the observation frequency
increases, higher resolution and higher dynamic range image cubes can be formed by jointly re-
covering the spatial and spectral information. To meet the capabilities of modern telescopes, it is
of paramount importance to design ecient wideband imaging algorithms, these need to be able
to recover high-quality images while being highly parallelized to scale with the sheer amount of
wideband data and the large dimension of the wideband image cubes to be estimated.
In this context, we develop a new approach within the versatile framework of convex op-
timization to solve the wideband RI imaging problem. The proposed approach, dubbed “Hy-
perSARA”, leverages low-rankness and joint average sparsity priors to enable the formation of
high-resolution and high-dynamic-range image cubes from RI data. The underlying algorithmic
structure is shipped with highly interesting functionalities such as preconditioning for accelerated
convergence and a splitting functionality enabling the decomposition of data into blocks, for parallel
processing of all block-specific data-fidelity terms of the objective function, thereby allowing
scalability to large data volumes. Furthermore, it involves an adaptive strategy to estimate the
noise level with respect to calibration errors present in real data. HyperSARA, however, models
the image cube as a single variable, and the computational and storage requirements induced by
complex regularization terms can be prohibitive for very large image cubes.
To alleviate this issue, the same splitting functionality is further exploited to decompose the
target image cube into spatio-spectral facets and enable parallel processing of facet-specific
regularization terms in the objective function. The resulting algorithm is dubbed “Faceted HyperSARA”.
1https://www.skatelescope.org/science/
Extensive simulation results on synthetic image cubes confirm that faceting can provide a
significant increase in scalability at no cost in imaging quality. A proof-of-concept reconstruction
of a 15 GB image cube of Cyg A from 7.4 GB of VLA data, utilizing 496 CPU cores on a
high-performance computing system for 68 hours, confirms both scalability and a quantum jump
in imaging quality from the state-of-the-art CLEAN algorithm [118].
Since the wideband RI imaging problem is highly ill-posed, assessing the degree of confidence
in specific 3D structures observed in the estimated cube is very important. More precisely, we
need methods that can tell us how certain we are about these structures (are they real, or do they
correspond to reconstruction artifacts?). Bayesian methods naturally enable the quantification of
uncertainty around the image estimate. However, this type of approach usually involves sampling
of the full posterior distribution, and hence cannot currently scale to the data regime expected
from modern telescopes. Instead, we propose to formulate the problem as a convex minimization
task solved using a sophisticated optimization algorithm. As for HyperSARA and Faceted Hyper-
SARA, the underpinning algorithmic structure benefits from preconditioning and parallelization
capabilities, paving the way for scalability to large data sets and image dimensions.
1.3 Thesis outline
The thesis is organized as follows:
• Chapter 2 explains in detail the wideband RI measurement framework, giving the reader
all the knowledge needed to understand the application. Next, we present the ill-posed inverse
problem arising in wideband radio interferometry (RI). Finally, we describe the state-of-
the-art approaches to solve the RI imaging problem. These are CLEAN-based approaches,
Bayesian inference techniques and optimization methods.
• Chapter 3 provides all the mathematical background for the work developed in this thesis.
We start by formulating the RI inverse problem as an optimization task. Second, we explore
the world of compressive sensing and sparse representation and explain sparse recovery
approaches. At this point, we introduce convex optimization methods as a powerful tool to
solve convex minimization problems; we first describe the proximal splitting methods. Next,
we put particular emphasis on the primal-dual framework adopted in this work. We finally
revisit the convex optimization methods developed in the literature to solve the wideband
RI imaging problem.
• Chapter 4 gives a complete description of the HyperSARA approach proposed for solving the
wideband RI imaging problem. We start by explaining the low-rankness and joint sparsity
priors adopted for wideband RI imaging. After that, we present the HyperSARA optimiza-
tion problem and the underlying algorithmic structure. Finally, we provide an analysis of the
proposed approach and comparison with the benchmark methods on simulations and VLA
observations of Cyg A and the supernova remnant G055.7+3.4.
• Chapter 5 introduces the Faceted HyperSARA approach, a further development of HyperSARA
with a fully distributed implementation. We begin by presenting the minimization problem and
the proposed spatio-spectral faceted prior. Afterwards, we describe the algorithmic structure
used to solve the resulting problem, along with the different levels of parallelization exploited.
Later, we analyze the performance and scalability potential of Faceted HyperSARA on
extensive simulations and real data with a reconstruction of a 15 GB image cube of Cyg A
from 7.4 GB of VLA data.
• Chapter 6 describes our uncertainty quantification approach for wideband RI imaging. First,
we postulate a Bayesian hypothesis test and propose a convex optimization problem to
formulate this test. Next, we present the underpinning algorithmic structure and the epigraphical
splitting technique exploited to solve the minimization problem. Eventually, we showcase
the performance of our approach on realistic simulations.
• Chapter 7 presents conclusions and final remarks regarding the methods developed in this
thesis. We further shed light on possible directions of future work to take the current work
steps ahead toward scalability in the era of the SKA telescope.
Chapter 2
Wideband radio interferometry
Contents
2.1 Introduction ................................... 6
2.2 Spatial coherence function ........................... 7
2.3 Continuous RI data model ........................... 9
2.3.1 The effect of discrete sampling ........................ 10
2.3.2 The effect of the primary beam ....................... 12
2.3.3 Weighting schemes .............................. 12
2.4 Discrete RI data model ............................. 13
2.5 Radio-interferometric imaging ........................ 14
2.5.1 CLEAN-based algorithms .......................... 15
2.5.2 Bayesian inference techniques ........................ 16
2.5.3 Optimization algorithms ........................... 17
2.6 Conclusions .................................... 18
2.1 Introduction
A radio interferometer is an array of spatially separated antennas probing the radio waves emanat-
ing from astrophysical sources in space. Each antenna pair gives access to a radio measurement,
dubbed visibility, corresponding to the correlation of the sensed electromagnetic field at the posi-
tions of the antennas.
This chapter is structured as follows. The spatial coherence function is introduced in Section
2.2. In Section 2.3, we describe the radio-interferometric (RI) measurement equation that can be
seen under some assumptions as spatial Fourier transform of the sky brightness distribution. The
RI discrete measurement model is presented in Section 2.4. RI imaging methods are discussed in
Section 2.5. These are CLEAN-based approaches, Bayesian inference techniques and optimization
methods. Finally, conclusions are stated in Section 2.6.
It is worth noting that the derivation of the measurement model is based on [18,41,116].
2.2 Spatial coherence function
When an astrophysical phenomenon occurs at a location R in the universe, an electromagnetic signal
propagates from the location R and arrives at a point r where it can be conveniently observed by the
radio antennas. Since sources of interest are typically very far from Earth, all we can measure is the
surface brightness of the emitting source; we cannot describe its depth. A convenient way to express
this assumption in radio astronomy is to assume that all astronomical sources lie on the so-called
celestial sphere, defined as a huge sphere of radius |R| = R within which there are no radiating
sources, and to measure the distribution of the electromagnetic radiation on the surface of the
sphere. For simplicity, we consider only a monochromatic component E(r, ν) of the electromagnetic
field at a frequency ν, noting that the entire electromagnetic field can be determined by the
summation of all the frequency components. Also, by ignoring all polarization phenomena, the
electromagnetic radiation measured at the point r can be seen as a scalar quantity E(r, ν). Then,
the electromagnetic field due to all sources of cosmic electromagnetic radiation measured at the
location r can be written as
E(r, ν) = ∫ E(R, ν) e^{2iπ(ν/c)|R−r|} / |R−r| dS,    (2.1)

where E(R, ν) is the distribution of the electromagnetic field on the surface of the sphere, dS is
a surface element on the celestial sphere (with the integration done over the entire sphere) and c
is the speed of light.
A radio interferometer is a device that measures the spatial coherence function of the sensed
electromagnetic field at the positions of the antennas. For two antennas located at r1 and r2,
respectively, the spatial coherence function is defined as the expectation of a product:

V(r1, r2, ν) = ⟨E(r1, ν) E∗(r2, ν)⟩,    (2.2)

where ⟨·⟩ stands for the expectation and ∗ is the complex conjugate. By substituting E(r, ν) from
equation (2.1), we get:
V(r1, r2, ν) = ∫∫ E(R1, ν) E∗(R2, ν) (e^{2iπ(ν/c)|R1−r1|} / |R1−r1|) (e^{−2iπ(ν/c)|R2−r2|} / |R2−r2|) dS1 dS2.    (2.3)
Assuming that all astronomical sources are spatially incoherent, i.e., ⟨E(R1, ν) E∗(R2, ν)⟩ = 0 for all
R1 ≠ R2 and ⟨E(R1, ν) E∗(R2, ν)⟩ = ⟨|E(R, ν)|²⟩ for R1 = R2 = R, and considering the great distance
of the source, i.e., |R − r| ≈ |R|, we write:

V(r1, r2, ν) = ∫ I(s, ν) e^{−2iπ(ν/c)(|Rs−r2|−|Rs−r1|)} dΩ,    (2.4)

where I(s, ν) = ⟨|E(R, ν)|²⟩ is the observed intensity in the direction s = R/|R| on the celestial
sphere and dΩ = dS/|R|² is the solid angle element.
As can be seen in Figure 2.1, we consider the coordinate system (x, y, z) to describe the physical
location r of each antenna on Earth. The z-axis points toward the phase-reference center s0 (a
point in the sky toward which the radio interferometer is steered). The (x, y) plane is perpendicular
to the z-axis. In the same coordinate system (x, y, z), the coordinates of a source at direction s on
the celestial sphere are given by (l, m, n) with l = cos(θx), m = cos(θy), n = cos(θz). (l, m, n) are
called direction cosines and verify l² + m² + n² = 1 and dΩ = dl dm / cos(θz) = dl dm / √(1 − l² − m²).
We then have:

|Rs − r| = R[(l − x/R)² + (m − y/R)² + (n − z/R)²]^{1/2}    (2.5)
         ≈ R[(l² + m² + n²) − 2(lx + my + nz)/R]^{1/2}.    (2.6)
We can use the binomial approximation¹ to simplify equation (2.6). Doing so:

|Rs − r| ≈ R − (lx + my + nz).    (2.7)

By putting this result in equation (2.4), we get:

V(r1, r2, ν) = ∫ I(l, m, ν) e^{−2iπ(ν/c)[l(x1−x2) + m(y1−y2) + n(z1−z2)]} dl dm / √(1 − l² − m²).    (2.8)

Note that equation (2.8) depends on the separation vector r1 − r2 and not on the absolute loca-
tions of the antennas. Therefore, it is customary in radio interferometry (RI) to use the baseline
coordinates. We define a baseline b1,2 ∈ R³ as the vectorial distance between two antennas r1 and
r2, with components (ū, ῡ, w̄) in units of metres; w̄ = z1 − z2 denotes the component in the
direction of the line of sight and ū = (ū, ῡ) = (x1 − x2, y1 − y2) are the coordinates in its perpendicular
plane (a plane parallel to the l = (l, m) plane). Doing so, we write:

V(ū, w̄, ν) = ∫ n(l)^{−1} I(l, ν) e^{−2iπ(ν/c)(l·ū + n w̄)} d²l,    (2.9)

where n(l)^{−1} = 1/√(1 − l² − m²).
¹The binomial approximation states that (1 + x)^α ≈ 1 + αx. It is valid when |x| < 1 and |αx| ≪ 1.
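The quality of the far-field expansion (2.7) is easy to verify numerically. The sketch below uses illustrative values only (the sphere radius R and antenna coordinates are arbitrary, chosen so that R vastly exceeds the array size, as the derivation assumes):

```python
import math

# Illustrative geometry (not from the thesis): a source at direction
# cosines (l, m), with n = sqrt(1 - l^2 - m^2), on a sphere of radius R,
# seen by an antenna at position (x, y, z) in metres.
R = 1.0e12                      # celestial-sphere radius, R >> antenna position
l, m = 0.01, 0.02
n = math.sqrt(1.0 - l**2 - m**2)
x, y, z = 3.0e3, -1.5e3, 40.0   # antenna coordinates, km-scale baseline

# Exact distance |Rs - r| versus the expansion R - (lx + my + nz).
exact = math.sqrt((R*l - x)**2 + (R*m - y)**2 + (R*n - z)**2)
approx = R - (l*x + m*y + n*z)

# The terms dropped by the binomial approximation are of order |r|^2 / R,
# utterly negligible at astronomical distances.
error = abs(exact - approx)
```

For these values the relative error `error / R` is far below double precision resolution of the geometry, which is why (2.7) is treated as exact in the remainder of the derivation.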
Figure 2.1: A distant source at direction s on the celestial sphere observed by an antenna pair (r1, r2).
2.3 Continuous RI data model
In RI, the spatial coherence function defined in equation (2.9) is usually referred to as the visibility
y(ū, w̄, ν). It is also typical to represent the direction of the source s = (l, m, n) relative to the
phase-reference center s0 = (0, 0, 1), i.e., s − s0 = (l, m, n − 1). In this scenario, we can define the
radio measurement, the visibility, as:

y(ū, w̄, ν) = ∫ n(l)^{−1} I(l, ν) e^{−2iπ(ν/c)(l·ū + (n−1)w̄)} d²l.    (2.10)
The Van Cittert–Zernike theorem states that the visibility equation (2.10) reduces to a
2D Fourier transform of the sky intensity when the field of view is small, i.e., all sources lie within
a small region on the celestial sphere, which implies n ≈ 1. In practice, this assumption is valid for
fields of width θFOV that satisfy the following condition [18]:

θFOV < (1/3) √θPB,    (2.11)

where θPB is the half-power width of the primary beam² of the antennas of the radio interferometer
and both θFOV and θPB are measured in radians.
In this setting, which will be adopted in this work, the complex visibility reads

y(ū, ν) = ∫ I(l, ν) e^{−2iπ(ν/c)(l·ū)} d²l.    (2.12)

²The primary beam is explained in Section 2.3.2.
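A direct numerical transcription of equation (2.12) helps fix ideas: each visibility is one sample of the 2D Fourier transform of the sky. The discretised sky, grid and (u, v) point below are illustrative, with the ν/c factor absorbed into the baseline coordinates (i.e., (u, v) expressed in wavelengths):

```python
import numpy as np

def visibility(sky, l_grid, m_grid, u, v):
    """Direct (slow) discretisation of equation (2.12):
    y(u, v) = sum over pixels of I(l, m) * exp(-2i*pi*(l*u + m*v))."""
    phase = -2j * np.pi * (l_grid * u + m_grid * v)
    return np.sum(sky * np.exp(phase))

# Tiny illustrative sky: a single unit point source.
N = 64
l = np.linspace(-0.01, 0.01, N)          # direction cosines (small field)
l_grid, m_grid = np.meshgrid(l, l)
sky = np.zeros((N, N))
sky[40, 20] = 1.0

# A point source gives |y| = flux at every (u, v): only the phase varies.
y = visibility(sky, l_grid, m_grid, u=500.0, v=-300.0)
assert np.isclose(abs(y), 1.0)
```

In practice this sum is never evaluated directly; non-uniform FFTs are used instead, but the model being approximated is exactly the one above.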
Figure 2.2: VLA uv-coverage at the frequency ν = 8 GHz generated using: (a) A configuration.
(b) C configuration.
2.3.1 The effect of discrete sampling
The radio measurements do not cover the whole Fourier plane, which makes the problem of recov-
ering the sky image from the Fourier data an ill-posed inverse problem. More precisely, because
of the limited number of antennas, the Fourier samples are measured at specific spatial frequencies
(ν/c)(ū, v̄). For a radio interferometer with n antennas, n(n − 1)/2 complex measurements are
obtained at each time instant. When the studied radio sources are known to be constant over the
observation time, we do not need to measure all the visibilities at the same time (e.g., the data
of the Cyg A galaxy reported in this thesis were acquired over two years, 2015–2016). One way
of acquiring more measurements over time for the same radio sky is by changing the positions of
the antennas. This scenario is valid for the VLA array, which has mobile antennas that can form
several configurations, known as A, B, C, and D (Figure 2.2). The method of gradually filling the
gaps in the Fourier plane is referred to as aperture synthesis. More practically, with the Earth's
rotation, the coordinates (ν/c)(ū, v̄) of a baseline change with time as they are relative to the direc-
tion of sight. This results in elliptical tracks in the (ν/c)(ū, v̄) plane corresponding to each baseline,
and hence a denser sampling. Furthermore, more frequency samples can be probed by observing
at different wavelengths. In fact, the Fourier sampling in radio interferometry is such that high
Fourier coefficients are probed at high-frequency channels, and low Fourier coefficients are probed
at low-frequency channels (Figure 2.3).
The measured visibilities at a given frequency ν identify the so-called uv-coverage of the radio
interferometer at that frequency. The uv-coverage is defined by the array configuration (i.e., the
positions of the antennas), the direction of observation, the time difference between consecutive
measurements and the total observation time.
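The n(n − 1)/2 baseline count and the Earth-rotation uv-tracks can both be sketched numerically. The snippet below uses a toy random antenna layout and a deliberately simplified rotation model (source at the celestial pole, so the tracks are circles rather than general ellipses); it is not actual VLA geometry:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_ant = 27                                       # the VLA, e.g., has 27 antennas
positions = rng.uniform(-1e4, 1e4, (n_ant, 2))   # toy east-north layout, metres

# Each unordered antenna pair yields one instantaneous baseline.
baselines = [positions[i] - positions[j]
             for i, j in combinations(range(n_ant), 2)]
assert len(baselines) == n_ant * (n_ant - 1) // 2   # 351 baselines for 27 antennas

# Earth rotation: a baseline's projection rotates with hour angle,
# sweeping a track in the uv-plane (circular for a polar source).
hour_angles = np.linspace(0.0, np.pi, 100)
b = baselines[0]
u_track = b[0] * np.cos(hour_angles) - b[1] * np.sin(hour_angles)
v_track = b[0] * np.sin(hour_angles) + b[1] * np.cos(hour_angles)
# The track radius |b| is constant in this polar-source model.
```

Scaling each track by ν/c for every observed channel then yields the multi-frequency coverage pattern illustrated in Figure 2.3.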
Let S(ū, ν) be the sampling function, equal to 1 for a measured (ν/c)(ū, v̄) point and 0
Figure 2.3: VLA uv-coverage generated: (a) at the frequency ν = 8 GHz. (b) at the frequency
ν = 4 GHz.
otherwise. The continuous measured visibility at a frequency ν, defined in (2.12), then resolves to

y(ū, ν) = S(ū, ν) ∫ I(l, ν) e^{−2iπ(ν/c)(l·ū)} d²l.    (2.13)
The process of deriving the image from the visibilities is called mapping in radio astronomy. The
direct Fourier inversion of the measured visibilities is called the dirty image (or dirty map):

Idirty(l, ν) = ∫ S(ū, ν) y(ū, ν) e^{2iπ(ν/c)(l·ū)} d²ū,    (2.14)

and the point spread function (PSF) of the instrument, also known as the dirty beam, is given by
the inverse Fourier transform of the sampling function:

B(l, ν) = ∫ S(ū, ν) e^{2iπ(ν/c)(l·ū)} d²ū.    (2.15)
The shape of the dirty beam B(·, ν) is a function of the uv-coverage at the frequency ν (Figure
2.4). For a fully sampled uv-coverage, the shape of the dirty beam is a sinc function whose main
lobe is inversely proportional to the maximum baseline (ν/c)ūmax. However, for a real uv-coverage,
unsampled (ν/c)(ū, v̄) points increase the side-lobes and make the dirty beam noisy.

By the convolution theorem, at each frequency ν, the dirty image (or the dirty map) Idirty(·, ν)
becomes the true sky I(·, ν) convolved with the dirty beam B(·, ν):

Idirty(·, ν) = I(·, ν) ∗ B(·, ν).    (2.16)

Note that Idirty(·, ν) is never a satisfactory final product because of the side-lobes of the dirty
beam (which are due to incomplete sampling). Therefore, image reconstruction algorithms in RI
Figure 2.4: Dirty beams associated with the VLA uv-coverage generated: (a) at the frequency
ν = 8 GHz. (b) at the frequency ν = 4 GHz.
are called deconvolution methods, since they aim to deconvolve I(·, ν) with respect to B(·, ν).
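Equations (2.14)–(2.16) have a compact discrete analogue: sampling the FFT of a toy sky with a binary mask and inverting yields the dirty image, which equals the sky (circularly) convolved with the dirty beam. The sketch below assumes grid-aligned uv-samples, a simplification of the continuous model:

```python
import numpy as np

N = 128
rng = np.random.default_rng(1)

# Toy sky: a few point sources of different fluxes.
sky = np.zeros((N, N))
sky[[30, 64, 90], [40, 64, 100]] = [1.0, 2.0, 0.5]

# Binary sampling function S: keep a random 20% of the Fourier points.
S = (rng.random((N, N)) < 0.2).astype(float)

# Dirty image, cf. (2.14): inverse FFT of the sampled visibilities.
vis = S * np.fft.fft2(sky)
dirty = np.real(np.fft.ifft2(vis))

# Dirty beam / PSF, cf. (2.15): inverse FFT of the sampling function.
psf = np.real(np.fft.ifft2(S))

# Convolution theorem, cf. (2.16): dirty = sky circularly convolved with psf.
conv = np.real(np.fft.ifft2(np.fft.fft2(sky) * np.fft.fft2(psf)))
assert np.allclose(dirty, conv)
```

Real measurements do not fall on a regular grid, so gridding/degridding operators replace the plain FFT, but the identity checked in the last line is exactly equation (2.16).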
2.3.2 The effect of the primary beam
In practice, the antennas are of finite size and are sensitive to the direction of observation. There-
fore, we introduce the primary beam of the interferometer elements, A(l, ν), in the description of
the complex visibility as follows:

y(ū, ν) = S(ū, ν) ∫ A(l, ν) I(l, ν) e^{−2iπ(ν/c)(l·ū)} d²l.    (2.17)
In theory, the sky intensity I(·, ν) can be extracted by a simple division by the primary beam,
if all the antennas are identical (i.e., all have the same primary beam). However, since the primary
beam shape falls rapidly to zero outside the vicinity of the tracking center, dividing by A(·, ν)
increases the errors in regions far from the center. Therefore, in practice, primary beam correction
is to be done during calibration, especially for sources far from the tracking direction. Moreover,
the sky brightness is usually modulated by the so-called direction-dependent effects (DDEs),
which encompass instrumental discrepancies as well as propagation and receiver errors. These effects
are usually corrected for during calibration and are not in the scope of this work.
2.3.3 Weighting schemes
The dirty beam is a highly non-smooth function with potentially large side-lobes. This is because
the sampling function is a non-smooth function that consists of a collection of Diracs with many
gaps in between and a sharp cut-off at the limit of the uv-coverage. To combat the natural sampling,
one can fine-tune the dirty beam shape by multiplying the sampling function with a weighting
function defined as

W(\bar{u}, \nu) = \frac{\nu}{c} \sum_{m=1}^{M_\nu} T(m, \nu)\, D(m, \nu)\, \delta(\bar{u} - \bar{u}_m, \bar{v} - \bar{v}_m), \qquad (2.18)

where M_\nu is the number of measurements at the frequency \nu. The function T(:, \nu) is a smoothly
varying function, typically a Gaussian, used to down-weight the outer edge of the uv-coverage, hence
decreasing the side-lobes of the dirty beam. This is equivalent to a convolution with a Gaussian
in the image domain, which decreases the spatial resolution. The function D(:, \nu) is the density
weighting function used to control the weights resulting from the non-uniform sampling in the
uv-plane. Three schemes for D(:, \nu) are commonly used, depending on the scientific goal:
• D(m, \nu) = 1/\varrho^2(m), called natural weighting, where \varrho^2(m) is the variance of the noise
associated with the m-th data point. This scheme treats all visibilities the same. Given the
nature of the Fourier sampling in RI, where the density of the measurements is high in the
center and decreases further out (see Figure 2.3), this scheme provides the best
signal-to-noise ratio but limited resolution, which is good for imaging weak point sources
while undesirable for extended emission.
• D(m, \nu) = 1/N_s(m), called uniform weighting, where N_s(m) is the number of measurements
within a symmetric region of width s centered at the m-th data point. This scheme gives more
weight to the high spatial frequencies, since they are less densely sampled in radio interferometry, hence
enhancing the resolution at the expense of the signal-to-noise ratio.
• Briggs weighting (or robust weighting) is a hybrid scheme between natural and uniform
weighting and provides a trade-off between resolution and sensitivity [16]. The key parameter
in Briggs weighting is the so-called robustness parameter, which sets the desired level between
uniform and natural. Typical values lie between -2 and 2: lower values give more uniform-like
weights, higher values more natural-like weights.
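For illustration, these density weighting schemes can be sketched as follows. This is a toy implementation: the cell-counting rule for N_s(m) and the Briggs normalization below are only one common parameterization, and the exact formulas vary between imaging packages.

```python
import numpy as np

def density_weights(u, v, sigma, scheme="natural", cell=0.05, robust=0.0):
    """Toy per-visibility density weights D(m) for one channel.

    u, v   : uv coordinates of the M measurements (arbitrary units)
    sigma  : per-visibility noise standard deviations
    cell   : width s of the uv cell used to count local samples N_s(m)
    robust : Briggs robustness parameter (typically in [-2, 2])
    """
    u, v, sigma = map(np.asarray, (u, v, sigma))
    w_nat = 1.0 / sigma**2                       # natural: inverse noise variance
    # Count the measurements falling in the same uv cell as each point.
    iu, iv = np.floor(u / cell).astype(int), np.floor(v / cell).astype(int)
    key = iu * 10**6 + iv                        # cell id (assumes |iv| < 10**6)
    _, inverse, counts = np.unique(key, return_inverse=True, return_counts=True)
    n_s = counts[inverse].astype(float)
    if scheme == "natural":
        return w_nat
    if scheme == "uniform":
        return 1.0 / n_s                         # down-weight dense uv regions
    # Briggs: one common gridded-weight parameterization; the exact
    # normalization of f^2 differs between software packages.
    wk = n_s * w_nat                             # crude per-cell gridded weight
    f2 = (5.0 * 10.0 ** (-robust)) ** 2 / (np.sum(wk * w_nat) / np.sum(w_nat))
    return w_nat / (1.0 + wk * f2)
```

Natural weighting returns the inverse noise variances unchanged, uniform weighting divides by the local sample count, and the robustness parameter interpolates between the two regimes.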
2.4 Discrete RI data model
Considering L frequency channels and representing the intensity image and the RI data at each
frequency \nu_l as vectors, the discrete version of the measurement model follows:

y_l = \Phi_l x_l + n_l, \qquad (2.19)

where x_l \in \mathbb{R}^N_+ is the true sky image of size N pixels at the frequency \nu_l, and y_l \in \mathbb{C}^{M_l} represents
the M_l visibilities. The vector n_l \in \mathbb{C}^{M_l} represents the measurement noise, modelled as a
realization of random independent and identically distributed (i.i.d.) complex Gaussian noise.
\Phi_l is the measurement operator at the frequency \nu_l. Ideally, \Phi_l accounts for a direct Fourier
transform of the sky image to the non-uniform visibility space, which requires M_l N computations
for each channel. This is infeasible in the era of the new radio telescopes, where tremendous amounts
of data will be provided. Instead, the common practice in radio astronomy is to take advantage of
the Fast Fourier Transform (FFT) and interpolate the visibilities from a regular grid. The
measurement operator for each channel \nu_l can then be written as

\Phi_l = \Theta_l G_l F Z. \qquad (2.20)
The measurement operator \Phi_l is composed of a zero-padding and scaling operator Z \in \mathbb{R}^{oN \times N}, the
FFT matrix F \in \mathbb{C}^{oN \times oN}, a non-uniform Fourier transform interpolation matrix G_l \in \mathbb{C}^{M_l \times oN},
and a diagonal noise-whitening matrix \Theta_l \in \mathbb{C}^{M_l \times M_l} that contains on its diagonal the inverse of
the noise standard deviation associated with each original measurement. This assumes that the
original visibility vector was multiplied by \Theta_l with the aim of producing a measurement vector y_l
affected by i.i.d. Gaussian noise. In this setting, the data projected onto the image space, i.e.,
the dirty image \Phi_l^\dagger y_l, is naturally weighted with the inverse of the noise variance, as described in
Section 2.3.3. Each row of G_l contains a compact-support interpolation kernel centered at the
corresponding uv-point [55], enabling the computation of the Fourier mode associated with each
visibility from surrounding discrete Fourier points. Note that at the sensitivity of interest to the
new generation of radio telescopes, DDEs of either atmospheric or instrumental origin complicate
the RI measurement equation. For each visibility, the sky surface brightness is pre-modulated
by the product of a DDE pattern specific to each antenna. The DDEs are often unknown and
need to be calibrated jointly with the imaging process [12,98,101,121]. Focusing here on the
imaging problem, i.e., assuming DDEs are known, they can simply be integrated into the forward
model (2.19) by building extended interpolation kernels into each row of G_l, resulting from the
convolution of the non-uniform Fourier transform kernel with a compact-support representation of
the Fourier transform of the involved DDEs.
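As a concrete sketch of the structure of (2.20), the toy operator below implements a 1D analogue of \Phi_l = \Theta_l G_l F Z, with a nearest-neighbour "kernel" standing in for the compact-support interpolation kernels of G_l; the function name and dimensions are illustrative, not the thesis implementation.

```python
import numpy as np

def build_operator(n, oversamp, uv, sigma):
    """Toy 1D analogue of Phi = Theta G F Z (eq. 2.20).

    Z zero-pads to an oversampled grid, F is the FFT, G grids each continuous
    Fourier position to its nearest grid point (a 1-tap stand-in for a
    compact-support interpolation kernel), and Theta whitens by 1/sigma.
    uv : M continuous Fourier positions in [-0.5, 0.5) (cycles per pixel).
    """
    no = oversamp * n
    idx = np.round(uv * no).astype(int) % no      # nearest oversampled grid point
    theta = 1.0 / np.asarray(sigma, dtype=float)

    def forward(x):                               # Theta G F Z x
        z = np.zeros(no, dtype=complex)
        z[:n] = x                                 # Z: zero-padding
        return theta * np.fft.fft(z)[idx]         # F, then G, then Theta

    def adjoint(y):                               # Z^T F^H G^H Theta y
        grid = np.zeros(no, dtype=complex)
        np.add.at(grid, idx, theta * y)           # G^H: accumulate onto the grid
        return (np.fft.ifft(grid) * no)[:n].real  # F^H = no * ifft, then Z^T
    return forward, adjoint
```

A standard sanity check for such forward/adjoint pairs is the dot-test, i.e., verifying that \langle \Phi x, y \rangle = \langle x, \Phi^\dagger y \rangle up to numerical precision.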
Estimating the underlying wideband sky image X = (x_l)_{1 \leqslant l \leqslant L} from incomplete Fourier
measurements is a severely ill-posed inverse problem, which calls for powerful regularization terms to
encode a prior image model.
2.5 Radio-interferometric imaging
A plethora of RI imaging approaches has been proposed in the literature; they can be classified
into three main categories: CLEAN-based approaches, Bayesian inference techniques
and optimization methods.
2.5.1 CLEAN-based algorithms
A first class of methods is the celebrated CLEAN family [10,32,38,64,85,96,107,109]. CLEAN is
a greedy deconvolution method based on iterative local removal of the PSF. It assumes that the
sky image x_l \in \mathbb{R}^N at the frequency \nu_l is made of a collection of point sources, thus implicitly
assuming sparsity in the image space. This method can also be seen as a gradient descent approach
with an implicit sparsity prior on x_l, and its update at iteration t follows [87]:

x_l^{(t+1)} = x_l^{(t)} + \mathcal{T} \Phi_l^\dagger \big( y_l - \Phi_l x_l^{(t)} \big), \qquad (2.21)

where \Phi_l^\dagger is the adjoint of the linear operator \Phi_l. At each iteration, the algorithm operates
in major and minor cycles. The minor cycle consists in estimating the brightest point source
(position and value), the so-called CLEAN component, then removing a fraction of its contribution
from the residual image using an approximate operator \tilde{\Phi}_l to allow fast subtraction of multiple
sources. This process is represented by the operator \mathcal{T} in equation (2.21). Then, the CLEAN
components are added to the estimated sky model x_l^{(t)}. Finally, the major cycle computes the
residual image \Phi_l^\dagger ( y_l - \Phi_l x_l^{(t+1)} ) using the exact operator \Phi_l for the next round of minor cycles.
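The minor cycle can be sketched as follows. This is a minimal single-scale illustration assuming, for simplicity, a 1D signal, a periodic shift of the beam, and the exact PSF in place of the approximate operator used in practice.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, n_iter=500, threshold=1e-6):
    """Minimal single-scale CLEAN loop on a 1D signal (minor cycles only).

    dirty : dirty image, i.e. the true sky convolved with the dirty beam
    psf   : dirty beam, peak-normalized and centered at len(psf)//2
    gain  : loop gain, the fraction of each CLEAN component removed per step
    """
    residual = dirty.astype(float).copy()
    model = np.zeros_like(residual)
    center = len(psf) // 2
    for _ in range(n_iter):
        p = int(np.argmax(np.abs(residual)))          # brightest CLEAN component
        if np.abs(residual[p]) < threshold:
            break
        flux = gain * residual[p]
        model[p] += flux                              # grow the sky model
        residual -= flux * np.roll(psf, p - center)   # subtract shifted dirty beam
    return model, residual
```

With a Dirac PSF the loop reduces the residual geometrically and the invariant model + residual = dirty holds exactly; real implementations work in 2D and interleave these minor cycles with the major cycles described above.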
Although efficient in recovering point sources, CLEAN has shown limited performance when it
comes to the recovery of extended emission. To overcome this limitation, a first multi-resolution
variant of CLEAN was proposed in [113]. Leveraging an isotropic wavelet transform, the so-called
MRC algorithm performs classical CLEAN iterations on the wavelet coefficients. Another multi-scale
extension of CLEAN, MS-CLEAN, has been proposed in [38]. MS-CLEAN assumes the sky
to be a linear combination of images at different spatial scales. At each scale, the image is a
convolution of Diracs with tapered, truncated parabolas of different widths. From a multi-scale
decomposition of the residual image, MS-CLEAN detects the brightest peak value, its position and
corresponding scale, adds a scaled component at the same pixel position to the estimated model
of the sky, then removes a fraction of its contribution from the data in the same manner as
classical CLEAN.
A first wideband CLEAN-based approach, dubbed MF-CLEAN [107], models the sky intensity
as a collection of point sources whose spectra follow a power law defined as

x_l = x_1 \bullet \left( \frac{\nu_l}{\nu_1} \right)^{-\alpha}, \qquad (2.22)

where \alpha \in \mathbb{R}^N is the spectral index map. This power law is approximated by a first-order (linear)
Taylor expansion, and the problem reduces to the estimation of two Taylor coefficient images.
These are reconstructed by performing a classical CLEAN on their associated dirty images. The
locations of the CLEAN components are determined via a least-squares solution. Yet, MF-CLEAN
is sub-optimal when it comes to the recovery of extended emission, as it is modelled with
point sources. Moreover, the approach is limited by higher-order spectral effects like spectral
curvature. To overcome these drawbacks, [96] have proposed a multi-scale multi-frequency variant
of CLEAN, dubbed MS-MFS, adopting the curvature model as a spectral model. It reads

x_l = x_1 \bullet \left( \frac{\nu_l}{\nu_1} \right)^{-\alpha + \beta \log(\nu_l / \nu_1)}, \qquad (2.23)
where \alpha \in \mathbb{R}^N and \beta \in \mathbb{R}^N are the spectral index and curvature maps, respectively. Using
a Taylor series, x_l is approximated via a linear combination of a few Taylor coefficient images (s_r \in \mathbb{R}^N)_{1 \leqslant r \leqslant R}:

x_l = \sum_{r=1}^{R} h_{l,r}\, s_r, \qquad (2.24)

where h_{l,r} = \left( \frac{\nu_l - \nu_1}{\nu_1} \right)^{r}, with 1 \leqslant l \leqslant L and 1 \leqslant r \leqslant R,
are the spectral basis functions, R is the order of the Taylor series and
L is the number of channels. In this case, the wideband image reconstruction problem reduces
to the recovery of the Taylor coefficient images. These are deconvolved by performing a multi-scale
CLEAN on their respective dirty images s_r^{\mathrm{dirty}} = \sum_{l=1}^{L} h_{l,r}\, x_l^{\mathrm{dirty}}, 1 \leqslant r \leqslant R. More recently,
[85] have proposed a wideband variant of multi-scale CLEAN, the so-called joined-channel CLEAN
(JC-CLEAN), which is incorporated in the software wsclean³ [84]. The main idea consists in
determining the pixel positions of the CLEAN components from an integrated image, obtained as
the sum of the residual images of all the channels (initially, these correspond to the dirty images).
The spectra at the selected pixel positions are determined directly from the associated values in
the residual images of the different channels. When the spectral behaviour of the radio sources
is known to be smooth (which is the case for synchrotron emission [106]), a polynomial is fitted to
their estimated spectra.
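The spectral models (2.23)-(2.24) can be made concrete with the following toy sketch, which evaluates the curvature model and fits Taylor coefficient images by least squares. The 0-based exponent used here is an illustrative convention (references differ on r versus r-1), and all dimensions are hypothetical.

```python
import numpy as np

def curvature_cube(x1, alpha, beta, nus, nu1):
    """Per-channel images x_l = x_1 . (nu_l/nu_1)^(-alpha + beta*log(nu_l/nu_1)),
    evaluated elementwise for all L channels at once; returns an L x N array."""
    ratio = (nus / nu1)[:, None]                             # L x 1
    return x1[None, :] * ratio ** (-alpha[None, :] + beta[None, :] * np.log(ratio))

def taylor_approx(X, nus, nu1, R):
    """Least-squares fit of R Taylor coefficient images s_r with basis functions
    h_{l,r} = ((nu_l - nu_1)/nu_1)^r, r = 0..R-1 (0-based here), cf. (2.24)."""
    H = np.stack([((nus - nu1) / nu1) ** r for r in range(R)], axis=1)  # L x R
    S, *_ = np.linalg.lstsq(H, X, rcond=None)                # R x N coefficients
    return H @ S                                             # reconstructed cube

# Toy example: 4-pixel image, 8 channels spanning one octave above nu1.
rng = np.random.default_rng(3)
nus, nu1 = np.linspace(1.0, 2.0, 8), 1.0
x1 = rng.uniform(0.5, 1.5, 4)
alpha, beta = rng.uniform(0.3, 0.9, 4), rng.uniform(-0.1, 0.1, 4)
X = curvature_cube(x1, alpha, beta, nus, nu1)
err = lambda R: np.linalg.norm(X - taylor_approx(X, nus, nu1, R)) / np.linalg.norm(X)
```

On this smooth toy cube, increasing the number of Taylor terms R rapidly reduces the approximation error, which is the rationale for reducing wideband reconstruction to a few coefficient images.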
Albeit simple and computationally efficient, CLEAN-based algorithms provide limited imaging
quality in high-resolution and high-sensitivity acquisition regimes. This shortcoming partly
results from their greedy nature and their lack of flexibility in injecting complex prior information
to regularize the inverse imaging problem. Moreover, these algorithms often require careful tuning
of the associated parameters.
2.5.2 Bayesian inference techniques
The second class of methods relies on Bayesian inference techniques to sample the full posterior
distribution based on a hierarchical Bayesian model [7,39,69,70,114]. For instance, the authors
in [114] proposed a monochromatic Bayesian method based on Markov chain Monte Carlo (MCMC)
sampling techniques, assuming a Gaussian image prior. Since MCMC sampling methods are
computationally very expensive, an efficient variant was proposed in [7,70] to perform approximate
Bayesian inference. The so-called RESOLVE algorithm, formulated in the language of information
field theory, approximates the full posterior distribution by a multivariate Gaussian distribution
and draws samples from the approximate distribution. RESOLVE is a joint imaging and
calibration algorithm, where only direction-independent antenna-based calibration is considered. A
wideband imaging variant of RESOLVE was introduced by [69]. The authors considered a power-law
spectral model and proposed a method restricted to the case of Gaussian or log-normal priors.
In this case, the reconstruction of the wideband model cube consists in the estimation of the sky
image at the reference frequency, x_1, and the spectral index map \alpha. The method is limited by
higher-order spectral effects like spectral curvature.
Importantly, this class of methods naturally enables the quantification of uncertainty around
the image estimate. However, these approaches cannot currently scale to the data regimes
expected from modern telescopes.
³ W-Stacking CLEAN (wsclean) is a wide-field RI imaging software that can be found at https://sourceforge.net/projects/wsclean/.
2.5.3 Optimization algorithms
The third class of approaches leverages optimization methods to allow sophisticated prior
information to be considered, such as sparsity in an appropriate transform domain, smoothness,
etc. [6,26,42,43,47,54,57,60,74,124,126]. From the perspective of optimization theory, the
inverse imaging problem is approached by defining an objective function, consisting of the sum of a
data-fidelity term and a regularization term promoting a prior image model to compensate for the
incompleteness of the visibility data. The sought image is estimated as a minimizer of this
objective function and is computed through iterative algorithms, which benefit from well-established
convergence guarantees. For instance, [124] assumed that the spectra are composed of a smooth
continuous part with sparse local deviations, hence allowing for the recovery of non-smooth spectra.
In this respect, the authors proposed a convex minimization problem promoting sparsity in a
concatenation of two dictionaries. The first synthesis dictionary consists of delta functions; sparsity
of its associated synthesis coefficients is enforced, allowing for sparse local deviations. The second
synthesis dictionary consists of smooth polynomials, more precisely the basis functions of Chebyshev
polynomials. Joint sparsity of the synthesis coefficients associated with the overall dictionary
is also enforced. Assuming smooth spectra, [54] proposed a convex minimization problem
promoting sparsity of both the spatial and spectral information. Spatial sparsity is promoted in a
redundant wavelet dictionary. More interestingly, sparsity of the spectra is enforced in the Discrete
Cosine Transform (DCT) domain. Finally, a third, quadratic regularization is imposed on the overall model
cube. The approach involves the tuning of multiple hyper-parameters, representing the trade-off
between the different priors. The choice of these parameters is crucial as it affects the final solution
significantly. To alleviate the issue of tuning multiple parameters, [47] discarded the smooth prior
on the model cube, reducing the number of hyper-parameters to two. Furthermore, [6] proposed
an automatic procedure to tune the remaining two hyper-parameters. In the last decade, Wiaux
and collaborators proposed advanced image models: the average sparsity prior in monochromatic
imaging (SARA) [25–27,44,86–88,94], and the polarization constraint for polarized imaging
(Polarized SARA) [11]⁴. These models have been reported to result in significant improvements in
the reconstruction quality in comparison with state-of-the-art CLEAN-based imaging methods,
at the expense of an increased computation cost. The single-channel approach SARA consists in
solving a sequence of weighted minimization problems, promoting sparsity of the sky image in
an overcomplete dictionary (spanned by a collection of eight wavelet bases and the Dirac basis).
Finally, assuming a point-source model of the sky, the authors in [89] leveraged the finite rate
of innovation (FRI) framework to find the locations of the point sources in a continuum without
grid imposition. The main idea consists in estimating a smooth function (a polynomial) which
vanishes precisely at the non-zero positions of the continuous-domain sparse signal. Given that
FRI signals can be written as a weighted sum of sinusoids whose frequencies are related to the
unknown parameters of the original continuous sparse signal, the authors formulated the problem
as a constrained minimization task in the frequency domain, estimating a discrete filter whose
convolution with the (unknown) uniformly sampled sinusoids is zero.
Note that, from a Bayesian perspective, the objective function can be seen as the negative
logarithm of a posterior distribution, with the minimizer corresponding to a Maximum A Posteriori
(MAP) estimate. That being said, convex optimization only provides a point estimate of the
posterior distribution, making it faster than Bayesian inference techniques that explore the full
distribution. Recently, methods for uncertainty quantification by convex optimization have also
been tailored, which enable assessing the degree of confidence in specific structures appearing in
the MAP estimate [99,100].
2.6 Conclusions
In this chapter, we explained in detail the wideband RI measurement framework, starting from
the spatial coherence function and ending with the discrete measurement model. We then reviewed the
state-of-the-art approaches tailored to solve the inverse imaging problem in RI: CLEAN-based
methods, Bayesian inference techniques and optimization methods. Since the work developed
in this thesis falls into the last category, we dedicate the next chapter to explaining the convex
optimization framework and all the tools needed to develop the proposed algorithms.
⁴ Associated software on the Puri-Psi webpage: https://basp-group.github.io/Puri-Psi/.
Chapter 3
Sparse representation and convex optimization

Contents
3.1 Introduction . . . 19
3.2 RI imaging problem formulation . . . 20
3.3 Compressive sensing and sparse representation . . . 21
3.3.1 ℓ1 minimization . . . 22
3.3.2 Reweighted-ℓ1 minimization . . . 23
3.3.2.1 SARA for RI imaging . . . 23
3.4 Convex optimization . . . 24
3.4.1 Proximal splitting algorithms . . . 25
3.4.2 Primal-dual . . . 26
3.4.3 Convex optimization for wideband RI imaging - revisited . . . 28
3.5 Conclusions . . . 30
3.1 Introduction
This chapter provides all the mathematical background required for the work developed in this
thesis and is organized as follows. Section 3.2 poses the RI inverse problem as a minimization
task. We explore the world of compressive sensing and sparse recovery approaches in Section
3.3. In Section 3.4, we introduce convex optimization methods as powerful tools to solve convex
minimization problems and revisit the convex optimization approaches adopted for wideband RI
imaging. Finally, conclusions are stated in Section 3.5.
3.2 RI imaging problem formulation
RI data are incomplete and noisy Fourier measurements; image recovery in RI is thus an ill-posed
inverse problem with an infinite number of possible solutions that fit the acquired data. To
obtain an accurate estimate and discard unwanted solutions, regularization (i.e., prior information
about the underlying sky image) should be imposed [53]. In the context of optimization theory,
the inverse imaging problem can be approached by defining an objective function consisting of the
sum of a data-fidelity term, imposing y_l \approx \Phi_l x_l up to the noise, and a regularization term. The
sky image of interest x_l is estimated as a minimizer of the objective function as follows:

minimize_{x_l \in \mathbb{R}^N} \; r(x_l) + f(y_l, x_l), \qquad (3.1)
where:
• r(x_l) is the regularization function, imposing prior information on the sky image x_l to be
estimated, e.g., smoothness, sparsity in some domain, etc.
• f(y_l, x_l) is the data-fidelity function, measuring the similarity between the available visibilities
y_l and the estimated ones.
We recall that in a Bayesian framework, the objective function can be seen as the negative logarithm
of a posterior distribution, with the minimizer corresponding to a MAP estimate.
Assuming additive i.i.d. Gaussian noise, a typical data-fidelity term is the Euclidean norm
(the ℓ2 norm), defined for a signal y_l \in \mathbb{R}^{M_l} as

\|y_l\|_2 = \left[ \sum_{m=1}^{M_l} |y_{m,l}|^2 \right]^{1/2}. \qquad (3.2)
The minimization problem (3.1) can then be rewritten as

minimize_{x_l \in \mathbb{R}^N} \; r(x_l) + \frac{\tau}{2} \|y_l - \Phi_l x_l\|_2^2, \qquad (3.3)

where \tau > 0 is a free parameter setting the trade-off between the prior and the data-fidelity terms.
Problem (3.3) is an unconstrained minimization problem; an equivalent constrained
formulation can be written as

minimize_{x_l \in \mathbb{R}^N} \; r(x_l) \quad subject to \quad \|y_l - \Phi_l x_l\|_2 \leqslant \epsilon_l, \qquad (3.4)

where \epsilon_l > 0 is a bound on the noise level, i.e., \|n_l\|_2 \leqslant \epsilon_l. When the noise statistics are known, which
is typically the case in RI, the constrained formulation is preferred, avoiding the need to fine-tune
the free parameter \tau [26]. The quality of reconstruction is highly dependent on the choice of the
regularization function. In the next section, we explore the world of sparsity, as sparsity priors
have drawn vast interest for RI image reconstruction.
3.3 Compressive sensing and sparse representation
The Nyquist-Shannon theorem states that for the exact recovery of a band-limited signal, the
sampling rate should be at least twice the bandwidth of the signal. The aim of the theory of
compressive sensing (CS) is to go beyond the Nyquist-Shannon theorem, relying on the fact that
most signals in nature are sparse or compressible. A signal represented as a vector x_l \in \mathbb{R}^N,
identifying the N Nyquist-Shannon samples, is said to be sparse if it has only a few non-zero coefficients
K \ll N in an adequate basis. More generally, x_l is said to be compressible if it has a few significant
coefficients in some basis \Psi \in \mathbb{R}^{N \times T}, T \geqslant N, whilst most of the remaining coefficients have
negligible values [22]:

x_l = \Psi \alpha_l, \qquad (3.5)
where \alpha_l \in \mathbb{R}^T is sparse or compressible. Extensive research has been conducted during the past
years to find the most suitable dictionary \Psi for different types of images [104,112]. For example,
for images composed of point sources, \Psi can be set to the Dirac basis, promoting sparsity in the
image domain itself. For piece-wise constant images, sparsity can be promoted in the gradient
domain [105]. For smooth images with more complex structures (e.g., extended emission), the
wavelet domain [76] and redundant wavelet dictionaries [26] have been shown to be good choices for
\Psi. Other options exist in the literature for the choice of the dictionary \Psi, such as the isotropic
undecimated wavelet transform (IUWT) [111] and the curvelets [110].
The theory of compressive sensing states that, under some conditions, exact recovery of a
compressible signal x_l \in \mathbb{R}^N in a basis \Psi \in \mathbb{R}^{N \times T} can be achieved from a number M_l of measurements
y_l \in \mathbb{C}^{M_l} in a sensing basis \Phi_l \in \mathbb{R}^{M_l \times N}, with M_l much smaller than the amount required by
Nyquist-Shannon sampling, such that

y_l = A \alpha_l + n_l \quad with \quad A = \Phi_l \Psi \in \mathbb{R}^{M_l \times T}. \qquad (3.6)

The matrix A must satisfy the restricted isometry property (RIP) [23]. In practice, a few random
measurements in a sensing basis \Phi_l incoherent with the sparsity basis \Psi will ensure the RIP with
overwhelming probability [19,21,49,126]. The coherence \mu between two bases \Phi_l and \Psi is defined
as the maximum complex modulus of the scalar product between the unit-norm vectors \phi_{l,i} and \psi_j
of the two bases:

\mu = \sqrt{N} \max_{1 \leqslant i,j \leqslant N} |\langle \phi_{l,i}, \psi_j \rangle|. \qquad (3.7)
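The definition (3.7) is easy to check numerically. In the sketch below, the Fourier-Dirac pair attains the minimal coherence \mu = 1 (maximal incoherence), while a basis paired with itself attains the maximal value \sqrt{N}.

```python
import numpy as np

def coherence(Phi, Psi):
    """Mutual coherence (3.7): sqrt(N) * max_{i,j} |<phi_i, psi_j>| for two
    orthonormal bases stored column-wise; it ranges from 1 to sqrt(N)."""
    N = Phi.shape[0]
    return np.sqrt(N) * np.max(np.abs(Phi.conj().T @ Psi))

N = 16
F = np.fft.fft(np.eye(N)) / np.sqrt(N)    # orthonormal Fourier basis (columns)
I = np.eye(N)                             # Dirac (identity) basis
mu_fourier_dirac = coherence(F, I)        # = 1: maximally incoherent pair
mu_self = coherence(I, I)                 # = sqrt(N): maximally coherent
```

Every entry of the normalized DFT matrix has modulus 1/\sqrt{N}, which is exactly why Fourier sensing of signals sparse in the Dirac basis is the canonical incoherent example.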
For instance, probing random Fourier measurements of a signal sparse in a wavelet dictionary
is a good example of incoherent sensing and sparsity bases. In this respect, the RI inverse
problem (2.19) (equivalently (3.6)) can be solved by finding the sparse representation \alpha_l that is
consistent with the data y_l. This can be done by solving the following constrained problem:

minimize_{\alpha_l \in \mathbb{R}^T} \; \|\alpha_l\|_0 \quad subject to \quad \|y_l - \Phi_l \Psi \alpha_l\|_2 \leqslant \epsilon_l, \qquad (3.8)

where \|\cdot\|_0 denotes the ℓ0 pseudo-norm, i.e., the number of non-zero coefficients of a signal \alpha_l [49].
The ℓ0 pseudo-norm is neither a convex nor a smooth function, and thus the minimization problem
defined in (3.8) is non-convex. To solve this problem, greedy algorithms such as matching pursuit
(MP) [77] or iterative hard thresholding (IHT) [13,61] can be used. However, these methods are
only guaranteed to find a local optimum, so a good initialization becomes paramount.
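The IHT iteration can be sketched as follows; this is an illustrative toy (the step-size rule is one simple choice, and the convergence guarantees alluded to above require further conditions on A).

```python
import numpy as np

def iht(y, A, K, n_iter=50, step=None):
    """Iterative hard thresholding for l0-constrained problems of type (3.8):
    alpha <- H_K( alpha + step * A^T (y - A alpha) ),
    where H_K keeps only the K largest-magnitude coefficients."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2      # safe step from the spectral norm
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = alpha + step * (A.T @ (y - A @ alpha))  # gradient step on ||y - A a||^2
        keep = np.argpartition(np.abs(g), -K)[-K:]  # support of the K largest entries
        alpha = np.zeros_like(g)
        alpha[keep] = g[keep]                       # hard thresholding H_K
    return alpha
```

In the easiest regime, an orthonormal A, a single gradient step already lands on the true coefficients and the hard threshold recovers the exact support; the compressed (M < N) regime behaves less predictably, which is precisely the local-optimum caveat above.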
3.3.1 ℓ1 minimization
A common approach to make problem (3.8) convex is to promote sparsity by replacing the ℓ0
pseudo-norm with its closest convex relaxation, the ℓ1 norm, defined for a signal \alpha_l \in \mathbb{R}^T
by

\|\alpha_l\|_1 = \sum_{n=1}^{T} |\alpha_{n,l}|. \qquad (3.9)

Thus, we pose the following convex minimization problem [19,30,49]:

minimize_{\alpha_l \in \mathbb{R}^T} \; \|\alpha_l\|_1 \quad subject to \quad \|y_l - \Phi_l \Psi \alpha_l\|_2 \leqslant \epsilon_l. \qquad (3.10)
The minimization of the ℓ1 norm of a sparse signal \alpha_l under a constraint on the ℓ2 norm of
the residual noise is called the basis pursuit denoising (BPDN) problem. The BPDN problem
finds explicitly the sparsest signal \alpha_l and then recovers the original signal x_l through (3.5); this
is a sparsity-by-synthesis approach. It assumes that the signal x_l can be approximated by a
linear combination of a few atoms of a redundant synthesis dictionary \Psi and thus solves for the
synthesis coefficients \alpha_l. Sparsity can also be promoted through analysis-based approaches, where
the projection of x_l onto a redundant analysis dictionary \Psi is assumed sparse [52]. Analysis-based
problems solve directly for the signal x_l itself and can be of the form

minimize_{x_l \in \mathbb{R}^N} \; \|\Psi^\dagger x_l\|_1 \quad subject to \quad \|y_l - \Phi_l x_l\|_2 \leqslant \epsilon_l. \qquad (3.11)

Both synthesis and analysis approaches are equivalent for orthonormal sparsity bases. However,
when considering redundant dictionaries, they may lead to different solutions. In recent works, the
analysis problem has been shown to be more robust for redundant dictionaries [26].
3.3.2 Reweighted-ℓ1 minimization
Although the ℓ1 prior has been widely used during the last decades to promote sparsity, it induces
an undesirable dependence on the coefficients' magnitude. Indeed, unlike the ℓ0 prior, the ℓ1 norm
penalizes larger coefficients more than smaller ones. A better approximation of the sparsity
measure can be achieved by adopting the weighted ℓ1 norm [23], defined for a signal \alpha_l \in \mathbb{R}^T as

\|\alpha_l\|_{1,\omega} = \sum_{n=1}^{T} \omega_{n,l} |\alpha_{n,l}|, \qquad (3.12)

where \omega_{n,l} > 0 is the weight associated with the coefficient \alpha_{n,l}. Assuming the signal \alpha_l is known
a priori (which is not the case in practice), one can set the weights as \omega_{n,l} = |\alpha_{n,l}|^{-1}. This choice
of the weights makes the weighted ℓ1 norm independent of the values of the non-zero coefficients,
mimicking the behaviour of the ℓ0 pseudo-norm. Also, for zero-valued coefficients, we get infinite weights,
forcing the solution of the ℓ1 minimization problem at these positions to be zero. In practice, the
signal \alpha_l is unknown and needs to be estimated. Therefore, the appropriate weights are found by
solving a sequence of weighted ℓ1 minimization problems, each solved using weights
essentially equal to the inverse of the solution of the previous problem [23,81].
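The reweighting procedure can be sketched in the simplest setting, where each weighted ℓ1 subproblem has a closed-form solution: a denoising problem with identity measurement and sparsity operators. The parameter values below are illustrative.

```python
import numpy as np

def weighted_soft_threshold(z, tau, w):
    """Closed-form minimizer of  tau * ||a||_{1,w} + 0.5 * ||a - z||_2^2."""
    return np.sign(z) * np.maximum(np.abs(z) - tau * w, 0.0)

def reweighted_l1_denoise(z, tau=0.3, upsilon=0.1, n_reweight=5):
    """Sequence of weighted-l1 denoising problems; the weights
    w = 1/(|alpha| + upsilon) from the previous solution mimic the
    l0 behaviour, cf. (3.12) and the log-sum prior discussion."""
    w = np.ones_like(z)
    for _ in range(n_reweight):
        alpha = weighted_soft_threshold(z, tau, w)  # solve current subproblem
        w = 1.0 / (np.abs(alpha) + upsilon)         # update weights for the next one
    return alpha
```

On a toy signal with one large coefficient, plain soft-thresholding shrinks it by the full threshold, whereas a few reweighting passes push it back toward its true value while keeping the small entries at zero, illustrating the reduced magnitude bias.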
This approach can be elegantly cast as solving for a non-convex log-sum prior within a majorization-minimization
framework, enabling the sequential use of convex optimization algorithms [65]. The
log-sum prior r is defined for a signal \alpha_l \in \mathbb{R}^T as

r(\alpha_l) = \sum_{n=1}^{T} \log \big( |\alpha_{n,l}| + \upsilon \big), \qquad (3.13)

with \upsilon > 0. In practice, sequentially minimizing convex problems with weighted ℓ1 priors is indeed
algorithmically much simpler than minimizing a non-convex problem with a log-sum prior [59,82,83,102].
In the following paragraph, we present the SARA approach [26], proposed to solve the RI
imaging problem using a log-sum prior.
3.3.2.1 SARA for RI imaging
The authors in [26] proposed the sparsity averaging reweighted analysis (SARA) approach to
solve the RI imaging problem. SARA promotes sparsity by minimizing a log-sum prior through a
reweighted ℓ1 procedure, considering a highly redundant sparsity dictionary \Psi \in \mathbb{R}^{N \times T} defined
as the concatenation of wavelet bases (the first eight Daubechies wavelets and the Dirac basis), leading
to the notion of average sparsity over the bases of interest. The SARA minimization problem reads

minimize_{x_l \in \mathbb{R}^N_+} \; \tilde{\mu} \sum_{n=1}^{T} \log \big( |[\Psi^\dagger x_l]_n| + \upsilon \big) \quad subject to \quad \|y_l - \Phi_l x_l\|_2 \leqslant \epsilon_l, \qquad (3.14)
where \tilde{\mu} > 0 and \upsilon > 0 are regularization parameters, and [\Psi^\dagger x_l]_n denotes the n-th coefficient
of \Psi^\dagger x_l. To solve this non-convex minimization problem, the SARA approach leverages the
majorization-minimization approach proposed by [23]. More precisely, for a given image x_l^{(k)}, the
log-sum prior given by r(x_l) = \tilde{\mu} \sum_{n=1}^{T} \log( |[\Psi^\dagger x_l]_n| + \upsilon ) is locally majorized by a linear function,
i.e., for every x_l \in \mathbb{R}^N,

r(x_l) \leqslant r(x_l^{(k)}) + \sum_{n=1}^{T} \frac{\tilde{\mu}}{|[\Psi^\dagger x_l^{(k)}]_n| + \upsilon} \Big( |[\Psi^\dagger x_l]_n| - |[\Psi^\dagger x_l^{(k)}]_n| \Big). \qquad (3.15)
Then, using a majorization-minimization approach, the majorant function given by the right-hand
side of (3.15) is minimized, subject to the data-fidelity and non-negativity constraints. This
results in the following convex problem, an approximation of the initial non-convex problem (3.14):

minimize_{x_l \in \mathbb{R}^N_+} \; \tilde{\mu} \|\Psi^\dagger x_l\|_{1, \omega(x_l^{(k)})} \quad subject to \quad \|y_l - \Phi_l x_l\|_2 \leqslant \epsilon_l. \qquad (3.16)

The weights \omega(x_l^{(k)}) = \big( \omega_n(x_l^{(k)}) \big)_{1 \leqslant n \leqslant T} are given by

\omega_n(x_l^{(k)}) = \big( |[\Psi^\dagger x_l^{(k)}]_n| + \upsilon \big)^{-1}. \qquad (3.17)

Problem (3.16) is a convex approximation of the original non-convex problem (3.14) at the local
point x_l^{(k)}. Once problem (3.16) is solved, the full majorization-minimization procedure is
iterated in order to globally approximate the original non-convex problem of interest (3.14).
The SARA approach has proved efficient for RI imaging on both synthetic and real data [26,27,44,87,88,94].
3.4 Convex optimization
By definition, a function f : \mathbb{R}^N \to ]-\infty, +\infty] is convex if dom f¹ is convex and, for any
x_1, x_2 \in dom f and \theta \in [0, 1], we have

f(\theta x_1 + (1 - \theta) x_2) \leqslant \theta f(x_1) + (1 - \theta) f(x_2), \qquad (3.18)

i.e., the graph of the function between two points lies below the line segment joining these
two points. A convex optimization problem consists in minimizing a convex function subject to
convex constraints. As opposed to non-convex problems like (3.8), the class of convex problems
has the nice property that any local optimum is a global one.
¹ See Appendix .1 for the definition of the domain of a function.
The previously introduced ℓ1 minimization problems ((3.10), (3.11) and (3.16)) involve convex
functions (the ℓ1 and ℓ2 norms) and thus belong to the family of convex problems. These problems can be efficiently
solved by leveraging convex optimization techniques, for which various algorithms, efficient in terms of
flexibility and convergence, have been devised [8]. In what follows, we explain two
classes of methods within the convex optimization framework: proximal splitting methods and the
primal-dual approach. The latter is used for all the algorithmic developments in this thesis.
3.4.1 Proximal splitting algorithms
Proximal splitting methods attract much interest due to their convergence guarantees. When
applied to convex problems, they are flexible and lead to a globally optimal solution of the associated
minimization task. Many splitting algorithms have been devised [15,33,34], all of them solving
minimization tasks of the form

minimize_{x_l \in \mathbb{R}^N} \; \sum_{q=1}^{Q} g_q(x_l), \qquad (3.19)

where, for q \in \{1, \cdots, Q\}, g_q is a proper, lower semi-continuous² convex function, possibly
non-smooth, from \mathbb{R}^N to ]-\infty, +\infty] (g_q \in \Gamma_0(\mathbb{R}^N)³). These algorithms bring the advantage of
splitting the objective into several simpler functions that can be dealt with separately. Each non-differentiable
function is involved in the minimization via its proximity operator. Let U \in \mathbb{R}^{N \times N}
be a symmetric, positive definite matrix. The proximity operator of a proper, convex, lower semi-continuous
function g : \mathbb{R}^N \to ]-\infty, +\infty] at \bar{x}_l \in \mathbb{R}^N with respect to the metric induced by U is
defined by [63,78]

prox^{U}_{g}(\bar{x}_l) = argmin_{x_l \in \mathbb{R}^N} \Big\{ g(x_l) + \frac{1}{2} (x_l - \bar{x}_l)^\dagger U (x_l - \bar{x}_l) \Big\}. \qquad (3.20)

In the following, the more compact notation prox_g will be used whenever U = I_N, where I_N \in
\mathbb{R}^{N \times N} is the identity matrix. The proximity operator prox_g(\bar{x}_l) is the unique solution of the
minimization of the function g in the neighborhood of \bar{x}_l. It acts as a simple denoising operator
(e.g., a sparsity regularization term will induce a thresholding operator). The use of the proximity
operator introduces flexibility in solving convex minimization problems, since no smoothness
assumptions are required. In the particular case when g is the indicator function \iota_{\mathcal{C}} of a
convex set \mathcal{C}, the proximity operator reduces to the projection onto \mathcal{C}. The indicator function of a
non-empty closed convex set \mathcal{C} \subset \mathbb{R}^N at a given point x_l \in \mathbb{R}^N is defined as

(\forall x_l) \quad \iota_{\mathcal{C}}(x_l) = \begin{cases} 0 & x_l \in \mathcal{C} \\ +\infty & x_l \notin \mathcal{C}. \end{cases} \qquad (3.21)
In this respect, proximal splitting methods can also solve constrained minimization problems by
resorting to the indicator function of the convex set dened by the constraint.
2See Appendix .1 for the denitions of proper and lower semi-continuous functions.
3Γ0(Rn) : class of proper, convex and lower semi-continuous functions from Rnto ]− ∞,+∞].
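For concreteness, the two facts above (a sparsity term inducing a thresholding operator, and an indicator function inducing a projection) can be sketched in a few lines of numpy. This is a minimal illustration with U = I_N; the function names are ours, not from the thesis:

```python
import numpy as np

def prox_l1(x, gamma):
    """Prox of gamma*||.||_1: component-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_indicator_nonneg(x):
    """Prox of the indicator of C = {x : x >= 0}: the projection onto C."""
    return np.maximum(x, 0.0)
```

Applied to x = (3, −0.2, 1) with gamma = 1, the first operator shrinks the large entries and zeroes the small one; the second simply clips negative entries to zero.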
Chapter 3: Sparse representation and convex optimization
One of the proximal splitting techniques that overcome the non-differentiability difficulty is the so-called forward-backward (FB) algorithm [33, 58]. It can be considered as a generalization of the projected gradient method. It consists in alternating between a gradient descent step applied to the differentiable function and a proximal step applied to the non-differentiable one. Considering the minimization problem (3.19) with two functions, one of them, g2, differentiable, and the other, g1, non-smooth, the forward-backward iteration is characterized by

    x_l^(t+1) = prox_{δ^(t) g1}( x_l^(t) − δ^(t) ∇g2(x_l^(t)) ),        (3.22)

with δ^(t) being the step size. The algorithm performs an explicit gradient (forward) step using the function g2, followed by an implicit (backward) step through the proximity operator of the non-differentiable function g1. This is similar to the major and minor cycles of CLEAN. On the one hand, when g1 = 0, the algorithm reduces to a gradient method. On the other hand, when g2 = 0, it reduces to a proximal point method. In the particular case when the function g1 is the ℓ1 norm, the proximity step becomes a soft-thresholding operation, and fast algorithms can be derived, such as the iterative shrinkage-thresholding algorithm (ISTA) [46, 51] and fast ISTA (FISTA) [9].
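For illustration, a minimal numpy sketch of the FB iteration (3.22) applied to the unconstrained problem minimize_x (1/2)∥y − Φx∥_2^2 + µ∥x∥_1, i.e., plain ISTA; all names are ours, and Φ is a small dense matrix here, not an RI measurement operator:

```python
import numpy as np

def ista(Phi, y, mu, n_iter=200):
    """FB iterations (3.22) for min_x 0.5||y - Phi x||_2^2 + mu*||x||_1 (ISTA)."""
    x = np.zeros(Phi.shape[1])
    delta = 1.0 / np.linalg.norm(Phi, 2) ** 2         # step size from the Lipschitz constant
    for _ in range(n_iter):
        z = x - delta * Phi.T @ (Phi @ x - y)         # forward (gradient) step on the smooth term
        x = np.sign(z) * np.maximum(np.abs(z) - delta * mu, 0.0)  # backward step: soft-thresholding
    return x
```

With Φ the identity, the iteration reduces to a single soft-thresholding of y, which makes the forward/backward interpretation easy to verify by hand.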
3.4.2 Primal-dual
The primal-dual (PD) approach [28, 37, 72, 93, 123] has an important advantage over proximal splitting methods: it achieves full splitting, i.e., all the terms defining the minimization problem, including the linear operators, are used independently. Thus, the solution of the original task corresponds to solving a sequence of simpler sub-problems. PD solves convex minimization problems of the form:

    minimize_{x_l ∈ R^N}  f(x_l) + g(L x_l) + h(x_l),        (3.23)

where f ∈ Γ0(R^N), g ∈ Γ0(R^M), L ∈ R^{M×N} is a linear operator, and h ∈ Γ0(R^N) is a differentiable function having a Lipschitzian gradient with Lipschitz constant β ∈ ]0, +∞[. The latter assumption means that the gradient ∇h of the differentiable function h satisfies

    (∀ (z1, z2) ∈ (R^N)^2)   ∥∇h(z1) − ∇h(z2)∥ ⩽ β ∥z1 − z2∥.        (3.24)
The problem can also be generalized to the case of multiple functions, similarly to (3.19). The minimization task (3.23) is referred to as the primal problem, and it is associated with the following dual problem:

    minimize_{v_l ∈ R^M}  (f* □ h*)(−L† v_l) + g*(v_l),        (3.25)

where f* □ h* is the inf-convolution⁴ of the conjugate functions f* and h*, L† ∈ R^{N×M} is the adjoint of the linear operator L, and g* is the conjugate function of g, defined as [8]

    g*(u) ≜ sup_{x} { x† u − g(x) },        (3.26)

for any u ∈ R^M. The Fenchel-Rockafellar duality theorem states that solving the dual problem provides a lower bound on the minimum value obtained by the primal one, hence simplifying the problem [8]. PD solves simultaneously for both the primal and dual problems to find a Kuhn-Tucker point (x̂_l, v̂_l) which satisfies

    −L† v̂_l − ∇h(x̂_l) ∈ ∂f(x̂_l),    L x̂_l ∈ ∂g*(v̂_l),        (3.27)

where ∂f (respectively ∂g*) is the subdifferential⁵ of the function f (respectively g*), and x̂_l and v̂_l are the primal and dual solutions, respectively.
For the work developed in this thesis, we resort to a preconditioned variant of the PD algorithm with forward-backward iterations, introduced in [93]. The details of the adopted algorithm, dubbed PDFB, are presented in Algorithm 1. Solving for the dual problem (3.25) and the primal problem (3.23) results in the dual update (Step 4) and the primal update (Step 6), respectively. These updates take the form of FB iterations. The dual update requires the proximity operator of the conjugate function g*, which can be easily derived from the proximity operator of the function g thanks to the Moreau decomposition [36, 78]:

    prox_{κg*}^{U^{-1}}(z) = z − κ U prox_{g/κ}^{U}(κ^{-1} U^{-1} z),        (3.28)

where U ∈ R^{M×M} is a preconditioning matrix and κ > 0. Algorithm 1 is guaranteed to converge to a global minimum of the primal problem (3.23) and the dual problem (3.25) if the following condition is satisfied [93]:

    1 − ∥Δ1^{1/2} L Δ2^{1/2}∥_S^2 > ∥Δ2∥_S β / 2,        (3.29)

where Δ1 and Δ2 are two general preconditioning matrices, L is a concatenation of all the operators used, and ∥·∥_S stands for the spectral norm.
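The Moreau decomposition (3.28) can be sanity-checked numerically in the simple metric U = I_M: for g = ∥·∥_1, the conjugate g* is the indicator of the ℓ∞ unit ball, so prox_{κg*} must reduce to the projection onto that ball, i.e., component-wise clipping to [−1, 1]. A minimal numpy sketch (function names are ours, purely illustrative):

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft-thresholding: prox of gamma*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_conj(x, prox_g, kappa=1.0):
    """Moreau decomposition (3.28) with U = I:
    prox_{kappa g*}(x) = x - kappa * prox_{g/kappa}(x / kappa)."""
    return x - kappa * prox_g(x / kappa, 1.0 / kappa)
```

For any κ > 0, the result coincides with clipping to [−1, 1], since scaling an indicator function leaves it unchanged.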
The flexibility and parallelization capabilities of PD make it a very good choice for solving wideband RI inverse problems, since all the dual variables can be updated in parallel. Compared to other convex optimization solvers using proximity operators adopted for RI imaging, such as the Douglas-Rachford splitting algorithm [26], the simultaneous direction method of multipliers (SDMM) [27] and the alternating direction method of multipliers (ADMM) [87], PD is more flexible and has further parallelization capabilities with limited overhead [87]. The bottleneck of SDMM is that, at each iteration, an expensive matrix inversion is necessary to update the solution; this can be prohibitively expensive for wideband RI imaging. ADMM and the Douglas-Rachford splitting algorithm are limited to only two functions in the minimization problem, and thus require sub-iterations, which is computationally very demanding when multiple functions are to be minimized. The PD algorithm, with its full splitting of the operators and functions, does not present these drawbacks [35].

Furthermore, the flexibility of PD enables incorporating additional information about the data, such as the density of the RI Fourier sampling, when enforcing data fidelity. This allows the algorithm to make larger steps towards the final solution, hence accelerating the overall convergence speed [88]. In addition to its flexibility and parallelization capabilities, PD allows for randomized updates of the dual variables [93], meaning that they can be updated less often than the primal variable. Such functionality lowers the computational cost per iteration, thus ensuring higher scalability of the algorithmic structure, at the expense of an increased number of iterations to achieve convergence.

Algorithm 1: Forward-backward primal-dual (PDFB)
Input: x_l^(0), v_l^(0)
Parameters: τ, κ
1:  t ← 0; x̃_l^(0) = x_l^(0)
2:  while stopping criterion not satisfied do
3:      Update the dual variable
4:      v_l^(t+1) = Δ1 (I_M − prox_g^{Δ1}) ( Δ1^{-1} v_l^(t) + L x̃_l^(t) )
5:      Update the primal variable
6:      x_l^(t+1) = prox_f^{Δ2^{-1}} ( x_l^(t) − Δ2 ( ∇h(x_l^(t)) + L† v_l^(t+1) ) )
7:      x̃_l^(t+1) = 2 x_l^(t+1) − x_l^(t)
8:      t ← t + 1
Result: x_l^(t), v_l^(t)

⁴ See Appendix .1 for the definition of the inf-convolution.
⁵ See Appendix .1 for the definition of the subdifferential of a convex function.
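To make the structure of Algorithm 1 concrete, the following stripped-down numpy sketch instantiates PDFB for a toy constrained problem, minimize ∥x∥_1 subject to ∥y − Φx∥_2 ⩽ ϵ, with h = 0, scalar step sizes in place of the preconditioning matrices Δ1 and Δ2, and no parallelization or randomization; all names are illustrative, and this is not the thesis implementation:

```python
import numpy as np

def proj_l2_ball(z, y, eps):
    """Projection onto the l2 ball centred in y with radius eps."""
    d = z - y
    n = np.linalg.norm(d)
    return z if n <= eps else y + (eps / n) * d

def pdfb(Phi, y, eps, n_iter=3000):
    """PDFB sketch for min_x ||x||_1 s.t. ||y - Phi x||_2 <= eps (h = 0)."""
    sigma = tau = 0.99 / np.linalg.norm(Phi, 2)   # ensures tau*sigma*||Phi||_S^2 < 1
    x = np.zeros(Phi.shape[1]); x_bar = x.copy(); v = np.zeros(Phi.shape[0])
    for _ in range(n_iter):
        # dual step: prox of the conjugate of the l2-ball indicator, via Moreau (3.28)
        z = v + sigma * (Phi @ x_bar)
        v = z - sigma * proj_l2_ball(z / sigma, y, eps)
        # primal step: prox of tau*||.||_1 is soft-thresholding
        x_new = x - tau * (Phi.T @ v)
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - tau, 0.0)
        x_bar = 2.0 * x_new - x                   # relaxation step (Step 7 of Algorithm 1)
        x = x_new
    return x
```

For Φ the identity, y = (5, 0) and ϵ = 0.1, the iterates approach the point of the ℓ2 ball closest to the origin in ℓ1 sense, namely (4.9, 0).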
3.4.3 Convex optimization for wideband RI imaging - revisited
In the context of wideband imaging, the aim is to jointly recover the spatial and spectral information of the radio emission. A straightforward approach is to image each channel separately, i.e., without exploiting any inter-channel information. On the one hand, this approach is highly parallelizable, and single-channel image recovery has been extensively studied in the literature [7, 10, 26, 27, 32, 38, 42-44, 57, 60, 64, 69, 70, 74, 84, 87, 88, 94, 109, 114, 126]. On the other hand, it remains sub-optimal for wideband imaging, since the correlation of the wideband data is not exploited. Moreover, the quality of the images recovered at the different channels is limited to their inherent resolution and sensitivity. In this regard, the SARA approach explained in Section 3.3.2.1 is taken as a benchmark for the algorithms developed in the following chapters. SARA is solved leveraging the PDFB algorithm explained in Section 3.4.2, using all the preconditioning and splitting functionalities for scalability [44, 88].
First applications of convex optimization methods imposing spatio-spectral sparsity priors on the wideband RI image cube have shown promising results in recovering synthetic data [6, 47, 54, 124]. Since the work developed in this thesis falls into this category, we revisit the convex optimization methods proposed for wideband RI imaging, described briefly in Section 2.5.3. In [124], the authors propose a convex unconstrained minimization problem promoting sparsity-by-synthesis of the wideband model cube. Based on the assumption that the spectrum is composed of a smooth continuous part with sparse local deviations, this method allows for recovering non-smooth features in the spectral domain. The minimization problem is defined as

    minimize_{Z ∈ R^{T×L}}  (1/2) ∥Φ Ψ Z − Y∥_F^2 + µ1 ∥Z∥_{∞,1} + µ2 ∥Z_p∥_1,        (3.30)

with Z ∈ R^{T×L} being a sparse decomposition of the original signal X ∈ R^{N×L} in a redundant dictionary Ψ ∈ R^{N×T}; X = Ψ Z. The matrix Y ∈ R^{M×L} represents the RI data cube, assuming all channels have the same number of visibilities. The dictionary Ψ consists of delta functions and smooth polynomials, namely the basis functions of Chebyshev polynomials. The ℓ1 norm imposes sparsity on the deviations from the smooth polynomials, denoted by Z_p. The second regularizer is the ℓ∞,1 norm, defined for a matrix Z as

    ∥Z∥_{∞,1} = Σ_{n=1}^{T} max_{1⩽l⩽L} |z_{n,l}|.        (3.31)

This prior promotes joint sparsity across the spectral domain, so that pixels are active or inactive across all the channels. The minimization problem is solved with the FISTA algorithm [9]. The main limitation of this approach is the assumption that all spectra can be formed as a linear combination of Chebyshev polynomials with some sparse deviations. Although this assumption is valid for broad classes of spectra, it might not be generic enough for the more complicated spectra observed with the new-generation telescopes.
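The ℓ∞,1 norm (3.31) is straightforward to compute, and a tiny numpy sketch (illustrative, names ours) makes its joint-sparsity effect visible: a row that is zero in every channel contributes nothing, so penalizing the norm drives entire rows to zero jointly:

```python
import numpy as np

def linf1_norm(Z):
    """l_{inf,1} norm (3.31): sum over the T rows of the max magnitude across the L channels."""
    return np.sum(np.max(np.abs(Z), axis=1))

# Row 2 is inactive in all channels and adds nothing to the penalty.
Z = np.array([[1.0, -3.0],
              [0.0,  0.0],
              [2.0,  0.5]])
```

Here linf1_norm(Z) = 3 + 0 + 2 = 5.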
The authors of [54] present a convex unconstrained minimization problem promoting sparsity-by-analysis of both the spatial and spectral information. The proposed minimization problem reads

    minimize_{X ∈ R^{N×L}}  (1/2) ∥X_d − H(X)∥_F^2 + (µ1/2) ∥X∥_F^2 + µ2 ∥Ψ† X∥_1 + µ3 ∥X W∥_1 + ι_{R+^{N×L}}(X).        (3.32)

The data-fidelity term connects the RI model cube X to the dirty image cube X_d. The operator H is a convolutional matrix containing the PSFs of all the channels. The matrix Ψ† is a spatial sparsifying dictionary containing the first eight Daubechies wavelet bases; it promotes average sparsity over multiple orthonormal wavelet bases. To promote smoothness in the spectral dimension, W implements a discrete cosine transform (DCT). ι_{R+^{N×L}} is the indicator function of the convex set R+^{N×L}, enforcing non-negativity of the wideband RI image cube. Finally, a quadratic regularization on the image cube is adopted through the Frobenius norm of X, given by

    ∥X∥_F = [ Σ_{l=1}^{L} Σ_{n=1}^{N} |x_{n,l}|^2 ]^{1/2}.        (3.33)

The minimization problem is solved using the ADMM algorithm [15]. The tuning of multiple arbitrary parameters (µ1, µ2, µ3), representing here the trade-off between the different priors, is usually problematic, since they are difficult to choose and influence the final reconstruction quality. In a later work [47], the authors discarded the smoothness prior on the image cube to reduce the number of free parameters to two. The new minimization problem is solved using the PD approach [37, 123], and spatial sparsity is promoted in an IUWT dictionary. Furthermore, [6] proposed an automatic procedure to tune the remaining two free parameters. The authors of [1] have shown the limited performance of the spatio-spectral sparsity priors proposed in [6, 47, 54]. They suggested the use of more sophisticated models, namely the low-rankness and joint average sparsity model, to leverage the correlation in wideband RI data and recover high-resolution, high-sensitivity image cubes. This model forms the essence of the work developed in this thesis and will be explained in detail in the next chapter.
3.5 Conclusions
In this chapter, we provided the mathematical background needed for the algorithms proposed in the next chapters. We started by posing the RI inverse problem as a minimization task and explored the world of sparsity and the typical sparsity priors. We then explained in detail convex optimization methods, as a powerful tool to solve convex minimization problems, with a particular emphasis on the primal-dual framework adopted in this thesis. Finally, we revisited the convex optimization methods adopted for wideband RI imaging.
Chapter 4

Wideband super-resolution imaging in radio interferometry (HyperSARA)
Contents
4.1 Motivation
4.2 HyperSARA: optimization problem
    4.2.1 Low-rankness and joint sparsity sky model
    4.2.2 HyperSARA minimization task
4.3 HyperSARA: algorithmic structure
    4.3.1 HyperSARA in a nutshell
    4.3.2 Underlying primal-dual forward-backward algorithm
    4.3.3 Adaptive ℓ2 bounds adjustment
    4.3.4 Weighting schemes
4.4 Simulations
    4.4.1 Simulations settings
    4.4.2 Benchmark algorithms
    4.4.3 Imaging quality assessment
    4.4.4 Imaging results
4.5 Application to real data
    4.5.1 Data and imaging details
    4.5.2 Imaging quality assessment
    4.5.3 Real imaging results
4.6 Conclusions
4.1 Motivation
Upcoming radio interferometers aim to image the sky at new levels of resolution and sensitivity, with wideband image cubes approaching the Petabyte scale for SKA. It is of paramount importance to design efficient imaging algorithms that meet the capabilities of such powerful instruments. On the one hand, appropriate algorithms need to inject complex prior image models to regularize the inverse problem of image formation from visibility data, which only provide incomplete Fourier sampling. On the other hand, these algorithms need to be highly parallelizable to scale with the sheer amount of data and the large size of the wideband image cubes to be recovered. In this respect, we propose a new approach, dubbed "HyperSARA", within the versatile framework of convex optimization to solve the wideband RI imaging problem. HyperSARA consists in solving a sequence of weighted minimization problems, promoting the joint average sparsity of the wideband model cube in an overcomplete dictionary (spanned by a collection of eight wavelet bases and the Dirac basis) via the ℓ2,1 norm, and its low-rankness via the nuclear norm. The resulting minimization task is solved using the primal-dual forward-backward (PDFB) algorithm (Section 3.4.2). The algorithmic structure offers highly interesting functionalities, such as preconditioning for accelerated convergence, and parallelization over the data blocks, enabling the computational cost and memory requirements to be spread across a multitude of processing CPU cores with limited resources, thereby allowing scalability to large data volumes. HyperSARA also involves an adaptive strategy to estimate the noise level with respect to calibration errors present in real data. We study the reconstruction performance of our approach on simulations and real VLA observations, in comparison with the single-channel SARA approach [44, 88] and the wideband deconvolution algorithm JC-CLEAN [85].

This chapter is structured as follows. Section 4.2 explains the low-rankness and joint sparsity priors on the wideband model cube and presents the HyperSARA minimization task. Intuitive and complete descriptions of the HyperSARA algorithmic structure are provided in Section 4.3. Analysis of the proposed approach and comparison with the benchmark methods on simulations are given in Section 4.4. Imaging results of VLA observations of Cyg A and the supernova remnant G055.7+3.4 are presented in Section 4.5. Finally, conclusions and perspectives are stated in Section 4.6.

This work has been published in [2-4, 44].
4.2 HyperSARA: optimization problem
4.2.1 Low-rankness and joint sparsity sky model
In the context of wideband RI image reconstruction, we adopt the linear mixture model originally proposed in [62]. It assumes that the wideband sky is a linear combination of few sources, each having a distinct spectral signature. Following this model, the wideband image cube reads

    X = S H†,        (4.1)

where the matrix S = (s1, ..., sQ) ∈ R^{N×Q} represents the physical sources present in the sky, and their corresponding spectral signatures constitute the columns of the mixing matrix H = (h1, ..., hQ) ∈ R^{L×Q}. Note that, in this model, physical sources with similar spectral behaviour are considered as one "source", defining one column of the matrix S. Recall that solving for S and H would explicitly imply a source separation problem, which is a non-linear, non-convex problem [66]. Instead, we leverage convex optimization by solving directly for X with appropriate priors. The linear mixture model implies low-rankness of X, as the rank is upper bounded by the number of "sources". It also implies joint sparsity over the spectral channels: when all the sources are inactive at one pixel position, and regardless of their spectral indices, the full corresponding row of the matrix X will automatically be equal to zero.

Given the nature of the RI Fourier sampling, where the uv-coverage dilates with frequency, the combination of low-rankness and joint sparsity results in higher resolution and higher dynamic range of the reconstructed RI image cube. On the one hand, enforcing low-rankness implies correlation across the channels; this enhances the recovery of extended emission at the high-frequency channels and captures the high-spatial-frequency content at the low-frequency channels. On the other hand, promoting joint sparsity results in the rejection of isolated pixels associated with uncorrelated noise, since low-energy rows of X are fully set to zero. Consequently, the overall dynamic range of the reconstructed cube is increased.

The linear mixture model is similar to the one adopted in [96]; the sources can be seen as the Taylor coefficient images, and the spectral signatures can be seen as the spectral basis functions. However, the Taylor expansion model is an approximation of a smooth function, hence only smooth spectra can be reconstructed. Moreover, the order of the Taylor series has to be set in advance. This parameter is crucial, as it represents a trade-off between accuracy and computational cost. The linear mixture model adopted here is more generic, since it does not assume a specific model of the spectra, thus allowing for the reconstruction of complex spectral structures (e.g., emission or absorption lines superimposed on a continuous spectrum). Moreover, there is no hard constraint on the number of "sources", and the prior adjusts the number of spectra needed to satisfy the data constraint.
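The two implications of the linear mixture model (4.1), low-rankness and joint sparsity, can be verified on a synthetic cube; the following numpy sketch is purely illustrative (dimensions and sparsity level are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, Q = 100, 10, 3                        # pixels, channels, number of "sources"
S = np.abs(rng.standard_normal((N, Q)))     # source images (columns of S)
S[rng.random((N, Q)) < 0.8] = 0.0           # make the sources spatially sparse
H = np.abs(rng.standard_normal((L, Q)))     # spectral signatures (columns of H)
X = S @ H.T                                 # wideband cube, model (4.1), real-valued H

# Low-rankness: the rank of X is upper bounded by the number of "sources" Q.
assert np.linalg.matrix_rank(X) <= Q
# Joint sparsity: pixels inactive in all sources give fully zero rows of X.
inactive = np.all(S == 0.0, axis=1)
assert np.all(X[inactive] == 0.0)
```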
4.2.2 HyperSARA minimization task
To enforce low-rankness and joint average sparsity of the RI image cube, we propose the following minimization problem, leveraging log-sum priors:

    minimize_{X = (x_l)_{1⩽l⩽L} ∈ R^{N×L}}  µ Σ_{j=1}^{J} log( |σ_j(X)| + υ ) + µ̄ Σ_{n=1}^{T} log( ∥[Ψ† X]_n∥_2 + υ )
    subject to  ∥y_{l,b} − Φ_{l,b} x_l∥_2 ⩽ ϵ_{l,b},  ∀(l, b) ∈ {1, ..., L} × {1, ..., B}
                X ∈ R+^{N×L},        (4.2)

where (µ, µ̄, υ) ∈ ]0, +∞[^3 are regularization parameters, J ⩽ min{N, L} is the rank of X, (σ_j(X))_{1⩽j⩽J} are the singular values of X, and [Ψ† X]_n denotes the n-th row of Ψ† X. ∥y_{l,b} − Φ_{l,b} x_l∥_2 ⩽ ϵ_{l,b} is the data-fidelity constraint on the b-th data block in channel l, and X ∈ R+^{N×L} is the non-negativity constraint.
The minimization problem (4.2) is non-convex. To solve it, we leverage a majorization-minimization approach similar to the one described for SARA in Section 3.3.2.1. More precisely, it consists in successively solving convex optimization problems with weighted ℓ1 norms [20]. At each iteration k ∈ N, the problem (4.2) is locally approximated at X^(k) by the convex optimization problem

    minimize_{X = (x_l)_{1⩽l⩽L} ∈ R^{N×L}}  µ ∥X∥_{∗,ω(X^(k))} + µ̄ ∥Ψ† X∥_{2,1,ω(X^(k))}
    subject to  ∥y_{l,b} − Φ_{l,b} x_l∥_2 ⩽ ϵ_{l,b},  ∀(l, b) ∈ {1, ..., L} × {1, ..., B}
                X ∈ R+^{N×L},        (4.3)

where ∥·∥_{∗,ω(X^(k))} is the weighted nuclear norm, promoting low-rankness, defined as

    ∥X∥_{∗,ω(X^(k))} = Σ_{j=1}^{J} ω_j(X^(k)) σ_j(X),        (4.4)

with weights ω(X^(k)) = (ω_j(X^(k)))_{1⩽j⩽J} given by

    ω_j(X^(k)) = ( σ_j(X^(k)) + υ )^{-1}.        (4.5)

The notation ∥·∥_{2,1,ω(X^(k))} denotes the weighted ℓ2,1 norm, promoting joint average sparsity, defined as

    ∥Ψ† X∥_{2,1,ω(X^(k))} = Σ_{n=1}^{T} ω_n(X^(k)) ∥[Ψ† X]_n∥_2,        (4.6)
with the associated weights ω(X^(k)) = (ω_n(X^(k)))_{1⩽n⩽T} given by

    ω_n(X^(k)) = ( ∥[Ψ† X^(k)]_n∥_2 + υ )^{-1}.        (4.7)
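The weights (4.5) and (4.7) can be computed directly from the current estimate X^(k); a minimal numpy sketch, where the function names and the value of υ are illustrative choices of ours:

```python
import numpy as np

def nuclear_weights(X, upsilon=1e-3):
    """Weights (4.5): omega_j = (sigma_j(X) + upsilon)^{-1}."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return 1.0 / (sigma + upsilon)

def row_weights(PsiT_X, upsilon=1e-3):
    """Weights (4.7): omega_n = (||[Psi^T X]_n||_2 + upsilon)^{-1}."""
    return 1.0 / (np.linalg.norm(PsiT_X, axis=1) + upsilon)
```

Small singular values and low-energy rows receive large weights, and are thus penalized strongly in the next weighted problem of the form (4.3).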
Data fidelity: Data fidelity is enforced in a distributed manner by splitting the data and the measurement operator into multiple blocks, where y_{l,b} ∈ C^{M_{l,b}} is the b-th data block in channel l and Φ_{l,b} is the associated measurement operator; Φ_{l,b} = Θ_{l,b} G_{l,b} M_{l,b} F Z. Since G_{l,b} ∈ C^{M_{l,b} × o·N_{l,b}} consists of compact-support kernels, the matrix M_{l,b} ∈ R^{o·N_{l,b} × o·N} selects only the parts of the discrete Fourier plane involved in computations for the data block y_{l,b}, masking everything else. ϵ_{l,b} is an upper bound on the ℓ2 norm of the noise vector n_{l,b} ∈ C^{M_{l,b}}. The inter-channel blocking is motivated by the fact that RI data probed at various wavelengths might have different noise levels and modelling errors. Moreover, data splitting can be inevitable in the case of extreme sampling rates, beyond the available memory. On the other hand, intra-channel blocking is motivated for real data, since they usually present calibration errors in addition to the thermal noise. Regarding the estimation of the noise levels, the thermal noise associated with the visibilities is usually given. If all visibilities within one data block have the same noise variance, the noise norm follows a χ2 distribution, and the bound ϵ_{l,b} can be computed from the χ2 statistics [87]. This estimation is only valid for perfectly calibrated RI data and is used with synthetic data (see Section 4.4.1). However, in high-sensitivity acquisition regimes, RI data may present significant errors, originating from DDE modelling errors, which tend to dominate the thermal noise. In this setting, we propose in Section 4.3.3 an adaptive strategy to estimate the noise levels during image reconstruction.
Low-rankness: The nuclear norm, defined for a matrix X as the sum of its singular values, is a relevant prior to impose low-rankness [62]. However, the ultimate goal is to minimize the rank of the estimated cube, i.e., to penalize the vector of the singular values in the ℓ0 sense. Therefore, we adopt in our minimization problem (4.2) the log-sum prior of the singular values of X as a better approximation of low-rankness. The log-sum prior is minimized through a reweighted ℓ1 procedure, with the weights ω(X^(k)) = (ω_j(X^(k)))_{1⩽j⩽J} updated iteratively so that, ultimately, large weights are applied to the low-magnitude singular values and small weights are attributed to the large-magnitude singular values. By doing so, the former singular values are strongly penalized, leaving only a minimum number of non-zero singular values and ensuring low-rankness in the ℓ0 sense.
Joint average sparsity: The ℓ2,1 norm, defined as the ℓ1 norm of the vector whose components are the ℓ2 norms of the rows of X, has been shown to be a good prior to impose joint sparsity on the estimated cube [62]. Penalizing the ℓ2,1 norm promotes joint sparsity, since low-energy rows of X are fully set to zero. Ideally, one aims to minimize the number of coefficients that are jointly non-zero in all the channels of the estimated cube, by penalizing the vector of the ℓ2 norms of the rows in the ℓ0 sense. Thus, we adopt in the proposed minimization problem (4.2) the log-sum prior of the ℓ2 norms of the rows of X as a better penalty function for joint sparsity. The log-sum prior is minimized through a reweighted ℓ1 procedure, with the weights ω(X^(k)) = (ω_n(X^(k)))_{1⩽n⩽T} updated iteratively, ensuring that, after several reweights, rows with significant energy in the ℓ2 sense are associated with small weights, while rows with low ℓ2 norm (typically corresponding to channel-decorrelated noise) are associated with large weights, hence largely penalized, leaving only a minimum number of non-zero rows. By doing so, we promote joint sparsity in the ℓ0 sense. The considered average sparsity dictionary Ψ† ∈ R^{T×N} is the celebrated SARA dictionary, a concatenation of the Dirac basis and the first eight Daubechies wavelet dictionaries: Ψ† = (Ψ1, ..., ΨD)† [2, 4, 26, 27, 44, 87, 88, 94].
The regularization parameters: Adopting a statistical point of view, the ℓ1 norm of a random variable x ∈ R^N, µ∥x∥1, can be seen as the negative log of a Laplace prior with scale parameter 1/µ. This scale parameter (equivalently, the regularization parameter µ) can be estimated via the maximum likelihood estimator of the Laplace distribution, the scale being ∥x∥1/N. We recall that the nuclear norm is the ℓ1 norm of the vector of the singular values of X, and the ℓ2,1 norm is the ℓ1 norm of the vector of the ℓ2 norms of the rows of X. From this perspective, one can estimate the regularization parameters associated with the nuclear norm and the ℓ2,1 norm in the same fashion, as µ = N/∥X∥∗ and µ̄ = N/∥Ψ†X∥2,1, respectively. A convenient choice is to set the parameter µ = 1. Consequently, µ̄ can be set as the ratio between the estimated regularization parameters, that is, µ̄ = ∥X∥∗/∥Ψ†X∥2,1. This ratio has been shown to give the best results on extensive sets of different simulations. Moreover, we found that µ̄ = ∥X_dirty∥∗/∥Ψ†X_dirty∥2,1, estimated directly from the dirty RI image cube, is a good approximation.
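Under these estimates, µ̄ can be computed directly from the dirty cube; a minimal numpy sketch (names are ours, and Ψ† is passed as an explicit matrix for simplicity, rather than as the operator used in practice):

```python
import numpy as np

def estimate_mu_bar(X_dirty, Psi_T):
    """mu_bar = ||X_dirty||_* / ||Psi^T X_dirty||_{2,1}, with mu set to 1."""
    nuclear = np.linalg.svd(X_dirty, compute_uv=False).sum()   # nuclear norm
    l21 = np.linalg.norm(Psi_T @ X_dirty, axis=1).sum()        # l_{2,1} norm
    return nuclear / l21
```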
SARA vs HyperSARA: The proposed HyperSARA approach is the wideband version of the SARA approach described in Section 3.3.2.1. On the one hand, SARA solves a sequence of weighted ℓ1 minimization problems, promoting average sparsity-by-analysis of the sky estimate in Ψ†. On the other hand, HyperSARA solves a sequence of weighted nuclear-norm and ℓ2,1 minimization tasks of the form (4.3), promoting low-rankness and joint average sparsity-by-analysis of the wideband sky estimate in Ψ†.
4.3 HyperSARA: algorithmic structure
To solve the HyperSARA minimization problem (4.3), we leverage the PDFB algorithm explained in Section 3.4.2. The data-fidelity constraints can be imposed by means of the indicator function ι_C of a convex set C (3.21). Doing so, the minimization problem (4.3) can be equivalently redefined as

    minimize_{X = (x_l)_{1⩽l⩽L} ∈ R+^{N×L}}  µ ∥X∥_{∗,ω(X^(k))} + µ̄ ∥Ψ† X∥_{2,1,ω(X^(k))} + Σ_{l=1}^{L} Σ_{b=1}^{B} ι_{B(y_{l,b}, ϵ_{l,b})}(Φ_{l,b} x_l),        (4.8)

where

    B(y_{l,b}, ϵ_{l,b}) = { Φ_{l,b} x_l ∈ C^{M_{l,b}} : ∥y_{l,b} − Φ_{l,b} x_l∥_2 ⩽ ϵ_{l,b} }        (4.9)

denotes the ℓ2 ball centred in y_{l,b} of radius ϵ_{l,b} > 0, where ϵ_{l,b} reflects the noise statistics. The notation ι_{B(y_{l,b}, ϵ_{l,b})} denotes the indicator function of the ℓ2 ball B(y_{l,b}, ϵ_{l,b}).
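The proximity operator of ι_{B(y_{l,b}, ϵ_{l,b})} is the projection onto the ℓ2 ball (4.9), which admits a closed form in the Euclidean metric. The sketch below is illustrative only; the preconditioned projections actually used in Algorithm 3 have no closed form and are computed iteratively, as discussed in Section 4.3.2:

```python
import numpy as np

def proj_l2_ball(z, y, eps):
    """Euclidean projection onto B(y, eps) = {u : ||y - u||_2 <= eps} of (4.9)."""
    d = z - y
    n = np.linalg.norm(d)
    if n <= eps:
        return z                   # already inside the ball
    return y + (eps / n) * d       # rescale the residual onto the boundary
```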
4.3.1 HyperSARA in a nutshell
The HyperSARA approach consists in solving a sequence of weighted minimization problems of the form (4.8), to achieve low-rankness and joint average sparsity of the estimated wideband model cube in the ℓ0 sense. Each of these minimization problems is solved using the adaptive PDFB algorithm, a further development of PDFB that enables imaging real data in the presence of calibration errors.

In Figure 4.1, we display the schematic diagram of the adaptive PDFB, and we summarize its computation flow in what follows. At each iteration t, the master CPU core distributes the current estimate of the wideband image cube and its Fourier coefficients to the processing CPU cores. The former is distributed to the CPU cores associated with the priors (low-rankness and joint average sparsity), whereas the latter are distributed to the CPU cores associated with the data-fidelity constraints. The updates from all the CPU cores are then gathered at the master CPU core to update the estimate of the image cube. In essence, all the updates consist in a forward step (a gradient step) followed by a backward step (a proximal step), which can be interpreted as CLEAN-like iterations. Thus, the overall algorithmic structure intuitively takes the form of an interlaced and parallel multi-space version of CLEAN [87].

At convergence of the adaptive PDFB, the weights involved in the priors are updated from the estimated image cube, following (4.5) and (4.7). The minimization problem of the form (4.8) is thus redefined and solved using the adaptive PDFB. The overall HyperSARA method is summarized in Algorithm 2: an outer loop updates the weights, and an inner loop solves the respective weighted minimization task using the adaptive PDFB.
4.3.2 Underlying primal-dual forward-backward algorithm
The details of the adaptive PDFB algorithm are presented in Algorithm 3. Note that the steps coloured in red represent the adaptive strategy to adjust the ℓ2 bounds on the data-fidelity terms, adopted for real data and explained in Section 4.3.3. The algorithmic structure consists of iterative updates of the dual and primal variables via forward-backward steps. The dual variables P, (W_d)_{1⩽d⩽D} and (v_{l,b})_{1⩽l⩽L, 1⩽b⩽B}, associated with the low-rankness prior, the joint average sparsity prior and the data-fidelity terms, respectively, are updated in parallel in Steps 9, 12 and 16, to be used later in the update of the primal variable, that is, the estimate of the RI image cube, in Steps 28 and 29. The exact expressions of all the proximity operators are provided in Appendix .2. The proximity operators of the functions enforcing fidelity to the data blocks read as projections onto the ℓ2 balls with respect to the preconditioning matrices U_{l,b}. These are built from the density of the Fourier sampling, as proposed in [88]. More precisely, each matrix U_{l,b}, associated with a data block y_{l,b} ∈ C^{M_{l,b}}, is set to be diagonal. Its diagonal elements are strictly positive, and set to be inversely proportional to the density of the sampling in the vicinity of the probed Fourier modes. Note that when the Fourier sampling is uniform, the operators U_{l,b} reduce to the identity matrix. However, this is not the case in radio interferometry; in fact, low Fourier modes tend to be highly sampled, as opposed to high Fourier modes. Given this discrepancy of the Fourier sampling, the operators U_{l,b} depart from the identity matrix. Incorporating such information on the RI data has proved efficient in accelerating the convergence of the algorithm [88]. It is worth noting that the projections onto the ℓ2 balls with respect to the preconditioning matrices U_{l,b} do not have a closed form. Instead, they can be estimated numerically with an iterative algorithm; in this work, we resort to FISTA [9].
Following (3.29), the convergence of Algorithm 3 is defined by the following condition:
$$\|\boldsymbol{\Delta}_1^{1/2}\,\mathbf{L}\,\boldsymbol{\Delta}_2^{1/2}\|_S^2 < 1, \qquad (4.10)$$
with β = 0 since the minimization problem (4.8) has no differentiable functions. The operator L is a concatenation of all the used operators. In our case, L is a concatenation of the identity operator I_N for the nuclear norm, Ψ† for the ℓ2,1 norm and Φ for the ℓ2 balls associated with the data-fidelity terms. That being said, Algorithm 3 is guaranteed to converge to the global minimum of the minimization problem (4.8) for a proper setting of the configuration parameters. By choosing diagonal preconditioning matrices Δ1 and Δ2 with the configuration parameters (κ_i)_{1⩽i⩽3} and τ on the adequate diagonal locations, we can write the convergence condition for Algorithm 3 as
$$\left\| \begin{bmatrix} \kappa_1 \mathbf{I}_N & 0 & 0 \\ 0 & \kappa_2 \mathbf{I}_T & 0 \\ 0 & 0 & \kappa_3 \mathbf{U} \end{bmatrix}^{1/2} \begin{bmatrix} \mathbf{I}_N \\ \boldsymbol{\Psi}^\dagger \\ \boldsymbol{\Phi} \end{bmatrix} \big[\tau\,\mathbf{I}_N\big]^{1/2} \right\|_S^2 \leqslant \tau\left(\kappa_1 + \kappa_2\|\boldsymbol{\Psi}^\dagger\|_S^2 + \kappa_3\|\mathbf{U}^{1/2}\boldsymbol{\Phi}\|_S^2\right) < 1, \qquad (4.11)$$
where for every X ∈ R^{N×L}, U^{1/2}Φ(X) = (U_{l,b}^{1/2} Φ_{l,b} x_l)_{1⩽l⩽L, 1⩽b⩽B}. A convenient choice of (κ_i)_{1⩽i⩽3} is
$$\kappa_1 = 1, \quad \kappa_2 = \frac{1}{\|\boldsymbol{\Psi}^\dagger\|_S^2} \quad \text{and} \quad \kappa_3 = \frac{1}{\|\mathbf{U}^{1/2}\boldsymbol{\Phi}\|_S^2}. \qquad (4.12)$$
In this setting, convergence is guaranteed for all 0 < τ < 1/3.
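In practice, the operator norms entering (4.11) and (4.12) are typically estimated with the power method, since Ψ† and U^{1/2}Φ are only available as function handles. A minimal sketch (the routine name and the toy matrix operator are illustrative, not the thesis code):

```python
import numpy as np

def op_norm2(forward, adjoint, shape, n_iter=50, seed=0):
    """Squared spectral norm ||A||_S^2 of a linear operator given as a
    (forward, adjoint) pair, estimated by the power method on A^T A."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=shape)
    x /= np.linalg.norm(x)
    val = 0.0
    for _ in range(n_iter):
        x = adjoint(forward(x))
        val = np.linalg.norm(x)   # converges to the largest eigenvalue of A^T A
        x /= val
    return val

# Toy operator (a matrix); in HyperSARA the same routine would be applied to
# Psi^dagger and to U^{1/2} Phi. Step sizes then follow (4.12).
M = np.array([[3.0, 0.0], [0.0, 1.0]])
norm2 = op_norm2(lambda x: M @ x, lambda x: M.T @ x, (2,))
kappa = 1.0 / norm2   # e.g. kappa_2 or kappa_3 in (4.12)
tau = 0.33            # any tau in (0, 1/3) guarantees convergence
```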
It is worth noting that PDFB allows for randomized updates of the dual variables [93], meaning that they can be updated less often than the primal variable. Such functionality lowers the computational cost per iteration at the expense of an increased number of iterations to achieve convergence (see Appendix .4 for further details on the randomized PDFB algorithm). Note that randomization of the updates in Algorithm 3 is not considered, since it does not affect the reconstruction quality, but only the speed of convergence.
4.3.3 Adaptive ℓ2 bounds adjustment
In high sensitivity acquisition regimes, calibrated RI data may present significant errors, originating from DDE modelling errors, which tend to dominate the thermal noise and consequently limit the dynamic range of the recovered images. In this setting, the ℓ2 bounds defining the data-fidelity terms in the minimization task (4.8) are unknown, hence need to be estimated. [44] have proposed an adaptive strategy to adjust the ℓ2 bounds during image reconstruction by taking into account the variability of the DDE errors through time, which we adopt herein. The main idea consists in assuming the noise statistics to be piece-wise constant through time. Thus, a data-splitting strategy based on the acquisition time is adopted, and the associated ℓ2 bounds are adjusted independently in the PDFB algorithm. The adaptive procedure is described in Steps 19-25 of Algorithm 3. It can be summarized as follows. Starting from an under-estimated value ε_{l,b}^{(0)}, obtained by performing imaging with the non-negative least-squares (NNLS) approach, each ℓ2 bound ε_{l,b}^{(t+1)} is updated as a weighted mean of the current estimate ε_{l,b}^{(t)} and the ℓ2 norm of the associated data-block residual ∥y_{l,b} − Φ_{l,b} x̃_l^{(t)}∥_2. This update is performed only when the relative distance between the former and the latter saturates above a certain bound λ2 set by the user. Note that, conceptually, each update of the ℓ2 bounds redefines the minimization problem set in (4.8). Thus, to ensure convergence of the minimization problem before updating the ℓ2 bounds, two convergence conditions should be met. These are the saturation of the image cube estimate, reflected by β^{(t+1)} = ∥X^{(t+1)} − X^{(t)}∥_F / ∥X^{(t+1)}∥_F being below a low value λ1 set by the user, and a minimum number of iterations performed between consecutive updates. Note that the image obtained with the NNLS approach tends to over-fit the noisy data, since only non-negativity is imposed in the minimization problem. Therefore, the bounds ε_{l,b}^{(0)} are usually under-estimated. As a rule of thumb, one can initialize the bounds ε_{l,b} a few orders of magnitude lower than the ℓ2 norms of the associated data blocks, depending on the expected noise and calibration error levels. An overview of the variables and parameters associated with the adaptive strategy is provided in Appendix .3 (see [44] for more details).
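The bound-update rule of Steps 19-25 can be written compactly as a standalone function; a sketch under the notation above (the function name and argument order are illustrative):

```python
def update_l2_bound(eps, resid_norm, beta, t, t_last, lam1, lam2, lam3, t_min):
    """One adaptive update of a block's l2 bound (Steps 19-25 of Algorithm 3).
    eps        : current bound epsilon_{l,b}^{(t)}
    resid_norm : rho_{l,b}^{(t)} = ||y_{l,b} - Phi_{l,b} x_l^{(t)}||_2
    beta       : relative variation of the image cube estimate
    t, t_last  : current iteration and iteration of the last update
    Returns the new bound and the (possibly updated) last-update iteration."""
    saturated = beta < lam1 and (t - t_last) > t_min
    if saturated and (resid_norm - eps) / eps > lam2:
        # weighted mean of the current bound and the residual norm
        return lam3 * resid_norm + (1 - lam3) * eps, t
    return eps, t_last
```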
Algorithm 2: HyperSARA approach
Given X^(0), P^(0), W^(0), v^(0), ε^(0), ϑ^(0); θ^(0) = 1_J; θ̄^(0) = 1_T
For k = 1, ... (outer loop)
    (X^(k+1), P^(k+1), W^(k+1), v^(k+1), ε^(k+1), ϑ^(k+1)) =
        AdaptivePDFB(X^(k), P^(k), W^(k), v^(k), ε^(k), ϑ^(k), θ^(k), θ̄^(k))  (inner loop)
    θ^(k+1) = υ ω(X^(k+1)) using (4.5)
    θ̄^(k+1) = υ ω̄(X^(k+1)) using (4.7)
Until convergence
Output: X^(k), P^(k), W^(k), v^(k), ε^(k), ϑ^(k)
4.3.4 Weighting schemes
The reweighting procedure represents the outer loop of Algorithm 2. At each reweight, indexed by k ∈ N, the HyperSARA minimization task with log-sum priors (4.2) is locally approximated by the weighted nuclear and ℓ2,1 minimization problem (4.3), with weights defined in (4.5) and (4.7). The resulting minimization problem (4.3) (equivalently (4.8)) is solved using the adaptive PDFB described in Algorithm 3. The weights θ^(k+1) and θ̄^(k+1) are updated using (4.5) and (4.7) applied to the solution X^(k+1). Note that the weights defined in (4.5) and (4.7) are multiplied by the regularization parameter υ in Algorithm 2. This does not affect the set of minimizers of the global problem (4.2). The parameter υ is initialized to 1 and decreased at each reweighting step by a fixed factor, which is typically chosen between 0.5 and 0.9. This has been shown to improve the convergence rate and the scalability of the algorithm [26, 87]. Starting from weights equal to 1, i.e., θ^(0) = 1_J and θ̄^(0) = 1_T, where 1_J stands for the vector of size J with all coefficients equal to 1, the approach ensures that after several ℓ2,1 norm reweights, coefficients with significant spectrum energy in the ℓ2 sense are down-weighted, whilst other coefficients, typically corresponding to noise, remain highly penalized as their corresponding weights stay close to 1. This ensures a higher dynamic range of the reconstructed RI image cube. Similarly, after several nuclear norm reweights, negligible singular values are more penalized as they are accompanied by large weights. This guarantees more low-rankness and higher correlation across the channels, thus increasing the overall resolution of the estimated image cube.
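The exact weight definitions (4.5) and (4.7) are not reproduced in this section; assuming the standard log-sum form δ/(δ + ·), applied to singular values for the nuclear norm and to row-wise ℓ2 norms in the wavelet domain for the ℓ2,1 norm, one outer-loop reweight can be sketched as follows (δ, the dense-SVD shortcut, and the function names are illustrative assumptions):

```python
import numpy as np

def logsum_weights(values, delta):
    """Generic log-sum reweighting: weights close to 0 for significant values
    (less penalized) and close to 1 for negligible ones (kept penalized)."""
    return delta / (delta + np.abs(values))

def reweight(X, psi_t, delta, upsilon):
    """One outer-loop update of the weights, scaled by upsilon as in Algorithm 2.
    psi_t : handle computing the wavelet coefficients Psi^dagger X."""
    sing_vals = np.linalg.svd(X, compute_uv=False)          # nuclear-norm side
    theta = upsilon * logsum_weights(sing_vals, delta)      # cf. (4.5)
    row_norms = np.linalg.norm(psi_t(X), axis=1)            # l2,1-norm side
    theta_bar = upsilon * logsum_weights(row_norms, delta)  # cf. (4.7)
    return theta, theta_bar
```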
[Figure 4.1 schematic: a master core holds the image-space update X^(t) → X^(t+1); a low-rankness core, joint average sparsity cores (indexed by d) and data fidelity cores (indexed by l, b) each perform one forward-backward (FB) step on their dual variable in parallel, weighted by −τ, −τ/∥Ψ†∥²_S and −τ/∥U^{1/2}Φ∥²_S in the primal update.]
Figure 4.1: Schematic diagram at iteration t in the adaptive PDFB, detailed in Algorithm 3. It showcases the parallelism capabilities and overall computation flow. Intuitively, each forward-backward step in data, prior and image space can be viewed as a CLEAN-like iteration. The overall algorithmic structure then intuitively takes the form of an interlaced and parallel multi-space version of CLEAN.
Algorithm 3: The adaptive PDFB algorithm underpinning HyperSARA
Data: (y_{l,b})_{l,b}, l ∈ {1, ..., L}, b ∈ {1, ..., B}
Input: X^(0), P^(0), (W_d^(0))_{1⩽d⩽D}, (v_{l,b}^(0))_{l,b}, θ^(0), θ̄^(0), (ε_{l,b}^(0))_{l,b}, (ϑ_{l,b}^(0))_{l,b}
Parameters: (U_{l,b})_{l,b}, µ, µ̄, τ, (κ_i)_{1⩽i⩽3}, λ1, λ2, λ3, ϑ̄
1:  t ← 0; X̃^(0) = X^(0)
2:  while stopping criterion not satisfied do
3:    for l = 1 to L do
4:      x̂_l^(t) = F Z x̃_l                                  // Fourier transforms
5:      for b = 1 to B do
6:        x̂_{l,b}^(t) = M_{l,b} x̂_l^(t)                     // send to data cores
7:    Update dual variables simultaneously:
8:    // Promote low-rankness
9:    P^(t+1) = (I_J − prox_{κ1⁻¹ µ ∥·∥_{∗,θ}})(P^(t) + X̃^(t))
10:   // Promote joint average sparsity
11:   for d = 1 to D do
12:     W_d^(t+1) = (I_N − prox_{κ2⁻¹ µ̄ ∥·∥_{2,1,θ̄}})(W_d^(t) + Ψ_d† X̃^(t))
13:     W̃_d^(t+1) = Ψ_d W_d^(t+1)
14:   // Enforce data fidelity
15:   for (l, b) = (1, 1) to (L, B) do
16:     v_{l,b}^(t+1) = U_{l,b} (I_{M_{l,b}} − prox^{U_{l,b}}_{B2(y_{l,b}, ε_{l,b}^(t))})(U_{l,b}⁻¹ v_{l,b}^(t) + Θ_{l,b} G_{l,b} x̂_{l,b}^(t))
17:     ṽ_{l,b}^(t+1) = G_{l,b}† Θ_{l,b}† v_{l,b}^(t+1)
18:     // Adjust the ℓ2 bounds
19:     ρ_{l,b}^(t) = ∥y_{l,b} − Φ_{l,b} x̃_l^(t)∥_2
20:     if β^(t) < λ1 and t − ϑ_{l,b}^(t) > ϑ̄ and (ρ_{l,b}^(t) − ε_{l,b}^(t)) / ε_{l,b}^(t) > λ2 then
21:       ε_{l,b}^(t+1) = λ3 ρ_{l,b}^(t) + (1 − λ3) ε_{l,b}^(t)
22:       ϑ_{l,b}^(t+1) = t
23:     else
24:       ε_{l,b}^(t+1) = ε_{l,b}^(t)
25:       ϑ_{l,b}^(t+1) = ϑ_{l,b}^(t)
26:   // Update primal variable
27:   for l = 1 to L do
28:     h_l^(t+1) = κ1 p_l^(t+1) + κ2 Σ_{d=1}^{D} w̃_{d,l}^(t+1) + κ3 Z† F† Σ_{b=1}^{B} M_{l,b}† ṽ_{l,b}^(t+1)
29:   X^(t+1) = P_{R+^{N×L}}(X^(t) − τ H^(t+1))
30:   X̃^(t+1) = 2 X^(t+1) − X^(t)
31:   β^(t+1) = ∥X^(t+1) − X^(t)∥_F / ∥X^(t+1)∥_F
32:   t ← t + 1
Result: X^(t), P^(t), W^(t), v^(t), ε^(t), ϑ^(t)
4.4 Simulations
In this section, we first investigate the performance of the low-rankness and joint average sparsity priors on realistic simulations of wideband RI data. We then assess the efficiency of our approach HyperSARA in comparison with the wideband JC-CLEAN algorithm [85] and the single-channel imaging approach SARA [26, 88]. Note that, in this setting, the ℓ2 bounds on the data-fidelity terms are derived directly from the known noise statistics, thus fixed.
4.4.1 Simulations settings
To simulate wideband RI data, we utilize an image of the W28 supernova remnant¹, denoted by x_1, that is of size N = 256 × 256, with a peak value normalized to 1. The image x_1 is decomposed into Q = 10 sources, i.e., x_1 = Σ_{q=1}^{Q} s_q, with (s_q ∈ R^N)_{1⩽q⩽Q}. These consist of 9 different sources, whose brightness is in the interval [0.005, 1], and the background. Note that the different sources may have overlapping pixels. The wideband image cube, denoted by X, is built following the linear mixture model described in (4.1). The sources (s_q)_{1⩽q⩽Q} constitute the columns of S. The sources' spectra, defining the columns of the mixing matrix H, consist of emission lines superimposed on continuous spectra. These follow the curvature model:
$$\mathbf{h}_q = \left( \Big(\frac{\nu_l}{\nu_1}\Big)^{-\alpha_q + \beta_q \log(\nu_l/\nu_1)} \right)_{1\leqslant l\leqslant L}, \quad 1\leqslant q\leqslant Q,$$
where α_q and β_q are the respective spectral index and curvature parameters associated with the source s_q. Emission lines at different positions and with different amplitudes are then added to the continuous spectra. Wideband image cubes are generated within the frequency range [ν_1, ν_L] = [1.4, 2.78] GHz, with uniformly sampled channels. Tests are carried out on two image cubes with a total number of channels L ∈ {15, 60}. Note that the rank of the considered image cubes in matrix form is upper bounded by min{Q, L}. Figure 4.2 shows channel ν_1 of the simulated wideband image cube. To study the efficiency of the proposed approach in the compressive sensing framework, we simulate wideband data cubes using a non-uniform random Fourier sampling with a Gaussian density profile at the reference frequency ν_1 = 1.4 GHz. To mimic RI uv-coverages, we introduce holes in the sampling function through an inverse Gaussian profile, so that the missing Fourier content is mainly concentrated in the high spatial frequencies [87]. We extend our study to realistic simulations using a VLA uv-coverage. For each channel indexed by l ∈ {1, ..., L}, its corresponding uv-coverage is obtained by scaling the reference uv-coverage with ν_l/ν_1; this is intrinsic to wideband RI data acquisition. Figure 4.2 shows the realistic VLA uv-coverages of all the channels projected onto one plane. The visibilities are corrupted with additive zero-mean complex white Gaussian noise of variance ϱ², resulting in input signal-to-noise ratios iSNR ∈ {20, 40, 60} dB, defined as
$$\mathrm{iSNR} = 10 \log_{10}\!\left( \frac{\sum_{l=1}^{L} \|\boldsymbol{\Phi}_l \mathbf{x}_l\|_2^2 / M_l}{L\varrho^2} \right). \qquad (4.13)$$
¹ Image courtesy of NRAO/AUI and [17]
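The continuous part of the simulated spectra and the iSNR definition (4.13) translate directly into code. A sketch with illustrative parameter values (the emission lines added on top of the continuous spectra are omitted):

```python
import numpy as np

def curvature_spectrum(nu, alpha, beta):
    """Continuous spectrum following the curvature model:
    h(nu_l) = (nu_l / nu_1)^(-alpha + beta * log(nu_l / nu_1))."""
    r = nu / nu[0]
    return r ** (-alpha + beta * np.log(r))

def isnr(data_norms2, M, L, var):
    """Input SNR (4.13), in dB.
    data_norms2 : array of ||Phi_l x_l||_2^2 per channel
    M           : array of M_l, the number of visibilities per channel
    var         : noise variance rho^2."""
    return 10 * np.log10(np.sum(data_norms2 / M) / (L * var))

# 15 uniformly sampled channels over [1.4, 2.78] GHz, as in the simulations
nu = np.linspace(1.4e9, 2.78e9, 15)
h = curvature_spectrum(nu, alpha=0.5, beta=0.1)   # alpha, beta illustrative
```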
(a) Realistic VLA uv-coverages (b) Ground-truth image x_1
Figure 4.2: Simulations using realistic VLA uv-coverage: (a) The realistic VLA uv-coverages of all the channels projected onto one plane. (b) Channel ν_1 of the simulated wideband model cube, a 256 × 256 region of the W28 supernova remnant, shown in log10 scale.
Given the same noise variance ϱ_χ² on all the visibilities, the ℓ2 bounds ε_{l,b} on the data-fidelity terms are derived from the noise variance, where the noise norm follows a χ² distribution [87]. Thus, the global bound is given by
$$\epsilon = \sqrt{M + \varphi\sqrt{M}}\;\varrho_\chi, \qquad (4.14)$$
where φ is the number of standard deviations above the mean of the χ² distribution (we set φ = 2). The block constraints must satisfy Σ_{l,b} (ε_{l,b})² = ε², and the ℓ2 bounds associated with the different data blocks are ε_{l,b} = √(M_{l,b}/M) ε. We re-emphasize that the adaptive ℓ2 bounds strategy is designed for imaging real data, due to the unknown calibration errors in addition to the thermal noise. Therefore, no adjustment of the ℓ2 bounds is required on simulations. We define the sampling rate (SR) as the ratio between the number of measurements per channel M_l and the size of the image N:
$$\mathrm{SR} = \frac{M_l}{N}. \qquad (4.15)$$
Several tests are performed using the two image cubes with L ∈ {15, 60}, varying SR from 0.01 to 1 and iSNR ∈ {20, 40, 60} dB.
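A sketch of the bound computation in (4.14), together with the block splitting (the block sizes in the usage line are illustrative):

```python
import numpy as np

def l2_bounds(M_blocks, rho, phi=2.0):
    """Global and per-block l2 bounds from the noise level, following (4.14):
    eps = sqrt(M + phi * sqrt(M)) * rho,  eps_{l,b} = sqrt(M_{l,b} / M) * eps."""
    M_blocks = np.asarray(M_blocks, dtype=float)
    M = M_blocks.sum()
    eps = np.sqrt(M + phi * np.sqrt(M)) * rho
    return eps, np.sqrt(M_blocks / M) * eps

eps, eps_blocks = l2_bounds([250, 250, 500], rho=1e-3)
# By construction the block bounds satisfy sum_{l,b} eps_{l,b}^2 = eps^2.
```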
4.4.2 Benchmark algorithms
In the first instance, we showcase the advantage of reweighting through comparison of HyperSARA with the following benchmark algorithms: (i) Low-Rankness and Joint Average Sparsity (LRJAS), formulated in (4.8) for ω = 1_J and ω̄ = 1_T; (ii) Low-Rankness (LR), formulated as follows:
$$\underset{\mathbf{X}=(\mathbf{x}_l)_{1\leqslant l\leqslant L}\,\in\,\mathbb{R}_+^{N\times L}}{\text{minimize}}\; \mu_1 \|\mathbf{X}\|_{*,\omega(\mathbf{X}^{(k)})} + \sum_{l=1}^{L}\sum_{b=1}^{B} \iota_{\mathcal{B}(\mathbf{y}_{l,b},\epsilon_{l,b})}(\boldsymbol{\Phi}_{l,b}\mathbf{x}_l); \qquad (4.16)$$
(iii) Joint Average Sparsity (JAS), formulated below:
$$\underset{\mathbf{X}=(\mathbf{x}_l)_{1\leqslant l\leqslant L}\,\in\,\mathbb{R}_+^{N\times L}}{\text{minimize}}\; \mu_2 \|\boldsymbol{\Psi}^\dagger\mathbf{X}\|_{2,1,\overline{\omega}(\mathbf{X}^{(k)})} + \sum_{l=1}^{L}\sum_{b=1}^{B} \iota_{\mathcal{B}(\mathbf{y}_{l,b},\epsilon_{l,b})}(\boldsymbol{\Phi}_{l,b}\mathbf{x}_l). \qquad (4.17)$$
LR, JAS and LRJAS are solved using the PDFB algorithm explained in Section 4.3.2, with ω = 1_J and ω̄ = 1_T. For LR, µ1 is set to 1, and for JAS, µ2 = 10⁻². In HyperSARA and LRJAS, the regularization parameters are set to µ = 1 and µ̄ = 10⁻², leveraging the dirty wideband image cube X_dirty as explained in Section 4.2.2.
In the second instance, we evaluate the performance of our approach HyperSARA in comparison with the CLEAN-based approach JC-CLEAN [85], where we adopt Briggs weighting for optimal results (the robustness parameter is set to −0.5). Recall that JC-CLEAN involves polynomial fitting to enhance the reconstruction of smooth spectra. However, this is not optimal for the simulated spectra, where emission lines are incorporated. Therefore, we do not consider polynomial fitting in imaging the simulated wideband data with JC-CLEAN. We also compare with the single-channel image reconstruction approach SARA, explained in Section 3.3.2.1. We rewrite the SARA minimization problem (3.16) by imposing the data-fidelity constraints via the indicator function as follows:
$$\underset{\mathbf{x}_l\,\in\,\mathbb{R}_+^{N}}{\text{minimize}}\; \widetilde{\mu}\,\|\boldsymbol{\Psi}^\dagger\mathbf{x}_l\|_{1,\omega(\mathbf{X}^{(k)})} + \sum_{b=1}^{B} \iota_{\mathcal{B}(\mathbf{y}_{l,b},\epsilon_{l,b})}(\boldsymbol{\Phi}_{l,b}\mathbf{x}_l). \qquad (4.18)$$
The SARA approach is solved using the PDFB algorithm [88] (with µ̃ = 10⁻²). The different methods are studied using our MATLAB implementation, with the exception of JC-CLEAN. To give the reader a brief idea of the speed of HyperSARA in comparison with SARA: for an image cube of size N = 256 × 256 pixels and L = 60 channels, with M_l = 0.5N, where M_l is the number of visibilities per frequency channel, and using 1 node (36 CPU cores) of Cirrus², SARA needs approximately 30 minutes to converge, while HyperSARA requires around 2 hours. Note that [1] have shown the superior performance of the low-rankness and joint average sparsity model in comparison with the state-of-the-art spatio-spectral sparsity algorithm proposed in [54] on realistic simulations of wideband RI data.
4.4.3 Imaging quality assessment
In the qualitative comparison of the different methods, we consider the visual inspection of the following cubes: the estimated model³ cube X̂; the absolute value of the error cube E, defined as the absolute difference between the ground-truth model cube X and the estimated model cube X̂, i.e., E = |X − X̂|; and the naturally-weighted residual image cube R, whose columns are given by r_l = η_l Φ_l†(ȳ_l − Φ_l x̂_l), where ȳ_l = Θ_l y_l are the naturally-weighted RI measurements, Θ_l is a diagonal matrix whose elements are the natural weights, Φ_l = Θ_l G_l F Z is the associated measurement operator, and η_l = 1 / max_{n=1:N} (Φ_l† Φ_l δ)_n is a normalization factor, where δ ∈ R^N is an image with value 1 at the phase center and zero otherwise. By doing so, the PSF, defined as g_l = η_l Φ_l† Φ_l δ, has a peak value equal to 1. More specifically to JC-CLEAN, we consider the Briggs-weighted residual image cube R̃_JC-CLEAN, whose columns are r̃_l = η̃_l Φ̃_l†(ỹ_l − Φ̃_l x̂_l); ỹ_l = Θ̃_l y_l are the Briggs-weighted RI measurements, Θ̃_l is a diagonal matrix whose elements are the Briggs weights, Φ̃_l = Θ̃_l G_l F Z is the associated measurement operator, and η̃_l is a normalization factor. We also consider the restored cube T̂_JC-CLEAN, whose columns are t̂_l = x̂_l ∗ c_l + r̃_l, where c_l is the so-called CLEAN beam (typically a Gaussian fitted to the primary lobe of the PSF g_l), and the error cube Ẽ_JC-CLEAN = |X − T̂_JC-CLEAN|. It is worth noting that we divide the columns of the restored cube T̂_JC-CLEAN by the flux of the respective CLEAN beams, i.e., the ℓ1 norm of the CLEAN beams, to have the same brightness scale as the ground truth. We recall that the restored cube is the final product of the CLEAN-based approaches because of their non-physical estimated model cubes, as opposed to compressive sensing-based approaches. The latter class of methods involves sophisticated priors, resulting in accurate representations of the unknown sky image, achieved on simulations [26, 42, 126] and real data applications [44, 57, 88, 94, 125] for single-channel RI imaging.
We also provide a spectral analysis of selected pixels from the different sources of the estimated wideband cubes. These are the estimated model cubes X̂_HyperSARA, X̂_SARA, X̂_LRJAS, X̂_LR and X̂_JAS, and the estimated restored cube T̂_JC-CLEAN. For the case of an unresolved, i.e., point-like, source, we derive its spectra from its total flux at each frequency, integrated over the associated beam area.
² Cirrus is one of the EPSRC Tier-2 UK National HPC Facilities (http://www.cirrus.ac.uk).
³ The estimated image cube obtained by optimization methods is usually called the model cube, as opposed to CLEAN-based methods, where the final product is the so-called restored cube, which results from convolving the model cube with the respective CLEAN beams, then adding the residual image cube.
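The PSF normalization and the naturally-weighted residual definition can be sketched with the measurement operator passed as a (forward, adjoint) pair. Taking the real part of Φ†Φδ and placing the phase center at index N//2 are shortcuts of this sketch, not thesis conventions:

```python
import numpy as np

def normalized_psf(phi, phi_t, N):
    """PSF g = eta * Phi^dagger Phi delta, normalized to a unit peak, where
    delta is a unit impulse at the phase center (here taken at index N // 2)."""
    delta = np.zeros(N)
    delta[N // 2] = 1.0
    dirty_beam = np.real(phi_t(phi(delta)))
    eta = 1.0 / dirty_beam.max()        # normalization factor eta_l
    return eta * dirty_beam, eta

def residual_image(y, x_hat, phi, phi_t, eta):
    """Naturally-weighted residual r_l = eta_l * Phi_l^dagger (y_l - Phi_l x_l)."""
    return eta * np.real(phi_t(y - phi(x_hat)))
```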
In the quantitative comparison of the different approaches, we adopt the signal-to-noise ratio (SNR). For the channel indexed l, it is defined as
$$\mathrm{SNR}_l(\widehat{\mathbf{x}}_l) = 20 \log_{10}\!\left( \frac{\|\mathbf{x}_l\|_2}{\|\mathbf{x}_l - \widehat{\mathbf{x}}_l\|_2} \right), \qquad (4.19)$$
where x_l is the original sky image at the frequency ν_l and x̂_l is the estimated model image. For the full wideband model cube, we adopt the average SNR defined as
$$\mathrm{aSNR}(\widehat{\mathbf{X}}) = \frac{1}{L}\sum_{l=1}^{L} \mathrm{SNR}_l(\widehat{\mathbf{x}}_l). \qquad (4.20)$$
For the sake of comparison with JC-CLEAN, we examine the similarity between the ground-truth and the recovered model cubes with HyperSARA, SARA and JC-CLEAN up to the resolution of the instrument. To this aim, we consider the smoothed versions of the model cubes, denoted by B for the ground truth, whose columns are b_l = x_l ∗ c_l, and by B̂ for the estimated model cubes, whose columns are b̂_l = x̂_l ∗ c_l. We adopt the average similarity metric defined as
$$\mathrm{aSM}(\mathbf{B},\widehat{\mathbf{B}}) = \frac{1}{L}\sum_{l=1}^{L} \mathrm{SM}_l(\mathbf{b}_l,\widehat{\mathbf{b}}_l), \qquad (4.21)$$
where, for two signals b_l and b̂_l, SM_l is defined as
$$\mathrm{SM}_l(\mathbf{b}_l,\widehat{\mathbf{b}}_l) = 20 \log_{10}\!\left( \frac{\max(\|\mathbf{b}_l\|_2, \|\widehat{\mathbf{b}}_l\|_2)}{\|\mathbf{b}_l - \widehat{\mathbf{b}}_l\|_2} \right). \qquad (4.22)$$
4.4.4 Imaging results
To investigate the performance of the proposed approach in the compressive sensing framework and study the impact of the low-rankness and joint average sparsity priors on the image reconstruction quality, we perform several tests on the data sets generated using a non-uniform random Fourier sampling with a Gaussian density profile. We vary the Fourier sampling rate SR in the interval [0.01, 1]; we also vary the iSNR and the number of channels L such that iSNR ∈ {20, 40} dB and L ∈ {15, 60}. Simulated data cubes are imaged using LR (4.16), JAS (4.17), LRJAS (4.3) for ω = 1_J and ω̄ = 1_T, and HyperSARA (4.3) with 10 consecutive reweights. Image reconstruction results, assessed using the aSNR metric, are displayed in Figure 4.3. We notice that for SR values above 0.05, LR maintains a better performance than JAS. Better aSNR values are achieved by LRJAS, which suggests the importance of combining both the low-rankness and joint average sparsity priors for wideband RI imaging. More interestingly, HyperSARA outperforms these benchmark algorithms, with about 1.5 dB enhancement in comparison with LRJAS for all considered SR values. Moreover, HyperSARA reaches high aSNR values for the drastic sampling rate of 0.01; these are 20 dB and 15 dB for iSNRs of 40 dB and 20 dB, respectively. Note that we only showcase the results for SR below 0.3, since similar behaviour is observed for higher values of SR. These results indicate the efficiency of reweighting.
For qualitative comparison, we proceed with the visual inspection of the estimated model images, the absolute value of the error images and the residual images (naturally-weighted data). These are obtained by imaging the wideband data cube generated using the realistic VLA uv-coverage with L = 60 channels, SR = 1 and iSNR = 60 dB. The images of channels ν_1 = 1.4 GHz and ν_60 = 2.78 GHz are displayed in Figures 4.4 and 4.5, respectively. On the one hand, LRJAS estimated model images (first row, second panel) have better resolution in comparison with JAS (first row, third panel) and LR (first row, fourth panel). LRJAS also presents lower error maps (second row, second panel) in comparison with JAS (second row, third panel) and LR (second row, fourth panel). This is highly noticeable for the low-frequency channels. On the other hand, HyperSARA provides maps with enhanced overall resolution and dynamic range, reflected in better residuals and smaller errors. In Figure 4.6, we provide a spectral analysis of selected pixels from the
Figure 4.3: Simulations using random sampling with a Gaussian density profile: aSNR results for the proposed approach HyperSARA and the benchmark methods LRJAS, JAS, LR and the monochromatic approach SARA. The aSNR values of the estimated model cubes (y-axis) are plotted as a function of the sampling rate (SR) (x-axis). Each point corresponds to the mean value of 5 noise realizations. The results are displayed for different model cubes, varying the number of channels L and the input signal-to-noise ratio iSNR. (a) L = 60 channels and iSNR = 40 dB. (b) L = 15 channels and iSNR = 40 dB. (c) L = 60 channels and iSNR = 20 dB. (d) L = 15 channels and iSNR = 20 dB.
estimated model cubes revealed in Figures 4.4 and 4.5. Once again, one can notice a significantly enhanced recovery of the spectra when combining the two priors, as in LRJAS and HyperSARA. Yet, the latter presents a more accurate estimation of the different shapes of the simulated spectra, confirming the efficiency of our approach.
When compared to single-channel image recovery, HyperSARA clearly exhibits higher performance for all the data sets generated using a non-uniform random Fourier sampling with a Gaussian density profile. In fact, almost 5 dB improvement in aSNR is achieved, as shown in Figure 4.3. This confirms the relevance and efficiency of the adopted spatio-spectral priors, as opposed to the purely spatial model of the SARA approach. Furthermore, for regimes with sampling rates above 0.01, increasing the number of channels enhances the recovery of HyperSARA, which shows the efficiency of the weighted nuclear norm prior in capturing the redundant information across the channels, resulting in the low-rankness of the model cube. We do not report the aSNR values for JC-CLEAN, since its non-physical model images result in poor SNR values.
Channel ν_1 = 1.4 GHz
Figure 4.4: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν_1 = 1.4 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right: results of HyperSARA (aSNR = 30.13 dB), LRJAS (aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and LR (aSNR = 26.75 dB). From top to bottom: the estimated model images in log10 scale, the absolute value of the error images in log10 scale and the naturally-weighted residual images in linear scale.
Channel ν_60 = 2.78 GHz
Figure 4.5: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν_60 = 2.78 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right: results of HyperSARA (aSNR = 30.13 dB), LRJAS (aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and LR (aSNR = 26.75 dB). From top to bottom: the estimated model images in log10 scale, the absolute value of the error images in log10 scale and the naturally-weighted residual images in linear scale.
[Figure 4.6 panels: (a) ground-truth image x_1; (b) HyperSARA estimated spectra; (c) LRJAS estimated spectra; (d) JAS estimated spectra; (e) LR estimated spectra; x-axis: frequency from 1.4 to 2.8 GHz, y-axis: intensity from 0 to 1.]
Figure 4.6: Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected pixels obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. The results are shown for: (b) the proposed approach HyperSARA, (c) LRJAS, (d) JAS and (e) LR, compared with the ground truth. Each considered pixel is highlighted with a colored circle in the ground-truth image x_1 displayed in (a).
For a qualitative study of the imaging quality of HyperSARA, SARA and JC-CLEAN, we display in Figures 4.7 and 4.8 the estimated images, the absolute value of the error images and the residual images of channels ν_1 = 1.4 GHz and ν_60 = 2.78 GHz, respectively. These are obtained by imaging the wideband data cube generated using the realistic VLA uv-coverage with L = 60 channels, SR = 1 and iSNR = 60 dB. The resolution of the estimated images with HyperSARA (first row, left panel) is higher than that achieved by SARA (first row, middle panel) and JC-CLEAN (first row, right panel), thanks to the weighted nuclear norm that enforces correlation, hence enhancing the details at the low-frequency channels and improving the quality of the extended emission at the high-frequency channels. Moreover, a higher dynamic range, reflected in lower error maps, is achieved by HyperSARA (second row, left panel), thanks to the weighted ℓ2,1 norm that rejects uncorrelated noise. We show examples of the recovered spectra with the different approaches in Figure 4.9. HyperSARA also achieves accurate recovery of the scrutinized spectra, as opposed to JC-CLEAN and the single-channel recovery approach SARA. On the one hand, the poor recovery of SARA is expected, since no correlation is imposed and the resolution is limited to the single-channel Fourier sampling. On the other hand, the recovery of the spectral information with JC-CLEAN is limited, as no explicit spectral model is considered (recall that polynomial fitting is not considered with JC-CLEAN, since the simulated spectra contain emission lines). Finally, we report the average similarity values of the ground truth with the HyperSARA, SARA and JC-CLEAN results at the resolution of the instrument. These are aSM(B, B̂_HyperSARA) = 52.45 dB, aSM(B, B̂_SARA) = 41.23 dB and aSM(B, B̂_JC-CLEAN) = 16.38 dB. These values indicate the high accuracy of HyperSARA and, more generally, strong agreement between the compressive sensing-based approaches when it comes to recovering the Fourier content up to the resolution of the instrument. On the other hand, the poor reconstruction of JC-CLEAN is due to the complexity of the spectra considered in the simulations.
4.5 Application to real data
In this section, we present the results of HyperSARA for wideband imaging on VLA observations of the radio galaxy Cyg A and the supernova remnant G055.7+3.4⁴, in comparison with JC-CLEAN [85] and the single-channel image reconstruction algorithm SARA [26, 44]. As opposed to [88], the ℓ2 bounds on the data-fidelity terms are updated in the algorithm, allowing for imaging in the presence of unknown noise levels and calibration errors. The values of the parameters associated with the adaptive strategy are given in Appendix .3.
⁴ The considered data sets have already been calibrated with the standard RI pipelines.
4.5.1 Data and imaging details
Cyg A: The data are part of wideband VLA observations within the frequency range 2-18 GHz, acquired over two years (2015-2016). We consider here 32 channels from the S band (2-4 GHz)
Channel ν_1 = 1.4 GHz
Figure 4.7: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν_1 = 1.4 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right: results of the proposed approach HyperSARA (aSNR = 30.13 dB), the monochromatic approach SARA (aSNR = 23.46 dB) and JC-CLEAN (aSNR = 9.39 dB). From top to bottom (first and second columns): the estimated model images in log10 scale, the absolute value of the error images in log10 scale and the naturally-weighted residual images in linear scale. From top to bottom (third column): the estimated restored images in log10 scale, the absolute value of the error images in log10 scale and the Briggs-weighted residual images in linear scale.
Channel ν_60 = 2.78 GHz
Figure 4.8: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν_60 = 2.78 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right: results of the proposed approach HyperSARA (aSNR = 30.13 dB), the monochromatic approach SARA (aSNR = 23.46 dB) and JC-CLEAN (aSNR = 9.39 dB). From top to bottom (first and second columns): the estimated model images in log10 scale, the absolute value of the error images in log10 scale and the naturally-weighted residual images in linear scale. From top to bottom (third column): the estimated restored images in log10 scale, the absolute value of the error images in log10 scale and the Briggs-weighted residual images in linear scale.
[Figure 4.9 panels: (a) ground-truth image x_1; (b) HyperSARA estimated spectra; (c) SARA estimated spectra; (d) JC-CLEAN estimated spectra; x-axis: frequency from 1.4 to 2.8 GHz, y-axis: intensity from 0 to 1.]
Figure 4.9: Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected pixels obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. The results are shown for: (b) the proposed approach HyperSARA, (c) the monochromatic approach SARA and (d) JC-CLEAN, compared with the ground truth. Each considered pixel is highlighted with a colored circle in the ground-truth image x_1 displayed in (a).
and the C band (4-8 GHz), spanning the frequency range [ν_1, ν_32] = [2.04, 5.96] GHz with a frequency step of 128 MHz and a total bandwidth of 4 GHz (data courtesy of R. A. Perley). The data in each channel are acquired using the B configuration of the VLA and are of size 25 × 10^4. We split the data in each channel into 4 blocks of 6 × 10^4 measurements on average, where each block corresponds to data observed within a time interval over which calibration errors are assumed constant. For imaging, we consider images of size 1024 × 512 with a pixel size δx = 0.19′′. The chosen pixel size corresponds to recovering the signal up to 2.5 times the nominal resolution at the highest frequency ν_L, given by the maximum baseline B_L = (ν_L/c) u_max. It is common in RI imaging to set the pixel size δx such that 1/(5B_L) ⩽ δx ⩽ 1/(3B_L), so that all the PSFs are adequately sampled. The resulting wideband image cube is of size 1024 × 512 × 32. With regards to the choice of the regularization parameters in HyperSARA, we found that µ̄ = ‖X_dirty‖_∗ / ‖Ψ†X_dirty‖_{2,1} (as explained in Section 4.2.2) is large and results in smooth reconstructed model cubes. This can be justified by the fact that the considered data set encompasses calibration errors, and the DDEs are not corrected for in our measurement operator. However, we found that setting µ̄ = 5 × 10^{-5}, that is, two orders of magnitude lower than the ratio estimated from the dirty model cube, is a good trade-off to recover high-resolution, high-dynamic-range model cubes. Note that µ is set to 1, as explained in Section 4.2.2. The regularization parameter associated with SARA is set to µ̃ = 5 × 10^{-5}. For SARA and HyperSARA, we solve 30 reweighted minimization problems using the adaptive PDFB.
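The ratio used above to gauge the regularization parameter can be computed directly from the dirty cube. The sketch below is a minimal illustration under simplifying assumptions: the cube is stored as an N × L matrix, and the sparsity dictionary defaults to the identity (the thesis uses the SARA wavelet dictionary Ψ†); `regularization_ratio` is a hypothetical helper, not part of the thesis code.

```python
import numpy as np

def regularization_ratio(X_dirty, Psi_adj=None):
    """Heuristic ratio ||X||_* / ||Psi^adj X||_{2,1} for a dirty cube X (N x L).

    Psi_adj: optional callable mapping the N x L cube to its T x L analysis
    coefficients; identity by default (a simplifying assumption)."""
    nuclear = np.linalg.svd(X_dirty, compute_uv=False).sum()  # nuclear norm
    W = X_dirty if Psi_adj is None else Psi_adj(X_dirty)
    l21 = np.linalg.norm(W, axis=1).sum()  # sum of row-wise l2 norms
    return nuclear / l21
```

As the text reports for the Cyg A data, this automatic ratio can over-regularize in the presence of calibration errors, motivating the manually lowered value of 5 × 10^{-5}.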
G055.7+3.4: The data are part of wideband VLA observations at the L band (1-2 GHz) acquired in 2010 (see footnote 5). We process 10 channels from each of the following spectral windows: 1.444-1.498 GHz, 1.708-1.762 GHz and 1.836-1.89 GHz. Each set of 10 consecutive channels, corresponding to one spectral window, has a frequency step of 6 MHz and a total bandwidth of 60 MHz. The data in each channel consist of 4 × 10^5 visibilities, split into 4 blocks of 10^5 measurements on average. The resulting wideband image cube is of size 1280 × 1280 × 30 with a pixel size δx = 8′′. The chosen pixel size corresponds to recovering the signal up to 2 times the nominal resolution of the observations at the highest frequency ν_L. In a similar fashion to the Cyg A data set, we set the regularization parameters µ = 1 and µ̄ = 5 × 10^{-6}. The regularization parameter associated with SARA is set to µ̃ = 5 × 10^{-6}. For SARA and HyperSARA, we solve 30 reweighted minimization problems using the adaptive PDFB.
4.5.2 Imaging quality assessment
To assess the quality of the reconstruction, we perform a visual inspection of the obtained images.
For HyperSARA and SARA, we consider the estimated model cubes X̂_HyperSARA and X̂_SARA, and the naturally-weighted residual image cubes R_HyperSARA and R_SARA. For JC-CLEAN, we
5. Data courtesy of NRAO: https://casaguides.nrao.edu/index.php/VLA_CASA_Imaging-CASA5.0.0.
57 4.5 Application to real data
consider Briggs weighting (the robustness parameter is set to -0.5) and examine the resultant restored cube T̂_JC-CLEAN and the Briggs-weighted residual image cube R̃_JC-CLEAN. We report the average standard deviation (aSTD) of all the residual image cubes, defined for an image cube Z ∈ R^{N×L} by

aSTD(Z) = (1/L) Σ_{l=1}^{L} STD_l(z_l),   (4.23)

where STD_l(z_l) is the standard deviation of the image z_l ∈ R^N at the frequency ν_l.
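As a minimal sketch of the metric in (4.23), the aSTD amounts to averaging per-channel standard deviations; `astd` below is a hypothetical helper assuming the residual cube is stored with pixels along the first axis and channels along the second.

```python
import numpy as np

def astd(Z):
    """Average standard deviation of an N x L residual cube, as in eq. (4.23):
    per-channel standard deviations STD_l(z_l), averaged over the L channels."""
    Z = np.asarray(Z)
    return Z.std(axis=0).mean()  # std over pixels, then mean over channels
```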
We also provide a spectral analysis of selected pixels from the estimated cubes. These are the estimated model cubes X̂_HyperSARA and X̂_SARA, and the estimated restored cube T̂_JC-CLEAN. For an unresolved, i.e., point-like, source, we derive its spectrum from its total flux at each frequency, integrated over the associated beam area. Finally, we report the similarity of X̂_HyperSARA and X̂_SARA. Furthermore, we examine the smoothed versions of X̂_HyperSARA, X̂_SARA and X̂_JC-CLEAN at the resolution of the instrument, denoted by B̂_HyperSARA, B̂_SARA and B̂_JC-CLEAN, respectively. Recall that, for the channel indexed by l, b̂_l = x̂_l ∗ c_l, where x̂_l is the estimated model image at the frequency ν_l and c_l is the respective CLEAN beam. However, we emphasize that smoothing X̂_HyperSARA and X̂_SARA is not recommended and is performed here only for comparison purposes with JC-CLEAN.
4.5.3 Real imaging results
Cyg A: The estimated images of channels ν_1 = 2.04 GHz and ν_32 = 5.96 GHz, obtained with the proposed approach HyperSARA, the single-channel approach SARA and JC-CLEAN, are displayed in Figures 4.10 and 4.11, respectively. Two key regions in Cyg A are emphasized: the hotspots of the east and west jets (second and third columns). We can see that the model images of HyperSARA exhibit more details at the low-frequency channels, visible at the hotspots of Cyg A. Moreover, the features of Cyg A at the high-frequency channels are better resolved with HyperSARA (see the emission line from the inner core to the east jet and the arc around the right end of the west jet). The imaging quality of the SARA approach is poor at the low frequencies since no inter-channel correlation is exploited, and the recovery is limited to the single-channel inherent resolution. JC-CLEAN restored images are smooth since they result from convolving the estimated model images with the corresponding CLEAN beams. In Figure 4.12, we display the naturally-weighted residual images of HyperSARA and SARA. The aSTD values are 1.19 × 10^{-2} and 8.7 × 10^{-3}, respectively, which indicates higher fidelity of the latter to the naturally-weighted data. Yet, SARA residual images (right) indicate poor recovery of the Cyg A jets at the low-frequency channels in comparison with those obtained with HyperSARA (left). Both HyperSARA and SARA residual images present errors at the brightest pixel positions (the hotspots). These can be justified by the dominant calibration errors at those positions. However, more substantial errors are kept in the residual with HyperSARA and seem to be absorbed in the model images of SARA. HyperSARA
and JC-CLEAN Briggs-weighted residual images are shown in Figure 4.13, with respective aSTD values of 4.1 × 10^{-3} and 2.1 × 10^{-3}. These indicate higher fidelity of JC-CLEAN to the Briggs-weighted data. Recall that the two approaches solve two different imaging problems: HyperSARA solves for the naturally-weighted data whereas JC-CLEAN solves for the Briggs-weighted data. A spectral analysis of the different approaches is presented in Figure 4.14. One can see that the spectra recovered with HyperSARA have higher intensity values at the low-frequency channels, thanks to the weighted nuclear norm that enhances the details at the low-frequency channels. Finally, given the unknown ground truth, we report the average similarity (aSM) values of the proposed method with the benchmark approaches. These are aSM(X̂_HyperSARA, X̂_SARA) = 16.45 dB and aSM(B̂_HyperSARA, B̂_SARA) = 36.53 dB. Also, aSM(B̂_HyperSARA, B̂_JC-CLEAN) = 33.36 dB. These values indicate high similarity of the recovered low spatial frequency content with all the methods. In other words, there is strong agreement between the approaches up to the spatial bandwidth of the observations.
G055.7+3.4: In Figures 4.15 and 4.16, we present the reconstructed images of channels ν_1 = 1.444 GHz and ν_30 = 1.89 GHz, respectively, obtained with the proposed approach HyperSARA, the single-channel approach SARA and JC-CLEAN. The figures clearly demonstrate the significantly higher performance of HyperSARA in terms of resolution and dynamic range. For instance, one can see that the central extended emission is very well captured by HyperSARA in the overall estimated model cube, as opposed to SARA and JC-CLEAN. While SARA presents a smooth representation of the source, JC-CLEAN provides a highly noisy representation. Moreover, the number of modelled point sources is clearly higher with HyperSARA, in particular at the low-frequency channels, unlike SARA, where only a few sources are detected, whereas JC-CLEAN presents a large number of false detections. This suggests the efficiency of the HyperSARA priors in capturing the correlation of the data cube and enhancing the dynamic range of the recovered model cube. The naturally-weighted residual images of HyperSARA and SARA are shown in Figure 4.17. The aSTD values are 6.55 × 10^{-5} and 8.37 × 10^{-5}, respectively, which reflects the higher fidelity to data achieved by HyperSARA. The Briggs-weighted residual images of HyperSARA and JC-CLEAN are also displayed in Figure 4.18; their respective aSTD values are 1.12 × 10^{-4} and 7.75 × 10^{-5}. These indicate higher fidelity of JC-CLEAN to the Briggs-weighted data. Finally, we show examples of the spectra recovered with the different approaches in Figure 4.19. When inspecting the different spectra, one can see that HyperSARA succeeds in recovering the scrutinized sources with higher flux in comparison with the other approaches. Finally, we report the average similarity values of the proposed method with the benchmark approaches. These are aSM(X̂_HyperSARA, X̂_SARA) = 9.13 dB and aSM(B̂_HyperSARA, B̂_SARA) = 12.3 dB. Also, aSM(B̂_HyperSARA, B̂_JC-CLEAN) = 7.1 dB. The low aSM values confirm the substantial disagreement in the quality of the recovery up to the resolution of the instrument, which is in agreement with the visual inspection of the estimated
Channel ν_1 = 2.04 GHz
Figure 4.10: Cyg A: recovered images of channel ν_1 = 2.04 GHz at 2.5 times the nominal resolution at the highest frequency ν_L. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA, and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column), as well as zooms on the east jet hotspot (second column) and the west jet hotspot (third column).
Channel ν_32 = 5.96 GHz
Figure 4.11: Cyg A: recovered images of channel ν_32 = 5.96 GHz at 2.5 times the nominal resolution at the highest frequency ν_L. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA, and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column), as well as zooms on the east jet hotspot (second column) and the west jet hotspot (third column).
(a) Channel ν_1 = 2.04 GHz
(b) Channel ν_32 = 5.96 GHz
Figure 4.12: Cyg A: naturally-weighted residual images obtained by the proposed approach HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel ν_1 = 2.04 GHz, and (b) channel ν_32 = 5.96 GHz. The aSTD values are 1.19 × 10^{-2} and 8.7 × 10^{-3}, respectively.
(a) Channel ν_1 = 2.04 GHz
(b) Channel ν_32 = 5.96 GHz
Figure 4.13: Cyg A: Briggs-weighted residual images obtained by the proposed approach HyperSARA (left) and JC-CLEAN (right). (a) Channel ν_1 = 2.04 GHz, and (b) channel ν_32 = 5.96 GHz. The aSTD values are 4.1 × 10^{-3} and 2.1 × 10^{-3}, respectively.
(a) Estimated model image of HyperSARA at channel ν_32 = 5.96 GHz. (b) Selected pixel P1. (c) Selected pixel P2. (d) Selected source S1. (e) Selected pixel P3.
Figure 4.14: Cyg A: reconstructed spectra of selected pixels and point-like sources obtained by the different approaches. Each considered pixel (P) or source (S) is highlighted with a red circle on the estimated model image of HyperSARA at channel ν_32 = 5.96 GHz displayed in (a).
images.
4.6 Conclusions
In this chapter, we presented the HyperSARA approach for wideband RI image reconstruction. It consists in solving a sequence of weighted nuclear norm and ℓ_{2,1} minimization problems promoting low-rankness and joint average sparsity of the wideband model cube in the ℓ_0 sense. HyperSARA is able to achieve higher resolution of the reconstructed wideband model cube thanks to the weighted nuclear norm that enforces inter-channel correlation. The overall dynamic range is also enhanced thanks to the weighted ℓ_{2,1} norm that rejects decorrelated artefacts present on the different channels. The efficiency of HyperSARA was validated on simulations and VLA observations in comparison with the single-channel imaging approach SARA and the CLEAN-based wideband imaging algorithm JC-CLEAN. As opposed to the CLEAN-based methods, the sophisticated priors of HyperSARA come at the expense of increased computational cost and memory requirements. To mitigate this effect, we adopt the primal-dual algorithmic structure defined in the context of the theory of convex optimization, owing to its highly interesting functionalities for wideband RI imaging. We have leveraged the preconditioning functionality to provide accelerated convergence. The parallelization functionality over the different functions and operators involved in the minimization task was also promoted as a way to spread the computational cost and memory requirements induced by the large data volumes over a multitude of processing CPU cores with limited resources.

Although HyperSARA is scalable to large data volumes, it can be prohibitive for very large image cubes. This is because the complex prior terms, namely the nuclear norm and the ℓ_{2,1} norm, are not separable, and the whole image cube X is stored and processed in a single CPU core at each iteration (see Algorithm 3, Steps 9 and 12). Moreover, the proximity operator of the nuclear norm in Step 9 requires an SVD of the full image cube at each iteration, which can be costly for big image cubes (see Appendix .2 for the exact expressions of all the proximity operators). To overcome this bottleneck and establish the full scalability potential of our approach, we present Faceted HyperSARA in the next chapter.
Channel ν_1 = 1.444 GHz
Figure 4.15: G055.7+3.4: recovered images of channel ν_1 = 1.444 GHz at 2 times the nominal resolution at the highest frequency ν_L. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA, and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column), as well as a zoom on the central region (second column).
Channel ν_30 = 1.89 GHz
Figure 4.16: G055.7+3.4: recovered images of channel ν_30 = 1.89 GHz at 2 times the nominal resolution at the highest frequency ν_L. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA, and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column), as well as a zoom on the central region (second column).
(a) Channel ν_1 = 1.444 GHz
(b) Channel ν_30 = 1.89 GHz
Figure 4.17: G055.7+3.4: naturally-weighted residual images obtained by the proposed approach HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel ν_1 = 1.444 GHz, and (b) channel ν_30 = 1.89 GHz. The aSTD values are 6.55 × 10^{-5} and 8.37 × 10^{-5}, respectively.
(a) Channel ν_1 = 1.444 GHz
(b) Channel ν_30 = 1.89 GHz
Figure 4.18: G055.7+3.4: Briggs-weighted residual images obtained by the proposed approach HyperSARA (left) and JC-CLEAN (right). (a) Channel ν_1 = 1.444 GHz, and (b) channel ν_30 = 1.89 GHz. The aSTD values are 1.12 × 10^{-4} and 7.75 × 10^{-5}, respectively.
(a) Estimated model image of HyperSARA at channel ν_30 = 1.89 GHz. (b) Selected source S1. (c) Selected source S2. (d) Selected pixel P1. (e) Selected source S3.
Figure 4.19: G055.7+3.4: reconstructed spectra of selected pixels and point-like sources obtained by the different approaches. Each considered pixel (P) or source (S) is highlighted with a red circle on the estimated model image of HyperSARA at channel ν_30 = 1.89 GHz (first row).
Chapter 5
Faceted HyperSARA for wideband
RI imaging: when precision meets
scalability
Contents
5.1 Motivation  70
5.2 Proposed faceting and Faceted HyperSARA approach  71
5.2.0.1 Spectral faceting  71
5.2.0.2 Spatial faceting  73
5.3 Algorithm and implementation  75
5.3.1 Faceted HyperSARA algorithm  75
5.3.2 Underpinning primal-dual forward-backward algorithm  76
5.3.3 Parallel algorithmic structure  77
5.3.4 MATLAB implementation  78
5.4 Validation on synthetic data  78
5.4.1 Simulation setting  79
5.4.1.1 Images and data  79
5.4.1.2 Spatial faceting  80
5.4.1.3 Spectral faceting  82
5.4.2 Hardware  82
5.4.3 Evaluation metrics  82
5.4.4 Results and discussion  83
5.4.4.1 Spatial faceting  83
5.4.4.2 Spectral faceting  84
5.5 Validation on real data  84
5.5.1 Dataset description and imaging settings  90
5.5.2 Hardware  91
5.5.3 Evaluation metrics  91
5.5.4 Results and discussion  92
5.5.4.1 Imaging quality  92
5.5.4.2 Computing cost  94
5.6 Conclusions  94
5.1 Motivation
Modern proximal optimization algorithms have shown the potential to significantly outperform state-of-the-art approaches thanks to their ability to inject complex image models to regularize the inverse problem for image formation from visibility data. More specifically to wideband RI imaging, HyperSARA, proposed in the previous chapter, has been shown to outperform the wideband CLEAN variant dubbed JC-CLEAN [85], through the recovery of high-fidelity, high-resolution image cubes. HyperSARA, powered by the primal-dual forward-backward (PDFB) algorithm (3.4.2), enables the decomposition of data into blocks and the parallel processing of the block-specific data-fidelity terms of the objective function, which provides scalability to large data volumes. HyperSARA, however, models the image cube as a single variable, and the computational and storage requirements induced by the complex regularization terms can be prohibitive for a very large image size.

We address this bottleneck in this chapter. We propose to decompose the target image cube into regular, content-agnostic, spatially overlapping spatio-spectral facets, with which are associated facet-specific regularization terms in the objective function, and further exploit the splitting functionality of PDFB to enable parallel processing of the regularization terms and ultimately provide further scalability. Our proposed approach is dubbed "Faceted HyperSARA". Note that faceting is not a novel paradigm in RI imaging: it has often been considered for calibration purposes in the context of wide-field imaging, assuming piece-wise constant direction-dependent effects. For instance, [75] proposed an image tessellation scheme for LOFAR wide-field images, which has been leveraged by [115] in the context of wide-field wideband calibration and imaging. However, to the best of our knowledge, except for [80], facet imaging has hitherto been essentially addressed with CLEAN-based algorithms. This class of approaches not only lacks theoretical convergence guarantees but also does not offer much flexibility to accommodate advanced regularization terms. In contrast with [80], the proposed faceting approach does not need to be tailored to the content of the image, and thus offers more flexibility to design balanced facets exclusively based on computational considerations.
The reconstruction performance of the proposed imaging approach is evaluated against HyperSARA and SARA on synthetic data. We further validate the performance and scalability potential of our approach through the reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA observations across 480 channels. Our results confirm the recent discovery of Cyg A2, a second super-massive black hole in Cyg A [40].

This chapter is structured as follows. Section 5.2 introduces the proposed faceted prior model underpinning Faceted HyperSARA. The associated algorithm is described in Section 5.3, along with the different levels of parallelization exploited in the proposed MATLAB implementation. Performance validation is first conducted on synthetic data in Section 5.4, where we successively evaluate the influence of spectral and spatial faceting for a varying number of facets and varying spatial overlap, both in terms of reconstruction quality and computing time. Section 5.5 focuses on the validation of the proposed approach on real VLA observations in terms of precision and scalability. Conclusions and perspectives are reported in Section 5.6.

This work has been published in [118-120].
5.2 Proposed faceting and Faceted HyperSARA approach
The proposed Faceted HyperSARA approach builds on the HyperSARA method explained in Chapter 4, distributing both the average sparsity and the low-rankness priors over multiple spatio-spectral facets to alleviate the computing and storage requirements inherent to HyperSARA. In particular, we propose to decompose the 3D image cube into Q × C spatio-spectral facets, as illustrated in Figure 5.1 and detailed below.
5.2.0.1 Spectral faceting
The wideband image cube can first be decomposed into separate image sub-cubes composed of a subset of the frequency channels, with a separate prior for each sub-cube. Since the data-fidelity terms are channel-specific, the overall objective function of HyperSARA (4.8) reduces to the sum of independent objectives for each sub-cube. The smaller-size wideband imaging sub-problems (smaller data sets, and smaller image volumes) can thus be solved independently in parallel, offering scalability. Taken to the extreme, this simple spectral faceting can be used to separate all channels and proceed with single-channel reconstructions (leading to SARA), at the cost of completely losing the advantage of inter-channel correlations to improve image precision. The key point is to keep an appropriate number of channels per sub-cube in order to optimally take advantage of this correlation. Also, given that the data at the higher frequency channels provide higher spatial frequency information than the lower frequency channels, it is of critical importance that the whole extent of the frequency band of observation be exploited in the reconstruction of each sub-cube. In this context, we propose to decompose the cube into channel-interleaved spectral
(a) Full image cube (b) Spectral sub-cubes (c) Facets & weights
Figure 5.1: Illustration of the proposed faceting scheme, using a 2-fold spectral interleaving process and a 9-fold spatial tiling process. The full image cube variable (a) is divided into two spectral sub-cubes (b) with interleaved channels (for a 2-fold interleaving, even and odd channels respectively define a sub-cube). Each sub-cube is spatially faceted. A regular tessellation (dashed red lines) is used to define spatio-spectral tiles. The spatio-spectral facets result from the augmentation of each tile to produce an overlap between facets (solid red lines). Panel (c) shows a single facet (left), as well as the spatial weighting scheme (right), with linearly decreasing weights in the overlap region. Note that, though the same tiling process underpins the nuclear norm and ℓ_{2,1} norm regularization terms, the definition of the appropriate overlap region is specific to each of these terms (via the selection operators S_q and S̃_q in (5.3)).
sub-cubes, each of which results from a uniform sub-sampling of the whole frequency band (see Figure 5.1 (b)). We thus decompose the original inverse problem (2.4) into C independent, channel-interleaved sub-problems, each considering L_c channels from the original data cube, with L = L_1 + … + L_C. For each sub-cube c ∈ {1, …, C}, y_{c,l,b} ∈ C^{M_{c,l,b}} denotes the visibilities associated with the channel l ∈ {1, …, L_c} and data block b ∈ {1, …, B}, and Φ_{c,l,b} and ϵ_{c,l,b} denote the associated measurement operator and ℓ_2-ball radius, respectively. The proposed minimization problem is thus formulated as

minimize_{X = (X_c)_{1⩽c⩽C} ∈ R_+^{N×L}}  Σ_{c=1}^{C} ( Σ_{l=1}^{L_c} Σ_{b=1}^{B} ι_{B(y_{c,l,b}, ϵ_{c,l,b})}(Φ_{c,l,b} x_{c,l}) + r_c(X_c) ),   (5.1)

where, for every c ∈ {1, …, C}, the indices (l, b) ∈ {1, …, L_c} × {1, …, B} refer to a data block b of channel l, and ι_{B(y_{c,l,b}, ϵ_{c,l,b})} denotes the indicator function of the ℓ_2 ball B(y_{c,l,b}, ϵ_{c,l,b}) (4.9). This indicator function acts as a data-fidelity term, in that it ensures the consistency of the modeled data with the measurements and reflects the Gaussian nature of the noise [26]. The notation X_c = (x_{c,l})_{1⩽l⩽L_c} ∈ R^{N×L_c} stands for the c-th sub-cube of the full image cube X, with x_{c,l} ∈ R^N the l-th image of the sub-cube X_c, and r_c : R^{N×L_c} → ]−∞, +∞] is a sub-part of the regularization term r, acting only on the c-th sub-cube. Finally, note that an additional non-negativity prior is imposed, as for all approaches of the SARA family focusing on intensity imaging, with the aim to preserve the physical consistency of the estimated surface brightness. This generalizes to the polarization constraint when solving for all the Stokes parameters.
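The channel-interleaved splitting can be sketched as follows; `interleaved_subcubes` is a hypothetical helper using 0-based channel indices, illustrating that each sub-cube uniformly sub-samples, and hence spans, the full band.

```python
def interleaved_subcubes(L, C):
    """Assign L channel indices to C spectral sub-cubes by interleaving:
    sub-cube c gets channels c, c + C, c + 2C, ... (a uniform sub-sampling
    of the band, so every sub-cube covers the full frequency range)."""
    return [list(range(c, L, C)) for c in range(C)]
```

For the 2-fold interleaving of Figure 5.1, `interleaved_subcubes(L, 2)` places the even-indexed channels in one sub-cube and the odd-indexed channels in the other.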
5.2.0.2 Spatial faceting
Faceting can also be performed in the spatial domain by decomposing the regularization term for each spectral sub-cube into a sum of terms acting only locally in the spatial domain (see Figure 5.1 (c)). In this context, the resulting facets need to overlap to avoid edge effects, so that the overall objective function (5.1) takes the form of a sum of inter-dependent facet-specific objectives. This inter-dependence precludes separating the imaging problem into facet problems. However, the splitting functionality of PDFB (explained in Section 3.4.2) can be exploited to enable parallel processing of the facet-specific regularization terms and ensure further scalability (see Section 5.3).
On the one hand, we propose to split the average sparsity dictionary Ψ† into Q smaller wavelet decompositions, leveraging the wavelet splitting technique introduced in [95, Chapter 4], which provides an exact implementation of the discrete wavelet transform distributed over multiple facets. In this context, the Daubechies wavelet bases are decomposed into a collection of facet-based operators Ψ†_q ∈ R^{T_q×N_q}, acting only on the q-th facet of size N_q, with T = T_1 + … + T_Q. The overlap needed to ensure an exact faceted implementation of the wavelet transforms comprises between 15(2^s − 2) and 15(2^s − 1) pixels in each spatial direction [95, Section 4.1.4], with s being the level of decomposition. In practice, the overlap ensures that each facet contains all the information needed to compute the convolutions underlying the discrete wavelet transforms locally.
On the other hand, we consider a faceted low-rankness prior enforced by the sum of nuclear norm priors on essentially the same overlapping facets as those introduced for the wavelet decomposition. This provides a more tractable alternative to the global low-rankness prior encoded by the nuclear norm of HyperSARA. Unlike the wavelet decomposition, there is no equivalent faceted implementation of the singular value decomposition. To mitigate reconstruction artifacts possibly resulting from the faceting of the 3D image cube, for each facet q ∈ {1, …, Q}, of size Ñ_q, we propose to introduce a diagonal matrix D_q ∈ ]0, +∞[^{Ñ_q×Ñ_q} ensuring a smooth transition from the borders of one facet to its neighbors. A natural choice consists in down-weighting the contribution of pixels involved in multiple facets. A tapering window decaying in the overlapping regions is considered, while ensuring that the sum of all the weights associated with each pixel is equal to unity. In this work, we consider weights in the form of a 2D triangular apodization window, as considered by [79] (see Figure 5.1 (c)). The size of the overlap for this term is taken as an adjustable parameter of the Faceted HyperSARA approach, to further promote local correlations; its influence is investigated in Section 5.4. In practice, a larger overlap region than the one taken for the faceted wavelet transform is considered, taking advantage of the overlap already imposed by the faceted implementation of the wavelet decomposition and the associated ℓ_{2,1} norm priors.
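A 1D analogue of the apodization weights can be sketched as below, assuming equal interior tiles with tile_len ⩾ overlap; `facet_weights_1d` is a hypothetical helper, and the actual scheme uses the 2D triangular window of [79], which can be formed as the outer product of two such 1D windows.

```python
import numpy as np

def facet_weights_1d(tile_len, overlap):
    """Tapering weights for one interior 1D facet: its own tile (tile_len
    pixels) augmented by `overlap` pixels borrowed from the left neighbour.
    Weights ramp up linearly over the borrowed pixels and ramp down over
    the tile's tail (where the right neighbour overlaps), so that the
    weights of any two adjacent facets sum to one on shared pixels."""
    up = np.arange(1, overlap + 1) / (overlap + 1)  # ramp up in the border
    down = 1.0 - up                                 # complementary ramp down
    core = np.ones(tile_len - overlap)              # full weight in the interior
    return np.concatenate([up, core, down])
```

The trailing ramp of one facet and the leading ramp of its right neighbour cover the same pixels and are complementary by construction, which realizes the sum-to-unity condition stated above.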
The spatial faceting procedure therefore results in splitting the original log-sum priors of HyperSARA in (4.2) into a sum of inter-dependent facet-specific log-sum priors, defining the Faceted HyperSARA prior:

r_c(X_c) = Σ_{q=1}^{Q} [ µ_c Σ_{j=1}^{J_{c,q}} log( |σ_j(D_q S̃_q X_c)| + υ ) + µ̄_c Σ_{n=1}^{T_q} log( ‖[Ψ†_q S_q X_c]_n‖_2 + υ ) ].   (5.2)

In (5.2), (µ_c, µ̄_c, υ) ∈ ]0, +∞[^3 are regularization parameters and, for every q ∈ {1, …, Q}, J_{c,q} ⩽ min(Ñ_q, L_c) is the rank of D_q S̃_q X_c, (σ_j(D_q S̃_q X_c))_{1⩽j⩽J_{c,q}} are the singular values of D_q S̃_q X_c, and [Ψ†_q S_q X_c]_n denotes the n-th row of Ψ†_q S_q X_c. The operators S̃_q ∈ R^{Ñ_q×N} and S_q ∈ R^{N_q×N} extract spatially overlapping spatio-spectral facets from the full image cube, for the low-rankness prior and the average sparsity prior, respectively. These two operators only differ in the number of overlapping pixels considered, which is defined as an adjustable parameter for S̃_q, and prescribed by [95] for S_q (Figure 5.1). Each facet relies on a spatial decomposition of the image into non-overlapping tiles (see Figure 5.1 (b), delineated by dashed red lines), each overlapping with its top and left spatial neighbors. In the following, the overlapping regions will be referred to as the borders of a facet, in contrast with its underlying tile (see Figure 5.1). A border facet, i.e., one that does not admit a neighbor in one of the two spatial dimensions, has the same dimension as the underlying tile in the directions where it does not admit a neighbor (e.g., corner facets have the same dimension as the underlying tile). Note that HyperSARA corresponds to the case Q = C = 1.
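The tile-plus-border construction underlying the selection operators can be sketched in 1D as follows; `facet_slices` is a hypothetical helper, whereas the actual operators S_q and S̃_q act on both spatial dimensions, with term-specific overlap sizes.

```python
import numpy as np

def facet_slices(n, q_tiles, overlap):
    """1D analogue of the facet selection: split an axis of length n into
    q_tiles contiguous non-overlapping tiles, then extend each tile (except
    the first, which has no left neighbour) by `overlap` pixels into its
    left neighbour to form the facet."""
    edges = np.linspace(0, n, q_tiles + 1).astype(int)  # tile boundaries
    return [slice(max(edges[q] - (overlap if q > 0 else 0), 0), edges[q + 1])
            for q in range(q_tiles)]
```

For instance, an axis of 12 pixels split into 3 tiles with an overlap of 2 yields the facets [0, 4), [2, 8) and [6, 12); the first facet has the same extent as its tile, mirroring the border-facet convention above.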
75 5.3 Algorithm and implementation
As expected, the prior (5.2) reads as a sum of inter-dependent facet-specific priors. Crucially, when minimizing the convex objective (5.1) within a reweighting procedure, the splitting functionality of PDFB can now be further exploited to enable parallel processing of these facet-specific priors.
5.3 Algorithm and implementation
The parallel algorithmic structure of Faceted HyperSARA is described in this section, leveraging
PDFB (3.4.2) within a reweighting approach.
5.3.1 Faceted HyperSARA algorithm
To efficiently address the log-sum prior underpinning Faceted HyperSARA, we consider a reweighted ℓ_1 approach, which consists in successively solving convex optimization problems with weighted ℓ_1 norms [20]. The proposed Faceted HyperSARA algorithm is described in Algorithm 4. At each iteration k ∈ N, the Faceted HyperSARA log-sum prior (5.2) is locally approximated by the weighted hybrid norm prior

r̃_c(X_c, X_c^{(k)}) = Σ_{q=1}^{Q} ( µ_c ‖D_q S̃_q X_c‖_{∗, ω_q(X_c^{(k)})} + µ̄_c ‖Ψ†_q S_q X_c‖_{2,1, ω̄_q(X_c^{(k)})} ),   (5.3)

where, for every q ∈ {1, …, Q}, the weights ω_q(X_c^{(k)}) = (ω_{q,j}(X_c^{(k)}))_{1⩽j⩽J_{c,q}} and ω̄_q(X_c^{(k)}) = (ω̄_{q,n}(X_c^{(k)}))_{1⩽n⩽T_q} are given by

ω_{q,j}(X_c^{(k)}) = ( σ_j(D_q S̃_q X_c^{(k)}) + υ )^{−1},   (5.4)
ω̄_{q,n}(X_c^{(k)}) = ( ‖[Ψ†_q S_q X_c^{(k)}]_n‖_2 + υ )^{−1}.   (5.5)
Then, for each sub-cube c ∈ {1, …, C}, the associated minimization problem

minimize_{X_c ∈ R_+^{N×L_c}}  Σ_{l=1}^{L_c} Σ_{b=1}^{B} ι_{B(y_{c,l,b}, ϵ_{c,l,b})}(Φ_{c,l,b} x_{c,l}) + Σ_{q=1}^{Q} ( µ_c ‖D_q S̃_q X_c‖_{∗, ω_q(X_c^{(k)})} + µ̄_c ‖Ψ†_q S_q X_c‖_{2,1, ω̄_q(X_c^{(k)})} )   (5.6)

is solved using the PDFB algorithm described in Algorithm 5. At the beginning of the algorithm, the weights are initialized to one (see Algorithm 4, line 4, where the notation 1_{J_c} stands for the vector of size J_c with all coefficients equal to 1, and J_c = J_{c,1} + … + J_{c,Q}). Note that the weights defined in (5.4)-(5.5) are multiplied by the regularization parameter υ in Algorithm 4, which is equivalent to rescaling the sub-problem (5.6) by υ; this does not affect the set of minimizers of the global problem. In a similar fashion to HyperSARA (Section 4.3.4), the regularization parameter υ in (5.4)-(5.5) is multiplied by a factor of 0.8 from one iteration k to the next, to improve the convergence rate and the stability of the algorithm.
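The reweighting scheme of (5.4)-(5.5) can be sketched as follows for a single facet, taking D_q and the selection operators as identities and, by default, Ψ† as the identity (simplifying assumptions); `update_weights` and the commented outer loop are hypothetical, with the PDFB solver for (5.6) abstracted away.

```python
import numpy as np

def update_weights(Xc, upsilon, Psi_adj=None):
    """One reweighting step for a single facet, mimicking eqs. (5.4)-(5.5):
    weights are the inverses of the current singular values / analysis-row
    norms, offset by upsilon."""
    sigma = np.linalg.svd(Xc, compute_uv=False)
    omega_nuc = 1.0 / (sigma + upsilon)                      # cf. (5.4)
    W = Xc if Psi_adj is None else Psi_adj(Xc)
    omega_l21 = 1.0 / (np.linalg.norm(W, axis=1) + upsilon)  # cf. (5.5)
    return omega_nuc, omega_l21

# Outer reweighting loop (hypothetical driver, solver abstracted away):
# for k in range(n_reweights):
#     X = solve_weighted_subproblem(X, omega_nuc, omega_l21)  # PDFB, (5.6)
#     omega_nuc, omega_l21 = update_weights(X, upsilon)
#     upsilon *= 0.8  # decrease upsilon between reweighting steps
```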
Chapter 5: Faceted HyperSARA for wideband RI imaging: when precision meets scalability 76
A complete description of the proposed PDFB algorithm used to solve the sub-problems (5.6)
(see Algorithm 4, line 9) is provided in the next section.
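Stripped of the distributed details, the outer reweighting loop of Algorithm 4 for a single sub-cube reduces to the following skeleton, where `pdfb` and `weights_fn` are hypothetical stand-ins for the inner PDFB solve (Algorithm 5) and the weight formulas (5.4)-(5.5); a sketch, not the thesis code.

```python
import numpy as np

def reweighted_scheme(X0, pdfb, weights_fn, upsilon0=1.0, decay=0.8, n_reweight=5):
    """Outer loop of Algorithm 4 for one sub-cube: alternate an inner convex
    PDFB solve of (5.6) with a weight update via (5.4)-(5.5); upsilon shrinks
    by a factor of 80% per reweighting iteration."""
    X, upsilon = X0, upsilon0
    theta = np.ones_like(weights_fn(X0, upsilon0))  # weights initialized to one
    for _ in range(n_reweight):
        X = pdfb(X, theta)                        # inner solve (Algorithm 5)
        theta = upsilon * weights_fn(X, upsilon)  # reweight around new estimate
        upsilon *= decay                          # improves convergence/stability
    return X

# toy run with stand-ins for the inner solver and the weight formulas
X_hat = reweighted_scheme(np.ones((16, 4)),
                          pdfb=lambda X, theta: 0.5 * X,
                          weights_fn=lambda X, u: 1.0 / (np.abs(X) + u))
```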
5.3.2 Underpinning primal-dual forward-backward algorithm
At each iteration k ∈ ℕ, line 9 of Algorithm 4 requires solving a sub-problem of the form (5.6),
corresponding to the approximation of (5.1) at X^(k). The main advantage of a primal-dual algorithm
such as PDFB lies in the possibility of updating most of the variables to be estimated in
parallel, without resorting to costly operator inversions or sub-iterations. In this work, we resort to
a preconditioned variant of PDFB, which reduces the number of iterations necessary to converge.
A graphical illustration of the algorithm, instantiated for problem (5.6), is given in Figure 5.3. A
formal description is reported in Algorithm 5. Note that PDFB can simultaneously handle the data-fidelity
terms and the image priors in parallel by associating a separate dual variable with each of them
(see Algorithm 5).
In Algorithm 5, the three main terms appearing in problem (5.6) (i.e., the low-rankness prior,
the average sparsity prior, and the data-fidelity term) are handled in parallel, in the dual domain, through
their proximity operators. Their exact expressions are provided in Appendix .2. First, the faceted
low-rankness prior is handled in lines 11-13 by computing in parallel the proximity operators of the
per-facet weighted nuclear norms. Second, the average sparsity prior is addressed in lines 15-17
by computing the proximity operator of the weighted ℓ2,1 norm in parallel. Finally, the data-fidelity
terms are handled in parallel in lines 19-21 by computing, for every (c, l, b), the projection
onto the ℓ2 ball B(y_{c,l,b}, ε_{c,l,b}) with respect to the metric induced by the diagonal matrix U_{c,l,b},
chosen using the preconditioning strategy explained in Section 4.3.2. More precisely, its diagonal
coefficients are the inverse of the sampling density in the vicinity of the probed Fourier modes. The
projections onto the ℓ2 balls for the metric induced by U_{c,l,b} do not admit an analytic expression,
and thus need to be approximated numerically through sub-iterations. In this work, we resort to
FISTA [9].
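These FISTA sub-iterations can be sketched as follows for a single data block. This is a minimal illustration under simplifying assumptions (the diagonal metric is stored as a plain vector `u`, and all function names are hypothetical), not the thesis code: the smooth term is the metric-weighted distance to the point being projected, its gradient step is followed by the analytic Euclidean ball projection, and the usual FISTA momentum is applied.

```python
import numpy as np

def proj_ball(z, y, eps):
    """Euclidean projection onto the l2 ball B(y, eps)."""
    d = z - y
    n = np.linalg.norm(d)
    return z if n <= eps else y + (eps / n) * d

def proj_ball_metric(v, y, eps, u, n_iter=200):
    """FISTA-style sub-iterations for the projection onto B(y, eps) in the
    metric induced by diag(u):
        minimize 0.5*(z-v)^T diag(u) (z-v)  subject to  ||z - y||_2 <= eps."""
    L = u.max()                       # Lipschitz constant of the smooth term
    z = w = proj_ball(v, y, eps)
    t = 1.0
    for _ in range(n_iter):
        z_new = proj_ball(w - (u * (w - v)) / L, y, eps)   # gradient + projection
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        w = z_new + ((t - 1) / t_new) * (z_new - z)        # momentum step
        z, t = z_new, t_new
    return z

rng = np.random.default_rng(1)
y = rng.standard_normal(32)
v = y + rng.standard_normal(32)       # point to project
u = rng.uniform(0.5, 2.0, size=32)    # diagonal preconditioner
z = proj_ball_metric(v, y, eps=0.1, u=u)
```

When the metric is the identity, the scheme converges in a single step to the analytic Euclidean projection, which gives a simple consistency check.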
In a similar fashion to Section 4.3.2, Algorithm 5 is guaranteed to converge to a global solution
to problem (5.6), for a given sub-cube c ∈ {1, …, C}, provided that the preconditioning matrices
(U_{c,l,b})_{c,l,b} and the parameters (τ, ζ, η, κ) satisfy the following condition:

\tau \Big( \zeta \big\| \widetilde{S} \big\|_S^2 + \eta \big\| U_c^{1/2} \Phi_c \big\|_S^2 + \kappa \big\| \Psi^\dagger \big\|_S^2 \Big) < 1.    (5.7)
The notation ‖·‖_S denotes the spectral norm of an operator, S̃ = (S̃_q)_{1⩽q⩽Q}, and, for every
X_c ∈ ℝ^{N×L_c}, U_c^{1/2}Φ_c(X_c) = (U_{c,l,b}^{1/2} Φ_{c,l,b} x_{c,l})_{1⩽l⩽L_c, 1⩽b⩽B}. In particular, we propose to set these
parameters as follows:

\zeta = \frac{1}{\| \widetilde{S} \|_S^2} = 1, \qquad \eta = \frac{1}{\| U_c^{1/2} \Phi_c \|_S^2} \qquad \text{and} \qquad \kappa = \frac{1}{\| \Psi^\dagger \|_S^2}.    (5.8)

In this setting, convergence is guaranteed for all 0 < τ < 1/3.
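As a small numerical sanity check (with toy placeholder norms, not values from the thesis), the choices (5.8) make each product in (5.7) equal to 1, so the condition indeed reduces to 3τ < 1:

```python
import numpy as np

# Toy placeholder squared spectral norms (not values from the thesis).
norm_S2, norm_UPhi2, norm_Psi2 = 1.0, 12.7, 1.0
zeta, eta, kappa = 1.0 / norm_S2, 1.0 / norm_UPhi2, 1.0 / norm_Psi2  # eq. (5.8)

def pdfb_condition(tau):
    """Condition (5.7): tau*(zeta*||S~||^2 + eta*||U^(1/2)Phi||^2
    + kappa*||Psi'||^2) < 1; with (5.8) each product equals 1, so 3*tau < 1."""
    return tau * (zeta * norm_S2 + eta * norm_UPhi2 + kappa * norm_Psi2) < 1
```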
We recall that PDFB can accommodate randomization in the update of the variables, e.g., by
randomly selecting a subset of the data and facet dual variables to be updated at each iteration.
This procedure can significantly alleviate the memory load per node [93] at the expense of an
increased number of iterations for the algorithm to converge. This feature, which has been
specifically investigated in Appendix .4, is not leveraged in the implementation of Algorithm 5 used for
the experiments reported in Sections 5.4 and 5.5.
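To make the structure of Algorithm 5 concrete, here is a heavily simplified, single-channel toy instance of the PDFB iteration pattern: identity sparsity dictionary, no low-rankness term, no preconditioning (U = Id), and hypothetical names throughout. It is a sketch of the update pattern under these assumptions, not the thesis solver.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: proximity operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proj_ball(z, y, eps):
    """Euclidean projection onto the l2 ball B(y, eps)."""
    d = z - y
    n = np.linalg.norm(d)
    return z if n <= eps else y + (eps / n) * d

def pdfb_toy(Phi, y, eps, mu=1e-3, n_iter=3000):
    """Toy PDFB for: minimize mu*||x||_1 over x >= 0 s.t. ||Phi x - y||_2 <= eps,
    i.e. (5.6) with one channel, one data block, identity dictionary, no
    low-rankness term and U = Id."""
    m, n = Phi.shape
    kappa = 1.0                                  # 1/||Id||^2, cf. (5.8)
    eta = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1/||Phi||^2, cf. (5.8)
    tau = 0.45                                   # tau*(kappa*1 + eta*||Phi||^2) = 0.9 < 1
    x = np.zeros(n); xb = np.zeros(n)
    w = np.zeros(n)                              # dual variable, sparsity term
    v = np.zeros(m)                              # dual variable, data term
    for _ in range(n_iter):
        # dual updates, both of the form (Id - prox); run in parallel in Alg. 5
        zw = w + xb
        w = zw - soft(zw, mu / kappa)
        zv = v + Phi @ xb
        v = zv - proj_ball(zv, y, eps)
        # primal forward-backward step with the positivity prox
        x_new = np.maximum(x - tau * (kappa * w + eta * (Phi.T @ v)), 0.0)
        xb = 2 * x_new - x                       # over-relaxation, as in line 27
        x = x_new
    return x

rng = np.random.default_rng(5)
Phi = rng.standard_normal((20, 50)) / np.sqrt(20)
x0 = np.zeros(50); x0[rng.choice(50, 5, replace=False)] = rng.uniform(0.5, 1.0, 5)
y = Phi @ x0 + 0.01 * rng.standard_normal(20)
x_hat = pdfb_toy(Phi, y, eps=0.1)
```

Note how each dual update is written as (Id − prox), mirroring lines 12, 16 and 20 of Algorithm 5, while the primal variable alone carries the positivity constraint.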
5.3.3 Parallel algorithmic structure
To solve a spectral sub-problem c ∈ {1, …, C}, different parallelization strategies can be adopted,
depending on the computing resources available and the size of the problem to be addressed. We
propose to divide the variables to be estimated between the two following groups of computing cores.
• Data cores: Each core involved in this group is responsible for the update of several dual
variables v_{c,l,b} ∈ ℂ^{M_{c,l,b}} associated with the data-fidelity terms (see Algorithm 5, line 20).
These cores produce auxiliary variables ṽ_{c,l,b} ∈ ℝ^N of single-channel image size, each assumed
to be held in the memory of a single core (line 21). Note that the Fourier transform computed
for each channel l in line 7 is performed once per iteration on the data core (l, 1). Each data
core (l, b), with b ∈ {2, …, B}, receives only a few coefficients of the Fourier transform of x_l
from the data core (l, 1), selected by the operator M_{c,l,b} (line 9);
• Facet cores: Each worker involved in this group, composed of Q cores, is responsible for the
update of an image tile (i.e., a portion of the primal variable) and the dual variables P_{c,q}
and W_{c,q} associated with the low-rankness and the joint average sparsity priors, respectively
(Algorithm 5, lines 12 and 16). Note that the image cube is stored across the different facet cores,
which are responsible for updating their image tile (line 26). Since the facets underlying the
proposed prior overlap, communications involving a maximum of 4 contiguous facet cores are
needed to build the facet borders prior to updating the facets independently in the dual space
(Algorithm 5, lines 11–17). Values of the tile of each facet are broadcast to cores handling
neighboring facets in order to update their borders (Algorithm 5, line 5, see Figure 5.2(a)).
In a second step, parts of the facet tiles overlapping with borders of nearby facets need to be
updated before each tile is updated independently in the primal space (Algorithm 5, line 24).
More precisely, values of the parts of the borders overlapping with the tile of each facet are
broadcast by the workers handling neighboring facets, and averaged (see Figure 5.2(b)).
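The two communication steps above can be mimicked in one dimension with plain array slicing. This is a toy sketch (tiles with left-only overlap borders, a hypothetical layout), not the actual MPI-like implementation:

```python
import numpy as np

# Toy 1D layout: Q tiles of length `tile`, each facet = its tile extended by
# `overlap` samples borrowed from the previous tile (left border only).
N, Q, overlap = 16, 4, 2
tile = N // Q
x = np.arange(N, dtype=float)

# (a) gather: build each facet before the dual-space update
facets = [x[max(0, q * tile - overlap): (q + 1) * tile] for q in range(Q)]

# ... per-facet dual updates would run here, in parallel ...

# (b) scatter: write facets back and average where borders overlap tiles
acc, cnt = np.zeros(N), np.zeros(N)
for q, f in enumerate(facets):
    start = max(0, q * tile - overlap)
    acc[start: (q + 1) * tile] += f
    cnt[start: (q + 1) * tile] += 1
x_rec = acc / cnt   # samples covered by two facets are averaged
```

Since the facets are left unchanged in this sketch, averaging the overlapping copies recovers the original signal exactly, which checks the bookkeeping of the gather/scatter indices.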
The complexity of the proposed algorithmic structure with parallelized data and facet cores
is defined by the size of one data block and the size of one image facet. The only requirement
for the algorithm to scale to large data and image dimensions is to have a multitude of processing
Figure 5.2: Illustration of the communication steps involving a facet core (represented by the top-left
rectangle in each sub-figure, of spatial size N_{x,q} × N_{y,q} and spectral depth L_c) and a maximum of three
of its neighbours. The tile underpinning each facet, located in its bottom-right corner, is delineated in
thick black lines. At each iteration, the following two steps are performed sequentially. (a) Broadcast
values of the tile before the facet update in the dual space: facet borders need to be completed before each
facet is updated independently in the dual space (Algorithm 5, lines 11–17); values of the tile of each facet
(top left) are broadcast to cores handling the neighbouring facets in order to update their borders
(Algorithm 5, line 5). (b) Broadcast and average borders before the tile update in the primal space: parts
of the facet tiles overlapping with borders of nearby facets need to be updated before each tile is updated
independently in the primal space (Algorithm 5, line 24); values of the parts of the borders overlapping
with the tile of each facet are broadcast by the cores handling neighbouring facets, and averaged.
cores. Leveraging advanced HPC systems with thousands of computing cores (see Section 5.4.2 for
more details), the proposed algorithm has the potential to scale to the big data regimes expected
with the new generation of telescopes.
5.3.4 MATLAB implementation
A MATLAB implementation of Algorithms 4 and 5 is available on the Puri-Psi webpage. Both
HyperSARA and Faceted HyperSARA rely on MPI-like MATLAB parallelization features based on
the spmd MATLAB construct, using composite MATLAB variables to handle parameters distributed
across several cores (e.g., for the wideband image cube). In this setting, 1 CPU core specifically
ensures communication and synchronization between the different computing cores, and is referred to
as the master core in the following.
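For readers without MATLAB, the same master/worker pattern can be mimicked in Python. The snippet below is a hypothetical analogue using a thread pool (the enclosing scope playing the role of the master core), not the Puri-Psi code:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def facet_update(facet):
    """Stand-in for a per-facet dual update (a toy proximal step)."""
    return np.maximum(facet - 0.1, 0.0)

# One worker per spatial facet, updated in parallel; the enclosing scope
# gathers and synchronizes the results, like the master core.
rng = np.random.default_rng(2)
facets = [rng.random((8, 8)) for _ in range(4)]   # Q = 4 facets
with ThreadPoolExecutor(max_workers=4) as pool:
    updated = list(pool.map(facet_update, facets))
```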
5.4 Validation on synthetic data
In this section, the impact of spatial faceting is first assessed in terms of both reconstruction
quality and computing time for a single spectral sub-problem, using a varying number of facets
and a varying size of the overlapping regions. The impact of spectral faceting on the reconstruction
performance of Faceted HyperSARA is then quantified for a single underlying spatial tile (Q = 1).
Figure 5.3: Illustration of the two groups of cores described in Section 5.3 with the main steps involved
in PDFB (Algorithm 5) applied to each independent sub-problem c ∈ {1, …, C}, considering Q facets
(along the spatial dimension) and B = 1 data block per channel. Each data core l ∈ {1, …, L_c} applies a
forward-backward step to its data block (forward step through Φ_{c,l}, proximity step, backward step
through Φ_{c,l}^†), whereas each facet core q ∈ {1, …, Q} applies a forward-backward step to its
spatio-spectral facet (S̃_q X_c, S_q X_c). Data cores handle variables of the size of data blocks
(Algorithm 5, lines 19–21), whereas facet cores handle variables of the size of a spatio-spectral facet
(Algorithm 5, lines 11–17). Communications between the two groups are represented by colored arrows.
Communications between facet cores, induced by the overlap between the spatio-spectral facets, are
illustrated in Figure 5.2.
Results are compared with those of both SARA and HyperSARA.
5.4.1 Simulation setting
5.4.1.1 Images and data
Following the procedure described in Section 4.4, a wideband model image composed of L spectral
channels is simulated from an image of the W28 supernova remnant of size N. The measurement
operator relies on a realistic VLA uv-coverage generated within the frequency range [ν_1, ν_L] =
[1, 2] GHz with uniformly sampled channels and a total observation time of 6 hours. Note that
the uv-coverage associated with each channel indexed by l corresponds to the reference uv-coverage
at the frequency ν_1 scaled by the factor ν_l/ν_1. The data are corrupted by an additive, zero-mean
complex white Gaussian noise of variance ϱ². An iSNR (4.13) of 60 dB is considered. As
explained in Section 4.2.2, the regularization parameters of HyperSARA can be set as µ = 1, with
µ̄ computed from the dirty wideband model cube as µ̄ = ‖X^dirty‖_* / ‖Ψ†X^dirty‖_{2,1}, where X^dirty
denotes the dirty image cube. For Faceted HyperSARA, we have observed that setting µ_c = 1
and µ̄_c = 10⁻² ‖X_c^dirty‖_* / ‖Ψ†X_c^dirty‖_{2,1} leads to a good trade-off to recover high-resolution, high-dynamic-range
model cubes. Given the higher computational cost of HyperSARA, the size of the
data is chosen so that it can be run in a reasonable amount of time for the different simulation
Algorithm 4: Faceted HyperSARA approach
Input: X^(0) = (X_c^(0))_c, P^(0) = (P_c^(0))_c, W^(0) = (W_c^(0))_c, v^(0) = (v_c^(0))_c
1:  k ← 0;
2:  // Initialization of the weights
3:  for c = 1 to C do
4:      θ_c^(0) = (θ_{c,q}^(0))_{1⩽q⩽Q} = 1_{J_c};  θ̄_c^(0) = (θ̄_{c,q}^(0))_{1⩽q⩽Q} = 1_T;
5:  while stopping criterion not satisfied do
6:      // Solve spectral sub-problems in parallel
7:      for c = 1 to C do
8:          // Run Algorithm 5
9:          (X_c^(k+1), P_c^(k+1), W_c^(k+1), v_c^(k+1)) = PDFB(X_c^(k), P_c^(k), W_c^(k), v_c^(k), θ_c^(k), θ̄_c^(k));
10:         for q = 1 to Q do
11:             // Update weights: low-rankness prior
12:             θ_{c,q}^(k+1) = υ ω_q(X_c^(k+1));        // using (5.4)
13:             // Update weights: joint average sparsity prior
14:             θ̄_{c,q}^(k+1) = υ ω̄_q(X_c^(k+1));        // using (5.5)
15:     k ← k + 1;
Result: X^(k), P^(k), W^(k), v^(k)
scenarios described below.
5.4.1.2 Spatial faceting
The performance of Faceted HyperSARA is first evaluated with C = 1 (number of facets along
the spectral dimension) for different parameters of the spatial faceting. Data generated from an
N = 1024 × 1024 image composed of L = 20 channels are considered, with M_l = 0.5N measurements
per channel. The assessment is conducted with (i) a varying Q (number of facets along the spatial
dimensions) and a fixed overlap; (ii) a fixed number of facets and a varying spatial overlap for the
nuclear norm regularization. Additional details are given below.
• Varying overlap: Reconstruction performance and computing time are evaluated with C = 1
and Q = 16 (4 facets along each spatial dimension) and a varying size of the overlapping
region for the faceted nuclear norm (0%, 6%, 20%, 33% and 50% of the spatial size of the
facet, corresponding to 0, 16, 64, 128 and 256 pixels respectively) in each of the two spatial
dimensions. Note that the overlap for the ℓ2,1 prior is a fixed parameter [95]. The comparison
is conducted between SARA (with µ̃ = 10⁻³), HyperSARA (with µ = 1 and µ̄ = 10⁻³) and
Faceted HyperSARA (with µ_c = 1 and µ̄_c = 10⁻⁵).
• Varying number of facets: The reconstruction performance and computing time of Faceted
Algorithm 5: The PDFB algorithm underpinning Faceted HyperSARA
Data: (y_{c,l,b})_{l,b}, l ∈ {1, …, L_c}, b ∈ {1, …, B}
Input: X_c^(0), P_c^(0) = (P_{c,q}^(0))_q, W_c^(0) = (W_{c,q}^(0))_q, v_c^(0) = (v_{c,l,b}^(0))_{l,b}, θ_c = (θ_{c,q})_{1⩽q⩽Q}, θ̄_c = (θ̄_{c,q})_{1⩽q⩽Q}
Parameters: (D_{c,q})_q, (U_{c,l,b})_{l,b}, (ε_{c,l,b})_{l,b}, µ_c, µ̄_c, τ, ζ, η, κ
1:  t ← 0; ξ = +∞; X̌_c^(0) = X_c^(0);
2:  while ξ > 10⁻⁵ do
3:      // Broadcast auxiliary variables
4:      for q = 1 to Q do
5:          X̃_{c,q}^(t) = S̃_q X̌_c^(t);  X̌_{c,q}^(t) = S_q X̌_c^(t);
6:      for l = 1 to L_c do
7:          x̂_{c,l}^(t) = FZ x̌_{c,l}^(t);                        // Fourier transforms
8:          for b = 1 to B do
9:              x̂_{c,l,b}^(t) = M_{c,l,b} x̂_{c,l}^(t);            // send to data cores
10:     // Update low-rankness variables [facet cores]
11:     for q = 1 to Q do
12:         P_{c,q}^(t+1) = (I_{J_q} − prox_{ζ⁻¹µ_c‖·‖_{*,θ_{c,q}}})(P_{c,q}^(t) + D_{c,q} X̃_{c,q}^(t));
13:         P̃_{c,q}^(t+1) = D_q^† P_{c,q}^(t+1);
14:     // Update sparsity variables [facet cores]
15:     for q = 1 to Q do
16:         W_{c,q}^(t+1) = (I_{T_q} − prox_{κ⁻¹µ̄_c‖·‖_{2,1,θ̄_{c,q}}})(W_{c,q}^(t) + Ψ_q^† X̌_{c,q}^(t));
17:         W̃_{c,q}^(t+1) = Ψ_q W_{c,q}^(t+1);
18:     // Update data-fidelity variables [data cores]
19:     for (l, b) = (1, 1) to (L_c, B) do
20:         v_{c,l,b}^(t+1) = U_{c,l,b}(I_{M_{c,l,b}} − prox^{U_{c,l,b}}_{ι_{B(y_{c,l,b},ε_{c,l,b})}})(U_{c,l,b}⁻¹ v_{c,l,b}^(t) + Θ_{c,l,b} G_{c,l,b} x̂_{c,l,b}^(t));
21:         ṽ_{c,l,b}^(t+1) = G_{c,l,b}^† Θ_{c,l,b}^† v_{c,l,b}^(t+1);
22:     // Inter-node communications
23:     for l = 1 to L_c do
24:         h_{c,l}^(t) = Σ_{q=1}^{Q} (ζ S̃_q^† p̃_{c,q,l}^(t+1) + κ S_q^† w̃_{c,q,l}^(t+1)) + η Z^† F^† Σ_{b=1}^{B} M_{c,l,b}^† ṽ_{c,l,b}^(t+1);
25:     // Update image tiles [on facet cores, in parallel]
26:     X_c^(t+1) = prox_{ι_{ℝ_+^{N×L_c}}}(X_c^(t) − τ H_c^(t));
27:     X̌_c^(t) = 2 X_c^(t+1) − X_c^(t);                          // communicate facet borders
28:     ξ = ‖X_c^(t+1) − X_c^(t)‖_F / ‖X_c^(t)‖_F;
29:     t ← t + 1;
Result: X_c^(t), P_c^(t), W_c^(t), v_c^(t)
HyperSARA are reported for experiments with Q ∈ {4, 9, 16} (corresponding to 2, 3 and 4
facets along each spatial dimension) with a fixed overlap corresponding to 50% of the spatial
size of a facet. The regularization parameters are set to the same values as those considered
in the experiment with a varying overlap.
5.4.1.3 Spectral faceting
The influence of spectral faceting is evaluated in terms of computing time and reconstruction quality
from data generated with a ground truth image composed of N = 256 × 256 pixels in L = 100
channels, with M_l = N measurements per channel. The overall reconstruction performance of
SARA (with µ̃ = 10⁻²), HyperSARA (with µ = 1 and µ̄ = 10⁻²) and Faceted HyperSARA (with
µ_c = 1 and µ̄_c = 10⁻²) with a single facet along the spatial dimension (Q = 1) is compared.
For Faceted HyperSARA, a channel-interleaving process with a varying number of facets along the
spectral dimension C is considered (see Section 5.2 and Figure 5.1 (b)). The simulation scenario
involves facets composed of a varying number of channels L_c (L_c ≈ 6, 10, 14, 20, 33 and 50
channels for each sub-problem c ∈ {1, …, C}) obtained by down-sampling the data cube along the
frequency dimension.
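The channel interleaving can be sketched in one line: sub-cube c collects channels c, c+C, c+2C, …, so that each of the C spectral sub-problems sees the full frequency range at a coarser spectral sampling (an illustration consistent with the description in Section 5.2, not a verbatim excerpt):

```python
import numpy as np

# Interleaved spectral faceting: L channels split into C sub-problems,
# sub-cube c taking channels c, c+C, c+2C, ...
L, C = 100, 10
subcubes = [np.arange(c, L, C) for c in range(C)]   # each of size L_c = L/C = 10
```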
5.4.2 Hardware
All the methods compared in this section have been run on a single compute node of Cirrus, one
of the UK's Tier 2 HPC services¹. Cirrus is an SGI ICE XA system composed of 280 compute
nodes, each with two 2.1 GHz, 18-core, Intel Xeon E5-2695 (Broadwell) series processors. The
compute nodes have 256 GB of memory shared between the two processors. The system has a
single InfiniBand FDR network connecting nodes with a bandwidth of 54.5 Gb/s. Note that the
number of cores assigned to each group of cores of Faceted HyperSARA (i.e., data and facet cores)
has been chosen to ensure a reasonable balance between the different computing tasks.
5.4.3 Evaluation metrics
Performance is evaluated in terms of global computing time (elapsed real time) and reconstruction
SNR, defined for each channel l ∈ {1, …, L} as in (4.19). Results are reported in terms of the
average SNR (aSNR) (4.20). Since this criterion shows limitations in reflecting the dynamic
range, and thus in appreciating improvements in the quality of faint emissions, the following criterion
is computed over images in log10 scale:

\text{SNR}_{\log,l}(x_l) = 20 \log_{10} \frac{\big\| \log_{10}(\overline{x}_l + \varepsilon 1_N) \big\|_2}{\big\| \log_{10}(\overline{x}_l + \varepsilon 1_N) - \log_{10}(x_l + \varepsilon 1_N) \big\|_2},
¹ https://epsrc.ukri.org/research/facilities/hpc/tier2/
where the log10 function is applied term-wise, x̄_l denotes the ground truth image of channel l, and ε is an arbitrarily small parameter to avoid
numerical issues (ε is set to machine precision). Results are similarly reported in terms of the
average log-SNR, defined as

\text{aSNR}_{\log}(X) = \frac{1}{L} \sum_{l=1}^{L} \text{SNR}_{\log,l}(x_l).    (5.9)
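Both metrics are straightforward to compute; a minimal numpy sketch with hypothetical function names (`snr_log`, `asnr_log`):

```python
import numpy as np

def snr_log(x_true, x_est, eps=np.finfo(float).eps):
    """Per-channel log-scale SNR, as defined above (x_true is the ground truth)."""
    lt = np.log10(x_true + eps)
    le = np.log10(x_est + eps)
    return 20 * np.log10(np.linalg.norm(lt) / np.linalg.norm(lt - le))

def asnr_log(X_true, X_est):
    """Average log-SNR over the L channels (columns), eq. (5.9)."""
    return np.mean([snr_log(X_true[:, l], X_est[:, l])
                    for l in range(X_true.shape[1])])

rng = np.random.default_rng(3)
X = rng.random((256, 4)) + 0.1          # toy positive cube: N = 256, L = 4
a_small = asnr_log(X, X + 1e-6 * rng.random(X.shape))   # mild perturbation
a_large = asnr_log(X, X + 1e-1 * rng.random(X.shape))   # strong perturbation
```

A smaller perturbation of the cube yields a higher aSNRlog, consistent with the metric's role of rewarding accurate faint-emission recovery.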
5.4.4 Results and discussion
5.4.4.1 Spatial faceting
• Varying spatial overlap: The results reported in Table 5.1 show that spatial faceting gives
a good reconstruction of high-intensity pixels (reflected by an aSNR close to that of HyperSARA).
Even if the performance of the proposed approach does not vary much in terms of aSNR
as the overlap for the faceted nuclear norm increases, the aSNRlog improves significantly.
This reflects the ability of the proposed prior to enhance the estimation of faint emissions
and finer details by promoting local correlations. This observation is further confirmed by
the reconstructed images, reported in Jy/pixel in Figure 5.4 for channel ν_1 = 1 GHz and in
Figure 5.5 for channel ν_20 = 2 GHz, showing that Faceted HyperSARA reconstructs images
with a higher dynamic range (see the zoomed region delineated in white in Figures 5.4
and 5.5). The associated residual images (last row of Figures 5.4 and 5.5) are comparable to
or better than those of HyperSARA. Note that the regular patterns observed on the residual images
do not result from the faceting, as they are not aligned with the facet borders and appear
for both approaches. From a computational point of view, Table 5.1 shows that increasing
the overlap size results in a moderate increase in the computing time. Overall, an overlap
of 50% gives the best reconstruction SNR for a reasonable computing time, and will thus be
considered as the default faceting setting for the real data experiments reported in Section 5.5.
• Varying number of facets Q along the spatial dimension: The reconstruction performance
and computing time reported in Table 5.2 show that Faceted HyperSARA gives an almost
constant reconstruction performance as the number of facets increases, for an overall computing
time getting closer to that of the SARA approach. The dynamic range of the reconstructed
images (second row of Figures 5.4 and 5.5) is notably higher for the faceted approach, as
indicated by the aSNRlog values reported in Table 5.2. These results confirm the potential of
the proposed approach to scale to large image sizes by increasing the number of facets along
the spatial dimensions while ensuring a stable reconstruction level as the number of facets
increases. In particular, the setting Q = 16 is reported to ensure a satisfactory reconstruction
performance for a significantly reduced computing time.
In both experiments, Faceted HyperSARA has a much lower SNR standard deviation than
Method | Time (h) | aSNR (dB) | aSNRlog (dB) | CPU cores
SARA | 5.89 | 32.78 (±2.76) | -1.74 (±0.83) | 120
HyperSARA | 133.1 | 38.63 (±0.23) | -0.39 (±0.95) | 22
Faceted nuclear norm, overlap 0% | 26.26 | 37.03 (±2.90·10⁻³) | 5.09 (±1.09) | 36
Faceted nuclear norm, overlap 6% | 18.01 | 37.01 (±1.00·10⁻³) | 4.09 (±0.99) | 36
Faceted nuclear norm, overlap 20% | 18.11 | 36.86 (±0.90·10⁻³) | 4.51 (±1.07) | 36
Faceted nuclear norm, overlap 33% | 17.94 | 36.98 (±1.60·10⁻³) | 6.00 (±1.05) | 36
Faceted nuclear norm, overlap 50% | 20.75 | 37.08 (±1.60·10⁻³) | 7.88 (±0.91) | 36

Table 5.1: Spatial faceting experiment: varying size of the overlap region for the faceted nuclear norm
regularization. Reconstruction performance of Faceted HyperSARA with Q = 16 and C = 1, compared
to HyperSARA (i.e., Faceted HyperSARA with Q = C = 1) and SARA. The results are reported in
terms of reconstruction time, aSNR and aSNRlog (both in dB with the associated standard deviation),
and total number of CPU cores used to reconstruct the full image. The evolution of the aSNRlog, of
specific interest for this experiment, is highlighted in bold face.
HyperSARA and SARA (see Tables 5.1 and 5.2), i.e., ensures a more stable recovery quality
across channels. This results from the stronger spatio-spectral correlations induced by the proposed
faceted regularization, in comparison with both the HyperSARA and SARA priors.
5.4.4.2 Spectral faceting
The results reported in Table 5.3 show that Faceted HyperSARA using channel-interleaved facets
retains most of the overall reconstruction performance of HyperSARA, ensuring a reconstruction
quality significantly better than SARA. As expected, the reconstruction quality of faint emissions,
reflected by the aSNRlog values, gradually decreases as fewer channels are involved in each
facet (i.e., as C increases). This observation is qualitatively confirmed by the images reported in
Figure 5.6 for channel ν_1 = 1 GHz and Figure 5.7 for channel ν_100 = 2 GHz (in Jy/pixel), for
facets composed of 10 channels each (see the zoomed regions in Figures 5.6 and 5.7). The slight
loss of dynamic range is likely due to the reduction in the amount of data per spectral sub-cube.
Spectral faceting remains however computationally attractive, in that it preserves the overall imaging
quality of HyperSARA up to an already significant amount of interleaving (see discussion in
Section 5.2.0.1), while allowing lower-dimensional wideband imaging sub-problems to be considered
(see discussion in Section 5.2). This strategy offers an increased scalability potential to Faceted
HyperSARA over HyperSARA, which may prove of significant interest at extreme image dimensions.
5.5 Validation on real data
In this section, we illustrate both the precision and scalability potential of Faceted HyperSARA
through the reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA data. The
algorithm is mapped on 496 CPU cores on a high performance computing system, achieving a
TeraFLOPS proof of concept. The performance of the proposed approach is evaluated in comparison
with the monochromatic imaging approach SARA [88] and the CLEAN-based wideband imaging
Method | Time (h) | aSNR (dB) | aSNRlog (dB) | CPU cores
SARA | 6.23 | 32.78 (±2.76) | -1.74 (±0.83) | 120
HyperSARA | 133.08 | 38.63 (±0.23) | -0.39 (±0.95) | 22
Faceted HyperSARA (Q = 4) | 42.04 | 36.58 (±1.80·10⁻³) | 10.19 (±0.88) | 24
Faceted HyperSARA (Q = 9) | 21.60 | 37.00 (±1.70·10⁻³) | 5.88 (±1.00) | 29
Faceted HyperSARA (Q = 16) | 17.94 | 37.08 (±1.60·10⁻³) | 7.88 (±1.05) | 36

Table 5.2: Spatial faceting experiment: varying number of facets along the spatial dimension Q.
Reconstruction performance of Faceted HyperSARA (C = 1, overlap of 50%), compared to HyperSARA
(i.e., Faceted HyperSARA with Q = C = 1) and SARA. The results are reported in terms of
reconstruction time, aSNR and aSNRlog (both in dB with the associated standard deviation), and total
number of CPU cores used to reconstruct the full image. The evolution of the computing time, of specific
interest for this experiment, is highlighted in bold face.
Method | Time (h) | aSNR (dB) | aSNRlog (dB) | CPU cores
SARA | 0.19 | 25.04 (±4.06) | -6.28 (±0.60) | 100
HyperSARA | 14.83 | 31.74 (±1.31) | -1.24 (±0.57) | 7
Faceted HyperSARA (C = 16, L_c ≈ 6) | 1.31 | 31.05 (±0.98) | -3.54 (±1.37) | 112
Faceted HyperSARA (C = 10, L_c = 10) | 1.87 | 31.48 (±0.82) | -3.26 (±1.43) | 70
Faceted HyperSARA (C = 7, L_c ≈ 14) | 2.36 | 31.68 (±0.90) | -2.90 (±1.38) | 49
Faceted HyperSARA (C = 5, L_c = 20) | 3.31 | 31.84 (±0.92) | -2.33 (±0.91) | 35
Faceted HyperSARA (C = 3, L_c ≈ 33) | 5.10 | 32.00 (±1.04) | -2.33 (±1.07) | 21
Faceted HyperSARA (C = 2, L_c = 50) | 7.56 | 31.97 (±1.08) | -1.63 (±0.64) | 14

Table 5.3: Spectral faceting experiment: reconstruction performance of Faceted HyperSARA with a
varying number of spectral sub-problems C and Q = 1, compared to HyperSARA (i.e., Faceted
HyperSARA with Q = C = 1) and SARA. The results are reported in terms of reconstruction time,
aSNR and aSNRlog (both in dB with the associated standard deviation) and total number of CPU cores.
The reconstruction performance of Faceted HyperSARA, specifically investigated in this experiment, is
highlighted in bold face.
Figure 5.4: Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in
log10 scale for channel ν_1 = 1 GHz for Faceted HyperSARA with Q = 16 and C = 1 (left), and
HyperSARA (i.e., Faceted HyperSARA with Q = C = 1, right). From top to bottom are reported the
ground truth image, the reconstructed and residual images. The overlap for the faceted nuclear norm
regularization corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying the
definition of the facets are delineated on the residual images in red dotted lines, with the central facet
displayed in continuous lines.
Figure 5.5: Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in
log10 scale for channel ν_20 = 2 GHz for Faceted HyperSARA with Q = 16 and C = 1 (left), and
HyperSARA (i.e., Faceted HyperSARA with Q = C = 1, right). From top to bottom are reported the
ground truth image, the reconstructed and residual images. The overlap for the faceted nuclear norm
regularization corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying the
definition of the facets are delineated on the residual images in red dotted lines, with the central facet
displayed in continuous lines.
Figure 5.6: Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in
log10 scale for channel ν_1 = 1 GHz with Faceted HyperSARA for C = 10 and Q = 1 (left) and
HyperSARA (i.e., Faceted HyperSARA with Q = C = 1, right). Each sub-cube is composed of 10 out of
the L = 100 channels. From top to bottom: ground truth image, estimated model images and residual
images.
Figure 5.7: Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in
log10 scale for channel ν_100 = 2 GHz with Faceted HyperSARA for C = 10 and Q = 1 (left) and
HyperSARA (i.e., Faceted HyperSARA with Q = C = 1, right). Each sub-cube is composed of 10 out of
the L = 100 channels. From top to bottom: ground truth image, estimated model images and residual
images.
algorithm JC-CLEAN in the software wsclean [85].
5.5.1 Dataset description and imaging settings
The data analyzed in this section are part of wideband VLA observations of the celebrated radio
galaxy Cyg A, acquired over two years (2015-2016) within the frequency range 2–18 GHz. We
consider 480 channels in C band spanning the frequency range [ν_1, ν_480] = [3.979, 8.019] GHz, with
a frequency step δν = 8 MHz and a total bandwidth of 4.04 GHz. The phase center coordinates
are RA = 19h 59mn 28.356s (J2000) and DEC = +40° 44′ 2.07′′. The data set was acquired
at four instances, which correspond to the frequency ranges [ν_1, ν_256] = [3.979, 6.019] GHz and
[ν_257, ν_480] = [5.979, 8.019] GHz, and to VLA configurations A and C. The wideband data consist of
30 spectral windows, composed of 16 channels each, with approximately 10⁶ complex visibilities
per channel (about 8×10⁵ and 2×10⁵ measurements for configurations A and C, respectively),
stored as double-precision complex numbers.
In order to improve the accuracy of our modelled measurement operator, we have conducted
a pre-processing step consisting in a joint DDE calibration and imaging, applied to each channel
separately. The approach, originally proposed by [98], consists in the alternate estimation
of the unknown DDEs and the image of interest, with a spatio-temporal smoothness DDE prior
and an average sparsity image prior [98,101,121]. The underpinning algorithmic structure offers
convergence guarantees to a critical point of the global non-convex optimization problem for joint
calibration and imaging, and the approach was suggested to open the door to a significant
improvement over the state of the art [98]. Note that one would ultimately want to resort to such a
joint calibration and imaging approach to reconstruct the final wideband image cube [45], rather
than applying it as a pre-processing step on each channel separately. However, the underpinning
algorithmic structure does not enable a faceting approach like the one proposed here, thereby
severely limiting its scalability. We thus restrict ourselves to using this approach separately on
each channel for scalability, and essentially to estimate the DDEs. These are easily integrated into
the forward model (2.19) as explained in Section 2.4. The estimated model visibilities are also
exploited to determine estimates of the noise statistics, thus computing the ℓ2 bounds defining the
data-fidelity constraints. Note that both SARA and Faceted HyperSARA take advantage of this
pre-processing step, in contrast with JC-CLEAN, as the antenna-based DDE estimates, provided
in the spatial Fourier domain, cannot be incorporated into the wsclean software.
We consider the reconstruction of images of size N = 2560 × 1536 from data acquired in L = 480
channels, with a pixel size δx = 0.06′′ (in both directions) corresponding to the field of view (FoV)
Ω = 2.56′ × 1.536′. The pixel size is such that the spatial bandwidth of the recovered signal is
up to 1.75 times the nominal resolution at the highest frequency ν_L = 8.019 GHz, and 3.53 times
the nominal resolution at the lowest frequency ν_1 = 3.979 GHz. For both SARA and Faceted
HyperSARA, we consider B = 2 data blocks per channel, associated with VLA configurations
A and C and presenting different noise statistics. More specifically to Faceted HyperSARA, we
consider C = 16 channel-interleaved sub-problems with L_c = 30, for any c ∈ {1, …, C}, and
Q = 5 × 3 facets along the spatial dimensions, resulting in a total of Q × C = 240 spatio-spectral
facets.
With regards to initialization, both SARA and Faceted HyperSARA are initialized with the
wideband model cube X(0) = (X(0)_c)_{1⩽c⩽C}, obtained by the monochromatic joint calibration and
imaging pre-processing step. From now on, for any (c, l) ∈ {1, . . . , C} × {1, . . . , Lc}, the gridding
matrices Gc,l include the estimated DDEs, and Φc,l refers to the resulting measurement operator.
The ℓ2 bounds defining the data-fidelity constraints are approximated as follows. For each data
block indexed by (c, l, b) ∈ {1, . . . , C} × {1, . . . , Lc} × {1, . . . , B}, we set ϵc,l,b = ∥yc,l,b − Φc,l,b x(0)_{c,l}∥2.
More specifically to Faceted HyperSARA, the weights of the reweighting scheme (Algorithm 4)
are initialized from the cube X(0) estimated in the pre-processing step based on (5.4) and (5.5).
Following the considerations from Section 5.4.1, the regularization parameters µ̄c and µc are set
as µ̄c = 1/∥X(0)_c∥∗ = 10−2 and µc = 10−2/∥Ψ†X(0)_c∥2,1 = 5 × 10−6. Finally, the SARA
regularization parameter µ̃ is fixed to µ̃ = 5 × 10−6.
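The parameter-setting heuristic above can be sketched as follows; the random stand-in cube and the identity in place of the SARA analysis operator Ψ† are illustrative assumptions, not the thesis pipeline.

```python
import numpy as np

# Sketch of the regularization-parameter heuristic: mu_bar = 1/||X0||_* and
# mu = 1e-2 / ||Psi^† X0||_{2,1}, evaluated on a random stand-in cube.
# The SARA analysis operator Psi^† is replaced by the identity here.
rng = np.random.default_rng(0)
X0 = rng.random((1024, 30))                    # N pixels x L_c channels

mu_bar = 1.0 / np.linalg.norm(X0, ord="nuc")   # nuclear-norm weight
coeffs = X0                                    # stand-in for Psi^† X0
l21 = np.linalg.norm(coeffs, axis=1).sum()     # l_{2,1}: sum of row l2 norms
mu = 1e-2 / l21
print(mu_bar, mu)
```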
5.5.2 Hardware
All the methods investigated in this section have been run on multiple nodes of Cirrus, having 36
cores and 256 GB of memory each (see Section 5.4.2 for further details). We recall that SARA and
Faceted HyperSARA are implemented using matlab whilst JC-CLEAN is implemented in c++.
Note that HyperSARA is not considered for the following experiment due to its prohibitive cost.
5.5.3 Evaluation metrics
We first evaluate imaging precision by visually inspecting the images obtained with the proposed
Faceted HyperSARA, in comparison with the monochromatic imaging approach SARA [88] and
JC-CLEAN [85]. For Faceted HyperSARA and SARA, we consider the estimated model cube X
and the naturally-weighted residual image cube R whose columns, indexed by l ∈ {1, . . . , L}, are
given by rl = ηl Φ†_l (yl − Φl xl), where yl = Θl ȳl are the naturally-weighted RI measurements,
Θl is the corresponding noise-whitening matrix and Φl = Θl Gl FZ is the associated measurement
operator. The normalization factor ηl is obtained such that the associated PSF, given by ηl Φ†_l Φl δ,
has a peak value equal to 1, where δ ∈ RN is an image with value 1 at the phase center and
zero elsewhere. In contrast with SARA and Faceted HyperSARA, for which natural weighting is
adopted, optimal results are obtained for JC-CLEAN with Briggs weighting [16]. We consider the
Briggs-weighted residual image cube R̃ = (r̃l)_{1⩽l⩽L} and the restored image cube T = (tl)_{1⩽l⩽L}
whose columns are defined as tl = xl ∗ cl + r̃l, where xl is the estimated model image and cl is
the CLEAN beam (typically a Gaussian fitted to the primary lobe of the associated PSF).
Chapter 5: Faceted HyperSARA for wideband RI imaging: when precision meets scalability 92
As a quantitative metric of fidelity to data, the average standard deviation (aSTD) (4.23) is reported for
the three residual image cubes. The computing time (elapsed time), resources (number of CPU
cores) and overall computing cost of the different approaches are reported in Table 5.4 to assess
their scalability.
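The residual-image definition above can be sketched numerically. The FFT-plus-masking operator below is a hypothetical stand-in for Φl = Θl Gl FZ (no gridding, noise whitening, or DDEs); it only illustrates how the normalization ηl is fixed by the PSF peak and how the naturally-weighted residual rl = ηl Φ†_l (yl − Φl xl) follows.

```python
import numpy as np

def make_operators(mask):
    """Toy monochromatic measurement operator: 2D FFT followed by
    Fourier-plane masking, a stand-in for Phi_l = Theta_l G_l F Z."""
    def phi(x):
        return np.fft.fft2(x)[mask]
    def phi_adj(y):
        grid = np.zeros(mask.shape, dtype=complex)
        grid[mask] = y
        return np.real(np.fft.ifft2(grid))
    return phi, phi_adj

rng = np.random.default_rng(0)
n = 64
mask = rng.random((n, n)) < 0.3            # incomplete Fourier sampling
phi, phi_adj = make_operators(mask)

# eta_l is fixed so that the PSF, eta_l * Phi^† Phi delta, peaks at 1,
# with delta a unit impulse at the phase centre.
delta = np.zeros((n, n))
delta[n // 2, n // 2] = 1.0
psf = phi_adj(phi(delta))
eta = 1.0 / psf.max()

# Naturally-weighted residual r_l = eta_l * Phi^† (y_l - Phi x_l).
x_true = rng.random((n, n))
y = phi(x_true)
x_est = 0.9 * x_true                       # some model image
residual = eta * phi_adj(y - phi(x_est))
print(residual.std())
```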
5.5.4 Results and discussion
5.5.4.1 Imaging quality
To assess the reconstruction quality of our approach in comparison with SARA and JC-CLEAN, we
first examine the estimated images of channels ν1 = 3.979 GHz and ν480 = 8.019 GHz, displayed
in Figures 5.8 and 5.9, respectively. Note that these channels correspond to channel indexes 1
and 30 of the 1st and 16th sub-problems, respectively. We then examine the average images of
the estimated cubes in Figure 5.10. Furthermore, we provide an analysis of the full estimated
image cubes obtained with the three methods, which are available online [117]. The displayed
images are overlaid with zooms on selected key regions of the radio galaxy. These are (i) the west
hotspot (top left, left panel) over the angular area Ω1 = 0.08′ × 0.08′, centered at the position
given by RA = 19h 59mn 33.006s (J2000) and DEC = +40◦43′40.889′′, and (ii) the inner core
of Cyg A (top left, right panel) over the angular area Ω2 = 0.03′ × 0.03′, centered at the position
RA = 19h 59mn 28.345s (J2000) and DEC = +40◦44′2.015′′. Note that the scale ranges of the
displayed zooms are adapted to ensure a clear visualization of the contrast within the different
structures of Cyg A.
In general, a visual inspection of the reconstructed images displayed in Figures 5.8 and 5.9
(log10 scale) indicates superior imaging quality of Faceted HyperSARA model images (top row),
compared to the model images of SARA (middle row) and the restored images of JC-CLEAN
(bottom row), with SARA imaging quality outperforming JC-CLEAN. On the one hand, the
higher resolution of Faceted HyperSARA is reflected by a better reconstruction of the hotspots
and the inner core of Cyg A, in particular at the low-frequency channels (see Figure 5.8, first
row, top left zooms). On the other hand, its higher dynamic range is reflected by the enhanced
estimation of faint emissions in Cyg A, in particular, structures whose surface brightness is within
the range [0.01, 0.1] mJy (see the arc around the right end of the west jet in Figure 5.9, first
row). We further observe that the proposed spatial tessellation does not introduce artifacts in
the estimated images over the large dynamic range of interest. For SARA, given that no spectral
correlation is promoted, the reconstruction quality of the different channels is restricted to their
inherent resolution and sensitivity. This explains the lower reconstruction quality of SARA in
comparison with Faceted HyperSARA. JC-CLEAN restored images exhibit a comparatively poorer
reconstruction quality, as they are limited to the instrument's resolution (through convolutions with
the channel-associated synthesized CLEAN beams). Furthermore, the associated dynamic range
is limited by the prominent artifacts resulting from the lack of DDE calibration. The inspection
of the average images displayed in Figure 5.10 confirms the ability of the proposed approach to
recover fine details of Cyg A in comparison with SARA and JC-CLEAN.
Naturally-weighted residual images obtained with Faceted HyperSARA and SARA, and Briggs-
weighted residual images obtained with JC-CLEAN, are reported in Figures 5.8–5.10, displayed on
bottom right panels overlaying the full recovered images. Their respective aSTD values are 5.46 ×
10−4, 4.53 × 10−4 and 5.2 × 10−4, indicating a comparable fidelity to data. Yet, a visual inspection
of the residual images, in particular for the average residual images (Figure 5.10, bottom right
panels), indicates that details of Cyg A jets are not fully recovered by SARA, as opposed to Faceted
HyperSARA. Given that both approaches satisfy the same data constraints, this demonstrates the
efficiency of the Faceted HyperSARA prior in capturing the details of the galaxy. Although no DDE
solutions are incorporated into the forward modelling of JC-CLEAN, its residuals are homogeneous
due to the absence of the non-negativity constraint. In fact, negative components are absorbed
in its model images (consisting of the CLEAN components) to compensate for spurious positive
components.
Recently, [40] have reported the presence of a bright object in the inner core of the galaxy. The
object, dubbed Cyg A-2, has been identified as a second black hole. Its location is highlighted
with a white dashed circle in Figures 5.8–5.10 (top left, right panel), centered at the position given
by RA = 19h 59mn 28.322s (J2000) and DEC = +40◦44′1.89′′, with a radius of size 0.1′′. The
discovery was further confirmed in [44] by imaging two monochromatic VLA data sets at C band
(6.678 GHz) and X band (8.422 GHz) with SARA. Interestingly, the inspection of the estimated
image cube provided in [117] indicates that Cyg A-2 is discernible in images reconstructed by
Faceted HyperSARA at frequencies lower than ever. More precisely, the source is resolved in
all channels within the range [5.979, 8.019] GHz with an average flux of 0.5164 (±0.1394) mJy.
SARA, however, succeeds in detecting it only within the range [7.131, 8.019] GHz with an average flux
of 0.5157 (±0.3957) mJy. Given the important calibration errors present in the associated restored
images, JC-CLEAN is not able to resolve Cyg A-2.
Interestingly, the examination of the full image cubes shows consistency in the image reconstruc-
tion quality of the 16 independent sub-problems. In general, we observe that the spectra recovered
by the different methods are nearly flat across each adjacent 16 channels (composing a spectral win-
dow). We further observe a spectral discontinuity at channel ν257 = 5.979 GHz for the three meth-
ods, particularly noticeable in the inner core of Cyg A. Since the data set has been acquired sep-
arately in the frequency ranges [ν1, ν256] = [3.979, 6.019] GHz and [ν257, νL] = [5.979, 8.019] GHz,
such spectral discrepancy may result from differences in the calibration errors and noise statistics
between the two channel ranges.
                      Time (h)   CPU cores   CPU time (h)
Faceted HyperSARA        68         496         33728
SARA                    12.5       5760         72000
JC-CLEAN                 22         576         12672
Table 5.4: Computing cost of Cyg A imaging at the spectral resolution 8 MHz from 7.4 GB of data.
Results are reported for Faceted HyperSARA, SARA, and JC-CLEAN in terms of reconstruction time,
number of CPU cores and overall CPU time (highlighted in bold face).
5.5.4.2 Computing cost
The computing time and resources required by the different methods are reported in Table 5.4.
JC-CLEAN required 12672 CPU hours, with 36 CPU cores assigned to each sub-problem, whereas
SARA leveraged 72000 CPU hours using the parallelization procedure proposed by [87]. More
precisely, each channel is reconstructed using 12 CPU cores: 1 master CPU core, 2 CPU cores for
the data-fidelity terms (one core per data-fidelity term), and 9 CPU cores to handle the average
sparsity terms (associated with the nine bases of the SARA dictionary). Finally, Faceted Hyper-
SARA required 33728 CPU hours. More specifically, each sub-problem (composed of 30 channels)
uses 1 master CPU core, 15 CPU cores to process the 2 × 30 data-fidelity terms (4 data-fidelity
blocks handled by each core), and 15 CPU cores to handle the 15 spatio-spectral facets. These
numbers indicate an overall higher efficiency of the parallelization procedure adopted for Faceted
HyperSARA when compared to SARA, with better use of the allocated resources.
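As a quick arithmetic check, the per-method core allocations described above reproduce the CPU-time column of Table 5.4 when multiplied by the reported wall-clock times:

```python
# Sanity check of the resource accounting: allocation rules as stated in
# the text, products matching the CPU-time column of Table 5.4.
methods = {
    # name: (wall-clock hours, CPU cores)
    "Faceted HyperSARA": (68, 16 * (1 + 15 + 15)),  # 16 sub-problems x 31 cores
    "SARA": (12.5, 480 * 12),                       # 480 channels x 12 cores
    "JC-CLEAN": (22, 16 * 36),                      # 16 sub-problems x 36 cores
}
for name, (hours, cores) in methods.items():
    print(f"{name}: {cores} cores, {hours * cores:.0f} CPU hours")
```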
The results reported in this section show the ability of the proposed approach to provide signif-
icantly higher resolution and higher dynamic range image cubes in comparison with JC-CLEAN.
Interestingly, this quantum jump in imaging quality comes at a computing cost of Faceted Hyper-
SARA, implemented in matlab, not far from that of the JC-CLEAN c++ implementation, suggesting
that the gap can be significantly reduced, if not closed, with the forthcoming c++ implementation
of Faceted HyperSARA.
5.6 Conclusions
We have introduced the Faceted HyperSARA method, which leverages a spatio-spectral facet
prior model for wideband radio-interferometric imaging. The underlying regularization encodes a
sophisticated facet-specific prior model to ensure precision of the image reconstruction, allowing
the bottleneck induced by the size of the image cube to be efficiently addressed via parallelization.
Experiments conducted on synthetic data confirm that the proposed approach can provide a major
increase in scalability in comparison with the original HyperSARA algorithm, at no cost in imaging
quality, and showing potential to improve the reconstruction of faint emissions. Leveraging the
power of a large-scale high-performance computing system, our matlab implementation (avail-
able on github https://basp-group.github.io/Puri-Psi/) has been further validated on the
Channel ν1 = 3.979 GHz
Figure 5.8: Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Imaging results of
channel ν1 = 3.979 GHz. Estimated images at the angular resolution 0.06′′ (3.53 times the observations
spatial bandwidth). From top to bottom: the respective estimated model images of the proposed Faceted
HyperSARA (Q = 15, C = 16) and SARA, both in units of Jy/pixel, and restored image of JC-CLEAN
in units of Jy/beam. The associated synthesized beam is of size 0.37′′ × 0.35′′ and its flux is 42.18 Jy. The
full FoV images (log10 scale) are overlaid with the residual images (bottom right, linear scale) and zooms
on selected regions in Cyg A (top left, log10 scale). These correspond to the west hotspot (left) and the
inner core of Cyg A (right). The zoomed regions are displayed with different value ranges for contrast
visualization purposes and are highlighted with white boxes in the full images. Cyg A-2 location is
highlighted with a white dashed circle. Negative pixel values of the JC-CLEAN restored image and
associated zooms are set to 0 for visualization purposes. Full image cubes are available online [117].
Channel ν480 = 8.019 GHz
Figure 5.9: Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Reconstruction
results of channel ν480 = 8.019 GHz. Estimated images at the angular resolution 0.06′′ (1.75 times the
observations spatial bandwidth). From top to bottom: the respective estimated model images of the
proposed Faceted HyperSARA (Q = 15, C = 16) and SARA, both in units of Jy/pixel, and restored
image of JC-CLEAN in units of Jy/beam. The associated synthesized beam is of size 0.17′′ × 0.15′′ and
its flux is 8.32 Jy. The full FoV images (log10 scale) are overlaid with the residual images (bottom right,
linear scale) and zooms on selected regions in Cyg A (top left, log10 scale). These correspond to the west
hotspot (left) and the inner core of Cyg A (right). The zoomed regions are displayed with different value
ranges for contrast visualization purposes and are highlighted with white boxes in the full images. Cyg
A-2 location is highlighted with a white dashed circle. Negative pixel values of the JC-CLEAN restored image
and associated zooms are set to 0 for visualization purposes. Full image cubes are available online [117].
Average estimated images
Figure 5.10: Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Average estimated
images, computed as the mean along the spectral dimension. From top to bottom: the respective
estimated average model images of the proposed Faceted HyperSARA (Q = 15, C = 16) and SARA, and
the average restored image of JC-CLEAN (obtained as the mean of the restored images normalized by
the flux of their associated synthesized beam). The full FoV images (log10 scale) are overlaid with the
residual images (bottom right, linear scale) and zooms on selected regions in Cyg A (top left, log10 scale).
These correspond to the west hotspot (left) and the inner core of Cyg A (right). The zoomed regions are
displayed with different value ranges for contrast visualization purposes and are highlighted with white
boxes in the full images. Cyg A-2 location is highlighted with a white dashed circle. Negative pixel
values of the JC-CLEAN restored image and associated zooms are set to 0 for visualization purposes.
reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA data. The associated results
are a practical proof of concept of the scalability of Faceted HyperSARA, which is also shown
to provide a significant improvement in the imaging quality with respect to JC-CLEAN. Since a
comparison with HyperSARA would have been impractical, we show that Faceted HyperSARA
also surpasses the early monochromatic SARA approach in imaging precision. Interestingly, our
results confirm the recent discovery of a super-massive second black hole in the inner core of Cyg
A at much lower frequencies than both JC-CLEAN and SARA (the black hole is detected and
resolved at C band, starting from 5.979 GHz). Our work further illustrates the potential of ad-
vanced algorithms to enhance imaging quality beyond instrument resolution, opening the door to
cost-saving considerations for forthcoming arrays.
Having addressed the imaging problem in radio interferometry, we introduce in the next chapter
a new uncertainty quantification approach to assess the degree of confidence in particular 3D
structures and faint emissions appearing in the estimated cube.
Chapter 6
Wideband uncertainty quantification by convex optimization
Contents
6.1 Motivation ..................................... 99
6.2 Wideband uncertainty quantification approach .............. 100
6.2.1 Bayesian hypothesis test ........................... 100
6.2.2 Choice of the set S .............................. 103
6.3 Proposed minimization problem ....................... 104
6.4 Proposed algorithmic structure ........................ 105
6.4.1 Epigraphical splitting ............................. 105
6.4.2 Underpinning primal-dual forward-backward algorithm .......... 106
6.5 Validation on synthetic data ......................... 107
6.5.1 Simulation setting .............................. 107
6.5.2 Uncertainty quantification parameter .................... 109
6.5.3 Results and discussion ............................ 110
6.6 Conclusions ................................... 111
6.1 Motivation
By now, we have tackled the wideband image formation problem in RI by introducing the Hyper-
SARA and the Faceted HyperSARA approaches for precise and scalable imaging. These methods
provide solutions that are easily visualized, yet are typically unable to analyze the uncertainty
associated with the solution delivered. Since the wideband RI imaging problem is highly ill-posed,
assessing the degree of confidence in specific 3D structures observed in the estimated cube is
very important. Besides, uncertainty quantification helps in making accurate decisions on the
3D structures under scrutiny (e.g., confirming the existence of a second black hole in the Cyg A
galaxy). Bayesian inference techniques naturally enable the quantification of uncertainty around
the image estimate via sampling the full posterior distribution based on a hierarchical Bayesian
model [7,39,69,70,114]. For instance, the authors in [114] proposed a monochromatic Bayesian method
based on MCMC sampling techniques. The authors in [7,70] proposed to approximate the full pos-
terior distribution and draw samples from the approximate distribution. However, sampling-based
techniques are computationally very expensive and cannot currently scale to the data regime ex-
pected from modern telescopes.
Instead, we propose to solve the wideband RI uncertainty quantification problem by performing
a Bayesian hypothesis test leveraging modern and scalable convex optimization methods. This test
postulates that the 3D structure of interest is absent from the RI image cube. Then, the data and
the prior model are used to determine if the null hypothesis is rejected or not. The hypothesis test
is formulated as a convex program and solved efficiently using the primal-dual forward-backward
(PDFB) algorithm (3.4.2). The underlying algorithmic structure benefits from preconditioning and
parallelization capabilities, paving the road for scalability to large data sets and image dimensions.
This chapter is structured as follows. Section 6.2 explains our wideband uncertainty quantifi-
cation approach and the postulated Bayesian hypothesis test. In Section 6.3, we present a convex
minimization problem to formulate the hypothesis test. The underpinning algorithmic structure
and the epigraphical splitting technique exploited to solve the minimization problem are presented
in Section 6.4. We showcase the performance of our approach on realistic simulations in Section
6.5. Finally, conclusions and perspectives are stated in Section 6.6.
This work has been published in [5].
6.2 Wideband uncertainty quantification approach
6.2.1 Bayesian hypothesis test
The proposed method takes the form of a Bayesian hypothesis test to properly assess the degree of
confidence in specific 3D structures appearing in the MAP estimate. We recall that in a Bayesian
framework, the objective function of a minimization problem can be seen as the negative logarithm
of a posterior distribution, with the minimizer corresponding to a MAP estimate. To define the
test, we postulate the following hypotheses:
H0: The 3D structure of interest is ABSENT from the true image cube,
H1: The 3D structure of interest is PRESENT in the true image cube.
These hypotheses split the set of images R^{N×L}_+ into two regions: a set S ⊂ R^{N×L}_+ associated with
H0 containing all images without the 3D structure, and the complement R^{N×L}_+ \ S associated with
H1. The hypothesis test determines if the data and the prior model support that the 3D structure
is real (in favour of H1) or corresponds to a reconstruction artifact (in favour of H0). The null
hypothesis H0 is rejected with significance α ∈ ]0, 1[ if
P[H0|Y] = P[X ∈ S|Y] ⩽ α, (6.1)
or equivalently if
P[H1|Y] = P[X ∈ R^{N×L}_+ \ S|Y] > 1 − α. (6.2)
Computing these hypothesis tests involves the calculation of probabilities in high-dimensional
spaces, which are typically intractable. One approach is to approximate these probabilities by
an MCMC algorithm [50,90]. The computational cost involved in these methods is still orders of
magnitude higher than that involved in computing the MAP estimator by convex optimization [29].
This necessitates new uncertainty quantification methods that are both fast and
scalable to the data dimensions and image-cube sizes expected with the new generation of radio
telescopes.
In this regard, we propose to generalize our recent work for single-channel uncertainty quantifi-
cation [99,100]. Similarly, we formulate the hypothesis test as a convex minimization problem that
can be solved efficiently by convex optimization algorithms. The proposed approach only assumes
knowledge of the MAP estimate of the RI image cube and does not involve computing probabili-
ties. The hypothesis test is solved by comparing the set S with the region of the parameter space
where most of the posterior probability mass of X lies. This region is called the credible region in the
Bayesian decision theory framework [103]. A set Cα is a posterior credible region with confidence
level (1 − α)% if
P[X ∈ Cα|Y] = 1 − α, for α ∈ ]0, 1[. (6.3)
For every α ∈ ]0, 1[, one can find many regions in the parameter space that verify (6.3). Here
we consider the highest posterior density (HPD) region, which is optimal in the sense that it has
minimum volume [103] and is given by
C∗_α = {X | r(X) ⩽ ηα}, (6.4)
where r is the objective function and ηα ∈ R is chosen such that (6.3) holds. Computing the
exact HPD region is computationally very expensive in imaging problems because of the high
dimensionality involved. To overcome this difficulty, we resort to the conservative credible region
C̃α [91], where P[X ∈ C̃α|Y] ⩾ 1 − α. For any α ∈ ] exp(−NL/3), 1[, the conservative credible
region is defined as
C̃α = {X | r(X) ⩽ η̃α}, (6.5)
where the parameter η̃α ∈ R is computed directly from the MAP estimate X† as [91]
η̃α = r(X†) + NL(τα + 1), (6.6)
with
τα = √(16 log(3/α)/NL). (6.7)
Figure 6.1: 1D illustration of the exact HPD region C∗_α and the approximated one C̃α. Notice that C∗_α ⊂ C̃α.
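The threshold computation (6.6)–(6.7) can be sketched directly; the MAP objective value r_map and the cube dimensions below are illustrative placeholders.

```python
import numpy as np

def conservative_threshold(r_map, N, L, alpha):
    """Conservative HPD threshold of [91], equations (6.6)-(6.7):
    eta_alpha = r(X_map) + N*L*(tau_alpha + 1), with
    tau_alpha = sqrt(16*log(3/alpha)/(N*L)),
    valid for alpha in ]exp(-N*L/3), 1[."""
    NL = N * L
    assert np.exp(-NL / 3) < alpha < 1
    tau = np.sqrt(16 * np.log(3 / alpha) / NL)
    return r_map + NL * (tau + 1)

# e.g. a 256x256-pixel, 15-channel cube at 1% significance;
# r_map is a placeholder value of the objective at the MAP estimate
eta_tilde = conservative_threshold(r_map=1.0e3, N=256 * 256, L=15, alpha=0.01)
print(eta_tilde)
```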
In this work, we consider that the RI image cube is estimated with the HyperSARA approach (4.2),
hence C̃α is defined as
C̃α = {X ∈ R^{N×L}_+ | r(X) ⩽ η̃α and (∀(l, b)) Φl,b xl ∈ B2(yl,b, ϵl,b)}, (6.8)
with:
r(X) = µ̄ Σ_{j=1}^{J} log(|σj(X)| + υ) + µ Σ_{n=1}^{T} log(∥[Ψ†X]n∥2 + υ). (6.9)
The function r(X) is non-convex. In a similar fashion to (4.3), we approximate the
function r(X) by its convex majorant at the MAP estimate X† of the RI image cube:
r̄(X) = µ̄ ∥X∥∗,ω(X†) + µ ∥Ψ†X∥2,1,ω̄(X†). (6.10)
The set C̃α is a convex set since it results from the intersection of multiple convex sets. This
is very important for the proposed convex optimization methodology. Note that, when Faceted
HyperSARA is used for computing the MAP estimate of the wideband image cube, we replace the
prior model r̄(X) with the spatio-spectral faceted prior (5.3).
According to Theorem 3.2 in [100], if C̃α ∩ S = ∅, then S ⊂ R^{N×L}_+ \ C̃α. And because
P[X ∈ C̃α|Y] ⩾ 1 − α, if C̃α ∩ S = ∅, then P[H0|Y] ⩽ α. That being said, we can verify if the null
hypothesis H0 is rejected, i.e., P[H0|Y] ⩽ α, by solving the following problem:
determine if C̃α ∩ S = ∅ at level α. (6.11)
There are two possible scenarios:
• C̃α ∩ S = ∅: the hypothesis H0 is rejected at level α and the 3D structure under
scrutiny is present in the true RI image cube with probability (1 − α).
• C̃α ∩ S ≠ ∅: we fail to reject H0 at level α and the presence of the 3D structure of
interest is uncertain.
6.2.2 Choice of the set S
We consider in this work spatially localized 3D structures. To define this type of structure,
we introduce the selection matrix M ∈ {0, 1}^{NM×N}, that is, for an image cube X ∈ R^{N×L},
MX ∈ R^{NM×L} denotes the region of the 3D structure. The matrix Mc ∈ {0, 1}^{(N−NM)×N} is the
complementary matrix of M. We define the set S as a subset of the intensity image cubes
of R^{N×L}, by imposing a non-negativity constraint. In addition, to smooth the area of the 3D
structure, we use a positive inpainting matrix L ∈ R^{NM×(N−NM)} that fills the region of the 3D
structure MX with the information from the other pixels McX such that
MX = LMcX + T, (6.12)
where T ∈ [−τ, τ]^{NM×L} and τ > 0 is a small tolerance value. Note that the inpainting is
done in 3D, meaning that each pixel in the 3D structure is replaced by a weighted sum of the pixels
in its 3D neighborhood, ensuring spatio-spectral smoothness in the region of the 3D structure.
The inpainting procedure might amplify the energy in MX and lead to artificial artifacts. To
alleviate this issue, we constrain the energy in the region of the 3D structure to a certain bound
θ = (θl)_{1⩽l⩽L} ∈ R^L_+ from a background reference level B ∈ R^{NM×L}, i.e.,
MX ∈ B2(B, θ). (6.13)
That being said, the set S containing all image cubes without the 3D structure of interest is given
by
S = {X ∈ R^{N×L}_+ | (M − LMc)X ∈ [−τ, τ]^{NM×L} and MX ∈ B2(B, θ)}, (6.14)
where S is a convex set since it results from the intersection of multiple convex sets.
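The constraints defining S can be sketched as a membership test. The boolean region mask, the global-mean inpainting matrix (a crude stand-in for the 3D-neighborhood weights of L), and the flattened pixel indexing below are illustrative assumptions.

```python
import numpy as np

def in_set_S(X, region, L_inpaint, B_ref, tau, theta):
    """Membership test for the set S of (6.14): non-negativity,
    inpainting residual (M - L Mc)X within [-tau, tau], and per-channel
    energy bound ||M x_l - b_l||_2 <= theta_l around the background B_ref.
    `region` is a boolean mask selecting the N_M structure pixels (the
    rows of M); L_inpaint is an explicit inpainting matrix."""
    if (X < 0).any():
        return False
    MX = X[region]                    # N_M x L structure pixels
    McX = X[~region]                  # remaining (N - N_M) x L pixels
    T = MX - L_inpaint @ McX          # residual of the inpainting model
    if np.abs(T).max() > tau:
        return False
    return bool((np.linalg.norm(MX - B_ref, axis=0) <= theta).all())

# Toy usage: a featureless cube is in S; a bright spike in the region is not.
N, Lc, NM = 100, 3, 10
region = np.zeros(N, dtype=bool)
region[:NM] = True
L_inpaint = np.full((NM, N - NM), 1.0 / (N - NM))   # global-mean inpainting
X = np.full((N, Lc), 0.5)
B_ref = np.full((NM, Lc), 0.5)
print(in_set_S(X, region, L_inpaint, B_ref, tau=1e-3, theta=np.ones(Lc)))
```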
6.3 Proposed minimization problem
We recall that the proposed approach consists in solving problem (6.11), that is, determining whether the
intersection between S and C̃α is empty or not. Intuitively, C̃α ∩ S = ∅ holds only if dist(C̃α, S) > 0,
in which case we can conclude that the hypothesis H0 is rejected at level α, where
dist(C̃α, S) = inf {∥X_C̃α − X_S∥F : (X_C̃α, X_S) ∈ C̃α × S}. (6.15)
Oppositely, if dist(C̃α, S) = 0, we can conclude that C̃α ∩ S ≠ ∅ and we are uncertain about the
presence of the 3D structure of interest. For more clarity, Figure 6.2 shows a simple illustration of
the proposed approach.
Figure 6.2: Illustration of the proposed method for the two different scenarios. Our approach simply
consists in examining the Euclidean distance between the two sets S and C̃α. Left: there is no intersection
between the two sets, thus H0 is rejected at level α. Right: the two sets intersect, thus one cannot reject
H0, i.e., one cannot conclude if the 3D structure exists in the true image cube or not.
By combining the definition of the distance (6.15) and the definitions of the sets C̃α (6.8) and
S (6.14), we can reformulate the problem (6.11) as
minimize over X_C̃α ∈ R^{N×L}, X_S ∈ R^{N×L} of (γ/2)∥X_C̃α − X_S∥²_F (6.16)
subject to
X_S ∈ R^{N×L}_+, L̄X_S ∈ [−τ, τ]^{NM×L}, MX_S ∈ B2(B, θ),
X_C̃α ∈ R^{N×L}_+, (∀(l, b)) Φl,b [X_C̃α]l ∈ B2(yl,b, ϵl,b),
µ̄∥X_C̃α∥∗,ω + µ∥Ψ†X_C̃α∥2,1,ω̄ ⩽ η̃α, (6.16a)
with L̄ = M − LMc and γ > 0. Note that the choice of the parameter γ does not affect the solution
of the minimization problem. However, this parameter can be used in practice to accelerate the
convergence speed. The notation [X_C̃α]l represents a column of the matrix X_C̃α.¹
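The geometric principle behind the distance test (6.15) can be illustrated on toy sets: alternating projections between two convex sets converge to a pair of points realising their distance. The sets below (a unit ball and a half-space, disjoint when c > 1) are illustrative stand-ins; the thesis solves the actual problem (6.16) with PDFB, not this scheme.

```python
import numpy as np

# Toy illustration of the distance test (6.15): alternating projections
# between the unit ball at the origin and the half-space {x : x[0] >= c}.
# The sets are disjoint when c > 1. A sketch of the principle only, not
# the PDFB algorithm used in the thesis.

def proj_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def proj_halfspace(x, c):
    y = x.copy()
    y[0] = max(y[0], c)
    return y

def set_distance(c, iters=200):
    x = np.zeros(2)
    for _ in range(iters):
        y = proj_halfspace(x, c)   # point in the half-space
        x = proj_ball(y)           # point in the ball
    return np.linalg.norm(x - y)

print(set_distance(2.0))   # disjoint sets: positive distance, H0 rejected
print(set_distance(0.5))   # intersecting sets: zero distance, H0 not rejected
```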
6.4 Proposed algorithmic structure
To solve the minimization problem (6.16), we adopt the PDFB algorithm explained in Section
3.4.2. However, solving (6.16) involves a projection onto the convex set defined by the constraint
(6.16a). This projection does not have a closed-form solution. To overcome this difficulty, we
resort to a splitting technique based on epigraphical projection, recently proposed in [31], to handle
minimization problems involving sophisticated constraints.
6.4.1 Epigraphical splitting
The epigraphical splitting proposed by [31] aims to replace complex constraints such as (6.16a)
with a collection of epigraphs and a closed half-space constraint. Thus, the problem of com-
puting the projection onto the original constraint is reduced to the problem of computing the
projection onto smaller epigraphs. Leveraging this technique, we introduce the auxiliary variables
q = (qj)_{1⩽j⩽J} ∈ R^J and z = (zn)_{1⩽n⩽T} ∈ R^T in the minimization problem (6.16), thereby
splitting the constraint (6.16a) into a simpler set of constraints. Consequently, the minimization
problem (6.16) can be equivalently reformulated as
¹The notation Xl is equivalent to xl adopted in the previous chapters. Since most of the symbols in this chapter
have subscripts, the new notation has been adopted for more clarity.
minimize over X_C̃α ∈ R^{N×L}, X_S ∈ R^{N×L}, q ∈ R^J, z ∈ R^T of (γ/2)∥X_C̃α − X_S∥²_F (6.17)
subject to
X_S ∈ R^{N×L}_+, L̄X_S ∈ [−τ, τ]^{NM×L}, MX_S ∈ B2(B, θ),
X_C̃α ∈ R^{N×L}_+, (∀(l, b)) Φl,b [X_C̃α]l ∈ B2(yl,b, ϵl,b),
µ̄∥X_C̃α∥∗,ω ⩽ q̃, (6.17a)
µ∥Ψ†X_C̃α∥2,1,ω̄ ⩽ z̃, (6.17b)
q̃ + z̃ ⩽ η̃α, (6.17c)
with q̃ = Σ_{j=1}^{J} qj and z̃ = Σ_{n=1}^{T} zn. Note that the variable X_C̃α satisfying the constraint
(6.16a) is equivalent to having the variables (X_C̃α, q, z) satisfying the constraints (6.17a), (6.17b)
and (6.17c). Following the epigraph definition (Appendix .1, equation (2)), one can observe that the
condition (6.17a) represents the epigraph E∗,ω of the weighted nuclear norm. Similarly, condition
(6.17b) represents the epigraph E2,1,ω̄ of the weighted ℓ2,1 norm. The constraint (6.17c) accounts
for a closed half-space. The projections associated with conditions (6.17a), (6.17b) and (6.17c)
admit a closed form and are presented in Appendix .2.
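To give a flavour of these closed forms, below is the standard projection onto the epigraph of the plain (unweighted) Euclidean norm; the weighted variants actually used in (6.17a)–(6.17b) are given in Appendix .2 and follow an analogous pattern. This unweighted case is a sketch, not the thesis implementation.

```python
import numpy as np

def proj_epigraph_l2(x, t):
    """Closed-form projection of a pair (x, t) onto the epigraph
    {(v, s) : ||v||_2 <= s} of the (unweighted) Euclidean norm."""
    nx = np.linalg.norm(x)
    if nx <= t:
        return x, t                     # already inside the epigraph
    if nx <= -t:
        return np.zeros_like(x), 0.0    # projects onto the apex
    a = 0.5 * (1.0 + t / nx)
    return a * x, a * nx                # lands on the boundary ||v|| = s

v, s = proj_epigraph_l2(np.array([3.0, 4.0]), 0.0)
print(v, s)                             # boundary point with ||v||_2 = s
```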
To fit the minimization problem (6.17) into the PDFB framework, the constraints are imposed
by means of the indicator function ιC of a convex set C (3.21). By doing so, the minimization
problem (6.17) can be equivalently redefined as
minimize over X_C̃α ∈ R^{N×L}, X_S ∈ R^{N×L}, q ∈ R^J, z ∈ R^T of
(γ/2)∥X_C̃α − X_S∥²_F + ι_{R^{N×L}_+}(X_S) + ι_{[−τ,τ]^{NM×L}}(L̄X_S) + ι_{B2(B,θ)}(MX_S)
+ ι_{R^{N×L}_+}(X_C̃α) + Σ_{l=1}^{L} Σ_{b=1}^{B} ι_{B2(yl,b,ϵl,b)}(Φl,b [X_C̃α]l)
+ ι_{E∗,ω}(X_C̃α, q/µ̄) + ι_{E2,1,ω̄}(Ψ†X_C̃α, z/µ) + ι_{V(η̃α)}(q/µ̄, z/µ), (6.18)
where:
V = {(q/µ̄, z/µ) ∈ R^J × R^T | Σ_{j=1}^{J} qj/µ̄ + Σ_{n=1}^{T} zn/µ ⩽ η̃α}. (6.19)
6.4.2 Underpinning primal-dual forward-backward algorithm
The details of the proposed algorithmic structure are presented in Algorithm 6. All the variables of
interest are updated via forward-backward steps. At each iteration t ∈ N, the algorithm minimizes
the distance between X_C̃α and X_S (the image cubes of interest) in lines 9 and 12. In addition, pro-
jections onto the convex set C̃α are performed in steps 21, 24, 27. These correspond to projections
of the estimated data Φl,b [X_C̃α]l onto the associated ℓ2 balls with respect to the preconditioning
matrices Ul,b, and epigraphical projections of the variable X_C̃α onto the weighted nuclear ball (line
24) and the weighted ℓ2,1 ball (line 27), respectively. Similarly, projections onto the convex set
S are performed in steps 30, 32. These correspond to performing the linear inpainting (6.12) and
controlling the energy of the 3D structure of interest (6.13), respectively. All the projections are
updated in parallel and used later in the updates of the primal variables of interest X_C̃α and X_S in
steps 10, 13, respectively.
The exact expressions of all the projections are provided in Appendix .2. The diagonal preconditioning
matrices U_{l,b} are chosen according to the preconditioning strategy described in Section
4.3.2. More precisely, the coefficients on their diagonal are given by the inverse of the sampling
density in the vicinity of the probed Fourier modes. It is worth noting that the projections onto
the ℓ2 balls with respect to the preconditioning matrices U_{l,b} do not admit an analytic expression.
Instead, they can be numerically estimated with an iterative algorithm. In this work, we resort to
FISTA [9].
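Since the projection onto an ℓ2 ball in the metric induced by a diagonal preconditioner has no closed form, it can be approximated iteratively. Below is a minimal numerical sketch of such a FISTA-based projection (hypothetical names; a fixed iteration budget is assumed rather than a stopping tolerance, and the preconditioner is a vector `u` holding the diagonal of U):

```python
import numpy as np

def project_l2ball(z, y, eps):
    """Euclidean projection onto the l2 ball B2(y, eps)."""
    d = z - y
    n = np.linalg.norm(d)
    return y + d * min(eps / n, 1.0) if n > 0 else z.copy()

def prox_ball_metric(x, y, eps, u, n_iter=200):
    """Projection of x onto B2(y, eps) in the metric induced by the diagonal
    preconditioner u, computed by FISTA (accelerated projected gradient) on
    min_z 0.5 * (z - x)^T diag(u) (z - x)  s.t.  ||z - y||_2 <= eps."""
    step = 1.0 / np.max(u)                  # inverse Lipschitz constant of the gradient
    z = project_l2ball(x, y, eps)
    w, t = z.copy(), 1.0
    for _ in range(n_iter):
        z_new = project_l2ball(w - step * u * (w - x), y, eps)
        t_new = (1 + np.sqrt(1 + 4 * t**2)) / 2
        w = z_new + ((t - 1) / t_new) * (z_new - z)   # FISTA momentum
        z, t = z_new, t_new
    return z
```

When u is constant the metric is Euclidean and the iteration reduces to the closed-form ball projection, which provides a convenient sanity check.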
Following (3.29) and in a similar fashion to Section 4.3.2, Algorithm 6 is guaranteed to converge
to the global minimum of the minimization problem (6.18) if the preconditioning matrices
(U_{l,b})_{1⩽l⩽L, 1⩽b⩽B} and the parameters (κ_i)_{1⩽i⩽5}, ζ and γ satisfy the inequality:

1 − ζ (κ₁‖U^{1/2}Φ‖²_S + κ₂ + κ₃‖Ψ†‖²_S + κ₄‖L‖²_S + κ₅‖M‖²_S) > ζγ/2,    (6.20)

where for every X ∈ R^{N×L}, U^{1/2}Φ(X) = (U^{1/2}_{l,b} Φ_{l,b} X_l)_{1⩽l⩽L, 1⩽b⩽B}. A convenient choice of (κ_i)_{1⩽i⩽5} is

κ₁ = 1/‖U^{1/2}Φ‖²_S,  κ₂ = 1,  κ₃ = 1/‖Ψ†‖²_S,  κ₄ = 1/‖L‖²_S  and  κ₅ = 1/‖M‖²_S.    (6.21)

In this setting, convergence is guaranteed for all 0 < ζ < 2/(10 + γ).
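The squared spectral norms entering (6.20)-(6.21) are typically not available in closed form for implicit operators; a standard way to estimate them is power iteration on the normal operator. A small sketch (illustrative names; the operator here is a toy matrix standing in for any of Φ, Ψ†, L or M):

```python
import numpy as np

def spectral_norm_sq(op, op_adj, shape, n_iter=50, seed=0):
    """Estimate the squared spectral norm ||A||_S^2 of a linear operator A,
    given as a pair (op, op_adj), by power iteration on A^T A. Used to set
    the step-size parameters kappa_i as in (6.21)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=shape)
    val = 0.0
    for _ in range(n_iter):
        x = op_adj(op(x))          # apply the normal operator
        val = np.linalg.norm(x)    # Rayleigh-quotient-like estimate of the top eigenvalue
        x /= val
    return val

# toy operator: a diagonal matrix with spectral norm 3
A = np.array([[3.0, 0.0], [0.0, 1.0]])
nrm2 = spectral_norm_sq(lambda v: A @ v, lambda v: A.T @ v, 2)
kappa = 1.0 / nrm2                  # kappa_i = 1 / ||.||_S^2, as in (6.21)
gamma = 1.0
zeta_max = 2.0 / (10.0 + gamma)     # admissible range 0 < zeta < 2 / (10 + gamma)
```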
6.5 Validation on synthetic data
6.5.1 Simulation setting
Following the procedure described in Section 4.4, we simulate a wideband RI model cube composed
of L = 15 spectral channels from an image of the W28 supernova remnant of size N = 256 × 256
pixels. The wideband RI data are generated from a realistic uv-coverage within the frequency
range [ν1, νL] = [1, 2] GHz with uniformly sampled channels and a total observation time of 4 hours.
For each channel indexed l ∈ {1, ..., L}, the corresponding uv-coverage is obtained by scaling the
reference uv-coverage by νl/ν1. We consider an input signal-to-noise ratio (iSNR) (4.13) of 60 dB.
The ℓ2 bounds ϵ_{l,b} on the data-fidelity terms are computed from the noise statistics as explained
Algorithm 6: Wideband uncertainty quantification by convex optimization

Data: (y_{l,b})_{l,b}, l ∈ {1, ..., L}, b ∈ {1, ..., B}
Input: X^{(0)}_{C̃α}, X^{(0)}_S, q^{(0)}, z^{(0)}, B^{(0)}, a^{(0)}, (C^{(0)}_d, e^{(0)}_d)_{1⩽d⩽D}, (v^{(0)}_{l,b})_{l,b}, D^{(0)}, S^{(0)}, ω^{(0)}, ω̄^{(0)}
Parameters: (U_{l,b})_{l,b}, (ϵ^{(0)}_{l,b})_{l,b}, η̃α, μ, μ̄, ζ, (κ_i)_{1⩽i⩽5}

1:  t ← 0
2:  while stopping criterion not satisfied do
3:    for l = 1 to L do
4:      x̂^{(t)}_l = F Z x̃_l                              // Fourier transforms
5:      for b = 1 to B do
6:        x̂^{(t)}_{l,b} = M_{l,b} x̂^{(t)}_l               // send to data cores
7:    // Update primal variables simultaneously
8:    for l = 1 to L do
9:      H^{(t+1)}_{C̃α,l} = γ X^{(t)}_{C̃α,l} − γ X^{(t)}_{S,l} + κ₁ Z† F† Σ_{b=1}^{B} M†_{l,b} ṽ^{(t)}_{l,b} + κ₂ B^{(t)}_l + κ₃ Σ_{d=1}^{D} C̃^{(t)}_{d,l}
10:     X^{(t+1)}_{C̃α} = P_{R₊^{N×L}}(X^{(t)}_{C̃α} − ζ H^{(t+1)}_{C̃α})
11:     X̃^{(t+1)}_{C̃α} = 2 X^{(t+1)}_{C̃α} − X^{(t)}_{C̃α}
12:     H^{(t+1)}_S = γ X^{(t)}_S − γ X^{(t)}_{C̃α} + κ₄ L† D^{(t)} + κ₅ M† S^{(t)}
13:     X^{(t+1)}_S = P_{R₊^{N×L}}(X^{(t)}_S − ζ H^{(t+1)}_S)
14:     X̃^{(t+1)}_S = 2 X^{(t+1)}_S − X^{(t)}_S
15:     (q^{(t+1)}, z^{(t+1)}_1, ..., z^{(t+1)}_D) = P_V( ((q^{(t)} − ζκ₂ a^{(t)})/μ, (z^{(t)}_1 − ζκ₃ e^{(t)}_1)/μ̄, ..., (z^{(t)}_D − ζκ₃ e^{(t)}_D)/μ̄), η̃α )
16:     q̃^{(t+1)} = 2 q^{(t+1)} − q^{(t)}
17:     z̃^{(t+1)} = 2 z^{(t+1)} − z^{(t)}
18:   // Update dual variables simultaneously
19:   // Enforce data fidelity
20:   for (l, b) = (1, 1) to (L, B) do
21:     v^{(t+1)}_{l,b} = U_{l,b} (I_{M_{l,b}} − prox^{U_{l,b}}_{B₂(y_{l,b},ϵ_{l,b})}) (U^{−1}_{l,b} v^{(t)}_{l,b} + Θ_{l,b} G_{l,b} x̂^{(t)}_{l,b})
22:     ṽ^{(t+1)}_{l,b} = G†_{l,b} Θ†_{l,b} v^{(t+1)}_{l,b}
23:   // Epigraphical projection onto the nuclear ball
24:   (B^{(t+1)}, a^{(t+1)}) = (I_J − P_{E_{∗,ω}}) (B^{(t)} + X̃^{(t+1)}_{C̃α}, (a^{(t)} + q̃^{(t+1)})/μ)
25:   // Epigraphical projection onto the ℓ2,1 ball
26:   for d = 1 to D do
27:     (C^{(t+1)}_d, e^{(t+1)}_d) = (I_N − P_{E_{2,1,ω̄}}) (C^{(t)}_d + Ψ†_d X̃^{(t+1)}_{C̃α}, (e^{(t)}_d + z̃^{(t+1)}_d)/μ̄)
28:     C̃^{(t+1)}_d = Ψ_d C^{(t+1)}_d
29:   // 3D structure inpainting
30:   D^{(t+1)} = (I_{NM×L} − P_{[−τ,τ]^{NM×L}}) (D^{(t)} + L X̃^{(t+1)}_S)
31:   // 3D structure energy bounding
32:   S^{(t+1)} = (I_{NM×L} − P_{B₂(B,θ)}) (S^{(t)} + M X̃^{(t+1)}_S)
33:   t ← t + 1
in Section 4.4. Several tests are performed varying the sampling rate SR (4.15) from 0.005 to 2.
The performance of the proposed uncertainty quantification algorithm (Algorithm 6), denoted
by HyperSARA-UQ, where the MAP estimate is computed using the HyperSARA approach (Algorithm
2) with 5 reweights, is compared with that of the joint average sparsity approach, denoted by
JAS-UQ, where the MAP estimate is computed by solving a sequence of 5 consecutive JAS minimization
problems of the form (4.17) using the PDFB algorithm explained in Section 4.3.2. The
regularization parameters present in the HyperSARA minimization task (4.3) are set to μ = 1 and
μ̄ = 10⁻², leveraging the dirty wideband model cube X_dirty (Section 4.2.2), and the free parameter
appearing in the JAS minimization problem (4.17) is set to μ₂ = 10⁻². Note that for JAS-UQ, the
prior model r involved in the definition of the set C̃α (6.8) reduces to the joint average sparsity
prior, i.e., r(X) = μ₂‖Ψ†X‖_{2,1,ω̄}. In this setting, JAS-UQ is solved using a simplified version of
Algorithm 6 where the projection onto the set C̃α involves a simple projection onto the weighted
ℓ2,1 ball rather than epigraphical projections onto the weighted nuclear ball and the weighted ℓ2,1
ball.
To precisely evaluate the interest of the proposed approach, we quantify the uncertainty of three
spatially localized 3D structures appearing in the MAP estimate. These structures are compact or
slightly extended sources corresponding to the definition of the set S (6.14). The linear inpainting
matrix L given in (6.12) is chosen such that L = (1/3)(L_{5×5×3} + L_{7×7×3} + L_{11×11×3}), where L_{5×5×3},
L_{7×7×3} and L_{11×11×3} model 3D normalized convolutions between the image cube (filled with
zeros inside the 3D structure) and 3D Gaussian convolution kernels of size 5×5×3, 7×7×3 and
11×11×3, respectively. Also, we set (τ_l = std(M[X†]_l))_{1⩽l⩽L}. To control the energy inside the
region of the 3D structure as defined in (6.13), we set B = 0 and (θ_l = ‖L M^c [X†]_l‖₂)_{1⩽l⩽L}.
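As an illustration of the normalized-convolution inpainting underlying L, the sketch below replaces each pixel inside a mask by a Gaussian-weighted average of the surrounding unmasked pixels. This is a simplified stand-in (a single isotropic kernel, direct convolution, hypothetical function names), whereas the thesis averages three kernel sizes:

```python
import numpy as np

def gaussian_kernel_3d(shape, sigma=1.0):
    """Unnormalized 3D Gaussian kernel of odd size (kx, ky, kz)."""
    ax = [np.arange(s) - s // 2 for s in shape]
    gx, gy, gz = np.meshgrid(*ax, indexing="ij")
    return np.exp(-(gx**2 + gy**2 + gz**2) / (2 * sigma**2))

def convolve3d(a, k):
    """Direct 'same'-size 3D convolution (small, symmetric kernels only)."""
    pad = [(s // 2, s // 2) for s in k.shape]
    ap = np.pad(a, pad)
    out = np.zeros_like(a, dtype=float)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            for m in range(k.shape[2]):
                out += k[i, j, m] * ap[i:i + a.shape[0],
                                       j:j + a.shape[1],
                                       m:m + a.shape[2]]
    return out

def normalized_inpaint(cube, mask, kshape=(5, 5, 3), sigma=1.0):
    """Normalized 3D Gaussian convolution inpainting: each pixel inside
    `mask` (the 3D structure) is replaced by a weighted average of the
    surrounding pixels, with the masked region zeroed and the weights
    renormalized by the convolved mask indicator."""
    k = gaussian_kernel_3d(kshape, sigma)
    holes = cube * (~mask)                       # cube with the structure zeroed
    num = convolve3d(holes, k)
    den = convolve3d((~mask).astype(float), k)   # sum of weights over unmasked pixels
    out = cube.copy()
    out[mask] = num[mask] / np.maximum(den[mask], 1e-12)
    return out
```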
6.5.2 Uncertainty quantification parameter
Recall that the solution (X‡_{C̃α}, X‡_S) generated by Algorithm 6 satisfies

dist(C̃α, S) = ‖X‡_{C̃α} − X‡_S‖_F.    (6.22)

To relate this distance to the 3D structure's intensity, we introduce the image cube X†_S ∈ S that
corresponds to the MAP estimate X† where the 3D structure of interest has been removed via
applying the linear inpainting L to the region of the 3D structure of interest. Formally, we define:

M X†_S = L M^c X†  and  M^c X†_S = M^c X†.    (6.23)

At this point, we define the normalized intensity of the 3D structure ρα as the ratio between the
distance (6.22) and the intensity of the 3D structure present in the MAP estimate, given by ‖X† − X†_S‖_F:

ρα = dist(C̃α, S) / ‖X† − X†_S‖_F.    (6.24)

Notice that ρα = 0 is equivalent to dist(C̃α, S) = 0. Consequently, we can conclude:
• ρα = 0 implies that C̃α ∩ S ≠ ∅. Then, we fail to reject H0 at level α and the presence of
the 3D structure of interest is uncertain.
• ρα > 0 implies that C̃α ∩ S = ∅. Thus, the hypothesis test H0 is rejected at level α and the
value ρα represents the energy percentage of the 3D structure that is confirmed in the MAP
estimate [100].
In our simulations, we consider α = 1% and we consider that H0 is rejected when ρα > 2% (to
allow for numerical errors).
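The quantities in (6.22)-(6.24) and the decision rule reduce to a few lines of numerical code. A hedged sketch with hypothetical names (the four inputs stand for X‡_{C̃α}, X‡_S, X† and X†_S):

```python
import numpy as np

def rho_alpha(x_c, x_s, x_map, x_map_inpainted):
    """Normalized 3D-structure intensity (6.24): the distance between the
    solutions of Algorithm 6, dist = ||X_C - X_S||_F from (6.22), divided by
    the structure intensity in the MAP estimate, ||X_MAP - X_MAP_inpainted||_F."""
    dist = np.linalg.norm(x_c - x_s)
    return dist / np.linalg.norm(x_map - x_map_inpainted)

def reject_h0(rho, tol=0.02):
    """Decision rule used in the simulations: reject H0 when rho_alpha > 2%
    (the 2% margin absorbs numerical errors)."""
    return rho > tol
```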
6.5.3 Results and discussion
To showcase the efficiency of the hybrid prior image model, we compare the performance of
HyperSARA-UQ against JAS-UQ. We perform several tests on the data sets generated using
a realistic uv-coverage where we vary the Fourier SR in the interval [0.01, 2]. In our simulations,
we study three different spatially localized 3D structures: two strong sources, Structure 1
and Structure 3, highlighted with blue and red rectangles, respectively, in the ground-truth image
cube at channels ν1 and ν15, and one weak and slightly extended source, Structure 2, highlighted
with a green rectangle (see Figure 6.3, first row). Figure 6.3 (c) shows curves representing the ρα
values of the three structures obtained with HyperSARA-UQ (solid curves) and JAS-UQ (dashed
curves).
We notice for the three structures that the confirmed energy ρα with both methods naturally
increases when the number of measurements increases, meaning that the 3D structure uncertainty
decreases when SR increases. This is because the investigated structures are originally present in
the true image cube. We observe comparable performance between HyperSARA-UQ and JAS-UQ
for Structure 3, where ρα > 90% for all sampling rates above 0.22. In this case, at least 90% of
the intensity of Structure 3 is confirmed with probability 99%. However, this is not the case for Structure
1 and Structure 2, where HyperSARA-UQ succeeds in confirming more energy of the sources in
comparison with JAS-UQ (an almost 10% increase in the confirmed energy is reported for Structure
1 with SR > 0.1, and for Structure 2 at all considered sampling rates). These results show the
power of the combination of the low-rankness and joint average sparsity priors in confirming the
energy of the considered true structures. For the weak source, Structure 2, the hypothesis H0 is
always rejected for all the considered sampling rates. Interestingly, 37% of the energy of Structure 2
is confirmed at the drastic sampling rate SR = 0.01 with HyperSARA-UQ, while the confirmed energy
drops to 8.93% with JAS-UQ, reflecting the importance of the low-rankness prior in HyperSARA-UQ
in regularizing the inverse problem. Conversely, H0 cannot be rejected with either method for
Structure 1 when SR < 0.1 and for Structure 3 when SR < 0.02.
For qualitative comparison, we proceed with the visual inspection of the images obtained with
HyperSARA-UQ and JAS-UQ applied to Structure 1 (Figures 6.4 and 6.5), Structure 2 (Figures
6.6 and 6.7) and Structure 3 (Figures 6.8 and 6.9). The images are obtained with SR = 0.5 and
iSNR = 60 dB and reported for channels ν1 and ν15. The first row of each figure shows the MAP
estimate of HyperSARA (left) and JAS (right). The reported aSNR values are 32.32 dB and
30.87 dB, respectively, suggesting the superiority of HyperSARA in recovering higher-resolution,
higher-dynamic-range image cubes. Comparing the uncertainty quantification parameter, we obtain
for Structure 1 (Figures 6.4 and 6.5) ρα = 75.21% with HyperSARA-UQ and ρα = 64.53% with
JAS-UQ. Thus, we can conclude that C̃α ∩ S = ∅ and H0 can be rejected with significance
α = 1%. A similar conclusion can be drawn for Structure 3 (Figures 6.8 and 6.9), with 97.43% of the
3D structure energy confirmed using HyperSARA-UQ and 96.95% of the energy confirmed
using JAS-UQ, suggesting the similarity between the two methods in analyzing strong sources.
Interestingly, for the weak and slightly extended source, Structure 2, we have ρα = 54.91% using
HyperSARA-UQ and ρα = 45.1% using JAS-UQ. These values indicate that Structure 2 is real
and not an artifact. One can observe that the images [X‡_{C̃α}]_l and [X‡_S]_l obtained by performing
uncertainty quantification of Structure 3 (Figures 6.8 and 6.9, bottom row) exhibit more artifacts than
those obtained for Structure 2 (Figures 6.6 and 6.7, bottom row). This can be justified by the fact
that Structure 3 is a very strong source: when removing it by projecting onto the set S, the
data tend to create artifacts in other parts of the image to compensate for the lost energy.
6.6 Conclusions
In this work, we presented a wideband uncertainty quantification approach that measures the
degree of confidence in specific 3D structures appearing in the MAP estimate. The proposed
method is a generalization of our previous works on monochromatic uncertainty quantification
[99, 100] and is based on the recent Bayesian inference results presented in [91]. Our approach
takes the form of a Bayesian hypothesis test, formulated as a convex minimization problem and
solved using a primal-dual algorithm. As opposed to Bayesian inference techniques, the algorithmic
structure comes with preconditioning and parallelization functionalities, paving the road for
scalability to big data regimes. We investigated the interest of our approach for wideband RI
imaging on realistic simulations through investigating different spatially localized 3D structures.
Another interesting study consists of analyzing artifacts appearing in the MAP estimate. In this
case, ρα is expected to decrease as SR increases. Unfortunately, we could not perform this analysis
with the considered data set, as the MAP estimate presents no artifacts even for low sampling rates. We
leave this study for real data, where the reconstructed image cube typically exhibits artifacts resulting
from calibration errors or mis-modelling of the measurement operator. Validating the proposed
approach on real data sets is an ongoing project, where preliminary real data results suggest the
design of more sophisticated and scalable sets S. More precisely, the current implementation
showed two problems for real data: (i) applying the linear inpainting at each iteration can be
very costly for GB images; (ii) our linear inpainting matrix replaces each pixel in the 3D structure
by a weighted sum of the pixels in the 3D neighborhood. More sophisticated definitions of the
inpainting operation can be investigated (e.g., one can define the inpainting operator in the 3D
wavelet space). To conclude, we emphasize that wideband uncertainty quantification
tools are of great interest for astronomers, particularly in the era of the new-generation telescopes,
namely the SKA, where Petabyte image cubes are expected.
(a) Ground-truth image at channel ν1 = 1 GHz. (b) Ground-truth image at channel ν15 = 2 GHz. (c) Curves representing the values of ρα.

[Plot omitted: ρα in percentage (y-axis, 0 to 100) versus sampling rate (x-axis, 10⁻² to 10⁰).]

Figure 6.3: Simulations with realistic uv-coverage: (c) curves representing the values of ρα in
percentage (y-axis) as a function of the sampling rate SR = M_l/N (x-axis), in log10 scale, for the 3D
structures of interest. The considered 3D structures are highlighted with rectangles on channel
ν1 = 1 GHz (a) and channel ν15 = 2 GHz (b) of the ground-truth image cube, in log10 scale. Each point
corresponds to the mean value of 5 tests with different antenna positions and noise realizations, and the
vertical bars represent the standard deviation of the 5 tests.
Channel ν1 = 1 GHz

[Image panels omitted: intensity maps in log10 scale with colorbars spanning −5 to 0.]

Figure 6.4: Uncertainty quantification of 3D Structure 1: results, reported for channel ν1 = 1 GHz, are
obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are:
the MAP estimate [X†]_1, and the uncertainty quantification results [X‡_{C̃α}]_1 and [X‡_S]_1. The results are given
for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and ρα = 64.53%.
All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 1.
Channel ν15 = 2 GHz

[Image panels omitted: intensity maps in log10 scale with colorbars spanning −4 to 0.]

Figure 6.5: Uncertainty quantification of 3D Structure 1: results, reported for channel ν15 = 2 GHz, are
obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are:
the MAP estimate [X†]_15, and the uncertainty quantification results [X‡_{C̃α}]_15 and [X‡_S]_15. The results are
given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and ρα = 64.53%. All
images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 1.
Channel ν1 = 1 GHz

[Image panels omitted: intensity maps in log10 scale with colorbars spanning −5 to 0.]

Figure 6.6: Uncertainty quantification of 3D Structure 2: results, reported for channel ν1 = 1 GHz, are
obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are:
the MAP estimate [X†]_1, and the uncertainty quantification results [X‡_{C̃α}]_1 and [X‡_S]_1. The results are given
for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 54.91%, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and ρα = 45.1%. All
images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 2.
Channel ν15 = 2 GHz

[Image panels omitted: intensity maps in log10 scale with colorbars spanning −4 to 0.]

Figure 6.7: Uncertainty quantification of 3D Structure 2: results, reported for channel ν15 = 2 GHz, are
obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are:
the MAP estimate [X†]_15, and the uncertainty quantification results [X‡_{C̃α}]_15 and [X‡_S]_15. The results are
given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 54.91%, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and ρα = 45.1%. All
images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 2.
Channel ν1 = 1 GHz

[Image panels omitted: intensity maps in log10 scale with colorbars spanning −5 to 0.]

Figure 6.8: Uncertainty quantification of 3D Structure 3: results, reported for channel ν1 = 1 GHz, are
obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are:
the MAP estimate [X†]_1, and the uncertainty quantification results [X‡_{C̃α}]_1 and [X‡_S]_1. The results are given
for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 97.43%, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and ρα = 96.95%.
All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 3.
Channel ν15 = 2 GHz

[Image panels omitted: intensity maps in log10 scale with colorbars spanning −4 to 0.]

Figure 6.9: Uncertainty quantification of 3D Structure 3: results, reported for channel ν15 = 2 GHz, are
obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are:
the MAP estimate [X†]_15, and the uncertainty quantification results [X‡_{C̃α}]_15 and [X‡_S]_15. The results are
given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 97.43%, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and ρα = 96.95%. All
images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 3.
Chapter 7
Conclusions and perspectives
The research reported in this thesis leverages convex optimization techniques to achieve precise
and scalable imaging for wideband radio interferometry and further assess the degree of confidence
in particular 3D structures present in the reconstructed cube. On the one hand, the radio
interferometric inverse problem for image formation is highly ill-posed, which prompts the adoption of
sophisticated image priors to regularize the inverse problem. On the other hand, modern radio
telescopes provide vast amounts of data, which necessitates the design of scalable and distributed
wideband imaging and uncertainty quantification algorithms that can recover and analyze the
expected very large image cubes.
To meet these extreme challenges and achieve the anticipated scientific goals, we proposed the
HyperSARA approach in Chapter 4. HyperSARA consists in solving a minimization problem with
sophisticated log-sum priors, promoting low-rankness and joint average sparsity of the estimated
image cube in the ℓ0 sense. This hybrid prior image model has proved efficient in recovering high-resolution,
high-dynamic-range image cubes in comparison with the state-of-the-art approaches.
The underpinning algorithmic structure, namely the primal-dual framework, exhibits interesting
functionalities such as preconditioning, to accelerate the convergence speed, and splitting of the data
into efficiently designed data blocks, to spread the computation cost over a multitude of processing
CPU cores, allowing scalability to large data volumes. Besides, HyperSARA incorporates an
adaptive strategy to adjust the bounds associated with the data-fidelity terms, allowing for imaging
real data that present calibration errors in addition to thermal noise. Although HyperSARA
processes all data-fidelity blocks independently in parallel, the involved priors require computations
scaling with the size of the image cube to be reconstructed. This is prohibitively expensive in terms
of computation time and memory requirements in the era of the awaited Petabyte image cubes.
To overcome this bottleneck and push the scalability potential of HyperSARA, we developed in
Chapter 5 the Faceted HyperSARA algorithm. Faceted HyperSARA decomposes the full image
cube into regular, content-agnostic, spatio-spectral facets, each of which is associated with a facet-based
low-rankness and joint average sparsity regularization term. This sophisticated facet-specific
prior model was shown to provide higher imaging quality, reflected by a better estimation of faint
emissions, compared to HyperSARA. Faceted HyperSARA, powered by the primal-dual algorithm,
allows for parallel processing of all data blocks and image facets over a multiplicity of CPU cores.
The performance of Faceted HyperSARA was validated on a reconstruction of a 15 GB image cube
of Cyg A from 7.4 GB of VLA observations across 480 channels, utilizing 496 CPU cores on a
high performance computing system for 68 hours. On the one hand, the associated results have
proved the imaging precision of Faceted HyperSARA, reflected by a significant improvement in the
imaging quality with respect to the CLEAN-based wideband algorithm JC-CLEAN. Importantly,
and in contrast with JC-CLEAN, the Faceted HyperSARA results confirmed the recent discovery of
a super-massive second black hole in the inner core of Cyg A at much lower frequencies than
ever. On the other hand, the computing cost of Faceted HyperSARA, implemented in MATLAB,
was not far from that of the JC-CLEAN C++ implementation (22 hours), suggesting that the gap
can be significantly reduced, if not closed, with the forthcoming C++ implementation of Faceted
HyperSARA, and confirming its scalability potential.
Even though HyperSARA and Faceted HyperSARA have proved to provide an accurate estimation
of the wideband sky, measuring the degree of confidence in specific 3D structures appearing
in the estimated cube is crucial due to the severe ill-posedness of the problem. In this context, we
developed a new method in Chapter 6 that solves the uncertainty quantification problem in radio
interferometry. Mainly, the approach performs a Bayesian hypothesis test, which postulates that
the 3D structure under scrutiny is absent from the true image cube. Then, the data and the prior
image model are employed to decide whether the null hypothesis is rejected or not. The hypothesis test
is expressed as a convex minimization problem and solved efficiently leveraging the sophisticated
primal-dual framework with its preconditioning and splitting functionalities. As opposed to typical
Bayesian inference techniques that provide uncertainty measures leveraging MCMC or proximal
MCMC algorithms, no sampling is involved in the proposed approach, allowing for scalability to
high-dimensional data and image regimes.
7.1 Perspectives
At this point, we shed light on some research directions that can further enhance the work developed
in this thesis.
Faceted HyperSARA successfully addressed the computational bottlenecks raised by both the
volume of the data and the size of the image cube. Future work should contemplate the definition
and implementation of a faceted Fourier transform to improve the data and image locality in the
proposed algorithm. More precisely, for each channel indexed l, a Fourier transform is computed
on the data core (l, 1) at each iteration. Then, each data core (l, b), with b ∈ {2, ..., B}, receives
only a few coefficients of the Fourier transform from the data core (l, 1). This procedure requires
communications between the data core (l, 1) and all the data cores (l, b), with b ∈ {2, ..., B}, on the
one side, and between the data core (l, 1) and all facet cores on the other side. A faceted Fourier
transform implementation would overcome this problem and enhance the distribution scheme of
Faceted HyperSARA, in that each data core would implement a faceted Fourier transform of a facet core.
Hence, no communications between data cores would be needed anymore. Furthermore, only one-to-one
facet-to-data-core communications would be required.
Another perspective consists in developing a production C++ version of Faceted HyperSARA,
building from the existing C++ version of HyperSARA (see the Puri-Psi webpage), to achieve
maximum performance and scalability of a software implementation.
Deep neural networks (DNNs), in particular convolutional neural networks (CNNs), have shown
promising results in solving inverse imaging problems [67, 71]. The computational cost associated
with DNNs boils down to training the DNN offline. Once trained, its application, leveraging GPU
systems, can be extremely fast, opening the door for further scalability potential. However, the
extreme ill-posedness of the radio interferometric inverse problem and the lack of rich training
data sets make the problem complicated even for sophisticated DNN structures such as the U-net.
The emerging plug-and-play (PnP) methods in optimization theory propose to replace the
proximity operator associated with the regularization term in a proximal splitting algorithm by
a more general denoising operator. In this regard, we suggest investigating the performance of
Faceted HyperSARA, where the proximity operator of one or two of the adopted regularization
terms, namely the nuclear norm and the ℓ2,1 norm, is replaced by a CNN acting as a denoiser. In
this scenario, the prior image model can be learned directly from the data.
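As a minimal illustration of the PnP idea, the toy sketch below runs a forward-backward iteration in which the proximity operator is replaced by a generic denoiser. This is an assumption-laden stand-in (the denoiser here is simple soft-thresholding rather than a trained CNN, and the names are hypothetical), not Faceted HyperSARA itself:

```python
import numpy as np

def pnp_forward_backward(A, y, denoise, step, n_iter=100, x0=None):
    """Plug-and-play forward-backward: a gradient step on the data-fidelity
    term, followed by a denoiser applied in place of a proximity operator."""
    x = np.zeros(A.shape[1]) if x0 is None else x0
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)          # gradient of 0.5 * ||A x - y||^2
        x = denoise(x - step * grad)      # denoiser replaces the prox
    return x

# toy denoiser: soft-thresholding (the prox of the l1 norm) standing in for a CNN
soft = lambda z, lam=0.05: np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```

Swapping `soft` for a learned denoiser is the substitution suggested above for the nuclear-norm and ℓ2,1-norm proximity operators.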
Last but not least, the developed uncertainty quantification method should be validated on
real data sets for broader acceptability in the research community. Importantly, real data usually
present calibration errors which, if not accounted for in the measurement operator, might lead to
reconstruction artifacts. Thus, analyzing such suspected regions is crucial to make scientific decisions
(e.g., confirming whether the faint emission in the inner core of Cyg A corresponds to a second black
hole or a reconstruction artifact). Particularly for the choice of the set S, the preliminary real
data results suggest the design of more sophisticated and scalable sets. More precisely, the current
implementation constructs S using an inpainting technique, which replaces each pixel in the region
of the 3D structure by a weighted sum of the pixels in the 3D neighborhood. Applying this linear
inpainting at each iteration is very costly for GB image cubes. One suggestion for future work is
to define the inpainting operator in the 3D wavelet space, where the image cube is represented by
a few sparse coefficients, hence accelerating the inpainting operation at each iteration.
Appendices
.1 Basic definitions in convex optimization
Definition: The domain of a function f : R^N → ]−∞, +∞] is defined as

dom f = {z ∈ R^N | f(z) < +∞}.    (1)

Definition: A function f : R^N → ]−∞, +∞] is proper if dom f ≠ ∅.

Definition: The epigraph of a function f : R^N → ]−∞, +∞] is the subset of R^N × R defined as

epi f = {(z, γ) ∈ R^N × R | f(z) ≤ γ}.    (2)

Definition: A function f : R^N → ]−∞, +∞] is lower semi-continuous if epi f is a closed set.

Definition: Let f : R^N → ]−∞, +∞] be a convex function. The subdifferential of f at z̄ ∈ R^N is the set

∂f(z̄) = {y : f(z) ≥ f(z̄) + ⟨y | z − z̄⟩, ∀z ∈ R^N}.    (3)

Definition: Let f and h be convex functions from R^N to ]−∞, +∞]. The inf-convolution of f and h is

(f □ h)(z) = inf_{u ∈ R^N} f(u) + h(z − u).    (4)
.2 Proximity operators
In what follows, we define the proximity operators required to deal with the non-smooth functions
present in the minimization problems developed in this thesis.
The indicator function of the positivity constraint ι_{R₊^{N×L}}: The proximity operator of the
function ι_{R₊^{N×L}}, enforcing positivity, is defined for a matrix Z ∈ R^{N×L} as the projection onto the
real positive orthant:

P_{R₊^{N×L}}(Z) = max{ℜ(Z), 0}.    (5)
The weighted nuclear norm μ‖·‖_{∗,ω}: Given a matrix Z ∈ R^{N×L}, the proximity operator of
the weighted nuclear norm μ‖Z‖_{∗,ω} involves soft-thresholding of the vector of singular values
σ(Z). These are obtained by means of the singular value decomposition (SVD): Z = Λ₁ Σ Λ₂†, with
Σ = Diag(σ). The proximity operator of μ‖Z‖_{∗,ω} is thus given by:

S^∗_{μω}(Z) = Λ₁ Diag(S^{ℓ1}_{μω}(σ)) Λ₂†,    (6)

with:

S^{ℓ1}_{μω}(σ) = (max{σ_j − μ ω_j, 0})_{1⩽j⩽J},    (7)

where ω_j ≥ 0 is the weight associated with the j-th singular value σ_j and μ is the soft-thresholding
parameter.
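A direct numerical transcription of (6)-(7) for a small dense matrix (an illustrative sketch with hypothetical names; the thesis operates on much larger, structured matrices):

```python
import numpy as np

def prox_weighted_nuclear(Z, mu, w):
    """Proximity operator of the weighted nuclear norm mu * ||Z||_{*,w}:
    soft-threshold the singular values by mu * w_j, equations (6)-(7)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_thr = np.maximum(s - mu * w, 0.0)     # weighted soft-thresholding (7)
    return U @ np.diag(s_thr) @ Vt          # recompose as in (6)
```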
The weighted ℓ2,1 norm μ‖·‖_{2,1,ω}: The proximity operator of the weighted ℓ2,1 norm reads as a
row-wise soft-thresholding operation, defined for a matrix Z = [z₁†, ..., z_N†]† ∈ R^{N×L} as follows:

S^{ℓ2,1}_{μω}(Z) = (max{‖z_n‖₂ − μ ω_n, 0} z_n† / ‖z_n‖₂)_{1⩽n⩽N},    (8)

where ω_n ≥ 0 is the weight associated with the row z_n† and μ is the soft-thresholding parameter.
The epigraph of the weighted nuclear norm E_{∗,ω}: Given a matrix Z ∈ R^{N×L}, the proximity
operator of the epigraph of the weighted nuclear norm reads as an epigraphical projection onto the
weighted nuclear ball of radius ũ = Σ_{j=1}^{J} u_j, with J being the number of singular values
of Z and u = (u_j)_{1⩽j⩽J}. Interestingly, this projection, denoted by P_{E_{∗,ω}}(Z, u), resolves to an
epigraphical projection of the vector of singular values σ(Z) onto the weighted ℓ1 ball of radius
ũ. This requires an SVD operation: Z = Λ₁ Σ Λ₂†, with Σ = Diag(σ). That being said, we can
write the epigraphical projection onto the weighted nuclear ball as

P_{E_{∗,ω}}(Z, u) = Λ₁ Diag(P_{E_{1,ω}}(σ, u)) Λ₂†,    (9)

where P_{E_{1,ω}}(σ, u) = (p, θ), with p = (p_j)_{1⩽j⩽J}, θ = (θ_j)_{1⩽j⩽J} and, ∀j ∈ {1, ..., J}:

p_j = σ_j                                     if ω_j σ_j ⩽ u_j,
p_j = (1/(1 + ω_j²)) max{σ_j + ω_j u_j, 0}    otherwise,    (10)

θ_j = max{ω_j p_j, u_j},    (11)

where ω_j ≥ 0 is the weight associated with the j-th singular value σ_j.
The epigraph of the weighted ℓ2,1 norm E_{2,1,ω}: Given a matrix Z = [z₁†, ..., z_N†]† ∈ R^{N×L},
the proximity operator of the epigraph of the weighted ℓ2,1 norm reads as an epigraphical projection
onto the weighted ℓ2,1 ball of radius ũ = Σ_{n=1}^{N} u_n, with u = (u_n)_{1⩽n⩽N}. This projection, denoted
by P_{E_{2,1,ω}}(Z, u), reads as row-wise epigraphical projections onto weighted ℓ2 balls as follows:
P_{E_{2,1,ω}}(Z, u) = (P, θ), with P = [p₁†, ..., p_N†]† ∈ R^{N×L}, θ = (θ_n)_{1⩽n⩽N} and, ∀n ∈ {1, ..., N}:

p_n† = 0                                                     if z_n† = 0,
p_n† = z_n†                                                  if ω_n ‖z_n‖₂ ⩽ u_n,
p_n† = (1/(1 + ω_n²)) max{1 + ω_n u_n / ‖z_n‖₂, 0} z_n†      otherwise,    (12)

θ_n = max{ω_n ‖p_n‖₂, u_n},    (13)

where ω_n ≥ 0 is the weight associated with the row z_n†.
The indicator function of a half-space ι_V: The proximity operator of the indicator function
of a half-space defined as

V = {z = [z₁, z₂] ∈ R^{N₁} × R^{N₂} | Σ_{j=1}^{N₁} z_{1,j} + Σ_{n=1}^{N₂} z_{2,n} ⩽ η̃α}    (14)

is given by the projection onto the hyperplane defining the boundary of the half-space as follows:

P_{V(η̃α)}(z) = z                                                                          if Σ_{j=1}^{N₁} z_{1,j} + Σ_{n=1}^{N₂} z_{2,n} ⩽ η̃α,
P_{V(η̃α)}(z) = z + (η̃α − (Σ_{j=1}^{N₁} z_{1,j} + Σ_{n=1}^{N₂} z_{2,n})) / (N₁ + N₂)      otherwise.    (15)
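Equation (15) amounts to spreading any constraint violation uniformly over the components. A minimal sketch (hypothetical names; the split [z₁, z₂] is handled as one concatenated vector):

```python
import numpy as np

def project_halfspace(z, eta):
    """Projection onto the half-space {z : sum(z) <= eta}, equation (15):
    if the constraint holds, z is unchanged; otherwise the excess is spread
    evenly over all N1 + N2 components."""
    excess = z.sum() - eta
    return z if excess <= 0 else z - excess / z.size
```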
The indicator function of a box ι_{[−τ,τ]^{N×L}}: Given a matrix Z ∈ R^{N×L}, the proximity operator
of the indicator function ι_{[−τ,τ]^{N×L}} is a projection onto the box as follows:

P_{[−τ,τ]^{N×L}}(Z) = max{min{Z, τ}, −τ}.    (16)

The indicator function of the ℓ2 ball ι_{B₂(y,ϵ)}: Given a vector z ∈ R^N, the proximity operator
of the indicator function ι_{B₂(y,ϵ)} is a projection onto the ℓ2 ball centered at y ∈ R^N and of radius ϵ ∈ R
as follows:

P_{B₂(y,ϵ)}(z) = y + min{ϵ / ‖z − y‖₂, 1} (z − y).    (17)
.3 Overview of the parameters specific to the adaptive PDFB algorithm (Algorithm 3)
An overview of the variables and parameters involved in the adjustment of the ℓ2 bounds on the
data-fidelity terms is presented in Tables 1 and 2, respectively.
Table 1: Overview of the variables employed in the adaptive procedure incorporated in Algorithm 3.

$\rho_{l}^{b\,(t)}$: the $\ell_2$ norm of the residual data corresponding to the data block $\mathbf{y}_l^b$ at iteration $t$.
$\vartheta_{l}^{b\,(t-1)}$: iteration index of the previous update of the $\ell_2$ bound of the data block $\mathbf{y}_l^b$.
$\beta^{(t-1)}$: the relative variation of the solution at iteration $t-1$.
Table 2: Overview of the parameters involved in the adaptive procedure incorporated in Algorithm 3.

$\lambda_1 \in ]0,1[$: the bound on the relative variation of the solution (we fix it to $5 \times 10^{-4}$).
$\lambda_2 \in ]0,1[$: the tolerance on the relative difference between the current estimate of a data-block $\ell_2$ bound and the $\ell_2$ norm of the associated residual data (we fix it to $0.01$).
$\lambda_3 \in ]0,1[$: the parameter defining the increment of the $\ell_2$ bound with respect to the $\ell_2$ norm of the residual data (we fix it to $0.618$).
$\bar{\vartheta}$: the minimum number of iterations between consecutive updates of each $\ell_2$ bound (we fix it to $100$).
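To make the roles of these quantities concrete, the following is a hypothetical sketch of an adaptive $\ell_2$-bound update consistent with the parameter descriptions in Tables 1 and 2. It is not the exact rule of Algorithm 3 (which is not reproduced here): the trigger conditions and the convex-combination update are assumptions made for illustration only.

```python
def maybe_update_bound(eps, rho, beta, t, t_last,
                       lam1=5e-4, lam2=0.01, lam3=0.618, theta_bar=100):
    """Hypothetical adaptive update of one l2 bound eps (illustration only).
    Update only when the solution has stabilised (beta < lam1), at least
    theta_bar iterations have elapsed since the last update, and the bound
    deviates from the residual norm rho by more than the tolerance lam2;
    the bound then moves toward rho by a factor lam3."""
    if (beta < lam1 and t - t_last >= theta_bar
            and abs(rho - eps) / eps > lam2):
        return lam3 * rho + (1.0 - lam3) * eps, t
    return eps, t_last
```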
.4 Randomized PDFB algorithm
We have explained in Section 3.4.2 that the primal-dual framework adopted in this thesis allows for randomized updates of the dual variables [93]. This feature can significantly reduce the memory requirements, thus ensuring higher scalability of the algorithmic structure, at the expense of an increased number of iterations to achieve convergence. To give the reader a brief idea about the randomization procedure, we showcase the performance of the HyperSARA approach (proposed in Chapter 4) with and without randomization of the dual variables associated with the data-fidelity blocks. Furthermore, we report the performance of one of the convex optimization methods developed for wideband RI imaging, dubbed WDCT [54]. Since WDCT involves no reweighting, we restrict HyperSARA to the simple case of no reweighting, resulting in the LRJAS algorithm. On the one hand, LRJAS solves a constrained minimization problem of the form (4.8) with $\omega = \mathbf{1}_J$ and $\omega = \mathbf{1}_T$, promoting low-rankness and joint average sparsity of the image cube in a redundant wavelet dictionary, namely the SARA dictionary. On the other hand, WDCT solves an unconstrained minimization problem of the form (3.32), promoting sparsity of both the spatial and spectral information. Spatial sparsity is promoted in a redundant wavelet dictionary, and sparsity of the spectra is enforced in a DCT dictionary.
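The randomization procedure amounts to drawing, at each iteration, the subset of data-fidelity dual variables to update. A minimal NumPy sketch of such a per-block Bernoulli selection; the helper name and the fallback guaranteeing at least one active block are choices made here, not taken from the thesis:

```python
import numpy as np

def select_active_blocks(n_blocks, p, rng):
    """Draw the subset of data-fidelity dual variables updated this
    iteration: each block is active independently with probability p.
    (Illustrative helper; the fallback keeps at least one block active.)"""
    active = rng.random(n_blocks) < p
    if not active.any():
        active[rng.integers(n_blocks)] = True
    return active
```

With $p = 0.5$, on average half of the blocks are visited per iteration, which lowers the per-iteration memory and communication load at the cost of more iterations overall.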
.4.1 Simulations and results

Results are reported for simulations using a radio emission map of an H II region in the M31 galaxy. The image is of size $N = 256 \times 256$ pixels and is considered as the original sky image $\mathbf{x}_1$ at the reference frequency $\nu_1 = 1.4$ GHz (Figure 1, first row). The RI image cube is simulated following the spectral curvature model (2.23). In order to ensure spatial correlation in the spectral index map, the latter is generated in an ad hoc manner, similarly to [54, 69], that is, as a linear combination of the reference sky image $\mathbf{x}_1$ smoothed with a Gaussian kernel of size $3 \times 3$ at FWHM, and a random Gaussian field. The RI data cube is simulated using a realistic uv-coverage from the VLA array at the reference frequency $\nu_1$. For each channel indexed by $l$, its corresponding uv-coverage is obtained by scaling the reference uv-coverage with $\nu_l/\nu_1$. The data cube is generated within the frequency range $[\nu_1, \nu_L] = [1.4, 2.8]$ GHz, with uniformly sampled channels. The test is carried out on a cube with a total number of $L = 16$ channels, a sampling rate SR (4.15) equal to $0.5$, and an iSNR (4.13) of 30 dB. We assign one data block per channel, resulting in a total of 16 data blocks. The metric adopted to assess the reconstruction quality of the different methods is the aSNR metric (4.20).
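The per-channel uv-coverage generation described above reduces to scaling the reference coverage by the frequency ratio. A short NumPy sketch (function name illustrative):

```python
import numpy as np

def scaled_uv_coverages(uv_ref, freqs):
    """Per-channel uv-coverages obtained by scaling the reference coverage
    (defined at freqs[0]) by nu_l / nu_1, as in the simulation setup."""
    nu1 = freqs[0]
    return [uv_ref * (nu / nu1) for nu in freqs]

# 16 uniformly sampled channels spanning [1.4, 2.8] GHz
freqs = np.linspace(1.4e9, 2.8e9, 16)
```

The highest channel thus probes baselines twice as long (in units of wavelength) as the reference channel, which is what enables the super-resolution effect of wideband imaging.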
Figure 1, second row, reveals the aSNR evolution for the different algorithms. For the randomized LRJAS method, dubbed LRJAS-R, we fix the probability of selecting an active subset of the full data to 0.5, meaning that only half of the data-fidelity blocks, selected at random, are updated at each iteration. This has the advantage of lower infrastructure and memory requirements, at the expense of an increased number of iterations to achieve convergence. We can see that LRJAS and LRJAS-R exhibit comparable performance. Yet, LRJAS-R needs more iterations to reach the same aSNR value. Also, when compared to WDCT, the randomized algorithm presents superior performance; LRJAS-R reaches an aSNR of 25 dB, which is 5 dB higher than WDCT.

Figure 1: Simulations with VLA uv-coverage. (a) The ground-truth image at the reference frequency, $\mathbf{x}_1$ (channel $\nu_1 = 1.4$ GHz). (b) Curves representing the evolution of aSNR (y-axis) as a function of the number of iterations (x-axis) for the different methods: LRJAS, LRJAS-R (LRJAS with randomized updates) and WDCT.
It is worth noting that this randomization scheme can be used in the same fashion with the Faceted HyperSARA approach (proposed in Chapter 5) and the uncertainty quantification algorithm (introduced in Chapter 6), allowing the dual variables to be updated less often.
Bibliography
[1] A. Abdulaziz. A Low-Rank and Joint-Sparsity Model for Wide-Band Radio-Interferometric
Imaging. Master’s thesis, Heriot-Watt University, United Kingdom, 2016.
[2] A. Abdulaziz, A. Dabbech, A. Onose, and Y. Wiaux. A low-rank and joint-sparsity model
for hyper-spectral radio-interferometric imaging. In 2016 24th European Signal Processing
Conference (EUSIPCO), pages 388–392, Aug 2016.
[3] A. Abdulaziz, A. Dabbech, and Y. Wiaux. Wideband super-resolution imaging in radio interferometry via low rankness and joint average sparsity models (HyperSARA). Monthly
Notices of the Royal Astronomical Society, 489(1):1230–1248, 2019.
[4] A. Abdulaziz, A. Onose, A. Dabbech, and Y. Wiaux. A distributed algorithm for wide-
band radio-interferometry. In International Biomedical and Astronomical Signal Processing
Frontiers Workshop, page 6, 1 2017.
[5] A. Abdulaziz, A. Repetti, and Y. Wiaux. Hyperspectral uncertainty quantification by optimization. 2019.
[6] R. Ammanouil, A. Ferrari, R. Flamary, C. Ferrari, and D. Mary. Multi-frequency image
reconstruction for radio-interferometry with self-tuned regularization parameters. In 2017
25th European Signal Processing Conference (EUSIPCO), pages 1435–1439, Aug 2017.
[7] P. Arras, P. Frank, R. Leike, R. Westermann, and T. Enßlin. Unified radio interferometric calibration and imaging with joint uncertainty quantification. arXiv preprint arXiv:1903.11169, 2019.
[8] H. H. Bauschke and P. L. Combettes. Convex analysis and monotone operator theory in
Hilbert spaces. Springer Science & Business Media, 2011.
[9] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
[10] S. Bhatnagar and T. J. Cornwell. Scale sensitive deconvolution of interferometric images – I. Adaptive scale pixel (Asp) decomposition. Astronomy & Astrophysics, 426(2):747–754, 2004.
[11] J. Birdi, A. Repetti, and Y. Wiaux. Sparse interferometric Stokes imaging under polarization
constraint (Polarized SARA). 478(4):4442–4463, August 2018.
[12] J. Birdi, A. Repetti, and Y. Wiaux. Polca SARA - full polarization, direction-dependent
calibration and sparse imaging for radio interferometry. 2019. to appear.
[13] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. Journal
of Fourier analysis and Applications, 14(5-6):629–654, 2008.
[14] M. Born and E. Wolf. Principles of optics: electromagnetic theory of propagation, interference
and diraction of light. Elsevier, 2013.
[15] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and
statistical learning via the alternating direction method of multipliers. Foundations and
Trends® in Machine Learning, 3(1):1–122, 2011.
[16] D. S. Briggs. High fidelity interferometric imaging: robust weighting and NNLS deconvolution.
In Bulletin of the American Astronomical Society, volume 27, page 1444, 1995.
[17] C. L. Brogan, J. D. Gelfand, B. M. Gaensler, N. E. Kassim, and T. J. W. Lazio. Discovery of 35
new supernova remnants in the inner galaxy. The Astrophysical Journal Letters, 639(1):L25,
2006.
[18] G. B. Taylor, C. L. Carilli, and R. A. Perley. Synthesis imaging in radio astronomy II. In Synthesis Imaging in Radio Astronomy II, volume 180, 1999.
[19] E. J. Candès. Compressive sampling. In Proceedings of the international congress of mathe-
maticians, volume 3, pages 1433–1452. Madrid, Spain, 2006.
[20] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(1):1–37, 2009.
[21] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal recon-
struction from highly incomplete frequency information. IEEE Transactions on information
theory, 52(2):489–509, 2006.
[22] E. J. Candès and M. B. Wakin. An introduction to compressive sampling [a sensing/sampling
paradigm that goes against the common knowledge in data acquisition]. IEEE signal pro-
cessing magazine, 25(2):21–30, 2008.
[23] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.
[24] C. L. Carilli, S. Furlanetto, F. Briggs, M. Jarvis, S. Rawlings, and H. Falcke. Probing the
dark ages with the square kilometer array. New Astronomy Reviews, 48(11):1029 – 1038,
2004. Science with the Square Kilometre Array.
[25] R. E. Carrillo, J. D. McEwen, D. Van De Ville, J. Thiran, and Y. Wiaux. Sparsity averaging
for compressive imaging. IEEE Signal Processing Letters, 20(6):591–594, June 2013.
[26] R. E. Carrillo, J. D. McEwen, and Y. Wiaux. Sparsity averaging reweighted analysis
(SARA): a novel algorithm for radio-interferometric imaging. Monthly Notices of the Royal
Astronomical Society, 426(2):1223–1234, 2012.
[27] R. E. Carrillo, J. D. McEwen, and Y. Wiaux. PURIFY: a new approach to radio-
interferometric imaging. Monthly Notices of the Royal Astronomical Society, 439(4):3591–
3604, 2014.
[28] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with
applications to imaging. Journal of mathematical imaging and vision, 40(1):120–145, 2011.
[29] A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta
Numerica, 25:161–319, 2016.
[30] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit.
SIAM review, 43(1):129–159, 2001.
[31] G. Chierchia, N. Pustelnik, J.-C. Pesquet, and B. Pesquet-Popescu. Epigraphical projection
and proximal tools for solving constrained convex optimization problems. Signal, Image and
Video Processing, 9(8):1737–1749, 2015.
[32] B. G. Clark. An efficient implementation of the algorithm 'CLEAN'. Astronomy and Astrophysics,
89:377, 1980.
[33] P. L. Combettes and J.-C. Pesquet. Proximal thresholding algorithm for minimization over
orthonormal bases. SIAM Journal on Optimization, 18(4):1351–1376, 2007.
[34] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-
point algorithms for inverse problems in science and engineering, pages 185–212. Springer,
2011.
[35] P. L. Combettes and J.-C. Pesquet. Primal-dual splitting algorithm for solving inclusions with
mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued
and Variational Analysis, 20(2):307–330, 2012.
[36] P. L. Combettes and B. C. Vũ. Variable metric forward–backward splitting with applications
to monotone inclusions in duality. Optimization, 63(9):1289–1318, 2014.
[37] L. Condat. A primal–dual splitting method for convex optimization involving Lipschitzian,
proximable and linear composite terms. Journal of Optimization Theory and Applications,
158(2):460–479, 2013.
[38] T. J. Cornwell. Multiscale clean deconvolution of radio synthesis images. IEEE Journal of
Selected Topics in Signal Processing, 2(5):793–801, Oct 2008.
[39] E. C. Sutton and B. D. Wandelt. Optimal image reconstruction in radio interferometry. The
Astrophysical Journal Supplement Series, 162(2):401, 2006.
[40] D. A. Perley, R. A. Perley, V. Dhawan, and C. L. Carilli. Discovery of a Luminous Radio Transient 460 pc from the Central Supermassive Black Hole in Cygnus A. The Astrophysical Journal, 841:117, June 2017.
[41] A. Dabbech. Déconvolution d’images en radio-astronomie centimétrique pour l’exploitation
de LOFAR et SKA : caractérisation du milieu non-thermique des amas de galaxies. PhD thesis, Univ. Nice, 2014.
[42] A. Dabbech, C. Ferrari, D. Mary, E. Slezak, O. Smirnov, and J. S. Kenyon. MORESANE:
MOdel REconstruction by Synthesis-ANalysis Estimators - A sparse deconvolution algorithm
for radio interferometric imaging. Astronomy and Astrophysics, 576:16, 2015.
[43] A. Dabbech, D. Mary, and C. Ferrari. Astronomical image deconvolution using sparse priors:
An analysis-by-synthesis approach. In 2012 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pages 3665–3668, March 2012.
[44] A. Dabbech, A. Onose, A. Abdulaziz, R. A. Perley, O. M. Smirnov, and Y. Wiaux. Cygnus A super-resolved via convex optimization from VLA data. Monthly Notices of the Royal Astronomical Society, 476(3):2853–2866, 2018.
[45] A. Dabbech, A. Repetti, and Y. Wiaux. Self direction-dependent calibration for wideband
radio-interferometric imaging. 2 2019. International BASP Frontiers workshop 2019 ; Con-
ference date: 03-02-2019 Through 08-02-2019.
[46] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear in-
verse problems with a sparsity constraint. Communications on Pure and Applied Mathemat-
ics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57(11):1413–1457,
2004.
[47] J. Deguignet, A. Ferrari, D. Mary, and C. Ferrari. Distributed multi-frequency image re-
construction for radio-interferometry. In 2016 24th European Signal Processing Conference
(EUSIPCO), pages 1483–1487, Aug 2016.
[48] P. Dewdney, W. Turner, R. Millenaar, R. McCool, J. Lazio, and T. Cornwell. SKA1 system
baseline design. Document number SKA-TEL-SKO-DD-001 Revision, 1(1), 2013.
[49] D. L. Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52(4):1289–
1306, 2006.
[50] A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM Journal on Imaging Sciences,
11(1):473–506, 2018.
[51] M. Elad, B. Matalon, J. Shtok, and M. Zibulevsky. A wide-angle view at iterated shrinkage
algorithms. In Wavelets XII, volume 6701, page 670102. International Society for Optics and
Photonics, 2007.
[52] M. Elad, P. Milanfar, and R. Rubinstein. Analysis versus synthesis in signal priors. Inverse
problems, 23(3):947, 2007.
[53] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375.
Springer Science & Business Media, 1996.
[54] A. Ferrari, J. Deguignet, C. Ferrari, D. Mary, A. Schutz, and O. Smirnov. Multi-frequency
image reconstruction for radio interferometry. A regularized inverse problem approach. arXiv
preprint arXiv:1504.06847, 2015.
[55] J. A. Fessler and B. P. Sutton. Nonuniform fast Fourier transforms using min-max interpo-
lation. 51(2):560–574, January 2003.
[56] B. M. Gaensler, R. Beck, and L. Feretti. The origin and evolution of cosmic magnetism. New
Astronomy Reviews, 48(11-12):1003–1012, 2004.
[57] H. Garsden, J. Girard, J.-L. Starck, S. Corbel, C. Tasse, A. Woiselle, J. McKean, A. S. van Amesfoort, J. Anderson, I. Avruch, et al. LOFAR sparse image reconstruction. Astronomy & Astrophysics, 575:A90, 2015.
[58] G. H.-G. Chen and R. T. Rockafellar. Convergence rates in forward–backward splitting. SIAM
Journal on Optimization, 7(2):421–444, 1997.
[59] J. Geiping and M. Moeller. Composite optimization by nonconvex majorization-
minimization. 11(4):2494–2598, 2018.
[60] J. N. Girard, H. Garsden, J. L. Starck, S. Corbel, A. Woiselle, C. Tasse, J. P. McKean, and
J. Bobin. Sparse representations and convex optimization as tools for LOFAR radio interferometric imaging. Journal of Instrumentation, 10(08):C08013, 2015.
[61] M. Golbabaee, S. Arberet, and P. Vandergheynst. Compressive source separation: The-
ory and methods for hyperspectral imaging. IEEE Transactions on Image Processing,
22(12):5096–5110, Dec 2013.
[62] M. Golbabaee and P. Vandergheynst. Hyperspectral image compressed sensing via low-rank
and joint-sparse matrix recovery. In 2012 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pages 2741–2744, March 2012.
[63] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms II: Advanced theory and bundle methods, vol. 306 of Grundlehren der mathematischen Wissenschaften, 1993.
[64] J. A. Högbom. Aperture synthesis with a non-regular distribution of interferometer baselines.
Astronomy and Astrophysics Supplement Series, 15:417, 1974.
[65] D. R. Hunter and K. Lange. A tutorial on MM algorithms. The American Statistician,
58(1):30–37, 2004.
[66] M. Jiang, J. Bobin, and J. Starck. Joint multichannel deconvolution and blind source sepa-
ration. SIAM Journal on Imaging Sciences, 10(4):1997–2021, 2017.
[67] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser. Deep convolutional neural network
for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522,
2017.
[68] J. Jonas et al. The MeerKAT radio telescope. In MeerKAT Science: On the Pathway to the
SKA, volume 277, page 001. SISSA Medialab, 2018.
[69] H. Junklewitz, M. R. Bell, and T. Enßlin. A new approach to multifrequency synthesis in
radio interferometry. Astronomy & Astrophysics, 581:A59, 2015.
[70] H. Junklewitz, M. R. Bell, M. Selig, and T. A. Enßlin. Resolve: A new algorithm for aperture
synthesis imaging of extended emission in radio astronomy. Astronomy & Astrophysics,
586:A76, 2016.
[71] F. Knoll, K. Hammernik, C. Zhang, S. Moeller, T. Pock, D. K. Sodickson, and M. Akcakaya.
Deep learning methods for parallel magnetic resonance image reconstruction. arXiv preprint
arXiv:1904.01112, 2019.
[72] N. Komodakis and J.-C. Pesquet. Playing with duality: An overview of recent primal-dual
approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine,
32(6):31–54, Nov 2015.
[73] D. Li, R. Nan, and Z. Pan. The five-hundred-meter aperture spherical radio telescope project
and its early science opportunities. Proceedings of the International Astronomical Union,
8(S291):325–330, 2012.
[74] F. Li, T. J. Cornwell, and F. de Hoog. The application of compressive sampling to radio astronomy – I. Deconvolution. Astronomy & Astrophysics, 528:A31, 2011.
[75] M. P. van Haarlem, M. W. Wise, A. W. Gunst, G. Heald, J. P. McKean, J. W. T. Hessels,
A. G. de Bruyn, R. Nijboer, J. Swinbank, R. Fallows, and others. LOFAR: The low-frequency
array. A&A, 556:A2, 2013.
[76] S. Mallat et al. A wavelet tour of signal processing: The Sparse Way. AP Professional, Third Edition, London, 2009.
[77] S. G. Mallat and Z. Zhang. Matching with time-frequency dictionaries. IEEE Transactions
on signal processing, 41(12):3397–3415, 1993.
[78] J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathé-
matique de France, 93:273–299, 1965.
[79] R. Mourya, A. Ferrari, R. Flamary, and C. Richard. Distributed deblurring of large images of wide field-of-view. arXiv preprint, May 2017.
[80] S. Naghibzadeh, A. Repetti, A.-J. van der Veen, and Y. Wiaux. Facet-based regularization
for scalable radio-interferometric imaging. Roma, Italy, September 2018. to appear.
[81] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
[82] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock. On iteratively reweighted algorithms for
non-smooth nonconvex optimization in computer vision. 8(1):331–372, 2015.
[83] P. Ochs, J. Fadili, and T. Brox. Non-smooth non-convex Bregman minimization: Unication
and new algorithms. 181(1):244–278, 2019.
[84] A. R. Offringa, B. McKinley, N. Hurley-Walker, F. H. Briggs, R. B. Wayth, D. L. Kaplan, M. E. Bell, L. U. Feng, A. R. Neben, J. D. Hughes, et al. WSClean: an implementation of a fast, generic wide-field imager for radio astronomy. Monthly Notices of the Royal Astronomical
Society, 444(1):606–619, 2014.
[85] A. R. Offringa and O. Smirnov. An optimized algorithm for multiscale wideband deconvolution of radio astronomical images. Monthly Notices of the Royal Astronomical Society,
471(1):301–316, 2017.
[86] A. Onose, R. E. Carrillo, J. D. McEwen, and Y. Wiaux. A randomised primal-dual algo-
rithm for distributed radio-interferometric imaging. In 2016 24th European Signal Processing
Conference (EUSIPCO), pages 1448–1452, Aug 2016.
[87] A. Onose, R. E. Carrillo, A. Repetti, J. D. McEwen, J.-P. Thiran, J.-C. Pesquet, and Y. Wiaux. Scalable splitting algorithms for big-data interferometric imaging in the SKA era. Monthly Notices of the Royal Astronomical Society, 462(4):4314–4335, 2016.
[88] A. Onose, A. Dabbech, and Y. Wiaux. An accelerated splitting algorithm for radio-
interferometric imaging: when natural and uniform weighting meet. Monthly Notices of
the Royal Astronomical Society, 469(1):938–949, 2017.
[89] H. Pan, M. Simeoni, P. Hurley, T. Blu, and M. Vetterli. Leap: Looking beyond pixels with
continuous-space estimation of point sources. Astronomy & Astrophysics, 608:A136, 2017.
[90] M. Pereyra. Proximal Markov chain Monte Carlo algorithms. Statistics and Computing,
26(4):745–760, 2016.
[91] M. Pereyra. Maximum-a-posteriori estimation with Bayesian confidence regions. SIAM Journal on Imaging Sciences, 10(1):285–302, 2017.
[92] R. A. Perley, C. J. Chandler, B. J. Butler, and J. M. Wrobel. The expanded very large array:
A new telescope for new science. The Astrophysical Journal Letters, 739(1):L1, 2011.
[93] J.-C. Pesquet and A. Repetti. A class of randomized primal-dual algorithms for distributed
optimization. 16(12):2453–2490.
[94] L. Pratley, J. D. McEwen, M. d’Avezac, R. E. Carrillo, A. Onose, and Y. Wiaux. Robust
sparse image reconstruction of radio interferometric observations with PURIFY. Monthly Notices
of the Royal Astronomical Society, 473(1):1038–1058, 2017.
[95] Z. Pruša. Segmentwise discrete wavelet transform. PhD thesis, Brno university of technology,
2012.
[96] U. Rau and T. J. Cornwell. A multi-scale multi-frequency deconvolution algorithm for syn-
thesis imaging in radio interferometry. Astronomy & Astrophysics, 532:A71, 2011.
[97] S. Rawlings, F. B. Abdalla, S. L. Bridle, C. A. Blake, C. M. Baugh, L. J. Greenhill, and
J. M. van der Hulst. Galaxy evolution, cosmology and dark energy with the square kilometer
array. New Astronomy Reviews, 48(11):1013 – 1027, 2004. Science with the Square Kilometre
Array.
[98] A. Repetti, J. Birdi, A. Dabbech, and Y. Wiaux. Non-convex optimization for self-calibration
of direction-dependent eects in radio interferometric imaging. 470(4):3981–4006, October
2017.
[99] A. Repetti, M. Pereyra, and Y. Wiaux. Uncertainty quantification in imaging: When convex optimization meets Bayesian analysis. In 2018 26th European Signal Processing Conference
(EUSIPCO), pages 2668–2672. IEEE, 2018.
[100] A. Repetti, M. Pereyra, and Y. Wiaux. Scalable Bayesian uncertainty quantification in imaging inverse problems via convex optimization. SIAM Journal on Imaging Sciences,
12(1):87–118, 2019.
[101] A. Repetti and Y. Wiaux. A non-convex perspective on calibration and imaging in radio
interferometry. In Proceedings of the conference on Wavelets and Sparsity XVII, part of
the SPIE Optical Engineering + Applications, San Diego, California, United States, August
2017.
[102] A. Repetti and Y. Wiaux. Variable metric forward-backward algorithm for composite mini-
mization problems. arXiv preprint, 2019.
[103] C. Robert. The Bayesian choice: from decision-theoretic foundations to computational im-
plementation. Springer Science & Business Media, 2007.
[104] R. Rubinstein, A. M. Bruckstein, and M. Elad. Dictionaries for sparse representation mod-
eling. Proceedings of the IEEE, 98(6):1045–1057, 2010.
[105] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algo-
rithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
[106] G. B. Rybicki and A. P. Lightman. Radiative processes in astrophysics. John Wiley & Sons,
2008.
[107] R. J. Sault and M. H. Wieringa. Multi-frequency synthesis techniques in radio interferometric
imaging. Astronomy and Astrophysics Supplement Series, 108, 1994.
[108] A. M. M. Scaife. Big telescope, big data: towards exascale with the square kilometre array.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences, 378(2166):20190060, 2020.
[109] F. R. Schwab and W. D. Cotton. Global fringe search techniques for VLBI. The Astronomical
Journal, 88:688–694, 1983.
[110] J.-L. Starck, D. L. Donoho, and E. J. Candès. Astronomical image representation by the
curvelet transform. Astronomy & Astrophysics, 398(2):785–800, 2003.
[111] J.-L. Starck and F. Murtagh. Image restoration with noise suppression using the wavelet
transform. Astronomy and Astrophysics, 288:342–348, 1994.
[112] J.-L. Starck, F. Murtagh, and J. M. Fadili. Sparse image and signal processing: wavelets,
curvelets, morphological diversity. Cambridge university press, 2010.
[113] J.-L. Starck, A. Bijaoui, B. Lopez, and C. Perrier. Image reconstruction by the wavelet
transform applied to aperture synthesis. Astronomy and Astrophysics, 283:349–360, 1994.
[114] P. M. Sutter, B. D. Wandelt, J. D. McEwen, E. F. Bunn, A. Karakci, A. Korotkov, P. Timbie, G. S. Tucker, and L. Zhang. Probabilistic image reconstruction for radio
interferometers. Monthly Notices of the Royal Astronomical Society, 438(1):768–778, 2014.
[115] C. Tasse, B. Hugo, M. Mirmont, O. Smirnov, M. Atemkeng, L. Bester, M. J. Hardcastle,
R. Lakhoo, S. Perkins, and T. Shimwell. Faceting for direction-dependent spectral deconvo-
lution. Astronomy & Astrophysics, 611:A87, March 2018.
[116] A. R. Thompson, J. M. Moran, and G. W. Swenson. Interferometry and Synthesis in Radio
Astronomy. Wiley-VCH, 2007.
[117] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Dabbech, A. Repetti, A. Jackson, J.-P. Thi-
ran, and Y. Wiaux. Cygnus A image cubes at C band (4-8 GHz) obtained with Faceted
HyperSARA, 2020.
[118] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Dabbech, A. Repetti, A. Jackson, J.-P. Thiran,
and Y. Wiaux. Parallel faceted imaging in radio interferometry via proximal splitting (Faceted HyperSARA): when precision meets scalability. Monthly Notices of the Royal Astronomical
Society, 2020.
[119] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Repetti, and Y. Wiaux. A faceted prior for
scalable wideband computational imaging. 2019.
[120] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Repetti, and Y. Wiaux. A faceted prior for
scalable wideband imaging: Application to radio astronomy. In 2019 IEEE International
Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP),
2019.
[121] P.-A. Thouvenin, A. Repetti, A. Dabbech, and Y. Wiaux. Time-regularized blind deconvolution approach for radio interferometry. pages 475–479, Sheffield, UK, July 2018.
[122] M. P. van Haarlem, M. W. Wise, A. W. Gunst, G. Heald, J. P. McKean, J. W. Hessels,
A. G. de Bruyn, R. Nijboer, J. Swinbank, R. Fallows, et al. LOFAR: The low-frequency array. Astronomy & Astrophysics, 556:A2, 2013.
[123] B. C. Vũ. A splitting algorithm for dual monotone inclusions involving cocoercive operators.
Advances in Computational Mathematics, 38(3):667–681, 2013.
[124] S. Wenger and M. Magnor. A sparse reconstruction algorithm for multi-frequency radio
images. Technical report, Computer Graphics Lab, TU Braunschweig, 2014.
[125] S. Wenger, M. Magnor, Y. Pihlström, S. Bhatnagar, and U. Rau. SparseRI: A compressed
sensing framework for aperture synthesis imaging in radio astronomy. Publications of the
Astronomical Society of the Pacic, 122(897):1367, 2010.
[126] Y. Wiaux, L. Jacques, G. Puy, A. M. M. Scaife, and P. Vandergheynst. Compressed sensing
imaging techniques for radio interferometry. Monthly Notices of the Royal Astronomical
Society, 395(3):1733–1742, 2009.