Imaging and Uncertainty Quantification in Radio
Astronomy via Convex Optimization: When
Precision Meets Scalability
ABDULLAH ABDULAZIZ
A Thesis Submitted for the Degree of
Doctor of Philosophy (PhD)
in
School of Engineering and Physical Sciences
Heriot-Watt University
Edinburgh, United Kingdom
Under the supervision of
Professor Yves Wiaux
· October 2020 ·
The copyright in this thesis is owned by the author. Any quotation from the thesis or use of any
of the information contained in it must acknowledge this thesis as the source of the quotation or
information.
Abstract
Upcoming radio telescopes such as the Square Kilometre Array (SKA) will deliver vast amounts
of data, allowing large images of the sky to be reconstructed at an unprecedented resolution and
sensitivity over thousands of frequency channels. In this regard, wideband radio-interferometric
imaging consists in recovering a 3D image of the sky from incomplete and noisy Fourier data, which
is a highly ill-posed inverse problem. To regularize the inverse problem, advanced prior image
models need to be tailored. Moreover, the underlying algorithms should be highly parallelized to
scale with the vast data volumes provided and the Petabyte image cubes to be reconstructed for
SKA. The research developed in this thesis leverages convex optimization techniques to achieve
precise and scalable imaging for wideband radio interferometry and further assess the degree of
confidence in particular 3D structures present in the reconstructed cube.

In the context of image reconstruction, we propose a new approach that decomposes the image
cube into regular spatio-spectral facets, each associated with a sophisticated hybrid prior image
model. The approach is formulated as an optimization problem with a multitude of facet-based
regularization terms and block-specific data-fidelity terms. The underpinning algorithmic structure
benefits from well-established convergence guarantees and exhibits interesting functionalities
such as preconditioning to accelerate convergence. Furthermore, it allows for parallel
processing of all data blocks and image facets over a multiplicity of CPU cores, so that the
bottleneck induced by the size of the image and data cubes is efficiently addressed via
parallelization. The precision and scalability potential of the proposed approach are confirmed
through the reconstruction of a 15 GB image cube of the Cyg A radio galaxy.

In addition, we propose a new method that enables analyzing the degree of confidence in
particular 3D structures appearing in the reconstructed cube. This analysis is crucial due to the
high ill-posedness of the inverse problem. It can also help in making scientific decisions on
the structures under scrutiny (e.g., confirming the existence of a second black hole in the Cyg A
galaxy). The proposed method is posed as an optimization problem and solved efficiently with
a modern convex optimization algorithm with preconditioning and splitting functionalities. The
simulation results showcase the potential of the proposed method to scale to big data regimes.
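The spatio-spectral faceting described above can be illustrated with a toy sketch. The following Python/numpy snippet is purely illustrative and not the thesis implementation (which is in MATLAB): the function names are hypothetical, and the one-sided overlap convention is a simplifying assumption. It splits a small image cube into spectrally interleaved sub-cubes and decomposes each channel into overlapping spatial facets.

```python
import numpy as np

def spectral_subcubes(cube, c):
    """Split an (H, W, L) cube into c sub-cubes with interleaved channels
    (for c = 2: even channels in one sub-cube, odd channels in the other)."""
    return [cube[:, :, k::c] for k in range(c)]

def spatial_facets(image, q, overlap):
    """Tile an (H, W) image into a q-by-q grid of non-overlapping tiles,
    then augment each tile with up to `overlap` pixels borrowed from its
    top/left neighbours to form overlapping facets."""
    H, W = image.shape
    th, tw = H // q, W // q  # tile size (H, W assumed divisible by q)
    facets = []
    for i in range(q):
        for j in range(q):
            top = max(i * th - overlap, 0)
            left = max(j * tw - overlap, 0)
            facets.append(image[top:(i + 1) * th, left:(j + 1) * tw])
    return facets
```

In the approach described above, each such facet would carry its own regularization term and be handled in parallel on a separate CPU core, with the overlap regions reconciled at each iteration.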
Acknowledgements
I would like to acknowledge the contributions of several people who helped me to achieve this
success.
My immense gratitude to my supervisor, Prof Yves Wiaux, the person who believed in me and
kept pushing my limits. It was my honor to be a member of the BASP research group. I am very
grateful to the current and former postdocs of the BASP group, Dr Alexandru Onose, Dr Audrey
Repetti, Dr Pierre-Antoine Thouvenin, and Dr Ming Jiang for all the help and knowledge they
provided me, and a special thanks to Dr Arwa Dabbech, my co-supervisor, for all her help, her
availability, and her patience at all times. I was also lucky to have bright colleagues and friends
during my PhD journey. I would like to thank Dr Marica Pesce, Dr Jasleen Birdi, Dr Roberto
Duarte Coello, Mr Matthieu Terris, Dr Ahmed Karam Eldaly, Mr Quentin Legros, and Mr Amir
Matin for all the scientific discussions and enjoyable times.
I would like to thank my thesis examiners, Prof Jean-Luc Starck and Dr Yoann Altmann, for
their invaluable time in reading my manuscript, critically commenting on it, and providing very
constructive feedback. I would also like to thank Dr Joao Mota for monitoring my online VIVA
and making sure that I was comfortable during the whole process.
I am deeply grateful to Prof Stephen McLaughlin and Dr Yoann Altmann for offering me a
Postdoctoral position to pursue my research journey in a very comfortable and competitive research
environment. Their encouragement and support during the final stage of my PhD and through the
COVID-19 circumstances are greatly appreciated. I am also grateful to Dr Abderrahim Halimi for
his continuous advice and support. He has been a great example for me and a dear friend. I am
indebted to him for his kindness.
I give my sincerest thanks to my parents and my sisters whose belief in me surpasses my own.
Without their constant support and prayers, I would not have reached this point.
Last but not least, I express my heartfelt gratitude to Lolity, my wife and other half. Thank
you for your constant devotion and care. Your love and encouragement when the times got rough
are deeply appreciated and duly noted. No success in the world is worth it unless I can share it
with you.
Edinburgh, October 2020
Publications Related to the PhD
Thesis
Journal papers
[1] Thouvenin, P.A., Abdulaziz, A., Jiang, M., Dabbech, A., Repetti, A., Jackson, A., Thiran
J.-P. and Wiaux, Y., 2020, “Parallel faceted imaging in radio interferometry via proximal splitting
(Faceted HyperSARA): when precision meets scalability”. arXiv preprint.
[2] Abdulaziz, A., Dabbech, A. and Wiaux, Y., 2019. “Wideband super-resolution imaging in
radio interferometry via low rankness and joint average sparsity models (HyperSARA)”. Monthly
Notices of the Royal Astronomical Society, 489(1), pp.1230-1248.
[3] Dabbech, A., Onose, A., Abdulaziz, A., Perley, R.A., Smirnov, O.M. and Wiaux, Y., 2018.
“Cygnus A super-resolved via convex optimization from VLA data”. Monthly Notices of the Royal
Astronomical Society, 476(3), pp.2853-2866.
Conference papers
[1] Thouvenin, P.A., Abdulaziz, A., Jiang, M., Repetti, A. and Wiaux, Y., 2019, July. “A Faceted
Prior for Scalable Wideband Imaging: Application to Radio Astronomy”. In IEEE International
Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).
[2] Thouvenin, P.A., Abdulaziz, A., Jiang, M., Repetti, A. and Wiaux, Y., “A Faceted Prior for
Scalable Wideband Computational Imaging”. 2019, April. In Signal Processing with Adaptive
Sparse Structured Representations (SPARS) workshop.
[3] Abdulaziz, A., Repetti, A. and Wiaux, Y., “Hyperspectral Uncertainty Quantification by
Optimization”. 2019, April. In Signal Processing with Adaptive Sparse Structured Representations
(SPARS) workshop.
[4] Terris, M., Abdulaziz, A., Dabbech, A., Jiang, M., Repetti, A., Pesquet, J.C. and Wiaux, Y.,
2019, April. “Deep Post-Processing for Sparse Image Deconvolution”. In Signal Processing with
Adaptive Sparse Structured Representations (SPARS) workshop.
[5] Abdulaziz, A., Onose, A., Dabbech, A. and Wiaux, Y., 2017, January. “A distributed algo-
rithm for wide-band radio-interferometry”. In International Biomedical and Astronomical Signal
Processing (BASP) Frontiers Workshop 2017. (Best contribution)
[6] Abdulaziz, A., Dabbech, A., Onose, A. and Wiaux, Y., 2016, August. “A low-rank and joint-sparsity
model for hyper-spectral radio-interferometric imaging”. In 2016 24th European Signal
Processing Conference (EUSIPCO), IEEE, pp. 388-392. (Best paper)
Notations
General Rules
z: a scalar.
z: a vector (boldface lowercase).
z_i: the i-th component of the vector z.
Z: a matrix (boldface uppercase).
Z†: the adjoint of the matrix Z.
z_l = Z_l: the l-th column of the matrix Z.
Z_n: the n-th row of the matrix Z.
z_{n,l}: the component in the n-th row and the l-th column of the matrix Z.
1_N: the row vector of ones of size N.
I_N: the identity matrix of size N × N.
‖x‖_p: the ℓ_p norm of the vector x.
‖X‖_F: the Frobenius norm of the matrix X.
|x|: the absolute value of the argument x.
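As a quick numerical check of the norm notation above, a minimal numpy illustration (the matrix values are arbitrary toy numbers, not drawn from the thesis):

```python
import numpy as np

# A toy matrix Z whose columns z_l play the role of per-channel images.
Z = np.array([[3.0, 0.0],
              [4.0, 2.0]])

z1 = Z[:, 0]                    # column z_1
l1 = np.linalg.norm(z1, 1)      # ‖z_1‖_1 = |3| + |4| = 7
l2 = np.linalg.norm(z1, 2)      # ‖z_1‖_2 = sqrt(3² + 4²) = 5
fro = np.linalg.norm(Z, 'fro')  # ‖Z‖_F = sqrt(9 + 16 + 0 + 4) = sqrt(29)
```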
Acronyms
RI: radio-interferometry or radio-interferometric.
FAST: the Five hundred meter Aperture Spherical Telescope.
VLA: the Karl G. Jansky Very Large Array.
LOFAR: the LOw Frequency ARray.
SKA: the Square Kilometre Array.
PSF: point spread function.
DDE: direction-dependent effect.
2D: two-dimensional.
3D: three-dimensional.
GB: gigabyte.
FoV: field of view.
FFT: fast Fourier transform.
DCT: discrete cosine transform.
MAP: maximum a posteriori.
CS: compressive sensing.
prox: proximity operator.
S: soft-thresholding operator.
P: projection operator.
SNR: signal-to-noise ratio.
iSNR: input signal-to-noise ratio.
SR: sampling rate.
SM: similarity metric.
STD: standard deviation.
FB: the forward-backward algorithm [33,58].
FISTA: the fast iterative shrinkage-thresholding algorithm [9].
PD: the primal-dual algorithm [28,37,72,93,123].
PDFB: the primal-dual algorithm with forward-backward iterations [72].
NNLS: the non-negative least-squares algorithm.
JC-CLEAN: the joined-channel CLEAN algorithm [85].
SARA: the sparsity averaging reweighted analysis approach [26].
HyperSARA: the proposed approach in Chapter 4 to solve the wideband RI imaging problem.
Faceted HyperSARA: the faceting-based approach proposed in Chapter 5 for scalable wideband
RI imaging.
HyperSARA-UQ: the proposed approach in Chapter 6 to solve the wideband uncertainty
quantification problem in RI.
Contents
Abstract I
Acknowledgements II
Publications Related to the PhD Thesis V
List of Figures XIII
List of Tables XXII
List of Algorithms XXIV
1 Introduction 1
1.1 Radio interferometry ................................... 2
1.2 Motivation ........................................ 3
1.3 Thesis outline ....................................... 4
2 Wideband radio interferometry 6
2.1 Introduction ........................................ 6
2.2 Spatial coherence function ................................ 7
2.3 Continuous RI data model ................................ 9
2.3.1 The effect of discrete sampling ......................... 10
2.3.2 The effect of the primary beam ......................... 12
2.3.3 Weighting schemes ................................ 12
2.4 Discrete RI data model ................................. 13
2.5 Radio-interferometric imaging .............................. 14
2.5.1 CLEAN-based algorithms ............................ 15
2.5.2 Bayesian inference techniques .......................... 16
2.5.3 Optimization algorithms ............................. 17
2.6 Conclusions ........................................ 18
3 Sparse representation and convex optimization 19
3.1 Introduction ........................................ 19
3.2 RI imaging problem formulation ............................ 20
3.3 Compressive sensing and sparse representation .................... 21
3.3.1 ℓ1 minimization .................................. 22
3.3.2 Reweighted-ℓ1 minimization ........................... 23
3.3.2.1 SARA for RI imaging ......................... 23
3.4 Convex optimization ................................... 24
3.4.1 Proximal splitting algorithms .......................... 25
3.4.2 Primal-dual .................................... 26
3.4.3 Convex optimization for wideband RI imaging - revisited .......... 28
3.5 Conclusions ........................................ 30
4 Wideband super-resolution imaging in RI 31
4.1 Motivation ........................................ 32
4.2 HyperSARA: optimization problem ........................... 32
4.2.1 Low-rankness and joint sparsity sky model .................. 33
4.2.2 HyperSARA minimization task ......................... 34
4.3 HyperSARA: algorithmic structure ........................... 36
4.3.1 HyperSARA in a nutshell ............................ 37
4.3.2 Underlying primal-dual forward-backward algorithm ............. 37
4.3.3 Adaptive ℓ2 bounds adjustment ......................... 39
4.3.4 Weighting schemes ................................ 40
4.4 Simulations ........................................ 43
4.4.1 Simulations settings ............................... 43
4.4.2 Benchmark algorithms .............................. 44
4.4.3 Imaging quality assessment ........................... 45
4.4.4 Imaging results .................................. 47
4.5 Application to real data ................................. 52
4.5.1 Data and imaging details ............................ 52
4.5.2 Imaging quality assessment ........................... 56
4.5.3 Real imaging results ............................... 57
4.6 Conclusions ........................................ 63
5 Faceted HyperSARA for wideband RI imaging: when precision meets scalability 69
5.1 Motivation ........................................ 70
5.2 Proposed faceting and Faceted HyperSARA approach ................ 71
5.2.0.1 Spectral faceting ............................ 71
5.2.0.2 Spatial faceting ............................ 73
5.3 Algorithm and implementation ............................. 75
5.3.1 Faceted HyperSARA algorithm ......................... 75
5.3.2 Underpinning primal-dual forward-backward algorithm ............ 76
5.3.3 Parallel algorithmic structure .......................... 77
5.3.4 MATLAB implementation ............................ 78
5.4 Validation on synthetic data ............................... 78
5.4.1 Simulation setting ................................ 79
5.4.1.1 Images and data ............................ 79
5.4.1.2 Spatial faceting ............................ 80
5.4.1.3 Spectral faceting ............................ 82
5.4.2 Hardware ..................................... 82
5.4.3 Evaluation metrics ................................ 82
5.4.4 Results and discussion .............................. 83
5.4.4.1 Spatial faceting ............................ 83
5.4.4.2 Spectral faceting ............................ 84
5.5 Validation on real data .................................. 84
5.5.1 Dataset description and imaging settings ................... 90
5.5.2 Hardware ..................................... 91
5.5.3 Evaluation metrics ................................ 91
5.5.4 Results and discussion .............................. 92
5.5.4.1 Imaging quality ............................ 92
5.5.4.2 Computing cost ............................ 94
5.6 Conclusions ........................................ 94
6 Wideband uncertainty quantification by convex optimization 99
6.1 Motivation ........................................ 99
6.2 Wideband uncertainty quantification approach .................... 100
6.2.1 Bayesian hypothesis test ............................. 100
6.2.2 Choice of the set S................................ 103
6.3 Proposed minimization problem ............................. 104
6.4 Proposed algorithmic structure ............................. 105
6.4.1 Epigraphical splitting .............................. 105
6.4.2 Underpinning primal-dual forward-backward algorithm ............ 106
6.5 Validation on synthetic data ............................... 107
6.5.1 Simulation setting ................................ 107
6.5.2 Uncertainty quantification parameter ...................... 109
6.5.3 Results and discussion .............................. 110
6.6 Conclusions ........................................ 111
7 Conclusions and perspectives 120
7.1 Perspectives ........................................ 121
Appendices 123
.1 Basic definitions in convex optimization ........................ 124
.2 Proximity operators ................................... 125
.3 Overview of the parameters specific to the adaptive PDFB algorithm (Algorithm 3) 128
.4 Randomized PDFB algorithm .............................. 129
.4.1 Simulations and results ............................. 129
Bibliography 131
List of Figures
1.1 The full electromagnetic spectrum. Credit: NASA public domain image, CC-BY-SA
3.0. ............................................. 2
2.1 A distant source at direction s on the celestial sphere observed by an antenna pair
(r1, r2). .......................................... 9
2.2 VLA uv-coverage at the frequency ν = 8 GHz generated using: (a) A configuration.
(b) C configuration. ................................... 10
2.3 VLA uv-coverage generated: (a) at the frequency ν = 8 GHz. (b) at the frequency
ν = 4 GHz. ........................................ 11
2.4 Dirty beams associated with the VLA uv-coverage generated: (a) at the frequency
ν = 8 GHz. (b) at the frequency ν = 4 GHz. ..................... 12
4.1 Schematic diagram at iteration t in the adaptive PDFB, detailed in Algorithm 3.
It showcases the parallelism capabilities and overall computation flow. Intuitively,
each forward-backward step in data, prior and image space can be viewed as a
CLEAN-like iteration. The overall algorithmic structure then intuitively takes the
form of an interlaced and parallel multi-space version of CLEAN. ......... 41
4.2 Simulations using realistic VLA uv-coverage: (a) The realistic VLA uv-coverages of
all the channels projected onto one plane. (b) Channel ν1 of the simulated wideband
model cube, a 256 × 256 region of the W28 supernova remnant, shown in log10 scale. 44
4.3 Simulations using random sampling with a Gaussian density profile: aSNR results
for the proposed approach HyperSARA and the benchmark methods LRJAS, JAS,
LR and the monochromatic approach SARA. The aSNR values of the estimated
model cubes (y-axis) are plotted as a function of the sampling rate (SR) (x-axis).
Each point corresponds to the mean value of 5 noise realizations. The results are
displayed for different model cubes varying the number of channels L and the input
signal-to-noise ratio iSNR. (a) L = 60 channels and iSNR = 40 dB. (b) L = 15
channels and iSNR = 40 dB. (c) L = 60 channels and iSNR = 20 dB. (d) L = 15
channels and iSNR = 20 dB. .............................. 48
4.4 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν1 =
1.4 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of HyperSARA (aSNR = 30.13 dB), LRJAS
(aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and LR (aSNR = 26.75 dB). From
top to bottom: the estimated model images in log10 scale, the absolute value of the
error images in log10 scale and the naturally-weighted residual images in linear scale. 49
4.5 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν60 =
2.78 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of HyperSARA (aSNR = 30.13 dB), LRJAS
(aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and LR (aSNR = 26.75 dB). From
top to bottom: the estimated model images in log10 scale, the absolute value of the
error images in log10 scale and the naturally-weighted residual images in linear scale. 50
4.6 Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected
pixels obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60
dB. The results are shown for: (b) the proposed approach HyperSARA, (c) LRJAS,
(d) JAS and (e) LR, compared with the ground-truth. Each considered pixel is
highlighted with a colored circle in the ground-truth image x1 displayed in (a). . 51
4.7 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν1 =
1.4 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of the proposed approach HyperSARA (aSNR =
30.13 dB), the monochromatic approach SARA (aSNR = 23.46 dB) and JC-CLEAN
(aSNR = 9.39 dB). From top to bottom (first and second columns): the estimated
model images in log10 scale, the absolute value of the error images in log10 scale and
the naturally-weighted residual images in linear scale. From top to bottom (third
column): the estimated restored images in log10 scale, the absolute value of the error
images in log10 scale and the Briggs-weighted residual images in linear scale. . . . 53
4.8 Simulations with realistic VLA uv-coverage: reconstructed images of channel ν60 =
2.78 GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR
= 60 dB. From left to right: results of the proposed approach HyperSARA (aSNR =
30.13 dB), the monochromatic approach SARA (aSNR = 23.46 dB) and JC-CLEAN
(aSNR = 9.39 dB). From top to bottom (first and second columns): the estimated
model images in log10 scale, the absolute value of the error images in log10 scale and
the naturally-weighted residual images in linear scale. From top to bottom (third
column): the estimated restored images in log10 scale, the absolute value of the error
images in log10 scale and the Briggs-weighted residual images in linear scale. . . . 54
4.9 Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected
pixels obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60
dB. The results are shown for: (b) the proposed approach HyperSARA, (c) the
monochromatic approach SARA and (d) JC-CLEAN, compared with the ground-truth.
Each considered pixel is highlighted with a colored circle in the ground-truth
image x1 displayed in (a). ............................... 55
4.10 Cyg A: recovered images of channel ν1 = 2.04 GHz at 2.5 times the nominal resolution
at the highest frequency νL. From top to bottom: estimated model images of
the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs weighting.
The full images are displayed in log10 scale (first column) as well as zooms on
the east jet hotspot (second column) and the west jet hotspot (third column). . . 59
4.11 Cyg A: recovered images of channel ν32 = 5.96 GHz at 2.5 times the nominal resolution
at the highest frequency νL. From top to bottom: estimated model images of
the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs weighting.
The full images are displayed in log10 scale (first column) as well as zooms on
the east jet hotspot (second column) and the west jet hotspot (third column). . . 60
4.12 Cyg A: naturally-weighted residual images obtained by the proposed approach
HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel
ν1 = 2.04 GHz, and (b) Channel ν32 = 5.96 GHz. The aSTD values are 1.19 × 10⁻²
and 8.7 × 10⁻³, respectively. .............................. 61
4.13 Cyg A: Briggs-weighted residual images obtained by the proposed approach
HyperSARA (left) and JC-CLEAN (right). (a) Channel ν1 = 2.04 GHz, and (b) Channel
ν32 = 5.96 GHz. The aSTD values are 4.1 × 10⁻³ and 2.1 × 10⁻³, respectively. . . 61
4.14 Cyg A: reconstructed spectra of selected pixels and point-like sources obtained by
the different approaches. Each considered pixel (P) or source (S) is highlighted with
a red circle on the estimated model image of HyperSARA at channel ν32 = 5.96
GHz displayed in (a). .................................. 62
4.15 G055.7+3.4: recovered images of channel ν1 = 1.444 GHz at 2 times the nominal
resolution at the highest frequency νL. From top to bottom: estimated model images
of the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs
weighting. The full images are displayed in log10 scale (first column) as well as zoom
on the central region (second column). ......................... 64
4.16 G055.7+3.4: recovered images of channel ν30 = 1.89 GHz at 2 times the nominal
resolution at the highest frequency νL. From top to bottom: estimated model images
of the proposed approach HyperSARA, estimated model images of the monochromatic
approach SARA and estimated restored images of JC-CLEAN using Briggs
weighting. The full images are displayed in log10 scale (first column) as well as zoom
on the central region (second column). ......................... 65
4.17 G055.7+3.4: naturally-weighted residual images obtained by the proposed approach
HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel
ν1 = 1.444 GHz, and (b) Channel ν30 = 1.89 GHz. The aSTD values are 6.55 × 10⁻⁵
and 8.37 × 10⁻⁵, respectively. .............................. 66
4.18 G055.7+3.4: Briggs-weighted residual images obtained by the proposed approach
HyperSARA (left) and JC-CLEAN (right). (a) Channel ν1 = 1.444 GHz, and (b)
Channel ν30 = 1.89 GHz. The aSTD values are 1.12 × 10⁻⁴ and 7.75 × 10⁻⁵,
respectively. ........................................ 67
4.19 G055.7+3.4: reconstructed spectra of selected pixels and point-like sources obtained
by the different approaches. Each considered pixel (P) or source (S) is highlighted
with a red circle on the estimated model image of HyperSARA at channel ν30 = 1.89
GHz (first row). ...................................... 68
5.1 Illustration of the proposed faceting scheme, using a 2-fold spectral interleaving
process and 9-fold spatial tiling process. The full image cube variable (a) is divided
into two spectral sub-cubes (b) with interleaved channels (for a 2-fold interleaving,
even and odd channels respectively define a sub-cube). Each sub-cube is spatially
faceted. A regular tessellation (dashed red lines) is used to define spatio-spectral
tiles. The spatio-spectral facets result from the augmentation of each tile to produce
an overlap between facets (solid red lines). Panel (c) shows a single facet (left), as
well as the spatial weighting scheme (right) with linearly decreasing weights in the
overlap region. Note that, though the same tiling process is underpinning the nuclear
norm and ℓ21 norm regularization terms, the definition of the appropriate overlap
region is specific to each of these terms (via the selection operators S_q and S̃_q in
(5.3)). ........................................... 72
5.2 Illustration of the communication steps involving a facet core (represented by the
top-left rectangle in each sub-figure) and a maximum of three of its neighbours. The
tile underpinning each facet, located in its bottom-right corner, is delineated in thick
black lines. At each iteration, the following two steps are performed sequentially.
(a) Facet borders need to be completed before each facet is updated independently
in the dual space (Algorithm 5 lines 11–17): values of the tile of each facet (top left)
are broadcast to cores handling the neighbouring facets in order to update their
borders (Algorithm 5 line 5). (b) Parts of the facet tiles overlapping with borders of
nearby facets need to be updated before each tile is updated independently in the
primal space (Algorithm 5 line 24): values of the parts of the borders overlapping
with the tile of each facet are broadcast by the cores handling neighbouring facets,
and averaged. ...................................... 78
5.3 Illustration of the two groups of cores described in Section 5.3 with the main steps
involved in PDFB (Algorithm 5) applied to each independent sub-problem c ∈
{1, . . . , C}, considering Q facets (along the spatial dimension) and B = 1 data block
per channel. Data cores handle variables of the size of data blocks (Algorithm 5
lines 19–21), whereas facet cores handle variables of the size of a spatio-spectral facet
(Algorithm 5 lines 11–17), respectively. Communications between the two groups
are represented by colored arrows. Communications between facet cores, induced
by the overlap between the spatio-spectral facets, are illustrated in Figure 5.2. . . . 79
5.4 Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel)
reported in log10 scale for channel ν1 = 1 GHz for Faceted HyperSARA with Q = 16
and C = 1 (left), and HyperSARA (i.e. Faceted HyperSARA with Q = C = 1,
right). From top to bottom are reported the ground truth image, the reconstructed
and residual images. The overlap for the faceted nuclear norm regularization
corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying
the definition of the facets are delineated on the residual images in red dotted lines,
with the central facet displayed in continuous lines. .................. 86
5.5 Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel)
reported in log10 scale for channel ν20 = 2 GHz for Faceted HyperSARA with Q = 16
and C = 1 (left), and HyperSARA (i.e. Faceted HyperSARA with Q = C = 1,
right). From top to bottom are reported the ground truth image, the reconstructed
and residual images. The overlap for the faceted nuclear norm regularization
corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying
the definition of the facets are delineated on the residual images in red dotted lines,
with the central facet displayed in continuous lines. .................. 87
5.6 Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel)
reported in log10 scale for channel ν1 = 1 GHz with Faceted HyperSARA for C = 10
and Q = 1 (left) and HyperSARA (i.e. Faceted HyperSARA with Q = C = 1, right).
Each sub-cube is composed of 10 out of the L = 100 channels. From top to bottom:
ground truth image, estimated model images and residual images. ......... 88
5.7 Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel)
reported in log10 scale for channel ν100 = 2 GHz with Faceted HyperSARA for C=
10 and Q= 1 (left) and HyperSARA (i.e. Faceted HyperSARA with Q=C= 1,
right). Each sub-cube is composed of 10 out of the L= 100 channels. From top to
bottom: ground truth image, estimated model images and residual images. . . . . 89
5.8 Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Imaging results
of channel ν1 = 3.979 GHz. Estimated images at the angular resolution 0.06′′ (3.53
times the observations' spatial bandwidth). From top to bottom: the respective
estimated model images of the proposed Faceted HyperSARA (Q = 15, C = 16)
and SARA, both in units of Jy/pixel, and the restored image of JC-CLEAN in units
of Jy/beam. The associated synthesized beam is of size 0.37′′ × 0.35′′ and its flux
is 42.18 Jy. The full FoV images (log10 scale) are overlaid with the residual images
(bottom right, linear scale) and zooms on selected regions in Cyg A (top left, log10
scale). These correspond to the west hotspot (left) and the inner core of Cyg A
(right). The zoomed regions are displayed with different value ranges for contrast
visualization purposes and are highlighted with white boxes in the full images. Cyg
A-2 location is highlighted with a white dashed circle. Negative pixel values of the JC-
CLEAN restored image and associated zooms are set to 0 for visualization purposes.
Full image cubes are available online [117]. ...................... 95
5.9 Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Reconstruction
results of channel ν480 = 8.019 GHz. Estimated images at the angular resolution
0.06′′ (1.75 times the observations' spatial bandwidth). From top to bottom: the
respective estimated model images of the proposed Faceted HyperSARA (Q = 15,
C = 16) and SARA, both in units of Jy/pixel, and the restored image of JC-CLEAN
in units of Jy/beam. The associated synthesized beam is of size 0.17′′ × 0.15′′ and its
flux is 8.32 Jy. The full FoV images (log10 scale) are overlaid with the residual images
(bottom right, linear scale) and zooms on selected regions in Cyg A (top left, log10
scale). These correspond to the west hotspot (left) and the inner core of Cyg A
(right). The zoomed regions are displayed with different value ranges for contrast
visualization purposes and are highlighted with white boxes in the full images. Cyg
A-2 location is highlighted with a white dashed circle. Negative pixel values of the JC-
CLEAN restored image and associated zooms are set to 0 for visualization purposes.
Full image cubes are available online [117]. ...................... 96
5.10 Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Average
estimated images, computed as the mean along the spectral dimension. From top
to bottom: the respective estimated average model images of the proposed Faceted
HyperSARA (Q = 15, C = 16) and SARA, and the average restored image of JC-
CLEAN (obtained as the mean of the restored images normalized by the flux of
their associated synthesized beam). The full FoV images (log10 scale) are overlaid
with the residual images (bottom right, linear scale) and zooms on selected regions
in Cyg A (top left, log10 scale). These correspond to the west hotspot (left) and the
inner core of Cyg A (right). The zoomed regions are displayed with different value
ranges for contrast visualization purposes and are highlighted with white boxes in the
full images. Cyg A-2 location is highlighted with a white dashed circle. Negative
pixel values of the JC-CLEAN restored image and associated zooms are set to 0 for
visualization purposes. ................................. 97
6.1 1D illustration of the exact HPD region Cα and the approximated one C̃α. Notice
that Cα ⊆ C̃α. ........................................... 102
6.2 Illustration of the proposed method for the two different scenarios. Our approach simply
consists in examining the Euclidean distance between the two sets S and C̃α. Left: there
is no intersection between the two sets, thus H0 is rejected at level α. Right: the two sets
intersect, thus one cannot reject H0, i.e., one cannot conclude whether the 3D structure
exists in the true image cube or not. ................................ 104
6.3 Simulations with realistic uv-coverage: (c) Curves representing the values of ρα in percent-
age (y-axis) as a function of the sampling rate SR = Ml/N (x-axis), in log10 scale, for the
3D structures of interest. The considered 3D structures are highlighted with rectangles on
channel ν1 = 1 GHz (a) and channel ν15 = 2 GHz (b) of the ground-truth image cube, in
log10 scale. Each point corresponds to the mean value of 5 tests with different antenna
positions and noise realizations, and the vertical bars represent the standard deviation of
the 5 tests. ......................................... 113
6.4 Uncertainty quantification of 3D Structure 1: results, reported for channel ν1 =
1 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X]1, and the uncertainty
quantification results [X_C̃α]1 and [X_S]1. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 64.53%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 1. ............................. 114
6.5 Uncertainty quantification of 3D Structure 1: results, reported for channel ν15 =
2 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X]15, and the uncertainty
quantification results [X_C̃α]15 and [X_S]15. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 64.53%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 1. ............................. 115
6.6 Uncertainty quantification of 3D Structure 2: results, reported for channel ν1 =
1 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X]1, and the uncertainty
quantification results [X_C̃α]1 and [X_S]1. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 54.91%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 45.1%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 2. ............................. 116
6.7 Uncertainty quantification of 3D Structure 2: results, reported for channel ν15 =
2 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X]15, and the uncertainty
quantification results [X_C̃α]15 and [X_S]15. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 75.21%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 64.53%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 2. ............................. 117
6.8 Uncertainty quantification of 3D Structure 3: results, reported for channel ν1 =
1 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X]1, and the uncertainty
quantification results [X_C̃α]1 and [X_S]1. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 97.43%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 96.95%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 3. ............................. 118
6.9 Uncertainty quantification of 3D Structure 3: results, reported for channel ν15 =
2 GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB.
The images from top to bottom are: the MAP estimate [X]15, and the uncertainty
quantification results [X_C̃α]15 and [X_S]15. The results are given for HyperSARA-
UQ (left) with MAP estimate aSNR = 32.32 dB and uncertainty quantification
parameter ρα = 97.43%, and JAS-UQ (right) with MAP estimate aSNR = 30.87 dB
and ρα = 96.95%. All images are displayed in log10 scale and overlaid with a zoom
onto the region of Structure 3. ............................. 119
1 Simulations with VLA uv-coverage: (a) The ground-truth image at the reference frequency
x1. (b) Curves representing the evolution of aSNR (y-axis) as a function of the number of
iterations (x-axis), for the different methods: LRJAS, LRJAS-R (LRJAS with randomized
updates) and WDCT. ................................... 130
List of Tables
5.1 Spatial faceting experiment: varying size of the overlap region for the faceted nuclear
norm regularization. Reconstruction performance of Faceted HyperSARA with Q =
16 and C = 1, compared to HyperSARA (i.e. Faceted HyperSARA with Q = C = 1)
and SARA. The results are reported in terms of reconstruction time, aSNR and
aSNRlog (both in dB with the associated standard deviation), and total number of
CPU cores used to reconstruct the full image. The evolution of the aSNRlog, of
specific interest for this experiment, is highlighted in bold face. ........... 84
5.2 Spatial faceting experiment: varying number of facets along the spatial dimension
Q. Reconstruction performance of Faceted HyperSARA (C = 1, overlap of 50%),
compared to HyperSARA (i.e. Faceted HyperSARA with Q = C = 1) and SARA.
The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in
dB with the associated standard deviation), and total number of CPU cores used to
reconstruct the full image. The evolution of the computing time, of specific interest
for this experiment, is highlighted in bold face. .................... 85
5.3 Spectral faceting experiment: reconstruction performance of Faceted HyperSARA
with a varying number of spectral sub-problems C and Q = 1, compared to Hyper-
SARA (i.e. Faceted HyperSARA with Q = C = 1) and SARA. The results are
reported in terms of reconstruction time, aSNR and aSNRlog (both in dB with the
associated standard deviation) and total number of CPU cores. The reconstruction
performance of Faceted HyperSARA, specifically investigated in this experiment, is
highlighted in bold face. ................................. 85
5.4 Computing cost of Cyg A imaging at the spectral resolution 8 MHz from 7.4 GB
of data. Results are reported for Faceted HyperSARA, SARA, and JC-CLEAN in
terms of reconstruction time, number of CPU cores and overall CPU time (high-
lighted in bold face). ................................... 94
1 Overview of the variables employed in the adaptive procedure incorporated in Al-
gorithm 3. ......................................... 128
2 Overview of the parameters involved in the adaptive procedure incorporated in
Algorithm 3. ....................................... 128
List of Algorithms
1 Forward-backward primal-dual (PDFB) ......................... 28
2 HyperSARA approach ................................... 40
3 The adaptive PDFB algorithm underpinning HyperSARA ............... 42
4 Faceted HyperSARA approach .............................. 80
5 The PDFB algorithm underpinning Faceted HyperSARA ............... 81
6 Wideband uncertainty quantification by convex optimization ............. 108
Chapter 1
Introduction
Contents
1.1 Radio interferometry .............................. 2
1.2 Motivation .................................... 3
1.3 Thesis outline .................................. 4
Astronomy is the science of the universe, perhaps the oldest of the sciences. It involves
studying celestial objects such as planets, stars and galaxies, and phenomena that occur outside
the Earth's atmosphere. These celestial objects emit electromagnetic radiation across the electro-
magnetic spectrum (Figure 1.1). In the optical band (400–700 nm), the dominant emitters are
thermal sources with temperatures in the range 10³–10⁴ K. Thermal sources outside this range
and non-thermal sources do not emit radiation in the optical band but can be strong emitters in
other bands (e.g., cold sources emit in the infrared band, and very hot objects are strong emitters
in the X-ray band). The radio band, characterized by long wavelengths and low frequencies, is of
paramount interest. As opposed to the optical band, it covers a vast range of the electromagnetic
spectrum, between around 3 kHz and 300 GHz, or equivalently a wavelength range of approximately
100 km to 1 mm. Radio emission produced by a variety of radio sources is non-thermal and is called
synchrotron radiation. Unlike thermal emission, where the flux density increases with frequency,
the flux density of synchrotron emission increases with wavelength, making the radio band the best
candidate to study this active type of source. The dominant sources at radio frequencies are
the Sun, radio galaxies (the strongest being the Cyg A galaxy), supernova remnants and pulsars.
The long-wavelength characteristic of the radio waves reduces their absorptivity and scattering
and makes them permeable through large clouds and celestial dust, allowing the detection of star
formation obscured by gas and cosmic dust, as well as the discovery of very far away galaxies.
Furthermore, this allows the detection of the hydrogen spectral line (HI line) at wavelength 21 cm.
Since hydrogen is an extremely abundant element in the universe, the HI line has been extensively
studied to map the structure of nearby galaxies and their kinematics.
Figure 1.1: The full electromagnetic spectrum. Credit: NASA public domain image, CC-BY-SA 3.0.
1.1 Radio interferometry
Astronomers use telescopes to observe the universe. The angular resolution θ (in radians) of a
telescope is given by [14]

θ = 1.22 λ/D,    (1.1)

where λ is the observation wavelength and D is the diameter of the telescope, both in units of
length. Since the wavelengths in radio astronomy are so large, the angular resolution of radio
telescopes is poor even for enormous aperture sizes. For example, the human eye has an angular
resolution of 20′′, while the largest single-dish radio telescope on Earth, namely the Five-hundred-
meter Aperture Spherical Telescope (FAST) in China, with a 500 m dish, can only provide an
angular resolution of about 3′ for the wavelength range 30–15 cm [73]. To achieve higher angular
resolutions, we need radio telescopes of much larger aperture size, which are impractical to build.
Instead, scientists leveraged interferometry, a pioneering technique that led to a Nobel Prize in
Physics in 1974, to achieve higher angular resolutions. A radio interferometer is an array of spatially
separated antennas which collectively simulate a single telescope of huge aperture size. In this
setting, the resolution of an interferometer is defined by the maximum distance separating two
antennas in the array. Radio interferometry has opened the door to probing new regimes of radio
emission with extreme resolution and sensitivity, and to map large areas of the radio sky, deepening
our knowledge in cosmology and astrophysics.
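To make equation (1.1) concrete, here is a minimal numeric sketch; the 36 km maximum baseline is an assumed VLA-A-scale value, used only to illustrate the gain an interferometer offers over a single dish:

```python
# Angular resolution (Eq. 1.1): theta = 1.22 * lambda / D, in radians.
import math

def angular_resolution_arcsec(wavelength_m, diameter_m):
    """Diffraction-limited angular resolution, converted to arcseconds."""
    theta_rad = 1.22 * wavelength_m / diameter_m
    return math.degrees(theta_rad) * 3600.0

# Single 500 m dish (FAST-like) observing at 21 cm:
single_dish = angular_resolution_arcsec(0.21, 500.0)

# Interferometer whose longest baseline is 36 km (hypothetical VLA-A-scale
# value), acting as an effective aperture of that size:
interferometer = angular_resolution_arcsec(0.21, 36_000.0)

print(f"{single_dish:.0f} arcsec vs {interferometer:.1f} arcsec")  # roughly 106 vs 1.5
```

The maximum baseline plays the role of D, which is why the resolution of an array is set by the largest antenna separation rather than by any individual dish.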
1.2 Motivation
Modern radio interferometers, such as the Karl G. Jansky Very Large Array (VLA) [92], the LOw
Frequency ARray (LOFAR) [122] and the MeerKAT radio telescope [68] provide massive volumes
of data, allowing large images of the sky to be reconstructed at an unprecedented resolution and
dynamic range. In particular, once completed, the upcoming Square Kilometre Array (SKA) [48]
will be the world's largest radio telescope and will provide data at a rate 10 times larger than today's
global Internet traffic. It will form wideband images about 0.56 Petabyte in size (assuming double
precision) from even larger data volumes [108]. SKA is expected to bring answers to fundamental
questions in astronomy1, such as improving our understanding of cosmology and dark energy [97],
investigating the origin and evolution of cosmic magnetism [56] and probing the early universe
where the first stars were formed [24]. Since the sky looks very different at different wavelengths,
it is imperative to use wideband astronomy to be able to understand all the physical processes in
the universe and achieve the expected scientific goals.
Wideband radio-interferometric (RI) imaging consists in forming a 3D image of the radio sky
from under-sampled Fourier data across a whole frequency band. Given the fact that the range
of spatial frequencies covered by a radio interferometer increases as the observation frequency
increases, higher resolution and higher dynamic range image cubes can be formed by jointly re-
covering the spatial and spectral information. To meet the capabilities of modern telescopes, it is
of paramount importance to design ecient wideband imaging algorithms, these need to be able
to recover high-quality images while being highly parallelized to scale with the sheer amount of
wideband data and the large dimension of the wideband image cubes to be estimated.
In this context, we develop a new approach within the versatile framework of convex op-
timization to solve the wideband RI imaging problem. The proposed approach, dubbed “Hy-
perSARA”, leverages low-rankness and joint average sparsity priors to enable the formation of
high-resolution and high-dynamic-range image cubes from RI data. The underlying algorithmic
structure offers highly valuable functionalities, such as preconditioning for accelerated
convergence and a splitting functionality enabling the decomposition of the data into blocks, for parallel
processing of all block-specific data-fidelity terms of the objective function, thereby allowing
scalability to large data volumes. Furthermore, it involves an adaptive strategy to estimate the
noise level with respect to calibration errors present in real data. HyperSARA, however, models
the image cube as a single variable, and the computational and storage requirements induced by
complex regularization terms can be prohibitive for very large image cubes.
To alleviate this issue, the same splitting functionality is further exploited to decompose the
target image cube into spatio-spectral facets and enable parallel processing of facet-specic regular-
ization terms in the objective function. The resulting algorithm is dubbed “Faceted HyperSARA”.
1https://www.skatelescope.org/science/
Extensive simulation results on synthetic image cubes confirm that faceting can provide a significant
increase in scalability at no cost in imaging quality. A proof-of-concept reconstruction of a
15 GB image cube of Cyg A from 7.4 GB of VLA data, utilizing 496 CPU cores on a high-performance
computing system for 68 hours, confirms both the scalability and a quantum jump in imaging
quality with respect to the state-of-the-art CLEAN algorithm [118].
Since the wideband RI imaging problem is highly ill-posed, assessing the degree of confidence
in specific 3D structures observed in the estimated cube is very important. More precisely, we
need methods that can tell us how certain we are about these structures (are they real, or do they
correspond to reconstruction artifacts?). Bayesian methods naturally enable the quantification of
uncertainty around the image estimate. However, such approaches usually involve sampling
of the full posterior distribution, hence cannot currently scale to the data regime expected
from modern telescopes. Instead, we propose to formulate the problem as a convex minimization
task solved using a sophisticated optimization algorithm. As for HyperSARA and Faceted Hyper-
SARA, the underpinning algorithmic structure benefits from preconditioning and parallelization
capabilities, paving the road for scalability to large data sets and image dimensions.
1.3 Thesis outline
The thesis is organized as follows:
Chapter 2 explains in detail the wideband RI measurement framework, giving the reader
all the knowledge needed to understand the application. Next, we present the ill-posed inverse
problem arising in wideband radio interferometry (RI). Finally, we describe the state-of-
the-art approaches to solve the RI imaging problem, namely CLEAN-based approaches,
Bayesian inference techniques and optimization methods.
Chapter 3 provides all the mathematical background for the work developed in this thesis.
We start by formulating the RI inverse problem as an optimization task. Second, we explore
the world of compressive sensing and sparse representation and explain sparse recovery ap-
proaches. At this point, we introduce convex optimization methods as a powerful tool to
solve convex minimization problems; we first describe the proximal splitting methods. Next,
we put particular emphasis on the primal-dual framework adopted in this work. We finally
revisit the convex optimization methods developed in the literature to solve the wideband
RI imaging problem.
Chapter 4 gives a complete description of the HyperSARA approach proposed for solving the
wideband RI imaging problem. We start by explaining the low-rankness and joint sparsity
priors adopted for wideband RI imaging. After that, we present the HyperSARA optimiza-
tion problem and the underlying algorithmic structure. Finally, we provide an analysis of the
proposed approach and a comparison with benchmark methods on simulations and VLA
observations of Cyg A and the supernova remnant G055.7+3.4.
Chapter 5 introduces the Faceted HyperSARA approach, a further development of HyperSARA
with a fully distributed implementation. We begin by presenting the minimization problem and
the proposed spatio-spectral faceted prior. Afterwards, we describe the algorithmic structure
used to solve the resulting problem, along with the different levels of parallelization exploited.
Later, we analyze the performance and scalability potential of Faceted HyperSARA on ex-
tensive simulations and real data, with a reconstruction of a 15 GB image cube of Cyg A
from 7.4 GB of VLA data.
Chapter 6 describes our uncertainty quantification approach for wideband RI imaging. First,
we postulate a Bayesian hypothesis test and propose a convex optimization problem to formu-
late this test. Next, we present the underpinning algorithmic structure and the epigraphical
splitting technique exploited to solve the minimization problem. Finally, we showcase
the performance of our approach on realistic simulations.
Chapter 7 presents conclusions and final remarks regarding the methods developed in this
thesis. We further shed light on possible directions of future work to take the current
developments further toward scalability in the era of the SKA telescope.
Chapter 2
Wideband radio interferometry
Contents
2.1 Introduction ................................... 6
2.2 Spatial coherence function ........................... 7
2.3 Continuous RI data model ........................... 9
2.3.1 The eect of discrete sampling ........................ 10
2.3.2 The eect of the primary beam ....................... 12
2.3.3 Weighting schemes .............................. 12
2.4 Discrete RI data model ............................. 13
2.5 Radio-interferometric imaging ........................ 14
2.5.1 CLEAN-based algorithms .......................... 15
2.5.2 Bayesian inference techniques ........................ 16
2.5.3 Optimization algorithms ........................... 17
2.6 Conclusions .................................... 18
2.1 Introduction
A radio interferometer is an array of spatially separated antennas probing the radio waves emanat-
ing from astrophysical sources in space. Each antenna pair gives access to a radio measurement,
dubbed a visibility, corresponding to the correlation of the sensed electromagnetic field at the posi-
tions of the antennas.
This chapter is structured as follows. The spatial coherence function is introduced in Section
2.2. In Section 2.3, we describe the radio-interferometric (RI) measurement equation, which can be
seen, under some assumptions, as a spatial Fourier transform of the sky brightness distribution. The
RI discrete measurement model is presented in Section 2.4. RI imaging methods are discussed in
Section 2.5. These are CLEAN-based approaches, Bayesian inference techniques and optimization
methods. Finally, conclusions are stated in Section 2.6.
It is worth noting that the derivation of the measurement model is based on [18, 41, 116].
2.2 Spatial coherence function
When an astrophysical phenomenon occurs at a location R in the universe, an electromagnetic signal
propagates from R and arrives at a point r where it can be conveniently observed by the
radio antennas. Since sources of interest are typically very far from Earth, all we can measure is the
surface brightness of the emitting source; we cannot describe its depth.
A convenient way to express this assumption in radio astronomy is to assume that all astronomical
sources lie on the so-called celestial sphere, defined as a huge sphere of radius |R| = R within
which there are no radiating sources, and to measure the distribution of the electromagnetic radiation on
the surface of this sphere. For simplicity, we consider only a monochromatic component
E(r, ν) of the electromagnetic field at a frequency ν, noting that the entire electromagnetic
field can be determined by the summation of all frequency components. Also, ignoring all
polarization phenomena, the electromagnetic radiation measured at the point r can be seen as a
scalar quantity E(r, ν). Then, the electromagnetic field due to all sources of cosmic electromagnetic
radiation measured at the location r can be written as
E(r, ν) = ∫ E(R, ν) e^{2iπ(ν/c)|R−r|} / |R−r| dS,    (2.1)

where E(R, ν) is the distribution of the electromagnetic field on the surface of the sphere, dS is
a surface element on the celestial sphere (the integration being done over the entire sphere), and c is
the speed of light.
A radio interferometer is a device that measures the spatial coherence function of the sensed
electromagnetic field at the positions of the antennas. For two antennas located at r1 and r2,
respectively, the spatial coherence function is defined as the expectation of a product:

V(r1, r2, ν) = ⟨E(r1, ν) E*(r2, ν)⟩,    (2.2)

where ⟨·⟩ stands for the expectation and (·)* is the complex conjugate. By substituting E(r, ν) from
equation (2.1), we get:

V(r1, r2, ν) = ∬ ⟨E(R1, ν) E*(R2, ν)⟩ [e^{2iπ(ν/c)|R1−r1|} / |R1−r1|] [e^{−2iπ(ν/c)|R2−r2|} / |R2−r2|] dS1 dS2.    (2.3)

Assuming that all astronomical sources are spatially incoherent, i.e., ⟨E(R1, ν) E*(R2, ν)⟩ = 0 for R1 ≠
R2 and ⟨E(R1, ν) E*(R2, ν)⟩ = ⟨|E(R, ν)|²⟩ for R1 = R2 = R, and considering the great distance
of the source, i.e., |R − r| ≈ |R|, we write:

V(r1, r2, ν) = ∫ I(s, ν) e^{−2iπ(ν/c)(|Rs−r2| − |Rs−r1|)} dΩ,    (2.4)

where I(s, ν) = ⟨|E(R, ν)|²⟩ is the observed intensity in the direction s = R/|R| on the celestial
sphere and dΩ = dS/|R|² is the solid angle element.
As can be seen in Figure 2.1, we consider the coordinate system (x, y, z) to describe the physical
location r of each antenna on Earth. The z-axis points toward the phase-reference center s0 (a
point in the sky toward which the radio interferometer is steered). The (x, y) plane is perpendicular
to the z-axis. In the same coordinate system (x, y, z), the coordinates of a source at direction s on
the celestial sphere are given by (l, m, n), with l = cos(θx), m = cos(θy), n = cos(θz). (l, m, n) are
called direction cosines and verify l² + m² + n² = 1 and dΩ = dl dm / cos(θz) = dl dm / √(1 − l² − m²). We then have:

|Rs − r| = R[(l − x/R)² + (m − y/R)² + (n − z/R)²]^{1/2}    (2.5)
         ≈ R[(l² + m² + n²) − 2(lx + my + nz)/R]^{1/2}.    (2.6)

We can use the binomial approximation1 to simplify equation (2.6). Doing so:

|Rs − r| ≈ R − (lx + my + nz).    (2.7)
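The accuracy of the far-field approximation (2.7) is easy to check numerically. The sketch below uses made-up source direction cosines and antenna coordinates, with a sphere radius small enough for double-precision arithmetic to resolve the discrepancy:

```python
import math

# Made-up geometry: source direction cosines and an antenna position (meters).
R = 1e9                        # distance to the celestial sphere (far field)
l, m = 0.01, -0.02
n = math.sqrt(1.0 - l**2 - m**2)
x, y, z = 500.0, -1200.0, 80.0

# Exact distance |R s - r| (Eq. 2.5) vs the binomial approximation (Eq. 2.7).
exact = R * math.sqrt((l - x / R)**2 + (m - y / R)**2 + (n - z / R)**2)
approx = R - (l * x + m * y + n * z)

print(abs(exact - approx))  # sub-millimeter discrepancy for these numbers
```

The residual error scales as |r|²/R, which is why the approximation is excellent for astronomical distances and kilometer-scale arrays.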
By putting this result in equation (2.4), we get:

V(r1, r2, ν) = ∫ I(l, m, ν) e^{−2iπ(ν/c)[l(x1−x2) + m(y1−y2) + n(z1−z2)]} dl dm / √(1 − l² − m²).    (2.8)

Note that equation (2.8) depends on the separation vector r1 − r2 and not on the absolute loca-
tions of the antennas. Therefore, it is customary in radio interferometry (RI) to use the baseline
coordinates. We define a baseline b1,2 ∈ R³ as the vectorial distance between two antennas r1 and
r2; its components (ū, ῡ, w̄) are in units of meters, where w̄ = z1 − z2 denotes the component in the
direction of the line of sight and ū = (ū, ῡ) = (x1 − x2, y1 − y2) are the coordinates in its perpendicular
plane (a plane parallel to the l = (l, m) plane). Doing so, we write:

V(ū, w̄, ν) = ∫ n(l)^{−1} I(l, ν) e^{−2iπ(ν/c)(l·ū + n w̄)} d²l,    (2.9)

where n(l)^{−1} = 1/√(1 − l² − m²).

1The binomial approximation states that (1 + x)^α ≈ 1 + αx. It is valid when |x| < 1 and |αx| ≪ 1.
Figure 2.1: A distant source at direction s on the celestial sphere observed by an antenna pair (r1, r2).
2.3 Continuous RI data model
In RI, the spatial coherence function defined in equation (2.9) is usually referred to as the visibility
y(ū, w̄, ν). It is also typical to represent the direction of the source s = (l, m, n) relative to the
phase-reference center s0 = (0, 0, 1), i.e., s − s0 = (l, m, n − 1). In this scenario, we can define the
radio measurement, the visibility, as:

y(ū, w̄, ν) = ∫ n(l)^{−1} I(l, ν) e^{−2iπ(ν/c)(l·ū + (n−1)w̄)} d²l.    (2.10)

The Van Cittert-Zernike theorem states that the visibility equation (2.10) can be reduced to a
2D Fourier transform of the sky intensity when the field of view is small, i.e., when all sources lie within
a small region on the celestial sphere, which implies n ≈ 1. In practice, this assumption is valid for
fields of width θFOV that satisfy the following condition [18]:

θFOV < (1/3) √θPB,    (2.11)

where θPB is the half-power width of the primary beam2 of the antennas of the radio interferometer,
and both θFOV and θPB are measured in radians.

In this setting, which will be adopted in this work, the complex visibility reads

y(ū, ν) = ∫ I(l, ν) e^{−2iπ(ν/c) l·ū} d²l.    (2.12)

2The primary beam is explained in Section 2.3.2.
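As a toy illustration of the small field-of-view measurement model (2.12), the sketch below evaluates visibilities of a tiny point-source sky by direct summation; the source fluxes and positions are made up for the example, and (u, v) is expressed in units of wavelengths so the ν/c factor is absorbed:

```python
import numpy as np

# Toy sky: two point sources with fluxes (Jy) at direction cosines (l, m).
sources = [(1.0, 0.01, -0.02), (0.5, -0.015, 0.005)]

def visibility(u, v):
    """Direct evaluation of y(u, v) = sum_k I_k * exp(-2i*pi*(u*l_k + v*m_k)),
    the discrete counterpart of Eq. (2.12); (u, v) in units of wavelengths."""
    return sum(flux * np.exp(-2j * np.pi * (u * l + v * m))
               for flux, l, m in sources)

# At the (u, v) origin the visibility equals the total flux of the sky,
# and no visibility can exceed that total flux in magnitude.
print(abs(visibility(0.0, 0.0)))  # 1.5
```

This brute-force summation is only practical for a handful of sources; real imaging pipelines use (non-uniform) FFTs instead, which is precisely the scalability concern addressed later in the thesis.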
Figure 2.2: VLA uv-coverage at the frequency ν = 8 GHz generated using: (a) the A configuration;
(b) the C configuration.
2.3.1 The eect of discrete sampling
The radio measurements do not cover the whole Fourier plane which makes the problem of recov-
ering the sky image from the Fourier data an ill-posed inverse problem. More precisely, because
of the limited number of antennas, the Fourier samples are measured at specic spatial frequen-
cies ν
cu, ¯v). For a radio-interferometer with nantennas, n(n1)/2complex measurements are
obtained at each time instant. When the studied radio sources are known to be constant over the
observation time, we do not need to measure all the visibilities at the same time. (e.g., the data
of the Cyg A galaxy reported in this thesis are acquired over two years (2015-2016)). One way
of acquiring more measurements over time for the same radio sky is by changing the positions of
the antennas. This scenario is valid for the VLA array, which has mobile antennas that can form
many congurations known as A, B, C, and D (Figure 2.2). The method of gradually lling the
spaces in the Fourier plane is referred to as aperture synthesis. More practically, with the Earth
rotation, the coordinates ν
cu, ¯v)of a baseline change with time as they are relative to the direc-
tion of sight. This results in elliptical tracks in the ν
cu, ¯v)plane corresponding to each baseline,
and hence a denser sampling. Furthermore, more frequency samples can be probed by observing
at dierent wavelengths. In fact, the Fourier sampling in radio interferometry is such that high
Fourier coecients are probed at high-frequency channels, and low Fourier coecients are probed
at low-frequency channels (Figure 2.3).
The measured visibilities at a given frequency ν identify the so-called uv-coverage of the radio interferometer at that frequency. The uv-coverage is defined by the array configuration (i.e., the positions of the antennas), the direction of observation, the time difference between consecutive measurements, and the total observation time.
Let S(ū, ν) be the sampling function, equal to 1 for a measured (ν/c)(ū, v̄) point and 0 otherwise; the continuous measured visibility at a frequency ν defined in (2.12) then resolves to

y(\bar{u}, \nu) = S(\bar{u}, \nu) \int I(l, \nu)\, e^{-2i\pi \frac{\nu}{c} (l \cdot \bar{u})}\, \mathrm{d}^2 l. \qquad (2.13)

Figure 2.3: VLA uv-coverage generated: (a) at the frequency ν = 8 GHz. (b) at the frequency ν = 4 GHz.
The process of deriving the image from the visibilities is called mapping in radio astronomy. The image obtained by direct Fourier inversion of the measured visibilities is called the dirty image (or dirty map):

I^{\mathrm{dirty}}(l, \nu) = \int S(\bar{u}, \nu)\, y(\bar{u}, \nu)\, e^{2i\pi \frac{\nu}{c} (l \cdot \bar{u})}\, \mathrm{d}^2 \bar{u}, \qquad (2.14)

and the point spread function (PSF) of the instrument, also known as the dirty beam, is given by the inverse Fourier transform of the sampling function:

B(l, \nu) = \int S(\bar{u}, \nu)\, e^{2i\pi \frac{\nu}{c} (l \cdot \bar{u})}\, \mathrm{d}^2 \bar{u}. \qquad (2.15)

The shape of the dirty beam B(·, ν) is a function of the uv-coverage at the frequency ν (Figure 2.4). For a fully sampled uv-coverage, the shape of the dirty beam is a sinc function whose main lobe is inversely proportional to the maximum baseline (ν/c) ū_max. However, for a real uv-coverage, unsampled (ν/c)(ū, v̄) points increase the side-lobes and make the dirty beam noisy.
By the convolution theorem, at each frequency ν, the dirty image I^dirty(·, ν) is the true sky I(·, ν) convolved with the dirty beam B(·, ν):

I^{\mathrm{dirty}}(\cdot, \nu) = I(\cdot, \nu) * B(\cdot, \nu). \qquad (2.16)

Note that I^dirty(·, ν) is never a satisfactory final product because of the side-lobes of the dirty beam (which are due to incomplete sampling). Therefore, image reconstruction algorithms in RI
Figure 2.4: Dirty beams associated with the VLA uv-coverage generated: (a) at the frequency ν = 8 GHz. (b) at the frequency ν = 4 GHz.
are called deconvolution methods, since they aim to recover I(·, ν) by deconvolving the dirty image with respect to B(·, ν).
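The relation (2.16) between the dirty image, the true sky and the dirty beam can be verified numerically. The following toy Python sketch (a random binary mask standing in for a realistic uv-coverage; all sizes and positions are illustrative) checks that the dirty image, obtained by inverse FFT of the sampled Fourier coefficients, equals the circular convolution of the sky with the dirty beam:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

# Toy sky: two point sources on an N x N grid.
sky = np.zeros((N, N))
sky[12, 20] = 1.0
sky[40, 45] = 0.5

# Binary sampling function S: keep ~20% of the Fourier plane. (A physical
# uv-coverage is Hermitian-symmetric; this random mask is not, so the toy
# dirty image below is complex-valued.)
S = rng.random((N, N)) < 0.2

# Dirty image: inverse FFT of the sampled Fourier coefficients (cf. (2.14)).
dirty = np.fft.ifft2(S * np.fft.fft2(sky))

# Dirty beam: inverse FFT of the sampling function itself (cf. (2.15)).
psf = np.fft.ifft2(S.astype(float))

# Convolution theorem (cf. (2.16)): the dirty image equals the (circular)
# convolution of the sky with the dirty beam.
dirty_from_conv = np.fft.ifft2(np.fft.fft2(sky) * np.fft.fft2(psf))

assert np.allclose(dirty, dirty_from_conv)
```

The identity holds exactly here because both sides reduce to the inverse FFT of S multiplied by the Fourier transform of the sky.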
2.3.2 The effect of the primary beam
In practice, the antennas are of finite size and are sensitive to the direction of observation. Therefore, we introduce the primary beam of the interferometer elements, A(l, ν), in the description of the complex visibility as follows:

y(\bar{u}, \nu) = S(\bar{u}, \nu) \int A(l, \nu)\, I(l, \nu)\, e^{-2i\pi \frac{\nu}{c} (l \cdot \bar{u})}\, \mathrm{d}^2 l. \qquad (2.17)

In theory, the sky intensity I(·, ν) can be extracted by a simple division by the primary beam, provided that all the antennas are identical (i.e., all have the same primary beam). However, since the primary beam shape falls rapidly to zero outside the vicinity of the tracking center, dividing by A(·, ν) increases the errors in regions far from the center. Therefore, in practice, primary beam correction is to be done during calibration, especially for sources far from the tracking direction. Moreover, the sky brightness is usually modulated by so-called direction-dependent effects (DDEs), which encompass instrumental discrepancies as well as propagation and receiver errors. These effects are usually corrected for during calibration and are not in the scope of this work.
2.3.3 Weighting schemes
The dirty beam is a highly non-smooth function with potentially large side-lobes. This is because the sampling function is a non-smooth function consisting of a collection of Diracs with many gaps in between and a sharp cut-off at the limit of the uv-coverage. To combat the natural sampling, one can fine-tune the dirty beam shape by multiplying the sampling function with a weighting function defined as

W(\bar{u}, \nu) = \frac{\nu}{c} \sum_{m=1}^{M_\nu} T(m, \nu)\, D(m, \nu)\, \delta(\bar{u} - \bar{u}_m, \bar{v} - \bar{v}_m), \qquad (2.18)

where M_ν is the number of measurements at the frequency ν. The function T(·, ν) is a smoothly varying function, typically a Gaussian, used to down-weight the outer edge of the uv-coverage, hence decreasing the side-lobes of the dirty beam. This is equivalent to a convolution with a Gaussian in the image domain, which decreases the spatial resolution. The function D(·, ν) is the density weighting function used to control the weights resulting from the non-uniform sampling in the uv-plane. Three schemes for D(·, ν) are commonly used, depending on the scientific goal:
• D(m, ν) = 1/ϱ²(m), called natural weighting, where ϱ²(m) is the variance of the noise associated with the m-th data point. This scheme treats all visibilities the same. Given the nature of the Fourier sampling in RI, where the density of the measurements is high in the center and decreases further away (see Figure 2.3), this scheme provides the best signal-to-noise ratio but limited resolution, which is good for imaging weak point sources while undesirable for extended emission.
• D(m, ν) = 1/N_s(m), called uniform weighting, where N_s(m) is the number of measurements within a symmetric region of width s centered at the m-th data point. This scheme gives more weight to high spatial frequencies, since they are less sampled in radio interferometry, hence enhancing the resolution at the expense of the signal-to-noise ratio.
• Briggs weighting (or robust weighting) is a hybrid scheme between natural and uniform weighting and provides a trade-off between resolution and sensitivity [16]. The key parameter in Briggs weighting is the so-called robustness parameter, which sets the desired level between uniform and natural. Typical values lie between −2 and 2: lower values give more uniform-like weights, higher values give more natural-like weights.
2.4 Discrete RI data model
Considering L frequency channels and representing the intensity image and the RI data at each frequency ν_l as vectors, the discrete version of the measurement model follows:

\boldsymbol{y}_l = \boldsymbol{\Phi}_l \boldsymbol{x}_l + \boldsymbol{n}_l, \qquad (2.19)

where x_l ∈ R^N_+ is the true sky image of size N pixels at the frequency ν_l, and y_l ∈ C^{M_l} represents the visibility vector of size M_l. The vector n_l ∈ C^{M_l} represents the measurement noise, modelled as a realization of an independent and identically distributed (i.i.d.) complex Gaussian random vector.
Φ_l is the measurement operator at the frequency ν_l. Ideally, Φ_l accounts for a direct Fourier transform of the sky image to the non-uniform visibility space, which requires M_l N computations for each channel. This is infeasible in the era of the new radio telescopes, where tremendous amounts of data will be provided. Instead, the common practice in radio astronomy is to take advantage of the Fast Fourier Transform (FFT) and interpolate the visibilities from a regular grid. The measurement operator for each channel ν_l can then be written as

\boldsymbol{\Phi}_l = \boldsymbol{\Theta}_l \boldsymbol{G}_l \mathbf{F} \mathbf{Z}. \qquad (2.20)

The measurement operator Φ_l is composed of a zero-padding and scaling operator Z ∈ R^{oN×N}, the FFT matrix F ∈ C^{oN×oN}, a non-uniform Fourier transform interpolation matrix G_l ∈ C^{M_l×oN}, and a diagonal noise-whitening matrix Θ_l ∈ C^{M_l×M_l} that contains on its diagonal the inverse of the noise standard deviation associated with each original measurement. This assumes that the original visibility vector was multiplied by Θ_l so as to produce a measurement vector y_l affected by i.i.d. Gaussian noise. In this setting, the data projected onto the image space, i.e., the dirty image Φ_l^† y_l, is naturally weighted with the inverse of the noise variance, as described in Section 2.3.3. Each row of G_l contains a compact-support interpolation kernel centered at the corresponding uv-point [55], enabling the computation of the Fourier mode associated with each visibility from surrounding discrete Fourier points. Note that, at the sensitivity of interest to the new generation of radio telescopes, DDEs of either atmospheric or instrumental origin complicate the RI measurement equation: for each visibility, the sky surface brightness is pre-modulated by the product of a DDE pattern specific to each antenna. The DDEs are often unknown and need to be calibrated jointly with the imaging process [12,98,101,121]. Focusing here on the imaging problem, i.e., assuming the DDEs are known, they can simply be integrated into the forward model (2.19) by building extended interpolation kernels into each row of G_l, resulting from the convolution of the non-uniform Fourier transform kernel with a compact-support representation of the Fourier transform of the involved DDEs.
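A minimal sketch of a forward operator in the spirit of (2.20) is given below. It is a toy illustration under strong assumptions: G_l is crudely replaced by nearest-grid-point sampling (a real operator holds a compact-support interpolation kernel on each row [55]), and the scaling part of Z, which compensates the interpolation kernel, is omitted; all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N_side, o = 32, 2                       # image side and Fourier oversampling factor
M = 200                                 # number of visibilities in this channel

# Z: zero-padding to the oversampled Fourier grid (scaling correction omitted).
def Z(x):
    padded = np.zeros((o * N_side, o * N_side))
    padded[:N_side, :N_side] = x.reshape(N_side, N_side)
    return padded

# G_l degraded to nearest-grid-point selection: one grid index per visibility.
idx = rng.integers(0, o * N_side, size=(M, 2))

# Theta_l: diagonal whitening by the inverse noise standard deviation.
sigma = rng.uniform(0.5, 2.0, size=M)

def Phi(x):
    xhat = np.fft.fft2(Z(x))            # F Z x
    vis = xhat[idx[:, 0], idx[:, 1]]    # G_l: sample the oversampled grid
    return vis / sigma                  # Theta_l

x = rng.random(N_side * N_side)         # toy vectorized sky image
y = Phi(x) + (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)
```

The operator costs one o²N-point FFT plus M kernel evaluations per application, instead of the M_l N cost of a direct non-uniform transform.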
Estimating the underlying wideband sky image X = (x_l)_{1≤l≤L} from incomplete Fourier measurements is a severely ill-posed inverse problem, which calls for powerful regularization terms to encode a prior image model.
2.5 Radio-interferometric imaging
A plethora of RI imaging approaches have been proposed in the literature; they can be classified into three main categories: CLEAN-based approaches, Bayesian inference techniques, and optimization methods.
2.5.1 CLEAN-based algorithms
A first class of methods is the celebrated CLEAN family [10,32,38,64,85,96,107,109]. CLEAN is a greedy deconvolution method based on iterative local removal of the PSF. It assumes that the sky image x_l ∈ R^N at the frequency ν_l is made of a collection of point sources, thus implicitly assuming sparsity in the image space. This method can also be seen as a gradient descent approach with an implicit sparsity prior on x_l, and its update at iteration t follows [87]:

\boldsymbol{x}_l^{(t+1)} = \boldsymbol{x}_l^{(t)} + \mathrm{T}\, \boldsymbol{\Phi}_l^{\dagger} (\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{x}_l^{(t)}), \qquad (2.21)

where Φ_l^† is the adjoint of the linear operator Φ_l. At each iteration, the algorithm operates in major and minor cycles. The minor cycle consists in estimating the brightest point source (position and value), the so-called CLEAN component, then removing a fraction of its contribution from the residual image using an approximate operator Φ̃_l to allow fast subtraction of multiple sources. This process is represented by the operator T in equation (2.21). The CLEAN components are then added to the estimated sky model x_l^{(t)}. Finally, the major cycle computes the residual image Φ_l^†(y_l − Φ_l x_l^{(t+1)}) using the exact operator Φ_l for the next round of minor cycles.
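The minor-cycle logic can be sketched as a toy image-domain (Högbom-style) CLEAN. This is a schematic illustration, not the exact major/minor-cycle implementation described above: there is no approximate operator Φ̃_l, circular shifts replace proper PSF cropping, and all numbers are illustrative:

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, n_iter=200, threshold=1e-3):
    """Minor-cycle loop: repeatedly pick the brightest residual pixel and
    remove a fraction of the PSF there (psf peak assumed at [0, 0],
    normalized to 1; circular shifts for simplicity)."""
    residual = dirty.copy()
    model = np.zeros_like(dirty)
    for _ in range(n_iter):
        p = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        peak = residual[p]
        if np.abs(peak) < threshold:
            break
        model[p] += gain * peak
        residual -= gain * peak * np.roll(np.roll(psf, p[0], axis=0), p[1], axis=1)
    return model, residual

# Toy single-source test, with the dirty image built as in Section 2.3.1.
rng = np.random.default_rng(3)
N = 64
sky = np.zeros((N, N)); sky[10, 15] = 1.0
S = rng.random((N, N)) < 0.3             # toy sampling mask
psf = np.fft.ifft2(S.astype(float)).real
dirty = np.fft.ifft2(S * np.fft.fft2(sky)).real
dirty /= psf[0, 0]; psf /= psf[0, 0]     # normalize the beam peak to 1
model, residual = hogbom_clean(dirty, psf)
```

On this single-source example the loop recovers most of the source flux in the model while the residual shrinks towards the threshold.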
Although efficient in recovering point sources, CLEAN has shown limited performance when it comes to the recovery of extended emission. To overcome this limitation, a first multi-resolution variant of CLEAN was proposed in [113]. Leveraging an isotropic wavelet transform, the so-called MRC algorithm performs classical CLEAN iterations on the wavelet coefficients. Another multi-scale extension of CLEAN, MS-CLEAN, has been proposed in [38]. MS-CLEAN assumes the sky to be a linear combination of images at different spatial scales. At each scale, the image is a convolution of Diracs with tapered, truncated parabolas of different widths. From a multi-scale decomposition of the residual image, MS-CLEAN detects the brightest peak value, its position and corresponding scale, adds a scaled component at the same pixel position to the estimated model of the sky, then removes a fraction of its contribution from the data in the same manner as classical CLEAN.
A rst wideband CLEAN-based approach, dubbed MF-CLEAN [107], models the sky intensity
as a collection of point sources whose spectra follow a power law dened as
xl=x1(νl
ν1
)α,(2.22)
where αRNis the spectral index map. This power law is approximated by a rst-order (linear)
Taylor expansion, and the problem reduces to the estimation of two Taylor coecient images.
These are reconstructed by performing a classical CLEAN on their associated dirty images. The
locations of the CLEAN components are determined via a least-squares solution. Yet, MF-CLEAN
is sub-optimal when it comes to the recovery of extended emissions, as these are modelled with
Chapter 2: Wideband radio interferometry 16
point sources. Moreover, the approach is limited by higher-order spectral eects like spectral
curvature. To overcome these drawbacks, [96] have proposed a multi-scale multi-frequency variant
of CLEAN, dubbed MS-MFS, assuming the curvature model as a spectral model. It reads
xl=x1(νl
ν1
)α+βlog( νl
ν1),(2.23)
where α ∈ R^N and β ∈ R^N are the spectral index and curvature maps, respectively. Using a Taylor series, x_l is approximated via a linear combination of a few Taylor coefficient images (s_r ∈ R^N)_{1≤r≤R}:

\boldsymbol{x}_l = \sum_{r=1}^{R} h_{l,r}\, \boldsymbol{s}_r, \qquad (2.24)

where h_{l,r} = ((ν_l − ν_1)/ν_1)^{r−1}, for 1 ≤ l ≤ L and 1 ≤ r ≤ R, are the spectral basis functions, R is the order of the Taylor series, and L is the number of channels. In this case, the wideband image reconstruction problem reduces to the recovery of the Taylor coefficient images. These are deconvolved by performing a multi-scale CLEAN on their respective dirty images (s_r^dirty = Σ_{l=1}^{L} h_{l,r} x_l^dirty)_{1≤r≤R}. More recently,
[85] have proposed a wideband variant of multi-scale CLEAN, the so-called joined-channel CLEAN (JC-CLEAN), which is incorporated in the software wsclean³ [84]. The main idea consists in determining the pixel positions of the CLEAN components from an integrated image, obtained as the sum of the residual images of all the channels (initially, these correspond to the dirty images). The spectra at the selected pixel positions are determined directly from the associated values in the residual images at the different channels. When the spectral behaviour of the radio sources is known to be smooth (as is the case for synchrotron emission [106]), a polynomial is fitted to their estimated spectra.
Albeit simple and computationally efficient, CLEAN-based algorithms provide limited imaging quality in high-resolution and high-sensitivity acquisition regimes. This shortcoming partly results from their greedy nature and their lack of flexibility in injecting complex prior information to regularize the inverse imaging problem. Moreover, these algorithms often require careful tuning of the associated parameters.
2.5.2 Bayesian inference techniques
The second class of methods relies on Bayesian inference techniques to sample the full posterior distribution based on a hierarchical Bayesian model [7,39,69,70,114]. For instance, the authors in [114] proposed a monochromatic Bayesian method based on Markov chain Monte Carlo (MCMC) sampling techniques, assuming a Gaussian image prior. Since MCMC sampling methods are computationally very expensive, an efficient variant was proposed in [7,70] to perform approximate Bayesian inference. The so-called RESOLVE algorithm, formulated in the language of information field theory, approximates the full posterior distribution by a multivariate Gaussian distribution and draws samples from the approximate distribution. RESOLVE is a joint imaging and calibration algorithm, where only direction-independent antenna-based calibration is considered. A wideband imaging variant of RESOLVE was introduced by [69]. The authors considered a power-law spectral model and proposed a method restricted to the case of Gaussian or log-normal priors. In this case, the reconstruction of the wideband model cube consists in the estimation of the sky image at the reference frequency, x_1, and the spectral index map α. The method is limited by higher-order spectral effects such as spectral curvature.
³ W-Stacking CLEAN (wsclean) is a wide-field RI imaging software, available at https://sourceforge.net/projects/wsclean/.
Importantly, this class of methods naturally enables the quantification of uncertainty around the image estimate. However, such approaches cannot currently scale to the data regimes expected from modern telescopes.
2.5.3 Optimization algorithms
The third class of approaches leverages optimization methods, which allow sophisticated prior information to be considered, such as sparsity in an appropriate transform domain, smoothness, etc. [6,26,42,43,47,54,57,60,74,124,126]. From the perspective of optimization theory, the inverse imaging problem is approached by defining an objective function consisting of the sum of a data-fidelity term and a regularization term promoting a prior image model, to compensate for the incompleteness of the visibility data. The sought image is estimated as a minimizer of this objective function and is computed through iterative algorithms, which benefit from well-established convergence guarantees. For instance, [124] assumed that the spectra are composed of a smooth continuous part with sparse local deviations, hence allowing for the recovery of non-smooth spectra. In this respect, the authors proposed a convex minimization problem promoting sparsity in a concatenation of two dictionaries. The first synthesis dictionary consists of delta functions; sparsity of its associated synthesis coefficients is enforced, allowing for sparse local deviations. The second synthesis dictionary consists of smooth polynomials, more precisely the basis functions of Chebyshev polynomials; joint sparsity of the synthesis coefficients associated with the overall dictionary is also enforced. Assuming smooth spectra, [54] proposed a convex minimization problem promoting sparsity of both the spatial and spectral information. Spatial sparsity is promoted in a redundant wavelet dictionary. More interestingly, sparsity of the spectra is enforced in the Discrete Cosine Transform (DCT) domain. Finally, a third, quadratic regularization is imposed on the overall model cube. The approach involves the tuning of multiple hyper-parameters representing the trade-off between the different priors. The choice of these parameters is crucial, as it affects the final solution significantly. To alleviate the issue of tuning multiple parameters, [47] discarded the smooth prior on the model cube, reducing the number of hyper-parameters to two. Furthermore, [6] proposed an automatic procedure to tune the remaining two hyper-parameters. In the last decade, Wiaux and collaborators proposed advanced image models: the average sparsity prior in monochromatic imaging (SARA) [25-27,44,86-88,94], and the polarization constraint for polarized imaging (Polarized SARA) [11]⁴. These models have been reported to result in significant improvements in reconstruction quality in comparison with state-of-the-art CLEAN-based imaging methods, at the expense of an increased computation cost. The single-channel approach SARA consists in solving a sequence of weighted ℓ1 minimization problems, promoting sparsity of the sky image in an overcomplete dictionary (spanned by a collection of eight wavelet bases and the Dirac basis).
Finally, assuming a point-source model of the sky, the authors in [89] leveraged the finite rate of innovation (FRI) framework to find the locations of the point sources in the continuum, without imposing a grid. The main idea consists in estimating a smooth function (a polynomial) which vanishes precisely at the non-zero positions of the continuous-domain sparse signal. Provided that FRI signals can be written as a weighted sum of sinusoids whose frequencies are related to the unknown parameters of the original continuous sparse signal, the authors formulated the problem as a constrained minimization task in the frequency domain, estimating a discrete filter whose convolution with the (unknown) uniformly sampled sinusoids is zero.
Note that, from a Bayesian perspective, the objective function can be seen as the negative logarithm of a posterior distribution, with the minimizer corresponding to a Maximum A Posteriori (MAP) estimate. That being said, convex optimization only provides a point estimate of the posterior distribution, making it faster than Bayesian inference techniques that explore the full distribution. Recently, methods for uncertainty quantification by convex optimization have also been tailored, which enable assessing the degree of confidence in specific structures appearing in the MAP estimate [99,100].
2.6 Conclusions
In this chapter, we explained in detail the wideband RI measurement framework, starting from the spatial coherence function and arriving at the discrete measurement model. We then visited the state-of-the-art approaches tailored to solve the inverse imaging problem in RI: CLEAN-based methods, Bayesian inference techniques, and optimization methods. Since the work developed in this thesis falls into the last category, we dedicate the next chapter to explaining the convex optimization framework and all the tools needed to develop the proposed algorithms.
⁴ Associated software on the Puri-Psi webpage: https://basp-group.github.io/Puri-Psi/.
Chapter 3
Sparse representation and convex optimization
Contents
3.1 Introduction . . . 19
3.2 RI imaging problem formulation . . . 20
3.3 Compressive sensing and sparse representation . . . 21
  3.3.1 ℓ1 minimization . . . 22
  3.3.2 Reweighted-ℓ1 minimization . . . 23
    3.3.2.1 SARA for RI imaging . . . 23
3.4 Convex optimization . . . 24
  3.4.1 Proximal splitting algorithms . . . 25
  3.4.2 Primal-dual . . . 26
  3.4.3 Convex optimization for wideband RI imaging - revisited . . . 28
3.5 Conclusions . . . 30
3.1 Introduction
This chapter provides all the mathematical background required for the work developed in this
thesis and is organized as follows. Section 3.2 poses the RI inverse problem as a minimization
task. We explore the world of compressive sensing and sparse recovery approaches in Section
3.3. In Section 3.4, we introduce convex optimization methods as powerful tools to solve convex
minimization problems and revisit the convex optimization approaches adopted for wideband RI
imaging. Finally, conclusions are stated in Section 3.5.
3.2 RI imaging problem formulation
RI data are incomplete and noisy Fourier measurements; thus, image recovery in RI is an ill-posed inverse problem and has an infinite number of possible solutions that can fit the acquired data. To obtain an accurate estimate and discard unwanted solutions, regularization (i.e., prior information about the underlying sky image) should be imposed [53]. In the context of optimization theory, the inverse imaging problem can be approached by defining an objective function consisting of the sum of a data-fidelity term, imposing y_l ≈ Φ_l x_l up to the noise, and a regularization term. The sky image of interest x_l is estimated as a minimizer of the objective function as follows:

\underset{\boldsymbol{x}_l \in \mathbb{R}^N}{\text{minimize}} \;\; r(\boldsymbol{x}_l) + f(\boldsymbol{y}_l, \boldsymbol{x}_l), \qquad (3.1)

where:
• r(x_l) is the regularization function imposing prior information on the sky image x_l to be estimated, e.g., smoothness, sparsity in some domain, etc.
• f(y_l, x_l) is the data-fidelity function, measuring the similarity between the available visibilities y_l and the estimated ones.
We recall that, in a Bayesian framework, the objective function can be seen as the negative logarithm of a posterior distribution, with the minimizer corresponding to a MAP estimate.
Assuming additive i.i.d. Gaussian noise, a typical data-fidelity term relies on the Euclidean norm (the ℓ2 norm), defined for a signal y_l ∈ C^{M_l} as

\|\boldsymbol{y}_l\|_2 = \left[ \sum_{m=1}^{M_l} |y_{m,l}|^2 \right]^{1/2}. \qquad (3.2)
The minimization problem (3.1) can then be rewritten as

\underset{\boldsymbol{x}_l \in \mathbb{R}^N}{\text{minimize}} \;\; r(\boldsymbol{x}_l) + \frac{\tau}{2} \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{x}_l\|_2^2, \qquad (3.3)

where τ > 0 is a free parameter setting the trade-off between the prior and the data-fidelity terms. Problem (3.3) is an unconstrained minimization problem; an equivalent constrained formulation can be written as

\underset{\boldsymbol{x}_l \in \mathbb{R}^N}{\text{minimize}} \;\; r(\boldsymbol{x}_l) \;\; \text{subject to} \;\; \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{x}_l\|_2 \leq \epsilon_l, \qquad (3.4)

where ϵ_l > 0 is a bound on the noise level, ||n_l||_2 ≤ ϵ_l. When the noise statistics are known, which is typically the case in RI, the constrained formulation is preferred, as it avoids the need to fine-tune the free parameter τ [26]. The quality of reconstruction is highly dependent on the choice of the regularization function. In the next section, we explore the world of sparsity, as sparsity priors have drawn vast interest for RI image reconstruction.
3.3 Compressive sensing and sparse representation
The Nyquist-Shannon theorem states that, for the exact recovery of a band-limited signal, the sampling rate should be at least twice the bandwidth of the signal. The aim of the theory of compressive sensing (CS) is to go beyond the Nyquist-Shannon theorem, relying on the fact that most signals in nature are sparse or compressible. A signal represented as a vector x_l ∈ R^N, identifying the N Nyquist-Shannon samples, is said to be sparse if it has a few non-zero coefficients K << N in an adequate basis. More generally, x_l is said to be compressible if it has a few significant coefficients in some basis Ψ ∈ R^{N×T}, T ≥ N, whilst most of the remaining coefficients have negligible values [22]:

\boldsymbol{x}_l = \boldsymbol{\Psi} \boldsymbol{\alpha}_l, \qquad (3.5)

where α_l ∈ R^T is sparse or compressible. Extensive research has been conducted over the past years to find the most suitable dictionary Ψ for different types of images [104,112]. For example, for images composed of point sources, Ψ can be set to the Dirac basis, promoting sparsity in the image domain itself. For piece-wise constant images, sparsity can be promoted in the gradient domain [105]. For smooth images with more complex structures (e.g., extended emission), the wavelet domain [76] and redundant wavelet dictionaries [26] have been shown to be good choices for Ψ. Other options exist in the literature for the choice of the dictionary Ψ, such as the isotropic undecimated wavelet transform (IUWT) [111] and the curvelets [110].
The theory of compressive sensing states that, under some constraints, exact recovery of a compressible signal x_l ∈ R^N in a basis Ψ ∈ R^{N×T} can be achieved from a number M_l of measurements y_l ∈ C^{M_l} in a sensing basis Φ_l ∈ R^{M_l×N}, with M_l much smaller than the amount required by Nyquist-Shannon sampling, such that

\boldsymbol{y}_l = \mathbf{A} \boldsymbol{\alpha}_l + \boldsymbol{n}_l \quad \text{with} \quad \mathbf{A} = \boldsymbol{\Phi}_l \boldsymbol{\Psi} \in \mathbb{R}^{M_l \times T}. \qquad (3.6)

The matrix A must satisfy the restricted isometry property (RIP) [23]. In practice, a few random measurements in a sensing basis Φ_l incoherent with the sparsity basis Ψ will ensure the RIP with overwhelming probability [19,21,49,126]. The coherence µ between two bases Φ_l and Ψ is defined as the maximum complex modulus of the scalar product between unit-norm vectors φ_{l,i} and ψ_j of the two bases:

\mu = \sqrt{N} \max_{1 \leq i,j \leq N} |\langle \boldsymbol{\phi}_{l,i}, \boldsymbol{\psi}_j \rangle|. \qquad (3.7)
For instance, probing random Fourier measurements from a signal sparse in a wavelet dictionary is a good example of incoherent sensing and sparsity bases. In this respect, the RI inverse problem (2.19) (equivalently (3.6)) can be solved by finding the sparse representation α_l that is consistent with the data y_l. This can be done by solving the following constrained problem:

\underset{\boldsymbol{\alpha}_l \in \mathbb{R}^T}{\text{minimize}} \;\; \|\boldsymbol{\alpha}_l\|_0 \;\; \text{subject to} \;\; \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{\Psi} \boldsymbol{\alpha}_l\|_2 \leq \epsilon_l, \qquad (3.8)

where ||·||_0 denotes the ℓ0 pseudo-norm, i.e., the number of non-zero coefficients of a signal α_l [49]. The ℓ0 pseudo-norm is neither convex nor smooth, and thus the minimization problem defined in (3.8) is non-convex. To solve this problem, greedy algorithms such as matching pursuit (MP) [77] or iterative hard thresholding (IHT) [13,61] can be used. However, these methods are only guaranteed to find a local optimum, so good initialization becomes paramount.
3.3.1 ℓ1 minimization
A common approach to make problem (3.8) convex is to promote sparsity by replacing the ℓ0 pseudo-norm with its closest convex relaxation, the ℓ1 norm, defined for a signal α_l ∈ R^T by

\|\boldsymbol{\alpha}_l\|_1 = \sum_{n=1}^{T} |\alpha_{n,l}|. \qquad (3.9)

Thus, we pose the following convex minimization problem [19,30,49]:

\underset{\boldsymbol{\alpha}_l \in \mathbb{R}^T}{\text{minimize}} \;\; \|\boldsymbol{\alpha}_l\|_1 \;\; \text{subject to} \;\; \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{\Psi} \boldsymbol{\alpha}_l\|_2 \leq \epsilon_l. \qquad (3.10)

The minimization of the ℓ1 norm of a sparse signal α_l under a constraint on the ℓ2 norm of the residual noise is called the basis pursuit denoising (BPDN) problem. The BPDN problem explicitly finds the sparsest signal α_l and then recovers the original signal x_l through (3.5); this is a sparsity-by-synthesis approach. It assumes that the signal x_l can be approximated by a linear combination of a few atoms of a redundant synthesis dictionary Ψ and thus solves for the synthesis coefficients α_l. Sparsity can also be promoted through analysis-based approaches, where the projection of x_l onto a redundant analysis dictionary Ψ† is assumed sparse [52]. Analysis-based problems solve directly for the signal x_l itself and can be of the form:

\underset{\boldsymbol{x}_l \in \mathbb{R}^N}{\text{minimize}} \;\; \|\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l\|_1 \;\; \text{subject to} \;\; \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{x}_l\|_2 \leq \epsilon_l. \qquad (3.11)

Both synthesis and analysis approaches are equivalent for orthonormal sparsity bases. However, when considering redundant dictionaries, they may lead to different solutions. In recent works, the analysis problem has been shown to be more robust for redundant dictionaries [26].
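As a concrete illustration, the following sketch solves an unconstrained (Lagrangian) counterpart of the synthesis problem (3.10) with the iterative shrinkage-thresholding algorithm (ISTA); the random operator, the dimensions, and the regularization parameter are illustrative assumptions, not the solvers used in this thesis:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximity operator of the l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(y, A, lam, n_iter=500):
    """ISTA for: minimize_a  lam * ||a||_1 + 0.5 * ||y - A a||_2^2."""
    L = np.linalg.norm(A, 2)**2          # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - A.T @ (A @ a - y) / L, lam / L)
    return a

rng = np.random.default_rng(4)
M, T, K = 40, 100, 4                     # measurements, dictionary size, sparsity
A = rng.normal(size=(M, T)) / np.sqrt(M) # toy random sensing matrix
a_true = np.zeros(T)
a_true[rng.choice(T, K, replace=False)] = 3.0
y = A @ a_true + 0.01 * rng.normal(size=M)
a_hat = ista(y, A, lam=0.05)             # sparse recovery from M << T data
```

Each iteration alternates a gradient step on the data-fidelity term with soft-thresholding, the proximity operator of the ℓ1 norm.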
3.3.2 Reweighted-ℓ1 minimization
Although the ℓ1 prior has been widely used during the last decades to promote sparsity, it induces an undesirable dependence on the coefficients' magnitude. Indeed, unlike the ℓ0 prior, the ℓ1 norm penalizes larger coefficients more than smaller ones. A better approximation of the sparsity measure can be achieved by adopting the weighted ℓ1 norm [23], defined for a signal α_l ∈ R^T as

\|\boldsymbol{\alpha}_l\|_{1,\omega} = \sum_{n=1}^{T} \omega_{n,l} |\alpha_{n,l}|, \qquad (3.12)

where ω_{n,l} > 0 is the weight associated with the coefficient α_{n,l}. Assuming the signal α_l is known a priori (which is not the case in practice), one can set the weights as ω_{n,l} = |α_{n,l}|^{-1}. This choice of the weights makes the weighted ℓ1 norm independent of the values of the non-zero coefficients, mimicking the behaviour of the ℓ0 pseudo-norm. Also, for zero-valued coefficients, we get infinite weights, forcing the solution of the ℓ1 minimization problem at these positions to be zero. In practice, the signal α_l is unknown and needs to be estimated. Therefore, the appropriate weights are found by solving a sequence of weighted ℓ1 minimization problems, each solved using weights essentially the inverse of the solution of the previous problem [23,81].
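The reweighting procedure can be sketched as follows, with a weighted soft-thresholding (ISTA) inner solver applied to an unconstrained surrogate of the weighted ℓ1 problem; the operator, dimensions, and parameters are illustrative assumptions:

```python
import numpy as np

def weighted_ista(y, A, lam, w, n_iter=300):
    """Inner solver: minimize_a  lam * sum_n w_n |a_n| + 0.5 * ||y - A a||_2^2."""
    L = np.linalg.norm(A, 2)**2
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = a - A.T @ (A @ a - y) / L          # gradient step on the quadratic
        a = np.sign(a) * np.maximum(np.abs(a) - lam * w / L, 0.0)  # weighted prox
    return a

rng = np.random.default_rng(5)
M, T, K, upsilon = 40, 100, 4, 1e-3
A = rng.normal(size=(M, T)) / np.sqrt(M)
a_true = np.zeros(T)
a_true[rng.choice(T, K, replace=False)] = 3.0
y = A @ a_true                            # noiseless toy data

w = np.ones(T)                            # first round: plain l1
for _ in range(4):                        # reweighting rounds
    a = weighted_ista(y, A, lam=0.05, w=w)
    w = 1.0 / (np.abs(a) + upsilon)       # weights: inverse of previous solution
```

After a few rounds, large coefficients receive small weights (little penalization), while near-zero coefficients receive large weights and are driven exactly to zero, reducing the magnitude bias of the plain ℓ1 prior.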
This approach can be elegantly cast as solving for non-convex log-sum priors within a majorization-minimization framework, enabling the sequential use of convex optimization algorithms [65]. The log-sum prior r is defined for a signal α_l ∈ R^T as

r(\boldsymbol{\alpha}_l) = \sum_{n=1}^{T} \log\left( |\alpha_{n,l}| + \upsilon \right), \qquad (3.13)

with υ > 0. In practice, sequentially minimizing convex problems with weighted ℓ1 priors is indeed algorithmically much simpler than minimizing a non-convex problem with a log-sum prior [59,82,83,102]. In the following paragraph, we present the SARA approach [26], proposed to solve the RI imaging problem using a log-sum prior.
3.3.2.1 SARA for RI imaging
The authors in [26] proposed the sparsity averaging reweighted analysis (SARA) approach to solve the RI imaging problem. SARA promotes sparsity by minimizing a log-sum prior through a reweighted ℓ1 procedure, considering a highly redundant sparsity dictionary Ψ ∈ R^{N×T} defined as the concatenation of wavelet bases (the first eight Daubechies wavelets and the Dirac basis), leading to the notion of average sparsity over the bases of interest. The SARA minimization problem reads

\underset{\boldsymbol{x}_l \in \mathbb{R}^N_+}{\text{minimize}} \;\; \tilde{\mu} \sum_{n=1}^{T} \log\left( |[\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l]_n| + \upsilon \right) \;\; \text{subject to} \;\; \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{x}_l\|_2 \leq \epsilon_l, \qquad (3.14)
where µ̃ > 0 and υ > 0 are regularization parameters, and [Ψ† x_l]_n denotes the n-th coefficient of Ψ† x_l. To solve this non-convex minimization problem, the SARA approach leverages the majorization-minimization approach proposed by [23]. More precisely, for a given image x_l^{(k)}, the log-sum prior r(x_l) = µ̃ Σ_{n=1}^{T} log(|[Ψ† x_l]_n| + υ) is locally majorized by a linear function, i.e., for every x_l ∈ R^N,

r(\boldsymbol{x}_l) \leq r(\boldsymbol{x}_l^{(k)}) + \sum_{n=1}^{T} \frac{\tilde{\mu}}{|[\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l^{(k)}]_n| + \upsilon} \left( |[\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l]_n| - |[\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l^{(k)}]_n| \right). \qquad (3.15)

Then, using a majorization-minimization approach, the majorant function given by the right-hand side of (3.15) is minimized, subject to the data-fidelity and non-negativity constraints. This results in the following convex problem, an approximation of the initial non-convex problem (3.14):

\underset{\boldsymbol{x}_l \in \mathbb{R}^N_+}{\text{minimize}} \;\; \tilde{\mu} \|\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l\|_{1, \omega(\boldsymbol{x}_l^{(k)})} \;\; \text{subject to} \;\; \|\boldsymbol{y}_l - \boldsymbol{\Phi}_l \boldsymbol{x}_l\|_2 \leq \epsilon_l. \qquad (3.16)

The weights ω(x_l^{(k)}) = (ω_n(x_l^{(k)}))_{1≤n≤T} are given by

\omega_n(\boldsymbol{x}_l^{(k)}) = \left( |[\boldsymbol{\Psi}^{\dagger} \boldsymbol{x}_l^{(k)}]_n| + \upsilon \right)^{-1}. \qquad (3.17)

Problem (3.16) is a convex approximation of the original non-convex problem (3.14) at the local point x_l^{(k)}. Once problem (3.16) is solved, the full majorization-minimization procedure is iterated in order to globally approximate the original non-convex problem of interest (3.14).
The SARA approach has proved efficient for RI imaging on both synthetic and real data [26,27,44,87,88,94].
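To make the reweighting procedure (3.16)-(3.17) concrete, here is a minimal numerical sketch under strong simplifying assumptions: Φ and Ψ are taken as the identity (a pure denoising setting rather than RI imaging), and the ℓ2-ball constraint is replaced by a penalized least-squares term so that each weighted sub-problem has a closed-form solution (weighted soft-thresholding). All sizes and parameter values are illustrative.

```python
import numpy as np

def weighted_soft_threshold(z, thresholds):
    """Proximity operator of a weighted l1 norm: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - thresholds, 0.0)

def reweighted_l1_denoise(y, mu=0.2, upsilon=1e-3, n_reweights=10):
    """Majorization-minimization loop mirroring (3.16)-(3.17) in the toy setting
    Phi = Psi = Identity, with the l2-ball constraint replaced by the penalized
    term 0.5*||x - y||^2, so each weighted sub-problem is solved exactly."""
    x = y.copy()
    for _ in range(n_reweights):
        weights = 1.0 / (np.abs(x) + upsilon)         # weights, as in (3.17)
        x = weighted_soft_threshold(y, mu * weights)  # weighted l1 sub-problem
    return x

rng = np.random.default_rng(0)
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [2.0, -3.0, 1.5]
y = x_true + 0.05 * rng.standard_normal(50)   # noisy observation
x_hat = reweighted_l1_denoise(y)
```

After a few reweights, entries at the noise level receive very large weights and are driven exactly to zero, while significant entries are only mildly thresholded, which is the ℓ0-like behaviour the log-sum prior is meant to approximate.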
3.4 Convex optimization
By denition, a function f:RN]− ∞,+]is convex if dom f1is convex and for any
x1,x2dom fand θ[0,1], we have:
f(θx1+ (1 θ)x2)θf (x1) + (1 θ)f(x2),(3.18)
i.e., the function graph between two points on the graph lies below the line segment between these
two points. A convex optimization problem consists in minimizing a convex function subject to
convex constraints. As opposed to non-convex problems like (3.8), the class of convex problems
has the nice property that any local optimum is a global one.
The previously introduced ℓ1 minimization problems ((3.10), (3.11) and (3.16)), involving convex functions (the ℓ1 and ℓ2 norms), belong to the family of convex problems. These problems can be efficiently
¹See Appendix .1 for the definition of the domain of a function.
solved leveraging convex optimization techniques, for which various algorithms, efficient in terms of flexibility and convergence, have been devised [8]. In what follows, we describe two classes of methods within the convex optimization framework: proximal splitting methods and the primal-dual approach. The latter is used for all the algorithmic developments in this thesis.
3.4.1 Proximal splitting algorithms
Proximal splitting methods attract much interest due to their convergence guarantees. When applied to convex problems, they are flexible and lead to a global optimal solution of the associated minimization task. Many splitting algorithms have been devised [15,33,34], all of them solving minimization tasks of the form

$$ \underset{\mathbf{x}_l \in \mathbb{R}^N}{\text{minimize}} \; \sum_{q=1}^{Q} g_q(\mathbf{x}_l), \qquad (3.19) $$
where, for q ∈ {1, ..., Q}, g_q is a proper, lower semi-continuous², convex and possibly non-smooth function from R^N to ]−∞,+∞] (g_q ∈ Γ_0(R^N)³). These algorithms bring the advantage of splitting the objective into many simpler functions that can be dealt with separately. Each non-differentiable function is involved in the minimization via its proximity operator. Let U ∈ R^{N×N} be a symmetric, positive definite matrix. The proximity operator of a proper, convex, lower semi-continuous function g : R^N → ]−∞,+∞] at x_l ∈ R^N, with respect to the metric induced by U, is defined by [63,78]

$$ \operatorname{prox}^{\mathbf{U}}_{g}(\mathbf{x}_l) = \underset{\bar{\mathbf{x}}_l \in \mathbb{R}^N}{\operatorname{argmin}} \left\{ g(\bar{\mathbf{x}}_l) + \frac{1}{2} (\bar{\mathbf{x}}_l - \mathbf{x}_l)^\dagger \mathbf{U} (\bar{\mathbf{x}}_l - \mathbf{x}_l) \right\}. \qquad (3.20) $$
In the following, the more compact notation prox_g will be used whenever U = I_N, where I_N ∈ R^{N×N} is the identity matrix. The proximity operator prox_g(x_l) is the unique solution of the minimization of the function g in the neighborhood of x_l. It acts as a simple denoising operator (e.g., a sparsity regularization term induces a thresholding operator). The use of the proximity operator introduces flexibility in solving convex minimization problems, since no smoothness assumptions are required. In the particular case when g is the indicator function ι_C of a convex set C, the proximity operator reduces to the projection onto C. The indicator function of a non-empty closed convex set C ⊂ R^N at a given point x_l ∈ R^N is defined as

$$ \iota_{\mathcal{C}}(\mathbf{x}_l) = \begin{cases} 0 & \mathbf{x}_l \in \mathcal{C} \\ +\infty & \mathbf{x}_l \notin \mathcal{C}. \end{cases} \qquad (3.21) $$

In this respect, proximal splitting methods can also solve constrained minimization problems by resorting to the indicator function of the convex set defined by the constraint.
²See Appendix .1 for the definitions of proper and lower semi-continuous functions.
³Γ_0(R^N): the class of proper, convex and lower semi-continuous functions from R^N to ]−∞,+∞].
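As an illustration of (3.20)-(3.21), the two proximity operators most used in this context, soft-thresholding (prox of the ℓ1 norm) and the projection onto an ℓ2 ball (prox of its indicator function), can be sketched as follows, with the identity metric U = I_N; the numerical values are illustrative.

```python
import numpy as np

def prox_l1(z, lam):
    """prox of lam*||.||_1 (U = Identity): componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def project_l2_ball(z, center, radius):
    """prox of the indicator function (3.21) of the l2 ball B(center, radius):
    the Euclidean projection onto the ball."""
    d = z - center
    n = np.linalg.norm(d)
    return z.copy() if n <= radius else center + radius * d / n

s = prox_l1(np.array([3.0, -0.2, 1.5]), 0.5)                 # -> [2.5, 0.0, 1.0]
p = project_l2_ball(np.array([3.0, 4.0]), np.zeros(2), 1.0)  # -> [0.6, 0.8]
```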
One of the proximal splitting techniques that overcome the non-differentiability difficulty is the so-called forward-backward (FB) algorithm [33,58]. It can be considered as a generalization of the projected gradient method. It consists in alternating between a gradient descent step applied to the differentiable function and a proximal step applied to the non-differentiable one. Considering the minimization problem (3.19) with two functions, one of which, g_2, is differentiable, and the other, g_1, non-smooth, the forward-backward solution is characterized by

$$ \mathbf{x}_l^{(t+1)} = \underbrace{\operatorname{prox}_{\delta^{(t)} g_1}}_{\text{backward step}} \big( \underbrace{\mathbf{x}_l^{(t)} - \delta^{(t)} \nabla g_2(\mathbf{x}_l^{(t)})}_{\text{forward step}} \big), \qquad (3.22) $$
with δ^{(t)} being the step size. The algorithm performs an explicit gradient (forward) step using the function g_2, followed by an implicit (backward) step through the proximity operator of the non-differentiable function g_1. This is similar to the major and minor cycles of CLEAN. On the one hand, when g_1 = 0, the algorithm reduces to a gradient method. On the other hand, when g_2 = 0, it reduces to a proximal point method. In the particular case when the function g_1 is the ℓ1 norm, the proximity step becomes a soft-thresholding operation, and fast algorithms can be derived, such as the iterative shrinkage-thresholding algorithm (ISTA) [46,51] and fast ISTA (FISTA) [9].
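A minimal sketch of the forward-backward iteration (3.22) in its accelerated FISTA form [9], applied to a synthetic ℓ1-regularized least-squares (LASSO) problem; the matrix A and all dimensions are illustrative stand-ins, not the RI measurement operator.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: prox of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista(A, y, lam, n_iter=300):
    """FISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1: a forward (gradient)
    step on the smooth term, a backward (prox) step on the l1 term, as in
    (3.22), plus Nesterov-style momentum."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    x = np.zeros(A.shape[1]); z = x.copy(); s = 1.0
    for _ in range(n_iter):
        x_new = soft(z - step * A.T @ (A @ z - y), step * lam)
        s_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * s ** 2))
        z = x_new + ((s - 1.0) / s_new) * (x_new - x)   # momentum step
        x, s = x_new, s_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100)) / np.sqrt(40)   # underdetermined operator
x_true = np.zeros(100); x_true[[5, 30, 77]] = [1.0, -2.0, 1.5]
y = A @ x_true                                     # noiseless measurements
x_hat = fista(A, y, lam=0.01)
```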
3.4.2 Primal-dual
The primal-dual (PD) approach [28,37,72,93,123] has an important advantage over proximal splitting methods: it achieves full splitting, meaning that all the terms defining the minimization problem, including the linear operators, are used independently. Thus, the solution of the original task corresponds to solving a sequence of simpler sub-problems. PD solves convex minimization problems of the form

$$ \underset{\mathbf{x}_l \in \mathbb{R}^N}{\text{minimize}} \; f(\mathbf{x}_l) + g(\mathbf{L} \mathbf{x}_l) + h(\mathbf{x}_l), \qquad (3.23) $$

where f ∈ Γ_0(R^N), g ∈ Γ_0(R^M), L ∈ R^{M×N} is a linear operator, and h ∈ Γ_0(R^N) is a differentiable function with a Lipschitzian gradient of Lipschitz constant β ∈ ]0,+∞[. The latter assumption means that the gradient ∇h of the differentiable function h satisfies

$$ \left( \forall (\mathbf{z}_1, \mathbf{z}_2) \in (\mathbb{R}^N)^2 \right) \quad \|\nabla h(\mathbf{z}_1) - \nabla h(\mathbf{z}_2)\| \leq \beta \, \|\mathbf{z}_1 - \mathbf{z}_2\|. \qquad (3.24) $$
The problem can also be generalized to the case of multiple functions, similarly to (3.19). The minimization task (3.23) is referred to as the primal problem, and it is associated with the following dual problem:

$$ \underset{\mathbf{v}_l \in \mathbb{R}^M}{\text{minimize}} \; (f^* \mathbin{\square} h^*)(-\mathbf{L}^\dagger \mathbf{v}_l) + g^*(\mathbf{v}_l), \qquad (3.25) $$

where f^* □ h^* is the inf-convolution⁴ of the functions f^* and h^*, L† ∈ R^{N×M} is the adjoint of the linear operator L, and g^* is the conjugate function of g, defined as [8]

$$ g^*(\mathbf{u}) \triangleq \sup_{\mathbf{x}} \; \mathbf{x}^\dagger \mathbf{u} - g(\mathbf{x}), \qquad (3.26) $$
for any u ∈ R^M. The Fenchel-Rockafellar duality theorem states that solving the dual problem provides a lower bound on the minimum value obtained by the primal one, hence simplifying the problem [8]. PD solves simultaneously for both the primal and dual problems to find a Kuhn-Tucker point (x̂_l, v̂_l) which satisfies

$$ -\mathbf{L}^\dagger \hat{\mathbf{v}}_l - \nabla h(\hat{\mathbf{x}}_l) \in \partial f(\hat{\mathbf{x}}_l), \qquad \mathbf{L} \hat{\mathbf{x}}_l \in \partial g^*(\hat{\mathbf{v}}_l), \qquad (3.27) $$

where ∂f (respectively ∂g^*) is the subdifferential⁵ of the function f (respectively g^*), and x̂_l and v̂_l are the primal and dual solutions, respectively.
For the work developed in this thesis, we resort to a preconditioned variant of the PD algorithm with forward-backward iterations, introduced by [93]. The details of the adopted algorithm, dubbed PDFB, are presented in Algorithm 1. Solving the dual problem (3.25) and the primal problem (3.23) results in the dual update (Step 4) and the primal update (Step 6), respectively. These updates take the form of FB iterations. The dual update requires the proximity operator of the conjugate function g^*, which can be easily derived from the proximity operator of the function g thanks to the Moreau decomposition, defined as [36,78]

$$ \operatorname{prox}^{\mathbf{U}^{-1}}_{\kappa g^*} = \mathbf{I}_M - \kappa \mathbf{U} \operatorname{prox}^{\mathbf{U}}_{g/\kappa}(\kappa^{-1} \mathbf{U}^{-1} \, \cdot \,), \qquad (3.28) $$
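The Moreau decomposition (3.28) can be checked numerically in the simplest setting U = I, κ = 1. For g = λ‖·‖_1, the conjugate g^* is the indicator of the ℓ∞ ball of radius λ, whose prox is a componentwise clipping; the decomposition must reproduce it.

```python
import numpy as np

def prox_l1(z, lam):
    """Soft-thresholding: prox of lam*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 0.7
z = np.array([2.0, -0.3, 0.9, -1.4])

# Moreau decomposition (3.28) with U = I and kappa = 1:
# prox_{g*}(z) = z - prox_g(z).
prox_conj = z - prox_l1(z, lam)

# For g = lam*||.||_1, g* is the indicator of the l-infinity ball of
# radius lam, whose prox is the projection (componentwise clipping).
projection = np.clip(z, -lam, lam)
```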
where U ∈ R^{M×M} is a preconditioning matrix. Algorithm 1 is guaranteed to converge to the global minimum of the primal problem (3.23) and the dual problem (3.25) if the following condition is satisfied [93]:

$$ 1 - \|\boldsymbol{\Delta}_1^{1/2} \mathbf{L} \boldsymbol{\Delta}_2^{1/2}\|_S^2 > \|\boldsymbol{\Delta}_2\|_S \, \beta/2, \qquad (3.29) $$

where ∆_1 and ∆_2 are two general preconditioning matrices, L is a concatenation of all the operators used, and ‖·‖_S stands for the spectral norm.
PD exibility and parallelization capabilities makes it a very good choice for solving wideband
RI inverse problems since all the dual variables can be updated in parallel. Compared to the other
convex optimization solvers using proximity operators adopted for RI imaging such as the Douglas-
Rachford splitting algorithm [26], the simultaneous direction method of multipliers (SDMM) [27],
and the alternating direction method of multipliers (ADMM) [87], PD is more exible and has
further parallelization capabilities with limited overhead [87]. The bottleneck in SDMM is that
4See Appendix .1 for the denition of the inf-convolution.
5See Appendix .1 for the denition of the subdierential of a convex function.
Algorithm 1: Forward-backward primal-dual (PDFB)
Input: x_l^{(0)}, v_l^{(0)}
Parameters: τ, κ
1: t ← 0; x̃_l^{(0)} = x_l^{(0)}
2: while stopping criterion not satisfied do
3:   Update dual variable
4:     v_l^{(t+1)} = ∆_1 (I_M − prox^{∆_1^{-1}}_{g}) (∆_1^{-1} v_l^{(t)} + L x̃_l^{(t)})
5:   Update primal variable
6:     x_l^{(t+1)} = prox^{∆_2^{-1}}_{f} ( x_l^{(t)} − ∆_2 ( ∇h(x_l^{(t)}) + L† v_l^{(t+1)} ) )
7:     x̃_l^{(t+1)} = 2 x_l^{(t+1)} − x_l^{(t)}
8:   t ← t + 1
Result: x_l^{(t)}, v_l^{(t)}
at each iteration, an expensive matrix inversion is necessary to update the solution. This can be prohibitively expensive for wideband RI imaging. ADMM and the Douglas-Rachford splitting algorithm are limited to only two functions in the minimization problem and thus require sub-iterations, which is computationally very demanding when multiple functions are to be minimized. The PD algorithm, with its full splitting of the operators and functions, does not present these drawbacks [35].
Furthermore, PD exibility enables to incorporate additional information about the data such
as the density of the RI Fourier sampling when enforcing data delity. This allows the algorithm to
make larger steps towards the nal solution, hence accelerating the overall convergence speed [88].
In addition to its exibility and parallelization capabilities, PD allows for randomized updates of
the dual variables [93], meaning that they can be updated less often than the primal variable. Such
functionality lowers the computational cost per iteration, thus ensuring higher scalability of the
algorithmic structure, at the expense of an increased number of iterations to achieve convergence.
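The structure of Algorithm 1 can be sketched on a toy problem, here with scalar preconditioners ∆_1 = σI and ∆_2 = τI (the unpreconditioned setting), f the non-negativity indicator, g the ℓ1 norm composed with a first-order difference operator L (a TV-like prior), and h a quadratic data-fidelity term; all operators and parameter values are illustrative.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pdfb(y, L, lam, tau, sigma, n_iter=500):
    """PDFB iterations (structure of Algorithm 1, scalar preconditioners
    Delta1 = sigma*I, Delta2 = tau*I) for
        min_x  iota_{x >= 0}(x) + lam*||L x||_1 + 0.5*||x - y||^2,
    i.e. f = non-negativity indicator, g = lam*||.||_1, h quadratic."""
    x = np.zeros(y.size)
    v = np.zeros(L.shape[0])
    x_bar = x.copy()
    for _ in range(n_iter):
        # dual forward-backward step: prox of g* via the Moreau
        # decomposition (3.28), which here reduces to clipping to [-lam, lam]
        u = v + sigma * (L @ x_bar)
        v = u - sigma * soft(u / sigma, lam / sigma)
        # primal forward-backward step: gradient of h, then prox of f
        x_new = np.maximum(x - tau * ((x - y) + L.T @ v), 0.0)
        x_bar = 2.0 * x_new - x        # relaxation (Step 7 of Algorithm 1)
        x = x_new
    return x

# noisy piecewise-constant signal; L = first-order differences
rng = np.random.default_rng(2)
n = 60
y = np.concatenate([np.full(20, 1.0), np.full(20, 3.0), np.full(20, 0.5)])
y = y + 0.1 * rng.standard_normal(n)
L = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]

beta = 1.0                                   # Lipschitz constant of grad h
norm_L2 = np.linalg.norm(L, 2) ** 2
sigma = 0.5 / norm_L2
tau = 0.9 / (sigma * norm_L2 + beta / 2.0)   # step sizes ensuring convergence
x_hat = pdfb(y, L, lam=1.0, tau=tau, sigma=sigma)
```

With these scalar step sizes, the condition (3.29) reduces to τ(σ‖L‖²_S + β/2) < 1, which the choice of τ and σ above satisfies.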
3.4.3 Convex optimization for wideband RI imaging - revisited
In the context of wideband imaging, the aim is to jointly recover the spatial and spectral information
of the radio emission. A straightforward approach is to image each channel separately, i.e., no
inter-channel information is exploited. On the one hand, this approach is highly parallelizable and
single-channel image recovery has been extensively studied in the literature [7,10,26,27,32,38,
4244,57,60,64,69,70,74,84,87,88,94,109,114,126]. On the other hand, it remains sub-optimal
for wideband imaging since the correlation of the wideband data is not exploited. Moreover, the
quality of the recovered images at the dierent channels is limited to their inherent resolution and
sensitivity. In this regard, the SARA approach explained in Section 3.3.2.1 is taken as a benchmark
for the algorithms developed in the following chapters. SARA is solved leveraging the PDFB algorithm explained in Section 3.4.2, using all the preconditioning and splitting functionalities for scalability [44,88].
First applications of convex optimization methods imposing spatio-spectral sparsity priors on the wideband RI image cube have shown promising results in recovering synthetic data [6,47,54,124]. Since the work developed in this thesis falls into this category, we revisit the convex optimization methods proposed for wideband RI imaging described briefly in Section 2.5.3. In [124], the authors propose a convex unconstrained minimization problem promoting sparsity-by-synthesis of the wideband model cube. Based on the assumption that the spectrum is composed of a smooth continuous part with sparse local deviations, this method allows for recovering non-smooth features in the spectral domain. The minimization problem is defined as

$$ \underset{\mathbf{Z} \in \mathbb{R}^{T \times L}}{\text{minimize}} \; \frac{1}{2} \|\boldsymbol{\Phi} \boldsymbol{\Psi} \mathbf{Z} - \mathbf{Y}\|_F^2 + \mu_1 \|\mathbf{Z}\|_{\infty,1} + \mu_2 \|\mathbf{Z}_p\|_1, \qquad (3.30) $$
with Z ∈ R^{T×L} being a sparse decomposition of the original signal X ∈ R^{N×L} in a redundant dictionary Ψ ∈ R^{N×T}; X = ΨZ. The matrix Y ∈ C^{M×L} represents the RI data cube, assuming all channels have the same number of visibilities. The dictionary Ψ consists of delta functions and smooth polynomials, namely the basis functions of Chebyshev polynomials. The ℓ1 norm imposes sparsity on the deviations from the smooth polynomials, denoted by Z_p. The second regularizer
is the ℓ∞,1 norm, defined for a matrix Z as

$$ \|\mathbf{Z}\|_{\infty,1} = \sum_{n=1}^{T} \max_{1 \leq l \leq L} |z_{n,l}|. \qquad (3.31) $$
This prior promotes joint sparsity across the spectral domain, so that pixels are active or inactive across all the channels. The minimization problem is solved following the FISTA algorithm [9]. The main limitation of this approach is the assumption that all spectra can be formed as a linear combination of Chebyshev polynomials with some sparse deviations. Although this assumption is valid for broad types of spectra, it might not be generic enough for the more complicated spectra observed with the new-generation telescopes.
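As a small numerical illustration of the ℓ∞,1 norm (3.31), with arbitrary values:

```python
import numpy as np

def linf1_norm(Z):
    """||Z||_{inf,1} of (3.31): sum over rows of the maximum across channels."""
    return np.sum(np.max(np.abs(Z), axis=1))

# each row is counted once, through its largest channel amplitude, so the
# norm does not grow when more channels within an already-active row light up
Z = np.array([[1.0, -3.0],
              [0.0,  0.5]])
val = linf1_norm(Z)   # max(1, 3) + max(0, 0.5) = 3.5
```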
The authors in [54] present a convex unconstrained minimization problem promoting sparsity-by-analysis of both the spatial and spectral information. The proposed minimization problem reads

$$ \underset{\mathbf{X} \in \mathbb{R}^{N \times L}}{\text{minimize}} \; \frac{1}{2} \|\mathbf{X}_d - \mathcal{H}(\mathbf{X})\|_F^2 + \frac{\mu_1}{2} \|\mathbf{X}\|_F^2 + \mu_2 \|\boldsymbol{\Psi}^\dagger \mathbf{X}\|_1 + \mu_3 \|\mathbf{X} \mathbf{W}\|_1 + \iota_{\mathbb{R}^{N \times L}_+}(\mathbf{X}). \qquad (3.32) $$
The data-fidelity term connects the RI model cube X to the dirty image cube X_d. The operator H is a convolutional operator containing the PSFs of all the channels. The matrix Ψ is a spatial-sparsifying dictionary containing the first eight Daubechies wavelet bases. This dictionary promotes average sparsity over multiple orthonormal wavelet bases. To promote smoothness in the spectral dimension, W implements a discrete cosine transform (DCT). ι_{R^{N×L}_+} is the indicator function of the convex set R^{N×L}_+, enforcing non-negativity of the wideband RI image cube. Finally, a quadratic regularization on the image cube is adopted through the Frobenius norm of X, given by
$$ \|\mathbf{X}\|_F = \left[ \sum_{l=1}^{L} \sum_{n=1}^{N} |x_{n,l}|^2 \right]^{1/2}. \qquad (3.33) $$
The minimization problem is solved using the ADMM algorithm [15]. The tuning of multiple arbitrary parameters (μ_1, μ_2, μ_3), representing here the trade-off between the different priors, is usually problematic, since they are difficult to choose and influence the final reconstruction quality. In later work, the authors in [47] discarded the smoothness prior on the image cube to reduce the number of free parameters to two. The new minimization problem is solved using the PD approach [37,123], and spatial sparsity is promoted in an IUWT dictionary. Furthermore, [6] proposed an automatic procedure to tune the remaining two free parameters. The authors in [1] have shown the limited performance of the spatio-spectral sparsity priors proposed in [6,47,54]. They suggested the use of more sophisticated models, namely the low-rankness and joint average sparsity model, to leverage the correlation in wideband RI data and recover high-resolution, high-sensitivity image cubes. This model forms the essence of the work developed in this thesis and will be explained in detail in the next chapter.
3.5 Conclusions
In this chapter, we provided the mathematical background needed for the algorithms proposed in the next chapters. We started by posing the RI inverse problem as a minimization task and explored the world of sparsity and the typical sparsity priors. We then explained in detail convex optimization methods as a powerful tool to solve convex minimization problems, with a particular emphasis on the primal-dual framework adopted in this thesis. Finally, we revisited convex optimization methods adopted for wideband RI imaging.
Chapter 4
Wideband super-resolution
imaging in radio interferometry
(HyperSARA)
Contents
4.1 Motivation . . . 32
4.2 HyperSARA: optimization problem . . . 32
  4.2.1 Low-rankness and joint sparsity sky model . . . 33
  4.2.2 HyperSARA minimization task . . . 34
4.3 HyperSARA: algorithmic structure . . . 36
  4.3.1 HyperSARA in a nutshell . . . 37
  4.3.2 Underlying primal-dual forward-backward algorithm . . . 37
  4.3.3 Adaptive ℓ2 bounds adjustment . . . 39
  4.3.4 Weighting schemes . . . 40
4.4 Simulations . . . 43
  4.4.1 Simulations settings . . . 43
  4.4.2 Benchmark algorithms . . . 44
  4.4.3 Imaging quality assessment . . . 45
  4.4.4 Imaging results . . . 47
4.5 Application to real data . . . 52
  4.5.1 Data and imaging details . . . 52
  4.5.2 Imaging quality assessment . . . 56
  4.5.3 Real imaging results . . . 57
4.6 Conclusions . . . 63
4.1 Motivation
Upcoming radio interferometers are aiming to image the sky at new levels of resolution and sensi-
tivity, with wideband image cubes reaching close to the Petabyte scale for SKA. It is of paramount
importance to design ecient imaging algorithms which meet the capabilities of such powerful
instruments. On the one hand, appropriate algorithms need to inject complex prior image models
to regularize the inverse problem for image formation from visibility data, which only provide
incomplete Fourier sampling. On the other hand, these algorithms need to be highly parallelizable
to scale with the sheer amount of data and the large size of the wideband image cubes to be
recovered. In this respect, we propose a new approach, dubbed “HyperSARA”, within the versa-
tile framework of convex optimization to solve the wideband RI imaging problem. HyperSARA
consists in solving a sequence of weighted minimization problems, promoting the joint average
sparsity of the wideband model cube in an overcomplete dictionary (spanned by a collection of
eight wavelet bases and the Dirac basis) via the 2,1norm and its low-rankness via the nuclear
norm. The resulting minimization task is solved using the primal-dual forward-backward (PDFB)
algorithm (3.4.2). The algorithmic structure is shipped with highly interesting functionalities such
as preconditioning for accelerated convergence, and parallelization over the data blocks enabling
to spread the computational cost and memory requirements across a multitude of processing CPU
cores with limited resources and thereby allowing scalability to large data volumes. HyperSARA
also involves an adaptive strategy to estimate the noise level with respect to calibration errors
present in real data. We study the reconstruction performance of our approach on simulations
and real VLA observations in comparison with the single-channel SARA approach [44,88] and the
wideband deconvolution algorithm JC-CLEAN [85].
This chapter is structured as follows. Section 4.2 explains the low-rankness and joint sparsity priors on the wideband model cube and presents the HyperSARA minimization task. Intuitive and complete descriptions of the HyperSARA algorithmic structure are provided in Section 4.3. Analysis of the proposed approach and comparison with the benchmark methods on simulations are given in Section 4.4. Imaging results of VLA observations of Cyg A and the supernova remnant G055.7+3.4 are presented in Section 4.5. Finally, conclusions and perspectives are stated in Section 4.6.
This work has been published in [24,44].
4.2 HyperSARA: optimization problem
4.2.1 Low-rankness and joint sparsity sky model
In the context of wideband RI image reconstruction, we adopt the linear mixture model originally proposed by [62]. It assumes that the wideband sky is a linear combination of a few sources, each having a distinct spectral signature. Following this model, the wideband image cube reads

$$ \mathbf{X} = \mathbf{S} \mathbf{H}^\dagger, \qquad (4.1) $$

where the matrix S = (s_1, ..., s_Q) ∈ R^{N×Q} represents the physical sources present in the sky, and their corresponding spectral signatures constitute the columns of the mixing matrix H = (h_1, ..., h_Q) ∈ R^{L×Q}. Note that, in this model, physical sources with similar spectral behaviour are considered as one “source”, defining one column of the matrix S. Recall that solving for S and H would explicitly imply a source separation problem, which is a non-linear, non-convex problem [66]. Instead, we leverage convex optimization by solving directly for X with appropriate priors. The linear mixture model implies low-rankness of X, as the rank is upper bounded by the number of “sources”. It also implies joint sparsity over the spectral channels: when all the sources are inactive at one pixel position, and regardless of their spectral indices, the corresponding row of the matrix X is automatically equal to zero.
Given the nature of the RI Fourier sampling, where the uv-coverage dilates with frequency, the combination of low-rankness and joint sparsity results in a higher resolution and a higher dynamic range of the reconstructed RI image cube. On the one hand, enforcing low-rankness implies correlation across the channels; this enhances the recovery of extended emission at the high-frequency channels and captures the high spatial-frequency content at the low-frequency channels. On the other hand, promoting joint sparsity results in the rejection of isolated pixels that are associated with uncorrelated noise, since low-energy rows of X are fully set to zero. Consequently, the overall dynamic range of the reconstructed cube is increased.
The linear mixture model is similar to the one adopted in [96]; the sources can be seen as the Taylor coefficient images, and the spectral signatures can be seen as the spectral basis functions. However, the Taylor expansion model is an approximation of a smooth function, hence only smooth spectra can be reconstructed. Moreover, the order of the Taylor series has to be set in advance. This parameter is crucial, as it represents a trade-off between accuracy and computational cost. The linear mixture model adopted here is more generic, since it does not assume a specific model of the spectra, thus allowing for the reconstruction of complex spectral structures (e.g., emission or absorption lines superimposed on a continuous spectrum). Moreover, there is no hard constraint on the number of “sources”, and the prior adjusts the number of spectra needed to satisfy the data constraint.
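A small synthetic illustration of the linear mixture model (4.1) and of the two properties it induces, low-rankness and joint sparsity; the power-law spectra and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_chan, Q = 200, 16, 3        # pixels, channels, "sources"

# Q spatially sparse "sources" (columns of S), each with a distinct spectrum
S = np.zeros((N, Q))
for q in range(Q):
    support = rng.choice(N, size=10, replace=False)
    S[support, q] = rng.uniform(0.5, 2.0, size=10)

nu = np.linspace(1.0, 2.0, n_chan)                            # relative frequencies
H = np.stack([nu ** (-a) for a in (0.2, 0.7, 1.5)], axis=1)   # L x Q power-law spectra

X = S @ H.T                      # wideband cube following (4.1)

rank_X = np.linalg.matrix_rank(X)                # bounded by the number of sources Q
zero_rows = int(np.sum(np.all(X == 0, axis=1)))  # jointly inactive pixels
```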
4.2.2 HyperSARA minimization task
To enforce low-rankness and joint average sparsity of the RI image cube, we propose the following minimization problem, leveraging log-sum priors:

$$ \begin{aligned} \underset{\mathbf{X} = (\mathbf{x}_l)_{1 \leq l \leq L} \in \mathbb{R}^{N \times L}}{\text{minimize}} \; & \bar{\mu} \sum_{j=1}^{J} \log\left( \sigma_j(\mathbf{X}) + \upsilon \right) + \mu \sum_{n=1}^{T} \log\left( \|[\boldsymbol{\Psi}^\dagger \mathbf{X}]_n\|_2 + \upsilon \right) \\ \text{subject to} \; & \|\mathbf{y}_{l,b} - \boldsymbol{\Phi}_{l,b} \mathbf{x}_l\|_2 \leq \epsilon_{l,b}, \; \forall (l,b) \in \{1,\dots,L\} \times \{1,\dots,B\}, \\ & \mathbf{X} \in \mathbb{R}^{N \times L}_+, \end{aligned} \qquad (4.2) $$

where (μ̄, μ, υ) ∈ ]0,+∞[³ are regularization parameters, J ≤ min{N, L} is the rank of X, (σ_j(X))_{1≤j≤J} are the singular values of X, and [Ψ†X]_n denotes the n-th row of Ψ†X. ‖y_{l,b} − Φ_{l,b} x_l‖_2 ≤ ε_{l,b} is the data-fidelity constraint on the b-th data block in channel l, and X ∈ R^{N×L}_+ is the non-negativity constraint.
The minimization problem (4.2) is non-convex. To solve it, we leverage a majorization-minimization approach similar to the one described for SARA in Section 3.3.2.1. More precisely, it consists in successively solving convex optimization problems with weighted ℓ1 norms [20]. At each iteration k ∈ N, the problem (4.2) is locally approximated at X^{(k)} by the convex optimization problem

$$ \begin{aligned} \underset{\mathbf{X} = (\mathbf{x}_l)_{1 \leq l \leq L} \in \mathbb{R}^{N \times L}}{\text{minimize}} \; & \bar{\mu} \, \|\mathbf{X}\|_{*, \bar{\boldsymbol{\omega}}(\mathbf{X}^{(k)})} + \mu \, \|\boldsymbol{\Psi}^\dagger \mathbf{X}\|_{2,1, \boldsymbol{\omega}(\mathbf{X}^{(k)})} \\ \text{subject to} \; & \|\mathbf{y}_{l,b} - \boldsymbol{\Phi}_{l,b} \mathbf{x}_l\|_2 \leq \epsilon_{l,b}, \; \forall (l,b) \in \{1,\dots,L\} \times \{1,\dots,B\}, \\ & \mathbf{X} \in \mathbb{R}^{N \times L}_+, \end{aligned} \qquad (4.3) $$

where ‖·‖_{*,ω̄(X^{(k)})} is the weighted nuclear norm, promoting low-rankness, defined as

$$ \|\mathbf{X}\|_{*, \bar{\boldsymbol{\omega}}(\mathbf{X}^{(k)})} = \sum_{j=1}^{J} \bar{\omega}_j(\mathbf{X}^{(k)}) \, \sigma_j(\mathbf{X}), \qquad (4.4) $$

with weights ω̄(X^{(k)}) = (ω̄_j(X^{(k)}))_{1≤j≤J} given by

$$ \bar{\omega}_j(\mathbf{X}^{(k)}) = \left( \sigma_j(\mathbf{X}^{(k)}) + \upsilon \right)^{-1}. \qquad (4.5) $$
The notation ‖·‖_{2,1,ω(X^{(k)})} denotes the weighted ℓ2,1 norm, promoting joint average sparsity, defined as

$$ \|\boldsymbol{\Psi}^\dagger \mathbf{X}\|_{2,1, \boldsymbol{\omega}(\mathbf{X}^{(k)})} = \sum_{n=1}^{T} \omega_n(\mathbf{X}^{(k)}) \, \|[\boldsymbol{\Psi}^\dagger \mathbf{X}]_n\|_2, \qquad (4.6) $$
with the associated weights ω(X^{(k)}) = (ω_n(X^{(k)}))_{1≤n≤T} given by

$$ \omega_n(\mathbf{X}^{(k)}) = \left( \|[\boldsymbol{\Psi}^\dagger \mathbf{X}^{(k)}]_n\|_2 + \upsilon \right)^{-1}. \qquad (4.7) $$
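The reweighting formulas (4.5) and (4.7) can be sketched as follows, with Ψ taken as the identity for brevity (the thesis uses the SARA dictionary) and an illustrative matrix.

```python
import numpy as np

def hypersara_weights(X, upsilon=1e-3):
    """Weights of the weighted nuclear norm (4.5) and of the weighted
    l21 norm (4.7), computed from the current estimate X. Psi = Identity
    here, so rows of X stand in for rows of Psi^dagger X."""
    sing_vals = np.linalg.svd(X, compute_uv=False)
    w_nuclear = 1.0 / (sing_vals + upsilon)               # (4.5)
    w_l21 = 1.0 / (np.linalg.norm(X, axis=1) + upsilon)   # (4.7)
    return w_nuclear, w_l21

# rank-1 cube with two jointly inactive rows (rows 1 and 3)
X = np.outer(np.array([1.0, 0.0, 2.0, 0.0]), np.array([1.0, 0.5]))
w_nuc, w_21 = hypersara_weights(X)
```

Small singular values and low-energy rows receive very large weights (here of order 1/υ), so the subsequent weighted minimization strongly penalizes them, which is how low-rankness and joint sparsity in the ℓ0 sense are progressively enforced.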
Data fidelity: Data fidelity is enforced in a distributed manner by splitting the data and the measurement operator into multiple blocks, where y_{l,b} ∈ C^{M_{l,b}} is the b-th data block in channel l and Φ_{l,b} is the associated measurement operator; Φ_{l,b} = Θ_{l,b} G_{l,b} M_{l,b} F Z. Since G_{l,b} ∈ C^{M_{l,b} × o·N_{l,b}} consists of compact-support kernels, the matrix M_{l,b} ∈ R^{o·N_{l,b} × o·N} selects only the parts of the discrete Fourier plane involved in the computations for the data block y_{l,b}, masking everything else. ε_{l,b} is an upper bound on the ℓ2 norm of the noise vector n_{l,b} ∈ C^{M_{l,b}}. The inter-channel blocking is motivated by the fact that RI data probed at various wavelengths might have different noise levels and modelling errors. Moreover, data splitting can be inevitable in the case of extreme sampling rates, beyond the available memory. On the other hand, intra-channel blocking is motivated for real data, since they usually present calibration errors in addition to the thermal noise. Regarding the estimation of the noise levels, the thermal noise associated with the visibilities is usually given. If all visibilities within one data block have the same noise variance, the noise norm follows a χ² distribution, and the bound ε_{l,b} can be computed from the χ² statistics [87]. This estimation is only valid for perfectly calibrated RI data and is used with synthetic data (see Section 4.4.1). However, in high-sensitivity acquisition regimes, RI data may present significant errors, originating from DDE modelling errors, which tend to dominate the thermal noise. In this setting, we propose in Section 4.3.3 an adaptive strategy to estimate the noise levels during the image reconstruction.
Low-rankness: The nuclear norm, defined for a matrix X as the sum of its singular values, is a relevant prior to impose low-rankness [62]. However, the ultimate goal is to minimize the rank of the estimated cube, i.e., to penalize the vector of the singular values in the ℓ0 sense. Therefore, we adopt in our minimization problem (4.2) the log-sum prior of the singular values of X as a better approximation of low-rankness. The log-sum prior is minimized through a reweighted ℓ1 procedure, and the weights ω̄(X^{(k)}) = (ω̄_j(X^{(k)}))_{1≤j≤J} are updated iteratively so that, ultimately, large weights are applied to the low-magnitude singular values and small weights are attributed to the large-magnitude singular values. By doing so, the former singular values are strongly penalized, leaving only a minimum number of non-zero singular values, ensuring low-rankness in the ℓ0 sense.
Joint average sparsity: The ℓ2,1 norm, defined as the ℓ1 norm of the vector whose components are the ℓ2 norms of the rows of X, has been shown to be a good prior to impose joint sparsity on the estimated cube [62]. Penalizing the ℓ2,1 norm promotes joint sparsity, since low-energy rows of X are fully set to zero. Ideally, one aims to minimize the number of non-zero coefficients jointly in all the channels of the estimated cube, by penalizing the vector of the ℓ2 norms of the rows in the ℓ0 sense. Thus, we adopt in the proposed minimization problem (4.2) the log-sum prior of the ℓ2 norms of the rows of X as a better penalty function for joint sparsity. The log-sum prior is minimized through a reweighted ℓ1 procedure, and the weights ω(X^{(k)}) = (ω_n(X^{(k)}))_{1≤n≤T} are updated iteratively, ensuring that after several reweights, rows with significant energy in the ℓ2 sense are associated with small weights, and rows with low ℓ2 norm - typically corresponding to channel-decorrelated noise - are associated with large weights, and hence are largely penalized, leaving only a minimum number of non-zero rows. By doing so, we promote joint sparsity in the ℓ0 sense. The considered average sparsity dictionary Ψ ∈ R^{N×T} is the celebrated SARA dictionary: a concatenation of the Dirac basis and the first eight Daubechies wavelet dictionaries, Ψ = (Ψ_1, ..., Ψ_D) [2,4,26,27,44,87,88,94].
The regularization parameters: From a statistical point of view, the ℓ1 norm of a random variable x ∈ R^N, μ‖x‖_1, can be seen as the negative log of a Laplace prior with a scale parameter 1/μ. This scale parameter (equivalently, the regularization parameter μ) can be estimated via the maximum likelihood estimator of the Laplace distribution, which gives ‖x‖_1/N for the scale. We recall that the nuclear norm is the ℓ1 norm of the vector of the singular values of X, and the ℓ2,1 norm is the ℓ1 norm of the vector of the ℓ2 norms of the rows of X. From this perspective, one can estimate the regularization parameters associated with the nuclear norm and the ℓ2,1 norm in the same fashion, as μ̄ = N/‖X‖_* and μ = N/‖Ψ†X‖_{2,1}, respectively. A convenient choice is to set the parameter μ̄ = 1. Consequently, μ can be set as the ratio between the estimated regularization parameters, that is μ = ‖X‖_*/‖Ψ†X‖_{2,1}. This ratio has been shown to give the best results on extensive sets of different simulations. Moreover, we found that μ = ‖X_dirty‖_*/‖Ψ†X_dirty‖_{2,1}, estimated directly from the dirty RI image cube, is a good approximation.
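The estimate μ = ‖X_dirty‖_* / ‖Ψ†X_dirty‖_{2,1} can be sketched as follows, again with Ψ taken as the identity for brevity and a random stand-in for the dirty cube; the values are purely illustrative.

```python
import numpy as np

def estimate_mu(X_dirty):
    """Ratio mu = ||X||_* / ||Psi^dagger X||_{2,1} computed from the dirty
    cube, with Psi = Identity here for illustration (the thesis uses the
    SARA wavelet dictionary)."""
    nuclear = np.sum(np.linalg.svd(X_dirty, compute_uv=False))  # nuclear norm
    l21 = np.sum(np.linalg.norm(X_dirty, axis=1))               # l2,1 norm
    return nuclear / l21

rng = np.random.default_rng(4)
X_dirty = rng.standard_normal((100, 8))   # stand-in for an N x L dirty cube
mu = estimate_mu(X_dirty)
```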
SARA vs. HyperSARA: The proposed HyperSARA approach is the wideband version of the SARA approach described in Section 3.3.2.1. On the one hand, SARA solves a sequence of weighted ℓ1 minimization problems promoting average sparsity-by-analysis of the sky estimate in Ψ. On the other hand, HyperSARA solves a sequence of weighted nuclear and ℓ2,1 minimization tasks of the form (4.3), promoting low-rankness and joint average sparsity-by-analysis of the wideband sky estimate in Ψ.
4.3 HyperSARA: algorithmic structure
To solve the HyperSARA minimization problem (4.3), we leverage the PDFB algorithm explained in Section 3.4.2. The data-fidelity constraints can be imposed by means of the indicator function ι_C of a convex set C (3.21). Doing so, the minimization problem (4.3) can be equivalently redefined as

$$ \underset{\mathbf{X} = (\mathbf{x}_l)_{1 \leq l \leq L} \in \mathbb{R}^{N \times L}_+}{\text{minimize}} \; \bar{\mu} \, \|\mathbf{X}\|_{*, \bar{\boldsymbol{\omega}}(\mathbf{X}^{(k)})} + \mu \, \|\boldsymbol{\Psi}^\dagger \mathbf{X}\|_{2,1, \boldsymbol{\omega}(\mathbf{X}^{(k)})} + \sum_{l=1}^{L} \sum_{b=1}^{B} \iota_{\mathcal{B}(\mathbf{y}_{l,b}, \epsilon_{l,b})}(\boldsymbol{\Phi}_{l,b} \mathbf{x}_l), \qquad (4.8) $$

where

$$ \mathcal{B}(\mathbf{y}_{l,b}, \epsilon_{l,b}) = \left\{ \mathbf{z} \in \mathbb{C}^{M_{l,b}} : \|\mathbf{y}_{l,b} - \mathbf{z}\|_2 \leq \epsilon_{l,b} \right\} \qquad (4.9) $$

denotes the ℓ2 ball centred at y_{l,b} of radius ε_{l,b} > 0, where ε_{l,b} reflects the noise statistics. The notation ι_{B(y_{l,b},ε_{l,b})} denotes the indicator function of the ℓ2 ball B(y_{l,b}, ε_{l,b}).
4.3.1 HyperSARA in a nutshell
The HyperSARA approach consists in solving a sequence of weighted minimization problems of the form (4.8), to achieve low-rankness and joint average sparsity of the estimated wideband model cube in the ℓ0 sense. Each of these minimization problems is solved using the adaptive PDFB algorithm, a further development of PDFB that enables imaging real data in the presence of calibration errors.
In Figure 4.1, we display the schematic diagram of the adaptive PDFB and summarize its computation flow in what follows. At each iteration t, the master CPU core distributes the current estimate of the wideband image cube and its Fourier coefficients to the processing CPU cores. The former is distributed to the CPU cores associated with the priors (low-rankness and joint average sparsity), whereas the latter are distributed to the CPU cores associated with the data-fidelity constraints. The updates from all the CPU cores are then gathered in the master CPU core to update the estimate of the image cube. In essence, all the updates consist in a forward step (a gradient step) followed by a backward step (a proximal step), which can be interpreted as CLEAN-like iterations. Thus, the overall algorithmic structure intuitively takes the form of an interlaced and parallel multi-space version of CLEAN [87].
At convergence of the adaptive PDFB, the weights involved in the priors are updated from the estimated image cube following (4.5) and (4.7). The minimization problem of the form (4.8) is thus redefined and solved using the adaptive PDFB. The overall HyperSARA method is summarized in Algorithm 2. It consists of two loops: an outer loop to update the weights and an inner loop to solve the respective weighted minimization task using the adaptive PDFB.
4.3.2 Underlying primal-dual forward-backward algorithm
The details of the adaptive PDFB algorithm are presented in Algorithm 3. Note that the steps coloured
in red represent the adaptive strategy to adjust the ℓ2 bounds on the data-fidelity terms, adopted
for real data and explained in Section 4.3.3. The algorithmic structure consists of iterative updates
of the dual and primal variables via forward-backward steps. The dual variables P, (W_d)_{1≤d≤D}
and (v_{l,b})_{1≤l≤L, 1≤b≤B}, associated with the low-rankness prior, the joint average sparsity prior and the
Chapter 4: Wideband super-resolution imaging in RI 38
data-delity terms, respectively are updated in parallel in Steps 9,12 and 16 to be used later on
in the update of the primal variable, that is the estimate of the RI image cube, in Steps 28 and 29.
The exact expressions of all the proximity operators are provided in Appendix .2. The proximity
operators of the functions v_{l,b}, enforcing fidelity to data, read as projections onto the ℓ2 balls with
respect to the preconditioning matrices U_{l,b}. These are built from the density of the Fourier
sampling, as proposed in [88]. More precisely, each matrix U_{l,b}, associated with a data block
y_{l,b} ∈ C^{M_{l,b}}, is set to be diagonal. Its diagonal elements are strictly positive and are set to be
inversely proportional to the density of the sampling in the vicinity of the probed Fourier modes.
Note that when the Fourier sampling is uniform, the operators U_{l,b} reduce to the identity matrix.
However, this is not the case in radio interferometry: low Fourier modes tend to be highly
sampled, as opposed to high Fourier modes. Given this discrepancy of the Fourier sampling, the
operators U_{l,b} depart from the identity matrix. Incorporating such information on the RI data
has proved efficient in accelerating the convergence of the algorithm [88]. It is worth noting that
the projections onto the ℓ2 balls with respect to the preconditioning matrices U_{l,b} do not have a
closed form. Instead, they can be numerically estimated with an iterative algorithm. In this work,
we resort to FISTA [9].
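A minimal sketch of such a preconditioned projection for a diagonal U, posed as a constrained quadratic program and solved by FISTA-style accelerated projected gradient. This formulation is one standard way to write the proximity operator in the metric induced by U and is an assumption for illustration, not the exact implementation used here:

```python
import numpy as np

def project_l2_ball(w, radius):
    # Euclidean projection onto the l2 ball of given radius centred at 0.
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def prox_ball_metric_U(z, y, eps, u, n_iter=200):
    # Projection of z onto the ball B(y, eps) in the metric of the diagonal
    # preconditioner U = diag(u): solves
    #   min_v 0.5 * (v - z)^T U (v - z)  s.t.  ||v - y||_2 <= eps
    # by accelerated projected gradient, as no closed form exists when U != I.
    a = z - y
    w = project_l2_ball(a, eps)
    w_prev, t = w.copy(), 1.0
    step = 1.0 / np.max(u)            # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        s = w + ((t - 1.0) / t_next) * (w - w_prev)   # momentum extrapolation
        w_prev, t = w, t_next
        w = project_l2_ball(s - step * u * (s - a), eps)   # gradient + projection
    return y + w

z = np.array([3.0, 4.0])
v = prox_ball_metric_U(z, np.zeros(2), 1.0, np.ones(2))  # U = I: plain projection
```

When U is the identity, the iteration collapses to the usual closed-form radial projection, consistent with the remark above that uniform sampling reduces U_{l,b} to the identity.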
Following (3.29), the convergence of Algorithm 3 is defined by the following condition:

∥Δ₁^{1/2} L Δ₂^{1/2}∥²_S < 1,   (4.10)

with β = 0, since the minimization problem (4.8) has no differentiable functions. The operator
L is a concatenation of all the used operators; in our case, L is a concatenation of the identity
operator I_N for the nuclear norm, Ψ† for the ℓ2,1 norm and Φ for the ℓ2 balls associated with the
data-fidelity terms. That being said, Algorithm 3 is guaranteed to converge to the global minimum
of the minimization problem (4.8) for a proper setting of the configuration parameters. By choosing
diagonal preconditioning matrices Δ₁ and Δ₂ with the configuration parameters (κ_i)_{1≤i≤3} and τ
on the adequate diagonal locations, we can write the convergence condition for Algorithm 3 as

∥ diag(κ₁ I_N, κ₂ I_T, κ₃ U)^{1/2} [I_N; Ψ†; Φ] [τ I]^{1/2} ∥²_S = τ (κ₁ + κ₂ ∥Ψ†∥²_S + κ₃ ∥U^{1/2}Φ∥²_S) < 1,   (4.11)
where, for every X ∈ R^{N×L}, U^{1/2}Φ(X) = (U_{l,b}^{1/2} Φ_{l,b} x_l)_{1≤l≤L, 1≤b≤B}. A convenient choice of (κ_i)_{1≤i≤3} is

κ₁ = 1,   κ₂ = 1/∥Ψ†∥²_S   and   κ₃ = 1/∥U^{1/2}Φ∥²_S.   (4.12)
In this setting, convergence is guaranteed for all 0 < τ < 1/3.
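In practice, the spectral norms entering (4.11) and (4.12) are estimated numerically. A sketch with the power method, using small dense random matrices as stand-ins for Ψ† and Φ and taking U as the identity (both assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm_sq(op, op_adj, dim, n_iter=100):
    # Power method estimate of ||L||_S^2, the largest eigenvalue of L^T L.
    x = rng.standard_normal(dim)
    for _ in range(n_iter):
        x = op_adj(op(x))
        x /= np.linalg.norm(x)
    y = op(x)
    return np.vdot(y, y).real / np.vdot(x, x).real

# Toy dense stand-ins for the analysis operator Psi^dagger and the
# measurement operator Phi (hypothetical sizes).
Psi = rng.standard_normal((64, 32))
Phi = rng.standard_normal((20, 32))

psi_sq = spectral_norm_sq(lambda x: Psi @ x, lambda x: Psi.T @ x, 32)
phi_sq = spectral_norm_sq(lambda x: Phi @ x, lambda x: Phi.T @ x, 32)

# Convenient choice (4.12): each operator contribution is rescaled to 1,
# so the condition (4.11) reduces to 3*tau < 1.
kappa1, kappa2, kappa3 = 1.0, 1.0 / psi_sq, 1.0 / phi_sq
tau = 0.33                                     # any 0 < tau < 1/3
bound = tau * (kappa1 + kappa2 * psi_sq + kappa3 * phi_sq)   # equals 3*tau
```

With the choice (4.12), the left-hand side of (4.11) becomes 3τ, which makes the admissible range 0 < τ < 1/3 explicit.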
It is worth noting that PDFB allows for randomized updates of the dual variables [93], meaning
that they can be updated less often than the primal variable. Such functionality lowers the computational
cost per iteration at the expense of an increased number of iterations to achieve convergence
(see Appendix .4 for further details on the randomized PDFB algorithm). Note that randomization
of the updates in Algorithm 3 is not considered here, since it does not affect the reconstruction quality,
but only the speed of convergence.
4.3.3 Adaptive ℓ2 bounds adjustment
In high sensitivity acquisition regimes, calibrated RI data may present significant errors, originating
from DDE modelling errors, which tend to dominate the thermal noise and consequently limit
the dynamic range of the recovered images. In this setting, the ℓ2 bounds defining the data-fidelity
terms in the minimization task (4.8) are unknown, hence need to be estimated. [44] have
proposed an adaptive strategy to adjust the ℓ2 bounds during image reconstruction by taking
into account the variability of the DDE errors through time, which we adopt herein. The main
idea consists in assuming the noise statistics to be piece-wise constant through time. Thus, a
data-splitting strategy based on the acquisition time is adopted, and the associated ℓ2 bounds are
adjusted independently in the PDFB algorithm. The adaptive procedure, described in Steps 19-25
of Algorithm 3, is coloured in red. It can be summarized as follows. Starting from an under-estimated
value ϵ_{l,b}^{(0)}, obtained by performing imaging with the non-negative least-squares (NNLS)
approach, each ℓ2 bound ϵ_{l,b}^{(t+1)} is updated as a weighted mean of the current estimate ϵ_{l,b}^{(t)} and the
ℓ2 norm of the associated data-block residual ∥y_{l,b} − Φ_{l,b} x̃_l^{(t)}∥₂. This update is performed only
when the relative distance between the former and the latter saturates above a certain bound λ₂
set by the user. Note that, conceptually, each update of the ℓ2 bounds redefines the minimization
problem set in (4.8). Thus, to ensure convergence of the minimization problem before updating
the ℓ2 bounds, two conditions should be met: the saturation of the image cube estimate, reflected by
β^{(t+1)} = ∥X^{(t+1)} − X^{(t)}∥_F / ∥X^{(t+1)}∥_F being below a low value λ₁ set by the user,
and a minimum number of iterations performed between consecutive updates. Note that the
image obtained with the NNLS approach tends to over-fit the noisy data, since only non-negativity
is imposed in the minimization problem. Therefore, the bounds ϵ_{l,b}^{(0)} are usually under-estimated.
As a rule of thumb, one can initialize the bounds ϵ_{l,b} a few orders of magnitude lower than the ℓ2
norms of the associated data blocks, depending on the expected noise and calibration error levels.
An overview of the variables and parameters associated with the adaptive strategy is provided in
Appendix .3 (see [44] for more details).
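The update rule for one data block can be sketched as follows; the default values of λ₁, λ₂, λ₃ and the minimum iteration gap below are illustrative placeholders, not the settings used in this work:

```python
def update_l2_bound(eps, residual_norm, t, t_last, beta,
                    lambda1=1e-4, lambda2=0.5, lambda3=0.5, min_gap=100):
    # Sketch of Steps 19-25 of Algorithm 3: the l2 bound of one data block is
    # updated only when (i) the image estimate has saturated (beta < lambda1),
    # (ii) enough iterations have passed since the last update of this block,
    # and (iii) the relative distance between the bound and the residual norm
    # exceeds lambda2.
    relative_gap = abs(residual_norm - eps) / eps
    if beta < lambda1 and (t - t_last) > min_gap and relative_gap > lambda2:
        # Weighted mean of the current bound and the residual norm.
        return lambda3 * residual_norm + (1.0 - lambda3) * eps, t
    return eps, t_last
```

Starting from an NNLS-based underestimate, repeated calls inflate the bound towards the observed residual norm only when the two convergence conditions above are met.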
Algorithm 2: HyperSARA approach
Given X^{(0)}, P^{(0)}, W^{(0)}, v^{(0)}, ϵ^{(0)}, ϑ^{(0)}
θ^{(0)} = 1_J; θ̄^{(0)} = 1_T
For k = 1, . . .    // outer loop
  (X^{(k+1)}, P^{(k+1)}, W^{(k+1)}, v^{(k+1)}, ϵ^{(k+1)}, ϑ^{(k+1)}) =
      AdaptivePDFB(X^{(k)}, P^{(k)}, W^{(k)}, v^{(k)}, ϵ^{(k)}, ϑ^{(k)}, θ^{(k)}, θ̄^{(k)})    // inner loop
  θ^{(k+1)} = υ ω(X^{(k+1)}) using (4.5)
  θ̄^{(k+1)} = υ ω̄(X^{(k+1)}) using (4.7)
Until convergence
Output: X^{(k)}, P^{(k)}, W^{(k)}, v^{(k)}, ϵ^{(k)}, ϑ^{(k)}
4.3.4 Weighting schemes
The reweighting procedure represents the outer loop of Algorithm 2. At each reweight indexed
by kN, the HyperSARA minimization task with log-sum priors (4.2) is locally approximated
by the weighted nuclear and 2,1minimization problem (4.3), with weights dened in (4.5) and
(4.7). The resulting minimization problem (4.3) (equivalently (4.8)) is solved using the adaptive
PDFB described in Algorithm 3. The weights θ(k+1) and θ(k+1) are updated using (4.5) and (4.7)
applied to the solution X(k+1). Note that the weights dened in (4.5) and (4.7) are multiplied by
the regularization parameter υin Algorithm 2. This does not aect the set of minimizers of the
global problem (4.2). The parameter υis initialized to 1 and decreased at each reweighting step
by a xed factor, which is typically chosen between 0.5 and 0.9. This has been shown to improve
the convergence rate and the scalability of the algorithm [26,87]. Starting from weights equal to
1,i.e.,θ(0) =1Jand θ(0) =1T, with 1Jstands for the vector of size Jwith all coecients
equal to 1, the approach ensures that after several 2,1norm reweights coecients with signicant
spectrum energy in 2sense are down-weighted, whilst other coecients - typically corresponding
to noise - remain highly penalized as their corresponding weights are close to 1. This ensures
higher dynamic range of the reconstructed RI image cube. Similarly, after several nuclear norm
reweights negligible singular values are more penalized as they are accompanied with large weights.
This guarantees more low-rankness and higher correlation across the channels, thus increasing the
overall resolution of the estimated image cube.
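A sketch of such weight updates, assuming the standard log-sum form θ = υ/(υ + s) for (4.5) and (4.7) (an assumption for illustration; the exact expressions are those given earlier in the chapter), applied to singular values for the nuclear norm and to row norms of the analysis coefficients for the ℓ2,1 norm:

```python
import numpy as np

def nuclear_weights(X, upsilon):
    # Weights for the weighted nuclear norm: negligible singular values get
    # weights close to 1 (heavily penalized), while significant ones are
    # down-weighted (standard log-sum reweighting form).
    sigma = np.linalg.svd(X, compute_uv=False)
    return upsilon / (upsilon + sigma)

def l21_weights(coeffs, upsilon):
    # Weights for the weighted l2,1 norm: one weight per coefficient row,
    # driven by the row's spectrum energy in the l2 sense.
    row_norms = np.linalg.norm(coeffs, axis=1)
    return upsilon / (upsilon + row_norms)

rng = np.random.default_rng(1)
# Rank-1 toy cube in matrix form: a single significant singular value.
X = np.outer(rng.standard_normal(16), rng.standard_normal(4))
theta = nuclear_weights(X, upsilon=1e-3)
```

On this rank-1 toy cube, only the first singular value is down-weighted; the remaining (numerically zero) singular values keep weights close to 1 and stay heavily penalized, which is exactly the mechanism promoting low-rankness described above.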
[Figure 4.1 layout: the master core broadcasts X̃^{(t)} and its Fourier coefficients; the low-rankness core, the joint average sparsity cores (indexed by d) and the data-fidelity cores (indexed by (l, b)) each perform a forward-backward (FB) step in parallel; the master core then aggregates their outputs and projects onto R₊^{N×L} to form X^{(t+1)}.]
Figure 4.1: Schematic diagram at iteration t in the adaptive PDFB, detailed in Algorithm 3. It
showcases the parallelism capabilities and overall computation flow. Intuitively, each forward-backward
step in data, prior and image space can be viewed as a CLEAN-like iteration. The overall algorithmic
structure then intuitively takes the form of an interlaced and parallel multi-space version of CLEAN.
Algorithm 3: The adaptive PDFB algorithm underpinning HyperSARA
Data: (y_{l,b})_{l,b}, l ∈ {1, …, L}, b ∈ {1, …, B}
Input: X^{(0)}, P^{(0)}, (W_d^{(0)})_{1≤d≤D}, (v_{l,b}^{(0)})_{l,b}, θ^{(0)}, θ̄^{(0)}, (ϵ_{l,b}^{(0)})_{l,b}, (ϑ_{l,b}^{(0)})_{l,b}
Parameters: (U_{l,b})_{l,b}, µ, µ̄, τ, (κ_i)_{1≤i≤3}, λ₁, λ₂, λ₃, ϑ̄
1:  t ← 0; X̃^{(0)} = X^{(0)}
2:  while stopping criterion not satisfied do
3:    for l = 1 to L do
4:      x̂_l^{(t)} = F Z x̃_l^{(t)}    // Fourier transforms
5:      for b = 1 to B do
6:        x̂_{l,b}^{(t)} = M_{l,b} x̂_l^{(t)}    // send to data cores
7:    Update dual variables simultaneously:
8:    Promote low-rankness
9:    P^{(t+1)} = (I_J − prox_{κ₁⁻¹ µ ∥·∥_{*,θ}}) (P^{(t)} + X̃^{(t)})
10:   Promote joint average sparsity
11:   for d = 1 to D do
12:     W_d^{(t+1)} = (I_N − prox_{κ₂⁻¹ µ̄ ∥·∥_{2,1,θ̄}}) (W_d^{(t)} + Ψ_d† X̃^{(t)})
13:     W̃_d^{(t+1)} = Ψ_d W_d^{(t+1)}
14:   Enforce data fidelity
15:   for (l, b) = (1, 1) to (L, B) do
16:     v_{l,b}^{(t+1)} = U_{l,b} (I_{M_{l,b}} − prox^{U_{l,b}}_{B(y_{l,b}, ϵ_{l,b}^{(t)})}) (U_{l,b}⁻¹ v_{l,b}^{(t)} + Θ_{l,b} G_{l,b} x̂_{l,b}^{(t)})
17:     ṽ_{l,b}^{(t+1)} = G_{l,b}† Θ_{l,b}† v_{l,b}^{(t+1)}
18:     Adjust the ℓ2 bounds
19:     ρ_{l,b}^{(t)} = ∥y_{l,b} − Φ_{l,b} x̃_l^{(t)}∥₂
20:     if β^{(t)} < λ₁ and t − ϑ_{l,b}^{(t)} > ϑ̄ and |ρ_{l,b}^{(t)} − ϵ_{l,b}^{(t)}| / ϵ_{l,b}^{(t)} > λ₂ then
21:       ϵ_{l,b}^{(t+1)} = λ₃ ρ_{l,b}^{(t)} + (1 − λ₃) ϵ_{l,b}^{(t)}
22:       ϑ_{l,b}^{(t+1)} = t
23:     else
24:       ϵ_{l,b}^{(t+1)} = ϵ_{l,b}^{(t)}
25:       ϑ_{l,b}^{(t+1)} = ϑ_{l,b}^{(t)}
26:   Update primal variable
27:   for l = 1 to L do
28:     h_l^{(t+1)} = κ₁ p_l^{(t+1)} + κ₂ Σ_{d=1}^{D} w̃_{d,l}^{(t+1)} + κ₃ Z† F† Σ_{b=1}^{B} M_{l,b}† ṽ_{l,b}^{(t+1)}
29:   X^{(t+1)} = P_{R₊^{N×L}} (X^{(t)} − τ H^{(t+1)})
30:   X̃^{(t+1)} = 2 X^{(t+1)} − X^{(t)}
31:   β^{(t+1)} = ∥X^{(t+1)} − X^{(t)}∥_F / ∥X^{(t+1)}∥_F
32:   t ← t + 1
Result: X^{(t)}, P^{(t)}, W^{(t)}, v^{(t)}, ϵ^{(t)}, ϑ^{(t)}
4.4 Simulations
In this section, we rst investigate the performance of the low-rankness and joint average sparsity
priors on realistic simulations of wideband RI data. We then assess the eciency of our approach
HyperSARA in comparison with the wideband JC-CLEAN algorithm [85] and the single-channel
imaging approach SARA [26,88]. Note that, in this setting, the 2bounds on the data-delity
terms are derived directly from the known noise statistics, thus xed.
4.4.1 Simulations settings
To simulate wideband RI data, we utilize an image of the W28 supernova remnant¹, denoted by
x₁, of size N = 256 × 256, with a peak value normalized to 1. The image x₁ is decomposed
into Q = 10 sources, i.e., x₁ = Σ_{q=1}^{Q} s_q, with (s_q ∈ R^N)_{1≤q≤Q}. These consist of 9 different sources,
whose brightness is in the interval [0.005, 1], and the background. Note that the different sources
may have overlapping pixels. The wideband image cube, denoted by X, is built following the linear
mixture model described in (4.1). The sources (s_q)_{1≤q≤Q} constitute the columns of S. The sources'
spectra, defining the columns of the mixing matrix H, consist of emission lines superimposed on
continuous spectra. These follow the curvature model: h_q = ((ν_l/ν₁)^{−α_q + β_q log(ν_l/ν₁)})_{1≤l≤L}, for 1 ≤ q ≤ Q,
where α_q and β_q are the respective spectral index and curvature parameters associated with
the source s_q. Emission lines at different positions and with different amplitudes are then added to
the continuous spectra. Wideband image cubes are generated within the frequency range [ν₁, ν_L] =
[1.4, 2.78] GHz, with uniformly sampled channels. Tests are carried out on two image cubes with
a total number of channels L ∈ {15, 60}. Note that the rank of the considered image cubes in
matrix form is upper bounded by min{Q, L}. Figure 4.2 shows channel ν₁ of the simulated
wideband image cube. To study the efficiency of the proposed approach in the compressive sensing
framework, we simulate wideband data cubes using a non-uniform random Fourier sampling with
a Gaussian density profile at the reference frequency ν₁ = 1.4 GHz. To mimic RI uv-coverages, we
introduce holes in the sampling function through an inverse Gaussian profile, so that the missing
Fourier content is mainly concentrated at the high spatial frequencies [87]. We extend our study
to realistic simulations using a VLA uv-coverage. For each channel indexed by l ∈ {1, …, L},
its corresponding uv-coverage is obtained by scaling the reference uv-coverage with ν_l/ν₁; this
scaling is intrinsic to wideband RI data acquisition. Figure 4.2 shows the realistic VLA uv-coverages of all the
channels projected onto one plane. The visibilities are corrupted with additive zero-mean complex
white Gaussian noise of variance ϱ², resulting in input signal-to-noise ratios iSNR ∈ {20, 40, 60}
dB, defined as

iSNR = 10 log₁₀ ( (1/L) Σ_{l=1}^{L} ∥Φ_l x_l∥²₂ / (M_l ϱ²) ).   (4.13)

¹Image courtesy of NRAO/AUI and [17]
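The continuous part of the simulated spectra can be generated as below; the sign convention for α_q (a decreasing power law for a positive spectral index) and the chosen α, β values and line positions are illustrative assumptions:

```python
import numpy as np

def curvature_spectrum(nu, nu_ref, alpha, beta):
    # Continuous spectrum of one source, following the curvature model:
    # (nu/nu_ref)^(-alpha + beta * log(nu/nu_ref)).
    ratio = nu / nu_ref
    return ratio ** (-alpha + beta * np.log(ratio))

# Toy setup mirroring the simulation settings: L channels in [1.4, 2.78] GHz.
L = 15
nu = np.linspace(1.4e9, 2.78e9, L)
h = curvature_spectrum(nu, nu[0], alpha=0.8, beta=0.1)

# Emission lines at chosen channels are then superimposed on the continuum
# (positions and amplitudes arbitrary here).
h_with_lines = h.copy()
h_with_lines[[4, 9]] += np.array([0.5, 0.3])
```

Stacking Q such spectra as columns of H, next to the source images in S, yields the linear mixture model X = S Hᵀ of (4.1) in matrix form.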
(a) Realistic VLA uv-coverages (b) Ground-truth image x₁
Figure 4.2: Simulations using realistic VLA uv-coverage: (a) The realistic VLA uv-coverages of all the
channels projected onto one plane. (b) Channel ν₁ of the simulated wideband model cube, a 256 × 256
region of the W28 supernova remnant, shown in log₁₀ scale.
Given the same noise variance ϱ²_χ on all the visibilities, the ℓ2 bounds ϵ_{l,b} on the data-fidelity terms
are derived from the noise variance, where the noise norm follows a χ² distribution [87]. Thus, the
global bound is given by

ϵ = √(M + φ√M) ϱ_χ,   (4.14)

where φ is the number of standard deviations above the mean of the χ² distribution (we set φ = 2).
The block constraints must satisfy Σ_{l,b} (ϵ_{l,b})² = ϵ², and the ℓ2 bounds associated with the different
data blocks are ϵ_{l,b} = √(M_{l,b}/M) ϵ. We re-emphasize that the adaptive ℓ2 bounds strategy is
designed for imaging real data, due to the unknown calibration errors in addition to the thermal
noise. Therefore, no adjustment of the ℓ2 bounds is required on simulations. We define the sampling
rate (SR) as the ratio between the number of measurements per channel M_l and the size of the
image N:

SR = M_l / N.   (4.15)
Several tests are performed using the two image cubes with L ∈ {15, 60}, varying SR from 0.01
to 1 and iSNR ∈ {20, 40, 60} dB.
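The bound computation of (4.14) and the per-block split can be sketched as follows, with ϱ denoting the noise standard deviation:

```python
import numpy as np

def global_l2_bound(M, rho, phi=2.0):
    # Global bound (4.14): eps = sqrt(M + phi*sqrt(M)) * rho, i.e. phi standard
    # deviations above the mean of the chi^2 statistics of the noise norm.
    return np.sqrt(M + phi * np.sqrt(M)) * rho

def block_l2_bounds(block_sizes, eps):
    # Per-block bounds satisfying sum_{l,b} eps_{l,b}^2 = eps^2,
    # with eps_{l,b} = sqrt(M_{l,b} / M) * eps.
    block_sizes = np.asarray(block_sizes, dtype=float)
    M = block_sizes.sum()
    return np.sqrt(block_sizes / M) * eps

M = 10000
rho = 1e-3
eps = global_l2_bound(M, rho)
eps_blocks = block_l2_bounds([2500, 2500, 2500, 2500], eps)
```

The block sizes above are arbitrary; any time-based split works, as long as the squared block bounds sum to the squared global bound.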
4.4.2 Benchmark algorithms
In the rst instance, we showcase the advantage of reweighting through comparison of HyperSARA
with the following benchmark algorithms: (i) Low-Rankness and Joint Average Sparsity (LRJAS)
formulated in (4.8) for ω=1Jand ω=1T(ii) Low-Rankness (LR) formulated as follows:
minimize
X=(xl)1lLRN×L
+
µ1X,ω(X(k))+
L
X
l=1
B
X
b=1
ιB(yl,bl,b )(Φl,bxl),(4.16)
(iii) Joint Average Sparsity (JAS), formulated below:

minimize_{X=(x_l)_{1≤l≤L} ∈ R₊^{N×L}}   µ₂ ∥Ψ†X∥_{2,1,ω̄(X^{(k)})} + Σ_{l=1}^{L} Σ_{b=1}^{B} ι_{B(y_{l,b}, ϵ_{l,b})}(Φ_{l,b} x_l).   (4.17)

LR, JAS and LRJAS are solved using the PDFB algorithm explained in Section 4.3.2, with ω = 1_J
and ω̄ = 1_T. For LR, µ₁ is set to 1 and for JAS µ₂ = 10⁻². In HyperSARA and LRJAS, the
regularization parameters are set to µ = 1 and µ̄ = 10⁻², leveraging the dirty wideband image
cube X_dirty as explained in Section 4.2.2.
In the second instance, we evaluate the performance of our approach HyperSARA in comparison
with the CLEAN-based approach JC-CLEAN [85], where we adopt the Briggs weighting for optimal
results (the robustness parameter is set to -0.5). Recall that JC-CLEAN involves polynomial
fitting to enhance the reconstruction of smooth spectra. However, this is not optimal for the
simulated spectra where emission lines are incorporated. Therefore, we do not consider polynomial
fitting in imaging the simulated wideband data with JC-CLEAN. We also compare with the single-channel
image reconstruction approach SARA explained in Section 3.3.2.1. We rewrite the SARA
minimization problem (3.16) by imposing the data-fidelity constraints via the indicator function
as follows:

minimize_{x_l ∈ R₊^{N}}   µ̃ ∥Ψ†x_l∥_{1,ω(X^{(k)})} + Σ_{b=1}^{B} ι_{B(y_{l,b}, ϵ_{l,b})}(Φ_{l,b} x_l).   (4.18)

The SARA approach is solved using the PDFB algorithm [88] (with µ̃ = 10⁻²). The different
methods are studied using our MATLAB implementation, with the exception of JC-CLEAN. To
give the reader an idea of the speed of HyperSARA in comparison with SARA: for an
image cube of size N = 256 × 256 pixels and L = 60 channels, with M_l = 0.5N visibilities
per frequency channel, and using 1 node (36 CPU cores) of Cirrus²,
SARA needs approximately 30 minutes to converge, while HyperSARA requires around 2 hours.
Note that [1] have shown the superior performance of the low-rankness and joint average
sparsity model in comparison with the state-of-the-art spatio-spectral sparsity algorithm proposed
in [54] on realistic simulations of wideband RI data.
4.4.3 Imaging quality assessment
In the qualitative comparison of the different methods, we consider the visual inspection of the
following cubes: the estimated model³ cube X̂; the absolute value of the error cube E, defined as
the absolute difference between the ground-truth model cube X and the estimated model cube
X̂, i.e., E = |X − X̂|; and the naturally-weighted residual image cube R, whose columns are
²Cirrus is one of the EPSRC Tier-2 UK National HPC Facilities (http://www.cirrus.ac.uk).
³The estimated image cube obtained by optimization methods is usually called the model cube, as opposed to CLEAN-based
methods, where the final product is the so-called restored cube, resulting from convolving the model cube with the
respective CLEAN beams, then adding the residual image cube.
given by r_l = η_l Φ̄_l†(ȳ_l − Φ̄_l x̂_l), where ȳ_l = Θ_l y_l are the naturally-weighted RI measurements,
Θ_l is a diagonal matrix whose elements are the natural weights, Φ̄_l = Θ_l G_l F Z is the associated
measurement operator and η_l = 1/max_{n=1:N}(Φ̄_l† Φ̄_l δ)_n is a normalization factor, where δ ∈ R^N is
an image with value 1 at the phase centre and zero otherwise. By doing so, the PSF, defined as
g_l = η_l Φ̄_l† Φ̄_l δ, has a peak value equal to 1. More specifically to JC-CLEAN, we consider the Briggs-weighted
residual image cube R̃_JC-CLEAN, whose columns are r̃_l = η̃_l Φ̃_l†(ỹ_l − Φ̃_l x̂_l). Here, ỹ_l = Θ̃_l y_l
are the Briggs-weighted RI measurements, Θ̃_l is a diagonal matrix whose elements are the Briggs
weights, Φ̃_l = Θ̃_l G_l F Z is the associated measurement operator and η̃_l is a normalization factor.
We also consider the restored cube T̂_JC-CLEAN, whose columns are t̂_l = x̂_l ∗ c_l + r̃_l, where c_l is the so-called
CLEAN beam (typically a Gaussian fitted to the primary lobe of the PSF g_l), and the error
cube Ẽ_JC-CLEAN = |X − T̂_JC-CLEAN|. It is worth noting that we divide the columns of the restored
cube T̂_JC-CLEAN by the flux of the respective CLEAN beams, i.e., the ℓ1 norm of the CLEAN
beams, to have the same brightness scale as the ground-truth. We recall that the restored cube
is the final product of the CLEAN-based approaches because of their non-physical estimated model
cubes, as opposed to compressive sensing-based approaches. The latter class of methods involves
sophisticated priors, resulting in accurate representations of the unknown sky image, achieved on
simulations [26, 42, 126] and in real data applications [44, 57, 88, 94, 125] for single-channel RI imaging.
We also provide a spectral analysis of selected pixels from the different sources of the estimated
wideband cubes. These are the estimated model cubes X̂_HyperSARA, X̂_SARA, X̂_LRJAS, X̂_LR and
X̂_JAS, and the estimated restored cube T̂_JC-CLEAN. In the case of an unresolved source, i.e., a point-like
source, we derive its spectra from its total flux at each frequency, integrated over the associated
beam area.
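The residual-image normalization can be illustrated with a toy masked-FFT operator standing in for Φ̄_l (an assumption for illustration: the actual operator involves weighting Θ_l, degridding G_l and zero-padding Z, collapsed into a Fourier mask here):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 32                                   # image side (N = n*n)
mask = rng.random((n, n)) < 0.3          # sampled Fourier modes

def phi(x):
    # Toy measurement operator: measure the selected Fourier modes.
    return np.fft.fft2(x)[mask]

def phi_adj(y):
    # (Real part of the) adjoint of the masked FFT; the n*n factor undoes
    # the 1/n^2 normalization of np.fft.ifft2.
    grid = np.zeros((n, n), dtype=complex)
    grid[mask] = y
    return np.fft.ifft2(grid).real * n * n

# Normalization factor eta: unit peak for the PSF Phi^dagger Phi delta.
delta = np.zeros((n, n)); delta[0, 0] = 1.0   # point source at the phase centre
psf = phi_adj(phi(delta))
eta = 1.0 / psf.max()

x_true = np.zeros((n, n)); x_true[5, 7] = 1.0
y = phi(x_true)
x_model = np.zeros((n, n))                    # e.g. an empty model image
residual = eta * phi_adj(y - phi(x_model))
```

With this normalization, the residual of an unmodelled unit point source peaks at 1 at the source position, so residual amplitudes are directly readable in PSF-peak units.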
In the quantitative comparison of the different approaches, we adopt the signal-to-noise ratio
(SNR). For the channel indexed by l, it is defined as

SNR_l(x̂_l) = 20 log₁₀ ( ∥x_l∥₂ / ∥x_l − x̂_l∥₂ ),   (4.19)

where x_l is the original sky image at the frequency ν_l and x̂_l is the estimated model image. For
the full wideband model cube, we adopt the average SNR defined as

aSNR(X̂) = (1/L) Σ_{l=1}^{L} SNR_l(x̂_l).   (4.20)
For the sake of comparison with JC-CLEAN, we examine the similarity between the ground-truth
and the recovered model cubes with HyperSARA, SARA and JC-CLEAN up to the resolution of
the instrument. To this aim, we consider the smoothed versions of the model cubes, denoted by
B for the ground truth, whose columns are b_l = x_l ∗ c_l, and by B̂ for the estimated model
cubes, whose columns are b̂_l = x̂_l ∗ c_l. We adopt the average similarity metric defined as

aSM(B, B̂) = (1/L) Σ_{l=1}^{L} SM_l(b_l, b̂_l),   (4.21)

where, for two signals b_l and b̂_l, SM_l is defined as

SM_l(b_l, b̂_l) = 20 log₁₀ ( max(∥b_l∥₂, ∥b̂_l∥₂) / ∥b_l − b̂_l∥₂ ).   (4.22)
4.4.4 Imaging results
To investigate the performance of the proposed approach in the compressive sensing framework and
study the impact of the low-rankness and joint average sparsity priors on the image reconstruction
quality, we perform several tests on the data sets generated using a non-uniform random Fourier
sampling with a Gaussian density profile. We vary the Fourier sampling rate SR in the interval
[0.01, 1]; we also vary the iSNR and the number of channels L such that iSNR ∈ {20, 40} dB and
L ∈ {15, 60}. Simulated data cubes are imaged using LR (4.16), JAS (4.17), LRJAS (4.3) for ω = 1_J
and ω̄ = 1_T, and HyperSARA (4.3) with 10 consecutive reweights. Image reconstruction results
assessed using the aSNR metric are displayed in Figure 4.3. We notice that for SR values above
0.05, LR maintains a better performance than JAS. Better aSNR values are achieved by LRJAS,
which suggests the importance of combining both the low-rankness and joint average sparsity priors
for wideband RI imaging. More interestingly, HyperSARA supersedes these benchmark algorithms,
with about 1.5 dB enhancement in comparison with LRJAS for all considered SR values. Moreover,
HyperSARA reaches high aSNR values for the drastic sampling rate of 0.01; these are 20 dB and
15 dB for iSNRs of 40 dB and 20 dB, respectively. Note that we only showcase the results for SR
below 0.3, since similar behaviour is observed for higher values of SR. These results indicate the
efficiency of reweighting.
For qualitative comparison, we proceed with the visual inspection of the estimated model
images, the absolute value of the error images and the residual images (naturally-weighted data).
These are obtained by imaging the wideband data cube generated using the realistic VLA uv-coverage
with L = 60 channels, SR = 1 and iSNR = 60 dB. The images of channels ν₁ = 1.4 GHz and
ν₆₀ = 2.78 GHz are displayed in Figures 4.4 and 4.5, respectively. On the one hand, LRJAS
estimated model images (first row, second panel) have better resolution in comparison with JAS
(first row, third panel) and LR (first row, fourth panel). LRJAS also presents lower error maps
(second row, second panel) in comparison with JAS (second row, third panel) and LR (second
row, fourth panel). This is highly noticeable for the low-frequency channels. On the other hand,
HyperSARA provides maps with enhanced overall resolution and dynamic range, reflected in better
residuals and smaller errors. In Figure 4.6, we provide a spectral analysis of selected pixels from the
Figure 4.3: Simulations using random sampling with a Gaussian density profile: aSNR results for the
proposed approach HyperSARA and the benchmark methods LRJAS, JAS, LR and the monochromatic
approach SARA. The aSNR values of the estimated model cubes (y-axis) are plotted as a function of the
sampling rate (SR) (x-axis). Each point corresponds to the mean value over 5 noise realizations. The
results are displayed for different model cubes, varying the number of channels L and the input
signal-to-noise ratio iSNR. (a) L = 60 channels and iSNR = 40 dB. (b) L = 15 channels and iSNR = 40
dB. (c) L = 60 channels and iSNR = 20 dB. (d) L = 15 channels and iSNR = 20 dB.
estimated model cubes revealed in Figures 4.4 and 4.5. Once again, one can notice a significantly
enhanced recovery of the spectra when combining the two priors, as in LRJAS and HyperSARA.
Yet, the latter presents a more accurate estimation of the different shapes of the simulated spectra.
Overall, the efficiency of our approach is confirmed.
When compared to single-channel image recovery, HyperSARA clearly exhibits higher performance
for all the data sets generated using a non-uniform random Fourier sampling with a Gaussian
density profile. In fact, almost 5 dB improvement in aSNR is achieved, as shown in Figure 4.3.
This confirms the relevance and efficiency of the adopted spatio-spectral priors, as opposed to the
purely spatial model of the SARA approach. Furthermore, for regimes with sampling rates above
0.01, increasing the number of channels enhances the recovery of HyperSARA, which shows the
efficiency of the weighted nuclear norm prior in capturing the redundant information across the
channels, resulting in the low-rankness of the model cube. We do not report the aSNR values for
JC-CLEAN, since its non-physical model images result in poor SNR values.
Channel ν₁ = 1.4 GHz
Figure 4.4: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν₁ = 1.4 GHz
obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right:
results of HyperSARA (aSNR = 30.13 dB), LRJAS (aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and
LR (aSNR = 26.75 dB). From top to bottom: the estimated model images in log₁₀ scale, the absolute
value of the error images in log₁₀ scale and the naturally-weighted residual images in linear scale.
Channel ν₆₀ = 2.78 GHz
Figure 4.5: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν₆₀ = 2.78
GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right:
results of HyperSARA (aSNR = 30.13 dB), LRJAS (aSNR = 28.85 dB), JAS (aSNR = 25.97 dB) and
LR (aSNR = 26.75 dB). From top to bottom: the estimated model images in log₁₀ scale, the absolute
value of the error images in log₁₀ scale and the naturally-weighted residual images in linear scale.
Figure 4.6: Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected pixels
obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. The results are shown
for: (b) the proposed approach HyperSARA, (c) LRJAS, (d) JAS and (e) LR, compared with the
ground-truth. Each considered pixel is highlighted with a colored circle in the ground-truth image x₁
displayed in (a).
For a qualitative study of the imaging quality of HyperSARA, SARA and JC-CLEAN, we
display in Figures 4.7 and 4.8 the estimated images, the absolute value of the error images and the
residual images of channels ν₁ = 1.4 GHz and ν₆₀ = 2.78 GHz, respectively. These are obtained by
imaging the wideband data cube generated using the realistic VLA uv-coverage with L = 60 channels,
SR = 1 and iSNR = 60 dB. The resolution of the images estimated with HyperSARA (first row,
left panel) is higher than that achieved by SARA (first row, middle panel) and JC-CLEAN (first
row, right panel), thanks to the weighted nuclear norm that enforces correlation, hence enhancing
the details at the low-frequency channels and improving the quality of the extended emission at the
high-frequency channels. Moreover, a higher dynamic range, reflected in lower error maps, is achieved
by HyperSARA (second row, left panel), thanks to the weighted ℓ2,1 norm that rejects uncorrelated
noise. We show examples of the recovered spectra with the different approaches in Figure 4.9.
HyperSARA also achieves accurate recovery of the scrutinized spectra, as opposed to JC-CLEAN
and the single-channel recovery approach SARA. On the one hand, the poor recovery of SARA is
expected, since no correlation is imposed and the resolution is limited to the single-channel Fourier
sampling. On the other hand, the recovery of the spectral information with JC-CLEAN is limited,
as no explicit spectral model is considered (recall that polynomial fitting is not considered with
JC-CLEAN since the simulated spectra contain emission lines). Finally, we report the average
similarity values of the ground-truth with the HyperSARA, SARA and JC-CLEAN results at the
resolution of the instrument. These are aSM(B, B̂_HyperSARA) = 52.45 dB, aSM(B, B̂_SARA) = 41.23
dB and aSM(B, B̂_JC-CLEAN) = 16.38 dB. These values indicate the high accuracy of HyperSARA and, more
generally, strong agreement between the compressive sensing-based approaches when it comes to
recovering the Fourier content up to the resolution of the instrument. On the other hand, the poor
reconstruction of JC-CLEAN is due to the complexity of the spectra considered in the simulations.
4.5 Application to real data
In this section, we present the results of HyperSARA for wideband imaging on VLA observations of
the radio galaxy Cyg A and the supernova remnant G055.7+3.4⁴, in comparison with JC-CLEAN
[85] and the single-channel image reconstruction algorithm SARA [26, 44]. As opposed to [88], the
ℓ2 bounds on the data-fidelity terms are updated within the algorithm, allowing for imaging in the
presence of unknown noise levels and calibration errors. The values of the parameters associated
with the adaptive strategy are given in Appendix .3.
4.5.1 Data and imaging details
Cyg A: The data are part of wideband VLA observations within the frequency range 2-18 GHz,
acquired over two years (2015-2016). We consider here 32 channels from the S band (2-4 GHz)
⁴The considered data sets have already been calibrated with the standard RI pipelines.
Channel ν₁ = 1.4 GHz
Figure 4.7: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν₁ = 1.4 GHz
obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right:
results of the proposed approach HyperSARA (aSNR = 30.13 dB), the monochromatic approach SARA
(aSNR = 23.46 dB) and JC-CLEAN (aSNR = 9.39 dB). From top to bottom (first and second columns):
the estimated model images in log₁₀ scale, the absolute value of the error images in log₁₀ scale and the
naturally-weighted residual images in linear scale. From top to bottom (third column): the estimated
restored images in log₁₀ scale, the absolute value of the error images in log₁₀ scale and the
Briggs-weighted residual images in linear scale.
Channel ν₆₀ = 2.78 GHz
Figure 4.8: Simulations with realistic VLA uv-coverage: reconstructed images of channel ν₆₀ = 2.78
GHz obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. From left to right:
results of the proposed approach HyperSARA (aSNR = 30.13 dB), the monochromatic approach SARA
(aSNR = 23.46 dB) and JC-CLEAN (aSNR = 9.39 dB). From top to bottom (first and second columns):
the estimated model images in log₁₀ scale, the absolute value of the error images in log₁₀ scale and the
naturally-weighted residual images in linear scale. From top to bottom (third column): the estimated
restored images in log₁₀ scale, the absolute value of the error images in log₁₀ scale and the
Briggs-weighted residual images in linear scale.
Panels: (a) ground-truth image x1; (b) HyperSARA estimated spectra; (c) SARA estimated spectra; (d) JC-CLEAN estimated spectra.
Figure 4.9: Simulations with realistic VLA uv-coverage: reconstructed spectra of three selected pixels obtained by imaging the cube with L = 60 channels, SR = 1 and iSNR = 60 dB. The results are shown for: (b) the proposed approach HyperSARA, (c) the monochromatic approach SARA and (d) JC-CLEAN, compared with the ground truth. Each considered pixel is highlighted with a colored circle in the ground-truth image x1 displayed in (a). In panels (b)-(d), the horizontal axis spans 1.4 to 2.8 GHz and the vertical axis intensity values from 0 to 1.
and the C band (4 - 8 GHz), spanning the frequency range [ν1, ν32] = [2.04, 5.96] GHz with a frequency step of 128 MHz and a total bandwidth of 4 GHz (Data courtesy of R.A. Perley). The data in each channel are acquired using the B configuration of the VLA and are of size 25 × 10⁴. We split the data in each channel into 4 blocks of size 6 × 10⁴ measurements on average, where each block corresponds to data observed within a time interval over which calibration errors are assumed constant. For imaging, we consider images of size 1024 × 512 with a pixel size δx = 0.19′′. The chosen pixel size corresponds to recovering the signal up to 2.5 times the nominal resolution at the highest frequency νL, given by the maximum baseline B_L = (νL/c) u_max. It is common in RI imaging to set the pixel size δx such that 1/(5 B_L) ≤ δx ≤ 1/(3 B_L), so that all the PSFs are adequately sampled. The resulting wideband image cube is of size 1024 × 512 × 32. With regards to the choice of the regularization parameters in HyperSARA, we found that µ̄ = ‖X_dirty‖_∗ / ‖Ψ† X_dirty‖_{2,1} (as explained in Section 4.2.2) is large and results in smooth reconstructed model cubes. This can be justified by the fact that the considered data set encompasses calibration errors, and the DDEs are not corrected for in our measurement operator. However, we found that setting µ̄ = 5 × 10⁻⁵, that is two orders of magnitude lower than the ratio estimated from the dirty model cube, is a good trade-off to recover high-resolution high-dynamic-range model cubes. Note that µ is set to 1 as explained in Section 4.2.2. The regularization parameter associated with SARA is set to µ̃ = 5 × 10⁻⁵. For SARA and HyperSARA, we solve 30 reweighted minimization problems using the adaptive PDFB.
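As an illustration of the pixel-size rule above, here is a minimal Python sketch; the helper name `pixel_size_arcsec` is ours, and we assume u_max denotes the maximum physical baseline in metres, so that B_L = (νL/c) u_max is the maximum baseline in wavelengths.

```python
import numpy as np

def pixel_size_arcsec(nu_max_hz, u_max_m, factor=2.5):
    """Pixel size delta_x = 1/(factor * B_L) corresponding to `factor` times
    the nominal resolution at the highest observed frequency (illustrative
    helper; a factor between 3 and 5 gives the common 1/(5 B_L) - 1/(3 B_L)
    range, and larger factors correspond to super-resolved imaging)."""
    c = 299792458.0                       # speed of light [m/s]
    b_l = (nu_max_hz / c) * u_max_m       # maximum baseline in wavelengths
    theta_rad = 1.0 / (factor * b_l)      # pixel size in radians
    return np.degrees(theta_rad) * 3600.0 # radians -> arcseconds
```

Doubling the highest frequency halves the pixel size, as expected from the linear dependence of B_L on νL.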
G055.7+3.4: The data are part of wideband VLA observations at the L band (1 - 2 GHz) acquired in 2010⁵. We process 10 channels from each of the following spectral windows: 1.444 - 1.498 GHz, 1.708 - 1.762 GHz and 1.836 - 1.89 GHz. Each set of 10 consecutive channels, corresponding to one spectral window, has a frequency step of 6 MHz and a total bandwidth of 60 MHz. The data in each channel are of size 4 × 10⁵ visibilities, split into 4 blocks of size 10⁵ measurements on average. The resulting wideband image cube is of size 1280 × 1280 × 30 with a pixel size δx = 8′′. The chosen pixel size corresponds to recovering the signal up to 2 times the nominal resolution of the observations at the highest frequency νL. In a similar fashion to the Cyg A data set, we set the regularization parameters µ = 1 and µ̄ = 5 × 10⁻⁶. The regularization parameter associated with SARA is set to µ̃ = 5 × 10⁻⁶. For SARA and HyperSARA, we solve 30 reweighted minimization problems using the adaptive PDFB.
4.5.2 Imaging quality assessment
To assess the quality of the reconstruction, we perform a visual inspection of the obtained images. For HyperSARA and SARA, we consider the estimated model cubes X̂_HyperSARA and X̂_SARA, and the naturally-weighted residual image cubes R_HyperSARA and R_SARA. For JC-CLEAN, we
⁵Data courtesy of NRAO: https://casaguides.nrao.edu/index.php/VLA_CASA_Imaging-CASA5.0.0.
consider Briggs weighting (the robustness parameter is set to 0.5) and examine the resultant restored cube T̂_JC-CLEAN and the Briggs-weighted residual image cube R̃_JC-CLEAN. We report the average standard deviation (aSTD) of all the residual image cubes, defined for an image cube Z ∈ ℝ^{N×L} by

aSTD(Z) = (1/L) Σ_{l=1}^{L} STD_l(z_l),   (4.23)

where STD_l(z_l) is the standard deviation of the image z_l ∈ ℝ^N at the frequency ν_l.
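The aSTD metric (4.23) can be sketched in a few lines of Python; the function name `astd` is ours, and we represent the residual cube as an array with one flattened residual image per row.

```python
import numpy as np

def astd(residual_cube):
    """Average standard deviation (4.23): the mean over the L channels of
    the per-channel standard deviation of the residual image.
    `residual_cube` has shape (L, N), one flattened residual per channel."""
    return float(np.mean(np.std(residual_cube, axis=1)))
```

A perfectly flat residual channel contributes zero to the average, so aSTD directly reflects the residual structure left per channel.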
We also provide a spectral analysis of selected pixels from the estimated cubes. These are the estimated model cubes X̂_HyperSARA and X̂_SARA, and the estimated restored cube T̂_JC-CLEAN. For an unresolved, i.e., point-like, source, we derive its spectrum from its total flux at each frequency, integrated over the associated beam area. Finally, we report the similarity of X̂_HyperSARA and X̂_SARA. Furthermore, we examine the smoothed versions of X̂_HyperSARA, X̂_SARA and X̂_JC-CLEAN at the resolution of the instrument, denoted by B̂_HyperSARA, B̂_SARA and B̂_JC-CLEAN, respectively. Recall that for the channel indexed by l, b̂_l = x̂_l ∗ c_l, where x̂_l is the estimated model image at the frequency ν_l and c_l is the respective CLEAN beam. However, we emphasize that smoothing X̂_HyperSARA and X̂_SARA is not recommended and is performed here only for comparison purposes with JC-CLEAN.
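The smoothing step b̂_l = x̂_l ∗ c_l can be sketched as an FFT-based convolution; this is a minimal illustration (the helper name is ours), assuming a centred CLEAN beam of the same size as the image and circular boundary conditions.

```python
import numpy as np

def smooth_with_beam(model_image, clean_beam):
    """Smooth an estimated model image x_l by the CLEAN beam c_l
    (b_l = x_l * c_l), implemented as an FFT-based circular convolution.
    The beam is assumed centred in its array; ifftshift moves its peak
    to the origin so the convolution introduces no spatial shift."""
    return np.real(np.fft.ifft2(
        np.fft.fft2(model_image) * np.fft.fft2(np.fft.ifftshift(clean_beam))))
```

Convolving a unit point source with a unit-delta "beam" returns the point source unchanged, a quick sanity check of the shift convention.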
4.5.3 Real imaging results
Cyg A: The estimated images of channels ν1 = 2.04 GHz and ν32 = 5.96 GHz, obtained with the proposed approach HyperSARA, the single-channel approach SARA and JC-CLEAN, are displayed in Figures 4.10 and 4.11, respectively. Two key regions in Cyg A are emphasized: these are the hotspots of the east and west jets (second and third columns). We can see that the model images of HyperSARA exhibit more details at the low-frequency channels, visible at the hotspots of Cyg A. Moreover, the features of Cyg A at the high-frequency channels are better resolved with HyperSARA (see the emission line from the inner core to the east jet and the arc around the right end of the west jet). The imaging quality of the SARA approach is poor at the low frequencies since no inter-channel correlation is exploited, and the recovery is limited to the single-channel inherent resolution. JC-CLEAN restored images are smooth since they result from convolving the estimated model images with the corresponding CLEAN beams. In Figure 4.12, we display the naturally-weighted residual images of HyperSARA and SARA. The aSTD values are 1.19 × 10⁻² and 8.7 × 10⁻³, respectively, which indicates higher fidelity of the latter to the naturally-weighted data. Yet, SARA residual images (right) indicate poor recovery of Cyg A jets at the low-frequency channels in comparison with those obtained with HyperSARA (left). Both HyperSARA and SARA residual images present errors at the brightest pixel positions (the hotspots). These can be justified by the dominant calibration errors at those positions. However, more substantial errors are kept in the residual with HyperSARA and seem to be absorbed in the model images of SARA. HyperSARA
and JC-CLEAN Briggs-weighted residual images are shown in Figure 4.13, with respective aSTD values of 4.1 × 10⁻³ and 2.1 × 10⁻³. These indicate higher fidelity of JC-CLEAN to the Briggs-weighted data. Recall that the two approaches solve two different imaging problems: HyperSARA solves for the naturally-weighted data whereas JC-CLEAN solves for the Briggs-weighted data. Spectral analysis of the different approaches is presented in Figure 4.14. One can see that the spectra recovered with HyperSARA have higher intensity values at the low-frequency channels, thanks to the weighted nuclear norm that enhances the details at the low-frequency channels. Finally, given the unknown ground truth, we report the average similarity (aSM) values of the proposed method with the benchmark approaches. These are aSM(X̂_HyperSARA, X̂_SARA) = 16.45 dB while aSM(B̂_HyperSARA, B̂_SARA) = 36.53 dB. Also aSM(B̂_HyperSARA, B̂_JC-CLEAN) = 33.36 dB. These values indicate high similarity of the recovered low spatial frequency content with all the methods. In other words, there is strong agreement between the approaches up to the spatial bandwidth of the observations.
G055.7+3.4: In Figures 4.15 and 4.16, we present the reconstructed images of channels ν1 = 1.444 GHz and ν30 = 1.89 GHz, respectively, obtained with the proposed approach HyperSARA, the single-channel approach SARA and JC-CLEAN. The figures clearly demonstrate a significantly higher performance of HyperSARA in terms of resolution and dynamic range. For instance, one can see that the central extended emission is very well captured by HyperSARA in the overall estimated model cube, as opposed to SARA and JC-CLEAN. While SARA presents a smooth representation of the source, JC-CLEAN provides a highly noisy representation. Moreover, the number of modelled point sources is clearly higher with HyperSARA, in particular at the low-frequency channels, unlike SARA, where only a few sources are detected, whereas JC-CLEAN presents a large number of false detections. This suggests the efficiency of the HyperSARA priors in capturing the correlation of the data cube and enhancing the dynamic range of the recovered model cube. The naturally-weighted residual images of HyperSARA and SARA are shown in Figure 4.17. The aSTD values are 6.55 × 10⁻⁵ and 8.37 × 10⁻⁵, respectively, which reflects the higher fidelity to data achieved by HyperSARA. The Briggs-weighted residual images of HyperSARA and JC-CLEAN are also displayed in Figure 4.18; their respective aSTD values are 1.12 × 10⁻⁴ and 7.75 × 10⁻⁵. These indicate higher fidelity of JC-CLEAN to the Briggs-weighted data. Finally, we show examples of the recovered spectra with the different approaches in Figure 4.19. When inspecting the different spectra, one can see that HyperSARA succeeds in recovering the scrutinized sources with higher flux in comparison with the other approaches. We also report the average similarity values of the proposed method with the benchmark approaches. These are aSM(X̂_HyperSARA, X̂_SARA) = 9.13 dB while aSM(B̂_HyperSARA, B̂_SARA) = 12.3 dB. Also aSM(B̂_HyperSARA, B̂_JC-CLEAN) = 7.1 dB. The low aSM values confirm the substantial disagreement in the quality of the recovery up to the resolution of the instrument, which is in agreement with the visual inspection of the estimated
Channel ν1 = 2.04 GHz
Figure 4.10: Cyg A: recovered images of channel ν1 = 2.04 GHz at 2.5 times the nominal resolution at the highest frequency νL. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column) as well as zooms on the east jet hotspot (second column) and the west jet hotspot (third column).
Channel ν32 = 5.96 GHz
Figure 4.11: Cyg A: recovered images of channel ν32 = 5.96 GHz at 2.5 times the nominal resolution at the highest frequency νL. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column) as well as zooms on the east jet hotspot (second column) and the west jet hotspot (third column).
Figure 4.12: Cyg A: naturally-weighted residual images obtained by the proposed approach HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel ν1 = 2.04 GHz, and (b) Channel ν32 = 5.96 GHz. The aSTD values are 1.19 × 10⁻² and 8.7 × 10⁻³, respectively.
Figure 4.13: Cyg A: Briggs-weighted residual images obtained by the proposed approach HyperSARA (left) and JC-CLEAN (right). (a) Channel ν1 = 2.04 GHz, and (b) Channel ν32 = 5.96 GHz. The aSTD values are 4.1 × 10⁻³ and 2.1 × 10⁻³, respectively.
Panels: (a) estimated model image of HyperSARA at channel ν32 = 5.96 GHz; (b) selected pixel P1; (c) selected pixel P2; (d) selected source S1; (e) selected pixel P3.
Figure 4.14: Cyg A: reconstructed spectra of selected pixels and point-like sources obtained by the different approaches. Each considered pixel (P) or source (S) is highlighted with a red circle on the estimated model image of HyperSARA at channel ν32 = 5.96 GHz displayed in (a). In panels (b)-(e), the horizontal axes span the observed frequencies from 2.04 to 5.96 GHz.
images.
4.6 Conclusions
In this chapter, we presented the HyperSARA approach for wideband RI image reconstruction. It consists in solving a sequence of weighted nuclear norm and ℓ2,1 minimization problems promoting low-rankness and joint average sparsity of the wideband model cube in the ℓ0 sense. HyperSARA is able to achieve higher resolution of the reconstructed wideband model cube thanks to the weighted nuclear norm that enforces inter-channel correlation. The overall dynamic range is also enhanced thanks to the weighted ℓ2,1 norm that rejects decorrelated artefacts present on the different channels. The efficiency of HyperSARA was validated on simulations and VLA observations in comparison with the single-channel imaging approach SARA and the CLEAN-based wideband imaging algorithm JC-CLEAN. As opposed to the CLEAN-based methods, the sophisticated priors of HyperSARA come at the expense of increased computational cost and memory requirements. To mitigate this effect, we adopt the primal-dual algorithmic structure defined in the context of the theory of convex optimization, owing to its highly interesting functionalities for wideband RI imaging. We have leveraged the preconditioning functionality to provide accelerated convergence. The parallelization of the different functions and operators involved in the minimization task was also promoted as a way to spread the computational cost and memory requirements induced by the large data volumes over a multitude of processing CPU cores with limited resources.
Although HyperSARA is scalable to large data volumes, it can be prohibitive for very large image cubes. This is because the complex prior terms, namely the nuclear norm and the ℓ2,1 norm, are not separable, and the whole image cube X is stored and processed in a single CPU core at each iteration (see Algorithm 3, Steps 9 and 12). Moreover, the proximity operator of the nuclear norm in Step 9 requires an SVD of the full image cube at each iteration, which can be costly for big image cubes (see Appendix .2 for the exact expressions of all the proximity operators). To overcome this bottleneck and establish the full scalability potential of our approach, we present Faceted HyperSARA in the next chapter.
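To make the bottleneck concrete, here is a minimal sketch of a weighted nuclear-norm proximity step, i.e., soft-thresholding of the singular values of the full cube (the helper name is ours; for non-uniform weights this is only exact under suitable orderings of the weights, and the exact expressions used in this work are those of Appendix .2).

```python
import numpy as np

def prox_weighted_nuclear(X, weights, tau):
    """Proximity step for tau * sum_j w_j sigma_j(X) applied to an N x L
    cube X: soft-threshold the singular values. The SVD of the whole cube
    at every iteration is precisely the scalability bottleneck that the
    faceting of Chapter 5 alleviates."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_thresh = np.maximum(s - tau * weights, 0.0)  # soft-thresholding
    return U @ np.diag(s_thresh) @ Vt
```

The cost of the SVD grows as O(N L min(N, L)), hence the appeal of replacing the global cube by smaller facets.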
Channel ν1 = 1.444 GHz
Figure 4.15: G055.7+3.4: recovered images of channel ν1 = 1.444 GHz at 2 times the nominal resolution at the highest frequency νL. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column) as well as a zoom on the central region (second column).
Channel ν30 = 1.89 GHz
Figure 4.16: G055.7+3.4: recovered images of channel ν30 = 1.89 GHz at 2 times the nominal resolution at the highest frequency νL. From top to bottom: estimated model images of the proposed approach HyperSARA, estimated model images of the monochromatic approach SARA and estimated restored images of JC-CLEAN using Briggs weighting. The full images are displayed in log10 scale (first column) as well as a zoom on the central region (second column).
Figure 4.17: G055.7+3.4: naturally-weighted residual images obtained by the proposed approach HyperSARA (left) and the monochromatic approach SARA (right). (a) Channel ν1 = 1.444 GHz, and (b) Channel ν30 = 1.89 GHz. The aSTD values are 6.55 × 10⁻⁵ and 8.37 × 10⁻⁵, respectively.
Figure 4.18: G055.7+3.4: Briggs-weighted residual images obtained by the proposed approach HyperSARA (left) and JC-CLEAN (right). (a) Channel ν1 = 1.444 GHz, and (b) Channel ν30 = 1.89 GHz. The aSTD values are 1.12 × 10⁻⁴ and 7.75 × 10⁻⁵, respectively.
Panels: (a) estimated model image of HyperSARA at channel ν30 = 1.89 GHz; (b) selected source S1; (c) selected source S2; (d) selected pixel P1; (e) selected source S3.
Figure 4.19: G055.7+3.4: reconstructed spectra of selected pixels and point-like sources obtained by the different approaches. Each considered pixel (P) or source (S) is highlighted with a red circle on the estimated model image of HyperSARA at channel ν30 = 1.89 GHz (first row). In panels (b)-(e), the horizontal axes span the observed frequencies from 1.44 to 1.89 GHz.
Chapter 5
Faceted HyperSARA for wideband RI imaging: when precision meets scalability
Contents
5.1 Motivation .................................... 70
5.2 Proposed faceting and Faceted HyperSARA approach .......... 71
5.2.0.1 Spectral faceting .......................... 71
5.2.0.2 Spatial faceting ........................... 73
5.3 Algorithm and implementation ........................ 75
5.3.1 Faceted HyperSARA algorithm ....................... 75
5.3.2 Underpinning primal-dual forward-backward algorithm .......... 76
5.3.3 Parallel algorithmic structure ........................ 77
5.3.4 MATLAB implementation .......................... 78
5.4 Validation on synthetic data .......................... 78
5.4.1 Simulation setting .............................. 79
5.4.1.1 Images and data .......................... 79
5.4.1.2 Spatial faceting ........................... 80
5.4.1.3 Spectral faceting .......................... 82
5.4.2 Hardware ................................... 82
5.4.3 Evaluation metrics .............................. 82
5.4.4 Results and discussion ............................ 83
5.4.4.1 Spatial faceting ........................... 83
5.4.4.2 Spectral faceting .......................... 84
5.5 Validation on real data ............................. 84
5.5.1 Dataset description and imaging settings .................. 90
5.5.2 Hardware ................................... 91
5.5.3 Evaluation metrics .............................. 91
5.5.4 Results and discussion ............................ 92
5.5.4.1 Imaging quality ........................... 92
5.5.4.2 Computing cost ........................... 94
5.6 Conclusions .................................... 94
5.1 Motivation
Modern proximal optimization algorithms have shown the potential to significantly outperform state-of-the-art approaches thanks to their ability to inject complex image models to regularize the inverse problem for image formation from visibility data. More specifically to wideband RI imaging, HyperSARA, proposed in the previous chapter, has been shown to outperform the wideband CLEAN variant dubbed JC-CLEAN [85] through the recovery of high-fidelity high-resolution image cubes. HyperSARA, powered by the primal-dual forward-backward (PDFB) algorithm (3.4.2), enables the decomposition of data into blocks and the parallel processing of the block-specific data-fidelity terms of the objective function, which provides scalability to large data volumes. HyperSARA, however, models the image cube as a single variable, and the computational and storage requirements induced by the complex regularization terms can be prohibitive for a very large image size.
We address this bottleneck in this chapter. We propose to decompose the target image cube into regular, content-agnostic, spatially overlapping spatio-spectral facets, with which are associated facet-specific regularization terms in the objective function, and further exploit the splitting functionality of PDFB to enable parallel processing of the regularization terms and ultimately provide further scalability. Our proposed approach is dubbed "Faceted HyperSARA". Note that faceting is not a novel paradigm in RI imaging: it has often been considered for calibration purposes in the context of wide-field imaging, assuming piece-wise constant direction-dependent effects. For instance, [75] proposed an image tessellation scheme for LOFAR wide-field images, which has been leveraged by [115] in the context of wide-field wideband calibration and imaging. However, to the best of our knowledge, except for [80], facet imaging has hitherto been essentially addressed with CLEAN-based algorithms. This class of approaches not only lacks theoretical convergence guarantees but also does not offer much flexibility to accommodate advanced regularization terms. In contrast with [80], the proposed faceting approach does not need to be tailored to the content of the image, and thus offers more flexibility to design balanced facets exclusively based on computational considerations.
The reconstruction performance of the proposed imaging approach is evaluated against HyperSARA and SARA on synthetic data. We further validate the performance and scalability potential of our approach through the reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA observations across 480 channels. Our results confirm the recent discovery of Cyg A2, a second super-massive black hole in Cyg A [40].
This chapter is structured as follows. Section 5.2 introduces the proposed faceted prior model underpinning Faceted HyperSARA. The associated algorithm is described in Section 5.3, along with the different levels of parallelization exploited in the proposed MATLAB implementation. Performance validation is first conducted on synthetic data in Section 5.4. We successively evaluate the influence of spectral and spatial faceting for a varying number of facets and spatial overlap, both in terms of reconstruction quality and computing time. Section 5.5 is focused on the validation of the proposed approach on real VLA observations in terms of precision and scalability. Conclusions and perspectives are reported in Section 5.6.
This work has been published in [118-120].
5.2 Proposed faceting and Faceted HyperSARA approach
The proposed Faceted HyperSARA approach builds on the HyperSARA method explained in Chapter 4, distributing both the average sparsity and the low-rankness priors over multiple spatio-spectral facets to alleviate the computing and storage requirements inherent to HyperSARA. In particular, we propose to decompose the 3D image cube into Q × C spatio-spectral facets, as illustrated in Figure 5.1 and detailed below.
5.2.0.1 Spectral faceting
The wideband image cube can first be decomposed into separate image sub-cubes composed of a subset of the frequency channels, with a separate prior for each sub-cube. Since the data-fidelity terms are channel-specific, the overall objective function of HyperSARA (4.8) reduces to the sum of independent objectives for each sub-cube. The smaller-size wideband imaging sub-problems (smaller data sets, and smaller image volumes) can thus be solved independently in parallel, offering scalability. Taken to the extreme, this simple spectral faceting can be used to separate all channels and proceed with single-channel reconstructions (leading to SARA), however at the cost of completely losing the advantage of correlations between frames to improve image precision. The key point is to keep an appropriate number of frames per sub-cube in order to optimally take advantage of this correlation. Also, given that the data at higher frequency channels provide higher spatial frequency information than the lower frequency channels, it is of critical importance that the whole extent of the frequency band of observation be exploited in each channel reconstruction. In this context, we propose to decompose the cube into channel-interleaved spectral
(a) Full image cube (b) Spectral sub-cubes (c) Facets & weights
Figure 5.1: Illustration of the proposed faceting scheme, using a 2-fold spectral interleaving process and a 9-fold spatial tiling process. The full image cube variable (a) is divided into two spectral sub-cubes (b) with interleaved channels (for a 2-fold interleaving, even and odd channels respectively define a sub-cube). Each sub-cube is spatially faceted. A regular tessellation (dashed red lines) is used to define spatio-spectral tiles. The spatio-spectral facets result from the augmentation of each tile to produce an overlap between facets (solid red lines). Panel (c) shows a single facet (left), as well as the spatial weighting scheme (right) with linearly decreasing weights in the overlap region. Note that, though the same tiling process underpins the nuclear norm and ℓ2,1 norm regularization terms, the definition of the appropriate overlap region is specific to each of these terms (via the selection operators S_q and S̃_q in (5.3)).
sub-cubes, each of which results from a uniform sub-sampling of the whole frequency band (see Figure 5.1 (b)). We thus decompose the original inverse problem (2.4) into C independent, channel-interleaved sub-problems, each considering L_c channels from the original data cube, with L = L_1 + . . . + L_C. For each sub-cube c ∈ {1, . . . , C}, y_{c,l,b} ∈ ℂ^{M_{c,l,b}} denotes the visibilities associated with the channel l ∈ {1, . . . , L_c} and data-block b ∈ {1, . . . , B}, and Φ_{c,l,b} and ε_{c,l,b} denote the associated measurement operator and ℓ2-ball radius, respectively. The proposed minimization problem is thus formulated as
problem is thus formulated as
minimize
X=(Xc)1cCRN×L
+
C
X
c=1 Lc
X
l=1
B
X
b=1
ιB(yc,l,bc,l,b )Φc,l,bxc,l +rc(Xc)!,(5.1)
where, for every c∈ {1, . . . , C}, the indices (l, b)∈ {1, . . . , Lc}×{1, . . . , B}refer to a data block b
of channel land ιB(yc,l,bc,l,b)denotes the indicator function of the 2ball B(yc,l,b, ϵc,l,b )(4.9). This
indicator function acts as a data-delity term, in that it ensures the consistency of the modeled
data with the measurements and reect the Gaussian nature of the noise [26]. The notation
Xc= (xc,l)1lLcRN×Lcis the c-th sub-cube of the full image cube X, with xc,l RNthe l-th
image of the sub-cube Xc, and rc:RN×Lc],+]is a sub-part of the regularization term r,
only acting on the c-th sub-cube. Finally, note that an additional non-negativity prior is imposed
as for all approaches of the SARA family focusing on intensity imaging, with the aim to preserve
the physical consistency of the estimated surface brightness. This generalizes to the polarization
constraint when solving for all the Stokes parameters.
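The channel-interleaved decomposition can be sketched in one line of Python (the function name is ours): sub-cube c collects channels c, c+C, c+2C, ..., so every sub-cube uniformly samples the whole frequency band.

```python
def interleave_channels(num_channels, num_subcubes):
    """Uniformly sub-sample the L frequency channels into C
    channel-interleaved sub-cubes: sub-cube c takes channels
    c, c+C, c+2C, ... so that each sub-cube spans the whole band
    (and the channel counts satisfy L = L_1 + ... + L_C)."""
    return [list(range(c, num_channels, num_subcubes))
            for c in range(num_subcubes)]
```

For a 2-fold interleaving of 6 channels, this yields the even channels [0, 2, 4] and the odd channels [1, 3, 5], as in Figure 5.1 (b).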
5.2.0.2 Spatial faceting
Faceting can also be performed in the spatial domain by decomposing the regularization term for each spectral sub-cube into a sum of terms acting only locally in the spatial domain (see Figure 5.1 (c)). In this context, the resulting facets need to overlap to avoid edge effects, so that the overall objective function (5.1) takes the form of a sum of inter-dependent facet-specific objectives. This inter-dependence precludes separating the imaging problem into facet problems. However, the splitting functionality of PDFB (explained in Section 3.4.2) can be exploited to enable parallel processing of the facet-specific regularization terms and ensure further scalability (see Section 5.3).
On the one hand, we propose to split the average sparsity dictionary Ψ into Q smaller wavelet decompositions, leveraging the wavelet splitting technique introduced in [95, Chapter 4]. [95] proposed an exact implementation of the discrete wavelet transform distributed over multiple facets. In this context, the Daubechies wavelet bases are decomposed into a collection of facet-based operators Ψ†_q ∈ ℝ^{T_q×N_q} acting only on the q-th facet of size N_q, with T = T_1 + . . . + T_Q. The overlap needed to ensure an exact faceted implementation of the wavelet transforms is composed of a number of pixels between 15(2^s − 2) and 15(2^s − 1) in each spatial direction [95, Section 4.1.4], with s being the level of decomposition. In practice, the overlap ensures that each facet contains all the information needed to compute the convolutions underlying the discrete wavelet transforms locally.
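The overlap bounds quoted above can be expressed as a small helper (the function name is ours; the factor 15 corresponds to the assumption of length-16 Daubechies filters, i.e., Db8, the longest filter of the SARA dictionary):

```python
def wavelet_overlap_bounds(s, filter_length=16):
    """Bounds on the facet overlap (in pixels, per spatial direction)
    needed for an exact faceted Daubechies wavelet transform with s
    decomposition levels: between (m)(2^s - 2) and (m)(2^s - 1) pixels,
    with m = filter_length - 1 (m = 15 for the length-16 Db8 filters),
    following the range quoted from [95, Section 4.1.4]."""
    m = filter_length - 1
    return m * (2**s - 2), m * (2**s - 1)
```

For s = 4 decomposition levels this gives an overlap between 210 and 225 pixels, which shows how quickly the required overlap grows with the decomposition depth.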
On the other hand, we consider a faceted low-rankness prior enforced by a sum of nuclear norm priors on essentially the same overlapping facets as those introduced for the wavelet decomposition. This provides a more tractable alternative to the global low-rankness prior encoded by the nuclear norm of HyperSARA. Unlike the wavelet decomposition, there is no equivalent faceted implementation of the eigenvalue decomposition. To mitigate reconstruction artifacts possibly resulting from the faceting of the 3D image cube, for each facet q ∈ {1, . . . , Q} of size Ñ_q, we propose to introduce a diagonal matrix D_q ∈ ]0, +∞[^{Ñ_q×Ñ_q} ensuring a smooth transition from the borders of one facet to its neighbors. A natural choice consists in down-weighting the contribution of pixels involved in multiple facets. A tapering window decaying in the overlapping regions is considered, while ensuring that the sum of all the weights associated with each pixel is equal to unity. In this work, we consider weights in the form of a 2D triangular apodization window, as considered by [79] (see Figure 5.1 (c)). The size of the overlap for this term is taken as an adjustable parameter of the Faceted HyperSARA approach to further promote local correlations. Its influence is investigated in Section 5.4. In practice, a larger overlap region than the one taken for the faceted wavelet transform is considered, taking advantage of the overlap already imposed by the faceted implementation of the wavelet decomposition and the associated ℓ2,1 norm priors.
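A 1D profile of such a triangular apodization window can be sketched as follows (a minimal illustration for an interior facet; the function name and the exact ramp values are ours). The facet ramps up linearly over its leading overlap and ramps down over the trailing pixels of its tile that are covered by the next facet's border, so that the weights of any pixel shared by two facets sum to one; the 2D weights D_q would be obtained as the outer product of two such profiles.

```python
import numpy as np

def triangular_profile(tile_len, overlap):
    """1D triangular apodization profile for an interior facet of length
    tile_len + overlap: linear ramp up over the leading `overlap` pixels
    (shared with the previous facet), flat interior, and matching ramp
    down over the trailing `overlap` pixels of the tile (shared with the
    next facet), so that overlapping weights sum to unity."""
    up = np.arange(1, overlap + 1) / (overlap + 1)   # increasing, in (0, 1)
    body = np.ones(tile_len - overlap)               # facet interior
    return np.concatenate([up, body, up[::-1]])      # symmetric taper
```

Since the down-ramp of one facet is the mirror image of the up-ramp of its neighbor, the two contributions on each shared pixel add exactly to one.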
The spatial faceting procedure therefore results in splitting the original log-sum priors of HyperSARA in (4.2) into a sum of inter-dependent facet-specific log-sum priors, defining the Faceted HyperSARA prior:

r_c(X_c) = Σ_{q=1}^{Q} ( µ_c Σ_{j=1}^{J_{c,q}} log( |σ_j(D_q S̃_q X_c)| + υ ) + µ̄_c Σ_{n=1}^{T_q} log( ‖[Ψ†_q S_q X_c]_n‖_2 + υ ) ).   (5.2)
In (5.2), (µ_c, µ̄_c, υ) ∈ ]0, +∞[³ are regularization parameters and, for every q ∈ {1, . . . , Q}, J_{c,q} ≤ min(Ñ_q, L_c) is the rank of D_q S̃_q X_c, (σ_j(D_q S̃_q X_c))_{1≤j≤J_{c,q}} are the singular values of D_q S̃_q X_c, and [Ψ†_q S_q X_c]_n denotes the n-th row of Ψ†_q S_q X_c. The operators S̃_q ∈ ℝ^{Ñ_q×N} and S_q ∈ ℝ^{N_q×N} extract spatially overlapping spatio-spectral facets from the full image cube for the low-rankness prior and the average sparsity prior, respectively. These two operators only differ in the number of overlapping pixels considered, which is defined as an adjustable parameter for S̃_q, and prescribed by [95] for S_q (Figure 5.1). Each facet relies on a spatial decomposition of the image into non-overlapping tiles (see Figure 5.1 (b), delineated by dashed red lines), each overlapping with its top and left spatial neighbors. In the following, the overlapping regions will be referred to as the borders of a facet, in contrast with its underlying tile (see Figure 5.1). A border facet, i.e., one which does not admit a neighbor in one of the two spatial dimensions, has the same dimension as the underlying tile in the directions where it does not admit a neighbor (e.g., corner facets have the same dimension as the underlying tile). Note that HyperSARA corresponds to the case Q = C = 1.
As expected, the prior (5.2) reads as a sum of inter-dependent facet-specific priors. Crucially, when minimizing the convex objective (5.1) with a reweighting procedure, the splitting functionality of PDFB can now be further exploited to enable parallel processing of these facet-specific priors.
5.3 Algorithm and implementation
The parallel algorithmic structure of Faceted HyperSARA is described in this section, leveraging PDFB (Section 3.4.2) within a reweighting approach.
5.3.1 Faceted HyperSARA algorithm
To eciently address the log-sum prior underpinning the Faceted HyperSARA prior, we consider a
reweighted 1approach, which consists in successively solving convex optimization problems with
weighted 1norms [20]. The proposed Faceted HyperSARA algorithm is described in Algorithm 4.
At each iteration kN, the Faceted HyperSARA log-sum prior (5.2) is locally approximated by
the weighted hybrid norm prior
$$\widetilde{r}_c(X_c, X_c^{(k)}) = \sum_{q=1}^{Q} \left( \overline{\mu}_c \left\| D_q \widetilde{S}_q X_c \right\|_{*, \overline{\omega}_q(X_c^{(k)})} + \mu_c \left\| \Psi_q^\dagger S_q X_c \right\|_{2,1,\, \omega_q(X_c^{(k)})} \right), \quad (5.3)$$

where for every $q \in \{1, \ldots, Q\}$, the weights $\overline{\omega}_q(X_c^{(k)}) = (\overline{\omega}_{q,j}(X_c^{(k)}))_{1 \leq j \leq J_{c,q}}$ and $\omega_q(X_c^{(k)}) = (\omega_{q,n}(X_c^{(k)}))_{1 \leq n \leq T_q}$ are given by

$$\overline{\omega}_{q,j}(X_c^{(k)}) = \left( \sigma_j\!\left( D_q \widetilde{S}_q X_c^{(k)} \right) + \upsilon \right)^{-1}, \quad (5.4)$$

$$\omega_{q,n}(X_c^{(k)}) = \left( \left\| [\Psi_q^\dagger S_q X_c^{(k)}]_n \right\|_2 + \upsilon \right)^{-1}. \quad (5.5)$$
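As a minimal numerical sketch of the weight formulas (5.4)-(5.5), assuming NumPy arrays and using `Z` to stand in for the faceted matrix $D_q \widetilde{S}_q X_c^{(k)}$ and `A` for the coefficient matrix $\Psi_q^\dagger S_q X_c^{(k)}$ (function names are hypothetical):

```python
import numpy as np

def lowrank_weights(Z, upsilon):
    """Weights (5.4): inverse of (singular values + upsilon) of the faceted matrix Z."""
    s = np.linalg.svd(Z, compute_uv=False)
    return 1.0 / (s + upsilon)

def sparsity_weights(A, upsilon):
    """Weights (5.5): inverse of (row-wise l2 norms + upsilon) of the coefficients A."""
    row_norms = np.linalg.norm(A, axis=1)
    return 1.0 / (row_norms + upsilon)
```

Large singular values (resp. rows) receive small weights and are thus penalized less, which is the mechanism by which the reweighting approximates the log-sum prior.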
Then, for each sub-cube $c \in \{1, \ldots, C\}$, the associated minimization problem

$$\underset{X_c \in \mathbb{R}_+^{N \times L_c}}{\text{minimize}} \;\; \sum_{l=1}^{L_c} \sum_{b=1}^{B} \iota_{\mathcal{B}(y_{c,l,b},\, \epsilon_{c,l,b})}(\Phi_{c,l,b}\, x_{c,l}) + \sum_{q=1}^{Q} \left( \overline{\mu}_c \left\| D_q \widetilde{S}_q X_c \right\|_{*, \overline{\omega}_q(X_c^{(k)})} + \mu_c \left\| \Psi_q^\dagger S_q X_c \right\|_{2,1,\, \omega_q(X_c^{(k)})} \right) \quad (5.6)$$
is solved using the PDFB algorithm described in Algorithm 5. At the beginning of the algorithm, the weights are initialized to one (see Algorithm 4, line 4, where the notation $1_{J_c}$ stands for the vector of size $J_c$ with all coefficients equal to 1, and $J_c = J_{c,1} + \ldots + J_{c,Q}$). Note that the weights defined in (5.4)-(5.5) are multiplied by the regularization parameter $\upsilon$ in Algorithm 4, which is equivalent to rescaling the sub-problem (5.6) by $\upsilon$. This does not affect the set of minimizers of the global problem. In a similar fashion to HyperSARA (Section 4.3.4), the regularization parameter $\upsilon$ in (5.4)-(5.5) should decrease from one iteration $k$ to the next by a factor of 80% to improve the convergence rate and the stability of the algorithm.
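The resulting outer loop can be sketched as follows; `solve` and `update_weights` are hypothetical placeholders for Algorithm 5 and the weight formulas (5.4)-(5.5), used here only to illustrate the control flow and the geometric decay of $\upsilon$:

```python
import numpy as np

def reweighted_scheme(X0, solve, update_weights, n_reweights=5, upsilon0=1.0, decay=0.8):
    """Sketch of the outer reweighting loop: alternate a weighted solve with a
    weight refresh, shrinking upsilon by the same factor at every pass."""
    X, upsilon = X0, upsilon0
    weights = None  # interpreted as all-ones weights on the first pass
    for _ in range(n_reweights):
        X = solve(X, weights)                 # weighted sub-problem (placeholder)
        upsilon *= decay                      # decrease upsilon by a factor of 80%
        weights = update_weights(X, upsilon)  # refresh weights as in (5.4)-(5.5)
    return X, upsilon

# Toy stand-ins, just to exercise the control flow:
X, upsilon = reweighted_scheme(
    np.zeros((4, 2)),
    solve=lambda X, w: X,
    update_weights=lambda X, u: 1.0 / (np.abs(X) + u),
    n_reweights=5,
)
```

After 5 passes, `upsilon` equals $0.8^5 \approx 0.328$, illustrating the decay schedule.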
Chapter 5: Faceted HyperSARA for wideband RI imaging: when precision meets scalability 76
A complete description of the proposed PDFB algorithm used to solve the sub-problems (5.6) (see Algorithm 4, line 9) is provided in the next section.
5.3.2 Underpinning primal-dual forward-backward algorithm
At each iteration $k \in \mathbb{N}$, line 9 of Algorithm 4 requires solving a sub-problem of the form (5.6), corresponding to the approximation of (5.1) at $X^{(k)}$. The main advantage of a primal-dual algorithm such as PDFB lies in the possibility of updating most of the variables to be estimated in parallel, without resorting to costly operator inversions or sub-iterations. In this work, we resort to a preconditioned variant of PDFB, which reduces the number of iterations necessary to converge. A graphical illustration of the algorithm, instantiated for problem (5.6), is given in Figure 5.3. A formal description is reported in Algorithm 5. Note that PDFB can simultaneously handle the data-fidelity terms and the image priors in parallel by associating a separate dual variable with each of them (see Algorithm 5).
In Algorithm 5, the three main terms appearing in problem (5.6) (i.e., low-rankness prior, average sparsity prior, and data-fidelity term) are handled in parallel, in the dual domain, through their proximity operators. Their exact expressions are provided in Appendix .2. First, the faceted low-rankness prior is handled in lines 11-13 by computing in parallel the proximity operator of the per-facet weighted nuclear norms. Second, the average sparsity prior is addressed in lines 15-17 by computing the proximity operator of the weighted $\ell_{2,1}$ norm in parallel. Finally, the data-fidelity terms are handled in parallel in lines 19-21 by computing, for every $(c, l, b)$, the projection onto the $\ell_2$ ball $\mathcal{B}(y_{c,l,b}, \epsilon_{c,l,b})$ with respect to the metric induced by the diagonal matrix $U_{c,l,b}$, chosen using the preconditioning strategy explained in Section 4.3.2. More precisely, its diagonal coefficients are the inverse of the sampling density in the vicinity of the probed Fourier modes. The projections onto the $\ell_2$ balls for the metric induced by $U_{c,l,b}$ do not admit an analytic expression, and thus need to be approximated numerically through sub-iterations. In this work, we resort to FISTA [9].
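For illustration, the two proximity operators handled on the facet cores admit simple closed forms. The sketch below (NumPy, hypothetical function names) applies row-wise soft-thresholding for a weighted $\ell_{2,1}$ norm and singular-value soft-thresholding for a weighted nuclear norm (the latter being exact when the weights are suitably ordered):

```python
import numpy as np

def prox_weighted_l21(A, gamma, w):
    """Prox of gamma * sum_n w[n] * ||A[n, :]||_2: row-wise soft-thresholding."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - gamma * w[:, None] / np.maximum(norms, 1e-300))
    return scale * A

def prox_weighted_nuclear(Z, gamma, w):
    """Soft-threshold the singular values of Z by gamma * w (weighted nuclear norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - gamma * w, 0.0)) @ Vt
```

Rows (resp. singular values) whose magnitude falls below the weighted threshold are set exactly to zero, which is what promotes joint sparsity (resp. low-rankness).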
In a similar fashion to Section 4.3.2, Algorithm 5 is guaranteed to converge to a global solution to problem (5.6), for a given sub-cube $c \in \{1, \ldots, C\}$, provided that the preconditioning matrices $(U_{c,l,b})_{c,l,b}$ and the parameters $(\tau, \zeta, \eta, \kappa)$ satisfy the following condition:

$$\tau \left( \zeta \| \widetilde{S} \|_S^2 + \eta \| U_c^{1/2} \Phi_c \|_S^2 + \kappa \| \Psi^\dagger \|_S^2 \right) < 1. \quad (5.7)$$
The notation $\| \cdot \|_S$ denotes the spectral norm of an operator, $\widetilde{S} = (\widetilde{S}_q)_{1 \leq q \leq Q}$, and for every $X_c \in \mathbb{R}^{N \times L_c}$, $U_c^{1/2} \Phi_c(X_c) = \big( U_{c,l,b}^{1/2} \Phi_{c,l,b}\, x_{c,l} \big)_{1 \leq l \leq L_c,\, 1 \leq b \leq B}$. In particular, we propose to set these parameters as follows:

$$\zeta = \frac{1}{\| \widetilde{S} \|_S^2} = 1, \quad \eta = \frac{1}{\| U_c^{1/2} \Phi_c \|_S^2}, \quad \kappa = \frac{1}{\| \Psi^\dagger \|_S^2}. \quad (5.8)$$
In this setting, convergence is guaranteed for all 0< τ < 1/3.
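The spectral norms entering (5.7)-(5.8) are typically estimated numerically. A standard power-iteration sketch (a hypothetical helper for a generic linear operator given together with its adjoint) reads:

```python
import numpy as np

def spectral_norm_sq(op, op_adj, shape, n_iter=50, seed=0):
    """Estimate ||op||_S^2 by power iteration on the composition op_adj(op(.))."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for _ in range(n_iter):
        x = op_adj(op(x))        # apply A^dagger A
        x /= np.linalg.norm(x)   # renormalize to avoid overflow
    y = op(x)
    return np.vdot(y, y).real / np.vdot(x, x).real

# Step size as in (5.8), for a small matrix operator Phi (toy example):
Phi = np.array([[2.0, 0.0], [0.0, 1.0]])
eta = 1.0 / spectral_norm_sq(lambda x: Phi @ x, lambda y: Phi.T @ y, (2,))
```

Here $\|\Phi\|_S = 2$, so the estimated step size is $\eta \approx 1/4$.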
We recall that PDFB can accommodate randomization in the update of the variables, e.g., by randomly selecting a subset of the data and facet dual variables to be updated at each iteration. This procedure can significantly alleviate the memory load per node [93], at the expense of an increased number of iterations for the algorithm to converge. This feature, which has been specifically investigated in Appendix .4, is not leveraged in the implementation of Algorithm 5 used for the experiments reported in Sections 5.4 and 5.5.
5.3.3 Parallel algorithmic structure
To solve a spectral sub-problem $c \in \{1, \ldots, C\}$, different parallelization strategies can be adopted, depending on the computing resources available and the size of the problem to be addressed. We propose to divide the variables to be estimated between the two following groups of computing cores.
Data cores: Each core involved in this group is responsible for the update of several dual variables $v_{c,l,b} \in \mathbb{C}^{M_{c,l,b}}$ associated with the data-fidelity terms (see Algorithm 5, line 20). These cores produce auxiliary variables $\widetilde{v}_{c,l,b} \in \mathbb{R}^N$ of single-channel image size, each assumed to be held in the memory of a single core (line 21). Note that the Fourier transform computed for each channel $l$ in line 7 is performed once per iteration on the data core $(l, 1)$. Each data core $(l, b)$, with $b \in \{2, \ldots, B\}$, receives only a few coefficients of the Fourier transform of $x_l$ from the data core $(l, 1)$, selected by the operator $M_{c,l,b}$ (line 9);
Facet cores: Each worker involved in this group, composed of $Q$ cores, is responsible for the update of an image tile (i.e., a portion of the primal variable) and the dual variables $P_{c,q}$ and $W_{c,q}$ associated with the low-rankness and the joint average sparsity priors, respectively (Algorithm 5, lines 12 and 16). Note that the image cube is stored across the different facet cores, which are responsible for updating their image tile (line 26). Since the facets underlying the proposed prior overlap, communications involving a maximum of 4 contiguous facet cores are needed to build the facet borders prior to updating the facets independently in the dual space (Algorithm 5, lines 11-17). Values of the tile of each facet are broadcast to cores handling neighboring facets in order to update their borders (Algorithm 5, line 5, see Figure 5.2(a)). In a second step, parts of the facet tiles overlapping with borders of nearby facets need to be updated before each tile is updated independently in the primal space (Algorithm 5, line 24). More precisely, values of the parts of the borders overlapping with the tile of each facet are broadcast by the workers handling neighboring facets, and averaged (see Figure 5.2(b)).
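The two communication steps can be mimicked in one dimension as follows (a toy sketch with hypothetical helpers, not the MPI-like implementation): each facet gathers its pixels, including borders, from the global array, and the overlapping contributions are then averaged back onto the global array:

```python
import numpy as np

def gather_facets(x, facets):
    """Step (a): each facet reads its pixels, borders included, from the
    global 1D array (standing in for broadcasts from neighbouring tiles)."""
    return [x[a:b].copy() for (a, b) in facets]

def scatter_average(facet_vals, facets, n_pix):
    """Step (b): accumulate the facet contributions and average wherever
    two or more facets overlap."""
    acc = np.zeros(n_pix)
    cnt = np.zeros(n_pix)
    for vals, (a, b) in zip(facet_vals, facets):
        acc[a:b] += vals
        cnt[a:b] += 1
    return acc / cnt

# Two facets over 6 pixels, overlapping on pixels 2 and 3:
x = np.arange(6.0)
facets = [(0, 4), (2, 6)]
restored = scatter_average(gather_facets(x, facets), facets, 6)
```

When the facet values are left unchanged, gathering then scatter-averaging recovers the original array, confirming that the averaging is consistent on the overlap.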
The complexity of the proposed algorithmic structure with parallelized data and facet cores is defined by the size of one data block and the size of one image facet. The only requirement for the algorithm to scale to big data and image dimensions is to have a multitude of processing
[Figure 5.2 panels, each of spatial size $N_{x,q} \times N_{y,q}$ and depth $L_c$: (a) broadcast values of the tile before facet update in dual space; (b) broadcast and average borders before tile update in primal space.]
Figure 5.2: Illustration of the communication steps involving a facet core (represented by the top-left rectangle in each sub-figure) and a maximum of three of its neighbours. The tile underpinning each facet, located in its bottom-right corner, is delineated in thick black lines. At each iteration, the following two steps are performed sequentially. (a) Facet borders need to be completed before each facet is updated independently in the dual space (Algorithm 5, lines 11-17): values of the tile of each facet (top left) are broadcast to cores handling the neighbouring facets in order to update their borders (Algorithm 5, line 5). (b) Parts of the facet tiles overlapping with borders of nearby facets need to be updated before each tile is updated independently in the primal space (Algorithm 5, line 24): values of the parts of the borders overlapping with the tile of each facet are broadcast by the cores handling neighbouring facets, and averaged.
cores. Leveraging advanced HPC servers (see Section 5.4.2 for more details) with thousands of computing cores, the proposed algorithm has the potential to scale to the big data regimes expected with the new generation of telescopes.
5.3.4 MATLAB implementation
A MATLAB implementation of Algorithms 4 and 5 is available on the Puri-Psi webpage. Both HyperSARA and Faceted HyperSARA rely on MPI-like MATLAB parallelization features based on the spmd MATLAB construct, using composite MATLAB variables to handle parameters distributed across several cores (e.g., for the wideband image cube). In this setting, 1 CPU core specifically ensures communication synchronization between the different computing cores, and is referred to as the master core in the following.
5.4 Validation on synthetic data
In this section, the impact of spatial faceting is first assessed in terms of both reconstruction quality and computing time for a single spectral sub-problem, using a varying number of facets and a varying size of the overlapping regions. The impact of spectral faceting on the reconstruction performance of Faceted HyperSARA is then quantified for a single underlying spatial tile ($Q = 1$).
[Figure 5.3 schematic: data cores 1 to $L_c$ each apply a forward-backward (FB) step (forward step, then proximity operator) to their data block through $\Phi_{c,l}$ and $\Phi_{c,l}^\dagger$; facet cores 1 to $Q$ each apply an FB step to their spatio-spectral facet $(\widetilde{S}_q X_c, S_q X_c)$.]
Figure 5.3: Illustration of the two groups of cores described in Section 5.3, with the main steps involved in PDFB (Algorithm 5) applied to each independent sub-problem $c \in \{1, \ldots, C\}$, considering $Q$ facets (along the spatial dimension) and $B = 1$ data block per channel. Data cores handle variables of the size of data blocks (Algorithm 5, lines 19-21), whereas facet cores handle variables of the size of a spatio-spectral facet (Algorithm 5, lines 11-17). Communications between the two groups are represented by colored arrows. Communications between facet cores, induced by the overlap between the spatio-spectral facets, are illustrated in Figure 5.2.
Results are compared with those of both SARA and HyperSARA.
5.4.1 Simulation setting
5.4.1.1 Images and data
Following the procedure described in Section 4.4, a wideband model image composed of $L$ spectral channels is simulated from an image of the W28 supernova remnant of size $N$. The measurement operator relies on a realistic VLA uv-coverage generated within the frequency range $[\nu_1, \nu_L] = [1, 2]$ GHz with uniformly sampled channels and a total observation time of 6 hours. Note that the uv-coverage associated with each channel indexed $l$ corresponds to the reference uv-coverage at the frequency $\nu_1$ scaled by the factor $\nu_l / \nu_1$. The data are corrupted by an additive, zero-mean complex white Gaussian noise of variance $\varrho^2$. An iSNR (4.13) of 60 dB is considered. As explained in Section 4.2.2, the regularization parameters of HyperSARA can be set as $\overline{\mu} = 1$, and $\mu$ computed from the dirty wideband model cube as $\mu = \| X^{\text{dirty}} \|_* / \| \Psi^\dagger X^{\text{dirty}} \|_{2,1}$, where $X^{\text{dirty}}$ denotes the dirty image cube. For Faceted HyperSARA, we have observed that setting $\overline{\mu}_c = 1$ and $\mu_c = 10^{-2} \| X_c^{\text{dirty}} \|_* / \| \Psi^\dagger X_c^{\text{dirty}} \|_{2,1}$ leads to a good trade-off to recover high-resolution, high-dynamic-range model cubes. Given the higher computational cost of HyperSARA, the size of the data is chosen so that it can be run in a reasonable amount of time for the different simulation
Algorithm 4: Faceted HyperSARA approach
Input: $X^{(0)} = (X_c^{(0)})_c$, $P^{(0)} = (P_c^{(0)})_c$, $W^{(0)} = (W_c^{(0)})_c$, $v^{(0)} = (v_c^{(0)})_c$
1:  $k \leftarrow 0$;
2:  // Initialization of the weights
3:  for $c = 1$ to $C$ do
4:      $\theta_c^{(0)} = (\theta_{c,q}^{(0)})_{1 \leq q \leq Q} = 1_{T_c}$; $\overline{\theta}_c^{(0)} = (\overline{\theta}_{c,q}^{(0)})_{1 \leq q \leq Q} = 1_{J_c}$;
5:  while stopping criterion not satisfied do
6:      // Solve spectral sub-problems in parallel
7:      for $c = 1$ to $C$ do
8:          // Run Algorithm 5
9:          $(X_c^{(k+1)}, P_c^{(k+1)}, W_c^{(k+1)}, v_c^{(k+1)}) = \text{PDFB}\big(X_c^{(k)}, P_c^{(k)}, W_c^{(k)}, v_c^{(k)}, \theta_c^{(k)}, \overline{\theta}_c^{(k)}\big)$;
10:         for $q = 1$ to $Q$ do
11:             // Update weights: low-rankness prior
12:             $\overline{\theta}_{c,q}^{(k+1)} = \upsilon\, \overline{\omega}_q(X_c^{(k+1)})$;  // using (5.4)
13:             // Update weights: joint average sparsity prior
14:             $\theta_{c,q}^{(k+1)} = \upsilon\, \omega_q(X_c^{(k+1)})$;  // using (5.5)
15:     $k \leftarrow k + 1$;
Result: $X^{(k)}, P^{(k)}, W^{(k)}, v^{(k)}$
scenarios described below.
5.4.1.2 Spatial faceting
The performance of Faceted HyperSARA is first evaluated with $C = 1$ (number of facets along the spectral dimension) for different parameters of the spatial faceting. Data generated from an $N = 1024 \times 1024$ image composed of $L = 20$ channels are considered, with $M_l = 0.5N$ measurements per channel. The assessment is conducted with (i) a varying $Q$ (number of facets along the spatial dimensions) and a fixed overlap; (ii) a fixed number of facets and a varying spatial overlap for the nuclear norm regularization. Additional details are given below.
Varying overlap: Reconstruction performance and computing time are evaluated with $C = 1$ and $Q = 16$ (4 facets along each spatial dimension) and a varying size of the overlapping region for the faceted nuclear norm (0%, 6%, 20%, 33% and 50% of the spatial size of the facet, corresponding to 0, 16, 64, 128 and 256 pixels, respectively) in each of the two spatial dimensions. Note that the overlap for the $\ell_{2,1}$ prior is a fixed parameter [95]. The comparison is conducted between SARA (with $\widetilde{\mu} = 10^{-3}$), HyperSARA (with $\overline{\mu} = 1$ and $\mu = 10^{-3}$) and Faceted HyperSARA (with $\overline{\mu}_c = 1$ and $\mu_c = 10^{-5}$).
Varying number of facets: The reconstruction performance and computing time of Faceted
Algorithm 5: The PDFB algorithm underpinning Faceted HyperSARA
Data: $(y_{c,l,b})_{l,b}$, $l \in \{1, \ldots, L_c\}$, $b \in \{1, \ldots, B\}$
Input: $X_c^{(0)}$, $P_c^{(0)} = (P_{c,q}^{(0)})_q$, $W_c^{(0)} = (W_{c,q}^{(0)})_q$, $v_c^{(0)} = (v_{c,l,b}^{(0)})_{l,b}$, $\theta_c = (\theta_{c,q})_{1 \leq q \leq Q}$, $\overline{\theta}_c = (\overline{\theta}_{c,q})_{1 \leq q \leq Q}$
Parameters: $(D_{c,q})_q$, $(U_{c,l,b})_{l,b}$, $(\epsilon_{c,l,b})_{l,b}$, $\overline{\mu}_c$, $\mu_c$, $\tau$, $\zeta$, $\eta$, $\kappa$
1:  $t \leftarrow 0$; $\xi = +\infty$; $\check{X}_c^{(0)} = X_c^{(0)}$;
2:  while $\xi > 10^{-5}$ do
3:      // Broadcast auxiliary variables
4:      for $q = 1$ to $Q$ do
5:          $\widetilde{X}_{c,q}^{(t)} = \widetilde{S}_q \check{X}_c^{(t)}$; $\check{X}_{c,q}^{(t)} = S_q \check{X}_c^{(t)}$;
6:      for $l = 1$ to $L_c$ do
7:          $\hat{x}_{c,l}^{(t)} = F Z \check{x}_{c,l}^{(t)}$;  // Fourier transforms
8:          for $b = 1$ to $B$ do
9:              $\hat{x}_{c,l,b}^{(t)} = M_{c,l,b}\, \hat{x}_{c,l}^{(t)}$;  // send to data cores
10:     // Update low-rankness variables [facet cores]
11:     for $q = 1$ to $Q$ do
12:         $P_{c,q}^{(t+1)} = \big( I_{J_q} - \mathrm{prox}_{\zeta^{-1} \overline{\mu}_c \| \cdot \|_{*, \overline{\theta}_{c,q}}} \big)\big( P_{c,q}^{(t)} + D_{c,q} \widetilde{X}_{c,q}^{(t)} \big)$;
13:         $\widetilde{P}_{c,q}^{(t+1)} = D_q^\dagger P_{c,q}^{(t+1)}$;
14:     // Update sparsity variables [facet cores]
15:     for $q = 1$ to $Q$ do
16:         $W_{c,q}^{(t+1)} = \big( I_{T_q} - \mathrm{prox}_{\kappa^{-1} \mu_c \| \cdot \|_{2,1, \theta_{c,q}}} \big)\big( W_{c,q}^{(t)} + \Psi_q^\dagger \check{X}_{c,q}^{(t)} \big)$;
17:         $\widetilde{W}_{c,q}^{(t+1)} = \Psi_q W_{c,q}^{(t+1)}$;
18:     // Update data-fidelity variables [data cores]
19:     for $(l, b) = (1, 1)$ to $(L_c, B)$ do
20:         $v_{c,l,b}^{(t+1)} = U_{c,l,b} \big( I_{M_{c,l,b}} - \mathrm{prox}^{U_{c,l,b}}_{\iota_{\mathcal{B}(y_{c,l,b}, \epsilon_{c,l,b})}} \big)\big( U_{c,l,b}^{-1} v_{c,l,b}^{(t)} + \Theta_{c,l,b} G_{c,l,b}\, \hat{x}_{c,l,b}^{(t)} \big)$;
21:         $\widetilde{v}_{c,l,b}^{(t+1)} = G_{c,l,b}^\dagger \Theta_{c,l,b}^\dagger v_{c,l,b}^{(t+1)}$;
22:     // Inter-node communications
23:     for $l = 1$ to $L_c$ do
24:         $h_{c,l}^{(t)} = \sum_{q=1}^{Q} \big( \zeta \widetilde{S}_q^\dagger \widetilde{p}_{c,q,l}^{(t+1)} + \kappa S_q^\dagger \widetilde{w}_{c,q,l}^{(t+1)} \big) + \eta Z^\dagger F^\dagger \sum_{b=1}^{B} M_{c,l,b}^\dagger \widetilde{v}_{c,l,b}^{(t+1)}$;
25:     // Update image tiles [on facet cores, in parallel]
26:     $X_c^{(t+1)} = \mathrm{prox}_{\iota_{\mathbb{R}_+^{N \times L_c}}}\big( X_c^{(t)} - \tau H_c^{(t)} \big)$;
27:     $\check{X}_c^{(t+1)} = 2 X_c^{(t+1)} - X_c^{(t)}$;  // communicate facet borders
28:     $\xi = \| X_c^{(t+1)} - X_c^{(t)} \|_F \,/\, \| X_c^{(t)} \|_F$;
29:     $t \leftarrow t + 1$;
Result: $X_c^{(t)}, P_c^{(t)}, W_c^{(t)}, v_c^{(t)}$
HyperSARA are reported for experiments with $Q \in \{4, 9, 16\}$ (corresponding to 2, 3 and 4 facets along each spatial dimension) with a fixed overlap corresponding to 50% of the spatial size of a facet. The regularization parameters are set to the same values as those considered in the experiment with a varying overlap.
5.4.1.3 Spectral faceting
The inuence of spectral faceting is evaluated in terms of computing time and reconstruction quality
from data generated with a ground truth image composed of N= 256 ×256 pixels in L= 100
channels, with Ml=Nmeasurements per channel. The overall reconstruction performance of
SARA (with eµ= 102), HyperSARA (with µ= 1 and µ= 102) and Faceted HyperSARA (with
µc= 1 and µc= 102) with a single facet along the spatial dimension (Q= 1) is compared.
For faceted HyperSARA, a channel-interleaving process with a varying number of facets along the
spectral dimension Cis considered (see Section 5.2 and Figure 5.1 (b)). The simulation scenario
involves facets composed of a varying number of channels Lc(Lc6, 10, 14, 20, 33 and 50
channels for each sub-problem c∈ {1, . . . , C}) obtained by down-sampling the data cube along the
frequency dimension.
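The channel interleaving used to form the spectral sub-cubes can be sketched as a simple index assignment (hypothetical helper name): sub-cube $c$ takes every $C$-th channel, so each sub-cube still spans the whole band.

```python
def interleaved_channels(C, L):
    """Assign the L channels to C sub-cubes by interleaving: sub-cube c gets
    channels c, c + C, c + 2C, ... (0-based), preserving spectral coverage."""
    return [list(range(c, L, C)) for c in range(C)]

# Example matching the C = 10 scenario: each sub-cube holds L_c = 10 channels.
subcubes = interleaved_channels(10, 100)
```

Each sub-cube thus retains a coarse but complete sampling of the frequency axis, which is what lets the low-rankness and joint-sparsity priors remain effective within a sub-problem.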
5.4.2 Hardware
All the methods compared in this section have been run on a single compute node of Cirrus, one of the UK's Tier 2 HPC services¹. Cirrus is an SGI ICE XA system composed of 280 compute nodes, each with two 2.1 GHz, 18-core, Intel Xeon E5-2695 (Broadwell) series processors. The compute nodes have 256 GB of memory shared between the two processors. The system has a single Infiniband FDR network connecting nodes with a bandwidth of 54.5 Gb/s. Note that the number of cores assigned to each group of cores of Faceted HyperSARA (i.e., data and facet cores) has been chosen to ensure a reasonable balance between the different computing tasks.
5.4.3 Evaluation metrics
Performance is evaluated in terms of global computing time (elapsed real time) and reconstruction SNR, defined for each channel $l \in \{1, \ldots, L\}$ as in (4.19). Results are reported in terms of the average SNR (aSNR) (4.20). Since this criterion shows limitations in reflecting the dynamic range, and thus in appreciating improvements in the quality of faint emissions, the following criterion is computed over images in $\log_{10}$ scale:

$$\text{SNR}_{\log, l}(x_l) = 20 \log_{10} \frac{\left\| \log_{10}(\overline{x}_l + \varepsilon 1_N) \right\|_2}{\left\| \log_{10}(x_l + \varepsilon 1_N) - \log_{10}(\overline{x}_l + \varepsilon 1_N) \right\|_2},$$

with $\overline{x}_l$ the ground-truth image of channel $l$,
¹ https://epsrc.ukri.org/research/facilities/hpc/tier2/
where the $\log_{10}$ function is applied term-wise, and $\varepsilon$ is an arbitrarily small parameter to avoid numerical issues ($\varepsilon$ is set to machine precision). Results are similarly reported in terms of the average log-SNR, defined as

$$\text{aSNR}_{\log}(X) = \frac{1}{L} \sum_{l=1}^{L} \text{SNR}_{\log, l}(x_l). \quad (5.9)$$
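A direct transcription of the log-scale metric and its average (5.9) can be sketched as follows (NumPy, hypothetical function names), assuming the ground-truth channel image is available as `x_true`:

```python
import numpy as np

def snr_log(x, x_true, eps=np.finfo(float).eps):
    """Log-scale SNR of a channel image x against the ground truth x_true."""
    lx = np.log10(x + eps)       # log10 applied term-wise
    lt = np.log10(x_true + eps)
    return 20.0 * np.log10(np.linalg.norm(lt) / np.linalg.norm(lx - lt))

def asnr_log(X, X_true):
    """Average log-SNR over the L channels (channels along the first axis)."""
    return np.mean([snr_log(x, t) for x, t in zip(X, X_true)])
```

Because the comparison is performed on $\log_{10}$ images, errors on faint pixels weigh as much as errors on bright ones, which is exactly why this metric is sensitive to the dynamic range.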
5.4.4 Results and discussion
5.4.4.1 Spatial faceting
Varying spatial overlap: The results reported in Table 5.1 show that spatial faceting gives a good reconstruction of high-intensity pixels (reflected by an aSNR close to that of HyperSARA). Even if the performance of the proposed approach does not vary much in terms of aSNR as the overlap for the faceted nuclear norm increases, the aSNRlog improves significantly. This reflects the ability of the proposed prior to enhance the estimation of faint emissions and finer details by promoting local correlations. This observation is further confirmed by the reconstructed images, reported in Jy/pixel in Figure 5.4 for channel $\nu_1 = 1$ GHz and in Figure 5.5 for channel $\nu_{20} = 2$ GHz, showing that Faceted HyperSARA reconstructs images with a higher dynamic range (see the zoomed region delineated in white in Figures 5.4 and 5.5). The associated residual images (last row of Figures 5.4 and 5.5) are comparable to or better than those of HyperSARA. Note that the regular patterns observed in the residual images do not result from the faceting, as they are not aligned with the facet borders and appear for both approaches. From a computational point of view, Table 5.1 shows that increasing the overlap size results in a moderate increase in the computing time. Overall, an overlap of 50% gives the best reconstruction SNR for a reasonable computing time, and will thus be considered as the default faceting setting for the real data experiments reported in Section 5.5.
Varying number of facets $Q$ along the spatial dimension: The reconstruction performance and computing time reported in Table 5.2 show that Faceted HyperSARA gives an almost constant reconstruction performance as the number of facets increases, for an overall computing time getting closer to that of the SARA approach. The dynamic range of the reconstructed images (second row of Figures 5.4 and 5.5) is notably higher for the faceted approach, as indicated by the aSNRlog values reported in Table 5.2. These results confirm the potential of the proposed approach to scale to large image sizes by increasing the number of facets along the spatial dimensions, while ensuring a stable reconstruction level as the number of facets increases. In particular, the setting $Q = 16$ is reported to ensure a satisfactory reconstruction performance for a significantly reduced computing time.
In both experiments, Faceted HyperSARA has a much lower SNR standard deviation than
                                     | Time (h) | aSNR (dB)          | aSNRlog (dB)  | CPU cores
SARA                                 | 5.89     | 32.78 (±2.76)      | -1.74 (±0.83) | 120
HyperSARA                            | 133.1    | 38.63 (±0.23)      | -0.39 (±0.95) | 22
Faceted nuclear norm overlap (0%)    | 26.26    | 37.03 (±2.90×10⁻³) | 5.09 (±1.09)  | 36
Faceted nuclear norm overlap (6%)    | 18.01    | 37.01 (±1.00×10⁻³) | 4.09 (±0.99)  | 36
Faceted nuclear norm overlap (20%)   | 18.11    | 36.86 (±0.90×10⁻³) | 4.51 (±1.07)  | 36
Faceted nuclear norm overlap (33%)   | 17.94    | 36.98 (±1.60×10⁻³) | 6.00 (±1.05)  | 36
Faceted nuclear norm overlap (50%)   | 20.75    | 37.08 (±1.60×10⁻³) | 7.88 (±0.91)  | 36
Table 5.1: Spatial faceting experiment: varying size of the overlap region for the faceted nuclear norm regularization. Reconstruction performance of Faceted HyperSARA with $Q = 16$ and $C = 1$, compared to HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$) and SARA. The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in dB, with the associated standard deviation), and total number of CPU cores used to reconstruct the full image. The evolution of the aSNRlog, of specific interest for this experiment, is highlighted in bold face.
HyperSARA and SARA (see Tables 5.1 and 5.2), i.e., it ensures a more stable recovery quality across channels. This results from the stronger spatio-spectral correlations induced by the proposed faceted regularization, in comparison with both the HyperSARA and SARA priors.
5.4.4.2 Spectral faceting
The results reported in Table 5.3 show that Faceted HyperSARA using channel-interleaved facets retains most of the overall reconstruction performance of HyperSARA, ensuring a reconstruction quality significantly better than SARA. As expected, the reconstruction quality of faint emissions, reflected by the aSNRlog values, gradually decreases as fewer channels are involved in each facet (i.e., as $C$ increases). This observation is qualitatively confirmed by the images reported in Figure 5.6 for channel $\nu_1 = 1$ GHz and Figure 5.7 for channel $\nu_{100} = 2$ GHz (in Jy/pixel), for facets composed of 10 channels each (see the zoomed regions in Figures 5.6 and 5.7). The slight loss of dynamic range is likely due to the reduction in the amount of data per spectral sub-cube. Spectral faceting nevertheless remains computationally attractive, in that it preserves the overall imaging quality of HyperSARA up to an already significant amount of interleaving (see discussion in Section 5.2.0.1), while allowing lower-dimensional wideband imaging sub-problems to be considered (see discussion in Section 5.2). This strategy offers an increased scalability potential to Faceted HyperSARA over HyperSARA, which may prove of significant interest in extreme-dimension settings.
5.5 Validation on real data
In this section, we illustrate both the precision and scalability potential of Faceted HyperSARA through the reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA data. The algorithm is mapped onto 496 CPU cores on a high-performance computing system, achieving a TeraFLOPS proof of concept. The performance of the proposed approach is evaluated in comparison with the monochromatic imaging approach SARA [88] and the CLEAN-based wideband imaging
                             | Time (h) | aSNR (dB)          | aSNRlog (dB)  | CPU cores
SARA                         | 6.23     | 32.78 (±2.76)      | -1.74 (±0.83) | 120
HyperSARA                    | 133.08   | 38.63 (±0.23)      | -0.39 (±0.95) | 22
Faceted HyperSARA (Q = 4)    | 42.04    | 36.58 (±1.80×10⁻³) | 10.19 (±0.88) | 24
Faceted HyperSARA (Q = 9)    | 21.60    | 37.00 (±1.70×10⁻³) | 5.88 (±1.00)  | 29
Faceted HyperSARA (Q = 16)   | 17.94    | 37.08 (±1.60×10⁻³) | 7.88 (±1.05)  | 36
Table 5.2: Spatial faceting experiment: varying number of facets along the spatial dimension $Q$. Reconstruction performance of Faceted HyperSARA ($C = 1$, overlap of 50%), compared to HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$) and SARA. The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in dB, with the associated standard deviation), and total number of CPU cores used to reconstruct the full image. The evolution of the computing time, of specific interest for this experiment, is highlighted in bold face.
                                       | Time (h) | aSNR (dB)     | aSNRlog (dB)  | CPU cores
SARA                                   | 0.19     | 25.04 (±4.06) | -6.28 (±0.60) | 100
HyperSARA                              | 14.83    | 31.74 (±1.31) | -1.24 (±0.57) | 7
Faceted HyperSARA (C = 16, Lc ≈ 6)     | 1.31     | 31.05 (±0.98) | -3.54 (±1.37) | 112
Faceted HyperSARA (C = 10, Lc = 10)    | 1.87     | 31.48 (±0.82) | -3.26 (±1.43) | 70
Faceted HyperSARA (C = 7, Lc ≈ 14)     | 2.36     | 31.68 (±0.90) | -2.90 (±1.38) | 49
Faceted HyperSARA (C = 5, Lc = 20)     | 3.31     | 31.84 (±0.92) | -2.33 (±0.91) | 35
Faceted HyperSARA (C = 3, Lc ≈ 33)     | 5.10     | 32.00 (±1.04) | -2.33 (±1.07) | 21
Faceted HyperSARA (C = 2, Lc = 50)     | 7.56     | 31.97 (±1.08) | -1.63 (±0.64) | 14
Table 5.3: Spectral faceting experiment: reconstruction performance of Faceted HyperSARA with a varying number of spectral sub-problems $C$ and $Q = 1$, compared to HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$) and SARA. The results are reported in terms of reconstruction time, aSNR and aSNRlog (both in dB, with the associated standard deviation) and total number of CPU cores. The reconstruction performance of Faceted HyperSARA, specifically investigated in this experiment, is highlighted in bold face.
[Figure 5.4 panels for channel $\nu_1 = 1$ GHz: images in $\log_{10}$ scale with colour range $-5$ to $0$; residual images with colour range $\pm 3 \times 10^{-4}$.]
Figure 5.4: Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in $\log_{10}$ scale for channel $\nu_1 = 1$ GHz for Faceted HyperSARA with $Q = 16$ and $C = 1$ (left), and HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$, right). From top to bottom are reported the ground-truth image, the reconstructed image and the residual image. The overlap for the faceted nuclear norm regularization corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying the definition of the facets are delineated on the residual images in red dotted lines, with the central facet displayed in continuous lines.
[Figure 5.5 panels for channel $\nu_{20} = 2$ GHz: images in $\log_{10}$ scale with colour range $-4$ to $0$; residual images with colour range $\pm 3 \times 10^{-4}$.]
Figure 5.5: Spatial faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in $\log_{10}$ scale for channel $\nu_{20} = 2$ GHz for Faceted HyperSARA with $Q = 16$ and $C = 1$ (left), and HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$, right). From top to bottom are reported the ground-truth image, the reconstructed image and the residual image. The overlap for the faceted nuclear norm regularization corresponds to 50% of the spatial size of a facet. The non-overlapping tiles underlying the definition of the facets are delineated on the residual images in red dotted lines, with the central facet displayed in continuous lines.
[Figure 5.6 panels for channel $\nu_1 = 1$ GHz: images in $\log_{10}$ scale with colour range $-5$ to $0$; residual images with colour range $\pm 3 \times 10^{-4}$.]
Figure 5.6: Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in $\log_{10}$ scale for channel $\nu_1 = 1$ GHz with Faceted HyperSARA for $C = 10$ and $Q = 1$ (left) and HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$, right). Each sub-cube is composed of 10 out of the $L = 100$ channels. From top to bottom: ground-truth image, estimated model images and residual images.
[Figure 5.7 panels for channel $\nu_{100} = 2$ GHz: images in $\log_{10}$ scale with colour range $-4$ to $0$; residual images with colour range $\pm 3 \times 10^{-4}$.]
Figure 5.7: Spectral faceting analysis for synthetic data: reconstructed images (in Jy/pixel) reported in $\log_{10}$ scale for channel $\nu_{100} = 2$ GHz with Faceted HyperSARA for $C = 10$ and $Q = 1$ (left) and HyperSARA (i.e., Faceted HyperSARA with $Q = C = 1$, right). Each sub-cube is composed of 10 out of the $L = 100$ channels. From top to bottom: ground-truth image, estimated model images and residual images.
algorithm JC-CLEAN in the software wsclean [85].
5.5.1 Dataset description and imaging settings
The data analyzed in this section are part of wideband VLA observations of the celebrated radio galaxy Cyg A, acquired over two years (2015-2016) within the frequency range 2-18 GHz. We consider 480 channels in C band spanning the frequency range $[\nu_1, \nu_{480}] = [3.979, 8.019]$ GHz, with a frequency step $\delta\nu = 8$ MHz and a total bandwidth of 4.04 GHz. The phase center coordinates are RA = 19h 59m 28.356s (J2000) and DEC = +40° 44′ 2.07″. The data set was acquired at four instances, which correspond to the frequency ranges $[\nu_1, \nu_{256}] = [3.979, 6.019]$ GHz and $[\nu_{257}, \nu_{480}] = [5.979, 8.019]$ GHz, and VLA configurations A and C. The wideband data consist of 30 spectral windows, composed of 16 channels each, with approximately $10^6$ complex visibilities per channel (about $8 \times 10^5$ and $2 \times 10^5$ measurements for configurations A and C, respectively), stored as double-precision complex numbers.
In order to improve the accuracy of our modelled measurement operator, we have conducted a pre-processing step consisting in a joint DDE calibration and imaging, applied to each channel separately. The approach, originally proposed by [98], consists in the alternate estimation of the unknown DDEs and the image of interest, with a spatio-temporal smoothness DDE prior and an average sparsity image prior [98, 101, 121]. The underpinning algorithmic structure offers convergence guarantees to a critical point of the global non-convex optimization problem for joint calibration and imaging, and the approach was suggested to open the door to a significant improvement over the state of the art [98]. Note that one would ultimately want to resort to such a joint calibration and imaging approach to reconstruct the final wideband image cube [45], rather than applying it as a pre-processing step on each channel separately. However, the underpinning algorithmic structure does not enable a faceting approach like the one proposed here, thereby severely limiting its scalability. We thus restrict ourselves to using this approach separately on each channel for scalability, and essentially to estimate DDEs. These are easily integrated into the forward model (2.19), as explained in Section 2.4. The estimated model visibilities are also exploited to determine estimates of the noise statistics, and thus to compute the $\ell_2$ bounds defining the data-fidelity constraints. Note that both SARA and Faceted HyperSARA take advantage of this pre-processing step, in contrast with JC-CLEAN, as the antenna-based DDE estimates, provided in the spatial Fourier domain, cannot be incorporated into the wsclean software.
We consider the reconstruction of images of size N = 2560×1536 from data acquired in L = 480
channels, with a pixel size δx = 0.06′′ (in both directions), corresponding to the field of view (FoV)
Ω = 2.56′ × 1.536′. The pixel size is such that the spatial bandwidth of the recovered signal is
up to 1.75 times the nominal resolution at the highest frequency νL = 8.019 GHz, and 3.53 times
the nominal resolution at the lowest frequency ν1 = 3.979 GHz. For both SARA and Faceted
HyperSARA, we consider B = 2 data blocks per channel, associated with VLA configurations
91 5.5 Validation on real data
A and C and presenting different noise statistics. More specifically to Faceted HyperSARA, we
consider C = 16 channel-interleaved sub-problems with Lc = 30 for any c ∈ {1, . . . , C}, and
Q = 5×3 facets along the spatial dimensions, resulting in a total of Q×C = 240 spatio-spectral
facets.
With regards to initialization, both SARA and Faceted HyperSARA are initialized with the
wideband model cube X^(0) = (X^(0)_c)_{1≤c≤C}, obtained by the monochromatic joint calibration and
imaging pre-processing step. From now on, for any (c, l) ∈ {1, . . . , C} × {1, . . . , Lc}, the gridding
matrices G_{c,l} include the estimated DDEs, and Φ_{c,l} refers to the resulting measurement operator.
The ℓ2 bounds defining the data-fidelity constraints are approximated as follows. For each data
block indexed by (c, l, b) ∈ {1, . . . , C} × {1, . . . , Lc} × {1, . . . , B}, we set ε_{c,l,b} = ∥y_{c,l,b} − Φ_{c,l,b} x^(0)_{c,l}∥_2.
More specifically to Faceted HyperSARA, the weights of the reweighting scheme (Algorithm 4)
are initialized from the cube X^(0) estimated in the pre-processing step, based on (5.4) and (5.5).
Following the considerations from Section 5.4.1, the regularization parameters μ̄_c and μ_c are set
as μ̄_c = 1/∥X^(0)_c∥_* = 10^-2 and μ_c = 10^-2/∥Ψ†X^(0)_c∥_{2,1} = 5×10^-6. Finally, the SARA regularization
parameter μ̃ is fixed to μ̃ = 5×10^-6.
5.5.2 Hardware
All the methods investigated in this section have been run on multiple nodes of Cirrus, each with 36
cores and 256 GB of memory (see Section 5.4.2 for further details). We recall that SARA and
Faceted HyperSARA are implemented in MATLAB, whilst JC-CLEAN is implemented in C++.
Note that HyperSARA is not considered for the following experiment due to its prohibitive cost.
5.5.3 Evaluation metrics
We rst evaluate imaging precision by visually inspecting the images obtained with the proposed
Faceted HyperSARA, in comparison with the monochromatic imaging approach SARA [88] and
JC-CLEAN [85]. For Faceted HyperSARA and SARA, we consider the estimated model cube X
and the naturally-weighted residual image cube Rwhose columns, indexed by l∈ {1, . . . , L}, are
given by rl=ηlΦ
l(ylΦlxl), where yl=Θl¯
ylare the naturally-weighted RI measurements,
Θlis the corresponding noise-whitening matrix and Φl=ΘlGlFZ is the associated measurement
operator. The normalization factor ηlis obtained such that the associated PSF, given by ηlΦ
lΦlδ,
has a peak value equal to 1, where δRNis an image with value 1 at the phase center and
zero elsewhere. In contrast with SARA and Faceted HyperSARA, for which natural weighting is
adopted, optimal results are obtained for JC-CLEAN with Briggs weighting [16]. We consider the
Briggs-weighted residual image cube e
R= (e
rl)1lLand the restored image cube T= (tl)1lL
whose columns are dened as tl=xlcl+e
rl, where xlis the estimated model image and clis
the CLEAN beam (typically a Gaussian tted to the primary lobe of the associated PSF). As a
Chapter 5: Faceted HyperSARA for wideband RI imaging: when precision meets scalability 92
quantitative metric of delity to data the average standard deviation (aSTD) (4.23) is reported for
the three residual image cubes. The computing time (elapsed time), resources (number of CPU
cores) and overall computing cost of the dierent approaches are reported in Table 5.4 to assess
their scalability.
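The residual definition and PSF normalization above can be illustrated on a toy linear operator (a hypothetical Python sketch with a random 1D "measurement" matrix standing in for Φ_l; the actual operator involves Θ_l, G_l, F and Z):

```python
import numpy as np

def residual_image(y, Phi, Phi_adj, x, N):
    """Naturally-weighted residual r = eta * Phi^dagger (y - Phi x), with eta set
    so that the PSF eta * Phi^dagger Phi delta has peak value 1 (delta: unit spike)."""
    delta = np.zeros(N)
    delta[N // 2] = 1.0            # toy 1D stand-in for the phase-center spike
    psf = Phi_adj(Phi(delta))
    eta = 1.0 / psf.max()
    return eta * Phi_adj(y - Phi(x))

# toy operator: random Gaussian matrix playing the role of the measurement operator
rng = np.random.default_rng(1)
A = rng.standard_normal((32, 16)) / np.sqrt(32)
x_true = rng.standard_normal(16)
y = A @ x_true                      # noiseless data, so the residual should vanish
r = residual_image(y, lambda v: A @ v, lambda v: A.T @ v, x_true, 16)
```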
5.5.4 Results and discussion
5.5.4.1 Imaging quality
To assess the reconstruction quality of our approach in comparison with SARA and JC-CLEAN, we
first examine the estimated images of channels ν1 = 3.979 GHz and ν480 = 8.019 GHz, displayed
in Figures 5.8 and 5.9, respectively. Note that these channels correspond to channel indexes 1
and 30 of the 1st and 16th sub-problems, respectively. We then examine the average images of
the estimated cubes in Figure 5.10. Furthermore, we provide an analysis of the full estimated
image cubes obtained with the three methods, which are available online [117]. The displayed
images are overlaid with zooms on selected key regions of the radio galaxy. These are (i) the west
hotspot (top left, left panel) over the angular area Ω1 = 0.08′ × 0.08′, centered at the position
given by RA = 19h 59mn 33.006s (J2000) and DEC = +40°43′40.889′′, and (ii) the inner core
of Cyg A (top left, right panel) over the angular area Ω2 = 0.03′ × 0.03′, centered at the position
RA = 19h 59mn 28.345s (J2000) and DEC = +40°44′2.015′′. Note that the scale ranges of the
displayed zooms are adapted to ensure a clear visualization of the contrast within the different
structures of Cyg A.
In general, a visual inspection of the reconstructed images displayed in Figures 5.8 and 5.9
(log10 scale) indicates superior imaging quality of the Faceted HyperSARA model images (top row),
compared to the model images of SARA (middle row) and the restored images of JC-CLEAN
(bottom row), with SARA imaging quality outperforming JC-CLEAN. On the one hand, the
higher resolution of Faceted HyperSARA is reflected by a better reconstruction of the hotspots
and the inner core of Cyg A, in particular at the low-frequency channels (see Figure 5.8, first
row, top left zooms). On the other hand, its higher dynamic range is reflected by the enhanced
estimation of faint emission in Cyg A, in particular structures whose surface brightness is within
the range [0.01, 0.1] mJy (see the arc around the right end of the west jet in Figure 5.9, first
row). We further observe that the proposed spatial tessellation does not introduce artifacts in
the estimated images over the large dynamic range of interest. For SARA, given that no spectral
correlation is promoted, the reconstruction quality of the different channels is restricted to their
inherent resolution and sensitivity. This explains the lower reconstruction quality of SARA in
comparison with Faceted HyperSARA. JC-CLEAN restored images exhibit a comparatively poorer
reconstruction quality, as they are limited to the instrument's resolution (through convolutions with
the channel-associated synthesized CLEAN beams). Furthermore, the associated dynamic range
is limited by the prominent artifacts resulting from the lack of DDE calibration. The inspection
of the average images displayed in Figure 5.10 confirms the ability of the proposed approach to
recover fine details of Cyg A in comparison with SARA and JC-CLEAN.
Naturally-weighted residual images obtained with Faceted HyperSARA and SARA, and Briggs-
weighted residual images obtained with JC-CLEAN, are reported in Figures 5.8–5.10, displayed on
the bottom right panels overlaying the full recovered images. Their respective aSTD values are 5.46×10^-4,
4.53×10^-4 and 5.2×10^-4, indicating a comparable fidelity to data. Yet, a visual inspection
of the residual images, in particular the average residual images (Figure 5.10, bottom right
panels), indicates that details of the Cyg A jets are not fully recovered by SARA, as opposed to Faceted
HyperSARA. Given that both approaches satisfy the same data constraints, this demonstrates the
efficiency of the Faceted HyperSARA prior in capturing the details of the galaxy. Although no DDE
solutions are incorporated into the forward modelling of JC-CLEAN, its residuals are homogeneous
due to the absence of the non-negativity constraint. In fact, negative components are absorbed
in its model images (consisting of the CLEAN components) to compensate for spurious positive
components.
Recently, [40] have reported the presence of a bright object in the inner core of the galaxy. The
object, dubbed Cyg A-2, has been identified as a second black hole. Its location is highlighted
with a white dashed circle in Figures 5.8–5.10 (top left, right panel), centered at the position given
by RA = 19h 59mn 28.322s (J2000) and DEC = +40°44′1.89′′, with a radius of 0.1′′. The
discovery was further confirmed in [44] by imaging two monochromatic VLA data sets at C band
(6.678 GHz) and X band (8.422 GHz) with SARA. Interestingly, the inspection of the estimated
image cube provided in [117] indicates that Cyg A-2 is discernible in images reconstructed by
Faceted HyperSARA at frequencies lower than ever reported. More precisely, the source is resolved in
all channels within the range [5.979, 8.019] GHz, with an average flux of 0.5164 (±0.1394) mJy.
SARA, however, only succeeds in detecting it within the range [7.131, 8.019] GHz, with an average flux
of 0.5157 (±0.3957) mJy. Given the important calibration errors present in the associated restored
images, JC-CLEAN is not able to resolve Cyg A-2.
Interestingly, the examination of the full image cubes shows consistency in the image reconstruction
quality of the 16 independent sub-problems. In general, we observe that the spectra recovered
by the different methods are nearly flat across each set of 16 adjacent channels (composing a spectral
window). We further observe a spectral discontinuity at channel ν257 = 5.979 GHz for the three
methods, particularly noticeable in the inner core of Cyg A. Since the data set has been acquired
separately in the frequency ranges [ν1, ν256] = [3.979, 6.019] GHz and [ν257, νL] = [5.979, 8.019] GHz,
such a spectral discrepancy may result from differences in the calibration errors and noise statistics
between the two channel ranges.
Method               Time (h)   CPU cores   CPU time (h)
Faceted HyperSARA    68         496         33728
SARA                 12.5       5760        72000
JC-CLEAN             22         576         12672

Table 5.4: Computing cost of Cyg A imaging at the spectral resolution 8 MHz from 7.4 GB of data.
Results are reported for Faceted HyperSARA, SARA and JC-CLEAN in terms of reconstruction time,
number of CPU cores and overall CPU time (highlighted in bold face).
5.5.4.2 Computing cost
The computing time and resources required by the different methods are reported in Table 5.4.
JC-CLEAN required 12672 CPU hours, with 36 CPU cores assigned to each sub-problem, whereas
SARA leveraged 72000 CPU hours using the parallelization procedure proposed by [87]. More
precisely, each channel is reconstructed using 12 CPU cores: 1 master CPU core, 2 CPU cores for
the data-fidelity terms (one core per data-fidelity term), and 9 CPU cores to handle the average
sparsity terms (associated with the nine bases of the SARA dictionary). Finally, Faceted Hyper-
SARA required 33728 CPU hours. More specifically, each sub-problem (composed of 30 channels)
uses 1 master CPU core, 15 CPU cores to process the 2×30 data-fidelity terms (4 data-fidelity
blocks handled by each core), and 15 CPU cores to handle the 15 spatio-spectral facets. These
numbers indicate an overall higher efficiency of the parallelization procedure adopted for Faceted
HyperSARA when compared to SARA, with better use of the allocated resources.
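The CPU-hour figures in Table 5.4 are simply the product of wall-clock time and allocated cores, as this quick consistency check illustrates:

```python
# Consistency check of Table 5.4: overall CPU time = wall-clock time x CPU cores.
runs = {
    "Faceted HyperSARA": (68.0, 496),
    "SARA": (12.5, 5760),
    "JC-CLEAN": (22.0, 576),
}
cpu_hours = {name: hours * cores for name, (hours, cores) in runs.items()}
```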
The results reported in this section show the ability of the proposed approach to provide signif-
icantly higher resolution and higher dynamic range image cubes in comparison with JC-CLEAN.
Interestingly, this quantum jump in imaging quality comes at a computing cost of Faceted Hyper-
SARA, implemented in MATLAB, not far from that of the JC-CLEAN C++ implementation, suggesting
that the gap can be significantly reduced, if not closed, with the forthcoming C++ implementation
of Faceted HyperSARA.
5.6 Conclusions
We have introduced the Faceted HyperSARA method, which leverages a spatio-spectral facet
prior model for wideband radio-interferometric imaging. The underlying regularization encodes a
sophisticated facet-specific prior model to ensure the precision of the image reconstruction, while allowing
the bottleneck induced by the size of the image cube to be efficiently addressed via parallelization.
Experiments conducted on synthetic data confirm that the proposed approach can provide a major
increase in scalability in comparison with the original HyperSARA algorithm, at no cost in imaging
quality, while showing potential to improve the reconstruction of faint emission. Leveraging the
power of a large-scale high-performance computing system, our MATLAB implementation (avail-
able on GitHub: https://basp-group.github.io/Puri-Psi/) has been further validated on the
[Figure 5.8 image panels — channel ν1 = 3.979 GHz; colour-bar scales omitted in this text version.]
Figure 5.8: Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Imaging results of
channel ν1 = 3.979 GHz. Estimated images at the angular resolution 0.06′′ (3.53 times the observation's
spatial bandwidth). From top to bottom: the respective estimated model images of the proposed Faceted
HyperSARA (Q = 15, C = 16) and SARA, both in units of Jy/pixel, and the restored image of JC-CLEAN
in units of Jy/beam. The associated synthesized beam is of size 0.37′′ × 0.35′′ and its flux is 42.18 Jy. The
full FoV images (log10 scale) are overlaid with the residual images (bottom right, linear scale) and zooms
on selected regions in Cyg A (top left, log10 scale). These correspond to the west hotspot (left) and the
inner core of Cyg A (right). The zoomed regions are displayed with different value ranges for contrast
visualization purposes and are highlighted with white boxes in the full images. The Cyg A-2 location is
highlighted with a white dashed circle. Negative pixel values of the JC-CLEAN restored image and
associated zooms are set to 0 for visualization purposes. Full image cubes are available online [117].
[Figure 5.9 image panels — channel ν480 = 8.019 GHz; colour-bar scales omitted in this text version.]
Figure 5.9: Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Reconstruction
results of channel ν480 = 8.019 GHz. Estimated images at the angular resolution 0.06′′ (1.75 times the
observation's spatial bandwidth). From top to bottom: the respective estimated model images of the
proposed Faceted HyperSARA (Q = 15, C = 16) and SARA, both in units of Jy/pixel, and the restored
image of JC-CLEAN in units of Jy/beam. The associated synthesized beam is of size 0.17′′ × 0.15′′ and
its flux is 8.32 Jy. The full FoV images (log10 scale) are overlaid with the residual images (bottom right,
linear scale) and zooms on selected regions in Cyg A (top left, log10 scale). These correspond to the west
hotspot (left) and the inner core of Cyg A (right). The zoomed regions are displayed with different value
ranges for contrast visualization purposes and are highlighted with white boxes in the full images. The Cyg
A-2 location is highlighted with a white dashed circle. Negative pixel values of the JC-CLEAN restored image
and associated zooms are set to 0 for visualization purposes. Full image cubes are available online [117].
[Figure 5.10 image panels — average estimated images; colour-bar scales omitted in this text version.]
Figure 5.10: Cyg A imaged at the spectral resolution 8 MHz from 7.4 GB of data. Average estimated
images, computed as the mean along the spectral dimension. From top to bottom: the respective
estimated average model images of the proposed Faceted HyperSARA (Q = 15, C = 16) and SARA, and
the average restored image of JC-CLEAN (obtained as the mean of the restored images normalized by
the flux of their associated synthesized beams). The full FoV images (log10 scale) are overlaid with the
residual images (bottom right, linear scale) and zooms on selected regions in Cyg A (top left, log10 scale).
These correspond to the west hotspot (left) and the inner core of Cyg A (right). The zoomed regions are
displayed with different value ranges for contrast visualization purposes and are highlighted with white
boxes in the full images. The Cyg A-2 location is highlighted with a white dashed circle. Negative pixel
values of the JC-CLEAN restored image and associated zooms are set to 0 for visualization purposes.
reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA data. The associated results
are a practical proof of concept of the scalability of Faceted HyperSARA, which is also shown
to provide a significant improvement in imaging quality with respect to JC-CLEAN. Since a
comparison with HyperSARA would have been impractical, we instead show that Faceted HyperSARA
also supersedes the early monochromatic SARA approach in imaging precision. Interestingly, our
results confirm the recent discovery of a super-massive second black hole in the inner core of Cyg
A at much lower frequencies than both JC-CLEAN and SARA (the black hole is detected and
resolved at C band, starting from 5.979 GHz). Our work further illustrates the potential of ad-
vanced algorithms to enhance imaging quality beyond the instrument resolution, opening the door to
cost-saving considerations for forthcoming arrays.
Having addressed the imaging problem in radio interferometry, we introduce in the next chapter
a new uncertainty quantification approach to assess the degree of confidence in particular 3D
structures and faint emission appearing in the estimated cube.
Chapter 6
Wideband uncertainty quantification by convex optimization
Contents
6.1 Motivation .................................... 99
6.2 Wideband uncertainty quantification approach .............. 100
    6.2.1 Bayesian hypothesis test ........................... 100
    6.2.2 Choice of the set S .............................. 103
6.3 Proposed minimization problem ....................... 104
6.4 Proposed algorithmic structure ........................ 105
    6.4.1 Epigraphical splitting ............................. 105
    6.4.2 Underpinning primal-dual forward-backward algorithm .......... 106
6.5 Validation on synthetic data .......................... 107
    6.5.1 Simulation setting .............................. 107
    6.5.2 Uncertainty quantification parameter .................... 109
    6.5.3 Results and discussion ............................ 110
6.6 Conclusions .................................... 111
6.1 Motivation
By now, we have tackled the wideband image formation problem in RI by introducing the Hyper-
SARA, and the Faceted HyperSARA approaches for precise and scalable imaging. These methods
provide solutions that are easily visualized, yet are typically unable to analyze the uncertainty
associated with the solution delivered. Since the wideband RI imaging problem is highly ill-posed,
assessing the degree of confidence in specific 3D structures observed in the estimated cube is
very important. Besides, uncertainty quantification helps in making accurate decisions on the
3D structures under scrutiny (e.g., confirming the existence of a second black hole in the Cyg A
galaxy). Bayesian inference techniques naturally enable the quantification of uncertainty around
the image estimate by sampling the full posterior distribution based on a hierarchical Bayesian
model [7,39,69,70,114]. For instance, the authors in [114] proposed a monochromatic Bayesian method
based on MCMC sampling techniques. The authors in [7,70] proposed to approximate the full pos-
terior distribution and draw samples from the approximate distribution. However, sampling-based
techniques are computationally very expensive and cannot currently scale to the data regime ex-
pected from modern telescopes.
Instead, we propose to solve the wideband RI uncertainty quantification problem by performing
a Bayesian hypothesis test leveraging modern and scalable convex optimization methods. The test
postulates that the 3D structure of interest is absent from the RI image cube. The data and
the prior model are then used to determine whether the null hypothesis is rejected or not. The hypothesis test
is formulated as a convex program and solved efficiently using the primal-dual forward-backward
(PDFB) algorithm (Section 3.4.2). The underlying algorithmic structure benefits from preconditioning and
parallelization capabilities, paving the road for scalability to large data sets and image dimensions.
This chapter is structured as follows. Section 6.2 explains our wideband uncertainty quantifi-
cation approach and the postulated Bayesian hypothesis test. In Section 6.3, we present a convex
minimization problem to formulate the hypothesis test. The underpinning algorithmic structure
and the epigraphical splitting technique exploited to solve the minimization problem are presented
in Section 6.4. We showcase the performance of our approach on realistic simulations in Section
6.5. Finally, conclusions and perspectives are stated in Section 6.6.
This work has been published in [5].
6.2 Wideband uncertainty quantification approach

6.2.1 Bayesian hypothesis test

The proposed method takes the form of a Bayesian hypothesis test to properly assess the degree of
confidence in specific 3D structures appearing in the MAP estimate. We recall that, in a Bayesian
framework, the objective function of a minimization problem can be seen as the negative logarithm
of a posterior distribution, with the minimizer corresponding to a MAP estimate. To define the
test, we postulate the following hypotheses:

H0: The 3D structure of interest is ABSENT from the true image cube;
H1: The 3D structure of interest is PRESENT in the true image cube.
These hypotheses split the set of images R^{N×L}_+ into two regions: a set S ⊂ R^{N×L}_+ associated with
H0, containing all images without the 3D structure, and the complement R^{N×L}_+ \ S associated with
H1. The hypothesis test determines whether the data and the prior model support that the 3D structure
is real (in favour of H1) or corresponds to a reconstruction artifact (in favour of H0). The null
hypothesis H0 is rejected with significance α ∈ ]0,1[ if

P[H0 | Y] = P[X ∈ S | Y] ≤ α,    (6.1)

or equivalently if

P[H1 | Y] = P[X ∈ R^{N×L}_+ \ S | Y] > 1 − α.    (6.2)
Computing these hypothesis tests involves the calculation of probabilities in a high-dimensional
space, which is typically intractable. One approach is to approximate these probabilities by
an MCMC algorithm [50,90]. The computational cost involved in these methods is still orders of
magnitude higher than that of computing the MAP estimator by convex optimization [29].
This calls for new uncertainty quantification methods that are both fast and
scalable to the data dimensions and image cube sizes expected with the new generation of radio
telescopes.
In this regard, we propose to generalize our recent work on single-channel uncertainty quantifi-
cation [99,100]. Similarly, we formulate the hypothesis test as a convex minimization problem that
can be solved efficiently by convex optimization algorithms. The proposed approach only assumes
knowledge of the MAP estimate of the RI image cube and does not involve computing probabili-
ties. The hypothesis test is solved by comparing the set S with the region of the parameter space
where most of the posterior probability mass of X lies. This region is called the credible region in the
Bayesian decision theory framework [103]. A set Cα is a posterior credible region with confidence
level (1 − α)% if

P[X ∈ Cα | Y] = 1 − α, for α ∈ ]0,1[.    (6.3)

For every α ∈ ]0,1[, one can find many regions of the parameter space that verify (6.3). Here
we consider the highest posterior density (HPD) region, which is optimal in the sense that it has
minimum volume [103], and is given by

C*_α = {X | r(X) ≤ η_α},    (6.4)

where r is the objective function and η_α ∈ R is chosen such that (6.3) holds. Computing the
exact HPD region is computationally very expensive in imaging problems because of the high
dimensionality involved. To overcome this difficulty, we resort to the conservative credible region
C̃_α [91], where P[X ∈ C̃_α | Y] ≥ 1 − α. For any α ∈ ]exp(−NL/3), 1[, the conservative credible
region is dened as
e
Cα={X|r(X)eηα},(6.5)
where the parameter eηαRis computed directly from the MAP estimate Xas [91]
eηα=r(X) + N L(τα+ 1),(6.6)
with
τα=p16 log(3)/NL. (6.7)
Figure 6.1: 1D illustration of the exact HPD region C*_α and the approximated one C̃_α. Notice that C*_α ⊆ C̃_α.
In this work, we consider that the RI image cube is estimated with the HyperSARA approach (4.2),
hence C̃_α is defined as

C̃_α = {X ∈ R^{N×L}_+ | r(X) ≤ η̃_α and (∀(l, b)) Φ_{l,b} x_l ∈ B2(y_{l,b}, ε_{l,b})},    (6.8)

with:

r(X) = μ̄ Σ_{j=1}^J log(|σ_j(X)| + υ) + μ Σ_{n=1}^T log(∥[Ψ†X]_n∥_2 + υ).    (6.9)

The function r(X) is non-convex. In a similar fashion to (4.3), we approximate the
function r(X) by its convex majorant at the MAP estimate X† of the RI image cube:

r̄(X) = μ̄ ∥X∥_{*,ω(X†)} + μ ∥Ψ†X∥_{2,1,ω(X†)}.    (6.10)
The set C̃_α is a convex set, since it results from the intersection of multiple convex sets. This
is very important for the proposed convex optimization methodology. Note that, when Faceted
HyperSARA is used for computing the MAP estimate of the wideband image cube, we replace the
prior model r(X) with the spatio-spectral faceted prior (5.3).
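The weighted norms entering the majorant (6.10) can be sketched as follows, under the assumption (ours, consistent with the reweighting schemes of the previous chapters) that the weights are ω = 1/(|·| + υ) evaluated at the MAP estimate:

```python
import numpy as np

def weighted_priors(X, Psi_t, w_sigma, w_rows):
    """Weighted nuclear norm sum_j w_j sigma_j(X) and weighted l2,1 norm
    sum_n w_n ||[Psi^T X]_n||_2, the two terms of the convex majorant (6.10)."""
    sigmas = np.linalg.svd(X, compute_uv=False)
    nuclear_w = float(np.sum(w_sigma[: len(sigmas)] * sigmas))
    rows = np.linalg.norm(Psi_t(X), axis=1)
    l21_w = float(np.sum(w_rows[: len(rows)] * rows))
    return nuclear_w, l21_w

def reweights(values, upsilon=1e-3):
    """Weights from the MAP estimate, majorizing log(|s| + upsilon) at s."""
    return 1.0 / (np.abs(values) + upsilon)
```

With unit weights and Ψ† the identity, the two terms reduce to the plain nuclear and ℓ2,1 norms.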
According to Theorem 3.2 in [100], if C̃_α ∩ S = ∅, then S ⊂ R^{N×L}_+ \ C̃_α. And because
P[X ∈ C̃_α | Y] ≥ 1 − α, if C̃_α ∩ S = ∅, then P[H0|Y] ≤ α. That being said, we can verify whether the null
hypothesis H0 is rejected, i.e., P[H0|Y] ≤ α, by solving the following problem:

determine if C̃_α ∩ S = ∅ at level α.    (6.11)

There are two possible scenarios:

• C̃_α ∩ S = ∅: the hypothesis H0 is rejected at level α, and the 3D structure under
scrutiny is present in the true RI image cube with probability (1 − α).
• C̃_α ∩ S ≠ ∅: we fail to reject H0 at level α, and the presence of the 3D structure of
interest is uncertain.
6.2.2 Choice of the set S

In this work, we consider spatially localized 3D structures. To define this type of structure,
we introduce the selection matrix M ∈ {0,1}^{N_M×N} such that, for an image cube X ∈ R^{N×L},
MX ∈ R^{N_M×L} denotes the region of the 3D structure. The matrix M_c ∈ {0,1}^{(N−N_M)×N} is the
complementary matrix of M. We define the set S as a subset of the intensity image cubes
of R^{N×L}, by imposing a non-negativity constraint. In addition, to smooth the area of the 3D
structure, we use a positive inpainting matrix L ∈ R^{N_M×(N−N_M)}_+ that fills the region of the 3D
structure MX with the information from the other pixels M_cX, such that

MX = L M_cX + T,    (6.12)

where T ∈ [−τ, τ]^{N_M×L} and τ > 0 is a small tolerance value. Note that the inpainting is
done in 3D, meaning that each pixel in the 3D structure is replaced by a weighted sum of the pixels
in its 3D neighborhood, ensuring spatio-spectral smoothness in the region of the 3D structure.
The inpainting procedure might amplify the energy in MX and lead to artificial artifacts. To
alleviate this issue, we constrain the energy in the region of the 3D structure to a certain bound
θ = (θ_l)_{1≤l≤L} ∈ R^L_+ around a background reference level B ∈ R^{N_M×L}, i.e.,

MX ∈ B2(B, θ).    (6.13)
That being said, the set S containing all image cubes without the 3D structure of interest is given by

S = {X ∈ R^{N×L}_+ | (M − L M_c)X ∈ [−τ, τ]^{N_M×L} and MX ∈ B2(B, θ)},    (6.14)

where S is a convex set, since it results from the intersection of multiple convex sets.
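A membership test for the set S of (6.14) can be sketched as below (a toy Python illustration; the boolean `mask` plays the role of M, and all names are ours):

```python
import numpy as np

def in_S(X, mask, L_inp, B, theta, tau):
    """Check the three constraints defining S: non-negativity, inpainting
    consistency (M - L M_c)X in [-tau, tau], and per-channel energy bound
    ||M x_l - B_l||_2 <= theta_l. `mask` selects the N_M structure pixels."""
    if np.any(X < 0):
        return False
    MX, McX = X[mask], X[~mask]
    if np.any(np.abs(MX - L_inp @ McX) > tau):   # inpainting consistency
        return False
    return bool(np.all(np.linalg.norm(MX - B, axis=0) <= theta))

# toy cube: 4 pixels x 2 channels; structure = pixel 0; inpainting = mean of the rest
X = np.ones((4, 2))
mask = np.array([True, False, False, False])
L_inp = np.full((1, 3), 1.0 / 3.0)
ok = in_S(X, mask, L_inp, B=np.ones((1, 2)), theta=np.array([0.1, 0.1]), tau=1e-6)
```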
6.3 Proposed minimization problem

We recall that the proposed approach consists in solving problem (6.11), that is, determining whether the
intersection between S and C̃_α is empty or not. Intuitively, C̃_α ∩ S = ∅ holds only if dist(C̃_α, S) > 0,
in which case we can conclude that the hypothesis test H0 is rejected at level α, where

dist(C̃_α, S) = inf ∥C̃_α − S∥ = inf {∥X_{C̃α} − X_S∥_F : (X_{C̃α}, X_S) ∈ C̃_α × S}.    (6.15)

Oppositely, if dist(C̃_α, S) = 0, we conclude that C̃_α ∩ S ≠ ∅ and we are uncertain about the
presence of the 3D structure of interest. For more clarity, Figure 6.2 shows a simple illustration of
the proposed approach.
Figure 6.2: Illustration of the proposed method for the two different scenarios. Our approach simply
consists in examining the Euclidean distance between the two sets S and C̃_α. Left: there is no intersection
between the two sets, thus H0 is rejected at level α. Right: the two sets intersect, thus one cannot reject
H0, i.e., one cannot conclude whether the 3D structure exists in the true image cube or not.
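The distance-between-sets idea can be illustrated with alternating projections on two simple convex sets (a toy stand-in, in Python, for the PDFB solver used in the thesis; `ball_proj` and `set_distance` are our names):

```python
import numpy as np

def ball_proj(c, r):
    """Projection onto the l2 ball of centre c and radius r."""
    def proj(x):
        v = x - c
        n = np.linalg.norm(v)
        return x if n <= r else c + v * (r / n)
    return proj

def set_distance(proj_A, proj_B, x0, iters=1000):
    """Estimate dist(A, B) between two convex sets by alternating projections."""
    a = proj_A(x0)
    b = proj_B(a)
    for _ in range(iters):
        a = proj_A(b)
        b = proj_B(a)
    return float(np.linalg.norm(a - b))

# disjoint balls: centres (0,0) and (3,0), radius 1 -> distance 1 (reject H0)
d = set_distance(ball_proj(np.zeros(2), 1.0),
                 ball_proj(np.array([3.0, 0.0]), 1.0), np.array([1.0, 5.0]))
# intersecting balls: centres (0,0) and (1,0) -> distance 0 (fail to reject H0)
d0 = set_distance(ball_proj(np.zeros(2), 1.0),
                  ball_proj(np.array([1.0, 0.0]), 1.0), np.array([1.0, 5.0]))
```

A strictly positive distance corresponds to the left panel of Figure 6.2, a zero distance to the right panel.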
By combining the definition of the distance (6.15) with the definitions of the sets C̃_α (6.8) and
S (6.14), we can reformulate problem (6.11) as

minimize_{X_{C̃α} ∈ R^{N×L}, X_S ∈ R^{N×L}}  (γ/2) ∥X_{C̃α} − X_S∥²_F    (6.16)

subject to
X_S ∈ R^{N×L}_+,  L̄X_S ∈ [−τ, τ]^{N_M×L},  MX_S ∈ B2(B, θ),
X_{C̃α} ∈ R^{N×L}_+,  (∀(l, b))  Φ_{l,b}[X_{C̃α}]_l ∈ B2(y_{l,b}, ε_{l,b}),
μ̄ ∥X_{C̃α}∥_{*,ω} + μ ∥Ψ†X_{C̃α}∥_{2,1,ω} ≤ η̃_α,    (6.16a)

with L̄ = M − L M_c and γ > 0. Note that the choice of the parameter γ does not affect the solution
of the minimization problem. However, this parameter can be used in practice to accelerate the
convergence speed. The notation [X_{C̃α}]_l represents a column of the matrix X_{C̃α}.¹
6.4 Proposed algorithmic structure

To solve the minimization problem (6.16), we adopt the PDFB algorithm explained in Section
3.4.2. However, solving (6.16) involves a projection onto the convex set defined by the constraint
(6.16a). This projection does not have a closed-form solution. To overcome this difficulty, we
resort to a splitting technique based on the epigraphical projection recently proposed in [31] to handle
minimization problems involving sophisticated constraints.
6.4.1 Epigraphical splitting

The epigraphical splitting proposed by [31] aims to replace complex constraints such as (6.16a)
with a collection of epigraphs and a closed half-space constraint. Thus, the problem of com-
puting the projection onto the original constraint is reduced to that of computing projections
onto smaller epigraphs. Leveraging this technique, we introduce the auxiliary variables
q = (q_j)_{1≤j≤J} ∈ R^J and z = (z_n)_{1≤n≤T} ∈ R^T in the minimization problem (6.16), thereby
splitting the constraint (6.16a) into a simpler set of constraints. Consequently, the minimization
problem (6.16) can be equivalently reformulated as

minimize_{X_{C̃α} ∈ R^{N×L}, X_S ∈ R^{N×L}, q ∈ R^J, z ∈ R^T}  (γ/2) ∥X_{C̃α} − X_S∥²_F    (6.17)

subject to
X_S ∈ R^{N×L}_+,  L̄X_S ∈ [−τ, τ]^{N_M×L},  MX_S ∈ B2(B, θ),
X_{C̃α} ∈ R^{N×L}_+,  (∀(l, b))  Φ_{l,b}[X_{C̃α}]_l ∈ B2(y_{l,b}, ε_{l,b}),
μ̄ ∥X_{C̃α}∥_{*,ω} ≤ q̃,    (6.17a)
μ ∥Ψ†X_{C̃α}∥_{2,1,ω} ≤ z̃,    (6.17b)
q̃ + z̃ ≤ η̃_α,    (6.17c)

with q̃ = Σ_{j=1}^J q_j and z̃ = Σ_{n=1}^T z_n. Note that the variable X_{C̃α} satisfying the constraint
(6.16a) is equivalent to the variables (X_{C̃α}, q, z) satisfying the constraints (6.17a), (6.17b)
and (6.17c). Following the epigraph definition (Appendix .1, equation (2)), one can observe that
condition (6.17a) represents the epigraph E_{*,ω} of the weighted nuclear norm. Similarly, condition
(6.17b) represents the epigraph E_{2,1,ω} of the weighted ℓ2,1 norm. The constraint (6.17c) accounts
for a closed half-space. The projections associated with conditions (6.17a), (6.17b) and (6.17c)
admit a closed form and are presented in Appendix .2.

¹ The notation X_l is equivalent to x_l adopted in the previous chapters. Since most of the symbols in this chapter
have subscripts, the new notation has been adopted for more clarity.
To t the minimization problem (6.17) in the PDFB framework, the constraints are imposed
by means of the indicator function ιCof a convex set C(3.21). By doing so, the minimization
problem (6.17) can be equivalently redened as
$$\begin{aligned}
\underset{\substack{X_{\widetilde{C}_\alpha}\in\mathbb{R}^{N\times L},\; X_S\in\mathbb{R}^{N\times L}\\ q\in\mathbb{R}^{J},\; z\in\mathbb{R}^{T}}}{\text{minimize}}\;\;
&\frac{\gamma}{2}\big\|X_{\widetilde{C}_\alpha}-X_S\big\|_F^2
+\iota_{\mathbb{R}_+^{N\times L}}(X_S)
+\iota_{[-\tau,\tau]^{N_M\times L}}(L X_S)
+\iota_{\mathcal{B}_2(B,\theta)}(M X_S)\\
&+\iota_{\mathbb{R}_+^{N\times L}}(X_{\widetilde{C}_\alpha})
+\sum_{l=1}^{L}\sum_{b=1}^{B}\iota_{\mathcal{B}_2(y_{l,b},\epsilon_{l,b})}\big(\Phi_{l,b}[X_{\widetilde{C}_\alpha}]_l\big)\\
&+\iota_{E_{*,\omega}}(X_{\widetilde{C}_\alpha},q)
+\iota_{E_{2,1,\overline{\omega}}}(\Psi^\dagger X_{\widetilde{C}_\alpha},z)
+\iota_{V(\widetilde{\eta}_\alpha)}(q/\mu,z),
\end{aligned} \qquad (6.18)$$

where:

$$V=\Big\{(q/\mu,z)\in\mathbb{R}^{J}\times\mathbb{R}^{T} \;\Big|\; \sum_{j=1}^{J}q_j+\sum_{n=1}^{T}z_n\le\widetilde{\eta}_\alpha\Big\}. \qquad (6.19)$$
6.4.2 Underpinning primal-dual forward-backward algorithm
The details of the proposed algorithmic structure are presented in Algorithm 6. All the variables of interest are updated via forward-backward steps. At each iteration $t\in\mathbb{N}$, the algorithm minimizes the distance between $X_{\widetilde{C}_\alpha}$ and $X_S$ (the image cubes of interest) in lines 9 and 12. In addition, projections onto the convex set $\widetilde{C}_\alpha$ are performed in steps 21, 24 and 27. These correspond, respectively, to projections of the estimated data $\Phi_{l,b}[X_{\widetilde{C}_\alpha}]_l$ onto the associated $\ell_2$ balls with respect to the preconditioning matrices $U_{l,b}$, and to epigraphical projections of the variable $X_{\widetilde{C}_\alpha}$ onto the weighted nuclear ball (line 24) and the weighted $\ell_{2,1}$ ball (line 27). Similarly, projections onto the convex set $S$ are performed in steps 30 and 32. These correspond to performing the linear inpainting (6.12) and controlling the energy of the 3D structure of interest (6.13), respectively. All the projections are updated in parallel and used later in the updates of the primal variables of interest $X_{\widetilde{C}_\alpha}$ and $X_S$ in steps 10 and 13, respectively.

The exact expressions of all the projections are provided in Appendix .2. The diagonal preconditioning matrices $U_{l,b}$ are chosen according to the preconditioning strategy described in Section 4.3.2. More precisely, the coefficients on their diagonal are given by the inverse of the sampling density in the vicinity of the probed Fourier modes. It is worth noting that the projections onto the $\ell_2$ balls with respect to the preconditioning matrices $U_{l,b}$ do not admit an analytic expression. Instead, they can be numerically estimated with an iterative algorithm; in this work, we resort to FISTA [9].
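As an aside, a preconditioned $\ell_2$-ball projection of this kind can be sketched with a few lines of accelerated projected gradient. The snippet below is a minimal illustration with synthetic data (diagonal $U$, random ball centre and radius), not the thesis implementation:

```python
import numpy as np

# Sketch of the iterative estimation of the l2-ball projection in the metric
# induced by a diagonal preconditioner U (no analytic form exists): a
# FISTA-style accelerated projected-gradient scheme on
#   f(v) = 0.5 * (v - x)^T U (v - x)   subject to   ||v - y||_2 <= eps.
# All data below are synthetic placeholders.
rng = np.random.default_rng(1)
n = 50
u = rng.uniform(0.5, 4.0, n)     # diagonal of U (e.g. inverse sampling density)
x = rng.standard_normal(n)       # point whose U-metric projection is sought
y = rng.standard_normal(n)       # centre of the l2 ball
eps = 1.0                        # radius of the l2 ball

def proj_ball(v):
    """Exact Euclidean projection onto B2(y, eps), cf. (17) in Appendix .2."""
    d = v - y
    nd = np.linalg.norm(d)
    return y + min(eps / nd, 1.0) * d if nd > 0 else v.copy()

v = proj_ball(x)                 # feasible starting point
w, t = v.copy(), 1.0
step = 1.0 / u.max()             # 1/L, with L the Lipschitz constant of grad f
for _ in range(2000):
    v_new = proj_ball(w - step * u * (w - x))    # gradient step + projection
    t_new = (1 + np.sqrt(1 + 4 * t**2)) / 2
    w = v_new + ((t - 1) / t_new) * (v_new - v)  # FISTA momentum
    v, t = v_new, t_new

f = lambda v_: 0.5 * np.sum(u * (v_ - x) ** 2)
assert np.linalg.norm(v - y) <= eps + 1e-9       # feasibility
assert f(v) <= f(proj_ball(x)) + 1e-3            # beats plain Euclidean projection
```

When $U$ is a multiple of the identity, the loop reduces to the closed-form projection (17); the iterative scheme is only needed for genuinely non-uniform preconditioners.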
Following (3.29) and in a similar fashion to Section 4.3.2, Algorithm 6 is guaranteed to converge to the global minimum of the minimization problem (6.18) if the preconditioning matrices $(U_{l,b})_{1\le l\le L,\,1\le b\le B}$ and the parameters $(\kappa_i)_{1\le i\le 5}$, $\zeta$ and $\gamma$ satisfy the inequality:

$$1-\zeta\big(\kappa_1\|U^{1/2}\Phi\|_S^2+\kappa_2+\kappa_3\|\Psi^\dagger\|_S^2+\kappa_4\|L\|_S^2+\kappa_5\|M\|_S^2\big)>\zeta\gamma/2, \qquad (6.20)$$

where for every $X\in\mathbb{R}^{N\times L}$, $U^{1/2}\Phi(X)=\big(U_{l,b}^{1/2}\Phi_{l,b}X_l\big)_{1\le l\le L,\,1\le b\le B}$. A convenient choice of $(\kappa_i)_{1\le i\le 5}$ is

$$\kappa_1=\frac{1}{\|U^{1/2}\Phi\|_S^2},\quad \kappa_2=1,\quad \kappa_3=\frac{1}{\|\Psi^\dagger\|_S^2},\quad \kappa_4=\frac{1}{\|L\|_S^2}\quad\text{and}\quad \kappa_5=\frac{1}{\|M\|_S^2}. \qquad (6.21)$$

In this setting, convergence is guaranteed for all $0<\zeta<\dfrac{2}{10+\gamma}$.
6.5 Validation on synthetic data
6.5.1 Simulation setting
Following the procedure described in Section 4.4, we simulate a wideband RI model cube composed of $L=15$ spectral channels from an image of the W28 supernova remnant of size $N=256\times 256$ pixels. The wideband RI data are generated from a realistic uv-coverage within the frequency range $[\nu_1,\nu_L]=[1,2]$ GHz, with uniformly sampled channels and a total observation time of 4 hours. For each channel indexed $l\in\{1,\dots,L\}$, the corresponding uv-coverage is obtained by scaling the reference uv-coverage with $\nu_l/\nu_1$. We consider an input signal-to-noise ratio (iSNR) (4.13) of 60 dB. The $\ell_2$ bounds $\epsilon_{l,b}$ on the data-fidelity terms are computed from the noise statistics as explained
Algorithm 6: Wideband uncertainty quantification by convex optimization

Data: $(y_{l,b})_{l,b}$, $l\in\{1,\dots,L\}$, $b\in\{1,\dots,B\}$.
Input: $X_{\widetilde{C}_\alpha}^{(0)}$, $X_S^{(0)}$, $q^{(0)}$, $z^{(0)}$, $B^{(0)}$, $a^{(0)}$, $(C_d^{(0)},e_d^{(0)})_{1\le d\le D}$, $(v_{l,b}^{(0)})_{l,b}$, $D^{(0)}$, $S^{(0)}$, $\omega^{(0)}$, $\overline{\omega}^{(0)}$.
Parameters: $(U_{l,b})_{l,b}$, $(\epsilon_{l,b}^{(0)})_{l,b}$, $\widetilde{\eta}_\alpha$, $\mu$, $\overline{\mu}$, $\zeta$, $(\kappa_i)_{1\le i\le 5}$.

1: $t\leftarrow 0$;
2: while stopping criterion not satisfied do
3:   for $l=1$ to $L$ do
4:     $\widehat{x}_l^{(t)}=F Z\,[\breve{X}_{\widetilde{C}_\alpha}^{(t)}]_l$;  // Fourier transforms
5:     for $b=1$ to $B$ do
6:       $\widehat{x}_{l,b}^{(t)}=M_{l,b}\,\widehat{x}_l^{(t)}$;  // send to data cores
7:   // Update primal variables simultaneously
8:   for $l=1$ to $L$ do
9:     $[H_{\widetilde{C}_\alpha}^{(t+1)}]_l=\gamma[X_{\widetilde{C}_\alpha}^{(t)}]_l-\gamma[X_S^{(t)}]_l+\kappa_1 Z^\dagger F^\dagger\sum_{b=1}^{B}M_{l,b}^\dagger\widetilde{v}_{l,b}^{(t)}+\kappa_2[B^{(t)}]_l+\kappa_3\sum_{d=1}^{D}[\widetilde{C}_d^{(t)}]_l$
10:  $X_{\widetilde{C}_\alpha}^{(t+1)}=P_{\mathbb{R}_+^{N\times L}}\big(X_{\widetilde{C}_\alpha}^{(t)}-\zeta H_{\widetilde{C}_\alpha}^{(t+1)}\big)$
11:  $\breve{X}_{\widetilde{C}_\alpha}^{(t+1)}=2X_{\widetilde{C}_\alpha}^{(t+1)}-X_{\widetilde{C}_\alpha}^{(t)}$
12:  $H_S^{(t+1)}=\gamma X_S^{(t)}-\gamma X_{\widetilde{C}_\alpha}^{(t)}+\kappa_4 L^\dagger D^{(t)}+\kappa_5 M^\dagger S^{(t)}$
13:  $X_S^{(t+1)}=P_{\mathbb{R}_+^{N\times L}}\big(X_S^{(t)}-\zeta H_S^{(t+1)}\big)$
14:  $\breve{X}_S^{(t+1)}=2X_S^{(t+1)}-X_S^{(t)}$
15:  $\big(q^{(t+1)},z_1^{(t+1)},\dots,z_D^{(t+1)}\big)=P_{V(\widetilde{\eta}_\alpha)}\big(q^{(t)}-\zeta\kappa_2 a^{(t)},\;z_1^{(t)}-\zeta\kappa_3 e_1^{(t)},\dots,z_D^{(t)}-\zeta\kappa_3 e_D^{(t)}\big)$
16:  $\breve{q}^{(t+1)}=2q^{(t+1)}-q^{(t)}$
17:  $\breve{z}^{(t+1)}=2z^{(t+1)}-z^{(t)}$
18:  // Update dual variables simultaneously
19:  // Enforce data fidelity
20:  for $(l,b)=(1,1)$ to $(L,B)$ do
21:    $v_{l,b}^{(t+1)}=U_{l,b}\big(\mathcal{I}_{M_{l,b}}-\operatorname{prox}_{\mathcal{B}_2(y_{l,b},\epsilon_{l,b})}^{U_{l,b}}\big)\big(U_{l,b}^{-1}v_{l,b}^{(t)}+\Theta_{l,b}G_{l,b}\widehat{x}_{l,b}^{(t)}\big)$
22:    $\widetilde{v}_{l,b}^{(t+1)}=G_{l,b}^\dagger\Theta_{l,b}^\dagger v_{l,b}^{(t+1)}$
23:  // Epigraphical projection onto the nuclear ball
24:  $\big(B^{(t+1)},a^{(t+1)}\big)=\big(\mathcal{I}-P_{E_{*,\omega}}\big)\big(B^{(t)}+\breve{X}_{\widetilde{C}_\alpha}^{(t+1)},\;a^{(t)}+\breve{q}^{(t+1)}\big)$
25:  // Epigraphical projection onto the $\ell_{2,1}$ ball
26:  for $d=1$ to $D$ do
27:    $\big(C_d^{(t+1)},e_d^{(t+1)}\big)=\big(\mathcal{I}-P_{E_{2,1,\overline{\omega}}}\big)\big(C_d^{(t)}+\Psi_d^\dagger\breve{X}_{\widetilde{C}_\alpha}^{(t+1)},\;e_d^{(t)}+\breve{z}_d^{(t+1)}\big)$
28:    $\widetilde{C}_d^{(t+1)}=\Psi_d\,C_d^{(t+1)}$
29:  // 3D structure inpainting
30:  $D^{(t+1)}=\big(\mathcal{I}_{N_M\times L}-P_{[-\tau,\tau]^{N_M\times L}}\big)\big(D^{(t)}+L\breve{X}_S^{(t+1)}\big)$
31:  // 3D structure energy bounding
32:  $S^{(t+1)}=\big(\mathcal{I}_{N_M\times L}-P_{\mathcal{B}_2(B,\theta)}\big)\big(S^{(t)}+M\breve{X}_S^{(t+1)}\big)$
33:  $t\leftarrow t+1$
in Section 4.4. Several tests are performed varying the sampling rate SR (4.15) from 0.005 to 2.

The performance of the proposed uncertainty quantification algorithm (Algorithm 6), denoted by HyperSARA-UQ, where the MAP estimate is computed using the HyperSARA approach (Algorithm 2) with 5 reweights, is compared with that of the joint average sparsity approach, denoted by JAS-UQ, where the MAP estimate is computed by solving a sequence of 5 consecutive JAS minimization problems of the form (4.17) using the PDFB algorithm explained in Section 4.3.2. The regularization parameters present in the HyperSARA minimization task (4.3) are set to $\mu=1$ and $\overline{\mu}=10^{-2}$, leveraging the dirty wideband model cube $X_{\mathrm{dirty}}$ (Section 4.2.2), and the free parameter appearing in the JAS minimization problem (4.17) is set to $\mu_2=10^{-2}$. Note that for JAS-UQ, the prior model $r$ involved in the definition of the set $\widetilde{C}_\alpha$ (6.8) is reduced to the joint average sparsity prior, i.e., $r(X)=\mu_2\|\Psi^\dagger X\|_{2,1,\overline{\omega}}$. In this setting, JAS-UQ is solved using a simplified version of Algorithm 6, where the projection onto the set $\widetilde{C}_\alpha$ involves a simple projection onto the weighted $\ell_{2,1}$ ball rather than epigraphical projections onto the weighted nuclear ball and the weighted $\ell_{2,1}$ ball.
To precisely evaluate the interest of the proposed approach, we quantify the uncertainty of three spatially localized 3D structures appearing in the MAP estimate. These structures are compact or slightly extended sources corresponding to the definition of the set $S$ (6.14). The linear inpainting matrix $L$ given in (6.12) is chosen such that $L=\frac{1}{3}\big(L_{5\times5\times3}+L_{7\times7\times3}+L_{11\times11\times3}\big)$, where $L_{5\times5\times3}$, $L_{7\times7\times3}$ and $L_{11\times11\times3}$ model 3D normalized convolutions between the image cube (filled with zeros inside the 3D structure) and 3D Gaussian convolution kernels of size $5\times5\times3$, $7\times7\times3$ and $11\times11\times3$, respectively. Also, we set $\big(\tau_l=\operatorname{std}(M[\overline{X}]_l)\big)_{1\le l\le L}$. To control the energy inside the region of the 3D structure as defined in (6.13), we set $B=0$ and $\big(\theta_l=\|L M^c[\overline{X}]_l\|_2\big)_{1\le l\le L}$.
6.5.2 Uncertainty quantication parameter
Recall that the solution $\big(X^\star_{\widetilde{C}_\alpha},X^\star_S\big)$ generated by Algorithm 6 satisfies

$$\operatorname{dist}\big(\widetilde{C}_\alpha,S\big)=\big\|X^\star_{\widetilde{C}_\alpha}-X^\star_S\big\|_F. \qquad (6.22)$$

To relate this distance to the 3D structure's intensity, we introduce the image cube $\overline{X}_S\in S$ that corresponds to the MAP estimate $\overline{X}$ where the 3D structure of interest has been removed by applying the linear inpainting $L$ to the region of the 3D structure of interest. Formally, we define:

$$M\overline{X}_S=L M^c\overline{X}\quad\text{and}\quad M^c\overline{X}_S=M^c\overline{X}. \qquad (6.23)$$

At this point, we define the normalized intensity of the 3D structure $\rho_\alpha$ as the ratio between the distance (6.22) and the intensity of the 3D structure present in the MAP estimate, given by $\|\overline{X}-\overline{X}_S\|_F$:

$$\rho_\alpha=\frac{\operatorname{dist}\big(\widetilde{C}_\alpha,S\big)}{\|\overline{X}-\overline{X}_S\|_F}. \qquad (6.24)$$

Notice that $\rho_\alpha=0$ is equivalent to $\operatorname{dist}(\widetilde{C}_\alpha,S)=0$. Consequently, we can conclude:

- $\rho_\alpha=0$ implies that $\widetilde{C}_\alpha\cap S\neq\emptyset$. Then, we fail to reject $H_0$ at level $\alpha$, and the presence of the 3D structure of interest is uncertain.
- $\rho_\alpha>0$ implies that $\widetilde{C}_\alpha\cap S=\emptyset$. Thus, the hypothesis $H_0$ is rejected at level $\alpha$, and the value $\rho_\alpha$ represents the energy percentage of the 3D structure that is confirmed in the MAP estimate [100].

In our simulations, we consider $\alpha=1\%$ and we consider that $H_0$ is rejected when $\rho_\alpha>2\%$ (to allow for numerical errors).
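The decision rule above can be condensed into a few lines. The sketch below is a toy restatement (the function names and numeric inputs are illustrative, not from the thesis implementation):

```python
# Toy restatement of the decision rule: given the distance (6.22) between the
# credible set and S, and the structure intensity in the MAP estimate, form
# rho_alpha (6.24) and apply the 2% numerical-error threshold.
def rho_alpha(dist, intensity):
    """Normalized intensity of the 3D structure, cf. (6.24)."""
    return dist / intensity

def reject_H0(rho, tol=0.02):
    """rho > tol: the two convex sets do not intersect, H0 rejected at level alpha."""
    return rho > tol

assert not reject_H0(rho_alpha(0.0, 3.5))   # sets intersect: structure uncertain
assert reject_H0(rho_alpha(1.9, 3.5))       # ~54% of the structure energy confirmed
```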
6.5.3 Results and discussion
To showcase the efficiency of the hybrid prior image model, we compare the performance of HyperSARA-UQ against JAS-UQ. We perform several tests on the data sets generated using a realistic uv-coverage, varying the Fourier sampling rate SR in the interval [0.01, 2]. In our simulations, we study three different spatially localized 3D structures: two strong sources, Structure 1 and Structure 3, highlighted with blue and red rectangles, respectively, in the ground-truth image cube at channels $\nu_1$ and $\nu_{15}$, and one weak and slightly extended source, Structure 2, highlighted with a green rectangle (see Figure 6.3, first row). Figure 6.3 (c) shows curves representing the $\rho_\alpha$ values of the three structures obtained by HyperSARA-UQ (solid curves) and JAS-UQ (dashed curves).

We notice for the three structures that the confirmed energy $\rho_\alpha$ with both methods naturally increases when the number of measurements increases, meaning that the 3D structure uncertainty decreases when SR increases. This is because the investigated structures are originally present in the true image cube. We observe comparable performance between HyperSARA-UQ and JAS-UQ for Structure 3, where $\rho_\alpha>90\%$ for all sampling rates above 0.22. In this case, at least 90% of the intensity of Structure 3 is confirmed with probability 99%. However, this is not the case for Structure 1 and Structure 2, where HyperSARA-UQ succeeds in confirming more energy of the sources in comparison with JAS-UQ (almost a 10% increase in the confirmed energy is reported for Structure 1 with SR > 0.1, and for Structure 2 at all considered sampling rates). These results show the power of the combination of the low-rankness and joint average sparsity priors in confirming the energy of the considered true structures. For the weak source, Structure 2, the hypothesis $H_0$ is always rejected at all the considered sampling rates. Interestingly, 37% of the energy of Structure 2 is confirmed at the drastic sampling rate SR = 0.01 with HyperSARA-UQ, while the confirmed energy drops to 8.93% with JAS-UQ, reflecting the importance of the low-rankness prior in HyperSARA-UQ in regularizing the inverse problem. Conversely, $H_0$ cannot be rejected with either method for Structure 1 when SR < 0.1, nor for Structure 3 when SR < 0.02.
For qualitative comparison, we proceed with the visual inspection of the images obtained with HyperSARA-UQ and JAS-UQ applied to Structure 1 (Figures 6.4 and 6.5), Structure 2 (Figures 6.6 and 6.7) and Structure 3 (Figures 6.8 and 6.9). The images are obtained with SR = 0.5 and iSNR = 60 dB and reported for channels $\nu_1$ and $\nu_{15}$. The first row of all figures shows the MAP estimate of HyperSARA (left) and JAS (right). The reported aSNR values are 32.32 dB and 30.87 dB, respectively, suggesting the superiority of HyperSARA in recovering higher-resolution, higher-dynamic-range image cubes. Comparing the uncertainty quantification parameter, we have for Structure 1 (Figures 6.4 and 6.5) $\rho_\alpha=75.21\%$ with HyperSARA-UQ and $\rho_\alpha=64.53\%$ with JAS-UQ. Thus, we can conclude that $\widetilde{C}_\alpha\cap S=\emptyset$ and $H_0$ can be rejected with significance $\alpha=1\%$. A similar conclusion can be drawn for Structure 3 (Figures 6.8 and 6.9), with 97.43% of the 3D structure energy confirmed using HyperSARA-UQ and 96.95% of the energy confirmed using JAS-UQ, suggesting the similarity between the two methods in analyzing strong sources. Interestingly, for the weak and slightly extended source, Structure 2, we have $\rho_\alpha=54.91\%$ using HyperSARA-UQ and $\rho_\alpha=45.1\%$ using JAS-UQ. These values indicate that Structure 2 is real and not an artifact. One can observe that the images $[X^\star_{\widetilde{C}_\alpha}]_l$ and $[X^\star_S]_l$ obtained by performing uncertainty quantification of Structure 3 (Figures 6.8 and 6.9, bottom row) exhibit more artifacts than those obtained for Structure 2 (Figures 6.6 and 6.7, bottom row). This can be justified by the fact that Structure 3 is a very strong source: when removing it by projecting onto the set $S$, the data tend to create artifacts in other parts of the image to compensate for the lost energy.
6.6 Conclusions
In this work, we presented a wideband uncertainty quantification approach that measures the degree of confidence in specific 3D structures appearing in the MAP estimate. The proposed method is a generalization of our previous works for monochromatic uncertainty quantification [99, 100] and is based on the recent Bayesian inference results presented in [91]. Our approach takes the form of a Bayesian hypothesis test, formulated as a convex minimization problem and solved using a primal-dual algorithm. As opposed to Bayesian inference techniques, the algorithmic structure is shipped with preconditioning and parallelization functionalities, paving the road for scalability to big data regimes. We investigated the interest of our approach for wideband RI imaging on realistic simulations through investigating different spatially localized 3D structures.

Another interesting study consists in analyzing artifacts appearing in the MAP estimate. In this case, $\rho_\alpha$ is expected to decrease as SR increases. Unfortunately, we could not perform this analysis with the considered data set, as the MAP estimate presents no artifacts even for low sampling rates. We leave this study for real data, where the reconstructed image cube typically exhibits artifacts resulting from calibration errors or mis-modelling of the measurement operator. Validating the proposed approach on real data sets is an ongoing project, where preliminary real data results suggest the design of more sophisticated and scalable sets $S$. More precisely, the current implementation showed two problems for real data: (i) applying the linear inpainting at each iteration could be very costly for GB images; (ii) our linear inpainting matrix replaces each pixel in the 3D structure by a weighted sum of the pixels in the 3D neighborhood. More sophisticated definitions of the inpainting operation can be investigated (e.g., one can define the inpainting operator in the 3D wavelet space). To conclude, we emphasize the fact that wideband uncertainty quantification tools are of great interest for astronomers, particularly in the era of the new-generation telescopes, namely the SKA, where Petabyte image cubes are expected.
[(a) Ground-truth image at channel $\nu_1=1$ GHz. (b) Ground-truth image at channel $\nu_{15}=2$ GHz. (c) Curves of $\rho_\alpha$.]

Figure 6.3: Simulations with realistic uv-coverage: (c) curves representing the values of $\rho_\alpha$ in percentage (y-axis) as a function of the sampling rate SR $=M_l/N$ (x-axis), in log10 scale, for the 3D structures of interest. The considered 3D structures are highlighted with rectangles on channel $\nu_1=1$ GHz (a) and channel $\nu_{15}=2$ GHz (b) of the ground-truth image cube, in log10 scale. Each point corresponds to the mean value of 5 tests with different antenna positions and noise realizations, and the vertical bars represent the standard deviation of the 5 tests.
[Channel $\nu_1=1$ GHz]

Figure 6.4: Uncertainty quantification of 3D Structure 1: results, reported for channel $\nu_1=1$ GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are: the MAP estimate $[\overline{X}]_1$, and the uncertainty quantification results $[X^\star_{\widetilde{C}_\alpha}]_1$ and $[X^\star_S]_1$. The results are given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification parameter $\rho_\alpha=75.21\%$, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and $\rho_\alpha=64.53\%$. All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 1.
[Channel $\nu_{15}=2$ GHz]

Figure 6.5: Uncertainty quantification of 3D Structure 1: results, reported for channel $\nu_{15}=2$ GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are: the MAP estimate $[\overline{X}]_{15}$, and the uncertainty quantification results $[X^\star_{\widetilde{C}_\alpha}]_{15}$ and $[X^\star_S]_{15}$. The results are given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification parameter $\rho_\alpha=75.21\%$, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and $\rho_\alpha=64.53\%$. All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 1.
[Channel $\nu_1=1$ GHz]

Figure 6.6: Uncertainty quantification of 3D Structure 2: results, reported for channel $\nu_1=1$ GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are: the MAP estimate $[\overline{X}]_1$, and the uncertainty quantification results $[X^\star_{\widetilde{C}_\alpha}]_1$ and $[X^\star_S]_1$. The results are given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification parameter $\rho_\alpha=54.91\%$, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and $\rho_\alpha=45.1\%$. All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 2.
[Channel $\nu_{15}=2$ GHz]

Figure 6.7: Uncertainty quantification of 3D Structure 2: results, reported for channel $\nu_{15}=2$ GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are: the MAP estimate $[\overline{X}]_{15}$, and the uncertainty quantification results $[X^\star_{\widetilde{C}_\alpha}]_{15}$ and $[X^\star_S]_{15}$. The results are given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification parameter $\rho_\alpha=54.91\%$, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and $\rho_\alpha=45.1\%$. All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 2.
[Channel $\nu_1=1$ GHz]

Figure 6.8: Uncertainty quantification of 3D Structure 3: results, reported for channel $\nu_1=1$ GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are: the MAP estimate $[\overline{X}]_1$, and the uncertainty quantification results $[X^\star_{\widetilde{C}_\alpha}]_1$ and $[X^\star_S]_1$. The results are given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification parameter $\rho_\alpha=97.43\%$, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and $\rho_\alpha=96.95\%$. All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 3.
[Channel $\nu_{15}=2$ GHz]

Figure 6.9: Uncertainty quantification of 3D Structure 3: results, reported for channel $\nu_{15}=2$ GHz, are obtained with realistic uv-coverage, SR = 0.5 and iSNR = 60 dB. The images from top to bottom are: the MAP estimate $[\overline{X}]_{15}$, and the uncertainty quantification results $[X^\star_{\widetilde{C}_\alpha}]_{15}$ and $[X^\star_S]_{15}$. The results are given for HyperSARA-UQ (left), with MAP estimate aSNR = 32.32 dB and uncertainty quantification parameter $\rho_\alpha=97.43\%$, and JAS-UQ (right), with MAP estimate aSNR = 30.87 dB and $\rho_\alpha=96.95\%$. All images are displayed in log10 scale and overlaid with a zoom onto the region of Structure 3.
Chapter 7
Conclusions and perspectives
The research reported in this thesis leverages convex optimization techniques to achieve precise and scalable imaging for wideband radio interferometry, and further to assess the degree of confidence in particular 3D structures present in the reconstructed cube. On the one hand, the radio-interferometric inverse problem for image formation is highly ill-posed, which prompts the adoption of sophisticated image priors to regularize the inverse problem. On the other hand, modern radio telescopes provide vast amounts of data, which necessitates the design of scalable and distributed wideband imaging and uncertainty quantification algorithms that can recover and analyze the expected very large image cubes.
To meet these extreme challenges and achieve the anticipated scientific goals, we proposed the HyperSARA approach in Chapter 4. HyperSARA consists in solving a minimization problem with sophisticated log-sum priors, promoting low-rankness and joint average sparsity of the estimated image cube in an $\ell_0$ sense. This hybrid prior image model has proved efficient in recovering high-resolution, high-dynamic-range image cubes in comparison with state-of-the-art approaches. The underpinning algorithmic structure, namely the primal-dual framework, exhibits interesting functionalities such as preconditioning, to accelerate the convergence speed, and splitting of the data into efficiently designed data blocks, to spread the computation cost over a multitude of processing CPU cores, allowing scalability to large data volumes. Besides, HyperSARA incorporates an adaptive strategy to adjust the bounds associated with the data-fidelity terms, allowing for imaging real data that present calibration errors in addition to the thermal noise. Although HyperSARA processes all data-fidelity blocks independently in parallel, the involved priors require computations scaling with the size of the image cube to be reconstructed. This is prohibitively expensive in terms of computation time and memory requirements in the era of the awaited Petabyte image cubes.

To overcome this bottleneck and push the scalability potential of HyperSARA, we developed in Chapter 5 the Faceted HyperSARA algorithm. Faceted HyperSARA decomposes the full image cube into regular, content-agnostic, spatio-spectral facets, each of which is associated with a facet-based low-rankness and joint average sparsity regularization term. This sophisticated facet-specific prior model was shown to provide higher imaging quality, reflected by a better estimation of faint emissions compared to HyperSARA. Faceted HyperSARA, powered by the primal-dual algorithm, allows for parallel processing of all data blocks and image facets over a multiplicity of CPU cores. The performance of Faceted HyperSARA was validated on a reconstruction of a 15 GB image cube of Cyg A from 7.4 GB of VLA observations across 480 channels, utilizing 496 CPU cores on a high-performance computing system for 68 hours. On the one hand, the associated results have proved the imaging precision of Faceted HyperSARA, reflected by a significant improvement in imaging quality with respect to the CLEAN-based wideband algorithm JC-CLEAN. Importantly, and in contrast with JC-CLEAN, the Faceted HyperSARA results confirmed the recent discovery of a super-massive second black hole in the inner core of Cyg A at much lower frequencies than ever. On the other hand, the computing cost of Faceted HyperSARA, implemented in MATLAB, was not far from that of the JC-CLEAN C++ implementation (22 hours), suggesting that the gap can be significantly reduced, if not closed, with the forthcoming C++ implementation of Faceted HyperSARA, and confirming its scalability potential.
Even though HyperSARA and Faceted HyperSARA have proved to provide an accurate estimation of the wideband sky, measuring the degree of confidence in specific 3D structures appearing in the estimated cube is crucial due to the severe ill-posedness of the problem. In this context, we developed a new method in Chapter 6 that addresses the uncertainty quantification problem in radio interferometry. Mainly, the approach performs a Bayesian hypothesis test, which postulates that the 3D structure under scrutiny is absent from the true image cube. Then, the data and the prior image model are employed to decide whether the null hypothesis is rejected or not. The hypothesis test is expressed as a convex minimization problem and solved efficiently leveraging the sophisticated primal-dual framework with the preconditioning and splitting functionalities. As opposed to typical Bayesian inference techniques that provide uncertainty measures leveraging MCMC or proximal MCMC algorithms, no sampling is involved in the proposed approach, allowing for scalability to high-dimensional data and image regimes.
7.1 Perspectives
At this point, we shed light on some research directions that can further enhance the work developed
in this thesis.
Faceted HyperSARA successfully addressed the computational bottlenecks raised by both the volume of the data and the size of the image cube. Future work should contemplate the definition and implementation of a faceted Fourier transform to improve the data and image locality in the proposed algorithm. More precisely, for each channel indexed $l$, a Fourier transform is computed on the data core $(l,1)$ at each iteration. Then, each data core $(l,b)$, with $b\in\{2,\dots,B\}$, receives only a few coefficients of the Fourier transform from the data core $(l,1)$. This procedure requires communications between the data core $(l,1)$ and all the data cores $(l,b)$, with $b\in\{2,\dots,B\}$, on the one side, and between the data core $(l,1)$ and all facet cores on the other side. A faceted Fourier transform implementation would overcome this problem and enhance the distribution scheme of Faceted HyperSARA, in that each data core would implement a faceted Fourier transform of a facet core. Hence, no communications between data cores would be needed anymore. Furthermore, only one-to-one facet-to-data-core communications would be required.

Another perspective consists in developing a production C++ version of Faceted HyperSARA, building on the existing C++ version of HyperSARA (see the Puri-Psi webpage), to achieve maximum performance and scalability of a software implementation.
Deep neural networks (DNNs), in particular convolutional neural networks (CNNs), have shown promising results in solving inverse imaging problems [67, 71]. The computational cost associated with DNNs boils down to training the DNN offline. Once trained, its application, leveraging GPU systems, can be extremely fast, opening the door for further scalability potential. However, the extreme ill-posedness of the radio-interferometric inverse problem and the lack of rich training data sets make the problem complicated even for sophisticated DNN structures such as the U-net. The emerging plug-and-play (PnP) methods in optimization theory propose to replace the proximity operator associated with the regularization term in a proximal splitting algorithm by a more general denoising operator. In this regard, we suggest investigating the performance of Faceted HyperSARA where the proximity operator of one or two of the adopted regularization terms, namely the nuclear norm and the $\ell_{2,1}$ norm, is replaced by a CNN acting as a denoiser. In this scenario, the prior image model can be learned directly from the data.
Last but not least, the developed uncertainty quantification method should be validated on real data sets for broader acceptability in the research community. Importantly, real data usually present calibration errors which, if not accounted for in the measurement operator, might lead to reconstruction artifacts. Thus, analyzing such suspected regions is crucial to make scientific decisions (e.g., confirming whether the faint emission in the inner core of Cyg A corresponds to a second black hole or a reconstruction artifact). Particularly for the choice of the set $S$, the preliminary real data results suggest the design of more sophisticated and scalable sets. More precisely, the current implementation constructs $S$ using an inpainting technique, which replaces each pixel in the region of the 3D structure by a weighted sum of the pixels in the 3D neighborhood. Applying this linear inpainting at each iteration is very costly for GB image cubes. One suggestion for future work is to define the inpainting operator in the 3D wavelet space, where the image cube is represented by a few sparse coefficients, hence accelerating the inpainting operation at each iteration.
Appendices
.1 Basic denitions in convex optimization
Denition: The domain of a function f:RN]− ∞,+]is dened as
dom f={zRN|f(z)<+∞}.(1)
Denition: A function f:RN]− ∞,+]is proper if: dom f̸=.
Denition: The epigraph of a function f:RN]− ∞,+]is the closed convex subset of
RNdened as
epi f={(z, γ)RN×R|f(z)γ}.(2)
Denition: A function f:RN],+]is lower semi-continuous if epi fis a closed set.
Denition: Let f:RN]− ∞,+]be a convex function. The subdierential of fat
¯
zRNis the set
∂f (¯
z) = y:f(z)f(¯
z) + y|z¯
z,zRN.(3)
Denition: Let fand hbe convex functions from RNto ]− ∞,+]. The inf-convolution of
fand his
(fh)(z) = inf
uRNf(u) + h(zu).(4)
.2 Proximity operators
In what follows, we dene the proximity operators required to deal with the non-smooth functions
present in the minimization problems developed in this thesis.
The indicator function of the positivity constraint $\iota_{\mathbb{R}_+^{N\times L}}$: The proximity operator of the function $\iota_{\mathbb{R}_+^{N\times L}}$, enforcing positivity, is defined for a matrix $Z\in\mathbb{R}^{N\times L}$ as the projection onto the real positive orthant:
$$P_{\mathbb{R}_+^{N\times L}}(Z)=\max\{\Re(Z),0\}. \qquad (5)$$
The weighted nuclear norm $\mu\|\cdot\|_{*,\omega}$: Given a matrix $Z\in\mathbb{R}^{N\times L}$, the proximity operator of the weighted nuclear norm $\mu\|Z\|_{*,\omega}$ involves soft-thresholding of the vector of the singular values $\sigma(Z)$. These are obtained by means of the singular value decomposition (SVD): $Z=\Lambda_1\Sigma\Lambda_2^\dagger$, with $\Sigma=\operatorname{Diag}(\sigma)$. The proximity operator of $\mu\|Z\|_{*,\omega}$ is thus given by:
$$\mathcal{S}^{*}_{\mu\omega}(Z)=\Lambda_1\operatorname{Diag}\big(\mathcal{S}^{1}_{\mu\omega}(\sigma)\big)\Lambda_2^\dagger, \qquad (6)$$
with:
$$\mathcal{S}^{1}_{\mu\omega}(\sigma)=\big(\max\{\sigma_j-\mu\,\omega_j,0\}\big)_{1\le j\le J}, \qquad (7)$$
where $\omega_j\ge 0$ is the weight associated with the $j$-th singular value $\sigma_j$ and $\mu$ is the soft-thresholding parameter.
The weighted $\ell_{2,1}$ norm $\mu\|\cdot\|_{2,1,\omega}$: The proximity operator of the weighted $\ell_{2,1}$ norm reads as a row-wise soft-thresholding operation, defined for a matrix $Z=[z_1,\dots,z_N]^\dagger\in\mathbb{R}^{N\times L}$ as follows:
$$\mathcal{S}^{2,1}_{\mu\omega}(Z)=\Big(\max\{\|z_n\|_2-\mu\,\omega_n,0\}\,\frac{z_n^\dagger}{\|z_n\|_2}\Big)_{1\le n\le N}, \qquad (8)$$
where $\omega_n\ge 0$ is the weight associated with the row $z_n^\dagger$ and $\mu$ is the soft-thresholding parameter.
The epigraph of the weighted nuclear norm $E_{*,\omega}$: Given a matrix $Z\in\mathbb{R}^{N\times L}$, the proximity operator of the epigraph of the weighted nuclear norm reads as an epigraphical projection onto the weighted nuclear ball of radius $\widetilde{u}=\sum_{j=1}^{J}u_j$, with $J$ being the number of the singular values of $Z$ and $u=(u_j)_{1\le j\le J}$. Interestingly, this projection, denoted by $P_{E_{*,\omega}}(Z,u)$, resolves to an epigraphical projection of the vector of the singular values $\sigma(Z)$ onto the weighted $\ell_1$ ball of radius $\widetilde{u}$. This requires an SVD operation: $Z=\Lambda_1\Sigma\Lambda_2^\dagger$, with $\Sigma=\operatorname{Diag}(\sigma)$. That being said, we can write the epigraphical projection onto the weighted nuclear ball as
$$P_{E_{*,\omega}}(Z,u)=\Lambda_1\operatorname{Diag}\big(P_{E_{1,\omega}}(\sigma,u)\big)\Lambda_2^\dagger, \qquad (9)$$
where $P_{E_{1,\omega}}(\sigma,u)=(p,\theta)$, with $p=(p_j)_{1\le j\le J}$, $\theta=(\theta_j)_{1\le j\le J}$ and, for every $j\in\{1,\dots,J\}$:
$$p_j=\begin{cases}\sigma_j & \text{if }\omega_j\sigma_j\le u_j,\\[4pt] \dfrac{1}{1+\omega_j^2}\max\{\sigma_j+\omega_j u_j,0\} & \text{otherwise},\end{cases} \qquad (10)$$
$$\theta_j=\max\{\omega_j p_j,\,u_j\}, \qquad (11)$$
where $\omega_j\ge 0$ is the weight associated with the $j$-th singular value $\sigma_j$.
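The component-wise rules (10)-(11) translate directly into code (a sketch; names are illustrative):

```python
import numpy as np

# Sketch of the component-wise epigraphical projection (10)-(11) applied to a
# vector of singular values: each pair (sigma_j, u_j) is projected onto the
# epigraph {(p, theta) : omega_j * p <= theta}.
def epi_proj(sigma, u, omega):
    p = np.where(omega * sigma <= u,
                 sigma,                                        # already feasible
                 np.maximum(sigma + omega * u, 0.0) / (1.0 + omega**2))
    theta = np.maximum(omega * p, u)                           # (11)
    return p, theta

sigma = np.array([3.0, 1.0, 0.2])
u = np.array([1.0, 2.0, -0.5])
omega = np.ones(3)
p, theta = epi_proj(sigma, u, omega)

# Feasibility: every projected pair lies in the epigraph.
assert np.all(omega * p <= theta + 1e-12)
```

Note the feasible branch leaves $(\sigma_j, u_j)$ untouched, while the infeasible branch projects orthogonally onto the boundary line $\theta=\omega_j p$ (clipped at the origin).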
The epigraph of the weighted $\ell_{2,1}$ norm $\mathcal{E}_{\ell_{2,1}}$: Given a matrix $\mathbf{Z} = [\mathbf{z}_1, \dots, \mathbf{z}_N]^\dagger \in \mathbb{R}^{N\times L}$, the proximity operator of the indicator function of the epigraph of the weighted $\ell_{2,1}$ norm reads as an epigraphical projection onto the weighted $\ell_{2,1}$ ball of radius $\widetilde{u} = \sum_{n=1}^{N} u_n$, with $\mathbf{u} = (u_n)_{1 \leq n \leq N}$. This projection, denoted by $\mathcal{P}_{\mathcal{E}_{\ell_{2,1}}}(\mathbf{Z}, \mathbf{u})$, reduces to row-wise epigraphical projections onto weighted $\ell_2$ balls as follows: $\mathcal{P}_{\mathcal{E}_{\ell_{2,1}}}(\mathbf{Z}, \mathbf{u}) = (\mathbf{P}, \boldsymbol{\theta})$, with $\mathbf{P} = [\mathbf{p}_1, \dots, \mathbf{p}_N]^\dagger \in \mathbb{R}^{N\times L}$, $\boldsymbol{\theta} = (\theta_n)_{1 \leq n \leq N}$ and, for all $n \in \{1, \dots, N\}$:

$\mathbf{p}_n^\dagger = \begin{cases} \mathbf{0} & \text{if } \mathbf{z}_n = \mathbf{0}, \\ \mathbf{z}_n^\dagger & \text{if } \omega_n\, \|\mathbf{z}_n\|_2 \leq u_n, \\ \dfrac{1}{1+\omega_n^2}\, \max\Big\{1 + \dfrac{\omega_n\, u_n}{\|\mathbf{z}_n\|_2},\, 0\Big\}\, \mathbf{z}_n^\dagger & \text{otherwise,} \end{cases}$    (12)

$\theta_n = \max\{\omega_n\, \|\mathbf{p}_n\|_2,\, u_n\},$    (13)

where $\omega_n \geq 0$ is the weight associated with the row $\mathbf{z}_n^\dagger$.
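The three branches of (12), together with (13), can be sketched row by row as follows (illustrative NumPy code; function name ours):

```python
import numpy as np

def epi_proj_weighted_l21(Z, u, omega):
    """Row-wise epigraphical projection for the weighted l2,1 norm,
    Eqs. (12)-(13): returns the pair (P, theta)."""
    P = np.zeros_like(Z, dtype=float)
    theta = np.zeros_like(u, dtype=float)
    for n, z in enumerate(Z):
        nz = np.linalg.norm(z)
        if nz == 0:
            P[n] = 0.0                              # first branch of Eq. (12)
        elif omega[n] * nz <= u[n]:
            P[n] = z                                # already in the epigraph
        else:
            alpha = max(1.0 + omega[n] * u[n] / nz, 0.0) / (1.0 + omega[n]**2)
            P[n] = alpha * z                        # third branch of Eq. (12)
        theta[n] = max(omega[n] * np.linalg.norm(P[n]), u[n])  # Eq. (13)
    return P, theta
```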
The indicator function of a half-space $\iota_{\mathcal{V}}$: The proximity operator of the indicator function of a half-space defined as

$\mathcal{V} = \Big\{ \mathbf{z} = [\mathbf{z}_1, \mathbf{z}_2] \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \;\Big|\; \sum_{j=1}^{J} z_{1,j} + \sum_{n=1}^{N} z_{2,n} \leq \widetilde{\eta}_\alpha \Big\}$    (14)

is given by the projection onto the hyperplane defining the boundary of the half-space as follows:

$\mathcal{P}_{\mathcal{V}(\widetilde{\eta}_\alpha)}(\mathbf{z}) = \begin{cases} \mathbf{z} & \text{if } \sum_{j=1}^{J} z_{1,j} + \sum_{n=1}^{N} z_{2,n} \leq \widetilde{\eta}_\alpha, \\ \mathbf{z} + \dfrac{\widetilde{\eta}_\alpha - \sum_{j=1}^{J} z_{1,j} - \sum_{n=1}^{N} z_{2,n}}{N_1 + N_2}\, \mathbf{1} & \text{otherwise.} \end{cases}$    (15)
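The projection in (15) shifts every coordinate by the same amount when the linear constraint is violated. A minimal NumPy sketch for the concatenated vector (function name ours):

```python
import numpy as np

def proj_halfspace(z, eta):
    """Projection onto the half-space {z : sum(z) <= eta}, as in Eq. (15):
    when the constraint is violated, subtract the excess, spread
    uniformly over all coordinates (the normal vector is all-ones)."""
    excess = np.sum(z) - eta
    if excess <= 0:
        return z.copy()  # already inside the half-space
    return z - (excess / z.size) * np.ones_like(z)
```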
The indicator function of a box $\iota_{[-\tau,\tau]^{N\times L}}$: Given a matrix $\mathbf{Z} \in \mathbb{R}^{N\times L}$, the proximity operator of the indicator function $\iota_{[-\tau,\tau]^{N\times L}}$ is the projection onto the box as follows:

$\mathcal{P}_{[-\tau,\tau]^{N\times L}}(\mathbf{Z}) = \max\{\min\{\mathbf{Z},\, \tau\},\, -\tau\}.$    (16)
The indicator function of the $\ell_2$ ball $\iota_{\mathcal{B}(\mathbf{y},\epsilon)}$: Given a vector $\mathbf{z} \in \mathbb{R}^{N}$, the proximity operator of the indicator function $\iota_{\mathcal{B}(\mathbf{y},\epsilon)}$ is the projection onto the $\ell_2$ ball centered at $\mathbf{y} \in \mathbb{R}^{N}$ and of radius $\epsilon \in \mathbb{R}_+$ as follows:

$\mathcal{P}_{\mathcal{B}(\mathbf{y},\epsilon)}(\mathbf{z}) = \mathbf{y} + \min\Big\{\dfrac{\epsilon}{\|\mathbf{z} - \mathbf{y}\|_2},\, 1\Big\}\, (\mathbf{z} - \mathbf{y}).$    (17)
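The projections in (16) and (17) are one-liners in practice. A minimal NumPy sketch (function names ours):

```python
import numpy as np

def proj_box(Z, tau):
    """Projection onto the box [-tau, tau]^(N x L), Eq. (16)."""
    return np.clip(Z, -tau, tau)

def proj_l2_ball(z, y, eps):
    """Projection onto the l2 ball of center y and radius eps, Eq. (17):
    rescale the offset z - y when it lies outside the ball."""
    r = z - y
    norm_r = np.linalg.norm(r)
    if norm_r <= eps:
        return z.copy()
    return y + (eps / norm_r) * r
```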
.3 Overview of the parameters specific to the adaptive PDFB algorithm (Algorithm 3)

An overview of the variables and parameters involved in the adjustment of the $\ell_2$ bounds on the data-fidelity terms is presented in Tables 1 and 2, respectively.
Table 1: Overview of the variables employed in the adaptive procedure incorporated in Algorithm 3.

$\rho_l^b(t)$ : the $\ell_2$ norm of the residual data corresponding to the data block $\mathbf{y}_l^b$ at iteration $t$.
$\vartheta_l^b(t-1)$ : iteration index of the previous update of the $\ell_2$ bound of the data block $\mathbf{y}_l^b$.
$\beta(t-1)$ : the relative variation of the solution at iteration $t-1$.
Table 2: Overview of the parameters involved in the adaptive procedure incorporated in Algorithm 3.

$\lambda_1 \in\, ]0,1[$ : the bound on the relative variation of the solution (we fix it to $5\times 10^{-4}$).
$\lambda_2 \in\, ]0,1[$ : the tolerance on the relative difference between the current estimate of a data-block $\ell_2$ bound and the $\ell_2$ norm of the associated residual data (we fix it to $0.01$).
$\lambda_3 \in\, ]0,1[$ : the parameter defining the increment of the $\ell_2$ bound with respect to the $\ell_2$ norm of the residual data (we fix it to $0.618$).
$\bar{\vartheta}$ : the minimum number of iterations between consecutive updates of each $\ell_2$ bound (we fix it to $100$).
.4 Randomized PDFB algorithm
We have explained in Section 3.4.2 that the primal-dual framework adopted in this thesis allows for randomized updates of the dual variables [93]. This feature can significantly reduce the memory requirements, thus ensuring higher scalability of the algorithmic structure, at the expense of an increased number of iterations to achieve convergence. To give the reader a brief idea of the randomization procedure, we showcase the performance of the HyperSARA approach (proposed in Chapter 4) with and without randomization of the dual variables associated with the data-fidelity blocks. Furthermore, we report the performance of one of the convex optimization methods developed for wideband RI imaging, dubbed WDCT [54]. Since WDCT involves no reweighting, we restrict HyperSARA to the simple case of no reweighting, resulting in the LRJAS algorithm. On the one hand, LRJAS solves a constrained minimization problem of the form (4.8) with $\boldsymbol{\omega} = \mathbf{1}_J$ and $\overline{\boldsymbol{\omega}} = \mathbf{1}_T$, promoting low-rankness and joint average sparsity of the image cube in a redundant wavelet dictionary, namely the SARA dictionary. On the other hand, WDCT solves an unconstrained minimization problem of the form (3.32), promoting sparsity of both the spatial and spectral information. Spatial sparsity is promoted in a redundant wavelet dictionary, and sparsity of the spectra is enforced in a DCT dictionary.
.4.1 Simulations and results

Results are reported for simulations using a radio emission map of an HII region in the M31 galaxy. The image is of size $N = 256 \times 256$ pixels and is considered as the original sky image $\mathbf{x}_1$ at the reference frequency $\nu_1 = 1.4$ GHz (Figure 1, first row). The RI image cube is simulated following the spectral curvature model (2.23). In order to ensure spatial correlation in the spectral index map, the latter is generated in an ad hoc manner, similarly to [54, 69], that is, as a linear combination of the reference sky image $\mathbf{x}_1$ smoothed with a Gaussian kernel of size $3\times 3$ at FWHM, and a random Gaussian field. The RI data cube is simulated using a realistic uv-coverage from the VLA at the reference frequency $\nu_1$. For each channel indexed by $l$, the corresponding uv-coverage is obtained by scaling the reference uv-coverage with $\nu_l/\nu_1$. The data cube is generated within the frequency range $[\nu_1, \nu_L] = [1.4, 2.8]$ GHz, with uniformly sampled channels. The test is carried out on a cube with a total number of $L = 16$ channels, a sampling rate SR (4.15) equal to 0.5, and an iSNR (4.13) of 30 dB. We assign one data block per channel, resulting in a total of 16 data blocks. The metric adopted to assess the reconstruction quality of the different methods is the aSNR metric (4.20).
Figure 1: Simulations with VLA uv-coverage: (a) The ground-truth image at the reference frequency, $\mathbf{x}_1$ ($\nu_1 = 1.4$ GHz). (b) Curves representing the evolution of aSNR (y-axis) as a function of the number of iterations (x-axis), for the different methods: LRJAS, LRJAS-R (LRJAS with randomized updates) and WDCT.

Figure 1, second row, shows the aSNR evolution for the different algorithms. For the randomized LRJAS method, dubbed LRJAS-R, we fix the probability of selecting an active subset from the full data to 0.5, meaning that only half of the data-fidelity blocks, selected at random, are updated at each iteration. This has the advantage of lower infrastructure and memory requirements, at the expense of an increased number of iterations to achieve convergence. We can see that LRJAS and LRJAS-R exhibit comparable performance, yet LRJAS-R needs more iterations to reach the same aSNR value. Also, when compared to WDCT, the randomized algorithm presents superior performance; LRJAS-R reaches an aSNR of 25 dB, that is, 5 dB higher than WDCT.
It is worth noting that this randomization scheme can be used in the same fashion with the Faceted HyperSARA approach (proposed in Chapter 5) and the uncertainty quantification algorithm (introduced in Chapter 6), allowing the dual variables to be updated less often.
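The random selection of active data-fidelity blocks described above can be sketched as follows (an illustrative NumPy sketch, not the thesis implementation; the function name and the non-empty-subset safeguard are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def select_active_blocks(n_blocks, prob):
    """Draw the subset of data-fidelity blocks to update at one iteration.
    Each block is selected independently with probability `prob`; a
    non-empty subset is enforced so that every iteration does some work."""
    active = rng.random(n_blocks) < prob
    if not active.any():
        active[rng.integers(n_blocks)] = True
    return np.flatnonzero(active)

# Within the primal-dual iterations, only the dual variables of the
# selected blocks would be updated; the others are kept unchanged.
```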
Bibliography
[1] A. Abdulaziz. A Low-Rank and Joint-Sparsity Model for Wide-Band Radio-Interferometric
Imaging. Master’s thesis, Heriot-Watt University, United Kingdom, 2016.
[2] A. Abdulaziz, A. Dabbech, A. Onose, and Y. Wiaux. A low-rank and joint-sparsity model
for hyper-spectral radio-interferometric imaging. In 2016 24th European Signal Processing
Conference (EUSIPCO), pages 388–392, Aug 2016.
[3] A. Abdulaziz, A. Dabbech, and Y. Wiaux. Wideband super-resolution imaging in radio
interferometry via low rankness and joint average sparsity models (hypersara). Monthly
Notices of the Royal Astronomical Society, 489(1):1230–1248, 2019.
[4] A. Abdulaziz, A. Onose, A. Dabbech, and Y. Wiaux. A distributed algorithm for wide-
band radio-interferometry. In International Biomedical and Astronomical Signal Processing
Frontiers Workshop, page 6, 1 2017.
[5] A. Abdulaziz, A. Repetti, and Y. Wiaux. Hyperspectral uncertainty quantification by optimization. 2019.
[6] R. Ammanouil, A. Ferrari, R. Flamary, C. Ferrari, and D. Mary. Multi-frequency image
reconstruction for radio-interferometry with self-tuned regularization parameters. In 2017
25th European Signal Processing Conference (EUSIPCO), pages 1435–1439, Aug 2017.
[7] P. Arras, P. Frank, R. Leike, R. Westermann, and T. Enßlin. Unified radio interferometric calibration and imaging with joint uncertainty quantification. arXiv preprint arXiv:1903.11169, 2019.
[8] H. H. Bauschke and P. L. Combettes. Convex analysis and monotone operator theory in
Hilbert spaces. Springer Science & Business Media, 2011.
[9] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
[10] S. Bhatnagar and T. J. Cornwell. Scale sensitive deconvolution of interferometric images-i.
adaptive scale pixel (asp) decomposition. Astronomy & Astrophysics, 426(2):747–754, 2004.
[11] J. Birdi, A. Repetti, and Y. Wiaux. Sparse interferometric Stokes imaging under polarization constraint (Polarized SARA). Monthly Notices of the Royal Astronomical Society, 478(4):4442–4463, August 2018.
[12] J. Birdi, A. Repetti, and Y. Wiaux. Polca SARA - full polarization, direction-dependent
calibration and sparse imaging for radio interferometry. 2019. to appear.
[13] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. Journal
of Fourier analysis and Applications, 14(5-6):629–654, 2008.
[14] M. Born and E. Wolf. Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. Elsevier, 2013.
[15] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and
statistical learning via the alternating direction method of multipliers. Foundations and
Trends® in Machine Learning, 3(1):1–122, 2011.
[16] D. S. Briggs. High fidelity interferometric imaging: robust weighting and NNLS deconvolution. In Bulletin of the American Astronomical Society, volume 27, page 1444, 1995.
[17] C. L. Brogan, J. D. Gelfand, B. M. Gaensler, N. E. Kassim, and T. J. W Lazio. Discovery of 35
new supernova remnants in the inner galaxy. The Astrophysical Journal Letters, 639(1):L25,
2006.
[18] G. B. Taylor, C. L. Carilli, and R. A. Perley. Synthesis imaging in radio astronomy II. In Synthesis Imaging in Radio Astronomy II, volume 180, 1999.
[19] E. J. Candès. Compressive sampling. In Proceedings of the international congress of mathe-
maticians, volume 3, pages 1433–1452. Madrid, Spain, 2006.
[20] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? journal
of ACM, 58(1):1–37, 2009.
[21] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal recon-
struction from highly incomplete frequency information. IEEE Transactions on information
theory, 52(2):489–509, 2006.
[22] E. J. Candès and M. B. Wakin. An introduction to compressive sampling [a sensing/sampling
paradigm that goes against the common knowledge in data acquisition]. IEEE signal pro-
cessing magazine, 25(2):21–30, 2008.
[23] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted $\ell_1$ minimization. Journal of Fourier analysis and applications, 14(5):877–905, 2008.
[24] C. L. Carilli, S. Furlanetto, F. Briggs, M. Jarvis, S. Rawlings, and H. Falcke. Probing the
dark ages with the square kilometer array. New Astronomy Reviews, 48(11):1029 – 1038,
2004. Science with the Square Kilometre Array.
[25] R. E. Carrillo, J. D. McEwen, D. Van De Ville, J. Thiran, and Y. Wiaux. Sparsity averaging
for compressive imaging. IEEE Signal Processing Letters, 20(6):591–594, June 2013.
[26] R. E. Carrillo, J. D. McEwen, and Y. Wiaux. Sparsity averaging reweighted analysis
(SARA): a novel algorithm for radio-interferometric imaging. Monthly Notices of the Royal
Astronomical Society, 426(2):1223–1234, 2012.
[27] R. E. Carrillo, J. D. McEwen, and Y. Wiaux. PURIFY: a new approach to radio-
interferometric imaging. Monthly Notices of the Royal Astronomical Society, 439(4):3591–
3604, 2014.
[28] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision, 40(1):120–145, 2011.
[29] A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta
Numerica, 25:161–319, 2016.
[30] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit.
SIAM review, 43(1):129–159, 2001.
[31] G. Chierchia, N. Pustelnik, J.-C. Pesquet, and B. Pesquet-Popescu. Epigraphical projection and proximal tools for solving constrained convex optimization problems. Signal, Image and Video Processing, 9(8):1737–1749, 2015.
[32] B. G. Clark. An efficient implementation of the algorithm 'CLEAN'. Astronomy and Astrophysics, 89:377, 1980.
[33] P. L. Combettes and J.-C. Pesquet. Proximal thresholding algorithm for minimization over
orthonormal bases. SIAM Journal on Optimization, 18(4):1351–1376, 2007.
[34] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-
point algorithms for inverse problems in science and engineering, pages 185–212. Springer,
2011.
[35] P. L. Combettes and J.-C. Pesquet. Primal-dual splitting algorithm for solving inclusions with
mixtures of composite, lipschitzian, and parallel-sum type monotone operators. Set-Valued
and Variational Analysis, 20(2):307–330, 2012.
[36] P. L. Combettes and B. C. Vũ. Variable metric forward–backward splitting with applications
to monotone inclusions in duality. Optimization, 63(9):1289–1318, 2014.
[37] L. Condat. A primal–dual splitting method for convex optimization involving lipschitzian,
proximable and linear composite terms. Journal of Optimization Theory and Applications,
158(2):460–479, 2013.
[38] T. J. Cornwell. Multiscale clean deconvolution of radio synthesis images. IEEE Journal of
Selected Topics in Signal Processing, 2(5):793–801, Oct 2008.
[39] E. C. Sutton and B. D. Wandelt. Optimal image reconstruction in radio interferometry. The Astrophysical Journal Supplement Series, 162(2):401, 2006.
[40] D. A. Perley, R. A. Perley, V. Dhawan, and C. L. Carilli. Discovery of a Luminous Radio Transient 460 pc from the Central Supermassive Black Hole in Cygnus A. The Astrophysical Journal, 841:117, June 2017.
[41] A. Dabbech. Déconvolution d'images en radio-astronomie centimétrique pour l'exploitation de LOFAR et SKA : caractérisation du milieu non-thermique des amas de galaxies. PhD thesis, Univ. Nice, 2014.
[42] A. Dabbech, C. Ferrari, D. Mary, E. Slezak, O. Smirnov, and J. S. Kenyon. MORESANE: MOdel REconstruction by Synthesis-ANalysis Estimators - A sparse deconvolution algorithm for radio interferometric imaging. Astronomy and Astrophysics, 576:16, 2015.
[43] A. Dabbech, D. Mary, and C. Ferrari. Astronomical image deconvolution using sparse priors:
An analysis-by-synthesis approach. In 2012 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pages 3665–3668, March 2012.
[44] A. Dabbech, A. Onose, A. Abdulaziz, R. A. Perley, O. M. Smirnov, and Y. Wiaux. Cygnus
a super-resolved via convex optimization from vla data. Monthly Notices of the Royal Astro-
nomical Society, 476(3):2853–2866, 2018.
[45] A. Dabbech, A. Repetti, and Y. Wiaux. Self direction-dependent calibration for wideband
radio-interferometric imaging. 2 2019. International BASP Frontiers workshop 2019 ; Con-
ference date: 03-02-2019 Through 08-02-2019.
[46] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear in-
verse problems with a sparsity constraint. Communications on Pure and Applied Mathemat-
ics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57(11):1413–1457,
2004.
[47] J. Deguignet, A. Ferrari, D. Mary, and C. Ferrari. Distributed multi-frequency image re-
construction for radio-interferometry. In 2016 24th European Signal Processing Conference
(EUSIPCO), pages 1483–1487, Aug 2016.
[48] P. Dewdney, W. Turner, R. Millenaar, R. McCool, J. Lazio, and T. Cornwell. SKA1 system baseline design. Document number SKA-TEL-SKO-DD-001 Revision, 1(1), 2013.
[49] D. L. Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52(4):1289–
1306, 2006.
[50] A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM Journal on Imaging Sciences, 11(1):473–506, 2018.
[51] M. Elad, B. Matalon, J. Shtok, and M. Zibulevsky. A wide-angle view at iterated shrinkage
algorithms. In Wavelets XII, volume 6701, page 670102. International Society for Optics and
Photonics, 2007.
[52] M. Elad, P. Milanfar, and R. Rubinstein. Analysis versus synthesis in signal priors. Inverse
problems, 23(3):947, 2007.
[53] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375.
Springer Science & Business Media, 1996.
[54] A. Ferrari, J. Deguignet, C. Ferrari, D. Mary, A. Schutz, and O. Smirnov. Multi-frequency
image reconstruction for radio interferometry. a regularized inverse problem approach. arXiv
preprint arXiv:1504.06847, 2015.
[55] J. A. Fessler and B. P. Sutton. Nonuniform fast Fourier transforms using min-max interpolation. IEEE Transactions on Signal Processing, 51(2):560–574, January 2003.
[56] B. M. Gaensler, R. Beck, and L. Feretti. The origin and evolution of cosmic magnetism. New
Astronomy Reviews, 48(11-12):1003–1012, 2004.
[57] H. Garsden, J. Girard, J.-L. Starck, S. Corbel, C. Tasse, A. Woiselle, J. McKean, A. S. van Amesfoort, J. Anderson, I. Avruch, et al. LOFAR sparse image reconstruction. Astronomy & Astrophysics, 575:A90, 2015.
[58] G. H.-G. Chen and R. T. Rockafellar. Convergence rates in forward-backward splitting. SIAM Journal on Optimization, 7(2):421–444, 1997.
[59] J. Geiping and M. Moeller. Composite optimization by nonconvex majorization-
minimization. 11(4):2494–2598, 2018.
[60] J. N. Girard, H. Garsden, J. L. Starck, S. Corbel, A. Woiselle, C. Tasse, J. P. McKean, and
J. Bobin. Sparse representations and convex optimization as tools for lofar radio interfero-
metric imaging. Journal of Instrumentation, 10(08):C08013, 2015.
[61] M. Golbabaee, S. Arberet, and P. Vandergheynst. Compressive source separation: The-
ory and methods for hyperspectral imaging. IEEE Transactions on Image Processing,
22(12):5096–5110, Dec 2013.
[62] M. Golbabaee and P. Vandergheynst. Hyperspectral image compressed sensing via low-rank
and joint-sparse matrix recovery. In 2012 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pages 2741–2744, March 2012.
[63] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms ii:
Advanced theory and bundle methods, vol. 306 of grundlehren der mathematischen wis-
senschaften, 1993.
[64] J. A. Högbom. Aperture synthesis with a non-regular distribution of interferometer baselines.
Astronomy and Astrophysics Supplement Series, 15:417, 1974.
[65] D. R. Hunter and K. Lange. A tutorial on MM algorithms. The American Statistician,
58(1):30–37, 2004.
[66] M. Jiang, J. Bobin, and J. Starck. Joint multichannel deconvolution and blind source sepa-
ration. SIAM Journal on Imaging Sciences, 10(4):1997–2021, 2017.
[67] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser. Deep convolutional neural network
for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522,
2017.
[68] J. Jonas et al. The MeerKAT radio telescope. In MeerKAT Science: On the Pathway to the SKA, volume 277, page 001. SISSA Medialab, 2018.
[69] H. Junklewitz, M. R. Bell, and T. Enßlin. A new approach to multifrequency synthesis in
radio interferometry. Astronomy & Astrophysics, 581:A59, 2015.
[70] H. Junklewitz, M. R. Bell, M. Selig, and T. A. Enßlin. Resolve: A new algorithm for aperture
synthesis imaging of extended emission in radio astronomy. Astronomy & Astrophysics,
586:A76, 2016.
[71] F. Knoll, K. Hammernik, C. Zhang, S. Moeller, T. Pock, D. K. Sodickson, and M. Akcakaya.
Deep learning methods for parallel magnetic resonance image reconstruction. arXiv preprint
arXiv:1904.01112, 2019.
[72] N. Komodakis and J.-C. Pesquet. Playing with duality: An overview of recent primal-dual
approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine,
32(6):31–54, Nov 2015.
[73] D. Li, R. Nan, and Z. Pan. The five-hundred-meter aperture spherical radio telescope project and its early science opportunities. Proceedings of the International Astronomical Union, 8(S291):325–330, 2012.
[74] F. Li, T. J. Cornwell, and F. de Hoog. The application of compressive sampling to radio
astronomy - i. deconvolution. Astronomy & Astrophysics, 528:A31, 2011.
[75] M. P. van Haarlem, M. W. Wise, A. W. Gunst, G. Heald, J. P. McKean, J. W. T. Hessels,
A. G. de Bruyn, R. Nijboer, J. Swinbank, R. Fallows, and others. Lofar: The low-frequency
array. A&A, 556:A2, 2013.
[76] S. Mallat et al. A wavelet tour of signal processing: The sparse way. AP Professional, Third Edition, London, 2009.
[77] S. G. Mallat and Z. Zhang. Matching with time-frequency dictionaries. IEEE Transactions
on signal processing, 41(12):3397–3415, 1993.
[78] J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathé-
matique de France, 93:273–299, 1965.
[79] R. Mourya, A. Ferrari, R. Flamary, and C. Richard. Distributed deblurring of large images of wide field-of-view. arXiv preprint, May 2017.
[80] S. Naghibzadeh, A. Repetti, A.-J. van der Veen, and Y. Wiaux. Facet-based regularization for scalable radio-interferometric imaging. Roma, Italy, September 2018. To appear.
[81] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
[82] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock. On iteratively reweighted algorithms for
non-smooth nonconvex optimization in computer vision. 8(1):331–372, 2015.
[83] P. Ochs, J. Fadili, and T. Brox. Non-smooth non-convex Bregman minimization: Unification and new algorithms. 181(1):244–278, 2019.
[84] A. R. Oringa, B. McKinley, N. Hurley-Walker, F. H. Briggs, R. B. Wayth, D. L. Kaplan,
M. E. Bell, L. U. Feng, A. R. Neben, J. D. Hughes, et al. Wsclean: an implementation of a
fast, generic wide-eld imager for radio astronomy. Monthly Notices of the Royal Astronomical
Society, 444(1):606–619, 2014.
[85] A. R. Oringa and O. Smirnov. An optimized algorithm for multiscale wideband decon-
volution of radio astronomical images. Monthly Notices of the Royal Astronomical Society,
471(1):301–316, 2017.
[86] A. Onose, R. E. Carrillo, J. D. McEwen, and Y. Wiaux. A randomised primal-dual algo-
rithm for distributed radio-interferometric imaging. In 2016 24th European Signal Processing
Conference (EUSIPCO), pages 1448–1452, Aug 2016.
[87] A. Onose, R. E. Carrillo, A. Repetti, J. D. McEwen, J. p. Thiran, J.-C. Pesquet, and
Y. Wiaux. Scalable splitting algorithms for big-data interferometric imaging in the ska
era. Monthly Notices of the Royal Astronomical Society, 462(4):4314–4335, 2016.
[88] A. Onose, A. Dabbech, and Y. Wiaux. An accelerated splitting algorithm for radio-
interferometric imaging: when natural and uniform weighting meet. Monthly Notices of
the Royal Astronomical Society, 469(1):938–949, 2017.
[89] H. Pan, M. Simeoni, P. Hurley, T. Blu, and M. Vetterli. Leap: Looking beyond pixels with
continuous-space estimation of point sources. Astronomy & Astrophysics, 608:A136, 2017.
[90] M. Pereyra. Proximal markov chain monte carlo algorithms. Statistics and Computing,
26(4):745–760, 2016.
[91] M. Pereyra. Maximum-a-posteriori estimation with Bayesian confidence regions. SIAM Journal on Imaging Sciences, 10(1):285–302, 2017.
[92] R. A. Perley, C. J. Chandler, B. J. Butler, and J. M. Wrobel. The expanded very large array:
A new telescope for new science. The Astrophysical Journal Letters, 739(1):L1, 2011.
[93] J.-C. Pesquet and A. Repetti. A class of randomized primal-dual algorithms for distributed optimization. Journal of Nonlinear and Convex Analysis, 16(12):2453–2490, 2015.
[94] L. Pratley, J. D. McEwen, M. d’Avezac, R. E. Carrillo, A. Onose, and Y. Wiaux. Robust
sparse image reconstruction of radio interferometric observations with purify. Monthly Notices
of the Royal Astronomical Society, 473(1):1038–1058, 2017.
[95] Z. Pruša. Segmentwise discrete wavelet transform. PhD thesis, Brno university of technology,
2012.
[96] U. Rau and T. J. Cornwell. A multi-scale multi-frequency deconvolution algorithm for syn-
thesis imaging in radio interferometry. Astronomy & Astrophysics, 532:A71, 2011.
[97] S. Rawlings, F. B. Abdalla, S. L. Bridle, C. A. Blake, C. M. Baugh, L. J. Greenhill, and
J. M. van der Hulst. Galaxy evolution, cosmology and dark energy with the square kilometer
array. New Astronomy Reviews, 48(11):1013 – 1027, 2004. Science with the Square Kilometre
Array.
[98] A. Repetti, J. Birdi, A. Dabbech, and Y. Wiaux. Non-convex optimization for self-calibration of direction-dependent effects in radio interferometric imaging. Monthly Notices of the Royal Astronomical Society, 470(4):3981–4006, October 2017.
[99] A. Repetti, M. Pereyra, and Y. Wiaux. Uncertainty quantication in imaging: When convex
optimization meets bayesian analysis. In 2018 26th European Signal Processing Conference
(EUSIPCO), pages 2668–2672. IEEE, 2018.
[100] A. Repetti, M. Pereyra, and Y. Wiaux. Scalable bayesian uncertainty quantication in
imaging inverse problems via convex optimization. SIAM Journal on Imaging Sciences,
12(1):87–118, 2019.
[101] A. Repetti and Y. Wiaux. A non-convex perspective on calibration and imaging in radio
interferometry. In Proceedings of the conference on Wavelets and Sparsity XVII, part of
the SPIE Optical Engineering + Applications, San Diego, California, United States, August
2017.
[102] A. Repetti and Y. Wiaux. Variable metric forward-backward algorithm for composite mini-
mization problems. arXiv preprint, 2019.
[103] C. Robert. The Bayesian choice: from decision-theoretic foundations to computational im-
plementation. Springer Science & Business Media, 2007.
[104] R. Rubinstein, A. M. Bruckstein, and M. Elad. Dictionaries for sparse representation mod-
eling. Proceedings of the IEEE, 98(6):1045–1057, 2010.
[105] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algo-
rithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
[106] G. B. Rybicki and A. P. Lightman. Radiative processes in astrophysics. John Wiley & Sons,
2008.
[107] R. J. Sault and M. H. Wieringa. Multi-frequency synthesis techniques in radio interferometric
imaging. Astronomy and Astrophysics Supplement Series, 108, 1994.
[108] A. M. M. Scaife. Big telescope, big data: towards exascale with the square kilometre array.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences, 378(2166):20190060, 2020.
[109] F. R. Schwab and W. D. Cotton. Global fringe search techniques for vlbi. The Astronomical
Journal, 88:688–694, 1983.
[110] J.-L. Starck, D. L. Donoho, and E. J. Candès. Astronomical image representation by the
curvelet transform. Astronomy & Astrophysics, 398(2):785–800, 2003.
[111] J.-L. Starck and F. Murtagh. Image restoration with noise suppression using the wavelet
transform. Astronomy and Astrophysics, 288:342–348, 1994.
[112] J.-L. Starck, F. Murtagh, and J. M. Fadili. Sparse image and signal processing: wavelets,
curvelets, morphological diversity. Cambridge university press, 2010.
[113] J.-L. Starck, A. Bijaoui, B. Lopez, and C. Perrier. Image reconstruction by the wavelet transform applied to aperture synthesis. Astronomy and Astrophysics, 283:349–360, 1994.
[114] P. M. Sutter, B. D. Wandelt, J. D. McEwen, E. F. Bunn, A. Karakci, A. Korotkov, P. Timbie, G. S. Tucker, and L. Zhang. Probabilistic image reconstruction for radio interferometers. Monthly Notices of the Royal Astronomical Society, 438(1):768–778, 2014.
[115] C. Tasse, B. Hugo, M. Mirmont, O. Smirnov, M. Atemkeng, L. Bester, M. J. Hardcastle,
R. Lakhoo, S. Perkins, and T. Shimwell. Faceting for direction-dependent spectral deconvo-
lution. Astronomy & Astrophysics, 611:A87, March 2018.
[116] A. R. Thompson, J. M. Moran, and G. W. Swenson. Interferometry and Synthesis in Radio
Astronomy. Wiley-VCH, 2007.
[117] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Dabbech, A. Repetti, A. Jackson, J.-P. Thi-
ran, and Y. Wiaux. Cygnus A image cubes at C band (4-8 GHz) obtained with Faceted
HyperSARA, 2020.
[118] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Dabbech, A. Repetti, A. Jackson, J.-P. Thiran,
and Y. Wiaux. Parallel faceted imaging in radio interferometry via proximal splitting (faceted
hypersara): when precision meets scalability. Monthly Notices of the Royal Astronomical
Society, 2020.
[119] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Repetti, and Y. Wiaux. A faceted prior for
scalable wideband computational imaging. 2019.
[120] P.-A. Thouvenin, A. Abdulaziz, M. Jiang, A. Repetti, and Y. Wiaux. A faceted prior for
scalable wideband imaging: Application to radio astronomy. In 2019 IEEE International
Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP),
2019.
[121] P.-A. Thouvenin, A. Repetti, A. Dabbech, and Y. Wiaux. Time-regularized blind deconvolution approach for radio interferometry. pages 475–479, Sheffield, UK, July 2018.
[122] M. P. van Haarlem, M. W. Wise, A. W. Gunst, G. Heald, J. P. McKean, J. W. Hessels,
A. G. de Bruyn, R. Nijboer, Swinbank, R. Fallows, et al. Lofar: The low-frequency array.
Astronomy & astrophysics, 556:A2, 2013.
[123] B. C. Vũ. A splitting algorithm for dual monotone inclusions involving cocoercive operators.
Advances in Computational Mathematics, 38(3):667–681, 2013.
[124] S. Wenger and M. Magnor. A sparse reconstruction algorithm for multi-frequency radio
images. Technical report, Computer Graphics Lab, TU Braunschweig, 2014.
[125] S. Wenger, M. Magnor, Y. Pihlström, S. Bhatnagar, and U. Rau. SparseRI: A compressed sensing framework for aperture synthesis imaging in radio astronomy. Publications of the Astronomical Society of the Pacific, 122(897):1367, 2010.
[126] Y. Wiaux, L. Jacques, G. Puy, A. M. M. Scaife, and P. Vandergheynst. Compressed sensing
imaging techniques for radio interferometry. Monthly Notices of the Royal Astronomical
Society, 395(3):1733–1742, 2009.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Upcoming radio interferometers are aiming to image the sky at new levels of resolution and sensitivity, with wide-band image cubes reaching close to the Petabyte scale for SKA. Modern proximal optimization algorithms have shown a potential to significantly outperform CLEAN thanks to their ability to inject complex image models to regularize the inverse problem for image formation from visibility data. They were also shown to be parallelizable over large data volumes thanks to a splitting functionality enabling the decomposition of the data into blocks, for parallel processing of block-specific data-fidelity terms involved in the objective function. Focusing on intensity imaging, the splitting functionality is further exploited in this work to decompose the image cube into spatio-spectral facets, and enable parallel processing of facet-specific regularization terms in the objective function, leading to the “Faceted HyperSARA” algorithm. Reliable heuristics enabling an automatic setting of the regularization parameters involved in the objective are also introduced, based on estimates of the noise level, transferred from the visibility domain to the domains where the regularization is applied. Simulation results based on a MATLAB implementation and involving synthetic image cubes and data close to Gigabyte size confirm that faceting can provide a major increase in parallelization capability when compared to the non-faceted approach (HyperSARA).
Article
Full-text available
New generation of radio interferometers are envisaged to produce high quality, high dynamic range Stokes images of the observed sky from the corresponding undersampled Fourier domain measurements. In practice, these measurements are contaminated by the instrumental and atmospheric effects that are well represented by Jones matrices, and are most often varying with observation direction and time. These effects, usually unknown, act as a limiting factor in achieving the required imaging performance and thus, their calibration is crucial. To address this issue, we develop a global algorithm, named Polca SARA, aiming to perform full polarization, direction-dependent calibration, and sparse imaging by employing a non-convex optimization technique. In contrast with the existing approaches, the proposed method offers global convergence guarantees and flexibility to incorporate sophisticated priors to regularize the imaging as well as the calibration problem. Thus, we adapt a polarimetric imaging specific method, enforcing the physical polarization constraint along with a sparsity prior for the sought images. We perform extensive simulation studies of the proposed algorithm. The results indicate the superior performance of polarization constraint based imaging when combined with the calibration of the direction-dependent effects for full Jones matrices, including their off-diagonal terms (denoting polarization leakage). The chosen priors in the proposed approach are also shown to handle the unitary ambiguity problem to a good extent.
Article
Full-text available
Unlike optical telescopes, radio interferometers do not image the sky directly but require specialized image formation algorithms. For the Square Kilometre Array (SKA), the computational requirements of this image formation are extremely demanding due to the huge data rates produced by the telescope. This processing will be performed by the SKA Science Data Processor facilities and a network of SKA Regional Centres, which must not only deal with SKA-scale data volumes but also with stringent science-driven image fidelity requirements. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
Article
Full-text available
We propose a new approach within the versatile framework of convex optimization to solve the radio-interferometric wideband imaging problem. Our approach, dubbed HyperSARA, leverages low-rankness and joint average sparsity priors to enable the formation of high-resolution and high dynamic range image cubes from visibility data. The resulting minimization problem is solved using a Primal-Dual (PD) algorithm. The algorithmic structure is shipped with highly interesting functionalities such as preconditioning for accelerated convergence, and parallelization enabling the computational cost and memory requirements to be spread across a multitude of processing nodes with limited resources. In this work, we provide a proof of concept for wideband image reconstruction of megabyte-size images. The better performance of HyperSARA, in terms of resolution and dynamic range of the formed images, compared to single-channel imaging and a CLEAN-based wideband imaging algorithm, is showcased on simulations and real VLA observations. Our code is available online.
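The two priors HyperSARA combines can be evaluated in a few lines: the nuclear norm of the pixels-by-channels matrix promotes low rankness, and the ℓ2,1 norm promotes joint sparsity across channels. A toy numpy sketch (identity sparsifying dictionary instead of the actual wavelet collection; parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# The wideband cube is reshaped into an N x L matrix X (pixels x channels).
# A rank-1 X mimics a cube whose channels share one spatial structure.
N, L = 100, 8
X = np.outer(rng.standard_normal(N), rng.standard_normal(L))

def hypersara_reg(X, lam=1.0, mu=1e-3):
    nuclear = np.sum(np.linalg.svd(X, compute_uv=False))  # ||X||_* (low rankness)
    l21 = np.sum(np.linalg.norm(X, axis=1))               # ||X||_{2,1} (joint sparsity of rows)
    return lam * nuclear + mu * l21
```

For the rank-1 matrix above, the nuclear norm equals the Frobenius norm (one non-zero singular value), which makes the low-rankness term easy to verify numerically.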
Conference Paper
Full-text available
Wideband radio-interferometric (RI) imaging consists in estimating images of the sky across a whole frequency band from incomplete Fourier data. Powerful prior information is needed to regularize the inverse imaging problem. At the extreme resolution and dynamic range of interest to modern telescopes, image cubes will far exceed Terabyte sizes, with data volumes orders of magnitude larger, making image estimation a very challenging task. The computational cost and memory requirements of corresponding iterative image recovery algorithms are extreme and call for high parallelism. A data-splitting strategy was recently introduced to parallelize computations over data blocks within an advanced primal-dual convex optimization algorithm. Building on the same algorithm, we propose an image faceting approach that consists in splitting the image cube into 3D overlapping facets with their own prior, reducing the computational bottleneck from full image to facet size. Simulation results suggest our prior provides similar if not superior reconstruction quality to the corresponding state-of-the-art non-faceted approach, with facet parallelization offering acceleration and therefore increased potential of scalability to large data and image sizes. Index Terms: Wideband radio-interferometric imaging, facet-based prior, preconditioned primal-dual algorithm.
Article
Full-text available
The data reduction procedure for radio interferometers can be viewed as a combined calibration and imaging problem. We present an algorithm that unifies cross-calibration, self-calibration, and imaging. Because it is a Bayesian method, this algorithm not only calculates an estimate of the sky brightness distribution, but also provides an estimate of the joint uncertainty, which entails both the uncertainty of the calibration and that of the actual observation. The algorithm is formulated in the language of information field theory and uses Metric Gaussian Variational Inference (MGVI) as the underlying statistical method. So far only direction-independent antenna-based calibration is considered. This restriction may be lifted in future work. An implementation of the algorithm is contributed as well.
Conference Paper
Full-text available
Hyperspectral images exhibit strong spectral correlations, which can be exploited via a low-rankness and joint-sparsity prior when reconstructed from incomplete and noisy measurements. A state-of-the-art solution consists in using a regularization term based on both the ℓ2,1 and the nuclear norms, which however does not scale well with large numbers of spectral channels and huge image sizes. To alleviate this issue, we propose a parallelizable faceted low-rankness and joint-sparsity prior to improve the scalability of the associated imaging algorithm while preserving its reconstruction performance and better promoting local spectral correlations. We illustrate our approach on synthetic data in the context of radio astronomy.

I. A SCALABLE LOW-RANKNESS AND JOINT-SPARSITY PRIOR

Context and motivations. Hyperspectral (HS) imaging consists in recovering an image in several contiguous spectral channels from a set of noisy, possibly incomplete measurements. The problem can be cast as the following generic optimization task

minimize_{X ∈ ℝ_+^{N×L}} f(Y, ΦX) + r(X), (1)

where N is the image size, L is the number of spectral channels, X is the unknown HS image, Φ ∈ ℂ^{M×N} represents a linear measurement operator and Y ∈ ℂ^{M×L} are the measurements. The functions f and r are respectively the data-fitting and the regularization terms, the latter encoding additional prior knowledge on the structure of X. In many applications, X exhibits significant spectral correlations, which can be exploited in the estimation process [1]-[4]. An efficient approach consists in resorting to low-rank and joint-sparse regularizations based on both the nuclear and ℓ2,1 norms [5]: r(X) = λ‖X‖_* + µ‖Ψ†X‖_{2,1}, where Ψ† ∈ ℝ^{P×N} represents a sparsifying dictionary. To solve the resulting problem, [6] proposed an iterative primal-dual (PD) algorithm, which can handle all the functions in parallel without needing sub-iterations or inversions of the linear operators involved [7].

Nevertheless, handling r may be computationally demanding in practice, and consequently not suitable in a very high-dimensional setting. Radio astronomy is an extreme example: new generations of radio telescopes are expected to provide surveys with sub-arcsecond resolution over thousands of frequency channels, producing wide-field images composed of 10^14 pixels for the Square Kilometre Array (SKA) [8]. To overcome this issue, scalable alternatives to the nuclear norm have been proposed in the literature. In [9], the nuclear norm is formulated as the solution to a scalable non-convex problem. However, due to the non-convexity, methods such as the primal-dual algorithm cannot be used. Alternatively, in [10] a scalable low-rank framework based on graph signal processing is introduced for measurements acquired in the image domain, which is not the case for applications such as radio astronomy (where observations are acquired in the Fourier domain).

Proposed approach. We propose a simple facet-based version of the ℓ2,1 and nuclear norm regularization, expressed as

r(X) = Σ_{i=1}^{I} λ_i ‖W_i S̃_i X‖_* + µ_i ‖Ψ_i† S_i X‖_{2,1}, (2)

where the masking operators {S̃_i, S_i}_{i=1}^{I} produce spatially overlapping groups of pixels (i.e., rows of X) referred to as facets, whose definition is application-dependent. The diagonal weighting matrices {W_i}_{i=1}^{I} are aimed at ensuring a smooth transition between facets to reduce potential tessellation artifacts. Different decomposition strategies can be adopted, e.g., tailored to the structure of the initial dictionary Ψ† (when Ψ† is a wavelet transform, it can be exactly decomposed in terms of facet-based wavelet transforms Ψ_i† [11]). This prior offers additional degrees of parallelism to algorithms based on variable splitting [12], in particular the PD algorithm [13]. Indeed, leveraging advanced PD functionalities, at each iteration each term of the objective function (i.e., each facet) can be handled independently in parallel before being aggregated, ensuring convergence to a solution of the global problem (1). Thus, combining the proposed faceted prior with the PD algorithm reduces both the memory requirements for the estimated HS image and the computational cost per iteration, leading to a new highly scalable method for HS imaging.

II. APPLICATION TO WIDEBAND RADIO ASTRONOMY

We leverage (2) to solve the wideband imaging problem (1) for radio astronomy. The proposed approach can be seen as a scalable version of the PD algorithm HyperSARA [6], with a facet-based prior that better promotes local spectral correlations. In this context, f models an ℓ2 constraint. The SARA prior [14] involved in HyperSARA, based on wavelet transforms, leads to defining {S_i}_{i=1}^{I} as in [11] to ensure an exact decomposition of the ℓ2,1 term. The operators {S̃_i}_{i=1}^{I} are defined with a larger overlap to mitigate reconstruction artifacts resulting from the tessellation of the nuclear norm.

Simulation settings. Following [6], we simulate a wideband model image of the W28 supernova remnant composed of N = 1024 × 1024 pixels and L = 20 spectral channels. The measurement operator corresponds to a realistic spatial Fourier sampling, with M ≈ 0.5N. The data are corrupted by additive zero-mean white Gaussian noise, leading to a signal-to-noise ratio (SNR) of 60 dB. To evaluate the interest of (2), we compare the reconstruction performance of a primal-dual algorithm solving (1) with: (i) r(X) = µ‖Ψ†X‖_{1,1}, µ = 10^-3 (corresponding to SARA [14]); (ii) r(X) = λ‖X‖_* + µ‖Ψ†X‖_{2,1} (HyperSARA [6]), with (λ, µ) = (1, 10^-3); (iii) the proposed faceted prior defined in (2), with I = 16 facets (4 along each spatial dimension), and (λ_i, µ_i) = (1, 10^-5) for i ∈ {1, …, I}.

Experimental results. The reconstructed images and error images obtained with the three methods in channels 1 and 20 are reported in Fig. 1, along with the reconstruction SNR and per-iteration reconstruction time. The proposed approach yields a reconstruction quality similar to HyperSARA (outperforming SARA), for a computing time closer to SARA, the fastest approach. The error images specifically highlight the efficiency of the proposed approach in reconstructing very low-intensity emission, possibly in relation with a better handling of local spectral correlations.

Conclusion and perspectives. We have proposed a highly scalable HS imaging method, based on a faceted prior promoting both low-rankness and joint-sparsity of the HS image of interest. We have shown that the proposed approach improves the scalability of the state-of-the-art HS method in radio-astronomical imaging, while preserving its reconstruction performance. Additional experiments will be conducted in future work to better appreciate the performance of the proposed faceting approach. In particular, randomization could be leveraged to control the number of facets handled at each iteration.
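A faceted regularizer of the kind described in the extended abstract above can be sketched as follows, with illustrative facet boundaries and trivial stand-in weights; each facet's term only touches its own rows, which is what makes the per-iteration computation parallelizable:

```python
import numpy as np

rng = np.random.default_rng(3)

# The N x L matrix X (pixels x channels) is split into I overlapping row
# groups (facets); each facet gets its own nuclear-norm and l2,1 terms,
# computable independently. Boundaries, overlap and weights are illustrative.
N, L, I, overlap = 64, 6, 4, 8
X = rng.standard_normal((N, L))
step = N // I
facets = [np.arange(max(0, i * step - overlap), min(N, (i + 1) * step))
          for i in range(I)]

def faceted_reg(X, lam=1.0, mu=1e-5):
    total = 0.0
    for idx in facets:
        w = np.ones(len(idx))  # stand-in for the smoothing weights W_i
        Xi = w[:, None] * X[idx]
        total += lam * np.sum(np.linalg.svd(Xi, compute_uv=False))  # facet nuclear norm
        total += mu * np.sum(np.linalg.norm(X[idx], axis=1))        # facet l2,1 norm
    return total
```

In the actual method the weights taper toward facet borders so that overlapping contributions blend smoothly and tessellation artifacts are suppressed.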
Conference Paper
Full-text available
We leverage convex optimization techniques to perform Bayesian uncertainty quantification (UQ) for hyperspectral (HS) inverse imaging problems. The proposed approach generalizes our recent work for single-channel UQ [1]. Similarly, the Bayesian hypothesis test is formulated as a convex minimization problem and solved using a primal-dual algorithm to quantify the uncertainty associated with particular 3D structures appearing in the maximum a posteriori (MAP) estimate of the HS cube. We investigate the interest of the proposed method for wideband radio-interferometric (RI) imaging that consists in inferring the wideband sky image from incomplete and noisy Fourier measurements. We showcase the performance of our approach on realistic simulations.
Thesis
Full-text available
The future radio-interferometric telescopes like the Square Kilometre Array will have unprecedented resolution, sensitivity and bandwidth. To take advantage of these powerful instruments and handle the extreme amounts of wide-band data, novel signal processing methods have to be tailored. In this respect, we present a generic non-parametric approach expressed as a convex optimisation problem with low-rank and joint-sparsity priors. The proposed approach requires only one tuning parameter, namely the relative weight between the regularisers. We solve the problem with an efficient algorithm with full parallelism and distribution capabilities. Our results show superior performance of the approach with respect to state-of-the-art non-parametric wide-band methods, as well as to conventional single-band imaging.
Article
We present a forward-backward-based algorithm to minimize a sum of a differentiable function and a nonsmooth function, both being possibly nonconvex. The main contribution of this work is to consider the challenging case where the nonsmooth function corresponds to a sum of nonconvex functions, resulting from composition between a strictly increasing, concave, differentiable function and a convex nonsmooth function. The proposed variable metric composite function forward-backward (C2FB) algorithm circumvents the explicit, and often challenging, computation of the proximity operator of the composite functions through a majorize-minimize approach. Precisely, each composite function is majorized using a linear approximation of the differentiable function, which allows one to apply the proximity step only to the sum of the nonsmooth functions. We prove the convergence of the algorithm iterates to a critical point of the objective function leveraging the Kurdyka-Łojasiewicz inequality. The convergence is guaranteed even if the proximity operators are computed inexactly, considering relative errors. We show that the proposed approach is a generalization of reweighting methods, with convergence guarantees. In particular, applied to the log-sum function, our algorithm reduces to a generalized version of the celebrated reweighted ℓ1 method. Finally, we show through simulations on an image processing problem that the proposed C2FB algorithm necessitates fewer iterations to converge and leads to better critical points compared with traditional reweighting methods and classic forward-backward algorithms.
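For the log-sum penalty, the majorize-minimize step described in this abstract reduces to reweighted soft-thresholding: the concave log term is linearized at the current iterate, leaving a weighted ℓ1 problem whose proximity step is closed-form. A toy denoising sketch (function names and parameter values are ours, not from the paper):

```python
import numpy as np

# Denoising with the log-sum penalty: min_x 0.5*||x - y||^2
# + lam * sum_i log(1 + |x_i|/eps). Linearizing the log at the current
# iterate x^k gives weights w_i = 1/(eps + |x_i^k|), so each iteration is
# a weighted soft-thresholding of y (the classic reweighted-l1 update).
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def reweighted_l1_denoise(y, lam=0.5, eps=0.1, n_iter=20):
    x = y.copy()
    for _ in range(n_iter):
        w = 1.0 / (eps + np.abs(x))  # gradient of the concave log majorizer
        x = soft(y, lam * w)         # prox of the weighted l1 majorant
    return x

y = np.array([2.0, 0.05, -1.0, 0.01])
x = reweighted_l1_denoise(y)
# Entries below the adaptive threshold are driven exactly to zero,
# while large entries are only mildly shrunk.
```

The adaptive weights penalize small coefficients more heavily at each pass, which is why reweighting tends to sparsify more aggressively than a single ℓ1 step with a fixed threshold.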