arXiv:1009.4015v1 [astro-ph.CO] 21 Sep 2010
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 22 September 2010 (MN LaTeX style file v2.2)
The persistent cosmic web and its filamentary structure
I: Theory and implementation
T. Sousbie1,2
1Department of Physics, The University of Tokyo, Tokyo 113-0033, Japan,
2Institut d’astrophysique de Paris & UPMC (UMR 7095), 98, bis boulevard Arago, 75 014, Paris.
tsousbie@gmail.com, sousbie@utap.phys.s.u-tokyo.ac.jp
22 September 2010
ABSTRACT
We present DisPerSE, a novel approach to the coherent multi-scale identification of
all types of astrophysical structures, and in particular the filaments, in the large scale
distribution of matter in the Universe. This method and the corresponding piece of software
allow a genuinely scale-free and parameter-free identification of the voids, walls,
filaments, clusters and their configuration within the cosmic web, directly from the
discrete distribution of particles in N-body simulations or galaxies in sparse observa-
tional catalogues. To achieve that goal, the method works directly over the Delaunay
tessellation of the discrete sample and uses the DTFE density computed at each tracer
particle; no further sampling, smoothing or processing of the density field is required.
The idea is based on recent advances in distinct sub-domains of computational
topology, namely the discrete Morse theory which allows a rigorous application of
topological principles to astrophysical data sets, and the theory of persistence, which
allows us to consistently account for the intrinsic uncertainty and Poisson noise within
data sets. Practically, the user can define a given persistence level in terms of robust-
ness with respect to noise (defined as a “number of sigmas”) and the algorithm returns
the structures with the corresponding significance as sets of critical points, lines, sur-
faces and volumes corresponding to the clusters, filaments, walls and voids; filaments,
connected at cluster nodes, crawling along the edges of walls bounding the voids. From
a geometrical point of view, the method is also interesting as it allows for a robust
quantification of the topological properties of a discrete distribution in terms of Betti
numbers or Euler characteristics, without having to resort to smoothing or having to
define a particular scale.
In this paper, we introduce the necessary mathematical background and describe
the method and implementation, while the application to 3D simulated
and observed data sets is addressed in the companion paper, Sousbie, Pichon, Kawahara (2010).
Key words: Cosmology: simulations, statistics, observations, Galaxies: formation,
dynamics.
1 INTRODUCTION
The existence of an intricate network of filaments in the
large scale distribution of matter is now considered an
established fact. It was first observed by de Lapparent
et al. (1986) (see also e.g. Colless et al. 2003) and later
theorized (see e.g. Pogosyan et al. 1996; Bond et al.
1996): under-dense void regions bounded by sheet-like walls
embedded in a web-like filamentary network branching on
high-density dark matter haloes and galaxy clusters form
the so-called cosmic web (Bond et al. 1996), which spans
a wide range of scales larger than the megaparsec. Dark
matter haloes and galaxy clusters have arguably been the
most studied components, and there exists a wide range of
methods to identify them in simulations or observational
catalogues such as the classical friends-of-friends (FOF)
(Huchra & Geller 1982), HFOF and 6D minimal spanning
tree (Gottloeber 1998), SUBFIND (Springel et al. 2001),
VOBOZ (Neyrinck et al. 2005) or ADAPTAHOP (Aubert
et al. 2004; Tweed et al. 2009) (the list is not exhaustive).
Cosmological voids were first observed by Kirshner et al.
(1981) and theoretical models were later developed (see e.g.
Hoffman & Shaham 1982; Icke 1984; Bertschinger 1985).
Although they have been the subject of less attention,
there still exist a large number of references describing
their features and introducing numerical void finders such
as for instance Neyrinck (2008), Platen et al. (2007) or
Aragon-Calvo et al. (2010) (see also the references therein).
Because of the intrinsic difficulty of even defining the
concepts of wall and filament, not to mention designing
consistent identification algorithms (especially in the
case of observational data), their generic properties still
remain relatively uncertain. One can for instance refer to
Aragon-Calvo et al. (2010) for a nice review of the different
identification techniques and a study of filament
properties in dark matter N-body simulations (see also
e.g. Gay et al. 2010), and Stoica et al. (2010) or Sousbie
et al. (2008) for recent attempts at identifying filament
properties in the SDSS and 2dFGRS galaxy catalogues,
using the CANDY model (Stoica et al. 2005) and the skeleton
formalism (Sousbie et al. 2008) respectively. In this paper,
we present a general framework within which the physically
meaningful objects that are the voids, walls, filaments
and haloes are rigorously and consistently defined, and
we also detail the corresponding numerical method that
allows for their direct identification in simulated as well
as observational data sets. We focus in particular on what
is probably the most striking feature of the matter distribution
on large scales in the Universe: its filamentary structure.
During the last few years, Morse theory (e.g. Milnor
1963; Jost 2008) has been recognized as a very promising
approach to the global identification of all types of
astrophysically significant features of the large scale galaxy
distribution in the universe (see e.g. Novikov et al. 2006;
Hahn et al. 2007; Sousbie et al. 2008, 2009, 2008; Aragon-
Calvo et al. 2008; Forero-Romero et al. 2009). The main
reason for this strong interest comes from the fact that all
the salient features of the web-like pattern of galaxies have
a direct, mathematically well defined equivalent in Morse
theory. In fact, Morse theory mainly relies on the definition
of so-called ascending and descending k-manifolds, which
partition space into series of k-dimensional domains defined
by the gradient of a function (in the present case, the den-
sity field), and the network whose branches are formed by
their intersections and whose nodes are the critical points,
the so-called Morse complex (see section 2). As illustrated
in figure 1, each of those can be directly associated with an
astrophysical object of interest: an ascending 3-manifold
defines a void, an ascending 2-manifold defines a wall,
an ascending 1-manifold defines a filament, a descending
3-manifold defines a peak-patch of peak theory (Bardeen
et al. 1986), etc., and the Morse complex defines some sort
of hierarchy and a notion of neighbourhood between them
(see section 2 for more details).
Nevertheless, and as promising as it may seem, all
the efforts toward applying Morse theory to astrophysical
data sets such as galaxy catalogues have so far been
plagued by major difficulties. Those difficulties are a direct
consequence of the fact that Morse theory, although very
attractive, is fundamentally a mathematical theory defined
for idealized, well defined and properly behaved smooth
functions, which of course is not generally the case of
any physical data set resulting from actual measurements.
At least two critical issues can be identified in the case
of the large scale structure identification problem. The
first results from the presence of Poisson noise and large
observational biases in galaxy catalogues, which should be
dealt with from the start, especially when the data set is
relatively sparse as it becomes even more difficult in that
case to distinguish between noise features and the actual
features of the sampled data set. The second issue arises
from the fact that Morse theory applies to so called Morse
functions (see definition 2.2), which are sufficiently smooth
twice differentiable continuous functions (whose critical
points are non-degenerate) whereas the galaxy distribution
is discrete by nature. This incompatibility is fundamental,
as it means that the theoretical notions of Morse theory
may actually not apply to any practical data set. A more
detailed discussion of this problem is presented in appendix
A, together with an example of the consequences of neglecting
this inconsistency in the case of watershed-based methods
such as those of Sousbie et al. (2009) and Aragon-Calvo et al. (2008).
In this paper, we focus on presenting DisPerSE, a
formalism and corresponding software specifically designed
for analyzing the cosmic web and its filamentary network.
This formalism is based on Morse theory, while the afore-
mentioned incompatibilities with astrophysical data sets
are overcome by relying on relatively recent advances in
distinct sub-domains of computational topology. These
domains are discrete Morse theory (a distinct though
related theory developed by Forman; see Forman (1998b,
2002) and references therein) and persistent homology,
first introduced in Edelsbrunner et al. (2000, 2002). We
therefore start by introducing the corresponding necessary
notions of computational topology in sections 2, 3 and
4 respectively. Note that no previous knowledge in the
field of computational topology is assumed here, the goal
of those sections being mainly to introduce the required
mathematical vocabulary that we use extensively in the
following sections, and give a glimpse at how those theories
can help deepen our understanding of the structure of
the cosmic web. The reader interested in pursuing this
investigation further should refer to the aforementioned
references for a more detailed and involved introduction. In
particular, we strongly recommend reading Gyulassy
(2008) and especially Zomorodian (2009) for a very didactic
presentation of these concepts. Indeed, the particular
method and implementation presented in this paper are
inspired by the work presented in those two references.
We then proceed by showing in section 5 how it is possi-
ble, relying on the previously mentioned theories, to design
an algorithm that rigorously computes the discrete Morse
complex of a discrete density field, obtained using the DTFE
technique (Schaap & van de Weygaert 2000) from the Delaunay
tessellation of a given discretely sampled data set, such
as the distribution of galaxies in the universe. Within our
approach, the Morse complex is directly computed from the
Delaunay tessellation, which means it is scale adaptive and
parameter free. The problem of dealing with Poisson noise
and measurement errors is addressed in section 6, where we
make use of persistence theory to remove spurious topolog-
ical features from the Morse complex. Practically, the fila-
mentary network (and associated voids, walls, ...) computed
from the initial distribution is simplified by canceling pairs of
critical points according to a persistence criterion, that can
be restated in terms of significance relative to shot noise.
Figure 1. The dark matter density distribution in a $50\,h^{-1}$ Mpc large cosmological simulation (top left frame), with its ascending
3-manifolds (i.e. the voids, top right frame), ascending 2-manifolds (i.e. the walls, bottom left frame) and ascending 1-manifolds (i.e.
the filaments, bottom right frame). The manifolds were computed using the method introduced in Sousbie et al. (2009).
Finally, in section 7, we address technical questions such as
dealing with boundary conditions, smoothing the identified
voids, walls and filaments and important implementation
problems before concluding in section 8.
Importantly, let us emphasize that within this framework,
the mathematical theories that we use are fundamentally
discrete and readily apply to the measured raw data;
the only additional but critical step consists in defining
heuristically a consistent labeling of the segments, triangles
and tetrahedra of the Delaunay tessellation with regard
to the DTFE densities computed at the sampling points (see
section 5.1). This guarantees that all the well-known and extensively
studied mathematical properties of the Morse complex
are ensured by construction at the mesh level, and that
the corresponding cosmological structures therefore correspond
to well-defined mathematical objects with known
mathematical properties. It also provides a consistent way
of reconnecting the corresponding network after the removal
of insignificant (non-persistent) pairs of critical points.
Note that a reference is given on the last two pages,
in which most of the mathematical terminology introduced in sections
2, 3 and 4 is defined in relatively simple terms. As we
only aim here to introduce the necessary mathematical notions
and give a detailed description of the computation
pipeline, extensively illustrating each step, the application
to actual data sets is presented in a less technical companion
paper, Sousbie, Pichon, Kawahara (2010). In that paper, we
show the potential of this approach by applying it to typical
cosmological data sets: a large-scale dark matter cosmological
N-body simulation and the 7th data release (DR7) of the
SDSS galaxy catalogue (Abazajian et al. 2009).
Figure 2. A 2D density field with its gradient (top left), its descending 2-manifolds (top right), its ascending 2-manifolds (bottom left),
and its Morse-Smale complex (bottom right, see the black and white network). The maxima/saddle points/minima are represented as
red/green/blue circled disks respectively and three integral lines are drawn in pink on the top left frame. On the central left part of the
bottom right frame, an arc (i.e. a 1-cell) is represented in yellow (intersection of a green ascending 1-manifold and a blue descending
2-manifold) and a quad (i.e. a 2-cell) in purple (intersection of a red descending 2-manifold and a blue ascending 2-manifold).
2 MORSE THEORY FOR SMOOTH
MANIFOLDS
Mathematically speaking, Morse theory is concerned with
smooth scalar functions (say height of a mountain, or the
temperature in a room) defined over generic manifolds. In
the present case we are mainly interested in density fields:
real-valued functions defined over d-dimensional Euclidean
spaces $\mathbb{R}^d$.$^{1}$ We will therefore restrict the present discussion
to such geometries for the sake of simplicity.
$^{1}$ This is actually not generally true. Numerical simulations for
instance often use periodic boundary conditions, which amounts
to defining the density on a torus $T^d \subset \mathbb{R}^d$.
Morse theory
provides a way to capture the intricate relation between
the geometrical and topological properties of a function.
What one means by geometrical property is basically any
property unaffected by rigid motions such as translations
or rotations. If h is the altitude function of a mountain
landscape for instance, the altitude of the highest peak or
its total surface are geometrical properties. Topology on
the other hand captures how points are connected to each
other with notions such as that of neighborhood. Topological
properties are invariant under smooth continuous
transformations, which is why topology is sometimes described as rubber
geometry. Sticking to the landscape analogy and defining
a mountain as the set of points that can be reached from
its summit by going down the slope (i.e. following the
gradient of h), then the mountain itself is in some sense
a topological property of the altitude function. Indeed, in
winter, when covered with snow, or during summer, after
the snow melted, the altitude map slightly changes, but
the underlying mountain can still be easily identified as
the same mountain. For the same reasons, a crest linking
two mountains or a valley for instance are also topological
properties of the landscape. When it comes to characteriz-
ing a function such as the matter density ρ on large scales
in the universe, both topological and geometrical properties
are interesting. While topological properties such as the
number of galaxy clusters or dark matter haloes in a given
volume are robust with respect to changes in the precise
measured value of ρ, geometrical properties such as the
density profile and precise location of a halo or a filament
are more specific and characterize better the properties of ρ.
The relation between geometry and topology is intri-
cate, and while modifying topology certainly requires a mod-
ification of geometry, the reverse is not generally true. For
instance, the shape of a mountain may only slightly change
with season, but more drastic events such as the explosion of
a volcano (i.e. a drastic change in geometry) could actually
erase it. Morse theory captures this relation for a generic
function f by relying on the gradient $\nabla_x f(x) = \mathrm{d}f/\mathrm{d}x\,(x)$
and its flow. The gradient defines a preferential direction at
every point (the direction of steepest ascent) except where it
vanishes (i.e. where $\nabla_x f = 0$). Those particular points are
called critical points and can be classified according to the
signs of the eigenvalues of the Hessian matrix, the $d \times d$ matrix of the second
derivatives $H_f(x) = \mathrm{d}^2 f/\mathrm{d}x_i \mathrm{d}x_j\,(x)$:
Definition 2.1. (critical point of order k) Let f be a
function defined over $\mathbb{R}^d$ and P a point with coordinate
$p \in \mathbb{R}^d$. Then P is a critical point of f if $\nabla_x f(p) = 0$.
It is said to be of order k if the Hessian matrix $H_f(p)$ has
exactly k negative eigenvalues.
Intuitively, in 2D, the top of a mountain is a maximum (order
2), a pass is a saddle point (order 1) and the bottom of
a valley is a minimum (order 0). The top left frame of figure 2
shows the gradient and critical points of a function defined
over $\mathbb{R}^2$. In this picture, the blue, green and red circles stand
for the critical points of order 0 (minima), 1 (saddle points)
and 2 (maxima) respectively. Note that according to definition
2.1, the order of a critical point is defined by the signs of
the eigenvalues of the Hessian, which must therefore all be
non-zero. This condition is essential to Morse theory: a function
f to which Morse theory applies must necessarily satisfy
this constraint. Functions that do so are called Morse
functions:
Definition 2.2. (Morse function) A Morse function is a
smooth function whose critical points are non-degenerate.
This means that for any P such that $\nabla_x f(p) = 0$,
$\det H_f(p) \neq 0$.
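As a minimal illustration of definitions 2.1 and 2.2 in 2D, the following Python sketch counts the negative eigenvalues of a symmetric 2 × 2 Hessian and flags degenerate critical points. The function names, the tolerance `tol` and the example functions quoted in the comments are illustrative assumptions and are not part of DisPerSE.

```python
import math

def hessian_eigenvalues_2d(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], in ascending order."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return (mean - radius, mean + radius)

def critical_point_order(hessian, tol=1e-12):
    """Order k of a critical point: the number of negative Hessian eigenvalues.

    Returns None when an eigenvalue vanishes (within the assumed tolerance),
    i.e. when det H = 0 and the critical point is degenerate.
    """
    (a, b), (_, c) = hessian
    eigenvalues = hessian_eigenvalues_2d(a, b, c)
    if any(abs(ev) <= tol for ev in eigenvalues):
        return None
    return sum(1 for ev in eigenvalues if ev < 0)

# Hessians at the origin of four simple test functions (assumed examples).
order_minimum = critical_point_order(((2.0, 0.0), (0.0, 2.0)))     # f = x^2 + y^2
order_saddle = critical_point_order(((2.0, 0.0), (0.0, -2.0)))     # f = x^2 - y^2
order_maximum = critical_point_order(((-2.0, 0.0), (0.0, -2.0)))   # f = -(x^2 + y^2)
order_degenerate = critical_point_order(((0.0, 0.0), (0.0, 2.0)))  # f = x^3 + y^2
```

The last case is a critical point that definition 2.2 excludes: the Hessian of x^3 + y^2 at the origin has a zero eigenvalue, so this function is not a Morse function.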
We will assume from now on that f is a Morse function.$^{2}$
At the location of any non-critical point, the gradient
indicates a preferred direction, and one can therefore define
specific lines, the integral lines, by following the gradient
flow:
Definition 2.3. (Integral line or field line) An integral
line (also called field line) is a curve $L(t) \in \mathbb{R}^d$ such
that
$$\frac{\mathrm{d}L(t)}{\mathrm{d}t} = \nabla_x f. \qquad (1)$$
Its origin and destination are defined as $\lim_{t\to-\infty} L(t)$ and
$\lim_{t\to+\infty} L(t)$ respectively.
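Equation (1) can be integrated numerically to approximate the origin and destination of an integral line. The sketch below uses a plain forward Euler scheme on an assumed test field f(x, y) = sin x + sin y, whose gradient is known analytically. The step size, stopping tolerance and function names are illustrative choices, and this is not the discrete construction used later in the paper.

```python
import math

def gradient(p):
    """Analytic gradient of the assumed test field f(x, y) = sin(x) + sin(y)."""
    x, y = p
    return (math.cos(x), math.cos(y))

def follow_gradient(p, direction, step=0.01, tol=1e-9, max_steps=100_000):
    """Euler integration of dL/dt = direction * grad f, starting from p.

    direction = +1 follows the gradient (towards the destination),
    direction = -1 follows minus the gradient (towards the origin).
    Stops once the gradient norm drops below tol, i.e. near a critical point.
    """
    x, y = p
    for _ in range(max_steps):
        gx, gy = gradient((x, y))
        if math.hypot(gx, gy) < tol:
            break
        x += direction * step * gx
        y += direction * step * gy
    return (x, y)

def integral_line_endpoints(p):
    """Approximate (origin, destination) of the integral line through p."""
    return follow_gradient(p, -1), follow_gradient(p, +1)

origin, destination = integral_line_endpoints((0.3, -0.2))
```

For this base point the line runs from the minimum at (−π/2, −π/2) to the maximum at (π/2, π/2), so both ends are critical points of the test field.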
The pink curves in the top left frame of figure 2 show examples
of integral lines: the lower order critical point at their extremity
is their origin and the higher one their destination.
The integral lines of a Morse function actually always have
critical points as origin and destination. Let us consider the
case of an integral line passing through a base point P. One
can show that such an integral line obeys certain properties:
Property 2.3.1. (Integral lines of a Morse function)
The integral line of a Morse function f defined over $\mathbb{R}^d$ and
passing through a given point P is obtained by following the
gradient and minus the gradient from P. Integral lines obey the
following properties:
• The origin and destination of an integral line are critical
points.
• Two integral lines passing through points P and P′ respectively
may only be identical or fully distinct: two integral
lines cannot intersect (they can share their origin and/or
destination though).
• The set of all the integral lines covers all of $\mathbb{R}^d$ and each
point P of space belongs to exactly one integral line. It may
be the origin/destination of several integral lines if it is a
critical point though.
• An integral line whose base point is a critical point P is
reduced to that point P.
The combination of the first and second properties is particularly
interesting, as it allows classifying each point of
space according to the origin or destination of its (unique)
integral line. Such a classification defines distinct regions of
space called ascending and descending manifolds:
Definition 2.4. (Ascending/Descending n-manifold)
Let P be a critical point of order k of the Morse function
f defined over $\mathbb{R}^d$. The ascending (d−k)-manifold defines
a region of space with dimension (d−k): the set of points
reached by integral lines with origin P. The descending
k-manifold defines a region of space with dimension k, the
set of points reached by integral lines with destination P.
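To make definition 2.4 concrete, the sketch below labels every pixel of a regular 2D grid by the local maximum reached when repeatedly stepping to the highest neighbouring pixel. Each label then approximates one descending 2-manifold, and applying the same procedure to −f would give the ascending 2-manifolds. The test field f(x, y) = sin x + sin y, the grid size and the function names are assumptions made for illustration. This crude steepest-ascent rule on sampled values is not the method of this paper, and it is subject to the kind of inconsistencies discussed in appendix A.

```python
import math

# Assumed test field f(x, y) = sin(x) + sin(y), sampled on a regular grid
# covering [0, 3*pi] x [0, 3*pi] with spacing pi/10.
N = 31
F = [[math.sin(math.pi * i / 10) + math.sin(math.pi * j / 10)
      for j in range(N)] for i in range(N)]

def highest_neighbour(i, j):
    """Highest of the 8 grid neighbours of (i, j), or (i, j) itself
    when no neighbour has a strictly larger value."""
    best = (i, j)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if 0 <= ni < N and 0 <= nj < N and F[ni][nj] > F[best[0]][best[1]]:
                best = (ni, nj)
    return best

def destination(i, j):
    """Follow the discrete steepest ascent from (i, j) up to a local maximum."""
    while True:
        nxt = highest_neighbour(i, j)
        if nxt == (i, j):
            return (i, j)
        i, j = nxt

# Each pixel is labelled by the maximum at the end of its ascending path.
labels = {(i, j): destination(i, j) for i in range(N) for j in range(N)}
maxima = sorted(set(labels.values()))
```

On this grid the four maxima of the test field sit at pixel indices 5 and 25 along each axis, and every pixel is assigned to exactly one of them, mirroring the partition of space by descending manifolds.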
There exist exactly d + 1 different classes of ascending and descending
manifolds, classified according to the order k = 0, ..., d of the
critical point at their origin or destination. Note that an
ascending or descending n-manifold of a Morse function always
spans a domain of dimension n (i.e. a 0-manifold is a
(critical) point, a 1-manifold a line, a 2-manifold a surface,
$^{2}$ This is a strong requirement in practice, as shown in appendix A.