
arXiv:0903.3411v2 [astro-ph.CO] 19 Jul 2010

The Astrophysical Journal, 697:1842-1860, 2009 June 1


AN OPTICAL GROUP CATALOGUE TO Z = 1 FROM THE ZCOSMOS 10K SAMPLE1

C. Knobel2, S. J. Lilly2, A. Iovino7, C. Porciani2,17, K. Kovač2, O. Cucciati7, A. Finoguenov5,
M. G. Kitzbichler19, C. M. Carollo2, T. Contini6, J.-P. Kneib8, O. Le Fèvre8, V. Mainieri4, A. Renzini13,

M. Scodeggio9, G. Zamorani5, S. Bardelli5, M. Bolzonella5, A. Bongiorno3, K. Caputi2, G. Coppa5, S. de la

Torre8, L. de Ravel8, P. Franzetti9, B. Garilli9, P. Kampczyk2, F. Lamareille6, J.-F. Le Borgne6, V. Le Brun8,

C. Maier2, M. Mignoli5, R. Pello6, Y. Peng2, E. Perez Montero6, E. Ricciardelli13, J. D. Silverman2,

M. Tanaka4, L. Tasca8, L. Tresse8, D. Vergani5, E. Zucca5, U. Abbas8,15, D. Bottini9, A. Cappi5, P. Cassata8,

A. Cimatti10, M. Fumana9, L. Guzzo7, A. M. Koekemoer20, A. Leauthaud18, D. Maccagni9, C. Marinoni11,

H. J. McCracken12, P. Memeo9, B. Meneux3,16, P. Oesch2, L. Pozzetti5, R. Scaramella14


ABSTRACT

We present a galaxy group catalogue spanning the redshift range 0.1 ≲ z ≲ 1 in the ∼ 1.7 deg²

COSMOS field, based on the first ∼ 10,000 zCOSMOS spectra. The performance of both the Friends-

of-Friends (FOF) and Voronoi-Delaunay-Method (VDM) approaches to group identification has been

extensively explored and compared using realistic mock catalogues. We find that the performance

improves substantially if groups are found by progressively optimizing the group-finding parameters

for successively smaller groups, and that the highest fidelity catalogue, in terms of completeness and

purity, is obtained by combining the independently created FOF and VDM catalogues. The final

completeness and purity of this catalogue, both in terms of the groups and of individual members,

compare favorably with recent results in the literature. The current group catalogue contains 102

groups with N ≥ 5 spectroscopically confirmed members, with a further ∼ 700 groups with 2 ≤ N ≤ 4.

Most of the groups can be assigned a velocity dispersion and a dark-matter mass derived from the

mock catalogues, with quantifiable uncertainties. The fraction of zCOSMOS galaxies in groups is

about 25% at low redshift and decreases toward ∼ 15% at z ∼ 0.8. The zCOSMOS group catalogue

is broadly consistent with that expected from the semi-analytic evolution model underlying the mock

catalogues. Not least, we show that the number density of groups with a given intrinsic richness

increases from redshift z ∼ 0.8 to the present, consistent with the hierarchical growth of structure.

Subject headings: catalogs — galaxies: clusters: general — galaxies: high-redshift — methods: data

analysis

1European Southern Observatory (ESO), Large Program 175.A-0839

2Institute for Astronomy, ETH Zurich, 8093 Zurich, Switzerland

3Max-Planck-Institut für extraterrestrische Physik, D-84571 Garching, Germany

4European Southern Observatory, Karl-Schwarzschild-Strasse 2, Garching, D-85748, Germany

5INAF Osservatorio Astronomico di Bologna, via Ranzani 1, I-40127, Bologna, Italy

6Laboratoire d’Astrophysique de Toulouse-Tarbes, Université de Toulouse, CNRS, 14 avenue Edouard Belin, F-31400 Toulouse, France

7INAF Osservatorio Astronomico di Brera, Milan, Italy

8Laboratoire d’Astrophysique de Marseille, Marseille, France

9INAF - IASF Milano, Milan, Italy

10Dipartimento di Astronomia, Università di Bologna, via Ranzani 1, I-40127, Bologna, Italy

11Centre de Physique Theorique, Marseille, Marseille, France

12Institut d’Astrophysique de Paris, UMR 7095 CNRS, Université Pierre et Marie Curie, 98 bis Boulevard Arago, F-75014 Paris, France

13Dipartimento di Astronomia, Università di Padova, Padova, Italy

14INAF, Osservatorio di Roma, Monteporzio Catone (RM), Italy

15ELSA Marie Curie Postdoctoral Fellow, INAF - Osservatorio Astronomico di Torino, 10025 Pino Torinese, Italy

16Universitäts-Sternwarte, Scheinerstrasse 1, D-81679 Muenchen, Germany

17Argelander-Institut für Astronomie, Auf dem Hügel 71, D-53121 Bonn, Germany

18LBNL & BCCP, University of California, Berkeley, CA, 94720

1. INTRODUCTION

Groups and clusters of galaxies are the most massive

virialized structures in the Universe. They are impor-

tant for several reasons. First, groups and clusters define

the environment in which most galaxies actually reside

and in which we may expect many of the processes that determine
the evolution of galaxies to operate (e.g., Voit 2005).

Studying the properties of galaxies in groups at different

redshifts is a direct probe of how the local environment

affects the formation and evolution of galaxies with cos-

mic time. Second, characterization of galaxies in groups

provides information about the galactic content of dark

matter (DM) halos. This yields statistical quantities such

as the halo occupation distribution (e.g., Collister & La-

hav 2005) or the conditional luminosity function (e.g.,

Yang et al. 2008), which themselves yield useful constraints
on various physical processes that govern the

formation and evolution of galaxies. Finally, the num-

ber density and clustering of groups strongly depend on

cosmological parameters and thus are a potentially sen-

sitive probe of the underlying cosmological model (e.g.,

Bahcall et al. 2003; Gladders et al. 2007; Rozo et al.

2009).

19Max Planck Institute of Astrophysics, Karl-Schwarzschild

Str. 1, PO Box 1317, D-85748 Garching, Germany

20Space Telescope Science Institute, 3700 San Martin Drive,

Baltimore, MD 21218


From an observational point of view, there are many

ways to identify a group21. In the current ΛCDM
framework it is natural to associate groups with DM halos,
and this is the definition adopted by most authors.

Therefore, throughout this paper we refer to a “group”

as a set of galaxies occupying the same DM halo22.

There are many different observational techniques to

identify groups in the local and distant Universe in use

today. Groups can be detected in the optical/near-

infrared (NIR) (e.g., Gal 2006), by diffuse X-ray emis-

sion (e.g., Pierre et al. 2006; Finoguenov et al. 2007),

by the Sunyaev-Zel’dovich effect in the cosmic microwave

background (e.g., Carlstrom et al. 2002; Voit 2005), by

particular wide-angle tailed (WAT) galaxies (e.g., Blan-

ton et al. 2003), and through cosmic shear due to weak

gravitational lensing (e.g., Feroz et al. 2008). Each of

these methods has its own advantages and problems (see

e.g. Voit 2005; Johnston et al. 2007 § 1), and the choice

of a particular method might depend on the desired ap-

plication.

If one aims to study the galaxy population in groups,

searching for groups directly in large optical galaxy surveys
is relatively straightforward and efficient. There
are many different methods discussed in the literature

to identify groups in an optical survey (for a review see

e.g. Gerke et al. 2005 § 4.1; Gal 2006). In essence, these

aim to identify overdensities in redshift space, luminosity

and/or color space, depending on the availability of red-

shift information and/or photometry. Whatever method

is used, it should conform to the following general rules

(see e.g. Gal 2006): First, it should be based on an ob-

jective, automated algorithm to minimize human biases.

Second, the algorithm should impose minimal constraints

on the physical properties of the clusters to avoid selec-

tion biases. The latter point is especially important if

one aims to investigate the evolution of the galaxy pop-

ulation in groups. For instance, it has been shown that

the addition of color information provides a powerful tool

to find clusters in the Universe. There are methods such

as the Cluster Red Sequence (CRS) method (Gladders &

Yee 2000) or the maxBCG algorithm (Hansen et al. 2005,

Koester et al. 2007a) which are based on the fact that

the most luminous galaxies in clusters inhabit a tight se-

quence in the color-magnitude diagram called the “red

sequence”. Using the red sequence information, these

methods have proved to be very successful in finding clus-

ters in the local (Koester et al. 2007b) and the distant

Universe up to redshift z ∼ 1 (Gladders & Yee 2005). A

further advantage of these methods is that no redshift information
is needed. However, clearly the requirement of
a substantial population of galaxies inhabiting
the red sequence may impose a pre-selection that


21In this paper, we will not distinguish between “groups” and

“clusters”, since from an optical/near-infrared point of view the

difference between groups and clusters is rather a gradual, quanti-

tative one, and not a qualitative one. So when we talk of “groups”,

we do not make any assumption about the mass or other properties

of these systems.

22Throughout this paper, a DM halo is operationally defined as

a friends-of-friends group of DM particles with a linking length of

b = 0.2, since this is the definition adopted in the Millennium DM

N-body Simulation (Springel et al. 2005) used for our analysis.

So DM halos correspond to a mean overdensity of roughly 200.

Alternative practical definitions or higher overdensities would then,

in principle, correspond to different group catalogues.

makes evolutionary studies more difficult.

The large number of accurate spectroscopic redshifts

available for the large numbers of galaxies from the

zCOSMOS redshift survey in the COSMOS field (Scov-

ille et al. 2007) enables us to use the most fundamental

signature of groups – overdensities in redshift space –

without recourse to additional color information. Nev-

ertheless, even with precise spectroscopic redshifts, to

identify groups in redshift space one has to deal with certain
difficulties: First, the peculiar velocities of galaxies
in groups elongate groups in the redshift dimension

(the “fingers-of-god” effect). This effectively decreases

the galaxy density within groups in redshift space, and

thus makes them harder to detect, and may cause group

members to intermingle with other nearby field galaxies

or even to merge into another nearby group. It is al-

most impossible to separate interlopers from real group

galaxies if they appear within the group in redshift space.

Second, in magnitude limited surveys such as zCOSMOS,
the mean density of galaxies decreases with redshift.
So any algorithm based on the distance between
neighbouring galaxies has to take into account the
dependence of the mean galaxy separation on redshift.

Third, the observational selection of galaxies (e.g.
inhomogeneous sampling rate in the spectroscopic survey)
frequently produces additional complications.

To cope with these difficulties, some forms of the tra-

ditional Friends-of-Friends (FOF) algorithm (Huchra &

Geller 1982) are still widely used (e.g. Eke et al. 2004;

Berlind et al. 2006), although FOF has some well known

shortcomings (e.g. Nolthenius & White 1987; Frederic

1995). For instance, the FOF algorithm depends sensi-

tively on the value of the linking length, and can merge

neighbouring groups into single big groups, or fragment

large groups into smaller pieces.

Until now, there have not been many spectroscopic red-

shift surveys searching for groups at high redshift. Carl-

berg et al. (2001) describe a group catalogue obtained

from CNOC2 in the redshift range 0.1 ≲ z ≲ 0.5. For the
redshift range z ≳ 0.5 only the DEEP2 redshift survey
(Davis et al. 2003), covering a total area of ∼ 3 deg² and
redshift range of 0.7 ≲ z ≲ 1.4, has sufficient size and

sampling rate to identify a large number of groups in

redshift space. To achieve this, Gerke et al. (2005, 2007)

have adapted the Voronoi-Delaunay-Method (VDM) of

Marinoni et al. (2002), which is claimed to compensate

for some of the shortcomings of the traditional FOF al-

gorithm.

The aim of this paper is to create a group catalogue

from the ∼ 10,000 spectra in the zCOSMOS 10k sam-

ple (S. J. Lilly et al. 2009, in preparation) to enable the

study of the group population over the redshift range

0.1 ≲ z ≲ 1. We will compare the performance of both

the FOF and VDM algorithms on the 10k sample, and

try to optimize the group-finding methods by the intro-

duction of a “multi-run procedure”. In § 2 we describe

the 10k sample and corresponding realistic mock cata-

logues that were generated to test the group finding al-

gorithms. § 3 gives a detailed description of our adopted

group-finding method, and discusses the performance of

the two groupfinders. In § 4 we present the 10k group

catalogue, and describe how basic group properties are

estimated. § 5 compares the 10k group catalogue to the


mocks and to 2dFGRS. Finally, § 6 summarizes the paper.
Where necessary, a concordance cosmology with
H0 = 73 km s⁻¹ Mpc⁻¹, Ωm = 0.25, and ΩΛ = 0.75 is

adopted. All magnitudes are quoted in the AB system.

2. DATA

2.1. zCOSMOS survey

zCOSMOS is a spectroscopic redshift survey (Lilly et al. 2007, 2009 in preparation) covering the ∼ 1.7 deg² COSMOS field (Scoville et al. 2007). The redshifts are measured with the VLT using the VIMOS spectrograph (Le Fèvre et al. 2003). The zCOSMOS survey is split into two parts: The first part, “zCOSMOS-bright”, is a pure magnitude selected survey with 15 ≤ IAB ≤ 22.5, where IAB is the magnitude in the F814W HST/ACS band (Koekemoer et al. 2007). This magnitude limit will yield a survey of approximately 20,000 galaxies in the redshift range 0.1 ≲ z ≲ 1.2. Repeated observations of some zCOSMOS galaxies have shown that the redshift error is approximately Gaussian distributed with a standard deviation of σv ≃ 100 km s⁻¹. The second part of zCOSMOS, “zCOSMOS-deep”, aims at observing about 10,000 galaxies in the redshift range 1.5 ≲ z ≲ 3.0 selected through well-defined color criteria.

To date, about half of zCOSMOS-bright has been

completed yielding about 10,500 spectra (S. J. Lilly et

al. 2009, in preparation). Among these redshifts about

15% are classified as unreliable. For the group catalogue,

we have accepted all objects with the confidence classes

4 and 3, 9.5, 9.3, 2.5, 9.4, 2.4, 1.5, and 1.4 (see S. J. Lilly

et al. 2009, in preparation). The redshifts with these

confidence classes constitute 86% of the whole 10k sam-

ple and have a spectroscopic confirmation rate of 98.6%

as found by duplicate observations. After removing the

stars (∼ 5%), we finally end up with a sample of 8417

galaxies with usable redshifts (“10k sample”).

At the current stage of the survey, the spatial spec-

troscopic sampling rate of galaxies across the COSMOS

field is very inhomogeneous, and there are clearly some

linear features such as stripes visible (see Figure 5 of S.

J. Lilly et al. 2009, in preparation). Since this will affect

the number of detectable groups in this sample in a non-

trivial way, we have created mock catalogues that have

the same kind of inhomogeneous coverage. To create the

group catalogue and generate the statistics describing the

fidelity of the catalogue, the groupfinders were applied to

the whole field spanning the range 149.47° ≲ α ≲ 150.77°
and 1.62° ≲ δ ≲ 2.83°. However, for some applications
discussed below we restrict ourselves to the “central region”
of the COSMOS field defined by α = 150 ± 0.4° and
δ = 2.15 ± 0.4°, since this region is relatively complete

compared to the total field. In the central region only about 25% of the
area has a completeness lower than 30%, while for the whole
field this fraction is more than 50%.

The number of galaxies per unit redshift dNgal/dz is

shown in Figure 1. There are two striking density peaks

at redshifts z ∼ 0.3 and ∼ 0.7.

2.2. zCOSMOS 10k mocks

The mocks we use to calibrate and test our groupfind-

ers are adapted from the COSMOS mock lightcones

(Kitzbichler & White 2007). These light cones are based

on the Millennium DM N-body simulation (Springel et al.


[Figure 1: dNgal/dz × 10⁻⁴ versus z for the 10k sample and the mocks.]

Fig. 1.— Number of galaxies per redshift dNgal/dz. The his-

togram shows the dNgal/dz of the 10k sample used in this paper.

Two large over-densities at z ∼ 0.3 and z ∼ 0.7 are clearly visible.

The dashed line shows the mean dNgal/dz of the 24 mocks and the

shaded area their scatter. As noted below, the magnitude limit

in the mocks has been adapted such that the mean dNgal/dz of
the mocks matches the smoothed dNgal/dz of the HST/ACS COSMOS

catalogue. The shaded area shows that, although COSMOS covers

an unprecedentedly large area for a survey of this depth, cosmic

variance is still an important issue.

2005) which was run with the cosmological parameters

Ωm = 0.25, ΩΛ = 0.75, Ωb = 0.045, h = 0.73, n = 1, and
σ8 = 0.9. The semi-analytic recipe for populating the
volume with galaxies in the lightcones is that of Croton
et al. (2006) as updated by De Lucia & Blaizot (2006).

There are 24 independent mocks, each covering an area

of 1.4 deg × 1.4 deg with an apparent magnitude limit of
r ≤ 26 and galaxies in the redshift range z ≲ 7.

These lightcones were adjusted to resemble the real 10k

sample as much as possible. First, a magnitude cut of

15 ≤ i ≤ 22.5 was applied. However, the mean number of

galaxies in the resulting mocks was about 5−10% higher

than in the zCOSMOS target catalogue (i.e. a 1 − 2σ

effect). To make the mocks more closely resemble the

real data, we adjusted the magnitude cut in a redshift

dependent way so that the mean number of galaxies per

unit redshift $d\bar{N}_{\rm gal}(z)/dz$ in the mocks was equal to the
smoothed $dN_{\rm gal}(z)/dz$ of the zCOSMOS input target catalogue
(see Figure 1). Then, the spatial sampling completeness
and the redshift success rate were simulated by

removing galaxies from the mocks according to the prob-

ability that a galaxy with a certain position and redshift

would have been observed in the 10k sample. It should

be noted that zCOSMOS is a slit-based survey. However,
the bias against close neighbours, already small
because of the multiple passes (up to 8 in the central region)
across the field, is further mitigated by galaxies
appearing serendipitously in slits targeted at other galaxies
(see P. Kampczyk et al. 2009, in preparation). The

small variation in sampling rate on these small scales,

which is anyway well below the mean intergalactic sep-

aration in 3-d space, has been ignored in constructing

the mocks. To further enhance the conformity with the

10k sample, the redshift of each galaxy was perturbed

by an amount drawn from a Gaussian distribution with

standard deviation σz = 100 km s⁻¹ × (1 + z)/c.
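The redshift perturbation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `redshift_sigma` and `perturb_redshifts` are hypothetical, the input redshifts are made-up values rather than mock data, and only the 100 km s⁻¹ velocity error is taken from the text.

```python
import random

C_KMS = 299792.458  # speed of light in km/s

def redshift_sigma(z, sigma_v=100.0):
    """Redshift error corresponding to a velocity error sigma_v (km/s) at redshift z."""
    return sigma_v * (1.0 + z) / C_KMS

def perturb_redshifts(redshifts, sigma_v=100.0, seed=None):
    """Return a new list in which each redshift has a Gaussian error of width sigma_z."""
    rng = random.Random(seed)
    return [z + rng.gauss(0.0, redshift_sigma(z, sigma_v)) for z in redshifts]

# Illustrative redshifts only (not survey or mock data).
mock_z = [0.12, 0.35, 0.70, 0.95]
print(perturb_redshifts(mock_z, seed=1))
```

At z = 0.5 the assumed 100 km s⁻¹ error corresponds to σz ≈ 5 × 10⁻⁴, small compared with the redshift extent of the linking volumes used later in the paper.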


[Figure 2: upper panels, log(n × Mpc³) versus log(M / Msol); lower panels, fraction versus log(M / Msol). Left panels: 0.2 ≤ z ≤ 0.5; right panels: 0.5 ≤ z ≤ 0.8. Curves: all, mag. lim., 10k.]

Fig. 2.— Fraction of detectable groups in the “ideal” mocks as

a function of DM halo mass. The left panel shows the redshift

range 0.2 ≤ z ≤ 0.5, and the right panel shows 0.5 ≤ z ≤ 0.8. The

upper panels show the number density of halos in the 10k mocks

(blue), in a purely 22.5 magnitude limited sample (red), and in

total (black), and the lower panels show the fraction of halos in

the 10k sample (solid line) and in the magnitude limited sample

(dashed line) with respect to the total number of halos at a given

mass. The shaded regions show the upper and lower quartiles of

the fractions among the 24 mocks. For both redshift ranges, the

10k sample was restricted to the central region of the survey.

2.3. Detectability of groups

Since, according to our definition, a group is the set of

galaxies occupying the same DM halo, we can only hope

to detect those groups which host at least two galaxies

in the 10k sample. The collection of all these “detectable

groups” constitutes the ideal (or “real”) group catalogue.

This is the best catalogue that can be produced with the

10k sample, and this is the catalogue we aim to reconstruct
with our groupfinder. Any DM halo hosting only

a single zCOSMOS 10k galaxy is not detectable and the

corresponding galaxies will be termed “field galaxies”.

For this reason, even the ideal group catalogue that is

detectable with the 10k sample will not be a complete

rendition of the true underlying group population in the

COSMOS volume. Nevertheless, whenever we discuss

the statistical properties of a group catalogue, such as

completeness or purity (see § 3.2), these will be measured

relative to this “ideal” group catalogue, rather than the

underlying population.
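As a minimal sketch of this definition, assuming each mock galaxy that survives the 10k selection carries the identifier of its host DM halo, the “ideal” catalogue and the field galaxies can be separated as follows. The function name `split_detectable` and the halo identifiers are hypothetical and serve only as illustration.

```python
from collections import Counter

def split_detectable(halo_ids):
    """Split observed galaxies into detectable groups and field galaxies.

    halo_ids[k] is the host-halo identifier of galaxy k in the observed sample.
    Returns (groups, field): `groups` maps a halo id to the indices of its
    member galaxies for halos hosting at least two observed galaxies, and
    `field` lists the galaxies that are alone in their halo.
    """
    counts = Counter(halo_ids)
    groups, field = {}, []
    for gal, halo in enumerate(halo_ids):
        if counts[halo] >= 2:
            groups.setdefault(halo, []).append(gal)
        else:
            field.append(gal)
    return groups, field

# Illustrative halo identifiers for seven observed galaxies.
groups, field = split_detectable([7, 7, 3, 9, 9, 9, 4])
print(groups, field)
```

In this toy input, halos 7 and 9 are detectable groups, while the galaxies in halos 3 and 4 count as field galaxies even though their halos may host further, unobserved members.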

In a flux limited survey such as zCOSMOS, the popu-

lation of galaxies that is observed changes with redshift,

and the same will also therefore be true of the groups.

For instance, for a group to be detectable at high red-

shift, it has to host at least two rather bright galaxies.

Figure 2 shows the fraction of detectable groups in the

mocks (i.e. in the “ideal” catalogue in the previous paragraph) as a function of the halo mass in the two redshift bins 0.2 ≤ z ≤ 0.5 and 0.5 ≤ z ≤ 0.8. While in the lower redshift bin the sample should be complete down to ∼ 5 × 10¹³ M⊙, this limit increases in the higher redshift bin to ∼ 2 × 10¹⁴ M⊙. However, in both bins the bulk of the detectable halos are in the mass range 10¹² M⊙ ≲ M ≲ 5 × 10¹³ M⊙.

3. GROUP-FINDING METHOD

In this section the different group-finding methods and

the statistical properties of the resulting group cata-

logues are discussed. We have applied both the FOF and

VDM algorithms to our sample. In this way we are able

to compare the resulting group catalogues obtained by

the different methods and to investigate the robustness

of the results.

3.1. The FOF and VDM algorithms

3.1.1. FOF

The FOF algorithm is adopted from Eke et al. (2004).

It has three free parameters: the linking length b, the

maximum perpendicular linking length in physical coor-

dinates Lmax, and the ratio between the linking length

along and perpendicular to the line of sight R. The exact

meaning of these parameters becomes clear by regarding

the linking criteria: Consider two galaxies i and j with

comoving distances $d_i$ and $d_j$ respectively. These two galaxies are assigned to the same group if their angular separation $\theta_{ij}$ satisfies

\begin{equation}
\theta_{ij} \le \frac{1}{2}\left(\frac{l_{\perp,i}}{d_i} + \frac{l_{\perp,j}}{d_j}\right) \tag{1}
\end{equation}

and, simultaneously, the difference between their distances satisfies

\begin{equation}
|d_i - d_j| \le \frac{l_{\parallel,i} + l_{\parallel,j}}{2}. \tag{2}
\end{equation}

$l_\perp$ and $l_\parallel$ are the comoving linking lengths perpendicular and parallel to the line of sight defined by

\begin{align}
l_\perp &= \min\left[L_{\rm max}(1+z),\ \frac{b}{\bar{n}^{1/3}}\right] \tag{3} \\
l_\parallel &= R\, l_\perp, \tag{4}
\end{align}

where $\bar{n}$ is the mean density of galaxies. Since the sample of galaxies is magnitude limited, the mean density of galaxies decreases with redshift leading to a steady increase of the mean inter-galaxy separation with redshift. Eke et al. (2004) argued that scaling both $l_\perp$ and $l_\parallel$ with $\bar{n}^{-1/3}$ will compensate for the magnitude limit and lead to groups of similar shape and overdensity throughout the survey. The free parameter $L_{\rm max}$ has been introduced to avoid unphysically large values for $l_\perp$ at high redshifts where the galaxy distribution is sampled very sparsely. Since $L_{\rm max}$ is measured in physical coordinates, $L_{\rm max}(1+z)$ is the maximal comoving linking length perpendicular to the line of sight. Finally, the free parameter $R$ allows $l_\parallel$ to be larger than $l_\perp$, taking into account the elongation of groups along the line of sight due to the fingers-of-god effect.
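The linking criteria (1) to (4) can be illustrated with a small, unoptimized sketch. It is not the implementation used for the catalogue: the dictionary layout of a galaxy (`ra`, `dec`, `z`, comoving distance `d` in Mpc, mean density `nbar` in Mpc⁻³ at the galaxy's redshift), the function names, the brute-force pair loop, and the parameter values in the example are all assumptions made for illustration.

```python
import math

def perp_link(z, nbar, b, L_max):
    """Comoving perpendicular linking length of equation (3), in Mpc."""
    return min(L_max * (1.0 + z), b / nbar ** (1.0 / 3.0))

def ang_sep(g1, g2):
    """Angular separation in radians (haversine formula); ra and dec in degrees."""
    a1, d1, a2, d2 = map(math.radians, (g1["ra"], g1["dec"], g2["ra"], g2["dec"]))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((a2 - a1) / 2) ** 2)
    return 2.0 * math.asin(math.sqrt(h))

def are_friends(gi, gj, b, L_max, R):
    """True if the pair satisfies both equation (1) and equation (2)."""
    li = perp_link(gi["z"], gi["nbar"], b, L_max)
    lj = perp_link(gj["z"], gj["nbar"], b, L_max)
    perp_ok = ang_sep(gi, gj) <= 0.5 * (li / gi["d"] + lj / gj["d"])  # eq. (1)
    par_ok = abs(gi["d"] - gj["d"]) <= 0.5 * R * (li + lj)            # eq. (2), l_par = R l_perp
    return perp_ok and par_ok

def fof_groups(gals, b, L_max, R):
    """Link all friend pairs with a union-find and return groups of >= 2 members."""
    parent = list(range(len(gals)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i in range(len(gals)):
        for j in range(i + 1, len(gals)):
            if are_friends(gals[i], gals[j], b, L_max, R):
                parent[find(i)] = find(j)
    groups = {}
    for k in range(len(gals)):
        groups.setdefault(find(k), []).append(k)
    return [g for g in groups.values() if len(g) >= 2]

# Three illustrative galaxies: the first two are close in angle and distance.
gals = [
    {"ra": 150.0, "dec": 2.0000, "z": 0.2500, "d": 1000.0, "nbar": 0.01},
    {"ra": 150.0, "dec": 2.0172, "z": 0.2505, "d": 1002.0, "nbar": 0.01},
    {"ra": 150.5, "dec": 2.5000, "z": 0.2500, "d": 1000.0, "nbar": 0.01},
]
print(fof_groups(gals, b=0.1, L_max=0.45, R=13))
```

The pair loop scales as the square of the number of galaxies, which is acceptable for a sketch; a production groupfinder would normally restrict the search to neighbouring cells.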

3.1.2. VDM

The VDM algorithm was adopted from Gerke et al.

(2005) which was itself based on the method developed

by Marinoni et al. (2002). This algorithm is more complicated than the FOF and has six free parameters instead of three. Basically one needs a full Voronoi-Delaunay tessellation23 of the input galaxy sample and the volume of each Voronoi-cell. The Voronoi-Delaunay tessellation was computed using Qhull24 (Barber, Dobkin, & Huhdanpaa 1996) and the volumes of the Voronoi-cells using the algorithm of Mirtich (1996).

The VDM algorithm can be divided into 3 phases: In Phase I, the galaxies are ordered in ascending order of Voronoi-volumes. Then, the first galaxy in this sorted list is taken as a “seed galaxy” and a cylinder of radius $R_{\rm I}$ and length $2L_{\rm I}$ using comoving coordinates is placed around it such that the axis of the cylinder is directed along the line of sight. If there is no other galaxy inside this cylinder, the “seed galaxy” is regarded as a field galaxy and one proceeds to the next galaxy in the list. If, however, there are other galaxies within the cylinder, Phase II starts. In this phase, a second cylinder with radius $R_{\rm II}$ and length $2L_{\rm II}$ is defined and all galaxies inside this second cylinder directly connected to the seed galaxy or to its immediate Delaunay-neighbours by means of the Delaunay-mesh are assigned to the same group. The number of galaxies inside the second cylinder $N_{\rm II}$ is taken as an estimate of the central richness of the group. In Phase III, a third cylinder with radius and length set by

\begin{align}
R_{\rm III} &= r\,(\tilde{N}_{\rm II})^{1/3} \tag{5} \\
L_{\rm III} &= l\,(\tilde{N}_{\rm II})^{1/3} f(z) \tag{6}
\end{align}

is defined, where $r$ and $l$ are two free parameters, $\tilde{N}_{\rm II}$ is the central richness corrected for the redshift dependent mean density $\bar{n}(z)$, and $f(z)$ is a function introduced to take into account that for a fixed velocity dispersion the length of the fingers of god in redshift space is a function of redshift. $\tilde{N}_{\rm II}$ and $f(z)$ are given by

\begin{equation}
\tilde{N}_{\rm II} = \frac{\bar{n}(z_{\rm ref})}{\bar{n}(z)}\, N_{\rm II} \tag{7}
\end{equation}

and

\begin{equation}
f(z) = \frac{s(z)}{s(z_{\rm ref})}, \qquad s(z) = \frac{1+z}{\sqrt{\Omega_m (1+z)^3 + \Omega_\Lambda}} \tag{8}
\end{equation}

respectively, where $z_{\rm ref}$ is an arbitrary reference redshift chosen to be 0.5. In this third phase, again all galaxies within the third cylinder are assigned to the current group. After fixing $z_{\rm ref}$, 6 free parameters $R_{\rm I}$, $L_{\rm I}$, $R_{\rm II}$, $L_{\rm II}$, $r$, and $l$ remain. The reader is referred to Table 1 for typical parameter-sets for the two group-finders.
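The Phase III scaling of equations (5) to (8) can be written out as a short sketch. The cosmological parameters and the reference redshift are those quoted in the text; the function names and the numerical inputs in the example (richness, densities, r and l) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

OMEGA_M, OMEGA_L = 0.25, 0.75  # cosmology adopted in the text
Z_REF = 0.5                    # reference redshift chosen in the text

def s(z):
    """s(z) of equation (8)."""
    return (1.0 + z) / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)

def f(z):
    """Redshift scaling of the line-of-sight cylinder length, equation (8)."""
    return s(z) / s(Z_REF)

def corrected_richness(n_II, nbar_z, nbar_ref):
    """Central richness corrected for the mean density, equation (7)."""
    return nbar_ref / nbar_z * n_II

def phase3_cylinder(n_II, z, nbar_z, nbar_ref, r, l):
    """Return (R_III, L_III) following equations (5) and (6)."""
    n_tilde = corrected_richness(n_II, nbar_z, nbar_ref)
    scale = n_tilde ** (1.0 / 3.0)
    return r * scale, l * scale * f(z)

# Illustrative values: richness 8 at the reference redshift, equal mean densities.
print(phase3_cylinder(n_II=8, z=0.5, nbar_z=0.01, nbar_ref=0.01, r=0.5, l=7.0))
```

With these inputs the corrected richness equals the measured one and f(z) = 1, so the cylinder size is simply r and l multiplied by the cube root of the richness.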

It can be seen that both group-finding algorithms are

somewhat arbitrary and neither is directly linked to the

physical basis of a group, namely virialized motion within

a common potential well. While it seems that the VDM

algorithm is at least partly motivated by certain scaling

relations for groups (Gerke et al. 2005), this is at the

expense of simplicity which is clearly the mark of the

FOF algorithm.

23For a given set of sites in space, the “Voronoi-cell” of a certain

site consists of all points closer to this site than to any other site.

Furthermore, two sites whose Voronoi-cells share a common interface
are called “Delaunay-neighbours”. The “Voronoi-Delaunay
tessellation” for a given set of sites is the complete set of all its
Voronoi-cells and Delaunay-neighbours. For more formal definitions
and basic properties of Voronoi-Delaunay tessellations we refer
to basic textbooks of geometry.

24http://www.qhull.org

3.2. Basic statistical quantities

In order to assess the performance of a groupfinder, re-

alistic mock catalogues containing full information about

the underlying DM halos and their properties are needed.

In this section, we introduce some useful statistics to

characterize the overall fidelity of the resulting group cat-

alogues.

The fidelity of the group catalogue can be assessed

through comparing the “reconstructed” groups, obtained

by running the groupfinder on the mock catalogues, to

the “real” group catalogue described above - i.e. the set

of all DM haloes in the mocks that contain, after the 10k

selection criteria have been applied, at least two galax-

ies. The comparison is therefore of two identical point

sets, the galaxies in the mocks, whose points are grouped

together in possibly different ways. This is schematically

illustrated in Figure 3.

We follow here the definitions and notations of Gerke

et al. (2005). The two big circles constitute two group

catalogues. Each point corresponds to a galaxy of the

input galaxy sample and the encircled galaxies belong

to the same group. In the left-hand catalogue are the

“real groups” as given by the DM halos in the simulation,

while in the right-hand catalogue are the “reconstructed

groups” as identified by our groupfinder. Some sort of

measure is needed of how many reconstructed groups can

be identified with real groups and how many real groups

are recovered by our groupfinder. Following Gerke et al.

(2005), we define the following terms:

Association: A group i is associated to another group

j if group j contains more than the fraction f of

the members of group i. For this association to
be unique, f must satisfy f ≥ 0.5. Throughout this

paper, we set f = 0.5 as did Gerke et al. (2005).

One-way-match: If group i is associated to group j, but

group j is not associated to group i (illustrated by

an arrow from group i to group j).

Two-way-match: If group i is associated to group j and

vice versa (illustrated by a double-arrow).

While each group can only have a single, unique associ-

ated group (i.e. an arrow pointing away), it might well

happen that a certain group is the associated group for

many other groups (i.e. many arrows pointing toward

it). We therefore have the following terminology:

Over-merging: If more than one real group is associated

to the same reconstructed group.

Fragmentation: If more than one reconstructed group

is associated to the same real group.

Spurious group: A reconstructed group which has no

associated real group.

Undetected group: A real group which has no associated

reconstructed group.

Group galaxy: A galaxy which belongs to a group.

Field galaxy: A galaxy not associated to any group.
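The definitions above translate directly into set operations. The following sketch assumes that each group is represented as a Python set of galaxy identifiers and that groups within one catalogue are disjoint; the function names and the toy catalogues are hypothetical and are not taken from the paper.

```python
def associate(group, catalogue, f=0.5):
    """Index of the group in `catalogue` that contains more than a fraction f
    of the members of `group`, or None if there is no such group."""
    for k, other in enumerate(catalogue):
        if len(group & other) > f * len(group):
            return k
    return None

def match_catalogues(real, rec, f=0.5):
    """Return the real->rec associations, the rec->real associations,
    and the list of two-way matches as (real index, rec index) pairs."""
    real_to_rec = [associate(g, rec, f) for g in real]
    rec_to_real = [associate(g, real, f) for g in rec]
    two_way = [(i, j) for i, j in enumerate(real_to_rec)
               if j is not None and rec_to_real[j] == i]
    return real_to_rec, rec_to_real, two_way

# Toy catalogues (galaxy identifiers are arbitrary integers).
real = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9}]
rec = [{1, 2}, {3, 4, 10}, {5, 6, 7, 8, 9}, {11, 12}]
real_to_rec, rec_to_real, two_way = match_catalogues(real, rec)
print(real_to_rec)   # associations of real groups
print(rec_to_real)   # associations of reconstructed groups
print(two_way)       # two-way matches
```

In this toy case the first real group is fragmented (two reconstructed groups are associated to it), the third reconstructed group over-merges two real groups, and the last reconstructed group is spurious.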


Fig. 3.— Schematic illustration of comparing a reconstructed group catalogue to a real group catalogue as obtained from DM simulation.

The left big circle constitutes the real group catalogue and the right big circle the reconstructed group catalogue. Each point displays a

galaxy and the encircled points inside the big circles constitute groups. A group in the real (reconstructed) catalogue may be associated to

a group in the reconstructed (real) catalogue (see the text for details). Such an association is indicated by an arrow pointing from the real

(reconstructed) group to the reconstructed (real) group. If there is an arrow pointing from one group to another and also an arrow pointing

backwards, such an association is termed a “two-way-match”. Otherwise it is just a “one-way-match”. If more than one reconstructed

group points to the same real group this is called “fragmentation”, if there is more than one real group associated to the same reconstructed

group, this is called “over-merging”.

TABLE 1

Optimal multi-run parameter-sets for FOF and VDM

Columns: step; FOF: b, Lmax (Mpc)a, R; VDM: RI (Mpc)b, LI (Mpc)b, RII (Mpc)b, LII (Mpc)b, r (Mpc)b, l (Mpc)b

1

2

3

4

5

0.11

0.11

0.08

0.19

0.07

0.45

0.45

0.4

0.4

0.3

13

13

12

11

18

0.7

0.7

0.5

0.4

0.4

10

10

6

10

8

1.0

0.4

0.2

0.4

0.6

80.6

0.6

0.5

0.5

0.5

10

7

7

4

7

12

12

12

8

Note. — The definitions of these parameters are given in § 3.1.

a physical coordinates

b comoving coordinates

With this terminology, the following statistical mea-

sures can be defined that together describe the overall

fidelity of the reconstructed group catalogue and thus

its potential usefulness for quantitative analysis.

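Let $N_{\rm gr}^{\rm real}(N_{\rm real})$ denote the number of real groups with $N_{\rm real}$ members, and $N_{\rm gr}^{\rm rec}(N_{\rm rec})$ the number of reconstructed groups with $N_{\rm rec}$ members. Then by

$$A[N_{\rm gr}^{\rm real}(N_{\rm real}) \rightarrow N_{\rm gr}^{\rm rec}(N_{\rm rec})] \tag{9}$$

we denote the number of associations of real groups with $N_{\rm real}$ members to reconstructed groups with $N_{\rm rec}$ members. In the same way,

$$A[N_{\rm gr}^{\rm rec}(N_{\rm rec}) \rightarrow N_{\rm gr}^{\rm real}(N_{\rm real})] \tag{10}$$

denotes the number of associations of reconstructed groups with $N_{\rm rec}$ members to real groups with $N_{\rm real}$ members. The analogous notations for the numbers of two-way-associations are

$$A[N_{\rm gr}^{\rm real}(N_{\rm real}) \leftrightarrow N_{\rm gr}^{\rm rec}(N_{\rm rec})] \tag{11}$$

$$A[N_{\rm gr}^{\rm rec}(N_{\rm rec}) \leftrightarrow N_{\rm gr}^{\rm real}(N_{\rm real})]. \tag{12}$$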

Note that the last two expressions are equivalent to each

other. Then, with these notations we can formally intro-

duce the “one-way completeness” c1(N) and the “two-

way completeness” c2(N) by

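$$c_1(N) = \frac{A[N_{\rm gr}^{\rm real}(\geq N) \rightarrow N_{\rm gr}^{\rm rec}(\geq 2)]}{N_{\rm gr}^{\rm real}(\geq N)} \tag{13}$$

$$c_2(N) = \frac{A[N_{\rm gr}^{\rm real}(\geq N) \leftrightarrow N_{\rm gr}^{\rm rec}(\geq 2)]}{N_{\rm gr}^{\rm real}(\geq N)}. \tag{14}$$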

Analogously, we define the “one-way purity” p1(N) and

“two-way purity” p2(N) as

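$$p_1(N) = \frac{A[N_{\rm gr}^{\rm rec}(\geq N) \rightarrow N_{\rm gr}^{\rm real}(\geq 2)]}{N_{\rm gr}^{\rm rec}(\geq N)} \tag{15}$$

$$p_2(N) = \frac{A[N_{\rm gr}^{\rm rec}(\geq N) \leftrightarrow N_{\rm gr}^{\rm real}(\geq 2)]}{N_{\rm gr}^{\rm rec}(\geq N)}. \tag{16}$$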

The one-way “completeness” c1(N) is a measure of the

fraction of real groups with N or more members that

are successfully recovered in the reconstructed group cat-

alogue, and the one-way “purity” p1(N) is a measure

of the fraction of reconstructed groups with N or more

members that belong to real groups. The higher c1(N)

the smaller the fraction of undetected groups (1−c1(N)),

and the higher p1(N) the smaller the fraction of spuri-

ous groups (1 − p1(N)). On the other hand, the smaller

the ratios c2(N)/c1(N) or p2(N)/p1(N) the more over-

merging or fragmentation, respectively, is present. By

definition the four quantities c1(N), c2(N), p1(N), and

p2(N) all take only values between 0 and 1.
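As an illustration, the four statistics can be evaluated directly once the associations between the two catalogues are known. The following is a minimal Python sketch under assumed inputs: each group is described by a hypothetical record (`GroupRecord`) holding its richness and two flags saying whether it has a one-way association to a group with N ≥ 2 in the other catalogue and whether that association is reciprocated. The record layout, the function name, and the toy numbers are illustrative assumptions, not the implementation used for the catalogue.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class GroupRecord:
    """Hypothetical record for one group in either catalogue."""
    richness: int   # number of members N
    one_way: bool   # associated to a group (N >= 2) in the other catalogue
    two_way: bool   # that association is reciprocated (two-way-match)


def one_and_two_way_fractions(groups: Iterable[GroupRecord],
                              n_min: int) -> Tuple[float, float]:
    """Cumulative one-way and two-way fractions for richness >= n_min.

    Applied to the real groups this returns (c1(N), c2(N)); applied to the
    reconstructed groups it returns (p1(N), p2(N)).
    """
    selected = [g for g in groups if g.richness >= n_min]
    if not selected:
        raise ValueError("no groups with richness >= n_min")
    n_one = sum(g.one_way for g in selected)
    # A two-way association is counted only if the one-way link exists.
    n_two = sum(g.one_way and g.two_way for g in selected)
    return n_one / len(selected), n_two / len(selected)


# Toy real-group list (made-up values, for illustration only).
real_groups = [
    GroupRecord(richness=8, one_way=True, two_way=True),
    GroupRecord(richness=5, one_way=True, two_way=False),
    GroupRecord(richness=3, one_way=True, two_way=True),
    GroupRecord(richness=2, one_way=False, two_way=False),
]
c1, c2 = one_and_two_way_fractions(real_groups, n_min=2)
print(f"c1(2) = {c1:.2f}, c2(2) = {c2:.2f}, c2/c1 = {c2 / c1:.2f}")
```

Because the same function serves both catalogues, the ratios c2/c1 and p2/p1 follow from two calls, one on the real and one on the reconstructed groups.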

While Gerke et al. (2005) have introduced these four

quantities c1, c2, p1, and p2 globally for a group cata-

logue including all groups, we have defined them to be

functions of the number of members N (“richness”). It

will become clear below that investigating these statistics

as a function of N is very useful for improving the perfor-

mance of a group catalogue. Note that the argument N

always means “for groups with N or more members” as is

clear from their definitions, so for N = 2 the two defini-

tions are identical. Throughout this paper we will always

consider the set of groups down to a given richness-class

N. So this convention eases the notation. It would, how-

ever, be straightforward to define the analogue quantities

in a non-cumulative way.

While c1(N), p1(N) etc. are statistical quantities on a

group-to-group basis, statistical quantities on a galaxy-

to-group basis may be useful as well. Therefore, following

Gerke et al. (2005), we define the “galaxy success rate”

Sgal(N) and the “interloper fraction” fI(N) as

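$$S_{\rm gal}(N) = \frac{[S^{\rm gal}_{\rm real}(\geq N) \cap S^{\rm gal}_{\rm rec}(\geq 2)]}{[S^{\rm gal}_{\rm real}(\geq N)]} \tag{17}$$

$$f_{\rm I}(N) = \frac{[S^{\rm gal}_{\rm rec}(\geq N) \cap S^{\rm gal}_{\rm field}]}{[S^{\rm gal}_{\rm rec}(\geq N)]}, \tag{18}$$

where $S^{\rm gal}_{\rm real}(N)$ is the set of galaxies associated to real groups of N members, $S^{\rm gal}_{\rm rec}(N)$ the set of galaxies associated to reconstructed groups of N members, and $S^{\rm gal}_{\rm field}$ the set of real field galaxies. The square brackets [.] here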

denote the number of elements in a set and the ∩ is the

usual intersection from set theory. Thus galaxy success

rate Sgal(N) is just the fraction of galaxies belonging to

real groups of richness ≥N that have ended up in any

reconstructed group, and the interloper fraction fI(N) is

the fraction of galaxies belonging to reconstructed groups

of richness ≥N that are field galaxies (“interlopers”).

Like c1(N), p1(N), etc., Sgal(N) and fI(N) will also take

values between 0 and 1.
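The two galaxy-based statistics reduce to set operations on galaxy identifiers. The sketch below is a minimal Python illustration, assuming each catalogue is available as a dictionary mapping a group ID to the set of its member galaxy IDs, and that every galaxy not contained in any real group counts as a real field galaxy. The function names and the toy catalogues are hypothetical.

```python
from typing import Dict, Set

Catalogue = Dict[int, Set[int]]  # group ID -> set of member galaxy IDs


def members_of(groups: Catalogue, n_min: int) -> Set[int]:
    """Union of the galaxy IDs of all groups with at least n_min members."""
    out: Set[int] = set()
    for galaxies in groups.values():
        if len(galaxies) >= n_min:
            out |= galaxies
    return out


def galaxy_success_rate(real: Catalogue, rec: Catalogue, n_min: int) -> float:
    """Fraction of galaxies in real groups (richness >= n_min) that ended up
    in any reconstructed group (richness >= 2)."""
    real_n = members_of(real, n_min)
    rec_any = members_of(rec, 2)
    return len(real_n & rec_any) / len(real_n)


def interloper_fraction(real: Catalogue, rec: Catalogue, n_min: int) -> float:
    """Fraction of galaxies in reconstructed groups (richness >= n_min) that
    are field galaxies, i.e. belong to no real group (assumption above)."""
    rec_n = members_of(rec, n_min)
    real_any = members_of(real, 2)
    return len(rec_n - real_any) / len(rec_n)


# Toy catalogues (made-up IDs, for illustration only).
real_cat = {0: {1, 2, 3}, 1: {4, 5}}
rec_cat = {10: {1, 2, 9}, 11: {4, 5}}
print(galaxy_success_rate(real_cat, rec_cat, 2))
print(interloper_fraction(real_cat, rec_cat, 2))
```

In the toy input, galaxy 3 is a real group member that was missed, which lowers the success rate, and galaxy 9 is a field galaxy placed in a reconstructed group, which raises the interloper fraction.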

It is well known (e.g. Frederic 1995, Gerke et al. 2005)

that a perfect reconstructed group catalogue is impossi-

ble to achieve and furthermore, that completeness and

purity tend to be mutually exclusive. As would be ex-

pected, the higher the completeness, the lower the pu-

rity, and vice versa (see Figure 4). There is also a simi-

lar dichotomy between over-merging and fragmentation.

Therefore, we introduce additional measures of “goodness” which combine the statistics such as completeness

and purity in a way that maximizing (or minimizing)

them yields a sort of “optimal” group catalogue. We for-

mally define as (omitting the dependence of N for the

sake of clarity):

$$g_1 = \sqrt{(1 - c_1)^2 + (1 - p_1)^2} \tag{19}$$

$$g_2 = \frac{c_2}{c_1}\,\frac{p_2}{p_1} \tag{20}$$

$$g_3 = \sqrt{(1 - S_{\rm gal})^2 + f_{\rm I}^2}. \tag{21}$$

The meaning of these quantities is as follows: Since a

perfect group catalogue features (c1,p1) = (1,1), i.e. en-

tirely complete and absolutely pure, the reconstructed

group catalogue should come as close as possible to this

point in the c1-p1-plane. So g1 gives the distance to this

optimal point in the c1-p1-plane and thus is a measure

of the balance of completeness and purity. Then, a good

group catalogue should exhibit c1 ≃ c2 and p1 ≃ p2

meaning that essentially no over-merging and fragmen-

tation is present in the catalogue. Hence, g2 measures

the balance between over-merging and fragmentation and
should also approach 1. Finally, g3 is similar to g1 but

is on a galaxy-to-group basis instead of a group-to-group

basis. As is clear from their definitions, these measures

of goodness again take only values between 0 and 1. It

is clear that g1 and g3 should be minimized, while g2

should be maximized.
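A short numerical illustration of the three goodness measures may help. The Python sketch below assumes, following the description above, that g1 and g3 are Euclidean distances to the ideal points (c1, p1) = (1, 1) and (Sgal, fI) = (1, 0), and that g2 is the product of the two-way to one-way ratios. The function name is hypothetical, and the input values are the rounded FOF entries for N ≥ 5 from Table 2, used here only as an example.

```python
import math
from typing import Tuple


def goodness(c1: float, c2: float, p1: float, p2: float,
             s_gal: float, f_i: float) -> Tuple[float, float, float]:
    """Return (g1, g2, g3) for one catalogue and one richness class."""
    if c1 <= 0.0 or p1 <= 0.0:
        raise ValueError("c1 and p1 must be positive to form the ratios")
    g1 = math.hypot(1.0 - c1, 1.0 - p1)    # distance to (c1, p1) = (1, 1)
    g2 = (c2 / c1) * (p2 / p1)             # approaches 1 for a good catalogue
    g3 = math.hypot(1.0 - s_gal, f_i)      # distance to (Sgal, fI) = (1, 0)
    return g1, g2, g3


# Rounded FOF values for N >= 5 (Table 2); c2 and p2 are rebuilt from the ratios.
c1_fof, p1_fof = 0.85, 0.78
g1, g2, g3 = goodness(c1_fof, 0.92 * c1_fof, p1_fof, 0.92 * p1_fof, 0.87, 0.19)
print(f"g1 = {g1:.3f}, g2 = {g2:.3f}, g3 = {g3:.3f}")
```

With these inputs the sketch gives g1 ≈ 0.27, g2 ≈ 0.85, and g3 ≈ 0.23; since the inputs are rounded table entries, the results are indicative only.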

3.3. Optimization strategy

Since there exists no single perfect reconstructed group

catalogue, one has to optimize the group-finding param-

eters, in principle, in a way that the resulting group

catalogue serves as well as possible the intended scien-

tific purpose. However, as we will see, there seems to

be a rather natural way to construct a group catalogue

which is useful for many different purposes. The only

way to find such optimal parameters of a groupfinder is

to run it on the mocks for different parameter-sets, and

to compare the resulting group catalogues by means of

the statistics introduced in the previous section.

The completeness c1(8) and purity p1(8) of the recon-

structed group catalogues, after running FOF and VDM

over a large parameter space, are shown in Figure 4. It

is obvious that the points do not extend arbitrarily close

to the right upper corner (i.e. the perfect group cat-

alogue). The parameters c1(8) and p1(8) are in some

sense anti-correlated. In fact, the cloud of points seems to feature a boundary toward high completeness and purity beyond which there is a region totally free of points. It is notable how similar this boundary is for FOF and

VDM approaches — clearly neither is markedly superior

to the other. The same holds for the g2(8)-goodness,

color coded in the figure, along this boundary region.

These similarities between FOF and VDM are observed

for all richness classes N. This indicates that this bound-

ary is probably the limit of what can be achieved with a

zCOSMOS-10k-like sample and does not depend on the

choice of algorithm. This also suggests that the choice

of a particular groupfinder such as FOF or VDM is less

important than sometimes argued, although, as we will

see, the properties of group catalogues obtained using

the two groupfinders are not absolutely identical.

[Figure 4: two panels, FOF and VDM, each spanning c1(8) and p1(8) from 0.5 to 1; points labeled N = 2, 3, 4, 5, 6, 8, 10, 12; color scale from 0.5 to 1.]

Fig. 4.— Distributions of parameter-sets in the c1(8)-p1(8)-plane for a wide range of group-finding parameters. In the left panel are the

parameter-sets for FOF and in the right panel those for VDM. Each parameter-set is positioned at the average value for the 24 separate mock

catalogues. The parameter-sets are color coded by the goodness parameter g2(8) indicating the degree of over-merging or fragmentation.

The dotted line is the largest circle around the upper right corner being empty of points, i.e. the radius of this circle is equal to the smallest

g1(8) value. The best g1(8) parameter-set is marked by a diamond and the error bars exhibit the scatter among the 24 mocks for this

particular parameter-set. The labeled black points show the sites where the best g1(N) sets for different N reside on this plane, N being

denoted by the label of the points. Although these best sets inhabit, in general, very different places, they converge for N ≥ 8, at least for

FOF. The position of the best g2(8)-set is marked by a triangle and the one of the best g3(8) by a square.

VDM, much more than FOF, also exhibits some scat-

ter in the range given by 0.5 < c1(8) < 0.85 and

p1(8) > 0.65. The existence of such parameter-sets is

a natural side-effect of the relatively large number of

free parameters of the VDM groupfinder resulting in

many parameter combinations with obviously subopti-

mal properties in terms of c1(8) and p1(8). The extent

of this scatter, of course, also depends strongly on the ex-

plored range of values in the parameter space. Since we

are interested in parameter-sets yielding simultaneously

high completeness and high purity, we will only focus on

the boundary mentioned above.

The challenge is to find the best group catalogues among those plotted in Figure 4, making the best compromise between c1 and p1. A natural choice is the point that lies closest to (c1,p1) = (1,1) indicated by the diamond. According to equation (19), this is the point where g1 is minimal. We will refer to this parameter-set as the “best g1-set”. It defines a circle around the upper right corner (dotted line) that is entirely empty of points.

In addition to minimizing g1, one would prefer, of

course, to simultaneously maximize g2 and minimize g3.

In general, the best parameter-sets for these three good-

nesses will not coincide. Rather it turns out that the best

g2-set lies usually at slightly higher completeness relative

to the best g1-set (see triangles in Figure 4), while the

best g3-set lies usually at slightly lower completeness (see

squares in Figure 4). However, as is clear from Figure

4, the gradient of g2 is rather shallow around the best

g1-set and nearly maximal, so that the precise site of the

optimal g2-set is not that important. The same holds for

the gradient of g3. Finally, it seems that the best g1-set

is a good choice.
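Selecting the best g1-set amounts to a nearest-point search in the c1-p1-plane. The Python sketch below is a minimal illustration, assuming the mock-averaged (c1, p1) pair of every explored parameter-set is already available in a dictionary keyed by a parameter-set label. The labels and numbers are made up and the function name is hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple


def best_g1_set(stats: Dict[Hashable, Tuple[float, float]]) -> Tuple[Hashable, float]:
    """Return the label with the smallest g1 and that g1 value.

    stats maps a parameter-set label to its mock-averaged (c1, p1).
    """
    if not stats:
        raise ValueError("no parameter-sets given")

    def g1(label: Hashable) -> float:
        c1, p1 = stats[label]
        return math.hypot(1.0 - c1, 1.0 - p1)

    best = min(stats, key=g1)
    return best, g1(best)


# Made-up grid results: high completeness, balanced, and high purity.
grid = {
    "A": (0.90, 0.60),
    "B": (0.80, 0.80),
    "C": (0.60, 0.90),
}
label, radius = best_g1_set(grid)
print(label, round(radius, 3))
```

The returned distance is the radius of the empty circle around the upper right corner; every other parameter-set in the grid lies on or outside that circle.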

3.3.1. Multi-run procedure

Since c1(N), c2(N), etc. are functions of richness N, one might wonder how the best g1(N)-sets for different N are distributed in the c1(8)-p1(8)-plane. This is shown in Figure 4 by the labeled points where the labels denote the corresponding N. For FOF, the best g1(8)-set is optimal for all N ≥ 8 as well, while for N < 8 the optimal g1(N)-sets reside at lower completeness. For VDM this is less obvious, but at least for N ≥ 10 the best g1(N)-sets seem to converge. In any case, it is clear that it is not possible to simultaneously optimize g1(N) for all N with a single parameter-set. If the parameter-set is optimized for groups with N ≥ 8, the resulting group catalogue is very complete for groups with N < 8 but the purity starts to decrease severely for N < 5, and a lot of spurious small groups enter the catalogue (see Figures 5 and 6). Since ≃ 80% of the groups have N < 5, this is unsatisfactory.

This suggests that the groupfinder should be run several times with different parameter-sets, each time optimized for a different richness range. This is analogous to the “hot-cold” double pass approach often used with image detection algorithms such as SExtractor. We will refer to this approach as the “multi-run procedure”, and it was implemented as follows:

1. The parameter-set is optimized for the range N ≥

6, the groupfinder is run, and only those groups

that are in this richness range are kept in the group

catalogue.

2. The parameter-set is then optimized for groups

with N = 5, the groupfinder is run again, and only

groups with N = 5 that are not yet detected in the

first step are added to the group catalogue.

3. Repeat the previous step for N = 4.

4. Repeat the previous step for N = 3.

5. Repeat the previous step for N = 2.

[Figure 5: four panels; upper row: completeness (0.6 to 1), lower row: purity (0.4 to 1); left column: FOF, right column: VDM; x-axis: ≥ N (2 to 12); legend: single run, multi run.]

Fig. 5.— Comparison of the completeness and purity obtained from a single run and from the multi-run-procedure. The left two panels show the statistics for FOF and the right two panels for VDM. In each panel, the blue color corresponds to the single run and the red color to the multi-run-procedure. In the upper two panels, the solid lines display the one-way completeness c1, and in the lower two panels they show the one-way purity p1. In each panel, the dotted lines display the corresponding two-way-quantities, c2 or p2. It is shown that the purity obtained from the multi-run-procedure is more balanced than that from the single run. For FOF this leads also to a more balanced completeness.

[Figure 6: (Nrec − Nreal) / Nreal versus ≥ N (2 to 10); legend: FOF single run, FOF multi run, 1WM, VDM multi run.]

Fig. 6.— Relative abundance of reconstructed groups as compared with the real groups as a function of richness N. The green line shows the mean relative abundance of single-run FOF groups, the blue line the mean relative abundance of multi-run FOF groups, and the red line the mean relative abundance of one-way-matched groups. The errorbars always exhibit the scatter among the 24 mocks. The gray shaded region displays the spread of the relative abundance of real groups among the 24 mocks (i.e. cosmic variance plus shot-noise). For N ≲ 6 the number of multi-run FOF groups is slightly too high and exceeds the margin of cosmic variance while the abundance of the one-way-matched groups is well within the region dominated by cosmic variance. For comparison the relative abundance of multi-run-VDM groups is shown as well (black dotted line).

In each step, only those groups are accepted which have not been found in an earlier step. It is better to work down in richness because the richer groups are more easily detected. The optimal parameter set in each step is basically just the best g1-set for the corresponding richness range. However, particularly in the first step, other choices are also possible. In fact, for VDM, we have chosen a special set for the first step since the best g1(6)-set proved to be by no means optimal for N ≳ 8. Table 1 gives the optimal parameter-sets for FOF and VDM. Since there are some degeneracies between the parameters, there are no simple trends from step 1 to step 5 for the single parameters.
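The multi-run procedure described above can be sketched as a loop over richness ranges, working from rich to poor groups. The Python example below is a minimal illustration under stated assumptions: the groupfinder is any callable taking a parameter-set and returning groups as sets of galaxy IDs, and a group counts as "already detected" if it shares at least one galaxy with a group accepted in an earlier step. Both the interface (`multi_run`, `toy_finder`) and the overlap criterion are assumptions made for this sketch; the text does not specify them.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple

Group = FrozenSet[int]                        # a group as a set of galaxy IDs
Params = Dict[str, float]                     # hypothetical parameter-set
Step = Tuple[Params, Callable[[int], bool]]   # (parameters, richness filter)


def multi_run(finder: Callable[[Params], List[Group]],
              steps: Sequence[Step]) -> List[Group]:
    """Run the groupfinder once per step, ordered from rich to poor groups."""
    catalogue: List[Group] = []
    assigned: Set[int] = set()                # galaxies already placed in a group
    for params, in_range in steps:
        for group in finder(params):
            if not in_range(len(group)):
                continue                      # outside this step's richness range
            if assigned & group:
                continue                      # assumption: overlap = already found
            catalogue.append(group)
            assigned |= group
    return catalogue


def toy_finder(params: Params) -> List[Group]:
    """Stand-in for FOF or VDM: returns canned groups per run (made-up data)."""
    canned = {
        1.0: [frozenset(range(0, 6)), frozenset({10, 11})],
        2.0: [frozenset(range(0, 5)), frozenset(range(20, 25))],
    }
    return canned[params["run"]]


steps: List[Step] = [
    ({"run": 1.0}, lambda n: n >= 6),         # step 1: N >= 6
    ({"run": 2.0}, lambda n: n == 5),         # step 2: N = 5
]
catalogue = multi_run(toy_finder, steps)
print([sorted(g) for g in catalogue])
```

In the toy run, the pair {10, 11} found in step 1 is discarded because it lies outside that step's richness range, and the five-member group found in step 2 that overlaps the rich group from step 1 is rejected as already detected.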

[Figure 7: two panels, FOF → VDM (upper) and FOF ← VDM (lower); x-axis: ≥ N (2 to 12); y-axis: fraction (0.6 to 1); legend: real data; mocks, N ≤ 20; mocks.]

Fig. 7.— Comparison between the multi-run FOF and the multi-run VDM catalogues. The upper panel shows the fraction of FOF groups associated to VDM groups. The red corresponds to the real data 10k group catalogue, whereas the solid line designates one-way-matches, and the dotted line two-way-matches. The black solid line corresponds to the mean fraction of associations in the mocks, if groups with N > 20 are omitted, and the error bars exhibit the scatter among the 24 mocks. The black dotted line shows the same, if all groups are taken into account. In the lower panel, the symbols have identical meaning but exhibit the fraction of VDM groups associated to FOF groups.

Figure 5 shows how the multi-run-procedure compares to the single run best g1(8)-set. In the case of FOF, the completeness has slightly decreased for N ≲ 5 compared to the single run, but the high completeness of the single run in this richness range comes at the cost of a low purity. In fact, for the multi-run, the purity has increased for N ≲ 5, and has become almost constant for all richness classes. Thus, the overall behaviour of the completeness and purity is now more balanced.

For VDM, we observe a similar trend. Here, it is par-

ticularly evident that in a single run, even if optimized

for N ≥ 8, the completeness decreases for N ? 6. The

multi-run-procedure can correct for this and still increase

the purity for small groups.

While the overall statistics of the two multi-run-catalogues are similar, there are some minor differences: The overall behaviour of the completeness and purity as a function of N seems to be more balanced for FOF. Also the ratios c2/c1 and p2/p1 are more balanced for FOF, while for VDM, c2/c1 increases and p2/p1 decreases toward higher N. On the other hand, the total number of groups found with FOF is too high for N ≲ 3, while for VDM, the number of reconstructed groups is too high for N ≲ 5 (Figure 6). All things considered, the multi-run-procedure works better with FOF than with VDM.

3.3.2. Combining FOF and VDM

With the FOF and VDM multi-run group catalogues, there are now two catalogues available, obtained by different algorithms, and exhibiting similar purity and completeness. A comparison of the two catalogues on a group-to-group basis is shown in Figure 7. The red lines show the result for the real 10k sample. An FOF group with N ≥ 2 has a probability of being associated with a VDM group of ∼ 80%, increasing roughly linearly with N until it reaches 100% for N ≥ 10. On the other hand, the probability of any VDM group being associated with an FOF group is greater than ∼ 80%, and even higher than ∼ 90% for N ≥ 8. The reason that for N ≲ 4 the VDM groups have a higher probability of being associated with the FOF groups than vice versa is the excess production of small groups in the FOF catalogue. Furthermore, note that whenever a group with N ≥ 6 has an associated group this association is a two-way-association. Thus, the two catalogues, though not identical, contain mainly the same structures. Moreover, the real data agree very well with the mocks (black solid lines), if groups with N > 20 in the mocks are omitted (in the mocks there are too many of them, see § 5.1). This shows that the groupfinders indeed work comparably on the real data and on the mocks.

Is there a way to combine the information in the two

catalogues in order to obtain a single optimal group cata-

logue? It seems natural to consider those group galaxies

that were recovered by both groupfinders. We introduce

a “galaxy purity parameter” (GAP) for each galaxy. The

GAP is a flag indicating if a certain group galaxy is con-

tained simultaneously in both catalogues. For a certain

FOF group galaxy it is defined as follows:

• If there is no VDM group containing this galaxy, it

gets a GAP equal to 0.

• If it is also contained in a VDM group, and the FOF

group has a one-way-match to this VDM group or

this VDM group exhibits a one-way-match to the

FOF group, the galaxy gets a GAP equal to 1.

• If it is contained in a VDM group, and the FOF

group has a two-way-match to this VDM group,

the galaxy gets a GAP of 2.

Thus, we expect that the higher the GAP for a galaxy,

the more reliable the detection, and the higher the prob-

ability that this galaxy is a real group galaxy and not

an artefact introduced by one of the groupfinders. The

GAP is a useful flag for excluding uncertain group mem-

bers if needed, and defines more clearly the reliable core

of a group.
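The GAP assignment above can be sketched in a few lines. The Python example below is a minimal illustration, assuming that both catalogues are given as dictionaries from group ID to member galaxy IDs, that each galaxy belongs to at most one group per catalogue, and that the one-way associations in each direction are supplied as dictionaries (`fof_to_vdm`, `vdm_to_fof`). These names are hypothetical. The case of a galaxy contained in a VDM group that has no association with its FOF group in either direction is not covered by the three rules; the sketch assigns GAP = 0 there, which is an assumption.

```python
from typing import Dict, Set

Catalogue = Dict[int, Set[int]]   # group ID -> set of member galaxy IDs


def gap_flags(fof: Catalogue, vdm: Catalogue,
              fof_to_vdm: Dict[int, int],
              vdm_to_fof: Dict[int, int]) -> Dict[int, int]:
    """Return the GAP (0, 1 or 2) of every FOF group galaxy."""
    # Look-up table: galaxy ID -> ID of the VDM group containing it.
    vdm_of = {gal: vid for vid, gals in vdm.items() for gal in gals}
    gap: Dict[int, int] = {}
    for fid, galaxies in fof.items():
        for gal in galaxies:
            vid = vdm_of.get(gal)
            if vid is None:
                gap[gal] = 0                      # in no VDM group
                continue
            forward = fof_to_vdm.get(fid) == vid   # FOF group points to VDM group
            backward = vdm_to_fof.get(vid) == fid  # VDM group points to FOF group
            if forward and backward:
                gap[gal] = 2                      # two-way-match
            elif forward or backward:
                gap[gal] = 1                      # one-way-match
            else:
                gap[gal] = 0                      # assumption: unmatched groups
    return gap


# Toy input (made-up IDs, for illustration only).
fof_cat = {1: {1, 2, 3}, 2: {4, 5}}
vdm_cat = {10: {1, 2}, 11: {5, 6}}
flags = gap_flags(fof_cat, vdm_cat, fof_to_vdm={1: 10, 2: 11}, vdm_to_fof={10: 1})
print(flags)
```

Keeping only galaxies with a flag of at least 1, or exactly 2, then yields progressively more restrictive member lists while the FOF group IDs stay unchanged.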

Then, we can define two sub-sets, or “sub-catalogues”, of the basic FOF catalogue. The “one-way-matched” (1WM) sub-catalogue contains only FOF group galaxies with a GAP ≥ 1. In a similar way, the “two-way-matched” (2WM) sub-catalogue contains only group galaxies with a GAP = 2, i.e. all galaxies with a GAP ≤ 1 become field galaxies. Note that we have defined the GAP, and thus the 1WM and 2WM, based on the FOF groups. They could, of course, also be defined based on the VDM groups. However, to obtain a single optimal group catalogue, we have to choose between FOF and VDM. As discussed in the last section, though the multi-run catalogues obtained by these groupfinders exhibit similar statistics, some (minor) properties are overall better for FOF. So we have chosen the FOF catalogue as the basic catalogue. The VDM catalogue, by contrast, is only used to determine the GAPs of FOF group members. Since the two sub-catalogues preserve the group structure of the basic FOF catalogue, this set of three group catalogues can be presented as one single big catalogue with the GAP flags to indicate the increasing purity.

TABLE 2

Catalogue statistics for N ≥ 5

catalogue   c1a     p1b     c2/c1   p2/p1   Sgalc   fId
FOF         0.85    0.78    0.92    0.92    0.87    0.19
1WM         0.81    0.82    0.93    0.92    0.81    0.17
2WM         0.77    0.83    0.95    0.92    0.72    0.17

Note. — The precise definitions of the statistical quantities are given in § 3.2.

a One-way completeness

b One-way purity

c Galaxy success rate

d Interloper fraction

3.4. Results on the mocks

In this section, we will summarize our findings and give

a detailed statistical description of the FOF catalogue

with its two sub-catalogues (1WM and 2WM).

The statistics of the merged catalogues in comparison with the reference FOF catalogue are shown in Figure 8 and, for N ≥ 5, in Table 2. The lines exhibit the mean among the 24 mocks and the errorbars their scatter. The FOF basis catalogue has a completeness c1 ≃ 0.85 almost independent of the richness N and a purity p1 ≃ 0.78 only weakly depending on N. Only for N = 2 is there a significant decrease in both completeness and purity. The corresponding statistics for the 1WM and 2WM sub-catalogues have almost identical dependences on N but, as expected, their c1 is lower and their p1 higher.

It can be seen that the gain of the 2WM catalogue

compared to 1WM in terms of both purity and interloper

fraction is much smaller than the deficit in terms of com-

pleteness and galaxy success rate. This indicates that by

keeping only group galaxies with a GAP = 2, many real

group galaxies are removed, but only a relatively small

number of interlopers are eliminated. By contrast, the

gain in purity of the 1WM with respect to the reference

FOF is quite comparable to the associated decrease in

completeness. Thus, while the 1WM catalogue is a use-

ful construction, little is gained by the more restrictive

2WM catalogue. In the remainder of this paper, we will

mainly refer to the FOF and its 1WM sub-catalogue. We

note that not only do the ratios c2/c1 and p2/p1 behave

well as a function of N for the three catalogues, but also

c2/c1 ≃ p2/p1. This means that the contributions of

over-merging and fragmentation are not only small, but

are also well-balanced.

So far, we have considered the statistics averaged over

the whole redshift range, i.e. 0.1 ≲ z ≲ 1. In Figure

9, the completeness (blue line) and the purity (red line)

of the FOF catalogue are shown as functions of redshift

for several richness classes N. The curves are consistent

with a relatively constant completeness and purity with

redshift. Only the highest redshift bins for N ? 4 show

possibly a slight decrease. This emphasizes further the

robustness of our catalogue. Figure 10 shows how the

galaxy success rate Sgal and the interloper fraction fI

behave as a function of the normalized projected distance

from the group centers.

The distance variable r is defined for each group galaxy

[Figure 8: four panels showing completeness, purity, galaxy success rate, and interloper fraction versus ≥ N (2 to 12); legend: FOF multi-run, 1WM, 2WM.]

Fig. 8.— Statistics of the FOF and its two sub-catalogues, 1WM and 2WM, as a function of richness N. For all panels, blue refers to the FOF groups, red to the one-way-matched groups and green to the two-way-matched groups. The errorbars show the scatter among the 24 mocks. The upper left panel exhibits completeness and the upper right panel purity. The solid lines correspond to c1 and p1, respectively, and the dashed lines to c2 and p2, respectively. The lower right panel shows the galaxy success rate Sgal and the lower left panel the interloper fraction fI.

[Figure 9: eight panels for N ≥ 2, 3, 4, 5, 6, 8, 10, 12, showing completeness and purity versus z (0 to 1).]

Fig. 9.— Completeness and purity of the FOF groups as a func-

tion of redshift for 8 different richness classes. In each panel, the

blue solid line corresponds to the mean c1-completeness and the

red solid line to the mean p1-purity, whereas the errorbars exhibit

the scatter among the 24 mocks. The dashed lines are for the cor-

responding 2-way-quantities, respectively. The richness class N is

indicated in each panel.

[Figure 10: upper panels: number of group galaxies versus r; lower panels: galaxy success rate and interloper fraction versus r (1 to 3); legend: real, FOF, 1WM.]

Fig. 10.— Behaviour of the galaxy success rate Sgal and the interloper fraction fI as a function of the normalized projected distance r from the group centers, where r is defined in Equation 22. The left lower panel shows the galaxy success rate Sgal, where the blue line corresponds to the FOF and the red line to the 1WM catalogue. The left upper panel shows the distribution of real group galaxies as a function of separation from the cluster centers. It is clear that at r ≲ 1.5, where most real group galaxies reside, Sgal is ≳ 0.9 for FOF groups and only slightly lower for 1WM groups. The right lower panel exhibits the interloper fraction fI and the right upper panel the distribution of galaxies in reconstructed groups as a function of r, where blue corresponds to FOF and red to 1WM.

as

$$r = \sqrt{\left(\frac{\theta_{\rm ra}}{\Delta\theta_{\rm ra}}\right)^2 + \left(\frac{\theta_{\rm dec}}{\Delta\theta_{\rm dec}}\right)^2}, \tag{22}$$

where θra is its separation from the group center in α, and ∆θra the second moment in α among all members of this group. Similar definitions hold for θdec and ∆θdec.

Only groups with 3 or more members are taken into ac-

count, since for groups with only 2 members r becomes

meaningless.
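Equation (22) can be evaluated per group with a few lines of code. The Python sketch below is a minimal illustration under two assumptions that the text does not spell out: the group center is taken as the mean position of the members, and the "second moment" is taken as the root-mean-square deviation of the members about that center. Since each separation is divided by a spread measured in the same coordinate, a common scale factor such as cos δ cancels and is omitted. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def normalized_distances(ra: Sequence[float], dec: Sequence[float]) -> List[float]:
    """Normalized projected distance r of each member from the group center."""
    n = len(ra)
    if n != len(dec):
        raise ValueError("ra and dec must have the same length")
    if n < 3:
        raise ValueError("r is only meaningful for groups with 3 or more members")
    # Assumption: group center = mean position of the members.
    ra0 = sum(ra) / n
    dec0 = sum(dec) / n
    # Assumption: second moment = rms deviation about the center.
    d_ra = math.sqrt(sum((a - ra0) ** 2 for a in ra) / n)
    d_dec = math.sqrt(sum((d - dec0) ** 2 for d in dec) / n)
    if d_ra == 0.0 or d_dec == 0.0:
        raise ValueError("degenerate group: zero spread in one coordinate")
    return [math.hypot((a - ra0) / d_ra, (d - dec0) / d_dec)
            for a, d in zip(ra, dec)]


# Toy three-member group (made-up coordinates in arbitrary angular units).
r_values = normalized_distances([0.0, 1.0, 2.0], [0.0, 2.0, 1.0])
print([round(r, 3) for r in r_values])
```

With this particular normalization the mean of r² over the members of a group equals 2 by construction, so r ≈ 1.4 marks a typical member and larger values mark outlying ones.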

The left lower panel shows the galaxy success rate Sgal as a function of r from the real group centers. As one would expect, it increases toward the group centers. Fortunately, the group centers are also the region where most of the real group galaxies reside (left upper panel). Note that Sgal can decrease, in principle, in two ways: first, by failing to identify certain real group galaxies in successfully detected real groups, and second, by failing to detect a real group at all. The small deficit in Sgal for r ≲ 1 is due to the second reason, while the first reason becomes more important with increasing r.

In the right lower panel, the interloper fraction fI is plotted as a function of r, where r is now related to the centers of the reconstructed groups. As expected, the interloper fraction shows the opposite behaviour as a function of r. However, the difference in fI between galaxies near and far from the group centers is less strong than for Sgal. For small r, the most important contribution to fI comes from spuriously detected groups with 3 members.

Finally, Figure 6 shows the numbers of reconstructed groups relative to the number of real groups. As was already mentioned, the mean difference between the number of reconstructed FOF groups and the number of real groups exceeds the uncertainty expected from cosmic variance from mock to mock for N ≲ 5, while the groups of the 1WM sub-catalogue are well within this region.

According to the statistics discussed in this paragraph, particularly in Figure 2, it is clear that the FOF group catalogue along with its 1WM sub-catalogue has the potential to be useful for many different applications such as galaxy evolution studies, group statistics, or gravitational lensing. For example, if one aims to study the evolution of galaxies in groups, a high purity and a low interloper fraction are desirable, so the 1WM catalogue is probably appropriate. On the other hand, in order to have a relatively pure sample of field galaxies, galaxies not contained in the basic FOF catalogue should be selected. Generally, whenever small groups, the number of groups, or the purity of the group sample is important, the 1WM catalogue is to be preferred to the FOF catalogue.

3.5. Comparison with DEEP2

For DEEP2, Gerke et al. (2005) optimized their VDM groupfinder in order to obtain the correct number of reconstructed groups $N_{\rm gr}^{\rm rec}(\sigma, z)$ as a function of velocity dispersion σ and redshift. As a result, they present two group catalogues: an “optimal” catalogue and one with maximized purity. Since Gerke et al. (2005) did not treat completeness and purity as a function of richness N, all their statistics correspond to N ≥ 2.

The statistics for their optimal parameter set are c1 = 0.782 ± 0.006, p1 = 0.545 ± 0.005, Sgal = 0.786, and fI = 0.458 ± 0.004. The ratios between the two-way and the one-way-quantities are therefore c2/c1 = 0.919 and p2/p1 = 0.987. So in comparison with our own FOF N ≥ 2 statistics, their completeness c1 and galaxy success rate Sgal are ∼ 3% and ∼ 6% lower, respectively, while their purity p1 is ∼ 17% lower, and their interloper fraction fI ∼ 56% higher.

We conclude that, compared with the DEEP2 “optimal” group catalogue, the performance of our FOF group catalogue is very high. Moreover, it would be very interesting to compare the statistics for the higher richness classes as well. Since Gerke et al. (2005) optimized their catalogue using all groups with N ≥ 2, their catalogue should be optimal regarding the N ≥ 2 statistics. But, in contrast to a multi-run catalogue, this might not be the case for the higher richness statistics, since the N ≥ 2 statistics are actually dominated by 2-member groups, which are by far the most abundant. This suggests that the relative superiority of our FOF catalogue over the DEEP2 catalogue could be even greater for the higher richness classes.

4. THE REAL DATA 10K GROUP CATALOGUE

In this section, the real data 10k group catalogue is pre-

sented. It is given by means of the Tables 3 and 4. Table

3 is a list of all groups along with their properties, and

Table 4 is a list of all group galaxies. The group galax-

ies are associated to their group by means of the unique

group-ID. The galaxy-IDs refer to the 10k catalogue pub-

lished by S. J. Lilly et al. (2009, in preparation).

4.1. Group purity parameter

Since we are presenting the FOF catalogue along with

its two sub-catalogues, defined by the GAP parameter in

the final column, any group property can, in principle,

be calculated for all three catalogues. For instance, it is

possible to assign to each group three observed richnesses

N. To avoid confusion and to keep the discussion simple,

all group properties given in Table 3 correspond to the

basic FOF catalogue. In order to quantify the number

of 1WM galaxies in a certain group, we introduce the

group purity parameter (GRPi) for i = 1,2, defined by

the fraction of FOF members having a galaxy purity pa-

rameter GAP ≥ i. For i = 1 this is the fraction of FOF

members that are also 1WM members, and for i = 2 that

are also 2WM members. Note that if the GRP1 is zero,

then there is no association between the FOF group and

a VDM group.
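The group purity parameter follows directly from the GAP flags of the FOF members. The Python sketch below is a minimal illustration; the function name is hypothetical, and the example input is the list of GAP flags of the ten galaxies with Group-ID 0 in the Table 4 excerpt.

```python
from typing import Sequence


def group_purity(gaps: Sequence[int], i: int) -> float:
    """GRP_i: fraction of FOF members of one group with GAP >= i (i = 1 or 2)."""
    if i not in (1, 2):
        raise ValueError("i must be 1 or 2")
    if not gaps:
        raise ValueError("a group needs at least one member")
    return sum(g >= i for g in gaps) / len(gaps)


# GAP flags of the ten members of group 0 as listed in the Table 4 excerpt.
gaps_group0 = [2, 2, 2, 2, 2, 0, 2, 2, 2, 2]
grp1 = group_purity(gaps_group0, 1)
grp2 = group_purity(gaps_group0, 2)
print(grp1, grp2)
```

Nine of the ten members have GAP ≥ 1, so GRP1 = 0.9, which agrees with the GRP1 entry for this group in Table 3. Since all nine of those members have GAP = 2, GRP2 takes the same value here.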

The statistics of the number of groups and the GRP1

are summarized in Table 5. The basic FOF catalogue

contains 800 groups with N ≥ 2, 102 groups with N ≥ 5,

and 23 groups with N ≥ 8. Over 80% of the groups with

N ≥ 2 have a GRP1 greater than zero, i.e. these groups

have at least one group galaxy that was independently

recovered by both FOF and VDM. For the groups with

N ≥ 5, the number of groups with GRP1 > 0 rises to

95%, and for those with N ≥ 8 it is 96% (22 out of 23).

Figure 7 shows the comparison between the real data

FOF and VDM catalogues.

The mean GRP as a function of richness N is given in Figure 11. The blue solid line shows the mean GRP1 taking into account all groups with richness ≥ N. There is a slight and noisy rise from about 0.8 for N ≥ 2 to 0.9 for N ≥ 9 due to the fact that the fraction of groups with GRP1


TABLE 3
Group catalogue (excerpt)

Group-ID  N^a  <α> (deg)  <δ> (deg)  <z>     σ̂^b (km/s)  Mfudge^c (M⊙)  GRP1^d
0         10   150.0087   2.0287     0.0788  409          8.20e12         0.9
1         7    149.4817   2.5073     0.0919  393          9.58e12         1
2         20   150.3004   2.4489     0.1231  442          6.09e13         0.95
3         6    150.3444   2.1544     0.1222  0            1.13e13         1
4         14   149.8568   1.8151     0.1243  532          1.92e13         0.93
5         6    149.7238   2.399      0.1252  0            1.23e13         1
6         6    150.2824   2.1531     0.1686  69           5.98e12         0.5
7         9    150.078    2.2136     0.1865  242          2.18e13         1
8         12   150.1122   2.3564     0.2208  596          2.71e13         0.92
9         10   150.4526   2.6799     0.2179  768          2.83e13         1
10        6    150.2371   1.9404     0.2188  238          1.13e13         1
11        6    150.2304   2.5608     0.2207  530          2.29e13         0
12        6    150.1636   2.0342     0.2208  355          1.35e13         1
13        8    150.2494   2.6574     0.2672  403          3.85e13         1

Note. — The full table is available in electronic form.
^a Observed richness
^b Velocity dispersion (see § 4.3)
^c Virial mass of the DM halo (see § 4.4)
^d Group purity parameter (GRP1) (see § 4.1)

TABLE 4
Group galaxies (excerpt)

Galaxy-ID  Group-ID  N   α (deg)   δ (deg)  z       GAP^a
818787     0         10  150.0605  2.0067   0.0785  2
818888     0         10  150.0365  2.0249   0.0794  2
818934     0         10  150.0241  1.9687   0.0779  2
818935     0         10  150.0239  2.0727   0.0779  2
818982     0         10  150.0134  2.0296   0.0791  2
819035     0         10  149.9989  1.9858   0.0805  0
819041     0         10  149.9984  2.0351   0.0789  2
819060     0         10  149.9912  1.9912   0.0797  2
819118     0         10  149.9724  2.1054   0.0781  2
819133     0         10  149.9681  2.0673   0.0779  2
842033     1         7   149.4897  2.5164   0.0913  2
842048     1         7   149.4844  2.4991   0.0907  2
842049     1         7   149.4839  2.5211   0.0915  2

Note. — The full table is available in electronic form.
^a Galaxy purity parameter (see § 3.3.2)

TABLE 5
Number of groups

≥ N  Ngr^a  f(GRP1 > 0)^b  <GRP1>
2    800    0.82           0.80
3    286    0.86           0.81
4    150    0.95           0.88
5    102    0.95           0.88
6    59     0.95           0.87
7    36     0.94           0.86
8    23     0.96           0.88
9    17     1.00           0.93

^a Number of groups
^b Fraction of groups with a group purity parameter (GRP1) larger than zero.

equals zero is slightly larger for smaller N. On the other hand, the dashed blue line shows the mean GRP1 taking into account only groups with a nonzero GRP1, i.e. only groups simultaneously found by both groupfinders. For these groups, the GRP1 is slightly decreasing, since for bigger groups it becomes easier for the two groupfinders

[Figure 11: mean GRP (vertical axis, <GRP>) versus richness threshold (horizontal axis, ≥ N); legend: GRP1; GRP1, only groups with GRP1 > 0; GRP2; GRP2, only groups with GRP1 > 0]

Fig. 11.— Mean GRP as a function of observed richness N. The blue solid line shows the GRP1 and the red solid line the GRP2. The dashed lines show the corresponding GRPi, i = 1,2, taking into account only groups with a nonzero GRP1.

to disagree on one or two galaxies in the outskirts of

the group. The red lines in Figure 11 show the same

quantities for GRP2.

4.2. Corrected richness Ncorr

The distribution of FOF groups as a function of redshift for three richness classes N is shown in Figure 12. Comparing the black histograms (groups) with the red dashed lines (all galaxies), it is clear that the number of groups at a given redshift scales with the number of galaxies at the same redshift. This is basically true for all richness classes, although for the richest groups (N ≥ 8) there is a lack of groups at redshifts z ≳ 0.5. In the framework of the hierarchical cold dark matter (CDM) structure formation scenario we expect the cluster mass function to grow with time (for a review see Voit 2005). This growth should be reflected in a decrease of the number of groups of a given richness with redshift.

[Figure 12: three histogram panels of Ngr versus redshift z, for N ≥ 2, N ≥ 5, and N ≥ 8]

Fig. 12.— Number of groups as a function of redshift for different richness classes N. The top panel shows the number of groups Ngr for groups with N ≥ 2, the middle panel for N ≥ 5, and the bottom panel for N ≥ 8. The red dotted line shows the number of galaxies Ngal for the galaxy sample scaled down for comparison with the groups. It is obvious that the distribution of groups follows the distribution of galaxies.

In order to address this question, it is necessary to correct the observed richness of a cluster to produce an intrinsic richness that is redshift-independent. We therefore introduce the corrected richness Ncorr, correcting the observed richness N for spatial sampling rate and redshift success rate, and considering for each group only the number of members brighter than a given absolute magnitude limit Mb,lim(z), i.e., for each group

$$ N_{\rm corr}(M_{b,{\rm lim}}(z)) = \sum_i \frac{1}{C_{\alpha\delta,i}} \, \frac{1}{C_{{\rm rsr},i}} , \tag{23} $$

where the sum is over the members of the group with Mb ≤ Mb,lim(z), and Cαδ,i and Crsr,i are the sampling rate and the redshift success rate, respectively, for the galaxy i. The redshift dependence of the absolute magnitude limit is always taken to be Mb,lim(z) = Mb,lim − z, where the subtraction of the redshift accounts approximately for the luminosity evolution of the galaxies. So Ncorr can simply be characterized by Mb,lim, the absolute magnitude limit at redshift zero. Absolute magnitudes were obtained by means of standard multicolor spectral energy distribution (SED) fitting using an updated version of the ZEBRA code (Feldmann et al. 2006; P. Oesch et al. 2009, in preparation).

[Figure 13: scatter plot with axes Ncorr(−20) and log(Mfudge/M⊙), for groups with N ≥ 2]

Fig. 13.— Correlation between Ncorr(−20) and Mfudge (see § 4.4) for the 10k groups. Shown are all groups having a redshift z < 0.8, so that the sample is volume limited for Mb,lim(z) = −20 − z. The solid line is a linear regression through the points, and the dashed line is the same quantity for the reconstructed groups in the mocks (not shown here). The dotted line exhibits the linear regression for the Ntrue(−20)-M relation for the real groups in the mocks. Taking into account the overestimation of Ncorr of about 50%−100% (see the text), the dotted curve can be reconciled with the solid one.

If we denote the actual number of group members brighter than Mb,lim(z) in a real group in the mocks (without the spatial or redshift sampling rates) by Ntrue(Mb,lim), then we find that, for reconstructed groups exhibiting a two-way-match to real groups, the estimated Ncorr(−20) exhibits a relatively large scatter (±50%) compared with Ntrue(−20) of the corresponding real groups. Furthermore, Ncorr(−20) on average overestimates Ntrue(−20) by about 50−100% depending on N. This is because (1) for most groups we are in the low number regime, (2) the sampling rate in the 10k sample is rather low (so the corrections are big and noisy), (3) groups with no galaxy brighter than Mb,lim(z) cannot be corrected for sampling rate at all, and (4) the reconstructed groups are affected by interlopers. One should therefore be cautious in interpreting Ncorr as the actual richness of the groups. Nevertheless, Ncorr(−20) shows a relatively tight correlation with the estimated halo mass Mfudge (see § 4.4), even for N ≥ 2, as is shown in Figure 13, and thus is still a useful quantity. The analysis of the redshift distribution for groups with a given Ncorr(−20) is performed in § 5.2.
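The corrected richness of equation (23) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout, and the member values in the example are hypothetical, and the magnitude limit is evaluated at the group redshift.

```python
def corrected_richness(members, m_lim0, z_group):
    """Sketch of equation (23) for a single group.

    members: list of (M_b, C_alphadelta, C_rsr) tuples, i.e. absolute
             magnitude, spatial sampling rate and redshift success rate
             of each member (both rates in (0, 1]).
    m_lim0:  absolute magnitude limit at redshift zero, e.g. -20.
    z_group: redshift at which the limit is evaluated (assumed here to
             be the group redshift).
    """
    m_lim = m_lim0 - z_group  # M_b,lim(z) = M_b,lim - z
    n_corr = 0.0
    for m_b, c_ad, c_rsr in members:
        if c_ad <= 0.0 or c_rsr <= 0.0:
            raise ValueError("sampling and success rates must be positive")
        if m_b <= m_lim:  # keep only members brighter than the limit
            n_corr += 1.0 / (c_ad * c_rsr)
    return n_corr


# Hypothetical three-member group at z = 0.5 (values are made up).
example_members = [(-21.3, 0.5, 0.9), (-20.1, 0.4, 0.8), (-20.9, 0.5, 1.0)]
n_corr = corrected_richness(example_members, m_lim0=-20.0, z_group=0.5)
print(n_corr)
```

In this made-up example the limit is −20.5 at z = 0.5, so the member with Mb = −20.1 is excluded and the two remaining members are each weighted by the inverse of their combined sampling and success rate. A group with no member brighter than the limit returns zero, which mirrors point (3) above.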

4.3. Velocity dispersion estimation

The corrected richness Ncorr discussed in the last section is probably the simplest and most straightforward characterization of a group. However, there are other characterizations of groups which may be more directly useful from a physical point of view, such as velocity dispersion σ or dynamical mass M. Since most of our groups have richness N ≲ 10, we are in a low number regime, where the estimation of both velocity dispersion and mass is non-trivial.

According to Beers, Flynn & Gebhardt (1990) the


best estimators for velocity dispersion in groups with few members are the gapper estimator and the simple standard deviation. On the other hand, the biweight estimator seems to work very well over a large range of richness classes N, except for N ≲ 20 where its performance is lower but still sufficient. For comparison, we have implemented all three estimators. None of them is significantly superior to the others when applied to the mocks, so we adopt the gapper estimator since it is the most commonly used among the three.

The implementation for a group with N members is as follows. First, for each group member i we compute the redshift difference dzi with respect to the mean group redshift zgr. These redshift differences are then converted into velocities by

$$ dv_i = \frac{c \, dz_i}{1 + z_{\rm gr}} \tag{24} $$

with c the speed of light. After sorting the velocities dvi in ascending order, the gapper estimate is given by

$$ \sigma_{\rm gap} = \frac{\sqrt{\pi}}{N(N-1)} \sum_{i=1}^{N-1} w_i g_i , \tag{25} $$

where the weights wi and the gaps gi are defined by

$$ w_i = i(N-i) \tag{26} $$

$$ g_i = dv_{i+1} - dv_i \tag{27} $$

for i = 1,...,N − 1. However, in order to have a realistic estimate of the velocity dispersion of a group, we have to correct σgap for our redshift uncertainty σv of roughly 100 km s−1. This is done by

$$ \hat\sigma = \sqrt{3} \, \sqrt{\sigma_{\rm gap}^2 - \sigma_v^2} , \tag{28} $$

where σ̂ is the final estimate of the velocity dispersion σ. The factor √3 converts the line-of-sight velocity dispersion to the 3D velocity dispersion. If σv is larger than σgap we formally set σ̂ to zero.

Since the COSMOS lightcones (Kitzbichler & White 2007) provide only the “virial velocity”^25 vvir of the DM halos and not directly the “velocity dispersion” σ, we cannot precisely estimate the uncertainty of the estimated σ̂ for a group. But comparing σ̂ to vvir should provide an upper limit to the uncertainty. To take into account the influence of interlopers on σ̂, we considered the estimated velocity dispersion of reconstructed groups exhibiting a two-way-match with real groups. (Wrongly detected groups do not exhibit a meaningful velocity dispersion.)

We find that for N ≥ 5 the ratio between the median virial velocity vvir and σ̂ remains roughly constant for σ̂ ≳ 350 km s−1, and exhibits an error of about 25% (upper and lower quartile) (Figure 14). Note that the estimated σ̂ do not need to fall exactly on the 45° line, since σ and vvir are not exactly the same quantities. For σ̂ ≲ 350 km s−1 the estimated velocity dispersion σ̂ is biased to lower values due to the subtraction in equation

^25 In the COSMOS lightcones, the virial velocity is simply defined by $v_{\rm vir} = \sqrt{G M_{200}/r_{200}}$, where G is the gravitational constant, and M200 and r200 are the virial mass and the virial radius, respectively, related by $M_{200} = \frac{4}{3} \pi r_{200}^3 \, 200 \, \rho_c(z)$ with ρc(z) the critical density of the universe at the redshift of the halo.

[Figure 14: vvir (km/s, vertical axis) versus σ (km/s, horizontal axis), both from 0 to 1400, for groups with N ≥ 5]

Fig. 14.— Correlation between the estimated velocity dispersions σ̂ of groups with N ≥ 5 and the virial velocities vvir of the DM halos. Each point displays a reconstructed group exhibiting a two-way-match to a real group whose DM halo yields vvir. It is obvious that for σ̂ ≲ 350 km s−1 the estimated velocity dispersion underestimates the virial velocity.

(28). On the other hand, for N < 5, the correlation between σ̂ and vvir is very weak, so that the σ̂ for these richness classes contains almost no information. Hence, we have decided to assign no estimated velocity dispersion to groups with N < 5 in Table 3.

Note that applying the velocity dispersion estimation

to the real groups instead of the reconstructed groups

does not significantly alter these results. Even estimat-

ing the velocity dispersion for the real groups in the 10k

mocks taking into account all galaxies down to r ≤ 26

still yields a scatter of about 10 to 15%.
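The velocity dispersion estimate of equations (24) to (28) can be sketched as follows. This is a minimal Python illustration, not the code used for the catalogue: the function name and default argument are hypothetical, the weights are the standard gapper weights w_i = i(N − i) of Beers, Flynn & Gebhardt (1990), and σv is fixed at 100 km/s.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def gapper_sigma(redshifts, sigma_v=100.0):
    """Gapper velocity dispersion corrected for redshift uncertainty.

    redshifts: member redshifts of one group (at least two members).
    sigma_v:   redshift uncertainty in km/s (roughly 100 km/s here).
    Returns the 3D estimate in km/s, or 0.0 if sigma_gap <= sigma_v.
    """
    n = len(redshifts)
    if n < 2:
        raise ValueError("need at least two members")
    z_gr = sum(redshifts) / n  # mean group redshift
    # Eq. (24): velocities relative to the group, sorted in ascending order.
    dv = sorted(C_KMS * (z - z_gr) / (1.0 + z_gr) for z in redshifts)
    # Eqs. (25)-(27): weighted sum of the gaps between neighbouring velocities.
    weighted_gaps = sum(i * (n - i) * (dv[i] - dv[i - 1]) for i in range(1, n))
    sigma_gap = math.sqrt(math.pi) / (n * (n - 1)) * weighted_gaps
    # Eq. (28): subtract the redshift uncertainty in quadrature, then
    # convert from the line-of-sight to the 3D dispersion.
    if sigma_gap <= sigma_v:
        return 0.0
    return math.sqrt(3.0) * math.sqrt(sigma_gap**2 - sigma_v**2)


# Member redshifts of group 0 as listed in Table 4.
z_group0 = [0.0785, 0.0794, 0.0779, 0.0779, 0.0791,
            0.0805, 0.0789, 0.0797, 0.0781, 0.0779]
sigma_group0 = gapper_sigma(z_group0)
print(round(sigma_group0))
```

Applied to the ten rounded member redshifts of group 0 from Table 4, the sketch gives about 409 km/s, in line with the value listed for this group in Table 3.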

4.4. Estimation of dynamical mass

Estimating the dynamical mass of the underlying dark

matter halo of a group is even more difficult than esti-

mating the velocity dispersion. The simplest method for

the estimation of dynamical mass is by using some form

of the virial theorem. The standard relation is (e.g. Eke

et al. 2004)

$$ \hat{M} = A \, \frac{\hat\sigma^2 r_\perp}{G} , \tag{29} $$

where A is a constant depending on the mass distribution of the halo (e.g. geometry, concentration, etc.), σ̂ the estimated velocity dispersion, and r⊥ some estimate of its projected radius. Heisler, Tremaine & Bahcall (1985) discuss four simple mass estimators, each being only a function of the projected distances and radial velocities of the group galaxies with respect to the group center. When applied to the reconstructed groups in the mocks, none of them works substantially better than the simple relation in equation (29) and all show a similar behaviour, so we consider only the standard virial theorem.
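As a rough numerical illustration of equation (29), the following Python sketch evaluates the virial relation in convenient units. The function name is hypothetical, and the constant A is left as a free parameter with a placeholder default of 1, since its calibrated value is not given here.

```python
# Gravitational constant in Mpc (km/s)^2 / M_sun.
G_MPC = 4.30091e-9


def virial_mass(sigma_hat, r_perp, a=1.0):
    """Evaluate equation (29), M = A * sigma^2 * r_perp / G.

    sigma_hat: estimated velocity dispersion in km/s.
    r_perp:    estimate of the projected radius in Mpc.
    a:         dimensionless calibration constant A (placeholder value;
               it has to be calibrated, e.g. with mock catalogues).
    Returns the mass in solar masses.
    """
    if sigma_hat < 0.0 or r_perp < 0.0:
        raise ValueError("sigma_hat and r_perp must be non-negative")
    return a * sigma_hat**2 * r_perp / G_MPC


# Illustrative input values, not taken from the catalogue.
m_example = virial_mass(sigma_hat=400.0, r_perp=0.5)
print(f"{m_example:.2e}")
```

With A = 1, a dispersion of 400 km/s and a projected radius of 0.5 Mpc correspond to a mass of order 10^13 M⊙. Because the mass scales with σ̂², any error on σ̂ enters the mass estimate at roughly twice its relative size.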

To use the estimator in equation (29), the constant of proportionality A needs to be calibrated properly. Doing this with the mocks and using an appropriate estimate of the projected radius, we find a behaviour for the estimated masses similar to that for the velocity dispersion. For N ≲ 5 there is only a very weak correlation between the estimated mass and the actual mass of the underlying