Using Neural Networks to Automate the Identification of Brightest Cluster Galaxies in Large Surveys

Patrick Janulewicz^1,2,3,4, Tracy M. A. Webb^1,2, and Laurence Perreault-Levasseur^2,3,4,5,6,7

1 Department of Physics, McGill University, Montréal, QC, Canada
2 Trottier Space Institute, McGill University, Montréal, QC, Canada
3 Ciela Institute, Montréal, QC, Canada
4 Mila—Quebec Artificial Intelligence Institute, Montréal, QC, Canada
5 Department of Physics, Université de Montréal, Montréal, QC, Canada
6 Center for Computational Astrophysics, Flatiron Institute, NY 10010, USA
7 Perimeter Institute for Theoretical Physics, Waterloo, ON, Canada

Received 2024 September 13; revised 2025 January 29; accepted 2025 January 30; published 2025 March 4
Abstract

Brightest cluster galaxies (BCGs) lie deep within the largest gravitationally bound structures in existence. Though some cluster finding techniques identify the position of the BCG and use it as the cluster center, other techniques may not automatically include these coordinates. This can make studying BCGs in such surveys difficult, forcing researchers to either adopt oversimplified algorithms or perform cumbersome visual identification. For large surveys, there is a need for a fast and reliable way of obtaining BCG coordinates. We propose machine learning to accomplish this task and train a neural network to identify positions of candidate BCGs given no more information than multiband photometric images. We use both mock observations from THE THREE HUNDRED project and real ones from the Sloan Digital Sky Survey, and we quantify the performance. Training on simulations yields a squared correlation coefficient, R², between predictions and ground truth of R² ≈ 0.94 when testing on simulations, which decreases to R² ≈ 0.60 when testing on real data owing to discrepancies between data sets. Limiting the application of this method to real clusters more representative of the training data, such as those with a BCG r-band magnitude r_BCG ≲ 16.5, yields R² ≈ 0.99. The method performs well up to a redshift of at least z ≈ 0.6. We find this technique to be a promising method to automate and accelerate the identification of BCGs in large data sets.

Unified Astronomy Thesaurus concepts: Brightest cluster galaxies (181); Hydrodynamical simulations (767); Neural networks (1933); Galaxy clusters (584)
1. Introduction
Galaxy clusters are the largest gravitationally bound objects
in the Universe. At the heart of these clusters lies the brightest
cluster galaxy (BCG). These BCGs are among the most
massive and luminous galaxies in the Universe today. They
also play a pivotal role in our understanding of galaxy
evolution. The BCG is often considered to be the optical center
of the galaxy cluster, lying at the base of the cluster’s
gravitational potential well (F. C. van den Bosch et al. 2005).
Some optical cluster finding techniques naturally identify the
most likely BCG candidate and set the center of the cluster to
be at that point. A well-known example of this is redMaPPer
(E. S. Rykoff et al. 2014), which assigns up to five candidates
ranked by probability. Another example is the method seen in
Z. L. Wen et al. (2012, hereafter WHL12), which selects the
brightest galaxy within a certain distance from the cluster
center.
However, not all cluster finding techniques use the BCG as a
center. X-ray surveys (A. Vikhlinin et al. 1998; A. Liu et al.
2022), for example, select the X-ray emission peak instead.
Another popular method is to use fluctuations in the cosmic
microwave background caused by the Sunyaev–Zeldovich (SZ)
effect as tracers of galaxy clusters (L. E. Bleem et al. 2015;
Planck Collaboration et al. 2016). In this case, the center is
selected to be the peak of the hot gas distribution. The BCG,
however, has been shown to often have an offset with the
center found in X-ray or SZ surveys, particularly as redshift
increases. These offsets are also influenced by the dynamical
state of the cluster, as more disturbed clusters often host BCGs
that are less centrally located (G. Gozaliasl et al. 2019; R. De
Propris et al. 2021). Similar results are also suggested by
cosmological simulations (W. Cui et al. 2016). Even in optical
surveys, it is not guaranteed that the BCG is automatically
identified. For instance, some algorithms select clusters by
searching for overdensities of galaxies in redshift or color
space. These include the Spitzer Adaptation of the Red-
sequence Cluster Survey (A. Muzzin et al. 2009; G. Wilson
et al. 2009) and the Massive and Distant Clusters of WISE
Survey (A. H. Gonzalez et al. 2019). Because these techniques
focus on overdensities rather than individual galaxies, the
coordinates of the BCG centroid are not necessarily provided.
With recent and upcoming projects such as Euclid (R. Lau-
reijs et al. 2011) and the Vera C. Rubin Observatory (Ž. Ivezić
et al. 2019) expected to greatly increase the quantity of
available wide-field data, having a quick and accurate method
of identifying the BCG from images will become crucial. Such
large amounts of data will make manual identification
infeasible, while overly simplistic algorithms may fail given
the complexity of the task. As a result, there is a need to better
automate this process.
To study a population of BCGs, researchers may opt to
construct a catalog containing their positions and properties
(A. Chu et al. 2021; M. Kluge & R. Bender 2023; J. C. Smith
et al. 2023). BCG identification may also be done in
simulations (C. Roche et al. 2024). A few approaches to
automate this task have been explored. For instance, work from
T. Somboonpanyakul et al. (2022) utilizes the probability
distribution of redshift and stellar mass for objects near the SZ
cluster center to assign BCG likelihoods. On the other hand,
methods from A. Chu et al. (2022) involve careful treatment of
foreground and background objects before selecting the most
likely remaining BCG candidate.
In this work, we suggest an automated approach that is
capable of appropriately weighing features such as optical
color, morphology, and size, requiring little more than
photometric images. Such requests lend themselves well to
the use of machine learning.
We therefore test the abilities of neural networks to identify
the BCG centroid from galaxy cluster images. We train a neural
network on simulated cluster images from THE THREE
HUNDRED project (W. Cui et al. 2018) and test its performance
on real cluster images from the Sloan Digital Sky Survey
(SDSS; D. G. York et al. 2000; H. Aihara et al. 2011). We also
train the neural network on real galaxy cluster images and
compare the results. Though we test the network on galaxy
clusters up to redshift z=0.8 later in the paper, we introduce
this work as a proof of concept for a relatively narrow range of
low-redshift observations (z=0.15–0.25). By doing so, we
hope to minimize evolutionary effects and limitations caused
by resolution or depth, thus reducing systematic uncertainties.
This method is beneficial for several reasons. For one, it
works directly on images, circumventing the need to obtain
galaxy catalogs, remove foreground and background objects, or
perform other preparatory steps. Moreover, it does not require
redshift information for each galaxy, but rather an approximate
redshift of the cluster. This can be particularly helpful for
surveys with limited optical follow-up or high photometric
redshift uncertainties. It is also physics-agnostic by nature,
which allows it to function without the constraints of
predefined models of galaxy behavior. Instead, this method
can adapt to any range of cluster environment that is
represented in the training data set.
The option to train on simulations also provides a great deal
of advantages. For instance, simulations allow us to be certain
about our choice of BCG. In observations, there is always a
chance that the galaxy selected as the BCG is incorrectly
identified. This is especially useful for generalizing the task to
higher redshift, where the true BCG may be difficult to discern
using real observations. In particular, the BCG has been
predicted to lose its unique identity around z≈0.7 (G. De
Lucia & J. Blaizot 2007), as it becomes less certain that the
brightest galaxy at this time will become the main progenitor of
the redshift zero BCG. Using simulated data also gives access
to a variety of characteristics not necessarily known to the
observer, allowing for the possibility of further inference.
While we recognize that training on mock observations may
introduce biases when the simulations stray from reality, we
believe that their various benefits make them particularly
attractive.
The paper is laid out as follows: In Section 2, we describe
the data set, which contains both simulated observations and
real ones. In Section 3, we describe the methods used to
process the data, train the machine learning model, and assess
its accuracy. In Section 4,wepresenttheresults,which
include studying the differences between simulated and real
observations, training and testing the model, and performing
relevant tests for those wishing to apply these methods to
other data sets. In Section 5, we summarize the contents of the
paper and discuss the application of this technique to large
surveys.
2. Data
2.1. The Three Hundred Data
Simulated clusters are obtained from THE THREE HUNDRED
project cosmological simulation, which consists of 324 galaxy
clusters modeled with different full-physics hydrodynamical
resimulations and semianalytical models. We use clusters
obtained from the GIZMO-SIMBA run, as this has been shown to
produce more realistic BCGs compared to other codes from the
same simulation (W. Cui et al. 2022). We select a redshift
range between z=0.15 and z=0.25. The full data set of
simulated clusters contains four snapshots within this range, all
of which are included in our data set.
Simulated cluster images are then generated using the
Python package PyMGal (https://pypi.org/project/pymgal/),
which calculates magnitudes using
techniques from EzGal (C. L. Mancone & A. H. Gonzalez 2012) and then projects them to create mock observations.
When using the software, we assume the simple stellar
population model described in G. Bruzual & S. Charlot
(2003) with a Chabrier initial mass function (G. Chabrier
2003), and we select the Gaussian smoothing length of a given
particle to be the distance to its 30th-nearest neighbor.
For each snapshot, 10 random projection angles are chosen,
giving 40 different projections for each of the 324 clusters. This
makes for a total of 12,960 projections. For each one, we
generate images in all five of the SDSS bands. These bands are
ultraviolet (u), green (g), red (r), near-infrared (i), and infrared
(z). Each projection is thus made up of a five-channel
256 × 256 image. We select the side lengths to represent a
physical distance of roughly 1.33 Mpc. A sample can be found
in the first row of Figure 1.
2.2. SDSS Data
The data obtained from SDSS can be split into two separate
parts: noise images and cluster images.
We begin with the noise. To make the simulated clusters
similar in appearance to observed ones, we must add realism to
them. To do this, we superimpose randomly selected images of
the sky obtained from SDSS. Coordinates are chosen randomly
from the area of the Baryon Oscillation Spectroscopic Survey
(S. Alam et al. 2017). We then cut out an image with side
length equal to approximately 1.33 Mpc to match the scale of
simulated images. This side length is calculated by producing a
random redshift between z=0.15 and z=0.25. For a given
redshift, the length is converted to a physical distance by
assuming a cosmology with H_0 = 67.8 km s^-1 Mpc^-1 and Ω_m,0 = 0.307. The
number of unique noise samples collected is equal to the
number of simulated observations, meaning that no noise
sample is repeated.
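The angular size of each cutout follows from this cosmology and the randomly drawn redshift. A short astropy sketch of the calculation (the function name is illustrative, not the authors' code):

import numpy as np
import astropy.units as u
from astropy.cosmology import FlatLambdaCDM

cosmo = FlatLambdaCDM(H0=67.8, Om0=0.307)  # cosmology stated above

def cutout_side_arcsec(side_mpc=1.33, z_min=0.15, z_max=0.25, rng=None):
    """Angular side length of a cutout of fixed physical size at a random z."""
    rng = rng or np.random.default_rng()
    z = rng.uniform(z_min, z_max)              # random redshift in the range
    scale = cosmo.arcsec_per_kpc_proper(z)     # arcsec per proper kpc at z
    return z, (scale * side_mpc * 1000 * u.kpc).to(u.arcsec)

z, theta = cutout_side_arcsec()
print(f"z = {z:.3f}: cutout side = {theta:.1f}")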
Though the mock observations each have a side length of
256 pixels, this is not necessarily true for SDSS data. We must
therefore resize SDSS images to match the dimensions of the
simulations. We use the rebinning algorithm described in the
RealSim Python package (C. Bottrell et al. 2019), which resizes
images to a charge-coupled device angular scale. This method
ensures that the total flux is conserved and maximizes the
fidelity of the mock observations. Clean simulated images and
SDSS noise are then added together in FITS format as shown in
Figure 1. They are then converted to PNG format using
logarithmic scaling.
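As a rough stand-in for these two steps, the sketch below resizes an image and renormalizes it so that total flux is conserved, then applies logarithmic scaling; RealSim's actual CCD-scale rebinning is more careful, so this is only an assumed approximation:

import numpy as np
from scipy.ndimage import zoom

def rebin_conserve_flux(img, out_size=256):
    """Resample a 2D image to roughly out_size x out_size, preserving flux."""
    out = zoom(img, out_size / img.shape[0], order=1)  # bilinear resample
    out *= img.sum() / out.sum()                       # renormalize total flux
    return out

def to_log_png(img, eps=1e-12):
    """Logarithmic scaling to 8 bits, mimicking the FITS-to-PNG step."""
    scaled = np.log10(img - img.min() + 1.0)
    scaled = 255 * (scaled - scaled.min()) / (np.ptp(scaled) + eps)
    return scaled.astype(np.uint8)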
Images of real galaxy clusters are also obtained from SDSS
using the WHL12 catalog. The catalog contains the name,
celestial coordinates, and photometric redshift of each galaxy
cluster. We use this photometric redshift to select all clusters
between z=0.15 and z=0.25. We then cut out regions with a
side length of 1.33 Mpc and select all images that fit within the
boundaries of their SDSS field. We find a total of 6962 real
cluster images satisfying these criteria. Images are then
processed with the same techniques used for noise images,
which include the same rebinning algorithm and pixel
dimensions. An example of a real galaxy cluster image is
shown at the bottom of Figure 1.
The WHL12 catalog also contains data describing the BCG brightness and the richness of the cluster. The brightness is defined as the Petrosian magnitude of the BCG in the r band, which we will refer to as r_BCG. The catalog defines a cluster richness R_L* = L_200/L*, where L_200 is the total r-band luminosity within R_200 and L* is the evolved characteristic luminosity of galaxies in the r band. Also included is a scaling relation between R_L* and the mass M_200. This provides us with an approximate relationship between richness and mass, which is shown in the following equation:

$\log M_{200} = (-1.49 \pm 0.05) + (1.17 \pm 0.03)\log R_{L*}. \quad (1)$
These properties can then be used to isolate subsets of BCGs
that align more closely with the simulated data set. We explore
these relationships in greater detail in Section 4.
3. Methods
3.1. Machine Learning Details
Because both the real and simulated clusters are centered on
their BCG, we must modify them to ensure that the model is
capable of generalizing to instances where the BCG may be
offset, as this will be the case for many surveys. However, we
want to instill in the model a tendency to prefer central
galaxies, as the BCG in real cluster images should still be near
the cluster center defined by the given technique. To do this, we
offset the images using a Gaussian perturbation from the center.
We randomly draw x and y values from a normal distribution
and offset the images by the result. The distribution has a mean
of 0 kpc and a standard deviation of 150 kpc. This roughly
covers the distribution of offsets found in other works, such as
the offset between the BCG and the X-ray peak (R. Seppi et al.
2023). To ensure that the smaller image does not exceed the
boundaries of the larger one, we truncate the Gaussian at the
maximum possible offset. We cut out a region of 1 Mpc from
the larger 1.33 Mpc image, meaning that this maximum offset
is half the excess space, or approximately 167 kpc. The
resulting images have pixel dimensions of 192 × 192.
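The off-centering step might look like the following sketch, where kpc_per_pixel (roughly 5.2 kpc per pixel for a 1.33 Mpc, 256 pixel image) and the function name are illustrative:

import numpy as np

def off_center_crop(img, kpc_per_pixel, crop=192, sigma_kpc=150.0, rng=None):
    rng = rng or np.random.default_rng()
    max_pix = (img.shape[0] - crop) // 2       # ~32 px, i.e. ~167 kpc
    # Gaussian offsets (sigma = 150 kpc), truncated at the maximum shift
    # that keeps the crop inside the image.
    dx, dy = np.clip(rng.normal(0.0, sigma_kpc, 2) / kpc_per_pixel,
                     -max_pix, max_pix).round().astype(int)
    cy, cx = img.shape[0] // 2 + dy, img.shape[1] // 2 + dx
    half = crop // 2
    cutout = img[cy - half:cy + half, cx - half:cx + half]
    # The BCG, originally at the image center, is now offset in the cutout.
    bcg_xy = (img.shape[1] // 2 - (cx - half), img.shape[0] // 2 - (cy - half))
    return cutout, bcg_xy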
The images are then divided into training, validation, and
test data sets. For real observations, we use 80% of images for
training, 10% for validation, and 10% for testing. For the
simulations, we split by cluster number to ensure that no
simulated galaxy cluster appears in two data sets. These
numbers are drawn at random and assigned to a data set. In
other words, we select 80% of the 324 clusters for training,
10% for validation, and 10% for testing. A breakdown of each
data set can be found in Table 1.
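A sketch of this cluster-wise split; the helper is illustrative rather than the authors' code:

import numpy as np

def split_by_cluster(cluster_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/val/test so none appears in two sets."""
    ids = np.unique(cluster_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

# Every projection of a given cluster then inherits that cluster's set.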
We then pass the off-centered five-channel training images
to the neural network. We select a ResNet18 architecture
(K. He et al. 2015) and pass the x and y pixel values of the BCG
centroid as the regression targets. We use a rectified linear unit
(ReLU) activation function (V. Nair & G. Hinton 2010) and an Adam optimizer (D. P. Kingma & J. Ba 2017). We select the mean squared error (MSE) as the loss function, which can be expressed mathematically as

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\theta_i - \hat{\theta}_i)^2$

for a number of samples $n$, a truth vector $\theta$, and a prediction vector $\hat{\theta}$. Further details regarding the model architecture and hyperparameters can be found in Table 2.

Figure 1. The five columns on the left represent photometric bands from SDSS. Simulated galaxy clusters are shown in the first row, and random SDSS noise is shown in the second row. The combination of the two is shown in the third row, and a real WHL12 cluster is shown in the final row for comparison. In the rightmost column, we give an example of a cluster being off-centered by illustrating the process on the simulated r band.
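A minimal PyTorch version of the setup just described might look like the following sketch; only the architecture, loss, optimizer, and hyperparameters come from the text and Table 2, while the data loader and training loop are assumed:

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None)
# Accept five photometric bands instead of three RGB channels.
model.conv1 = nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 2)   # predict (x, y) in pixels

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_epoch(loader, device="cpu"):
    model.train().to(device)
    for images, xy in loader:                   # xy: true BCG pixel centroid
        optimizer.zero_grad()
        pred = model(images.to(device))
        loss = loss_fn(pred, xy.to(device))
        loss.backward()
        optimizer.step()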
We also use real cluster images to train different networks
with the same architecture. We compare one model trained
only on simulations, one trained only on real cluster images,
and one trained on both via transfer learning. For the transfer
learning model, we partially train on simulated images and use
the result to initialize the network’s weights. We then decrease
the learning rate by one order of magnitude and fine-tune on
real images. This allows the model to gain information on both
simulated and real observations. More details on these tests can
be found in Section 4.
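Continuing the sketch above, the transfer-learning variant could reuse the simulation-trained weights with the reduced learning rate from Table 2 (the checkpoint file name and real_loader are illustrative):

# Assumes `model` and `train_epoch` from the previous sketch.
import torch

model.load_state_dict(torch.load("resnet18_simulations.pt"))  # illustrative file
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # 0.001 -> 0.0001

for epoch in range(100):        # fine-tune for 100 epochs, as in the text
    train_epoch(real_loader)    # real_loader: assumed SDSS training loader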
3.2. Assessing Accuracy
We now define the statistics used to quantify the model’s
accuracy. The first statistic is the coefficient of determination,
denoted R². To assess the quality of the predictions, we can plot
the values predicted by the model as a function of the ground
truth. A strong correlation indicates that predictions rarely
deviate from the truth, meaning that a perfectly accurate model
would produce R² = 1. A mathematical description of these
statistics is shown below, where we denote predicted values
with a hat operator, averages with an overline, and true values
without an operator symbol. We take n to be the number of
images in the relevant test data set:
$R_x^2 = 1 - \frac{\sum_{i=1}^{n}(x_i - \hat{x}_i)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad (2)$

$R_y^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \quad (3)$
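In code, Equations (2) and (3) are the standard coefficient of determination applied to each axis separately; a NumPy sketch with illustrative names:

import numpy as np

def r_squared(true, pred):
    true, pred = np.asarray(true), np.asarray(pred)
    ss_res = np.sum((true - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((true - true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# R2_x = r_squared(x_true, x_pred); R2_y = r_squared(y_true, y_pred)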
Another way of quantifying success is by measuring the
proportion of predictions that are correct within some reason-
able error. In this work, we consider a prediction to be correct if
it lies within some distance of the true position and incorrect
otherwise. Such a threshold should be large enough to
encompass the entire BCG but small enough to avoid false
positives. Unless otherwise specified, we will use a threshold of
25 kpc, as this is a conservative estimate for the region of the
cluster dominated by the BCG (S. Brough et al. 2024). We define the proportion of predictions that are correct within the distance threshold and refer to it as the accuracy A_T. This accuracy can be more formally defined by the following equation:
$A_T = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_T(d_i) \quad (4)$

Note that $d_i$ is equal to the Euclidean distance $\sqrt{(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2}$ between the true and predicted positions. We use $\mathbb{1}_T(d_i)$ to denote the indicator function, which is equal to 1 if the error is less than the threshold and 0 otherwise.
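Equation (4) likewise takes only a few lines; kpc_per_pixel is an assumed pixel-to-physical conversion:

import numpy as np

def threshold_accuracy(xy_true, xy_pred, kpc_per_pixel, threshold_kpc=25.0):
    """Fraction of predictions whose Euclidean error is below the threshold."""
    d = np.linalg.norm(np.asarray(xy_true) - np.asarray(xy_pred), axis=1)
    return np.mean(d * kpc_per_pixel < threshold_kpc)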
4. Results
4.1. Quantifying Differences between Data Sets
Before proceeding to train and test the model, it is crucial to
understand potential differences between the real and simulated
clusters. To begin, we consider two tests to summarize these
differences, which can be done directly on FITS images before
they are offset and converted to PNG. To perform these tests,
we randomly select 5000 images from each data set, thus
ensuring a large sample size that is consistent across the two
groups.
The first test is to study the one-dimensional probability
density function (1D PDF) of the flux values. This PDF can be
obtained by binning the pixels of each image in a given band to
a histogram and then averaging the resulting histograms over
all images. Because this is repeated over a large number of
experiments, the result should demonstrate the typical bright-
ness distribution of a galaxy cluster image. We obtain a PDF in
each band for both simulated and real images, and we show the
result in the first row of Figure 2.
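The averaged flux PDF might be computed as in this sketch (bin edges and names are illustrative):

import numpy as np

def mean_flux_pdf(images, bins):
    """images: iterable of 2D flux arrays in one band; bins: shared bin edges."""
    hists = [np.histogram(img.ravel(), bins=bins, density=True)[0]
             for img in images]
    return np.mean(hists, axis=0)   # average PDF across the image sample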
The second test is to compute the power spectrum of the
images. The power spectrum indicates the typical variability
between two pixel values at a given length scale. The power,
which we denote P(k), quantifies the amount of variability
between samples for a given wavenumber k. This wavenum-
ber is a spatial frequency representing how often a wave
pattern repeats per unit distance. In this case, the largest
wavenumber represents the smallest distance, which corre-
sponds to 1 pixel or approximately 5 kpc. The smallest
wavenumber represents the largest distance, which corre-
sponds to half the image dimension or approximately 667 kpc.
For each of the 5000 images, we compute a power spectrum in each of the five bands.
Table 1
The Number of Images across Each Data Set

Data Set                 Train    Validation    Test    Total
Primary data sets:
  Simulation             10400    1280          1280    12960
  Real                   5569     696           697     6962
Comparison data sets:
  Simulation             5000     ...           ...     ...
  Real                   5000     ...           ...     ...
  Real (full redshift)   5000     ...           5000    ...

Note. The simulated and real comparison data sets are subsets of the corresponding primary data sets. All data sets cover the redshift range z = 0.15 to z = 0.25, except the bottom row, which covers the full redshift range of z = 0.05 to z = 0.8.
Table 2
Details and Hyperparameters of the Model Used to Determine the BCG Position

Parameter               Value
Main model:
  Architecture          ResNet18
  Learning rate         0.001
  Batch size            16
  Optimizer             Adam
  Activation function   ReLU
  Loss function         MSE
Transfer learning:
  Learning rate         0.0001
We then average the resulting power spectra across all images and report the result. The results are shown in the second row of Figure 2.
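A generic FFT-based sketch of the azimuthally averaged power spectrum (not necessarily the authors' exact implementation):

import numpy as np

def power_spectrum_1d(img):
    f = np.fft.fftshift(np.fft.fft2(img))
    power2d = np.abs(f) ** 2
    # Radial wavenumber of each Fourier pixel, in cycles per image side.
    ny, nx = img.shape
    ky, kx = np.indices((ny, nx))
    k = np.hypot(kx - nx // 2, ky - ny // 2).astype(int)
    # Azimuthal average: mean power in each integer-k annulus.
    p_k = np.bincount(k.ravel(), weights=power2d.ravel()) / np.bincount(k.ravel())
    return p_k[1:ny // 2]           # drop k = 0 and modes beyond Nyquist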
The flux PDFs demonstrate a tendency for the simulated
cluster images to be slightly brighter than the real ones. This is
repeated across all SDSS bands. This may indicate that the
population of simulated clusters is more luminous than the
population of real ones.
The power spectra show that the simulated cluster images
exhibit slightly higher variability overall. This is particularly true
for smaller wavenumbers, meaning that large distances vary
more in simulated images than in real ones. This may also
suggest that the simulations show brighter centers, as the larger
distances compare BCGs with the outskirts of the galaxy
cluster. We will take these differences into consideration in
Section 4.3 by isolating subsets of clusters with more
prominent BCGs.
Another key difference is the way in which mass and redshift
are distributed between the two data sets. Because simulations
are captured via discrete snapshots in time, we expect the real
clusters to be far more varied in their redshift. Furthermore,
differences in mass cuts can result in significant discrepancies
in cluster mass. To visualize these details, we plot the redshift
and mass for each. We combine the training, validation, and
test data sets for both the real and simulated galaxy clusters to
obtain the entire population, and we show the results in
Figure 3.
As expected, the real clusters are found to occupy a wide
range of values, while the space covered by simulated clusters
tends to be more limited. Redshifts from simulations are
grouped into thin horizontal lines, while those from real
clusters are more evenly distributed. We also see that the
simulated clusters tend to be considerably more massive than
those found in WHL12. While many low-mass clusters are
present in the real data, they are mostly absent from the
simulations.
4.2. Training the Model
We study the model’s behavior during training via a learning
curve. The learning curve tracks the loss as a function of
training epochs. In machine learning, the loss refers to the
penalty for incorrect predictions and can be used to indicate the
model’s accuracy. Training epochs refer to the number of
complete passes through the data set, during which the
network’s weights are adjusted using gradient descent. The
learning curve shows how quickly and how well the model
learns to make correct predictions.
Examining this curve can provide important insights about the
model’s training process. For instance, learning curves help
identify when the model begins to overfit the training set,
resulting in difficulty generalizing to unseen data. Preventing
overfitting is particularly important for this task, as the
simulations contain a limited number of unique training samples.
Figure 2. The first row contains the 1D PDF of flux values averaged over 5000 images for each data set. The second row shows the power spectrum averaged over the
same images for each data set. Columns indicate SDSS bands. While the two data sets appear to be quite similar, the simulated images show higher flux and power.
Figure 3. The redshift and mass distribution of galaxy clusters. Real clusters
are represented by blue triangles, while simulated ones are represented by black
circles. Note that the horizontal lines from simulations are due to the four
discrete snapshots. Overall, the real clusters cover a wider mass range and tend
to be less massive as a whole compared to simulated ones.
The learning curve shown in Figure 4 tracks the loss in three
different data sets. The training set and validation set of
simulated galaxy cluster images make up two of these data sets.
The third is the validation set of real galaxy clusters. We can
therefore look at both the real and simulated validation sets to
identify potential overfitting.
We draw two notable conclusions from this test. The first
conclusion is that performance on both validation sets appears
to plateau after relatively few epochs. We find that training
beyond this point does not significantly reduce the validation
loss. To avoid overfitting, we will use a model trained for 100
epochs for testing unless otherwise specified. The second
conclusion concerns the relationship between the curves for the
two validation sets. Despite both curves plateauing, the loss for
the real clusters stays consistently above the loss for the
simulated ones. It appears that additional training is insufficient
to properly reduce the loss on real images.
4.3. Testing the Model
We now test the simulation-trained model on real cluster
images. To account for the differences between the two data
sets, we can divide the SDSS clusters into groups based on
their physical properties. We divide according to the BCG r-band magnitude r_BCG and the cluster mass M_200, which can be inferred from the richness R_L* using the scaling relation shown in Equation (1). Grouping clusters by BCG brightness will
allow us to separate prominent BCGs from subtler ones.
Meanwhile, separating clusters by mass will allow us to isolate
ranges that better align with the simulations used in training.
We determine the fraction of predictions accurate within 25 kpc
for each subset and show the results in Figure 5. While all other
experiments in this subsection are performed on the test data set
only, we use all 6962 real images in Figure 5, as we require a
large sample to cover all subsets.
While predictions on bright and massive clusters are strong,
accuracy drops steeply when testing on clusters with lesser
r_BCG and M_200. These results raise a few interesting points. The
first is that it is possible to identify subsets of the data where
accuracy will be strong. When testing on these subsets, we can
have confidence that the model will be reliable. When
venturing beyond this subset, we must be more cautious. The
second conclusion, which aligns with the findings from
Section 4.1, is that the simulations do not appear to encompass
all possible galaxy clusters. Rather, there appears to be a lack of
dim BCGs in the simulated training data. This is not entirely
unexpected, as we only have 324 unique galaxy clusters.
Nonetheless, this limits our ability to generalize well to the
wide range of SDSS observations.
Upon inspection of Figure 5, it appears that the brightness of
the BCG has a larger impact on accuracy than cluster mass.
While predictions do improve as cluster mass increases,
particularly for clusters with dim BCGs, M_200 appears to be a less reliable indicator of accuracy than r_BCG. As a result, we consider a subset of the brightest BCGs available in the WHL12 catalog, which we henceforth define as having r_BCG ≲ 16.5. The
test data set contains 60 clusters satisfying this requirement.
To further assess performance, we study how error is
distributed among the model’s predictions. We compute the
Euclidean distance from the true BCG centroid to the value
predicted by the model. We obtain the predicted positions in
pixel space and convert them to physical positions. We can then
examine the way this error is spread using a cumulative
distribution function (CDF). This function plots the fraction of
predictions with an error less than or equal to a given value. A
CDF that increases steeply near the origin indicates that most
predictions are fairly accurate. On the other hand, a function that
increases slowly shows that more predictions are inaccurate. The
error CDF can be found at the top of Figure 6.
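The CDF itself follows directly from the sorted per-image errors; a minimal sketch:

import numpy as np

def error_cdf(errors_kpc):
    e = np.sort(np.asarray(errors_kpc))
    frac = np.arange(1, len(e) + 1) / len(e)
    return e, frac   # plot frac vs. e to reproduce the top panel of Figure 6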
The CDF shows a discrepancy between the model’s
predictions across the two data sets. The simulated images
tend to have very low error. In fact, we see that most
predictions are accurate within roughly 25 kpc. When we look
at the real images, this proportion drops by nearly half. This
suggests that the model is succeeding in properly identifying
some BCGs but failing in other cases.
We also test performance by plotting the values predicted by
the model as a function of the ground truth. The resulting
correlation strength will then serve as a measurement of
success. We use R²_x and R²_y as defined in Section 3.2 to quantify
this strength.
Plots of the true and predicted results can be found at the
bottom of Figure 6. The model trained on simulated images
performs well when tested on simulated images, though we find
a far weaker correlation on real ones. While many of the
Figure 4. Learning curve when training on simulated galaxy clusters. We
compare the training and validation loss for simulated clusters. We also add the
validation loss for real clusters. Both validation curves plateau after relatively
few epochs, with the validation loss for real cluster images remaining well
above that of simulated images.
Figure 5. Performance for different values of r_BCG and M_200. Accuracy is defined as the proportion of predictions within 25 kpc of the true position. The number of samples in each bin is indicated in parentheses. Lower boundaries on ranges are inclusive, while upper boundaries are exclusive. The R_L* richness values corresponding to the M_200 mass values are shown at the top. In most cases, an increasing trend in accuracy can be seen with greater brightness and greater mass.
predictions fall near the true values, we see far more outliers.
We test on the subset of real clusters with r_BCG ≲ 16.5 and find
the correlation strength to be comparable and even superior to
tests on simulations.
In both cases, a trend can be seen when training on
simulations and testing on real data. Predictions are highly
accurate for bright BCGs, but they become less reliable for
dimmer ones. Though much of this error is due to the neural
network, it is worth noting that dim BCGs should be more
challenging to identify for the WHL12 algorithm as well. As a
result, looking at subtler cases can make the ground truth less
reliable, therefore adding an additional source of uncertainty.
Moving on from this test, we also look into correcting these
biases with transfer learning. Ideally, we would like the model
to perform well without having ever seen real data. This would
make it possible to pretrain a model for upcoming surveys and
apply it as soon as there is a new data release. If this is not
possible, we would at least hope to be able to fine-tune the
model with small amounts of data.
We test the model’s accuracy with different numbers of real
images used for training. We begin with a small set of 50
randomly selected samples. We then continue to double this
number until it reaches a size greater than half of the training
data set, in which case we use all available real training images.
In each case, we fine-tune for 100 epochs with a learning rate
reduced by an order of magnitude as indicated in Table 2. The
fine-tuned transfer learning model can then be compared with a
model trained from scratch. Results are shown in Figure 7.
This fine-tuned model achieves a strong performance with
very few training samples, suggesting that biases in the
simulations can be corrected with small amounts of data. In the
event that the simulations for large surveys are biased
compared to real observations, researchers may be able to
correct for this with limited early release data. It is also
noteworthy that the gap in performance closes significantly
after only a few hundred training images. While simulations
have better ground truth and more available information, real
data may potentially be less biased. We find that either method
can be viable once a modest amount of data is obtained.
4.4. Comparing Different Data Set Configurations
We also analyze the model’s ability to adapt to different
configurations of training and test data sets. One configuration is
to train and test on real galaxy cluster images. Another is to invert
the problem by training on real images and testing on simulations.
If the model trained on real data performs badly when testing on
simulations and the model trained on simulations performs badly
when testing on real data, then the issue is likely due to bias
between the two data sets. However, if the former model
performs well and the latter model performs poorly, this may
indicate that the simulated clusters form a subset of the real ones.
Finally, we are also interested in whether we can improve
performance using transfer learning as described in Section 3.1.
We train three new models on the simulated and real
comparison data sets having redshift 0.15 ≤ z ≤ 0.25 as
indicated in Table 1. We train one network on 5000 simulated
cluster images for 100 epochs and a second network on
5000 real cluster images for 100 epochs. For transfer learning,
we take the model trained on 5000 simulated images for
100 epochs and fine-tune it on 1000 real images for another
100 epochs. We show the results in Figure 8 by comparing the
three statistics defined in Section 3.2.
Figure 6. Performance when trained on simulated clusters. The model is tested on simulated clusters, real SDSS clusters, and a bright subset of these SDSS clusters satisfying r_BCG ≲ 16.5. Displayed on top is a CDF of the Euclidean error across the three test sets. The four figures on the bottom show the true BCG positions plotted against the model's predictions, with a black line displaying a 1:1 relationship shown for reference. The left and right columns show the results for x and y, respectively. Performance is strong when tested on simulations and the bright subset, but it drops when tested on the full set of real SDSS clusters.
Figure 7. The accuracy within a threshold of 25 kpc as a function of training
set size. We compare a model trained from scratch on real images to a model
pretrained on simulated images and then fine-tuned on real ones. The solid
black line indicates the accuracy for the pretrained model before transfer
learning is applied, with worse values falling in the shaded region below it.
Though both models perform well with only a few hundred images, training
from scratch requires a larger set of training samples.
The model trained on real clusters works well when tested on
real clusters, and the model trained on simulations performs well
when tested on simulations. Transfer learning shows strong
performance across both data sets. When testing on a different
data set than the training set, we get mixed results. We continue
to see a drop in performance when training on simulations and
testing on real data. However, the model trained on real images
appears to perform well when tested on simulations, further
suggesting that the simulations form a subset of the real images.
We once again isolate the subset of real observations satisfying
r_BCG ≲ 16.5. All networks, regardless of training data set, make
high-accuracy predictions on this subset.
4.5. Comparing to Alternative Approaches
We now discuss the advantages of this approach compared
to alternative methods of automatically identifying BCGs. By
requiring only photometric images, this approach reduces the
need for data products that may be more challenging to obtain,
such as accurate redshift values. In the absence of available and
accurate galaxy redshifts, separating foreground objects from
cluster members becomes more challenging. One straightfor-
ward approach in this scenario is to find the brightest galaxy in
the cluster's field of view and select this as the BCG.
However, the presence of foreground interlopers can make
this a questionable assumption. We therefore investigate how
often the brightest galaxy in the field of view is the true BCG
by running tests on simulations. For the neural network, we
take the simulation-trained model used in Section 4.3 and test it
on the simulated cluster images, which have been off-centered
and converted from FITS to PNG. For the alternative approach,
we take the same simulated test data set with the same offsets,
but we leave the images in FITS format. We then run
SExtractor (E. Bertin & S. Arnouts 1996) on the r band using default parameters with a detection threshold of 1.5σ above the background noise level. To isolate galaxies and remove contamination from stars, we remove objects that do not satisfy CLASS_STAR ≤ 0.1. We then select the brightest galaxy in the r band based on the MAG_AUTO parameter.
We consider two similar tests. The first is to select the
brightest galaxy in the entire field of view, which has a side
length of 1 Mpc. The other approach is to select the brightest
galaxy near the center of the cluster. Because the distribution of
offsets is known, we can select an appropriate radius of 200 kpc
from the center of the image. However, it is worth noting that
this may be a more hazardous design choice in real data, where
the BCG may have a wider range of possible offsets. The
results of this test can be found in Figure 9.
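Given a SExtractor output catalog, this baseline might be implemented as in the sketch below; X_IMAGE, Y_IMAGE, MAG_AUTO, and CLASS_STAR are standard SExtractor parameters, but their presence in the output file, and the helper itself, are assumptions:

import numpy as np
from astropy.table import Table

def brightest_galaxy(catalog_path, center_xy, kpc_per_pixel,
                     radius_kpc=None, star_cut=0.1):
    cat = Table.read(catalog_path, format="ascii.sextractor")
    galaxies = cat[cat["CLASS_STAR"] <= star_cut]        # remove likely stars
    if radius_kpc is not None:                           # e.g. 200 kpc search
        d = np.hypot(galaxies["X_IMAGE"] - center_xy[0],
                     galaxies["Y_IMAGE"] - center_xy[1]) * kpc_per_pixel
        galaxies = galaxies[d <= radius_kpc]
    best = galaxies[np.argmin(galaxies["MAG_AUTO"])]     # smallest magnitude
    return best["X_IMAGE"], best["Y_IMAGE"]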
The neural network outperforms selecting the brightest
galaxy in the field of view, highlighting its ability to recognize
the difference between BCGs and bright foreground galaxies.
Even when imposing an optimal search radius based on a
known distribution of offsets, the neural network achieves the
highest accuracy for every snapshot. Another notable detail is
that the neural network shows the smallest decrease in accuracy
when redshift is increased. This is likely due to the fact that
foreground contamination becomes a greater challenge at
higher redshift. In all cases, the neural network outperforms
selecting the brightest galaxy and appears to be better suited to
adapt to more challenging cases.
4.6. Pushing toward Higher Redshift
Finally, we test the model’s ability to adapt to higher redshift
and identify the point at which it breaks down. We test two
different models, both trained on the real comparison data sets
shown in Table 1 to avoid bias from the simulations. The first
model is trained on 5000 real galaxy clusters between z=0.15
and z=0.25. The second one is trained on the same number of
cluster images for the same number of epochs, but covering the
entire redshift range. We then create a test data set by collecting
5000 test images over the full redshift range that do not overlap
with either training set. In other words, there is an equal number
of images used in training and testing for this particular
experiment. All images are processed using the same methods
described in Section 2, including the same rebinning algorithm,
Figure 8. A comparison of R²_x, R²_y, and A_T for different networks tested on the real and simulated cluster data sets. The threshold distance for A_T is set to be 25 kpc. Labels to the left of the arrow represent the data set used for training. They are abbreviated S for simulated images, R for real images, and TL for transfer learning. Labels to the right of the arrow represent the test data set and are abbreviated similarly. Performance on the bright subset of BCGs (r_BCG ≲ 16.5) is indicated by shaded regions for comparison. Results are strong in all cases except for the model trained on simulations and tested on real images, although they improve drastically when tested on the bright subset.
Figure 9. The blue curve represents the accuracy of the neural network used in
this work for each simulation snapshot, the black curve represents the accuracy
achieved by selecting the brightest galaxy in the r band within 200 kpc of the
cluster center, and the gray curve represents the accuracy achieved by selecting
the brightest galaxy in the entire field of view. Shaded regions indicate the
spread between the highest and lowest accuracy, while the dotted lines within
them represent the average accuracy over all redshifts. All tests are performed
on off-centered simulated images with a side length of 1 Mpc, with estimates
considered to be correct if the predicted position is within 25 kpc of the true
value. The neural network achieves the highest accuracy and sees the smallest
decrease in performance at higher redshift.
pixel dimensions, and offset distribution. Results can be seen in
Figure 10.
The neural network trained on the limited redshift range
shows a gradual decrease as we move toward higher redshift.
Results begin to deteriorate immediately as we push beyond the
training values, and they decline steeply around z=0.35. Even
within the training range, there appears to be slightly better
performance for closer objects. By the tail end of the redshifts
available in the catalog, accuracy has decreased to nearly zero.
We compare this to the network trained on the entire range.
This model shows much better adaptability, as performance
does not sharply decrease until roughly z=0.6. It does not
perform as well on low-redshift clusters as the first, suggesting
that there is some benefit to a more specialized model. This also
shows that the task can be performed at somewhat higher
redshift without significant decreases in accuracy.
Neither model is capable of accurately predicting the BCG
position at the tail end of the range. While it is true that these
BCGs may be subtler than those nearby, this is not necessarily
the reason for the drop in accuracy. Limitations in depth and
resolution from SDSS likely play a role as well. To properly
study this phenomenon, observations from more modern
telescopes may be required.
5. Discussion
Neural networks show promising results when tasked with
identifying BCGs in multiband photometric images. Models
trained on simulated images can identify the simulated BCGs
with near-perfect accuracy, achieving a correlation strength of
R² ≈ 0.94 when plotting the predicted values as a function of
the truth. Perhaps more impressive is that training and testing
on real galaxy cluster images show similar success. While the
simulations are limited to 324 unique galaxy clusters and their
BCGs, each telescope observation obtained is truly unique. We
therefore expect a greater challenge, as there is greater
variability in the clusters and the galaxies that form them.
Real data also introduce the possibility of false detections or
other uncertainties, which could further hurt performance.
Despite these challenges, we obtain accurate predictions even
when training on as few as 1000 samples.
The biggest challenge arises when attempting to bridge the
gap between data sets due to differences such as mass cuts and
the typical brightness of BCGs. This makes it difficult for the
model to generalize between the two data sets even when
precautions are taken to avoid overfitting. The quality of
predictions from the simulation-trained model drops when
tested on real clusters, yielding a correlation strength of
R² ≈ 0.60. Nonetheless, we find some degree of success when
training and testing on different data sets. The model trained on
real observations produces strong results when tested on
simulations. The converse, however, is not true, suggesting that
the simulations cover only a subset of reality.
We are able to isolate a subset of more obvious cases where
the network trained on simulations shows extremely strong
results. A simple way of doing this is by taking only clusters
with particularly bright BCGs, such as those with a BCG r-band
magnitude r_BCG ≲ 16.5. In this case, the correlation strength increases to R² ≈ 0.99. The accuracy within 25 kpc is also high
in this subset, with the percentage of successes ranging from roughly 80% to the high 90s in some cases. These tests
show that it is possible for a neural network trained on
simulations to identify BCGs in real observations, suggesting
that this task can also be done in anticipation of upcoming
surveys. Though preparing for these surveys by training on
simulations is not trivial, it can be done with great success given
the correct data set. Those wishing to use such an algorithm
should take these findings into consideration and thoroughly test
the model’s adaptability to unseen data. One possible strategy to
address this challenge is to assign a confidence level or an
uncertainty to a given prediction. Software such as Python’s
simulation-based inference package (A. Tejero-Cantero et al.
2020) exists for these purposes. In surveys where the ground
truth is not necessarily known, estimating uncertainties may help
isolate simple cases from more challenging ones.
Even in the event of slight differences between data sets,
transfer learning can be employed to fine-tune the network and
drastically improve performance. Biases in the model can
easily be corrected with only a few hundred samples.
Given a trained model, BCGs can be accurately identified with
simple photometric images and without the need for significant
preprocessing, such as handling foreground and background
objects. This approach strongly outperforms selecting the bright-
est galaxy in the field of view, as it is capable of discerning
between bright foreground interlopers and the true BCG. The
option to train on simulations also opens interesting possibilities
for identifying high-redshift BCGs or BCG progenitors.
We also see a level of adaptability to higher redshift. While
results are best when testing on the same redshift range used in
training, the boundaries can be pushed to some degree without
major drops in accuracy. Training over a wider range is also a
viable option, but there is a limit around z≈0.6 at which
performance drops steeply. Given the depth and resolution of
SDSS, it is unclear whether this limit is the result of
technological limitations or the evolution of the galaxy clusters.
To settle this, a similar experiment would need to be performed
on a deeper, higher-resolution survey.
Figure 10. The proportion of predictions within a given distance to the ground
truth at various redshifts. The model trained on a range of 0.15 ≤ z ≤ 0.25 is
compared to a model trained over the entire redshift range of the catalog. Both
are tested on real SDSS clusters. Solid lines represent the accuracy computed
with a threshold of 25 kpc. Shaded regions below the solid lines indicate a
tighter error threshold. Ordered from darkest to lightest, the breaks between
each shaded region indicate a threshold of 20, 15, and 10 kpc. A histogram
showing the redshift distribution of each data set is shown at the bottom. Both
models tend to perform well within the training range, though the task becomes
increasingly difficult at higher redshift.
We reiterate that this method shows strong results and that
neural networks are highly effective for identifying BCGs in
large surveys. The primary challenge facing this method is
adapting between simulated and real observations. We encou-
rage individuals performing this task to exercise caution when
training on simulations, as the quality and diversity of these
simulations have a significant impact on the model’s adaptability
to real observations. While bridging the gap is indeed possible, it
is challenging owing to the limited amount of unique data from
cosmological simulations. Those applying this task to upcoming
surveys should consider these facts before moving forward.
Acknowledgments
We acknowledge support from the Centre de recherche en
astrophysique du Québec, un regroupement stratégique du
FRQNT. T.W. and P.J. acknowledge the support of the
Natural Sciences and Engineering Research Council of
Canada (NSERC; funding reference No. RGPIN-2020-04606).
L.P.-L. acknowledges support from the Canada Research Chair
Program. We also acknowledge access to the theoretically
modeled galaxy cluster data via THE THREE HUNDRED
collaboration (https://www.the300-project.org). The simulations used in this paper have been
performed in the MareNostrum Supercomputer at the
Barcelona Supercomputing Center, thanks to CPU time granted
by the Red Española de Supercomputación. As part of THE
THREE HUNDRED project, this work has received financial
support from the European Union’s Horizon 2020 Research
and Innovation program under the Marie Skłodowska-Curie
grant agreement No. 734374, the LACEGAL project.
ORCID iDs
Patrick Janulewicz https://orcid.org/0009-0004-9465-428X
Tracy M. A. Webb https://orcid.org/0000-0002-0104-9653
Laurence Perreault-Levasseur https://orcid.org/0000-0003-
3544-3939
References
Aihara, H., Allende Prieto, C., An, D., et al. 2011, ApJS, 193, 29
Alam, S., Ata, M., Bailey, S., et al. 2017, MNRAS, 470, 2617
Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393
Bleem, L. E., Stalder, B., de Haan, T., et al. 2015, ApJS, 216, 27
Bottrell, C., Hani, M. H., Teimoorinia, H., et al. 2019, MNRAS, 490, 5390
Brough, S., Ahad, S. L., Bahé, Y. M., et al. 2024, MNRAS, 528, 771
Bruzual, G., & Charlot, S. 2003, MNRAS, 344, 1000
Chabrier, G. 2003, PASP, 115, 763
Chu, A., Durret, F., & Márquez, I. 2021, A&A, 649, A42
Chu, A., Sarron, F., Durret, F., & Márquez, I. 2022, A&A, 666, A54
Cui, W., Dave, R., Knebe, A., et al. 2022, MNRAS, 514, 977
Cui, W., Knebe, A., Yepes, G., et al. 2018, MNRAS, 480, 2898
Cui, W., Power, C., Biffi, V., et al. 2016, MNRAS, 456, 2566
De Lucia, G., & Blaizot, J. 2007, MNRAS, 375, 2
De Propris, R., West, M. J., Andrade-Santos, F., et al. 2021, MNRAS, 500, 310
Gonzalez, A. H., Gettings, D. P., Brodwin, M., et al. 2019, ApJS, 240, 33
Gozaliasl, G., Finoguenov, A., Tanaka, M., et al. 2019, MNRAS, 483, 3545
He, K., Zhang, X., Ren, S., & Sun, J. 2015, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Piscataway, NJ: IEEE), 1
Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111
Kingma, D. P., & Ba, J. 2017, arXiv:1412.6980
Kluge, M., & Bender, R. 2023, ApJS, 267, 41
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, arXiv:1110.3193
Liu, A., Bulbul, E., Ghirardini, V., et al. 2022, A&A, 661, A2
Mancone, C. L., & Gonzalez, A. H. 2012, PASP, 124, 606
Muzzin, A., Wilson, G., Yee, H. K. C., et al. 2009, ApJ, 698, 1934
Nair, V., & Hinton, G. 2010, in Proc. of the 27th Int. Conf. on Machine Learning, ed. J. Fürnkranz & T. Joachims (Madison, WI: Omnipress), 807
Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2016, A&A, 594, A27
Roche, C., McDonald, M., Borrow, J., et al. 2024, OJAp, 7, 65
Rykoff, E. S., Rozo, E., Busha, M. T., et al. 2014, ApJ, 785, 104
Seppi, R., Comparat, J., Nandra, K., et al. 2023, A&A, 671, A57
Smith, J. C., Ryczanowski, D., Bianconi, M., et al. 2023, RNAAS, 7, 51
Somboonpanyakul, T., McDonald, M., Noble, A., et al. 2022, AJ, 163, 146
Tejero-Cantero, A., Boelts, J., Deistler, M., et al. 2020, JOSS, 5, 2505
van den Bosch, F. C., Weinmann, S. M., Yang, X., et al. 2005, MNRAS, 361, 1203
Vikhlinin, A., McNamara, B. R., Forman, W., et al. 1998, ApJ, 502, 558
Wen, Z. L., Han, J. L., & Liu, F. S. 2012, ApJS, 199, 34
Wilson, G., Muzzin, A., Yee, H. K. C., et al. 2009, ApJ, 698, 1943
York, D. G., Adelman, J., Anderson, J. E. J., et al. 2000, AJ, 120, 1579