Article

Machine Learning methods to estimate observational properties of galaxy clusters in large volume cosmological N-body simulations

Abstract

In this paper we study the applicability of a set of supervised machine learning (ML) models specifically trained to infer observation-related properties of the baryonic component (stars and gas) from a set of features of dark-matter-only cluster-size halos. The training set is built from The Three Hundred project, which consists of a series of zoomed hydrodynamical simulations of cluster-size regions extracted from the 1 Gpc volume MultiDark dark-matter-only simulation (MDPL2). As target variables we use a set of baryonic properties for the intracluster gas and stars derived from the hydrodynamical simulations, and we correlate them with the properties of the dark matter halos from the MDPL2 N-body simulation. The different ML models are trained on this database and subsequently used to infer the same baryonic properties for the whole range of cluster-size halos identified in MDPL2. We also test the robustness of the models' predictions against the mass resolution of the dark matter halos and conclude that the inferred baryonic properties are rather insensitive to the resolution of their DM counterparts, even when these are resolved with almost an order of magnitude fewer particles. We conclude that the ML models presented in this paper can be used as an accurate and computationally efficient tool for populating cluster-size halos with observation-related baryonic properties in large-volume N-body simulations, making them more valuable for comparison with full-sky galaxy cluster surveys at different wavelengths. We make the best-performing trained ML model publicly available.
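A minimal sketch of the kind of pipeline the abstract describes — a tree-based regressor trained to map dark-matter-halo features to a baryonic property — using mock data. The feature names, target, and scatter values here are illustrative assumptions, not the paper's actual training set or released model:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_halos = 2000

# Mock dark-matter-only halo features (illustrative, not the paper's feature set)
log_m200 = rng.uniform(13.5, 15.5, n_halos)   # log10 halo mass
conc = rng.normal(5.0, 1.5, n_halos)          # concentration
spin = rng.lognormal(-3.5, 0.5, n_halos)      # spin parameter

X = np.column_stack([log_m200, conc, spin])
# Mock target: log10 gas mass, tied to halo mass with 0.05 dex scatter
y = log_m200 + np.log10(0.1) + 0.05 * rng.normal(size=n_halos)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # R^2 on held-out halos
```

Once trained on the hydrodynamical zoom regions, such a model can be evaluated on every cluster-size halo of the parent N-body catalogue at a negligible cost compared with re-running the hydrodynamics.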


... dergo a physical rearrangement to follow a polytropic equation of state (see §3.2). Other gas models have also been used, relying on different physical assumptions (e.g., Battaglia 2016; Mead et al. 2020), along with advances in machine learning techniques to draw a mapping between the matter field in gravity-only simulations and baryon properties in hydrodynamic volumes (e.g., Tröster et al. 2019; Thiele et al. 2020; de Andres et al. 2023; Chadayammuri et al. 2023). Baryon pasting methods today constitute a cornerstone of synthetic cosmological data set generation, having been used on various simulation suites to create multi-wavelength sky maps for cluster science (e.g., Sehgal et al. 2010; Stein et al. 2020; Omori 2022). ...
... The300 Project (Cui et al. 2020) is a re-simulation of a mass-complete sample of 324 galaxy clusters from the Multi-Dark Planck 2 (MDPL2) N-body simulation (Klypin et al. 2016) with a 1 Gpc/h simulation box size (see Zhang et al. 2022; de Andres et al. 2022, 2023, for the benefits of this particular setup). These clusters are identified with the rockstar (Behroozi et al. 2013) halo finder. ...
Preprint
Protoclusters are the progenitors of massive galaxy clusters. Understanding the properties of these structures is important for building a complete picture of cluster formation and for understanding the impact of environment on galaxy evolution. Future cosmic microwave background (CMB) surveys may provide insight into the properties of protoclusters via observations of the thermal Sunyaev-Zel'dovich (SZ) effect and gravitational lensing. Using realistic hydrodynamical simulations of protoclusters from the Three Hundred Project, we forecast the ability of CMB Stage 4-like (CMB-S4) experiments to detect and characterize protoclusters with observations of these two signals. For protoclusters that are the progenitors of clusters at $z = 0$ with $M_{200c} \gtrsim 10^{15}\,M_{\odot}$ we find that the S4-Ultra deep survey has a roughly 20% chance of detecting the main halos in these structures with ${\rm SNR} > 5$ at $z \sim 2$ and a 10% chance of detecting them at $z \sim 2.5$, where these probabilities include the impacts of noise, CMB foregrounds, and the different possible evolutionary histories of the structures. On the other hand, if protoclusters can be identified using alternative means, such as via galaxy surveys like LSST and Euclid, CMB-S4 will be able to obtain high signal-to-noise measurements of their stacked lensing and SZ signals, providing a way to measure their average mass and gas content. With a sample of 2700 protoclusters at $z = 3$, the CMB-S4 wide survey can measure the stacked SZ signal with a signal-to-noise of 7.2, and the stacked lensing signal with a signal-to-noise of 5.7. Future CMB surveys thus offer exciting prospects for understanding the properties of protoclusters.
... One common application of machine learning to cosmological simulations has been to train a model to predict the baryonic properties of dark matter only halos (e.g. Moster et al. 2021; Machado Poletti Valle et al. 2021; Icaza-Lizaola et al. 2021; de Santi et al. 2022; Stiskalek et al. 2022; Chittenden & Tojeiro 2023; Jagvaral et al. 2022; Hausen et al. 2023; de Andres et al. 2023; Icaza-Lizaola et al. 2023). In McGibbon & Khochfar (2022) the authors introduced a new method for making these predictions, which used the properties of the subhalo over a wide range of times as model input. ...
Preprint
Full-text available
Using a novel machine learning method, we investigate the buildup of galaxy properties in different simulations, and in various environments within a single simulation. The aim of this work is to show the power of this approach at identifying the physical drivers of galaxy properties within simulations. We compare how the stellar mass is dependent on the value of other galaxy and halo properties at different points in time by examining the feature importance values of a machine learning model. By training the model on IllustrisTNG we show that stars are produced at earlier times in higher density regions of the universe than they are in low density regions. We also apply the technique to the Illustris, EAGLE, and CAMELS simulations. We find that stellar mass is built up in a similar way in EAGLE and IllustrisTNG, but significantly differently in the original Illustris, suggesting that subgrid model physics is more important than the choice of hydrodynamics method. These differences are driven by the efficiency of supernova feedback. Applying principal component analysis to the CAMELS simulations allows us to identify a component associated with the importance of a halo's gravitational potential and another component representing the time at which galaxies form. We discover that the speed of galactic winds is a more critical subgrid parameter than the total energy per unit star formation. Finally we find that the Simba black hole feedback model has a larger effect on galaxy formation than the IllustrisTNG black hole feedback model.
... We briefly illustrate the details of the baryon models for Gadget-X and GIZMO-SIMBA in the following subsection. More details can be found in Davé et al. (2016, 2019) and Cui et al. (2022) (for GIZMO-SIMBA) and Cui et al. (2018) (for Gadget-X). Benefiting from the unique setups, these simulated clusters from The300 project have been widely used for different studies, for example, environment effects (Wang et al. 2018), cluster profiles (Mostoghiu et al. 2019; Li et al. 2020; Baxter et al. 2021), splash-back galaxies (Arthur et al. 2019; Haggar et al. 2020; Knebe et al. 2020), cluster dynamical state (Capalbo et al. 2021; Zhang et al. 2021; Li et al. 2022a), filament structures (Kuchner et al. 2020; Rost et al. 2021; Kuchner et al. 2021), lensing studies (Herbonnet et al. 2022; Euclid Collaboration et al. 2023), cluster mass (Li et al. 2021; Gianfagna et al. 2023) and machine learning studies (de Andres et al. 2022, 2023; Ferragamo et al. 2023). ...
Preprint
The distribution of baryons provides a significant way to understand the formation of galaxy clusters by revealing the details of their internal structure and their changes over time. In this paper, we present theoretical studies on the scaled profiles of physical properties associated with the baryonic components, including gas density, temperature, metallicity, pressure and entropy, as well as stellar mass, metallicity and satellite galaxy number density, in galaxy clusters from $z=4$ to $z=0$ by tracking their progenitors. These mass-complete simulated galaxy clusters come from The Three Hundred with two runs: GIZMO-SIMBA and Gadget-X. Through comparisons between the two simulations, and with observed profiles which are generally available at low redshift, we find that (1) the agreement between the two runs and observations is mostly at outer radii $r \gtrsim 0.3r_{500}$, in line with the self-similarity assumption, while Gadget-X shows better agreement with the observed gas profiles in the central regions compared to GIZMO-SIMBA; (2) the evolution trends are generally consistent between the two simulations, with slightly better consistency at outer radii. In detail, the gas density profile shows less discrepancy than the temperature and entropy profiles at high redshift. The differences in the cluster centre and gas properties imply different behaviours of the AGN models between Gadget-X and GIZMO-SIMBA, with the latter maybe being too strong for this cluster simulation. The high-redshift differences may be caused by the star formation and feedback models or the hydrodynamics treatment, which require observational constraints and further understanding.
... In this paper, we investigate the halo-galaxy connection from a machine learning (ML) perspective. The issue of the halo-galaxy connection has been addressed using ML by many works (e.g., Kamdar et al. 2016; Agarwal et al. 2018; Kasmanoff et al. 2020; Delgado et al. 2021; McGibbon & Khochfar 2021; Shao et al. 2021; Lovell et al. 2022; Stiskalek et al. 2022; de Andres et al. 2022; Jespersen et al. 2022; Chittenden & Tojeiro 2023). In de Santi et al. (2022) we provide a ML suite combining some of the most powerful, well-known models in the literature to predict central galaxy properties using host halo properties. ...
Preprint
The relationship between galaxies and haloes is central to the description of galaxy formation, and a fundamental step towards extracting precise cosmological information from galaxy maps. However, this connection involves several complex processes that are interconnected. Machine Learning methods are flexible tools that can learn complex correlations between a large number of features, but are traditionally designed as deterministic estimators. In this work, we use the IllustrisTNG300-1 simulation and apply neural networks in a binning classification scheme to predict probability distributions of central galaxy properties, namely stellar mass, colour, specific star formation rate, and radius, using as input features the halo mass, concentration, spin, age, and the overdensity on a scale of 3 $h^{-1}$ Mpc. The model captures the intrinsic scatter in the relation between halo and galaxy properties, and can thus be used to quantify the uncertainties related to the stochasticity of the galaxy properties with respect to the halo properties. In particular, with our proposed method, one can define and accurately reproduce the properties of the different galaxy populations in great detail. We demonstrate the power of this tool by directly comparing traditional single-point estimators and the predicted joint probability distributions, and also by computing the power spectrum of a large number of tracers defined on the basis of the predicted colour-stellar mass diagram. We show that the neural networks reproduce clustering statistics of the individual galaxy populations with excellent precision and accuracy.
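The binning classification scheme described above can be sketched in a few lines: discretise the target property into bins and let a classifier's per-class probabilities serve as the predicted distribution. The halo and stellar masses below are mock draws, not the IllustrisTNG data used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 3000
halo_mass = rng.uniform(11.0, 14.0, n)                         # mock log10 halo mass
stellar_mass = 0.7 * halo_mass + 1.0 + rng.normal(0, 0.2, n)   # mock relation with scatter

# Discretise the target into 20 bins; predicting P(bin | halo) captures the scatter
edges = np.linspace(stellar_mass.min(), stellar_mass.max(), 21)
labels = np.clip(np.digitize(stellar_mass, edges) - 1, 0, 19)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(halo_mass[:, None], labels)

# Predicted probability distribution over stellar-mass bins for one halo
proba = clf.predict_proba([[12.5]])[0]
```

Because the output is a full distribution rather than a point estimate, one can draw stochastic galaxy realisations from it instead of assigning every halo the mean relation.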
Article
Full-text available
Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties and successful recovery of the fundamental metallicity relation, and retrieve scatter that traces the target simulation’s distribution.
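The "punitive prediction comparison" idea — penalising a model whose predictions collapse the target's scatter — can be illustrated with a histogram-based Kullback-Leibler term. This is a standalone numpy sketch; the paper's actual penalty operates inside the neural network loss:

```python
import numpy as np

def kl_scatter_penalty(pred, target, bins=30):
    """KL divergence between histograms of predicted and target properties.
    Large when the prediction's scatter does not match the target's."""
    lo = min(pred.min(), target.min())
    hi = max(pred.max(), target.max())
    p, _ = np.histogram(target, bins=bins, range=(lo, hi))
    q, _ = np.histogram(pred, bins=bins, range=(lo, hi))
    p = (p + 1e-8) / (p + 1e-8).sum()   # smooth empty bins, then normalise
    q = (q + 1e-8) / (q + 1e-8).sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, 5000)       # hydro-simulation property with scatter
matched = rng.normal(0.0, 1.0, 5000)      # prediction reproducing the scatter
collapsed = rng.normal(0.0, 0.1, 5000)    # over-smoothed prediction (scatter lost)
```

Adding such a term to a mean-squared-error loss pushes the model away from the over-smoothed solutions that plain regression tends to prefer.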
Article
Galaxy cluster mass functions are a function of cosmology, but mass is not a direct observable, and systematic errors abound in all its observable proxies. Mass-free inference can bypass this challenge, but it requires large suites of simulations spanning a range of cosmologies and models for directly observable quantities. In this work, we devise a U-net – an image-to-image machine learning algorithm – to ‘paint’ the IllustrisTNG model of baryons on to dark-matter-only (DMO) simulations of galaxy clusters. Using 761 galaxy clusters with M200c ≳ 10^14 M⊙ from the TNG300 simulation at z < 1, we train the algorithm to read in maps of projected dark matter mass and output maps of projected gas density, temperature, and X-ray flux. Despite being trained on individual images, the model reproduces the true scaling relation and scatter of the MDM–LX relation, as well as the distribution functions of the cluster X-ray luminosity and gas mass. For just one decade in cluster mass, the model reproduces three orders of magnitude in LX. The model is biased slightly high when using dark matter maps from the DMO simulation. The model performs well on inputs from TNG300-2, whose mass resolution is eight times coarser; further degrading the resolution biases the predicted luminosity function high. We conclude that U-net-based baryon painting is a promising technique to build large simulated cluster catalogues, which can be used to improve cluster cosmology by combining existing full-physics and large N-body simulations.
Article
Models that connect galaxy and halo properties often summarize a halo’s mass accretion history (MAH) with a single value, and use this value as the basis for predictions. However, a single-value summary fails to capture the complexity of MAHs and information can be lost in the process. We present MultiCAM, a generalization of traditional abundance matching frameworks, which can simultaneously connect the full MAH of a halo with multiple halo and/or galaxy properties. As a first case study, we apply MultiCAM to the problem of connecting dark matter halo properties to their MAHs in the context of a dark matter-only simulation. While some halo properties, such as concentration, are more strongly correlated to the early-time mass growth of a halo, others, like the virial ratio, have stronger correlations with late-time mass growth. This highlights the necessity of considering the impact of the entire MAH on halo properties. For most of the halo properties we consider, we find that MultiCAM models that use the full MAH achieve higher accuracy than conditional abundance matching models which use a single epoch. We also demonstrate an extension of MultiCAM that captures the covariance between predicted halo properties. This extension provides a baseline model for applications where the covariance between predicted properties is important.
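Traditional abundance matching, which MultiCAM generalises, reduces to a rank assignment: sort halos by one property and galaxies by another, then match ranks. A toy numpy version with mock catalogues (the lognormal draws are illustrative, not fits to any simulation):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
halo_mass = rng.lognormal(0.0, 1.0, n)       # mock present-day halo masses
stellar_mass = rng.lognormal(-2.0, 0.8, n)   # mock stellar-mass function draws

# Rank of each halo within the catalogue (0 = least massive)
ranks = np.argsort(np.argsort(halo_mass))
# Most massive halo gets the most massive galaxy, and so on down the ranks
assigned_mstar = np.sort(stellar_mass)[ranks]
```

MultiCAM replaces the single ranking property with the full mass accretion history, but this rank-matching backbone is the shared starting point.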
Article
We develop a machine learning algorithm to infer the 3D cumulative radial profiles of total and gas mass in galaxy clusters from thermal Sunyaev-Zel’dovich effect maps. We generate around 73,000 mock images along various lines of sight using 2,522 simulated clusters from The Three Hundred project at redshift z < 0.12 and train a model that combines an autoencoder and a random forest. Without making any prior assumptions about the hydrostatic equilibrium of the clusters, the model is capable of reconstructing the total mass profile as well as the gas mass profile, which is responsible for the SZ effect. We show that the recovered profiles are unbiased with a scatter of about 10%, slightly increasing towards the core and the outskirts of the cluster. We selected clusters in the mass range 10^13.5 ≤ M200/(h^-1 M⊙) ≤ 10^15.5, spanning different dynamical states, from relaxed to disturbed halos. We verify that both the accuracy and precision of this method show a slight dependence on the dynamical state, but not on the cluster mass. To further verify the consistency of our model, we fit the inferred total mass profiles with an NFW model and contrast the concentration values with those of the true profiles. We note that the inferred profiles are unbiased for higher concentration values, reproducing a trustworthy mass-concentration relation. The comparison with a widely used mass estimation technique, such as hydrostatic equilibrium, demonstrates that our method recovers the total mass that is not biased by non-thermal motions of the gas.
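The two-stage architecture (compress the SZ map, then regress mass from the latent code) can be mimicked with PCA standing in for the autoencoder. Everything below is a mock illustration — synthetic maps, an assumed Gaussian cluster profile, and PCA instead of a trained network:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_maps, side = 500, 16
yy, xx = np.mgrid[0:side, 0:side]
template = np.exp(-((xx - 8) ** 2 + (yy - 8) ** 2) / 20.0).ravel()  # mock cluster profile

log_mass = rng.uniform(13.5, 15.5, n_maps)
# Mock Compton-y maps: amplitude scales with mass, plus pixel noise
maps = (log_mass - 13.5)[:, None] * template[None, :] \
    + 0.05 * rng.normal(size=(n_maps, side * side))

# Stage 1: compress each map to a short latent vector (autoencoder stand-in)
latent = PCA(n_components=8).fit_transform(maps)

# Stage 2: a random forest regresses mass from the latent features
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(latent[:400], log_mass[:400])
r2 = rf.score(latent[400:], log_mass[400:])
```

The compression step is what lets a tree-based regressor handle image inputs at all: the forest sees a handful of latent features rather than thousands of correlated pixels.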
Article
Full-text available
Galaxy clusters are useful laboratories to investigate the evolution of the Universe, and accurate measurement of their total masses allows us to constrain important cosmological parameters. However, estimating mass from observations that use different methods and spectral bands introduces various systematic errors. Here we evaluate the use of a convolutional neural network (CNN) to reliably and accurately infer the masses of galaxy clusters from the Compton-y parameter maps provided by the Planck satellite. The CNN is trained with mock images generated from hydrodynamic simulations of galaxy clusters, with Planck’s observational limitations taken into account. We observe that the CNN approach is not subject to the usual observational assumptions, and therefore is not affected by the same biases. By applying the trained CNNs to the real Planck maps, we find cluster masses compatible with Planck measurements within a 15% bias. Finally, we show that this mass bias can be explained by the well-known hydrostatic equilibrium assumption in Planck masses, and the different parameters in the integrated Compton-y signal and the mass scaling laws. This work highlights that CNNs, supported by hydrodynamic simulations, are a promising and independent tool for estimating cluster masses with high accuracy, which can be extended to other surveys as well as to observations in other bands.
Article
Full-text available
We apply machine learning, a powerful method for uncovering complex correlations in high-dimensional data, to the galaxy–halo connection of cosmological hydrodynamical simulations. The mapping between galaxy and halo variables is stochastic in the absence of perfect information, but conventional machine learning models are deterministic and hence cannot capture its intrinsic scatter. To overcome this limitation, we design an ensemble of neural networks with a Gaussian loss function that predict probability distributions, allowing us to model statistical uncertainties in the galaxy–halo connection as well as its best-fit trends. We extract a number of galaxy and halo variables from the Horizon-AGN and IllustrisTNG100-1 simulations and quantify the extent to which knowledge of some subset of one enables prediction of the other. This allows us to identify the key features of the galaxy–halo connection and investigate the origin of its scatter in various projections. We find that while halo properties beyond mass account for up to 50 per cent of the scatter in the halo-to-stellar mass relation, the prediction of stellar half-mass radius or total gas mass is not substantially improved by adding further halo properties. We also use these results to investigate semi-analytic models for galaxy size in the two simulations, finding that assumptions relating galaxy size to halo size or spin are not successful.
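The Gaussian loss at the heart of this approach scores each prediction as a distribution (μ, σ) rather than a point, so a network that misstates the intrinsic scatter is penalised. A minimal numpy version on mock data:

```python
import numpy as np

def gaussian_nll(y, mu, log_sigma):
    """Negative log-likelihood of y under N(mu, sigma^2).
    Minimising it fits both the mean trend and the intrinsic scatter."""
    var = np.exp(2.0 * log_sigma)
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var)
                         + 0.5 * (y - mu) ** 2 / var))

rng = np.random.default_rng(0)
# Mock stellar masses at fixed halo mass, with 0.3 dex intrinsic scatter
y = rng.normal(10.0, 0.3, 10000)

nll_matched = gaussian_nll(y, 10.0, np.log(0.3))         # predicts the true scatter
nll_overconfident = gaussian_nll(y, 10.0, np.log(0.05))  # claims almost no scatter
```

An ordinary mean-squared-error loss would score both predictors identically; the likelihood-based loss is what makes the predicted scatter a trainable quantity.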
Article
Full-text available
We introduce Gizmo-Simba, a new suite of galaxy cluster simulations within The Three Hundred project. The Three Hundred consists of zoom re-simulations of 324 clusters with M200 ≳ 10^14.8 M⊙ drawn from the MultiDark-Planck N-body simulation, run using several hydrodynamic and semi-analytic codes. The Gizmo-Simba suite adds a state-of-the-art galaxy formation model based on the highly successful Simba simulation, mildly re-calibrated to match z = 0 cluster stellar properties. Comparing to The Three Hundred zooms run with Gadget-X, we find intrinsic differences in the evolution of the stellar and gas mass fractions, BCG ages, and galaxy colour-magnitude diagrams, with Gizmo-Simba generally providing a good match to available data at z ≈ 0. Gizmo-Simba’s unique black hole growth and feedback model yields agreement with the observed BH scaling relations at the intermediate-mass range and predicts a slightly different slope at high masses where few observations currently lie. Gizmo-Simba provides a new and novel platform to elucidate the co-evolution of galaxies, gas, and black holes within the densest cosmic environments.
Article
Full-text available
We present a novel machine learning method for predicting the baryonic properties of dark matter only subhalos from N-body simulations. Our model is built using the extremely randomized tree (ERT) algorithm and takes subhalo properties over a wide range of redshifts as its input features. We train our model using the IllustrisTNG simulations to predict black hole mass, gas mass, magnitudes, star formation rate, stellar mass, and metallicity. We compare the results of our method with a baseline model from previous works, and against a model that only considers the mass history of the subhalo. We find that our new model significantly outperforms both of the other models. We then investigate the predictive power of each input by looking at feature importance scores from the ERT algorithm. We produce feature importance plots for each baryonic property, and find that they differ significantly. We identify low redshifts as being most important for predicting star formation rate and gas mass, with high redshifts being most important for predicting stellar mass and metallicity, and consider what this implies for nature versus nurture. We find that the physical properties of galaxies investigated in this study are all driven by nurture and not nature. The only property showing a somewhat stronger impact of nature is the present-day star formation rate of galaxies. Finally we verify that the feature importance plots are discovering physical patterns, and that the trends shown are not an artefact of the ERT algorithm.
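The redshift-dependent feature importance analysis can be sketched with scikit-learn's extremely randomized trees: give the model the same quantity at several epochs and read off which epoch carries the predictive weight. The inputs below are mock draws, not the IllustrisTNG histories:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(3)
n = 2000
# Mock subhalo gas mass at three redshifts (independent here, for illustration)
gas_z2, gas_z1, gas_z0 = rng.normal(size=(3, n))
X = np.column_stack([gas_z2, gas_z1, gas_z0])

# Mock star formation rate driven almost entirely by the low-redshift feature
sfr = gas_z0 + 0.1 * rng.normal(size=n)

ert = ExtraTreesRegressor(n_estimators=200, random_state=0)
ert.fit(X, sfr)
importances = ert.feature_importances_  # one score per input epoch, summing to 1
```

In this toy setup the low-redshift column dominates the importance scores by construction, mirroring the paper's finding that present-day star formation rate is best predicted by late-time inputs.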
Article
Full-text available
High-resolution cosmological hydrodynamic simulations are currently limited to relatively small volumes due to their computational expense. However, much larger volumes are required to probe rare, overdense environments, and measure clustering statistics of the large scale structure. Typically, zoom simulations of individual regions are used to study rare environments, and semi-analytic models and halo occupation models applied to dark matter only (DMO) simulations are used to study the Universe in the large-volume regime. We propose a new approach, using a machine learning framework to explore the halo-galaxy relationship in the periodic EAGLE simulations, and zoom C-EAGLE simulations of galaxy clusters. We train a tree based machine learning method to predict the baryonic properties of galaxies based on their host dark matter halo properties. The trained model successfully reproduces a number of key distribution functions for an infinitesimal fraction of the computational cost of a full hydrodynamic simulation. By training on both periodic simulations as well as zooms of overdense environments, we learn the bias of galaxy evolution in differing environments. This allows us to apply the trained model to a larger DMO volume than would be possible if we only trained on a periodic simulation. We demonstrate this application using the (800 Mpc)³ P-Millennium simulation, and present predictions for key baryonic distribution functions and clustering statistics from the EAGLE model in this large volume.
Article
Full-text available
While cosmological dark matter-only simulations relying solely on gravitational effects are comparably fast to compute, baryonic properties in simulated galaxies require complex hydrodynamic simulations that are computationally costly to run. We explore the merging of an extended version of the equilibrium model, an analytic formalism describing the evolution of the stellar, gas, and metal content of galaxies, into a machine learning framework. In doing so, we are able to recover more properties than the analytic formalism alone can provide, creating a high-speed hydrodynamic simulation emulator that populates galactic dark matter haloes in N-body simulations with baryonic properties. While there exists a trade-off between the reached accuracy and the speed advantage this approach offers, our results outperform an approach using only machine learning for a subset of baryonic properties. We demonstrate that this novel hybrid system enables the fast completion of dark matter-only information by mimicking the properties of a full hydrodynamic suite to a reasonable degree, and discuss the advantages and disadvantages of hybrid versus machine learning-only frameworks. In doing so, we offer an acceleration of commonly deployed simulations in cosmology.
Article
Full-text available
The family of gradient boosting algorithms has been recently extended with several interesting proposals (i.e. XGBoost, LightGBM and CatBoost) that focus on both speed and accuracy. XGBoost is a scalable ensemble technique that has been shown to be a reliable and efficient machine learning challenge solver. LightGBM is an accurate model focused on providing extremely fast training performance using selective sampling of high gradient instances. CatBoost modifies the computation of gradients to avoid the prediction shift in order to improve the accuracy of the model. This work proposes a practical analysis of how these novel variants of gradient boosting work in terms of training speed, generalization performance and hyper-parameter setup. In addition, a comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed using carefully tuned models as well as using their default settings. The results of this comparison indicate that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small. LightGBM is the fastest of all methods but not the most accurate. XGBoost places second both in accuracy and in training speed. Finally, an extensive analysis of the effect of hyper-parameter tuning in XGBoost, LightGBM and CatBoost is carried out using two novel proposed tools.
Article
Full-text available
SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments. This Perspective describes the development and capabilities of SciPy 1.0, an open source scientific computing library for the Python programming language.
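As a concrete taste of the library in the context of this article's subject matter, the snippet below uses `scipy.optimize.curve_fit` to recover the slope of a hypothetical cluster scaling relation from noisy mock data. The relation, its slope, and the scatter are illustrative assumptions, not results from any of the papers listed here.

```python
# Fit a power-law scaling relation in log space with scipy.optimize.curve_fit.
# Mock data: a hypothetical luminosity-mass relation with Gaussian log scatter.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
log_mass = rng.uniform(14.0, 15.0, 200)          # log10 M [Msun], assumed
true_norm, true_slope = 44.0, 5.0 / 3.0          # illustrative values
log_lum = (true_norm + true_slope * (log_mass - 14.0)
           + rng.normal(0.0, 0.05, 200))         # 0.05 dex scatter

def scaling(logm, norm, slope):
    # log10 L = norm + slope * (log10 M - 14)
    return norm + slope * (logm - 14.0)

popt, pcov = curve_fit(scaling, log_mass, log_lum)
print(f"fitted normalisation = {popt[0]:.2f}, slope = {popt[1]:.2f}")
```

Fitting in log space keeps the noise model additive and Gaussian, which is why the recovered slope lands close to the input value.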
Article
Full-text available
In the last few years, Artificial Intelligence (AI) has achieved a notable momentum that, if harnessed appropriately, may deliver the best of expectations over many application sectors across the field. For this to occur shortly in Machine Learning, the entire community stands in front of the barrier of explainability, an inherent problem of the latest techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that were not present in the last hype of AI (namely, expert systems and rule based models). Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is widely acknowledged as a crucial feature for the practical deployment of AI models. The overview presented in this article examines the existing literature and contributions already made in the field of XAI, including a prospect toward what is yet to be reached. For this purpose we summarize previous efforts made to define explainability in Machine Learning, establishing a novel definition of explainable Machine Learning that covers such prior conceptual propositions with a major focus on the audience for which the explainability is sought. Departing from this definition, we propose and discuss a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at explaining Deep Learning methods for which a second dedicated taxonomy is built and examined in detail. This critical literature analysis serves as the motivating background for a series of challenges faced by XAI, such as the interesting crossroads of data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core.
Our ultimate goal is to provide newcomers to the field of XAI with a thorough taxonomy that can serve as reference material in order to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias for its lack of interpretability.
Article
Full-text available
We present the full public release of all data from the TNG100 and TNG300 simulations of the IllustrisTNG project. IllustrisTNG is a suite of large volume, cosmological, gravo-magnetohydrodynamical simulations run with the moving-mesh code Arepo. TNG includes a comprehensive model for galaxy formation physics, and each TNG simulation self-consistently solves for the coupled evolution of dark matter, cosmic gas, luminous stars, and supermassive black holes from early times to the present day, z = 0. Each of the flagship runs—TNG50, TNG100, and TNG300—is accompanied by halo/subhalo catalogs, merger trees, lower-resolution and dark-matter only counterparts, all available with 100 snapshots. We discuss scientific and numerical cautions and caveats relevant when using TNG. The data volume now directly accessible online is ∼750 TB, including 1200 full volume snapshots and ∼80,000 high time-resolution subbox snapshots. This will increase to ∼1.1 PB with the future release of TNG50. Data access and analysis examples are available in IDL, Python, and Matlab. We describe improvements and new functionality in the web-based API, including on-demand visualization and analysis of galaxies and halos, exploratory plotting of scaling relations and other relationships between galactic and halo properties, and a new JupyterLab interface. This provides an online, browser-based, near-native data analysis platform enabling user computation with local access to TNG data, alleviating the need to download large datasets.
Preprint
Full-text available
We introduce The Three Hundred project, an endeavour to model 324 large galaxy clusters with full-physics hydrodynamical re-simulations. Here we present the data set and study the differences to observations for fundamental galaxy cluster properties and scaling relations. We find that the modelled galaxy clusters are generally in reasonable agreement with observations with respect to baryonic fractions and gas scaling relations at redshift z = 0. However, there are still some (model-dependent) differences, such as central galaxies being too massive, and galaxy colours (g - r) being bluer (about 0.2 dex lower at the peak position) than in observations. The agreement in gas scaling relations down to 10¹³ h⁻¹ M⊙ between the simulations indicates that particulars of the sub-grid modelling of the baryonic physics only has a weak influence on these relations. We also include - where appropriate - a comparison to three semi-analytical galaxy formation models as applied to the same underlying dark-matter-only simulation. All simulations and derived data products are publicly available.
Article
Full-text available
Motivation: Random forests are fast, flexible and represent a robust approach to analyze high dimensional data. A key advantage over alternative machine learning algorithms is the availability of variable importance measures, which can be used to identify relevant features or perform variable selection. Measures based on the impurity reduction of splits, such as the Gini importance, are popular because they are simple and fast to compute. However, they are biased in favor of variables with many possible split points and high minor allele frequency. Results: We set up a fast approach to debias impurity-based variable importance measures for classification, regression and survival forests. We show that it creates a variable importance measure which is unbiased with regard to the number of categories and minor allele frequency and almost as fast as the standard impurity importance. As a result, it is now possible to compute reliable importance estimates without the extra computing cost of permutations. Further, we combine the importance measure with a fast testing procedure, producing p-values for variable importance with almost no computational overhead to the creation of the random forest. Applications to gene expression and genome-wide association data show that the proposed method is powerful and computationally efficient. Availability and implementation: The procedure is included in the ranger package, available at https://cran.r-project.org/package=ranger and https://github.com/imbs-hl/ranger. Contact: stefanonembrini@ufl.edu; wright@leibniz-bips.de. Supplementary information: Supplementary data are available at Bioinformatics online.
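The bias this paper addresses — impurity importance favouring features with many split points — is easy to demonstrate. The sketch below (scikit-learn rather than the paper's ranger package, and a contrived two-feature dataset) contrasts impurity importance with permutation importance, the slower but unbiased alternative the paper aims to avoid.

```python
# Contrast impurity-based importance (fast, but biased toward features with
# many possible split points) with permutation importance on held-out data.
# Feature 0: binary and truly predictive. Feature 1: continuous pure noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
informative = rng.integers(0, 2, n)          # binary signal feature
noise_hicard = rng.normal(size=n)            # high-cardinality noise feature
X = np.column_stack([informative, noise_hicard])
y = informative ^ (rng.random(n) < 0.1)      # label = feature 0, 10% flipped

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("impurity importances:   ", rf.feature_importances_)
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print("permutation importances:", perm.importances_mean)
```

The noise feature typically picks up non-trivial impurity importance simply because it offers many candidate splits, while its permutation importance on held-out data sits near zero — the discrepancy whose correction, without the cost of permutations, is the paper's contribution.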
Article
Full-text available
Cosmological simulations are fundamental tools to study structure formation and the astrophysics of evolving structures, in particular clusters of galaxies. While hydrodynamical simulations cannot sample efficiently large volumes and explore different cosmologies at the same time, N-body simulations lack the baryonic physics that is crucial to determine the observed properties of clusters. One solution is to use (semi-)analytical models to implement the needed baryonic physics. In this way, we can generate the many mock universes that will be required to fully exploit future large sky surveys, such as that from the upcoming eROSITA X-ray telescope. We developed a phenomenological model based on observations of clusters to implement gas density and temperature information on the dark-matter-only halos of the MultiDark simulations. We generate several full-sky mock light-cones of clusters for the WMAP and Planck cosmologies, adopting different parameters in our phenomenological model of the intra-cluster medium. For one of these simulations and models, we also generate 100 light-cones corresponding to 100 random observers and explore the variance among them in several quantities. In this first paper on MultiDark mock galaxy cluster light-cones, we focus on presenting our methodology and discuss predictions for eROSITA, in particular, exploring the potential of angular power spectrum analyses of its detected (and undetected) cluster population to study X-ray scaling relations, the intra-cluster medium, and the composition of the cosmic X-ray background. We make publicly available on-line more than 400 GB of light-cones, which include the expected eROSITA count rate, on Skies & Universes (http://www.skiesanduniverses.org).
Article
Full-text available
We analyse cosmological hydrodynamical simulations of galaxy clusters to study the X-ray scaling relations between total masses and observable quantities such as X-ray luminosity, gas mass, X-ray temperature, and Y_X. Three sets of simulations are performed with an improved version of the smoothed particle hydrodynamics GADGET-3 code. These consider the following: non-radiative gas, star formation and stellar feedback, and the addition of feedback by active galactic nuclei (AGN). We select clusters with M500 > 10¹⁴ M⊙ E(z)⁻¹, mimicking the typical selection of Sunyaev-Zeldovich samples. This permits a mass range large enough to enable robust fitting of the relations even at z ~ 2. The results of the analysis show a general agreement with observations. The values of the slope of the mass-gas mass and mass-temperature relations at z = 2 are 10 per cent lower with respect to z = 0 due to the applied mass selection, in the former case, and to the effect of early mergers in the latter. We investigate the impact of the slope variation on the study of the evolution of the normalization. We conclude that cosmological studies through scaling relations should be limited to the redshift range z = 0-1, where we find that the slope, the scatter, and the covariance matrix of the relations are stable. The scaling between mass and Y_X is confirmed to be the most robust relation, being almost independent of the gas physics. At higher redshifts, the scaling relations are sensitive to the inclusion of AGNs which influences low-mass systems. The detailed study of these objects will be crucial to evaluate the AGN effect on the ICM.
Article
Full-text available
Recent studies have presented evidence for tension between the constraints on Ω_m and σ_8 from the cosmic microwave background (CMB) and measurements of large-scale structure (LSS). This tension can potentially be resolved by appealing to extensions of the standard model of cosmology and/or untreated systematic errors in the modelling of LSS, of which baryonic physics has been frequently suggested. We revisit this tension using, for the first time, carefully-calibrated cosmological hydrodynamical simulations, which thus capture the back reaction of the baryons on the total matter distribution. We have extended the BAHAMAS simulations to include a treatment of massive neutrinos, which currently represents the best motivated extension to the standard model. We make synthetic thermal Sunyaev-Zel'dovich effect, weak galaxy lensing, and CMB lensing maps and compare to observed auto- and cross-power spectra from a wide range of recent observational surveys. We conclude that: i) in general there is tension between the primary CMB and LSS when adopting the standard model with minimal neutrino mass; ii) after calibrating feedback processes to match the gas fractions of clusters, the remaining uncertainties in the baryonic physics modelling are insufficient to reconcile this tension; and iii) invoking a non-minimal neutrino mass, typically of 0.2-0.4 eV (depending on the priors on the other relevant cosmological parameters and the datasets being modelled), can resolve the tension. This solution is fully consistent with separate constraints on the summed neutrino mass from the primary CMB and baryon acoustic oscillations, given the internal tensions in the Planck primary CMB dataset.
Article
Full-text available
We report on the successful completion of a 2 trillion particle cosmological simulation to z=0 run on the Piz Daint supercomputer (CSCS, Switzerland), using 4000+ GPU nodes for a little less than 80h of wall-clock time or 350,000 node hours. Using multiple benchmarks and performance measurements on the US Oak Ridge National Laboratory Titan supercomputer, we demonstrate that our code PKDGRAV3, delivers, to our knowledge, the fastest time-to-solution for large-scale cosmological N-body simulations. This was made possible by using the Fast Multipole Method in conjunction with individual and adaptive particle time steps, both deployed efficiently (and for the first time) on supercomputers with GPU-accelerated nodes. The very low memory footprint of PKDGRAV3 allowed us to run the first ever benchmark with 8 trillion particles on Titan, and to achieve perfect scaling up to 18000 nodes and a peak performance of 10 Pflops.
Article
Full-text available
Intrinsic galaxy shape and angular momentum alignments can arise in cosmological large-scale structure due to tidal interactions or galaxy formation processes. Cosmological hydrodynamical simulations have recently come of age as a tool to study these alignments and their contamination to weak gravitational lensing. We probe the redshift and luminosity evolution of intrinsic alignments in Horizon-AGN between $z=0$ and $z=3$ for galaxies with an $r$-band absolute magnitude of $M_r\leq-20$. Alignments transition from being radial at low redshifts and high luminosities, dominated by the contribution of ellipticals, to being tangential at high redshift and low luminosities, where discs dominate the signal. This cannot be explained by the evolution of the fraction of ellipticals and discs alone: intrinsic evolution in the amplitude of alignments is necessary. We constrain the evolution of the alignment amplitude as a function of luminosity for elliptical galaxies alone and find it to be in good agreement with current observations and the nonlinear tidal alignment model at projected separations of $\gtrsim 1$ Mpc. Alignments of discs are null in projection and consistent with current low redshift observations. The combination of the two populations yields an overall amplitude to be a factor of $\simeq 2$ lower than observed alignments of luminous red galaxies with a steeper luminosity dependence. The restriction on accurate galaxy shapes implies that the galaxy population in the simulation is complete only to $M_r\leq-20$. Higher resolution simulations will be necessary to avoid extrapolation of the intrinsic alignment predictions to the range of luminosities probed by future surveys.
Article
Full-text available
We present a new version of the GALFORM semi-analytical model of galaxy formation. This brings together several previous developments of GALFORM into a single unified model, including a different initial mass function (IMF) in quiescent star formation and in starbursts, feedback from active galactic nuclei suppressing gas cooling in massive halos, and a new empirical star formation law in galaxy disks based on their molecular gas content. In addition, we have updated the cosmology, introduced a more accurate treatment of dynamical friction acting on satellite galaxies, and updated the stellar population model. The new model is able to simultaneously explain both the observed evolution of the K-band luminosity function and stellar mass function, and the number counts and redshift distribution of sub-mm galaxies selected at 850 μm. This was not previously achieved by a single physical model within the ΛCDM framework, but requires having an IMF in starbursts that is somewhat top-heavy. The new model is tested against a wide variety of observational data covering wavelengths from the far-UV to sub-mm, and redshifts from z = 0 to z = 6, and is found to be generally successful. These observations include the optical and near-IR luminosity functions, HI mass function, Tully-Fisher relation, fraction of early type galaxies, metallicity-luminosity relation and size-luminosity relation at z = 0, as well as far-IR number counts, and far-UV luminosity functions at z ~ 3-6. [abridged]
Article
Full-text available
Within a sufficiently large cosmic volume, conservation of baryons implies a simple ‘closed box’ view in which the sum of the baryonic components must equal a constant fraction of the total enclosed mass. We present evidence from Rhapsody-G hydrodynamic simulations of massive galaxy clusters that the closed-box expectation may hold to a surprising degree within the interior, non-linear regions of haloes. At a fixed halo mass, we find a significant anti-correlation between hot gas mass fraction and galaxy mass fraction (cold gas + stars), with a rank correlation coefficient of −0.69 within R500c. Because of this anti-correlation, the total baryon mass serves as a low-scatter proxy for total cluster mass. The fractional scatter of total baryon fraction scales approximately as 0.02 (Δc/100)^0.6, while the scatter of either gas mass or stellar mass is larger in magnitude and declines more slowly with increasing radius. We discuss potential observational tests using cluster samples selected by optical and hot gas properties; the simulations suggest that joint selection on stellar and hot gas has potential to achieve 5 per cent scatter in total halo mass.
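The closed-box argument above has a simple numerical illustration: if the hot gas and galaxy mass fractions trade off at fixed total baryon budget, they anti-correlate while their sum has low scatter. The numbers below are invented for illustration, not taken from the Rhapsody-G simulations.

```python
# Toy closed-box model: a nearly conserved total baryon fraction is split
# randomly between hot gas and galaxies, producing an anti-correlation
# between the two components and a low-scatter sum. Illustrative only.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 500
f_total = rng.normal(0.15, 0.003, n)      # nearly conserved baryon budget
split = rng.normal(0.7, 0.05, n)          # fraction of baryons in hot gas
f_gas = f_total * split                   # hot gas mass fraction
f_gal = f_total * (1 - split)             # galaxy (cold gas + stars) fraction

rho, _ = spearmanr(f_gas, f_gal)
print("rank correlation (gas vs galaxy fraction):", round(rho, 2))
print("scatter: gas %.4f, galaxy %.4f, total %.4f"
      % (f_gas.std(), f_gal.std(), f_total.std()))
```

The sum inherits only the small scatter of the budget itself, which is the mechanism behind the total baryon mass acting as a low-scatter mass proxy.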
Article
Full-text available
The thermal Sunyaev–Zel'dovich (tSZ) effect offers a means of probing the hot gas in and around massive galaxies and galaxy groups and clusters, which is thought to constitute a large fraction of the baryon content of the Universe. The Planck collaboration recently performed a stacking analysis of a large sample of ‘locally brightest galaxies’ (LBGs) and, surprisingly, inferred an approximately self-similar relation between the tSZ flux and halo mass. At face value, this implies that the hot gas mass fraction is independent of halo mass, a result which is in apparent conflict with resolved X-ray observations. We test the robustness of the inferred trend using synthetic tSZ maps generated from cosmological hydrodynamical simulations and using the same tools and assumptions applied in the Planck study. We show that, while the detection and the estimate of the ‘total’ flux (within 5 r500) is reasonably robust, the inferred flux originating from within r500 (i.e. the limiting radius to which X-ray observations typically probe) is highly sensitive to the assumed pressure distribution of the gas. Using our most realistic simulations with AGN feedback, that reproduce a wide variety of X-ray and optical properties of groups and clusters, we estimate that the derived tSZ flux within r500 is biased high by up to an order of magnitude for haloes with masses M500 ∼ 10¹³ M⊙. Moreover, we show that the AGN simulations are consistent with the total tSZ flux–mass relation observed with Planck, whereas a self-similar model is ruled out.
Article
Full-text available
Accurately predicting structural properties of dark matter halos is one of the fundamental goals of modern cosmology. We use the new suite of MultiDark cosmological simulations to study the evolution of dark matter halo density profiles, concentrations, and velocity anisotropies. The MultiDark simulations cover a large range of masses, 10¹⁰-10¹⁵ M⊙, and volumes up to 50 Gpc³. The total number of dark matter halos in all the simulations exceeds 60 billion. We find that in order to understand the structure of dark matter halos and to make ~1% accurate predictions for density profiles, one needs to realize that halo concentration is more complex than the traditional ratio of the virial radius to the core radius in the NFW profile. For massive halos the average density profile is far from the NFW shape and the concentration is defined by both the core radius and the shape parameter alpha in the Einasto approximation. Combining results from different redshifts, masses and cosmologies, we show that halos progress through three stages of evolution. (1) They start as rare density peaks that experience very fast and nearly radial infall. This radial infall brings mass closer to the center producing a high concentrated halo. Here, the halo concentration increases with the increasing halo mass and the concentration is defined by the alpha parameter with nearly constant core radius. Later halos slide into (2) the plateau regime where the accretion becomes less radial, but frequent mergers still affect even the central region. Now the concentration does not depend on halo mass. (3) Once the rate of accretion slows down, halos move into the domain of declining concentration-mass relation because new accretion piles up mass close to the virial radius while the core radius is staying constant. We provide accurate analytical fits to the numerical results for halo density profiles and concentrations.
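For reference, the two profile parametrisations contrasted in this abstract are the standard NFW and Einasto forms (textbook definitions, not fits specific to the MultiDark analysis):

```latex
% NFW profile: concentration c enters through the scale radius r_s
\rho_{\rm NFW}(r) = \frac{\rho_s}{(r/r_s)\,(1 + r/r_s)^2},
\qquad c \equiv \frac{R_{\rm vir}}{r_s}

% Einasto profile: the extra shape parameter \alpha controls how quickly
% the logarithmic slope steepens with radius
\rho_{\rm Ein}(r) = \rho_s \exp\!\left\{-\frac{2}{\alpha}
  \left[\left(\frac{r}{r_s}\right)^{\alpha} - 1\right]\right\}
```

The point made above is that for massive halos a single concentration value no longer captures the profile: both r_s and the shape parameter α are needed.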
Article
Full-text available
We introduce the Virgo Consortium's EAGLE project, a suite of hydrodynamical simulations that follow the formation of galaxies and black holes in representative volumes. We discuss the limitations of such simulations in light of their finite resolution and poorly constrained subgrid physics, and how these affect their predictive power. One major improvement is our treatment of feedback from massive stars and AGN in which thermal energy is injected into the gas without the need to turn off cooling or hydrodynamical forces, allowing winds to develop without predetermined speed or mass loading factors. Because the feedback efficiencies cannot be predicted from first principles, we calibrate them to the z~0 galaxy stellar mass function and the amplitude of the galaxy-central black hole mass relation, also taking galaxy sizes into account. The observed galaxy mass function is reproduced to $\lesssim 0.2$ dex over the full mass range, $10^8 < M_*/M_\odot \lesssim 10^{11}$, a level of agreement close to that attained by semi-analytic models, and unprecedented for hydrodynamical simulations. We compare our results to a representative set of low-redshift observables not considered in the calibration, and find good agreement with the observed galaxy specific star formation rates, passive fractions, Tully-Fisher relation, total stellar luminosities of galaxy clusters, and column density distributions of intergalactic CIV and OVI. While the mass-metallicity relations for gas and stars are consistent with observations for $M_* \gtrsim 10^9 M_\odot$, they are insufficiently steep at lower masses. The gas fractions and temperatures are too high for clusters of galaxies, but for groups these discrepancies can be resolved by adopting a higher heating temperature in the subgrid prescription for AGN feedback. EAGLE constitutes a valuable new resource for studies of galaxy formation.
Article
Full-text available
We investigate baryon effects on the halo mass function (HMF), with emphasis on the role played by active galactic nuclei (AGN) feedback. Haloes are identified with both friends-of-friends (FoF) and spherical overdensity (SO) algorithms. We embed the standard SO algorithm into a memory-controlled frame program and present the Python spherIcAl Overdensity code – piao (Chinese character: ). For both FoF and SO haloes, the effect of AGN feedback is that of suppressing the HMFs to a level even below that of dark matter (DM) simulations. The ratio between the HMFs in the AGN and in the DM simulations is ∼0.8 at overdensity Δc = 500, a difference that increases at higher overdensity Δc = 2500, with no significant redshift and mass dependence. A decrease of the halo masses ratio with respect to the DM case induces the decrease of the HMF in the AGN simulation. The shallower inner density profiles of haloes in the AGN simulation witnesses that mass reduction is induced by the sudden displacement of gas induced by thermal AGN feedback. We provide fitting functions to describe halo mass variations at different overdensities, which can recover the HMFs with a residual random scatter ≲5 per cent for halo masses larger than 1013 h−1 M⊙.
Article
Full-text available
We introduce the Illustris Project, a series of large-scale hydrodynamical simulations of galaxy formation. The highest resolution simulation, Illustris-1, covers a volume of $(106.5\,{\rm Mpc})^3$, has a dark mass resolution of ${6.26 \times 10^{6}\,{\rm M}_\odot}$, and an initial baryonic matter mass resolution of ${1.26 \times 10^{6}\,{\rm M}_\odot}$. At $z=0$ gravitational forces are softened on scales of $710\,{\rm pc}$, and the smallest hydrodynamical gas cells have an extent of $48\,{\rm pc}$. We follow the dynamical evolution of $2\times 1820^3$ resolution elements and in addition passively evolve $1820^3$ Monte Carlo tracer particles reaching a total particle count of more than $18$ billion. The galaxy formation model includes: primordial and metal-line cooling with self-shielding corrections, stellar evolution, stellar feedback, gas recycling, chemical enrichment, supermassive black hole growth, and feedback from active galactic nuclei. At $z=0$ our simulation volume contains about $40,000$ well-resolved galaxies covering a diverse range of morphologies and colours including early-type, late-type and irregular galaxies. The simulation reproduces the cosmic star formation rate density, the galaxy luminosity function, and baryon conversion efficiency at $z=0$. It also qualitatively captures the impact of galaxy environment on the red fractions of galaxies. The internal velocity structure of selected well-resolved disk galaxies obeys the stellar and baryonic Tully-Fisher relation together with flat circular velocity curves. In the well-resolved regime the simulation reproduces the observed mix of early-type and late-type galaxies. Our model predicts a halo mass dependent impact of baryonic effects on the halo mass function and the masses of haloes caused by feedback from supernova and active galactic nuclei.
Article
Hydrodynamic simulations provide a powerful, but computationally expensive, approach to study the interplay of dark matter and baryons in cosmological structure formation. Here we introduce the EMulating Baryonic EnRichment (EMBER) Deep Learning framework to predict baryon fields based on dark-matter-only simulations thereby reducing computational cost. EMBER comprises two network architectures, U-Net and Wasserstein Generative Adversarial Networks (WGANs), to predict two-dimensional gas and HI densities from dark matter fields. We design the conditional WGANs as stochastic emulators, such that multiple target fields can be sampled from the same dark matter input. For training we combine cosmological volume and zoom-in hydrodynamical simulations from the Feedback in Realistic Environments (FIRE) project to represent a large range of scales. Our fiducial WGAN model reproduces the gas and HI power spectra within 10% accuracy down to ∼10 kpc scales. Furthermore, we investigate the capability of EMBER to predict high resolution baryon fields from low resolution dark matter inputs through upsampling techniques. As a practical application, we use this methodology to emulate high-resolution HI maps for a dark matter simulation of a L = 100 Mpc/h comoving cosmological box. The gas content of dark matter haloes and the HI column density distributions predicted by EMBER agree well with results of large volume cosmological simulations and abundance matching models. Our method provides a computationally efficient, stochastic emulator for augmenting dark matter only simulations with physically consistent maps of baryon fields.
Article
We present the BACCO project, a simulation framework specially designed to provide highly accurate predictions for the distribution of mass, galaxies, and gas as a function of cosmological parameters. In this paper, we describe our main suite of gravity-only simulations (L ∼ 2 Gpc and 4320³ particles) and present various validation tests. Using a cosmology-rescaling technique, we predict the nonlinear mass power spectrum over the redshift range 0 < z < 1.5 and over scales 10⁻² < k/(h Mpc⁻¹) < 5 for 800 points in an 8-dimensional cosmological parameter space. For an efficient interpolation of the results, we build an emulator and compare its predictions against several widely used methods. Over the whole range of scales considered, we expect our predictions to be accurate at the 2 per cent level for parameters in the minimal ΛCDM model and to 3 per cent when extended to dynamical dark energy and massive neutrinos. We make our emulator publicly available at http://www.dipc.org/bacco
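The emulation step described above, predicting a power spectrum at new parameter values by interpolating precomputed spectra, can be sketched in miniature. This is a toy one-parameter, log-space interpolation; the actual BACCO emulator works over an 8-dimensional parameter space, and the `pk_at` spectrum below is a purely illustrative placeholder shape:

```python
import math

def emulate_pk(pk_lo, pk_hi, theta_lo, theta_hi, theta):
    """Toy emulator step: linear interpolation of log P(k) in a single
    cosmological parameter between two precomputed anchor spectra."""
    w = (theta - theta_lo) / (theta_hi - theta_lo)
    return [math.exp((1 - w) * math.log(a) + w * math.log(b))
            for a, b in zip(pk_lo, pk_hi)]

# Hypothetical anchor spectra whose amplitude scales as theta**2
# (a sigma8-like parameter); the shape is purely illustrative.
ks = [0.01, 0.1, 1.0]
pk_at = lambda t: [t ** 2 * 1e4 * k / (1 + (k / 0.05) ** 3) for k in ks]
pk_mid = emulate_pk(pk_at(0.7), pk_at(0.9), 0.7, 0.9, 0.8)
```

At the anchor points the interpolation reproduces the input spectra exactly; in between it returns the geometric mean of the bracketing amplitudes, which is the standard reason emulators interpolate log P(k) rather than P(k).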
Article
Upcoming 21 cm surveys will map the spatial distribution of cosmic neutral hydrogen (HI) over very large cosmological volumes. In order to maximize the scientific return of these surveys, accurate theoretical predictions are needed. Hydrodynamic simulations are currently the most accurate tool to provide those predictions in the mildly to fully nonlinear regime. Unfortunately, their computational cost is very high: tens of millions of CPU hours. We use convolutional neural networks to find the mapping between the spatial distribution of matter from N-body simulations and HI from the state-of-the-art hydrodynamic simulation IllustrisTNG. Our model outperforms the widely used halo occupation distribution model for all statistical properties up to the nonlinear scales k ≲ 1 h Mpc⁻¹. Our method allows the generation of 21 cm mocks over very large cosmological volumes with properties similar to those of hydrodynamic simulations.
Article
We introduce the Uchuu suite of large high-resolution cosmological N-body simulations. The largest simulation, named Uchuu, consists of 2.1 trillion (12800³) dark matter particles in a box of side-length 2.0 h⁻¹ Gpc, with particle mass 3.27 × 10⁸ h⁻¹ M⊙. The highest resolution simulation, Shin-Uchuu, consists of 262 billion (6400³) particles in a box of side-length 140 h⁻¹ Mpc, with particle mass 8.97 × 10⁵ h⁻¹ M⊙. Combining these simulations we can follow the evolution of dark matter halos and subhalos, from those hosting dwarf galaxies to massive galaxy clusters, across an unprecedented volume. In this first paper, we present basic statistics, dark matter power spectra, and the halo and subhalo mass functions, which demonstrate the wide dynamic range and superb statistics of the Uchuu suite. From an analysis of the evolution of the power spectra we conclude that our simulations remain accurate from the Baryon Acoustic Oscillation scale down to very small scales. We also provide parameters of a mass-concentration model, which describes the evolution of halo concentration and reproduces our simulation data to within 5 per cent for halos with masses spanning nearly eight orders of magnitude at redshift 0 ≤ z ≤ 14. There is an upturn in the mass-concentration relation for the population of all halos and of relaxed halos at z ≳ 0.5, whereas no upturn is detected at z < 0.5. We make publicly available various N-body products as part of Uchuu Data Release 1 on the Skies & Universes site. Future releases will include gravitational lensing maps and mock galaxy, X-ray cluster, and active galactic nuclei catalogues.
Article
We present a simulation-based inference framework using a convolutional neural network to infer dynamical masses of galaxy clusters from their observed 3D projected phase-space distribution, which consists of the projected galaxy positions in the sky and their line-of-sight velocities. By formulating the mass estimation problem within this simulation-based inference framework, we are able to quantify the uncertainties on the inferred masses in a straightforward and robust way. We generate a realistic mock catalogue emulating the Sloan Digital Sky Survey (SDSS) Legacy spectroscopic observations (the main galaxy sample) for redshifts z ≲ 0.09 and explicitly illustrate the challenges posed by interloper (non-member) galaxies for cluster mass estimation from actual observations. Our approach constitutes the first optimal machine learning-based exploitation of the information content of the full 3D projected phase-space distribution, including both the virialized and infall cluster regions, for the inference of dynamical cluster masses. We also present, for the first time, the application of a simulation-based inference machinery to obtain dynamical masses of around 800 galaxy clusters found in the SDSS Legacy Survey, and show that the resulting mass estimates are consistent with mass measurements from the literature.
Article
We present an algorithm for inferring the dynamical mass of galaxy clusters directly from their respective phase-space distributions, that is, the observed line-of-sight velocities and projected distances of galaxies from the cluster centre. Our method employs normalizing flows, a deep neural network capable of learning arbitrary high-dimensional probability distributions, and inherently accounts, to an adequate extent, for the presence of interloper galaxies which are not bound to a given cluster, the primary contaminant of dynamical mass measurements. We validate and showcase the performance of our neural flow approach to robustly infer the dynamical mass of clusters from a realistic mock cluster catalogue. A key aspect of our novel algorithm is that it yields the probability density function of the mass of a particular cluster, thereby providing a principled way of quantifying uncertainties, in contrast to conventional machine learning (ML) approaches. The neural network mass predictions, when applied to a contaminated catalogue with interlopers, have a mean overall logarithmic residual scatter of 0.028 dex, with a lognormal scatter of 0.126 dex, which goes down to 0.089 dex for clusters in the intermediate- to high-mass range. This is an improvement by nearly a factor of 4 relative to the classical cluster mass scaling relation with the velocity dispersion, and outperforms recently proposed ML approaches. We also apply our neural flow mass estimator to a compilation of galaxy observations of some well-studied clusters with robust dynamical mass estimates, further substantiating the efficacy of our algorithm.
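The classical velocity-dispersion scaling relation that such neural estimators are benchmarked against can be sketched as a minimal baseline: compute the line-of-sight velocity dispersion of member galaxies and apply a power law M ∝ σ^α. The normalization `A`, pivot `sigma0` and exponent `alpha` below are illustrative virial-scaling values, not the calibration used in the paper:

```python
import random
import statistics

def dynamical_mass_sigma(v_los_kms, A=1.0e15, sigma0=1000.0, alpha=3.0):
    """Classical dynamical-mass baseline: M = A * (sigma/sigma0)**alpha,
    with sigma the line-of-sight velocity dispersion of member galaxies.
    A, sigma0 and alpha are hypothetical illustrative values."""
    sigma = statistics.pstdev(v_los_kms)  # population std. deviation
    return A * (sigma / sigma0) ** alpha

# Mock cluster: 500 members with an 800 km/s dispersion, no interlopers.
random.seed(1)
members = [random.gauss(0.0, 800.0) for _ in range(500)]
m_est = dynamical_mass_sigma(members)  # solar masses
```

A single point estimate like this carries no uncertainty quantification and is badly biased by interlopers, which is precisely the motivation for the probabilistic flow-based approach described above.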
Article
We present the UNIT N-body cosmological simulations project, designed to provide precise predictions for non-linear statistics of the galaxy distribution. We focus on characterizing statistics relevant to emission line and luminous red galaxies in the current and upcoming generation of galaxy surveys. We use a suite of precise particle-mesh simulations (FastPM) as well as full N-body calculations with a mass resolution of ∼1.2 × 10⁹ h⁻¹ M⊙ to investigate the recently suggested technique of Angulo and Pontzen to suppress the variance of cosmological simulations. We study redshift-space distortions, cosmic voids, and higher-order statistics from z = 2 down to z = 0. We find that both two- and three-point statistics are unbiased. Over the scales of interest for baryon acoustic oscillations and redshift-space distortions, we find that the variance is greatly reduced in the two-point statistics and in the cross-correlation between haloes and cosmic voids, but is not reduced significantly for the three-point statistics. We demonstrate that the accuracy of the two-point correlation function for a galaxy survey with an effective volume of 20 (h⁻¹ Gpc)³ is improved by about a factor of 40, indicating that two pairs of simulations with a volume of 1 (h⁻¹ Gpc)³ yield the equivalent variance of ∼150 such simulations. The N-body simulations presented here thus provide an effective survey volume about seven times that of the Dark Energy Spectroscopic Instrument or Euclid. The data from this project, including dark matter fields, halo catalogues, and their clustering statistics, are publicly available.
Article
Using the catalogues of galaxy clusters from The Three Hundred project, modelled with both hydrodynamic simulations (gadget-x and gadget-music), and semi-analytical models (SAMs), we study the scatter and self-similarity of the profiles and distributions of the baryonic components of the clusters: the stellar and gas mass, metallicity, the stellar age, gas temperature, and the (specific) star formation rate. Through comparisons with observational results, we find that the shape and the scatter of the gas density profiles match the observed trends well, including the reduced scatter at large radii, which is a signature of self-similarity suggested in previous studies. One of our simulated sets, gadget-x, reproduces well the shape of the observed temperature profile, while gadget-music has a higher and flatter profile in the cluster centre and a lower and steeper profile at large radii. The gas metallicity profiles from both simulation sets, despite following the observed trend, have a relatively lower normalization. The cumulative stellar density profiles from SAMs are in better agreement with the observed result than both hydrodynamic simulations, which show relatively higher profiles. The scatter in these physical profiles, especially in the cluster centre region, shows a dependence on the cluster dynamical state and on the cool-core/non-cool-core dichotomy. The stellar age, metallicity, and (s)SFR show very large scatter, which are then presented in 2D maps. We also do not find any clear radial dependence of these properties. However, the brightest central galaxies have distinguishable features compared to the properties of the satellite galaxies.
Article
We introduce the simba simulations, the next generation of the mufasa cosmological galaxy formation simulations run with gizmo's meshless finite mass hydrodynamics. simba includes updates to mufasa's sub-resolution star formation and feedback prescriptions, and introduces black hole growth via the torque-limited accretion model of Anglés-Alcázar et al. from cold gas and Bondi accretion from hot gas, along with black hole feedback via kinetic bipolar outflows and X-ray energy. Ejection velocities are taken to be ∼10³ km s⁻¹ at high Eddington ratios, increasing to ∼8000 km s⁻¹ at Eddington ratios below 2 per cent, with a constant momentum input of 20L/c. simba further includes an on-the-fly dust production, growth, and destruction model. Our simba run with (100 h⁻¹ Mpc)³ and 1024³ gas elements reproduces numerous observables, including galaxy stellar mass functions at z = 0−6, the stellar mass–star formation rate main sequence, H I and H₂ fractions, the mass–metallicity relation at z ≈ 0, 2, star-forming galaxy sizes, hot gas fractions in massive haloes, and z = 0 galaxy dust properties. However, simba also yields an insufficiently sharp truncation of the z = 0 mass function, and too-large sizes for low-mass quenched galaxies. We show that simba's jet feedback is primarily responsible for quenching massive galaxies.
Article
We present results from the semi-analytic model of galaxy formation SAG applied on the MultiDark simulation MDPL2. SAG features an updated supernova (SN) feedback scheme and a robust modelling of the environmental effects on satellite galaxies. This incorporates a gradual starvation of the hot gas halo driven by the action of ram pressure stripping (RPS), which can affect the cold gas disc, and tidal stripping (TS), which can act on all baryonic components. Galaxy orbits of orphan satellites are integrated, providing adequate positions and velocities for the estimation of RPS and TS. The star formation history and stellar mass assembly of galaxies are sensitive to the redshift dependence implemented in the SN feedback model. We discuss a variant of our model that allows us to reconcile the predicted star formation rate density at $z \gtrsim 3$ with the observed one, at the expense of an excess in the faint end of the stellar mass function at $z=2$. The fractions of passive galaxies as a function of stellar mass, halo mass and the halo-centric distances are consistent with observational measurements. The model also reproduces the evolution of the main sequence of star-forming central and satellite galaxies. The similarity between them is a result of the gradual starvation of the hot gas halo suffered by satellites, in which RPS plays a dominant role. RPS of the cold gas does not affect the fraction of quenched satellites, but it contributes to reaching the right atomic hydrogen gas content for more massive satellites ($M_{\star}\gtrsim 10^{10}\,{\rm M}_{\odot}$).
Article
Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5, alongside GBDT, RF, SVM, and C4.5 but this performance varies widely across all data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but it is the slowest classifier in both training and testing.
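The evaluation protocol used in such benchmark studies — k-fold cross-validated accuracy per classifier and data set — can be sketched with a toy nearest-centroid classifier on synthetic data. The classifier here is only a stand-in for the real algorithms (GBDT, SVM, RF, etc.), and CPU-time measurement is omitted:

```python
import random
import statistics

def nearest_centroid_fit(X, y):
    """Per-class mean feature vector (the 'model')."""
    cents = {}
    for c in set(y):
        pts = [x for x, yi in zip(X, y) if yi == c]
        cents[c] = [statistics.mean(col) for col in zip(*pts)]
    return cents

def nearest_centroid_predict(cents, X):
    d2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(cents, key=lambda c: d2(x, cents[c])) for x in X]

def kfold_accuracy(X, y, k=5):
    """Mean accuracy over k cross-validation folds — the protocol
    used by benchmark studies of this kind."""
    idx = list(range(len(X)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held = set(fold)
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        model = nearest_centroid_fit(Xtr, ytr)
        pred = nearest_centroid_predict(model, [X[i] for i in fold])
        accs.append(sum(p == y[i] for p, i in zip(pred, fold)) / len(fold))
    return statistics.mean(accs)

# Two well-separated Gaussian blobs in 2-D.
random.seed(0)
X = [[random.gauss(3 * c, 1.0), random.gauss(3 * c, 1.0)]
     for c in (0, 1) for _ in range(100)]
y = [c for c in (0, 1) for _ in range(100)]
acc = kfold_accuracy(X, y)
```

In a full benchmark the same harness would loop over many classifiers and many data sets, also recording training and prediction wall time per fold.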
Article
We introduce the Hydrangea simulations, a suite of 24 cosmological hydrodynamic zoom-in simulations of massive galaxy clusters (M_200c = 10^14-10^15 M_Sun) with baryon particle masses of ~10^6 M_Sun. Designed to study the impact of the cluster environment on galaxy formation, they are a key part of the `Cluster-EAGLE' project (Barnes et al. 2017). They use a galaxy formation model developed for the EAGLE project, which has been shown to yield both realistic field galaxies and hot gas fractions of galaxy groups consistent with observations. The total stellar mass content of the simulated clusters agrees with observations, but central cluster galaxies are too massive, by up to 0.6 dex. Passive satellite fractions are higher than in the field, and at stellar masses Mstar > 10^10 M_Sun this environmental effect is quantitatively consistent with observations. The predicted satellite stellar mass function matches data from local cluster surveys. Normalized to total mass, there are fewer low-mass (Mstar < 10^10 M_Sun) galaxies within the virial radius of clusters than in the field, primarily due to star formation quenching. Conversely, the simulations predict an overabundance of massive galaxies in clusters compared to the field that persists to their far outskirts (> 5r_200c). This is caused by a significantly increased stellar mass fraction of (sub-)haloes in the cluster environment, by up to ~0.3 dex even well beyond r_200c. Haloes near clusters are also more concentrated than equally massive field haloes, but these two effects are largely uncorrelated.
Article
We introduce the Cluster-EAGLE (C-EAGLE) simulation project, a set of cosmological hydrodynamical zoom simulations of the formation of $30$ galaxy clusters in the mass range $10^{14}<M_{200}/\mathrm{M}_{\odot}<10^{15.4}$ that incorporates the Hydrangea sample of Bahé et al. (2017). The simulations adopt the state-of-the-art EAGLE galaxy formation model, with a gas particle mass of $1.8\times10^{6}\,\mathrm{M}_{\odot}$ and physical softening length of $0.7\,\mathrm{kpc}$. In this paper, we introduce the sample and present the low-redshift global properties of the clusters. We calculate the X-ray properties in a manner consistent with observational techniques, demonstrating the bias and scatter introduced by using estimated masses. We find the total stellar content and black hole masses of the clusters to be in good agreement with the observed relations. However, the clusters are too gas rich, suggesting that the AGN feedback model is not efficient enough at expelling gas from the high-redshift progenitors of the clusters. The X-ray properties, such as the spectroscopic temperature and the soft-band luminosity, and the Sunyaev-Zel'dovich properties are in reasonable agreement with the observed relations. However, the clusters have too high central temperatures and larger-than-observed entropy cores, which is likely driven by the AGN feedback after the cluster core has formed. The total metal content and its distribution throughout the ICM are a good match to the observations.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
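The core idea of tree boosting — fitting each new tree to the current residuals and adding it with shrinkage — can be illustrated with regression stumps on a 1-D toy problem. This is a conceptual sketch only; XGBoost layers regularisation, sparsity-aware splitting and the systems-level optimisations described above on top of this loop:

```python
import statistics

def fit_stump(xs, residuals):
    """Best single-split regression stump: exhaustive search over
    thresholds, constant prediction on each side of the split."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def boost(xs, ys, n_trees=100, lr=0.3):
    """Gradient boosting for squared loss: each stump is fit to the
    current residuals and added with learning rate (shrinkage) lr."""
    base = statistics.mean(ys)
    pred = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        res = [y - p for y, p in zip(ys, pred)]
        s = fit_stump(xs, res)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]          # noiseless quadratic target
model = boost(xs, ys)
```

Each round fits the residuals of the ensemble so far, so training error decreases monotonically; the shrinkage factor trades convergence speed for robustness, exactly as in production boosting systems.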
Article
We present the MAssive ClusterS and Intercluster Structures (MACSIS) project, a suite of 390 clusters simulated with baryonic physics that yields realistic massive galaxy clusters capable of matching a wide range of observed properties. MACSIS extends the recent BAHAMAS simulation to higher masses, enabling robust predictions for the redshift evolution of cluster properties and an assessment of the effect of selecting only the hottest systems. We study the observable-total mass and X-ray luminosity-temperature scaling relations across the complete observed cluster mass range, finding the slope of the scaling relations and the evolution of their normalization with redshift to depart significantly from self-similar predictions. This is driven by the impact of AGN feedback, the presence of non-thermal pressure support and biased X-ray temperatures. For a sample of hot clusters with core-excised temperatures $k_{\rm{B}}T\geq5\,\rm{keV}$ the normalization and slope of the observable-mass relations and their evolution are significantly closer to self-similar. The exception is the temperature-mass relation, for which the increased importance of non-thermal pressure support and biased X-ray temperatures leads to a greater departure from self-similarity in the hottest systems. We also demonstrate that this affects the slope and evolution of the normalization in the luminosity-temperature relation. The median hot gas profiles also show good agreement with observational data at $z=0$ and $z=1$, with their evolution again departing significantly from the self-similar prediction. However, selecting a hot sample of clusters yields profiles that evolve significantly closer to the self-similar prediction.
Article
This paper presents cosmological results based on full-mission Planck observations of temperature and polarization anisotropies of the cosmic microwave background (CMB) radiation. Our results are in very good agreement with the 2013 analysis of the Planck nominal-mission temperature data, but with increased precision. The temperature and polarization power spectra are consistent with the standard spatially-flat 6-parameter ΛCDM cosmology with a power-law spectrum of adiabatic scalar perturbations (denoted "base ΛCDM" in this paper). From the Planck temperature data combined with Planck lensing, for this cosmology we find a Hubble constant, H0 = (67.8 ± 0.9) km s⁻¹ Mpc⁻¹, a matter density parameter Ωm = 0.308 ± 0.012, and a tilted scalar spectral index with ns = 0.968 ± 0.006, consistent with the 2013 analysis. Note that in this abstract we quote 68% confidence limits on measured parameters and 95% upper limits on other parameters. We present the first results of polarization measurements with the Low Frequency Instrument at large angular scales. Combined with the Planck temperature and lensing data, these measurements give a reionization optical depth of τ = 0.066 ± 0.016, corresponding to a reionization redshift of $z_{\rm re}=8.8^{+1.7}_{-1.4}$. These results are consistent with those from WMAP polarization measurements cleaned for dust emission using 353-GHz polarization maps from the High Frequency Instrument. We find no evidence for any departure from base ΛCDM in the neutrino sector of the theory; for example, combining Planck observations with other astrophysical data we find Neff = 3.15 ± 0.23 for the effective number of relativistic degrees of freedom, consistent with the value Neff = 3.046 of the Standard Model of particle physics. The sum of neutrino masses is constrained to ∑mν < 0.23 eV. The spatial curvature of our Universe is found to be very close to zero, with |ΩK| < 0.005.
Adding a tensor component as a single-parameter extension to base ΛCDM we find an upper limit on the tensor-to-scalar ratio of r0.002 < 0.11, consistent with the Planck 2013 results and with the B-mode polarization constraints from a joint analysis of BICEP2, Keck Array, and Planck (BKP) data. Adding the BKP B-mode data to our analysis leads to a tighter constraint of r0.002 < 0.09 and disfavours inflationary models with a V(φ) ∝ φ² potential. The addition of Planck polarization data leads to strong constraints on deviations from a purely adiabatic spectrum of fluctuations. We find no evidence for any contribution from isocurvature perturbations or from cosmic defects. Combining Planck data with other astrophysical data, including Type Ia supernovae, the equation of state of dark energy is constrained to w = −1.006 ± 0.045, consistent with the expected value for a cosmological constant. The standard big bang nucleosynthesis predictions for the helium and deuterium abundances for the best-fit Planck base ΛCDM cosmology are in excellent agreement with observations. We also present constraints on annihilating dark matter and on possible deviations from the standard recombination history. In neither case do we find evidence for new physics. The Planck results for base ΛCDM are in good agreement with baryon acoustic oscillation data and with the JLA sample of Type Ia supernovae. However, as in the 2013 analysis, the amplitude of the fluctuation spectrum is found to be higher than inferred from some analyses of rich cluster counts and weak gravitational lensing. We show that these tensions cannot easily be resolved with simple modifications of the base ΛCDM cosmology. Apart from these tensions, the base ΛCDM cosmology provides an excellent description of the Planck CMB observations and many other astrophysical data sets.
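As a small worked example of using such best-fit parameters, the Hubble rate in a flat ΛCDM cosmology follows from H(z) = H0 √(Ωm (1+z)³ + ΩΛ), with ΩΛ = 1 − Ωm since the measured curvature is consistent with zero (radiation and neutrino contributions neglected for simplicity):

```python
import math

def hubble(z, H0=67.8, Om=0.308):
    """H(z) in km/s/Mpc for a flat LCDM cosmology, using the Planck
    2015 best-fit values H0 and Omega_m quoted in the abstract;
    Omega_Lambda = 1 - Omega_m (flatness assumed)."""
    return H0 * math.sqrt(Om * (1 + z) ** 3 + (1 - Om))

h0 = hubble(0.0)  # recovers H0 by construction
h1 = hubble(1.0)  # expansion rate at z = 1
```

At z = 0 the matter and dark-energy terms sum to unity by construction, so the function returns H0 exactly; at higher redshift the (1+z)³ matter term dominates and H(z) grows accordingly.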
Article
We use the cosmo-OWLS suite of cosmological hydrodynamical simulations to investigate the scatter and evolution of the global hot gas properties of large simulated populations of galaxy groups and clusters. Our aim is to compare the predictions of different physical models and to explore the extent to which commonly-adopted assumptions in observational analyses (e.g. self-similar evolution) are violated. We examine the relations between (true) halo mass and the X-ray temperature, X-ray luminosity, gas mass, Sunyaev-Zel'dovich (SZ) flux, the X-ray analogue of the SZ flux ($Y_X$) and the hydrostatic mass. For the most realistic models, which include AGN feedback, the slopes of the various mass-observable relations deviate substantially from the self-similar ones, particularly at late times and for low-mass clusters. The amplitude of the mass-temperature relation shows negative evolution with respect to the self-similar prediction (i.e. slower than the prediction) for all models, driven by an increase in non-thermal pressure support at higher redshifts. The AGN models predict strong positive evolution of the gas mass fractions at low halo masses. The SZ flux and $Y_X$ show positive evolution with respect to self-similarity at low mass but negative evolution at high mass. The scatter about the relations is well approximated by log-normal distributions, with widths that depend mildly on halo mass. The scatter decreases significantly with increasing redshift. The exception is the hydrostatic mass-halo mass relation, for which the scatter increases with redshift. Finally, we discuss the relative merits of various hot gas-based mass proxies.
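The log-normal scatter about a mass-observable relation reported above can be measured by fitting a power law in log space and taking the r.m.s. residual in dex. The sketch below uses synthetic data with a hypothetical T ∝ M^(2/3) relation and 0.1 dex input scatter, purely for illustration:

```python
import math
import random

def logfit_scatter(masses, obs):
    """Least-squares fit log10(obs) = a + b*log10(M) and the r.m.s.
    residual scatter in dex; log-normal scatter about the relation
    corresponds to Gaussian residuals in log10."""
    lx = [math.log10(m) for m in masses]
    ly = [math.log10(o) for o in obs]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(lx, ly)]
    scatter = math.sqrt(sum(r * r for r in resid) / n)
    return a, b, scatter

# Mock T-M relation: T ~ M^(2/3) with 0.1 dex log-normal scatter.
random.seed(2)
masses = [10 ** random.uniform(13.5, 15.0) for _ in range(300)]
temps = [10 ** ((2 / 3) * math.log10(m) - 9 + random.gauss(0, 0.1))
         for m in masses]
a, b, sc = logfit_scatter(masses, temps)
```

The fitted slope `b` recovers the input 2/3 and `sc` the 0.1 dex input scatter; applied per mass or redshift bin, the same fit exposes the mass and redshift dependence of the scatter discussed in the abstract.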
Article
We present and test a method that dramatically reduces variance arising from the sparse sampling of wavemodes in cosmological simulations. The method uses two simulations which are fixed (the initial Fourier mode amplitudes are fixed to the ensemble average power spectrum) and paired (with initial modes exactly out of phase). We measure the power spectrum, monopole and quadrupole redshift-space correlation functions, halo mass function and reduced bispectrum at z = 1. By these measures, predictions from a fixed pair can be as precise on non-linear scales as an average over 50 traditional simulations. The fixing procedure introduces a non-Gaussian correction to the initial conditions; we give an analytic argument showing why the simulations are still able to predict the mean properties of the Gaussian ensemble. We anticipate that the method will drive down the computational time requirements for accurate large-scale explorations of galaxy bias and clustering statistics, enabling more precise comparisons with theoretical models, and facilitating the use of numerical simulations in cosmological data interpretation.
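The fixing and pairing of initial Fourier modes can be sketched directly in Fourier space. This is a 1-D toy with a hypothetical power spectrum; real initial conditions are 3-D fields subject to reality (Hermitian) constraints:

```python
import cmath
import math
import random

def fixed_paired_modes(pk, seed=0):
    """'Fixed' initial Fourier modes: amplitudes set exactly to
    sqrt(P(k)) instead of being Rayleigh-drawn, with random phases.
    The 'paired' realisation has every phase shifted by pi."""
    rng = random.Random(seed)
    modes = [math.sqrt(p) * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
             for p in pk]
    paired = [-m for m in modes]   # e^{i*pi} phase shift
    return modes, paired

pk = [100.0 / (1 + k) for k in range(8)]   # toy 1-D power spectrum
a, b = fixed_paired_modes(pk)
```

By construction the measured power |δ_k|² of the fixed realisation equals P(k) exactly, with zero sampling variance, and the paired modes are exactly out of phase, so leading-order fluctuations cancel when the two runs are averaged.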
Article
This paper describes a new publicly available codebase for modelling galaxy formation in a cosmological context, the "Semi-Analytic Galaxy Evolution" model, or SAGE for short. SAGE is a significant update to that used in Croton et al. (2006) and has been rebuilt to be modular and customisable. The model will run on any N-body simulation whose trees are organised in a supported format and contain a minimum set of basic halo properties. In this work we present the baryonic prescriptions implemented in SAGE to describe the formation and evolution of galaxies, and their calibration for three N-body simulations: Millennium, Bolshoi, and GiggleZ. Updated physics include: gas accretion, ejection due to feedback, and reincorporation via the galactic fountain; a new gas cooling--radio mode active galactic nucleus (AGN) heating cycle; AGN feedback in the quasar mode; a new treatment of gas in satellite galaxies; and galaxy mergers, disruption, and the build-up of intra-cluster stars. Throughout, we show the results of a common default parameterization on each simulation, with a focus on the local galaxy population.
Article
We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris simulation to train and test various sophisticated ML algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, g − r colour, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy–halo connection that is built upon a solid hydrodynamical simulation. The promising reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically robust and physically consistent with Illustris galaxies and follows the same fundamental observational constraints. ML offers an intriguing and promising technique to create quick mock galaxy catalogues in the future.
Article
We calculate the one-point probability density distribution functions (PDF) and the power spectra of the thermal and kinetic Sunyaev-Zeldovich (tSZ and kSZ) effects and the mean Compton Y parameter using the Magneticum Pathfinder simulations, state-of-the-art cosmological hydrodynamical simulations of a large cosmological volume of (896 Mpc/h)³. These simulations follow in detail the thermal and chemical evolution of the intracluster medium as well as the evolution of super-massive black holes and their associated feedback processes. We construct full-sky maps of tSZ and kSZ from the light-cones out to z = 0.17, and one realization of an 8.8 × 8.8 degree wide, deep light-cone out to z = 5.2. The local universe at z < 0.027 is simulated by a constrained realisation. The tail of the one-point PDF of tSZ from the deep light-cone follows a power-law shape with an index of −3.2. Once convolved with the effective beam of Planck, it agrees with the PDF measured by Planck. The predicted tSZ power spectrum agrees with that of the Planck data at all multipoles up to l ~ 1000, once the calculations are scaled to the Planck 2015 cosmological parameters with Ωm = 0.308 and σ8 = 0.8149. Consistent with the results in the literature, however, we continue to find a tSZ power spectrum at l = 3000 that is significantly larger than that estimated from the high-resolution ground-based data. The simulation predicts a mean fluctuating Compton Y value of ⟨Y⟩ = 1.18 × 10⁻⁶ for Ωm = 0.272 and σ8 = 0.809. Nearly half (~5 × 10⁻⁷) of the signal comes from halos below a virial mass of 10¹³ M⊙/h. Scaling this to the Planck 2015 parameters, we find ⟨Y⟩ = 1.57 × 10⁻⁶. The PDF and the power spectrum of kSZ from our simulation agree broadly with previous work.
Article
We present results obtained from a set of cosmological hydrodynamic simulations of galaxy clusters, aimed at comparing predictions with observational data on the diversity between cool-core and non-cool-core clusters. Our simulations include the effects of stellar and AGN feedback and are based on an improved version of the Smoothed-Particle-Hydrodynamics code GADGET-3, which ameliorates gas mixing and better captures gas-dynamical instabilities by including a suitable artificial thermal diffusion. In this Letter, we focus our analysis on the entropy profiles, our primary diagnostic to classify the degree of cool-coreness of clusters, and on the iron profiles. In keeping with observations, our simulated clusters display a variety of behaviors in entropy profiles: they range from steadily decreasing profiles at small radii, characteristic of cool-core systems, to nearly flat isentropic core profiles, characteristic of non-cool-core systems. Using observational criteria to distinguish between the two classes of objects, we find them to occur in similar proportions in simulations and in observations. Furthermore, we also find that simulated cool-core clusters have profiles of iron abundance that are steeper than those of non-cool-core clusters, also in agreement with observational results. We show that the capability of our simulations to generate a realistic cool-core structure in the cluster population is due to AGN feedback and artificial thermal diffusion: their combined action allows the energy extracted from super-massive black holes to be distributed naturally, compensating the radiative losses of low-entropy gas with short cooling time residing in the cluster core.
Book
Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. The text illustrates all parts of the modeling process through many hands-on, real-life examples, and every chapter contains extensive R code for each step of the process. This multi-purpose text can be used as an introduction to predictive models and the overall modeling process, a practitioner's reference handbook, or as a text for advanced undergraduate or graduate level predictive modeling courses. To that end, each chapter contains problem sets to help solidify the covered concepts and uses data available in the book's R package. This text is intended for a broad audience as both an introduction to predictive models as well as a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. © Springer Science+Business Media New York 2013. All rights reserved.
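The modelling workflow the book walks through (data splitting, preprocessing with training-set statistics only, and tuning a model on held-out data) can be sketched in a few lines. The book's own examples are in R; the version below uses Python with a closed-form ridge regression, and all of the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data standing in for a real modelling problem.
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=200)

# Data splitting: hold out a validation set for tuning.
idx = rng.permutation(200)
train, valid = idx[:150], idx[150:]

# Preprocessing: centre and scale using training-set statistics only,
# so no information leaks from the validation set.
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Xt, Xv = (X[train] - mu) / sd, (X[valid] - mu) / sd

def ridge_fit(X, y, lam):
    """Closed-form ridge regression coefficients (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Model tuning: pick the penalty that minimises validation RMSE.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
rmse = []
for lam in lams:
    beta = ridge_fit(Xt, y[train], lam)
    resid = y[valid] - Xv @ beta
    rmse.append(np.sqrt(np.mean(resid ** 2)))
best = lams[int(np.argmin(rmse))]
print(f"best lambda = {best}, validation RMSE = {min(rmse):.3f}")
```

Computing the scaling statistics on the training split alone, rather than on the full dataset, is exactly the kind of preprocessing discipline the book emphasises.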
Article
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
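As a toy illustration of the core computation such codes scale up, the sketch below takes direct-summation gravity steps with a kick-drift-kick leapfrog integrator. This is not HACC's actual algorithm (its force solvers are far more sophisticated and architecture-aware); all particle numbers and parameters here are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy direct-summation N-body setup: n particles, softened gravity.
n, G, soft = 32, 1.0, 0.05
pos = rng.normal(size=(n, 3))
vel = np.zeros((n, 3))
mass = np.full(n, 1.0 / n)

def accel(pos):
    """Softened pairwise gravitational accelerations, O(N^2)."""
    dx = pos[None, :, :] - pos[:, None, :]          # dx[i, j] = r_j - r_i
    r2 = (dx ** 2).sum(axis=-1) + soft ** 2
    np.fill_diagonal(r2, np.inf)                    # exclude self-force
    return G * (mass[None, :, None] * dx / r2[..., None] ** 1.5).sum(axis=1)

# Kick-drift-kick leapfrog: symplectic, time-reversible.
dt = 0.01
for _ in range(100):
    vel += 0.5 * dt * accel(pos)   # kick (half step)
    pos += dt * vel                # drift (full step)
    vel += 0.5 * dt * accel(pos)   # kick (half step)

# Total momentum is conserved because pairwise forces are antisymmetric.
p_tot = (mass[:, None] * vel).sum(axis=0)
print("total momentum:", p_tot)
```

The O(N^2) pairwise sum is the part that production codes replace with particle-mesh and tree-based solvers to reach trillions of particles.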
Article
In recent years, deep neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
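The backpropagation the survey recapitulates can be illustrated by training a one-hidden-layer network on a toy regression task; the short credit assignment path here runs from the loss back through the output weights to the hidden weights. This is a minimal sketch, not code from the survey, and the task and hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression task.
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

# One hidden layer of 8 tanh units, linear output.
W1 = rng.normal(scale=0.5, size=(3, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
lr = 0.05

losses = []
for step in range(500):
    # Forward pass.
    h = np.tanh(X @ W1)
    pred = h @ W2
    losses.append(np.mean((pred - y) ** 2))

    # Backward pass: chain rule applied layer by layer.
    d_pred = 2 * (pred - y) / len(X)
    dW2 = h.T @ d_pred
    d_h = d_pred @ W2.T * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ d_h

    # Gradient-descent update.
    W1 -= lr * dW1
    W2 -= lr * dW2

print(f"initial MSE = {losses[0]:.4f}, final MSE = {losses[-1]:.4f}")
```

In the survey's terms, each weight update assigns credit along a two-link causal chain; deep learners simply extend this chain through many more layers.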