Contributions to log F from our emulator as a function of k for the Planck 2018 cosmology. The line numbers indicated in the legend correspond to the lines of Eq. (6). One sees that the first term provides an overall offset, the second and fourth capture the BAO signal, and the third term contains a broad oscillation and then matches on to the decaying residual at high k.

Source publication
Article
Full-text available
Context. Computing the matter power spectrum, P(k), as a function of cosmological parameters can be prohibitively slow in cosmological analyses, hence emulating this calculation is desirable. Previous analytic approximations are insufficiently accurate for modern applications, so black-box, uninterpretable emulators are often used. Aims. We aim...

Contexts in source publication

Context 1
... cosines with an argument proportional to 1/√(1 + bk²), for some constant b. This functional form (x/√(1 + y²)) arises due to the inclusion of the analytic quotient operator, which also explains why the constant 1 appears multiple times in Eq. (6). These terms give oscillations which vary slowly as a function of k. In particular, as plotted in Fig. 5, the third line of Eq. (6) contains approximately one cycle of oscillation across the range of k considered, with a minimum during the BAO part of the power spectrum, and a maximum just afterwards. Beyond this point, this term fits the non-oscillatory, decaying part of the residual beyond k ∼ 1 h Mpc⁻¹ (compare the middle panel of ...
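To make the shape of these terms concrete, the sketch below evaluates a single cosine of this analytic-quotient form over the relevant k range; the coefficients are illustrative placeholders, not the fitted values of Eq. (6).

```python
import numpy as np

# Illustrative sketch of the slowly varying oscillatory terms discussed above:
# a cosine whose argument contains the analytic-quotient form x/sqrt(1 + y^2).
# The coefficients a, b, c, phi are placeholders, NOT the fitted values of
# Eq. (6) in the source paper.
a, b, c, phi = 0.05, 10.0, 8.0, 0.0

k = np.logspace(-4, 1, 2000)          # wavenumber grid in h/Mpc
term = a * np.cos(c * k / np.sqrt(1.0 + b * k**2) + phi)

# Because k/sqrt(1 + b k^2) saturates at 1/sqrt(b) for large k, the argument
# stops growing and the term flattens rather than oscillating indefinitely,
# which is how one cosine can both oscillate at low k and track a smooth
# residual at high k.
print(f"argument saturates at {c / np.sqrt(b):.2f} rad as k -> infinity")
print(f"term at k=1e-3: {term[np.searchsorted(k, 1e-3)]:+.4f}")
print(f"term at k=10:   {term[-1]:+.4f}")
```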
Context 2
... (6) contains approximately one cycle of oscillation across the range of k considered, with a minimum during the BAO part of the power spectrum, and a maximum just afterwards. Beyond this point, this term fits the non-oscillatory, decaying part of the residual beyond k ∼ 1 h Mpc⁻¹ (compare the middle panel of Fig. 2 to the third term plotted in Fig. ...
Context 3
... frequency of these oscillations is ω ∝ 1/√(b + Ω_b²) for some parameter b, such that cosmologies with a higher fraction of baryons have many more cycles of BAO in a given range of k, as one would physically expect (see Fig. 5). Thus, although we did not enforce physically motivated terms in the equation search, we see that simple oscillatory contributions for the BAOs have emerged, and thus our symbolic emulator is not merely a high-order series expansion, but contains terms which are both compact and interpretable. We find that such terms exist in many functions given in Fig. 3; however, we find that using shorter run times for operon of only 2-4 h (compared to approximately 24 h on a single node of 128 cores for our fiducial analysis) does not provide expressions as interpretable as Eq. ...
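As a minimal worked example of the cycle-count argument, one can convert a frequency ω into the number of BAO cycles completed over a k interval; the constant b below is a placeholder, since its fitted value is not given in this excerpt.

```python
import numpy as np

# Minimal worked example: the number of cycles a term cos(omega * k)
# completes over a range [k_min, k_max] is omega * (k_max - k_min) / (2 pi).
# omega follows the scaling quoted above, with a placeholder value of the
# fitted constant b (not the value used in the source paper).
b = 0.1
Omega_b = 0.049                        # Planck-like baryon density
omega = 1.0 / np.sqrt(b + Omega_b**2)  # quoted scaling, up to a prefactor

k_min, k_max = 1e-2, 1.0               # h/Mpc, roughly the BAO-dominated range
n_cycles = omega * (k_max - k_min) / (2.0 * np.pi)
print(f"omega = {omega:.2f}, cycles over [{k_min}, {k_max}] h/Mpc: {n_cycles:.2f}")
```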

Similar publications

Preprint
Full-text available
We present a method to mitigate the effects of fiber assignment incompleteness in two-point power spectrum and correlation function measurements from galaxy spectroscopic surveys, by truncating small angular scales from estimators. We derive the corresponding modified correlation function and power spectrum windows to account for the small angular...
Preprint
Full-text available
An accurate calculation of the abundance of primordial black holes (PBHs) is crucial for numerous aspects of cosmology. For example, placing constraints on the primordial power spectrum from constraints on the abundance of PBHs (or vice versa), calculating the mass function observable today, or predicting the merger rate of (primordial) b...

Citations

... After the first seminal papers on this topic [5][6][7], several emulators have been produced in the literature, emulating the output of Boltzmann solvers such as CAMB [8] or CLASS [9], with applications ranging from the Cosmic Microwave Background (CMB) [10][11][12][13][14] and the linear matter power spectrum [11,[15][16][17][18][19] to galaxy power spectrum multipoles [17,[19][20][21][22] and the galaxy survey angular power spectrum [23][24][25][26][27][28][29]. ...
... ported to any programming language without loss of precision; second, it greatly speeds up the preprocessing step. Indeed, evaluating the symbolic expression takes only 200 ns, offering a dramatic speedup compared to the ODE solution (see also [18,78,79] for other applications of symbolic regression related to LSS observables). ...
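A sub-microsecond claim of this kind is straightforward to check with a micro-benchmark. The sketch below times a stand-in closed-form expression (not the actual symbolic fit from the citing paper) with Python's timeit; note that interpreter call overhead alone is of order 100 ns, so compiled evaluation is how the quoted 200 ns is plausibly reached.

```python
import timeit
import math

# A stand-in closed-form expression (NOT the actual symbolic fit from the
# citing paper) to illustrate how one benchmarks sub-microsecond evaluation.
def symbolic_growth(a, Om=0.315):
    # Hypothetical compact fitting form: a few flops plus one sqrt and one log.
    return a * math.exp(-0.2 * math.log(1.0 + Om / a**3)) / math.sqrt(1.0 + 0.1 * a)

n = 1_000_000
t = timeit.timeit(lambda: symbolic_growth(0.7), number=n)
print(f"{1e9 * t / n:.0f} ns per evaluation (includes Python call overhead)")
```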
Preprint
Full-text available
We present the official release of the EFfective Field theORy surrogaTe (Effort), a novel and efficient emulator designed for the Effective Field Theory of Large-Scale Structure (EFTofLSS). This tool combines state-of-the-art numerical methods and clever preprocessing strategies to achieve exceptional computational performance without sacrificing accuracy. To validate the emulator's reliability, we compare Bayesian posteriors sampled using Effort via Hamiltonian Monte Carlo methods to the ones sampled using the widely-used pybird code, via the Metropolis-Hastings sampler. On a large-volume set of simulations, and on the BOSS dataset, the comparison confirms excellent agreement, with deviations compatible with Monte Carlo noise. Looking ahead, Effort is poised to analyze next-generation cosmological datasets and to support joint analyses with complementary tools.
... As such, after several iterations, a population of expressions which well approximate the dataset emerges. We choose OPERON over other SR codes due to its strong performance in benchmark tests [109,110] and as it has already been demonstrated to be effective in cosmological and astrophysical studies [111][112][113][114]. ...
Article
Full-text available
New constraints on the expansion rate of the Universe seem to favor evolving dark energy in the form of thawing quintessence models, i.e., models for which a canonical, minimally coupled scalar field has, at late times, begun to evolve away from potential energy domination. We scrutinize the evidence for thawing quintessence by exploring what it predicts for the equation of state. We show that, in terms of the usual Chevallier-Polarski-Linder parameters, (w_0, w_a), thawing quintessence is, in fact, only marginally consistent with a compilation of the current data. Despite this, we embrace the possibility that thawing quintessence is dark energy and find constraints on the microphysics of this scenario. We do so in terms of the effective mass m² and energy scale V_0 of the scalar field potential. We are particularly careful to enforce uninformative, flat priors on these parameters so as to minimize their effect on the final posteriors. While the current data favors a large and negative value of m², when we compare these models to the standard ΛCDM model we find that there is scant evidence for thawing quintessence. Published by the American Physical Society 2024
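For reference, the Chevallier-Polarski-Linder parametrization used in this abstract is the standard two-parameter equation of state w(a) = w_0 + w_a(1 − a); the sketch below evaluates it for an illustrative (w_0, w_a), not a fitted result.

```python
import numpy as np

# The Chevallier-Polarski-Linder (CPL) parametrization referenced above:
# w(a) = w0 + wa * (1 - a), with a = 1/(1 + z) the scale factor.
# The cosmological constant corresponds to (w0, wa) = (-1, 0); thawing
# quintessence starts near w = -1 and evolves away at late times.
def w_cpl(a, w0, wa):
    return w0 + wa * (1.0 - a)

z = np.array([0.0, 0.5, 1.0, 2.0])
a = 1.0 / (1.0 + z)
print(w_cpl(a, w0=-0.9, wa=-0.6))   # illustrative (w0, wa), not a fit result
```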
... As such, after several iterations, a population of expressions which well approximate the dataset emerges. We choose operon over other SR codes due to its strong performance in benchmark tests [103,104] and as it has already been demonstrated to be effective in cosmological and astrophysical studies [105][106][107][108]. ...
Preprint
New constraints on the expansion rate of the Universe seem to favor evolving dark energy in the form of thawing quintessence models, i.e., models for which a canonical, minimally coupled scalar field has, at late times, begun to evolve away from potential energy domination. We scrutinize the evidence for thawing quintessence by exploring what it predicts for the equation of state. We show that, in terms of the usual Chevallier-Polarski-Linder parameters, (w_0, w_a), thawing quintessence is, in fact, only marginally consistent with a compilation of the current data. Despite this, we embrace the possibility that thawing quintessence is dark energy and find constraints on the microphysics of this scenario. We do so in terms of the effective mass m² and energy scale V_0 of the scalar field potential. We are particularly careful to enforce uninformative, flat priors on these parameters so as to minimize their effect on the final posteriors. While the current data favors a large and negative value of m², when we compare these models to the standard ΛCDM model we find that there is scant evidence for thawing quintessence.
... Top: linear (black dashed) and nonlinear (blue) matter power spectrum at z = 0.7. The nonlinear power spectrum is obtained with the syren-halofit package (Bartlett et al. 2024b, 2024a). Bottom: the fractional difference between the linear and nonlinear power spectrum. ...
Article
Full-text available
Line intensity mapping (LIM) has emerged as a promising tool for probing the 3D large-scale structure through the aggregate emission of spectral lines. The presence of interloper lines poses a crucial challenge in extracting the signal from the target line in LIM. In this work, we introduce a novel method for LIM analysis that simultaneously extracts line signals from multiple spectral lines, utilizing the covariance of native LIM data elements defined in the spectral–angular space. We leverage correlated information from different lines to perform joint inference on all lines simultaneously, employing a Bayesian analysis framework. We present the formalism, demonstrate our technique with a mock survey setup resembling the SPHEREx deep-field observation, and consider four spectral lines within the SPHEREx spectral coverage in the near-infrared: Hα, [O III], Hβ, and [O II]. We demonstrate that our method can extract the power spectrum of all four lines at the ≳10σ level at z < 2. For the brightest line, Hα, the 10σ sensitivity can be achieved out to z ∼ 3. Our technique offers a flexible framework for LIM analysis, enabling simultaneous inference of signals from multiple line emissions while accommodating diverse modeling constraints and parameterizations.
... While deriving a complete analytical description of P(k) from first principles remains unattainable, semianalytical formulations have proven effective in approximating P(k) [56,57]. The widely utilized "analytical emulator" for the P(k) of the concordance model is the Eisenstein-Hu (EH) formula, established around three decades ago [19]. The EH formula, rooted in well-understood physical phenomena, comprises approximately 30 expressions in its complete form, rendering it highly complex (see Appendix A). ...
Preprint
Current and future large-scale structure surveys aim to constrain cosmological parameters with unprecedented precision by analyzing vast amounts of data. This imposes a pressing need to develop fast and accurate methods for computing the matter power spectrum P(k), or equivalently, the matter transfer function T(k). In previous works, we introduced precise fitting formulas for these quantities within the standard cosmological model, including extensions such as the presence of massive neutrinos and modifications of gravity. However, these formulations overlooked a key characteristic imprinted in P(k): the baryon acoustic oscillation signal. Here, we leverage our understanding of the well-known physics behind this oscillatory pattern to impose constraints on our genetic algorithm, a machine learning technique. By employing this "physics-informed" approach, we introduce an expression that accurately describes the matter transfer function with sub-percent mean accuracy. The high interpretability of the output allows for straightforward extensions of this formulation to other scenarios involving massive neutrinos and modifications of gravity. We anticipate that this formula will serve as a competitive fitting function for P(k), meeting the accuracy requirements essential for cosmological analyses.
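A toy illustration of what such a physics-informed ansatz can look like is a smooth broadband shape multiplied by a damped sinusoid pinned to the sound-horizon scale; the functional forms and numbers below are placeholders for illustration, not the expression derived in the paper.

```python
import numpy as np

# Toy illustration of a "physics-informed" ansatz for the transfer function:
# a smooth broadband shape multiplied by a damped sinusoid whose frequency is
# pinned to the sound horizon r_s. The shapes and numbers below are
# illustrative placeholders, not the expression derived in the paper.
r_s = 147.0 * 0.674      # sound horizon ~147 Mpc expressed in Mpc/h (illustrative)
k_d = 0.2                # damping scale in h/Mpc (illustrative)

def T_toy(k, A=0.05):
    broadband = 1.0 / (1.0 + (k / 0.02)**2)              # smooth no-wiggle part
    wiggles = 1.0 + A * np.sin(k * r_s) * np.exp(-(k / k_d)**2)
    return broadband * wiggles

k = np.logspace(-3, 0, 500)
T = T_toy(k)
print(f"T(k) spans {T.min():.2e} .. {T.max():.2e} over the grid")
print(f"BAO wiggle spacing ~ {2.0 * np.pi / r_s:.3f} h/Mpc")
```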
... It is therefore a significant challenge when we meet a physical system, observable, or an experimental dataset that we cannot describe through mathematical formulas. This problem arises in various places in astrophysics and cosmology where, for instance, the precise halo mass function is unknown [1], stellar density profiles are estimated based on various approximations but the true underlying relation is yet to be discovered [2], a derived expression for the non-linear matter power spectrum does not yet exist [3,4], and intricacies of general relativity leave useful expressions for observables such as the mean redshift drift [5,6] still waiting to be found (the concept of mean redshift drift will be introduced in section 2). When a ground truth expression for a quantity is desired but seems unfeasible to obtain through analytical considerations, one option is to try to use symbolic regression, which is a regression method based on machine learning (ML). ...
... For instance, the work in [5,6] utilized the symbolic regression algorithm AI Feynman [7,8] because of its user-friendliness and because it was developed by physicists with discovering physics formulae in mind, showing an improvement compared to earlier state-of-the-art algorithms. The work presented in [3,4] instead utilized OPERON [9] which was chosen due to its speed, memory efficiency and because it is based on genetic algorithms which tend to do well in symbolic regression benchmark studies and competitions (e.g. [10,11]). ...
... The expression for the redshift drift given in the previous section is valid only for FLRW spacetimes, i.e. in spacetimes which have no structures on any scales. Standard cosmology is based on the assumption that the large-scale dynamics of the Universe can be approximated well by the FLRW models, but it has also been discussed whether the structures on smaller scales can affect the large-scale/averaged dynamics. (Footnote 3: We could equally well have written out the Hubble parameter in terms of Ω_m,0, z, and H_0 as when we considered the Hubble parameter, but since the difference between the targets in the datasets with the Hubble parameter versus the redshift drift would be very small, we expect the symbolic regression algorithms would perform similarly to how they do on the Hubble datasets. Footnote 4: Remember that for flat, single-component universes, H(z) = ...) ...
Preprint
We present a benchmark study of ten symbolic regression algorithms applied to cosmological datasets. We find that the dimension of the feature space as well as the precision of datasets are highly important for symbolic regression tasks to be successful. We find no indication that inter-dependence of features in datasets is particularly important, meaning that it is not an issue if datasets e.g. contain both z and H(z) as features. We find no indication that performance of algorithms on standardized datasets is a good indication of performance on cosmological datasets. This suggests that it is not necessarily prudent to choose which symbolic regression algorithm to use based on its performance on standardized data. Instead, a more prudent approach is to consider a variety of algorithms. Overall, we find that most of the benched algorithms do rather poorly in the benchmark and suggest possible ways to proceed with developing algorithms that will be better at identifying ground truth expressions for cosmological datasets. As part of this publication we introduce our benchmark algorithm cp3-bench which we make publicly available at https://github.com/CP3-Origins/cp3-bench. The philosophy behind cp3-bench is that it should be as user-friendly as possible, available in a ready-to-use format, and allow for easy additions of new algorithms and datasets.
... However, these symbolic expressions are insufficiently accurate for modern uses. This led Bartlett et al. (2024) to propose a simple extension to the Eisenstein & Hu (1998) expressions which gives a root mean squared fractional error of just 0.2% across a wide range of cosmologies, which is more than sufficient for current analyses (Taylor et al. 2018). ...
... We produce simple analytic expressions (Eqs. (25)–(27)) for all variables appearing in the halofit model so that, when coupled with the linear matter power spectrum approximation of Bartlett et al. (2024), halofit can be evaluated without performing any integrals, dramatically increasing its speed by a factor of 2350. ...
... and found an expression for F which was able to produce a subpercent-level approximation to P_L(k; a, θ) for a range of cosmologies. Throughout this paper, when referring to this fitting formula, we mean the 'fiducial' model given in Bartlett et al. (2024) as opposed to the more accurate yet less interpretable model given in the appendix of that paper. Extending this to the nonlinear case requires addressing three differences compared to the linear theory prediction: 1. P(k; a, θ) depends on the scale factor, a, in a non-trivial manner. ...
Article
Full-text available
Context. Rapid and accurate evaluation of the nonlinear matter power spectrum, P(k), as a function of cosmological parameters and redshift is of fundamental importance in cosmology. Analytic approximations provide an interpretable solution, yet current approximations are neither fast nor accurate relative to numerical emulators. Aims. We aim to accelerate symbolic approximations to P(k) by removing the requirement to perform integrals, instead using short symbolic expressions to compute all variables of interest. We also wish to make such expressions more accurate by re-optimising the parameters of these models (using a larger number of cosmologies and focussing on cosmological parameters of more interest for present-day studies) and providing correction terms. Methods. We use symbolic regression to obtain simple analytic approximations to the nonlinear scale, k_σ, the effective spectral index, n_eff, and the curvature, C, which are required for the HALOFIT model. We then re-optimise the coefficients of HALOFIT to fit a wide range of cosmologies and redshifts. We then again exploit symbolic regression to explore the space of analytic expressions to fit the residuals between P(k) and the optimised predictions of HALOFIT. Our results are designed to match the predictions of EUCLIDEMULATOR2, but we validate our methods against N-body simulations. Results. We find symbolic expressions for k_σ, n_eff and C which have root mean squared fractional errors of 0.8%, 0.2% and 0.3%, respectively, for redshifts below 3 and a wide range of cosmologies. We provide re-optimised HALOFIT parameters, which reduce the root mean squared fractional error (compared to EUCLIDEMULATOR2) from 3% to below 2% for wavenumbers k = 9 × 10⁻³ − 9 h Mpc⁻¹. We introduce SYREN-HALOFIT (symbolic-regression-enhanced HALOFIT), an extension to HALOFIT containing a short symbolic correction which improves this error to 1%. Our method is 2350 and 3170 times faster than current HALOFIT and HMCODE implementations, respectively, and 2680 and 64 times faster than EUCLIDEMULATOR2 (which requires running CLASS) and the BACCO emulator. We obtain comparable accuracy to EUCLIDEMULATOR2 and the BACCO emulator when tested on N-body simulations. Conclusions. Our work greatly increases the speed and accuracy of symbolic approximations to P(k), making them significantly faster than their numerical counterparts without loss of accuracy.
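For context, the quantities k_σ, n_eff and C are conventionally defined through integrals over the linear spectrum, which is the slow step the symbolic expressions remove. The sketch below computes them numerically from a toy spectrum (a placeholder for a real P_L from a Boltzmann code), following the standard halofit definitions; the printed numbers are illustrative only.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import brentq

# The halofit inputs that the symbolic expressions replace are defined via the
# Gaussian-filtered variance of the linear spectrum,
#   sigma^2(R) = \int dlnk Delta^2_L(k) exp(-k^2 R^2),
# with k_sigma set by sigma^2(1/k_sigma) = 1, n_eff = -3 - dln(sigma^2)/dlnR,
# and C = -d^2 ln(sigma^2)/dlnR^2, both evaluated at R = 1/k_sigma.
# Delta2 below is a TOY spectrum (placeholder for a real P_L from a Boltzmann
# code), so the printed numbers are illustrative only.
lnk = np.linspace(np.log(1e-4), np.log(1e3), 4000)
k = np.exp(lnk)
Delta2 = 10.0 * k**4 / (1.0 + (k / 0.2)**2.5)   # toy dimensionless spectrum

def sigma2(R):
    return trapezoid(Delta2 * np.exp(-(k * R)**2), lnk)

# k_sigma: root of sigma^2(1/k_sigma) = 1 -- the slow step a symbolic fit avoids.
k_sigma = 1.0 / brentq(lambda R: sigma2(R) - 1.0, 1e-3, 1e2)

# n_eff and C from centred finite differences of ln(sigma^2) in lnR.
h, lnR = 1e-3, np.log(1.0 / k_sigma)
s = [np.log(sigma2(np.exp(lnR + d))) for d in (-h, 0.0, h)]
n_eff = -3.0 - (s[2] - s[0]) / (2.0 * h)
C = -(s[2] - 2.0 * s[1] + s[0]) / h**2
print(f"k_sigma = {k_sigma:.3f} h/Mpc, n_eff = {n_eff:.3f}, C = {C:.3f}")
```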
Article
Full-text available
We introduce cp3-bench, a tool for comparing/benching symbolic regression algorithms, which we make publicly available at https://github.com/CP3-Origins/cp3-bench. In its current format, cp3-bench includes 12 different symbolic regression algorithms which can be automatically installed as part of cp3-bench. The philosophy behind cp3-bench is that it should be as user-friendly as possible, available in a ready-to-use format, and allow for easy additions of new algorithms and datasets. Our hope is that users of symbolic regression algorithms can use cp3-bench to easily install and compare/bench an array of symbolic regression algorithms to better decide which algorithms to use for their specific tasks at hand. To introduce and motivate the use of cp3-bench we present a small benchmark of 12 symbolic regression algorithms applied to 28 datasets representing six different cosmological and astroparticle physics setups. Overall, we find that most of the benched algorithms do rather poorly in the benchmark and suggest possible ways to proceed with developing algorithms that will be better at identifying ground truth expressions for cosmological and astroparticle physics datasets. Our demonstration benchmark specifically studies the significance of the dimensionality of the feature space and the precision of datasets. We find both to be highly important for symbolic regression tasks to be successful. On the other hand, we find no indication that inter-dependence of features in datasets is particularly important, meaning that it is not in general a hindrance for symbolic regression algorithms if datasets e.g. contain both z and H(z) as features. Lastly, we note that we find no indication that performance of algorithms on standardized datasets is a good indicator of performance on particular cosmological and astrophysical datasets. This suggests that it is not necessarily prudent to choose symbolic regression algorithms based on their performance on standardized data. Instead, a more robust approach is to consider a variety of algorithms, chosen based on the particular task at hand that one wishes to apply symbolic regression to.
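To illustrate the plug-in pattern the abstract describes (register algorithms and datasets, run every pairing, record errors), here is a minimal generic harness; it is not the actual cp3-bench interface, for which see the repository linked above.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch of a plug-in benchmark harness: register algorithms and
# datasets, run every pairing, record errors. This is NOT the actual cp3-bench
# interface; see the repository linked above for that.
@dataclass
class Dataset:
    name: str
    X: np.ndarray       # features, shape (n_samples, n_features)
    y: np.ndarray       # target values

@dataclass
class Algorithm:
    name: str
    fit_predict: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]

def bench(algorithms, datasets):
    results = {}
    for ds in datasets:
        split = len(ds.y) // 2                      # naive train/test split
        for alg in algorithms:
            pred = alg.fit_predict(ds.X[:split], ds.y[:split], ds.X[split:])
            rmse = float(np.sqrt(np.mean((pred - ds.y[split:]) ** 2)))
            results[(alg.name, ds.name)] = rmse
    return results

# Toy usage: a mean-predictor baseline on a Hubble-like dataset H(z).
z = np.linspace(0.0, 2.0, 100).reshape(-1, 1)
H = 70.0 * np.sqrt(0.3 * (1.0 + z[:, 0]) ** 3 + 0.7)
baseline = Algorithm("mean", lambda Xtr, ytr, Xte: np.full(len(Xte), ytr.mean()))
print(bench([baseline], [Dataset("hubble", z, H)]))
```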
Article
In many astrophysical systems, photons interact with matter through thermal Comptonization. In these cases, under certain simplifying assumptions, the evolution of the photon spectrum is described by an energy diffusion equation such as the Kompaneets equation, having dependencies on the seed photon temperature, θ_i, the electron temperature, θ_e, and the Compton y-parameter. The resulting steady-state spectrum is characterized by the average photon energy and the Compton temperature, which both lack analytical dependencies on the initial parameters. Here, we present empirical relations of these two quantities as functions of θ_i, θ_e, and y, obtained by evaluating the steady-state solution of the Kompaneets equation accounting for energy diffusion and electron recoil. The relations have average fractional errors of ∼1 per cent across a wide range of the initial parameters, which makes them useful in numerical applications.
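A limiting-case check that any such empirical relation should reproduce: for y ≪ 1 and photon energies far below θ_e (recoil negligible), the mean photon energy is amplified by the standard unsaturated-Comptonization factor e^{4y}. The snippet below evaluates that limit; it is not the paper's fitted relation.

```python
import math

# A limiting-case sanity check (not the paper's fitted relations): for small
# Compton y and photon energies far below the electron temperature, repeated
# scatterings amplify the mean photon energy by exp(4y), the standard
# unsaturated-Comptonization result.
def mean_energy_amplification(y):
    return math.exp(4.0 * y)

for y in (0.01, 0.1, 0.5):
    print(f"y = {y}: <E_f>/<E_i> ~ {mean_energy_amplification(y):.3f}")
```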
Article
Given the growth in the variety and precision of astronomical datasets of interest for cosmology, the best cosmological constraints are invariably obtained by combining data from different experiments. At the likelihood level, one complication in doing so is the need to marginalise over large-dimensional parameter models describing the data of each experiment. These include both the relatively small number of cosmological parameters of interest and a large number of ‘nuisance' parameters. Sampling over the joint parameter space for multiple experiments can thus become a very computationally expensive operation. This can be significantly simplified if one could sample directly from the marginal cosmological posterior distribution of preceding experiments, depending only on the common set of cosmological parameters. We show that this can be achieved by emulating marginal posterior distributions via normalising flows. The resulting trained normalising flow models can be used to efficiently combine cosmological constraints from independent datasets without increasing the dimensionality of the parameter space under study. The method is able to accurately describe the posterior distribution of real cosmological datasets, as well as the joint distribution of different datasets, even when significant tension exists between experiments. The resulting joint constraints can be obtained in a fraction of the time it would take to combine the same datasets at the level of their likelihoods. We construct normalising flow models for a set of public cosmological datasets of general interest and make them available, together with the software used to train them, and to exploit them in cosmological parameter inference.
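The combination logic is simple to sketch: assuming a flat prior, the joint posterior is proportional to the product of the individual marginal posteriors, so one can draw from the emulated density of one experiment and importance-weight by the emulated density of the other. The sketch below deliberately swaps the paper's normalising flows for a scikit-learn Gaussian mixture as the density emulator; the mock samples and all numbers are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# The paper emulates marginal posteriors with normalising flows; here a
# Gaussian mixture stands in as the density emulator (a deliberate
# simplification) to show the combination logic. Assuming a flat prior,
# p_joint(theta) is proportional to p_A(theta) * p_B(theta), so we draw from
# the emulator of A and importance-weight by the emulated density of B.
rng = np.random.default_rng(0)

# Mock posterior samples in (Omega_m, sigma_8) from two "experiments".
samples_A = rng.multivariate_normal([0.31, 0.82], [[4e-4, 1e-4], [1e-4, 4e-4]], 5000)
samples_B = rng.multivariate_normal([0.33, 0.80], [[9e-4, -2e-4], [-2e-4, 4e-4]], 5000)

gmm_A = GaussianMixture(n_components=4, random_state=0).fit(samples_A)
gmm_B = GaussianMixture(n_components=4, random_state=0).fit(samples_B)

draws = gmm_A.sample(20000)[0]                      # proposal: emulator of A
logw = gmm_B.score_samples(draws)                   # reweight by emulator of B
w = np.exp(logw - logw.max())
mean_joint = np.average(draws, axis=0, weights=w)
print("joint posterior mean (Omega_m, sigma_8):", np.round(mean_joint, 4))
```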