Gabriel Kronberger’s research while affiliated with University of Applied Sciences Upper Austria and other places


Publications (169)


Figure 1: (a) Illustrative example of an e-graph and (b) the same e-graph after inserting the expression x + 2x.
Hyperparameters of the symbolic regression algorithms. Operators enclosed in |.| apply the absolute value to the first argument. The population size for PySR is 1/10 of the values reported in this table to allow the use of ten islands.
Improving Genetic Programming for Symbolic Regression with Equality Graphs
  • Preprint
  • File available

January 2025

Gabriel Kronberger

The search for symbolic regression models with genetic programming (GP) tends to revisit expressions in their original or equivalent forms. Repeatedly evaluating equivalent expressions is inefficient, as it does not immediately lead to better solutions. However, evolutionary algorithms require diversity and should allow the accumulation of inactive building blocks that can play an important role at a later point. The equality graph (e-graph) is a data structure capable of compactly storing expressions and their equivalent forms, allowing an efficient check of whether an expression has been visited in any of its stored equivalent forms. We exploit the e-graph to adapt the subtree operators to reduce the chances of revisiting expressions. Our adaptation, called eggp, stores every visited expression in the e-graph, allowing us to filter out from the available selection of subtrees all the combinations that would create already visited expressions. Results show that, for small expressions, this approach improves the performance of a simple GP algorithm enough to compete with PySR and Operon without increasing computational cost. As a highlight, eggp was capable of reliably delivering short and, at the same time, accurate models for a selected set of benchmarks from SRBench and a set of real-world datasets.
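The idea of checking whether an expression was already visited in any equivalent form can be illustrated with a toy e-graph. The following is only a minimal sketch and not the eggp implementation: the class `EGraph`, its methods, and the nested-tuple expression encoding are hypothetical names chosen for illustration, and equivalences are declared by hand with `merge` instead of being derived from rewrite rules through equality saturation.

```python
class EGraph:
    """Toy e-graph: union-find over e-class ids plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, one per e-class id
        self.hashcons = {}  # canonical e-node (op, child ids...) -> e-class id

    def find(self, a):
        # Follow parent pointers to the representative, halving the path.
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def _canon(self, node):
        op, *children = node
        return (op, *(self.find(c) for c in children))

    def add(self, expr):
        """Insert a nested-tuple expression and return its e-class id."""
        if isinstance(expr, tuple):
            op, *args = expr
            node = (op, *(self.add(a) for a in args))
        else:
            node = (expr,)  # leaf: variable name or constant
        node = self._canon(node)
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def lookup(self, expr):
        """Return the e-class id of expr if it is stored, else None (read-only)."""
        if isinstance(expr, tuple):
            op, *args = expr
            ids = [self.lookup(a) for a in args]
            if any(i is None for i in ids):
                return None
            node = (op, *ids)
        else:
            node = (expr,)
        cid = self.hashcons.get(self._canon(node))
        return None if cid is None else self.find(cid)

    def merge(self, a, b):
        """Declare two e-classes equal and restore congruence."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self._rebuild()
        return self.find(a)

    def _rebuild(self):
        # Re-canonicalize all e-nodes until no further classes collapse.
        changed = True
        while changed:
            changed = False
            new = {}
            for node, cid in self.hashcons.items():
                node, cid = self._canon(node), self.find(cid)
                other = new.get(node)
                if other is None:
                    new[node] = cid
                elif self.find(other) != cid:
                    self.parent[cid] = self.find(other)
                    changed = True
            self.hashcons = new


eg = EGraph()
a = eg.add(("+", "x", ("*", 2, "x")))  # x + 2x
b = eg.add(("*", 3, "x"))              # 3x
eg.merge(a, b)                         # assumption: equivalence supplied by hand

visited = eg.lookup(("*", 3, "x")) is not None
same = eg.lookup(("+", "x", ("*", 2, "x"))) == eg.lookup(("*", 3, "x"))
unseen = eg.lookup(("+", "x", 1)) is None
```

In a GP setting, a subtree operator could call something like `lookup` on a candidate offspring and discard it when the result is not `None`. How eggp actually performs this filtering is not specified in the abstract.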

Number of possible patterns for trees with different numbers of levels.
rEGGression: an Interactive and Agnostic Tool for the Exploration of Symbolic Regression Models

January 2025

Regression analysis is used for prediction and to understand the effect of independent variables on dependent variables. Symbolic regression (SR) automates the search for non-linear regression models, delivering a set of hypotheses that balances accuracy with the possibility of understanding the phenomena. Many SR implementations return a Pareto front, allowing the choice of the best trade-off. However, this hides alternatives that are close to non-domination, limiting these choices. Equality graphs (e-graphs) can represent large sets of expressions compactly by efficiently handling duplicated parts occurring in multiple expressions. E-graphs make it possible to efficiently store and query all SR solution candidates visited in one or multiple GP runs, and open the possibility of analysing much larger sets of SR solution candidates. We introduce rEGGression, a tool that uses e-graphs to enable the exploration of a large set of symbolic expressions and provides querying, filtering, and pattern matching features, creating an interactive experience to gain insights about SR models. The main highlight is its focus on the exploration of the building blocks found during the search, which can help experts to find insights about the studied phenomena. This is possible by exploiting the pattern matching capability of the e-graph data structure.



Figure 7. Relation between median bias and median model size over all problems.
Distribution of the average RMSE values and corresponding ranks per problem, broken down by noise level σ. The left value in each cell is the median, the right value the interquartile range. The right column contains the values outlined in Figures 3 and 4.
Distribution of the average model size and corresponding ranks per problem, broken down by noise level σ. The left value in each cell is the median, the right value the interquartile range. The right column contains the values outlined in Figures 3 and 4.
Distribution of the average bias and corresponding ranks per problem, broken down by noise level σ. The left value in each cell is the median, the right value the interquartile range. The right column contains the values outlined in Figures 5 and 6.
Distribution of the average variance and corresponding ranks per problem, broken down by noise level σ. The left value in each cell is the median, the right value the interquartile range. The right column contains the values outlined in Figures 5 and 6.
Bias and Variance Analysis of Contemporary Symbolic Regression Methods

November 2024

Applied Sciences

Symbolic regression is commonly used in domains where both high accuracy and interpretability of models are required. While symbolic regression is capable of producing highly accurate models, small changes in the training data might cause highly dissimilar solutions. The implications in practice are huge, as interpretability, a key selling feature, degrades when minor changes in data cause substantially different behavior of models. We analyse those perturbations caused by changes in training data for ten contemporary symbolic regression algorithms. We analyse existing machine learning models from the SRBench benchmark suite, a benchmark that compares the accuracy of several symbolic regression algorithms. We measure the bias and variance of algorithms and show how algorithms like Operon and GP-GOMEA return highly accurate models with similar behavior despite changes in training data. Our results highlight that larger model sizes do not imply different behavior when training data change. On the contrary, larger models effectively prevent systematic errors. We also show how other algorithms like ITEA or AIFeynman, with the declared goal of producing consistent results, live up to their expectation of small and similar models.
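The notions of bias and variance used above can be made concrete with a small resampling experiment. This is a minimal sketch under stated assumptions, not the protocol of the paper: a straight-line least-squares fit (`fit_line`, a hypothetical stand-in) replaces a symbolic regression algorithm, the target function and noise level are invented for illustration, and the helper `bias_variance` is a hypothetical name.

```python
import math
import random


def fit_line(xs, ys):
    """Stand-in learner: closed-form least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def bias_variance(fit, target, x_test, n_runs=30, n_train=40, noise=0.1, seed=0):
    """Estimate squared bias, variance and MSE w.r.t. the noise-free target."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_runs):
        # Each run draws a fresh noisy training set and refits the model.
        xs = [rng.uniform(0.0, 3.0) for _ in range(n_train)]
        ys = [target(x) + rng.gauss(0.0, noise) for x in xs]
        model = fit(xs, ys)
        preds.append([model(x) for x in x_test])

    m = len(x_test)
    truth = [target(x) for x in x_test]
    mean_pred = [sum(p[j] for p in preds) / n_runs for j in range(m)]

    # Squared bias: how far the average prediction is from the target.
    bias_sq = sum((mean_pred[j] - truth[j]) ** 2 for j in range(m)) / m
    # Variance: how much predictions scatter around their own average.
    variance = sum(
        sum((p[j] - mean_pred[j]) ** 2 for p in preds) / n_runs for j in range(m)
    ) / m
    # Mean squared error against the noise-free target.
    mse = sum(
        sum((p[j] - truth[j]) ** 2 for p in preds) / n_runs for j in range(m)
    ) / m
    return bias_sq, variance, mse


x_test = [0.1 * i for i in range(31)]
bias_sq, variance, mse = bias_variance(fit_line, math.sin, x_test)
```

With these definitions the error against the noise-free target splits exactly into `bias_sq + variance`, so a method can be accurate on average (low bias) while still returning very different models from run to run (high variance), which is the behavior the abstract is concerned with.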








Citations (50)


... This is possible with the use of e-graph data structure [43] created for the equality saturation algorithm, a technique used to alleviate the phase ordering problem in the optimization of computer programs during the compilation process. This technique was previously used in the context of symbolic regression in [11,24] to investigate the problem of unwanted overparameterization that increases the chance of a suboptimal fitting of the numerical parameters. The generated e-graph has another interesting feature that can be exploited by symbolic regression algorithms: it contains a database of patterns and equivalent expressions that can be easily matched against a new candidate expression. ...

Reference:

rEGGression: an Interactive and Agnostic Tool for the Exploration of Symbolic Regression Models
Effects of Reducing Redundant Parameters in Parameter Optimization for Symbolic Regression using Genetic Programming
  • Citing Article
  • December 2024

Journal of Symbolic Computation

... Both operations were performed six times, and each candidate model produced was evaluated for consistency at a sample of domain locations. We did not investigate if there was some optimum number with respect to overall run time or if the preferential search strategy made a significant impact on an appropriate measure of success rate, as discussed in Kronberger et al. (2024). However, Garbrecht et al. (2021b) showed that the preferential search strategy can improve overall run time when the evaluation and selection steps performed in crossover and mutation are faster than the main evaluation and selection step, assuming it beneficially guides the algorithm's search. ...

The Inefficiency of Genetic Programming for Symbolic Regression
  • Citing Chapter
  • September 2024

... In contrast to classical regression methods, like linear and polynomial regression, an advantage of symbolic regression is that neither the model structure nor its parameters have to be pre-defined. Additionally, highly non-linear relationships can be expressed and the generated models can be easily manipulated and transformed into any expert system [63]. Symbolic regression involves finding the best model structure and its coefficients simultaneously. ...

Evolutionary Computation and Genetic Programming
  • Citing Chapter
  • July 2024

... After the first seminal papers on this topic [5][6][7], several emulators have been produced in the literature, emulating the output of Boltzmann solvers such as CAMB [8] or CLASS [9], with applications ranging from the Cosmic Microwave Background (CMB) [10][11][12][13][14], the linear matter power spectrum [11,[15][16][17][18][19], galaxy power spectrum multipoles [17,[19][20][21][22], and the galaxy survey angular power spectrum [23][24][25][26][27][28][29]. ...

A precise symbolic emulator of the linear matter power spectrum

Astronomy and Astrophysics

... where $f(k, \cdot) \in C(\mathbb{R}, \mathbb{R})$ for each $k \in [1, N]$, $L$ is a Jacobi operator given by $Lu_k = a_k u_{k+1} + a_{k-1} u_{k-1} + b_k u_k$, $\{a_k\}$ and $\{b_k\}$ are real-valued sequences, $\omega \in \mathbb{R}$, and $\lambda$ is a positive real parameter. Difference equations have played a crucial role in various fields of modern science, including computer science, biology and engineering applications in physical problems [1][2][3][4][5][6]. Therefore, studying difference equations is essential. ...

Learning Difference Equations With Structured Grammatical Evolution for Postprandial Glycaemia Prediction

IEEE Journal of Biomedical and Health Informatics

... This is possible with the use of e-graph data structure [43] created for the equality saturation algorithm, a technique used to alleviate the phase ordering problem in the optimization of computer programs during the compilation process. This technique was previously used in the context of symbolic regression in [11,24] to investigate the problem of unwanted overparameterization that increases the chance of a suboptimal fitting of the numerical parameters. The generated e-graph has another interesting feature that can be exploited by symbolic regression algorithms: it contains a database of patterns and equivalent expressions that can be easily matched against a new candidate expression. ...

Reducing Overparameterization of Symbolic Regression Models with Equality Saturation
  • Citing Conference Paper
  • July 2023

... It exploits the observation that the majority of dynamical systems exhibit a limited number of significant terms. This method has been utilized in various applications such as deducing biological models (Mangan et al., 2016), simulating and optimizing microalgal and cyanobacterial photo-production processes (Zhang et al., 2020), reconstructing chaotic and stochastic dynamical systems (Nguyen et al., 2020), physics-informed learning (Corbetta, 2020), modeling a biological reactor (Lisci et al., 2021), identifying the governing model of COVID-19 (Ihsan, 2021), predicting blood glucose levels (Joedicke et al., 2022), modeling air pollutants (Rubio-Herrero et al., 2022), identifying digital twin systems (Wang et al., 2023), determining water distribution systems (Moazeni and Khazaei, 2023), and modeling bacterial zinc response (Sandoz et al., 2023). ...

Identifying Differential Equations for the Prediction of Blood Glucose using Sparse Identification of Nonlinear Systems
  • Citing Chapter
  • February 2023

Lecture Notes in Computer Science

... Depending on how the shape constraints are implemented into the expression search, they can either provide additional guidance or even reduce the search space. These benefits have already been shown in several studies, including [11,7,8,6,3,12,18,19,14]. Altogether, the expressions obtained have greater utility and reliability when they obey the domain-specific shape constraints. ...

Shape-Constrained Symbolic Regression with NSGA-III
  • Citing Chapter
  • February 2023

Lecture Notes in Computer Science

... However, this approach is practically viable only if the number of selected features remains low, such as below ten, as the search algorithm is still being performed with GP. Alternative methods include the fast function extraction [32,33] and differentiable GP approach [34], which aim to address the scalability challenges of SR but are not NN-based. ...

Symbolic Regression with Fast Function Extraction and Nonlinear Least Squares Optimization
  • Citing Chapter
  • February 2023

Lecture Notes in Computer Science