Abstract
Fast Function Extraction (FFX) is a deterministic algorithm for solving symbolic regression problems. We improve the accuracy of FFX by adding parameters to the arguments of nonlinear functions. Instead of only optimizing linear parameters, we optimize these additional nonlinear parameters with separable nonlinear least squares optimization using a variable projection algorithm. Both FFX and our new algorithm are applied to the PennML benchmark suite. We show that the proposed extensions of FFX lead to higher accuracy while providing models of similar length, with only a small increase in runtime on the given data. Our results are compared to a large set of regression methods that have already been published for this benchmark suite.
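The core idea is separability: for any fixed values of the nonlinear parameters inside the basis functions, the linear weights can be solved exactly by ordinary least squares, so only the nonlinear parameters need iterative optimization. Below is a minimal sketch of that separation (not the authors' implementation; the model, data, and parameter names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy data for a basis function with a nonlinear inner parameter:
# y = w0 + w1 * exp(a * x), linear in (w0, w1), nonlinear in a.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 2.0 + 3.0 * np.exp(0.7 * x) + rng.normal(0, 0.05, x.size)

def linear_fit(a):
    """For fixed a the model is linear in (w0, w1): solve it exactly."""
    Phi = np.column_stack([np.ones_like(x), np.exp(a * x)])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    r = y - Phi @ w
    return w, r @ r

# The outer loop optimizes only the nonlinear parameter; the linear
# weights are "projected out" by the inner least-squares solve.
res = minimize_scalar(lambda a: linear_fit(a)[1], bounds=(0.1, 2.0), method="bounded")
w, sse = linear_fit(res.x)
print(f"a = {res.x:.3f}, w = {w}, SSE = {sse:.5f}")
```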
In this paper we analyze the effects of using nonlinear least squares for parameter identification of symbolic regression models and integrate it as a local search mechanism in tree-based genetic programming. We employ the Levenberg–Marquardt algorithm for parameter optimization and calculate gradients via automatic differentiation. We provide examples where the parameter identification succeeds and fails, and highlight its computational overhead. Using an extensive suite of symbolic regression benchmark problems, we demonstrate the increased performance achieved by incorporating nonlinear least squares within genetic programming. Our results are compared with recently published results obtained by several genetic programming variants and state-of-the-art machine learning algorithms. Genetic programming with nonlinear least squares performs among the best on the defined benchmark suite, and the local search can be easily integrated into different genetic programming algorithms as long as only differentiable functions are used within the models.
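As a rough illustration of the local-search step described above, the sketch below fits the numeric parameters of a fixed model structure with scipy's Levenberg–Marquardt driver. The model and its hand-coded Jacobian are hypothetical; the paper obtains gradients via automatic differentiation instead:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 100)
y = 1.5 * x + 2.0 * np.sin(1.3 * x) + rng.normal(0, 0.1, x.size)

def residuals(theta):
    """Residuals of the fixed model structure t0*x + t1*sin(t2*x)."""
    t0, t1, t2 = theta
    return t0 * x + t1 * np.sin(t2 * x) - y

def jacobian(theta):
    """Hand-coded Jacobian of the residuals with respect to theta;
    the paper derives these gradients by automatic differentiation."""
    t0, t1, t2 = theta
    return np.column_stack([x, np.sin(t2 * x), t1 * x * np.cos(t2 * x)])

fit = least_squares(residuals, x0=[1.0, 1.0, 1.0], jac=jacobian, method="lm")
print(fit.x)  # should be close to (1.5, 2.0, 1.3)
```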
Symbolic regression tries to find a mathematical expression that describes the relationship between a set of explanatory variables and a measured variable. The main objective is to find a model that minimizes the error and, optionally, also minimizes the expression size. A smaller expression can be seen as more interpretable and thus as a more reliable decision model. Symbolic regression is often performed with genetic programming, which represents its solutions as expression trees. The shortcoming of this approach lies in this representation, which defines a rugged search space containing expressions of any size and complexity; this makes it challenging to find the optimal solution under computational constraints. This paper introduces a new data structure, called Interaction-Transformation (IT), that constrains the search space so as to exclude a region of larger and more complicated expressions. To test this data structure, a heuristic called SymTree is also introduced. The obtained results show evidence that SymTree is capable of obtaining the optimal solution whenever the target function is within the search space of the IT data structure, and of obtaining competitive results when it is not. Overall, the algorithm found a good compromise between accuracy and simplicity for all the generated models.
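Going by the description above, an IT expression is a weighted sum of transformation functions applied to polynomial interactions of the input variables. Here is a minimal sketch of evaluating such an expression (the concrete terms and helper names are illustrative, not taken from the paper):

```python
import numpy as np

# A hypothetical IT expression over two variables:
#   f(x) = 3.0 * id(x0^2 * x1) + 1.5 * cos(x0 * x1^-1)
# Each term: (weight, transformation function, integer exponents).
it_terms = [
    (3.0, lambda z: z, np.array([2, 1])),
    (1.5, np.cos,      np.array([1, -1])),
]

def eval_it(terms, X):
    """Evaluate an IT expression: a weighted sum of transformations
    applied to polynomial interactions of the input variables."""
    out = np.zeros(X.shape[0])
    for weight, transform, exponents in terms:
        interaction = np.prod(X ** exponents, axis=1)
        out += weight * transform(interaction)
    return out

X = np.array([[1.0, 2.0], [2.0, 0.5]])
print(eval_it(it_terms, X))
```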
Symbolic regression is a common application for genetic programming (GP). This paper presents a new non-evolutionary technique for symbolic regression that, compared to competent GP approaches on real-world problems, is orders of magnitude faster (taking just seconds), returns simpler models, has comparable or better prediction on unseen data, and converges reliably and deterministically. I dub the approach FFX, for Fast Function Extraction. FFX uses a recently developed machine learning technique, pathwise regularized learning, to rapidly prune a huge set of candidate basis functions down to compact models. FFX is verified on a broad set of real-world problems having 13 to 1468 input variables, outperforming GP as well as several state-of-the-art regression techniques.
Keywords: technology, symbolic regression, genetic programming, pathwise regularization, real-world problems, machine learning, lasso, ridge regression, elastic net, integrated circuits
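The pathwise step can be approximated with off-the-shelf tools: expand the inputs into a library of candidate basis functions, then trace an elastic-net regularization path and read off the sparse models along it. A toy sketch with scikit-learn follows (the basis library here is far smaller than what FFX actually generates):

```python
import numpy as np
from sklearn.linear_model import enet_path

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, (200, 3))
y = 1.0 + 2.0 * X[:, 0] ** 2 - 0.5 * np.abs(X[:, 1]) + rng.normal(0, 0.05, 200)

# Expand the raw inputs into a (here, tiny) library of candidate basis
# functions; FFX builds a far larger library, including interactions
# and rational bases.
names, columns = [], []
for j in range(X.shape[1]):
    for label, f in [("x", lambda v: v), ("x^2", np.square), ("abs(x)", np.abs)]:
        names.append(label.replace("x", f"x{j}"))
        columns.append(f(X[:, j]))
B = np.column_stack(columns)

# enet_path fits no intercept, so center the data first.
B = B - B.mean(axis=0)
yc = y - y.mean()

# Pathwise elastic-net fit: sweep the regularization strength and keep
# the sparse coefficient vectors found along the path.
alphas, coefs, _ = enet_path(B, yc, l1_ratio=0.95)
for a, c in zip(alphas[::25], coefs.T[::25]):
    active = [n for n, w in zip(names, c) if abs(w) > 1e-8]
    print(f"alpha={a:.4f}: {len(active)} active bases: {active}")
```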
Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge of the model structure is available. Such scenarios often require specific model properties such as interpretability, robustness, trustworthiness and plausibility, which are not easily achievable using standard approaches like genetic programming for symbolic regression. In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues. The algorithm uses a context-free grammar to produce models that are parameterized by a nonlinear least squares local optimization procedure. A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions. Enumeration order is established via heuristics designed to improve search efficiency. Empirical tests on a comprehensive benchmark suite show that our approach is competitive with genetic programming on many noiseless problems while maintaining desirable properties such as simple, reliable models and reproducibility.
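One way to picture the enumeration-with-caching idea is to generate expressions level by level from a tiny grammar and fingerprint each candidate by its outputs on sample points, so that semantically equivalent structures are detected and skipped. A simplified sketch under these assumptions (the grammar and fingerprinting scheme are illustrative, not the chapter's actual implementation):

```python
import itertools
import numpy as np

# Sample points whose outputs serve as a semantic fingerprint.
x = np.linspace(1.5, 3.0, 16)

def fingerprint(values):
    """Round and hash the outputs on the sample points; expressions with
    equal fingerprints are treated as semantically equivalent."""
    return tuple(np.round(values, 10))

def enumerate_expressions(rounds):
    """Enumerate expressions from a tiny grammar, E -> x | log(E) | E + E,
    level by level, caching fingerprints to skip semantic duplicates
    (e.g. (x + log(x)) and (log(x) + x) are enumerated only once)."""
    pool = [("x", x)]
    seen = {fingerprint(x): "x"}
    for _ in range(rounds):
        candidates = [(f"log({e})", np.log(v)) for e, v in pool]
        for (e1, v1), (e2, v2) in itertools.product(pool, repeat=2):
            candidates.append((f"({e1} + {e2})", v1 + v2))
        for e, v in candidates:
            key = fingerprint(v)
            if key not in seen:  # cache miss => genuinely new model
                seen[key] = e
                pool.append((e, v))
    return list(seen.values())

print(enumerate_expressions(2))
```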
Separable nonlinear least squares (SNLLS) problems arise frequently in many research fields, such as system identification and machine learning. The variable projection (VP) method is a very powerful tool for solving such problems. In this paper, we consider the regularization of ill-conditioned SNLLS problems based on the VP method. Selecting an appropriate regularization parameter is difficult because of the nonlinear optimization procedure. We propose to determine the regularization parameter using the weighted generalized cross validation (WGCV) method at every iteration. This makes the original objective function change during the optimization procedure. To circumvent this problem, we use an inequality to enforce a consistent decrease across successive iterations. The approximation of the Jacobian of the regularized problem is also discussed. The proposed regularized VP algorithm is tested on parameter estimation problems for several statistical models. Numerical results demonstrate the effectiveness of the proposed algorithm.
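For the linear subproblem at each VP iteration, a GCV-style criterion can be evaluated cheaply from an SVD of the design matrix. The sketch below selects a Tikhonov parameter by (weighted) GCV over a grid; it shows only this selection step, not the full regularized VP iteration from the paper, and the weight handling is a simplified assumption:

```python
import numpy as np

def wgcv_ridge(A, y, lambdas, omega=1.0):
    """Select a Tikhonov parameter for min ||A w - y||^2 + lam^2 ||w||^2
    by (weighted) generalized cross validation; omega = 1 recovers
    plain GCV, omega != 1 gives the weighted variant."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ y
    out_of_range = y @ y - beta @ beta  # residual part outside range(A)
    n = len(y)
    best_score, best_lam = np.inf, None
    for lam in lambdas:
        filt = s**2 / (s**2 + lam**2)   # Tikhonov filter factors
        r_in = (1.0 - filt) * beta
        score = n * (r_in @ r_in + out_of_range) / (n - omega * filt.sum()) ** 2
        if score < best_score:
            best_score, best_lam = score, lam
    return best_lam

# Hypothetical ill-conditioned subproblem, as would arise inside one
# VP iteration.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 10))
A[:, 1] = A[:, 0] + 1e-7 * rng.normal(size=50)  # near-collinear columns
y = A @ np.ones(10) + 0.01 * rng.normal(size=50)
print(wgcv_ridge(A, y, np.logspace(-8, 1, 60)))
```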
Nonlinear least squares problems frequently arise for which the variables to be solved for can be separated into a linear and a nonlinear part. A variable projection algorithm has been developed recently which is designed to take advantage of the structure of a problem whose variables separate in this way. This paper gives a slightly more efficient and slightly more general version of this algorithm than has appeared earlier.
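In the variable projection formulation, the linear coefficients are eliminated by an inner least-squares solve, leaving a reduced objective in the nonlinear variables only. The sketch below illustrates that reduced functional on a two-exponential toy problem; for brevity it uses a derivative-free optimizer rather than the Gauss-Newton-type iteration with an approximate Jacobian that the paper describes:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
t = np.linspace(0, 3, 60)
# Two-exponential toy signal: linear in (c1, c2), nonlinear in (a1, a2).
y = 4.0 * np.exp(-0.8 * t) + 1.0 * np.exp(-2.5 * t) + rng.normal(0, 0.02, t.size)

def design(alpha):
    """Columns of the linear part, as functions of the nonlinear alpha."""
    return np.column_stack([np.exp(-alpha[0] * t), np.exp(-alpha[1] * t)])

def reduced_objective(alpha):
    """Variable projection: eliminate the linear coefficients by an inner
    least-squares solve and return the squared projected residual."""
    Phi = design(alpha)
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    r = y - Phi @ c
    return r @ r

fit = minimize(reduced_objective, x0=np.array([0.5, 3.0]), method="Nelder-Mead")
c, *_ = np.linalg.lstsq(design(fit.x), y, rcond=None)
print("nonlinear:", fit.x, "linear:", c)
```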