Conference Paper

Dynamics of Predictability and Variable Influences Identified in Financial Data Using Sliding Window Machine Learning


Abstract

In this paper we analyze the dynamics of predictability and variable interactions in financial data from the years 2007–2014. Using a sliding window approach, we generate mathematical prediction models for various financial parameters from the other parameters available in this data set. For each variable we identify the relevance of the other variables for modeling it. By applying sliding window machine learning, we can identify changes in the predictability of financial variables, as well as in their influence factors, by comparing modeling results generated for different periods within these 8 years. We see changes in the relationships and in the predictability of financial variables over these years, which is consistent with the fact that relationships and dynamics in the financial sector have changed significantly over the last decade. Still, our results show that predictability has not decreased for all financial variables; in numerous cases the prediction quality has even improved.
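
The sliding window idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not from the paper: the stand-in "model" is a univariate least-squares fit scored by R^2, the function names `r_squared` and `sliding_window_relevance` are hypothetical, and the data are synthetic. The abstract does not state which modeling methods or window sizes the authors actually used.

```python
from statistics import mean


def r_squared(xs, ys):
    """R^2 of a univariate least-squares fit (squared Pearson correlation)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no explanatory information
    return (sxy * sxy) / (sxx * syy)


def sliding_window_relevance(data, target, window, step):
    """Score every non-target variable against the target in each window.

    data:   dict mapping variable name -> list of floats (equal lengths)
    target: name of the variable to be modeled
    Returns a list of (start_index, {variable: r2}) tuples, one per window.
    """
    n = len(data[target])
    results = []
    for start in range(0, n - window + 1, step):
        stop = start + window
        ys = data[target][start:stop]
        scores = {
            name: r_squared(values[start:stop], ys)
            for name, values in data.items()
            if name != target
        }
        results.append((start, scores))
    return results


# Synthetic example (assumption): "y" follows "a" in the first period
# and "b" in the second, so the relevance of the inputs shifts over time.
demo = {
    "a": [1.0, 2.0, 3.0, 4.0, 1.0, 3.0, 2.0, 4.0],
    "b": [2.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0, 1.0, 2.0, 3.0, 4.0],
}
for start, scores in sliding_window_relevance(demo, "y", window=4, step=4):
    print(start, scores)
```

Comparing the per-window scores shows which inputs gain or lose explanatory power from one period to the next, which is the kind of comparison across periods that the abstract describes.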

... In the work of Winkler et al. (2015) a sliding window machine learning approach is proposed to analyze changes of predictability and variable relationships over time for a set of financial data, in order to gain insights concerning a complex real-world system's dynamics. With the same objective Zenisek et al. (2020) present an algorithm to evaluate variable interaction networks on streaming data in a sliding window fashion, which is illustrated in Figure 2. The VIN evaluation approach for analyzing streaming data is an algorithm which can be performed subsequent to the network modelling approach on which it relies. ...
Conference Paper
With the growing use of machine learning models in many critical domains, research on making these models, as well as their predictions, more explainable has intensified in the last few years. In this paper, we present extensions to the machine learning based data mining technique Variable Interaction Networks (VIN) that integrate existing domain knowledge and thus enable more meaningful analysis. Several tests on data from a case study concerned with long-term monitored photovoltaic systems verify the feasibility of our approach to provide valuable, human-interpretable insights. In particular, we show the successful application of root-cause detection in scenarios with changing system conditions.
Book
Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications discusses algorithmic developments in the context of genetic algorithms (GAs) and genetic programming (GP). It applies the algorithms to significant combinatorial optimization problems and describes structure identification using HeuristicLab as a platform for algorithm development. The book focuses on both theoretical and empirical aspects. The theoretical sections explore the important and characteristic properties of the basic GA as well as main characteristics of the selected algorithmic extensions developed by the authors. In the empirical parts of the text, the authors apply GAs to two combinatorial optimization problems: the traveling salesman and capacitated vehicle routing problems. To highlight the properties of the algorithmic measures in the field of GP, they analyze GP-based nonlinear structure identification applied to time series and classification problems. Written by core members of the HeuristicLab team, this book provides a better understanding of the basic workflow of GAs and GP, encouraging readers to establish new bionic, problem-independent theoretical concepts. By comparing the results of standard GA and GP implementation with several algorithmic extensions, it also shows how to substantially increase achievable solution quality.
Chapter
Many optimization problems cannot be solved by classical mathematical optimization techniques due to their complexity and the size of the solution space. In order to achieve solutions of high quality though, heuristic optimization algorithms are frequently used. These algorithms do not claim to find global optimal solutions, but offer a reasonable tradeoff between runtime and solution quality and are therefore especially suitable for practical applications. In the last decades the success of heuristic optimization techniques in many different problem domains encouraged the development of a broad variety of optimization paradigms which often use natural processes as a source of inspiration (as for example evolutionary algorithms, simulated annealing, or ant colony optimization). For the development and application of heuristic optimization algorithms in science and industry, mature, flexible and usable software systems are required. These systems have to support scientists in the development of new algorithms and should also enable users to apply different optimization methods on specific problems easily. The architecture and design of such heuristic optimization software systems impose many challenges on developers due to the diversity of algorithms and problems as well as the heterogeneous requirements of the different user groups. In this chapter the authors describe the architecture and design of their optimization environment HeuristicLab which aims to provide a comprehensive system for algorithm development, testing, analysis and generally the application of heuristic optimization methods on complex problems.
Article
In this paper we describe the identification of variable interaction networks in a medical data set. The main goal is to generate mathematical models for standard blood parameters as well as tumor markers using other available parameters in this data set. For each variable we identify those variables that are most relevant for modeling it; relevance of a variable can in this context be defined via the frequency of its occurrence in models identified by evolutionary machine learning methods or via the decrease in modeling quality after removing it from the data set. Several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected tumor markers and cancer diagnoses: Linear regression and support vector machines (optimized using evolutionary algorithms) as well as genetic programming.
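
The two relevance definitions named in the abstract above (frequency of occurrence in identified models, and decrease in modeling quality after removing a variable) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: models are represented simply as sets of variable names, the `quality` callback is a hypothetical stand-in for training and evaluating a model, and the variable names and contribution values are invented.

```python
from collections import Counter


def frequency_relevance(models):
    """Fraction of models in which each variable occurs.

    models: iterable of collections of variable names, one per model.
    """
    models = [set(m) for m in models]
    counts = Counter(v for m in models for v in m)
    return {v: c / len(models) for v, c in counts.items()}


def removal_relevance(variables, quality):
    """Quality drop caused by removing each variable in turn.

    quality: callable taking a tuple of variable names and returning a
             score where higher is better (assumed to retrain a model).
    """
    baseline = quality(tuple(variables))
    return {
        v: baseline - quality(tuple(u for u in variables if u != v))
        for v in variables
    }


# Hypothetical models found by some learning method.
found_models = [{"x1", "x2"}, {"x1"}, {"x1", "x3"}, {"x2"}]
print(frequency_relevance(found_models))

# Toy quality function (assumption): each variable adds a fixed amount.
contribution = {"x1": 0.5, "x2": 0.3, "x3": 0.0}
toy_quality = lambda used: sum(contribution[v] for v in used)
print(removal_relevance(["x1", "x2", "x3"], toy_quality))
```

In practice the second measure is far more expensive, since every removed variable requires a fresh modeling run, whereas the frequency measure reuses models that have already been identified.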
Article
In this paper, we present an ensemble modeling approach for sentiment analysis using machine learning algorithms. The main goal of sentiment analysis is to develop estimators that are able to identify the sentiment orientation (positive, negative, or neutral) of sentences found in any arbitrary source. The novel approach presented here relies on the analysis of the words found in sentences and the formation of large sets of heterogeneous models, i.e., binary as well as multi-class classification models that are calculated by various machine learning methods; these models shall represent the relationship between the presence of given words (or combinations of words) and sentiments. All models trained during the learning phase are applied during the test phase, and the final sentiment assessment is annotated with a confidence value that specifies how reliable the models are regarding the presented decision. In the empirical part of this paper, we show results achieved using a German corpus of Amazon reviews and a set of machine learning methods (decision trees and adaptive boosting, Gaussian processes, random forests, k-nearest neighbor classification, support vector machines and artificial neural networks with evolutionary feature and parameter optimization, and genetic programming). Using a heterogeneous model ensemble learning approach that combines multi-class classifiers as well as binary classifiers, the classification accuracy can be increased significantly and the ratio of totally wrongly classified samples (i.e., those that are assigned to the completely opposite sentiment orientation) can be decreased significantly.
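
One simple way to attach a confidence value to an ensemble decision is sketched below. The abstract above does not say how the confidence is computed, so this sketch assumes plain majority voting with confidence defined as the share of models that agree with the winning label; the function name `ensemble_vote` is hypothetical.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Majority vote over per-model labels.

    predictions: non-empty list of labels, one per ensemble member.
    Returns (label, confidence), where confidence is the fraction of
    models that voted for the returned label. Ties are resolved in
    favor of the label that appears first in the list.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Five hypothetical ensemble members judging one sentence.
label, confidence = ensemble_vote(
    ["positive", "positive", "neutral", "positive", "negative"]
)
print(label, confidence)
```

A low confidence value flags sentences on which the models disagree, so such assessments can be treated with more caution than unanimous ones.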
Article
Written by one of the preeminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, economics, philosophy, cognitive science, and the health and social sciences. Judea Pearl presents and unifies the probabilistic, manipulative, counterfactual, and structural approaches to causation and devises simple mathematical tools for studying the relationships between causal connections and statistical associations. The book will open the way for including causal analysis in the standard curricula of statistics, artificial intelligence, business, epidemiology, social sciences, and economics. Students in these fields will find natural models, simple inferential procedures, and precise mathematical definitions of causal concepts that traditional texts have evaded or made unduly complicated. The first edition of Causality has led to a paradigmatic change in the way that causality is treated in statistics, philosophy, computer science, social science, and economics. Cited in more than 5,000 scientific publications, it continues to liberate scientists from the traditional molds of statistical thinking. In this revised edition, Judea Pearl elucidates thorny issues, answers readers’ questions, and offers a panoramic view of recent advances in this field of research. Causality will be of interest to students and professionals in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable.
Conference Paper
This contribution describes how symbolic regression can be used for knowledge discovery with the open-source software HeuristicLab. HeuristicLab includes a large set of algorithms and problems for combinatorial optimization and for regression and classification, including symbolic regression with genetic programming. It provides a rich GUI to analyze and compare algorithms and identified models. This contribution mainly focuses on specific aspects of symbolic regression that are unique to HeuristicLab, in particular, the identification of relevant variables and model simplification.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
Article
This article describes the architecture and implementation of the genetic programming (GP) framework of HeuristicLab. In particular, we focus on the core design goals, namely extensibility, usability, and performance optimization, and explain our approach to reaching these goals. The overall design and the encoding, interpretation, and evaluation of programs are described, and code examples are given to explain core aspects of the framework. HeuristicLab is available as open source software at http://dev.heuristiclab.com.
Evolutionary System Identification: Modern Concepts and Practical Applications
  • S Winkler
Genetic programming of an algorithmic chemistry Ann Arbor
  • W Banzhaf
  • C O Lasarczyk
  • U Reilly
  • T Yu
  • R Riolo
  • B Worzel