The Forecasting Toolbox of the MATLAB-Toolbox SciXMiner - Short manual

Jorge Ángel González Ordiano, Ralf Mikut
Karlsruhe Institute of Technology (KIT), Institute for Automation and Applied Informatics
P.O. Box 3640, D-76021 Karlsruhe, Germany
Phone: ++49/721/608-25731, Fax: ++49/721/608-25702
Email: ralf.mikut@kit.edu
Beta version: Version 2020a (24.02.2020)
Contents
1 Motivation
2 Installation
3 General remarks
3.1 Getting started
3.2 Variables
3.2.1 regr_single
3.2.2 cdf_bestfit
3.2.3 scenarioForecast
3.3 Demo Projects
3.4 Use cases
3.5 Versions
4 Menu items and Control elements
4.1 Menu items ’Forecasting’
4.1.1 Forecasting Model (Regression)
4.1.2 Interval Forecast
4.1.3 Scenario Forecast
4.1.4 Parametric Probabilistic Forecast
4.1.5 Hierarchical Probabilistic Forecast
4.1.6 Validate Forecasting Model
4.1.7 Help
4.2 Control elements for ’Forecasting’
5 Plugins
Bibliography
1 Motivation
This forecasting toolbox offers different time series forecasting methods with a focus on the special
needs of energy time series [7]. A short description of the toolbox can be found in [5]; the quantile
forecasting methodology is explained in [5, 6]. This is a preliminary beta version that will be improved
in the future.
The present contribution is supported by the Helmholtz Association under the Joint Initiative ”Energy
System 2050 — A Contribution of the Research Field Energy”.
SciXMiner [9] is open source software. The download page is
http://sourceforge.net/projects/SciXMiner/.
It is licensed under the conditions of the GNU General Public License (GNU-GPL) of The Free Software
Foundation (see http://www.fsf.org/).
This manual is organized as follows: Chapter 2 explains the installation procedure. Chapter 3 outlines the
implemented functionality, followed by some recommendations for working with the toolbox SciXMiner.
Detailed information on the use of menu items and control elements follows in Chapter 4.
2 Installation
The zipped toolbox has to be extracted in the directory application_specials of the SciXMiner
directory. The subdirectory structure must be preserved, leading to a new subdirectory forecasting.
The extension packages can be switched on and off by Extras - Choose application-specific extension
packages....
The toolbox requires at least SciXMiner Version 2020a (24.02.2020).
3 General remarks
The forecasting toolbox offers a range of time series forecasting methods, including point,
probabilistic, and scenario forecasting approaches. It also provides options designed specifically
for forecasting energy time series.
3.1 Getting started
The forecasting toolbox can be used on any SciXMiner project containing time series. To understand
how to create SciXMiner projects, please refer to the SciXMiner related documentation.
3.2 Variables
The forecasting toolbox uses the SciXMiner variable d_orgs, and its functionality depends on the
existence of this variable. The dimensions of d_orgs represent the following: the first is the
dimension of a given time series (i.e., whether the time series is univariate or multivariate), the
second is the number of measurements that form the time series, and the last is the number of time
series available in the project.
Furthermore, the toolbox not only modifies some existing variables but also defines some new ones.
More detailed information is given in the following sections.
3.2.1 regr_single
The variable regr_single is used in SciXMiner to save the information of an already trained
regression model. The forecasting toolbox therefore uses this variable to save the information of
all regression-based forecasting models. However, since the forecasting toolbox modifies the structure
of some of the information saved within the variable, users should NOT save and load regression-based
forecasting models as they normally would with regression models. Instead, the load and save
functionality available within the toolbox should be used. When saving a regression-based
forecasting model, the toolbox creates a file with a .fmodel extension that contains the regr_single
variable.
3.2.2 cdf_bestfit
The variable cdf_bestfit is a structure array containing the parametric probabilistic forecasts
obtained by the toolbox’s method. The variable contains the following fields:
sample: Contains the type and parameters of the parametric distribution functions estimated at
each timestep. Additional information corresponding to each of the estimated distributions is
also contained in this field.
rank: Contains a ranking of the distributions tested, ordered from the best fitting one to the
worst. The differences between the metric of each distribution and that of the best fitting one
are also contained in this field.
3.2.3 scenarioForecast
The variable scenarioForecast is a structure array containing the scenario forecasts obtained by
the toolbox’s method. The variable contains the following fields:
parameters: This field contains the parameters used to create the scenario forecasts.
values: The field contains all of the scenario forecasts created.
3.3 Demo Projects
Two demo projects can be found in the projects folder of the subfolder containing this extension.
The demo project hierarchicalForecastExample.prjz is used to test the implemented hierarchical
probabilistic forecasting method, while the demo project Settlement_experiments_M45_S45.prjz
is used to test all other approaches. Testing the forecasting toolbox consists of running the
batch file test_forecasting_toolbox.batch, which uses specific macros for testing each of the
available methods. The macros are named using the convention test_{forecasting model being tested}.
3.4 Use cases
Selected SciXMiner applications are listed in Table 3.1.
Application                                           References
Campus load forecasting                               [5]
Optimization for household batteries                  [2]
Optimization for batteries of electric vehicles       [4]
Optimization for batteries of an industrial campus    [3, 1]
Solar power forecasting without weather data          [8, 10]
Table 3.1: Selected SciXMiner applications
3.5 Versions
Beta version: Version 2020a (24.02.2020)
4 Menu items and Control elements
4.1 Menu items ’Forecasting’
4.1.1 Forecasting Model (Regression)
Menu element containing options related to forecasting models based on regressions.
Create:
Click to create forecasting models based on regressions.
Apply:
Click to apply forecasting models based on regressions.
Create & Apply:
Click to create and apply forecasting models based on regressions.
Save:
Click to save forecasting models based on regressions.
Load:
Click to load forecasting models based on regressions.
View:
Click to view the forecasting models based on polynomial regressions.
4.1.2 Interval Forecast
Menu element containing options related to interval forecasts.
Plot Interval Forecast:
Click to plot interval forecasts.
4.1.3 Scenario Forecast
Menu element containing options related to scenario forecasts.
Create Scenario Forecasts with QR:
Click to create scenario forecasts based on quantile regressions.
View Scenario Forecasts:
Click to view the scenario forecasts created.
4.1.4 Parametric Probabilistic Forecast
Click to obtain parametric probabilistic forecasts.
4.1.5 Hierarchical Probabilistic Forecast
Menu element containing options related to hierarchical probabilistic forecasts.
Create:
Menu element containing options for creating hierarchical probabilistic forecasts.
Apply:
Menu element containing options for applying hierarchical probabilistic forecasts.
Apply QR Hierarchical Probabilistic Forecasts:
Click to apply hierarchical probabilistic forecasts based on quantile regressions.
4.1.6 Validate Forecasting Model
Validates a forecasting model with a selected proportion of training and test data. The proportion is
selected by Control element: Forecasting - Percentage of training data for validation. Here, a model
is created on the training data (beginning of the time series) and validated on the test data (end of
the time series). The results are shown in the MATLAB workspace and plotted into files if Control
element: Forecasting - Plot validation results is selected.
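The chronological split described above can be illustrated with a minimal Python sketch. It is not the toolbox's MATLAB code: the function name split_train_test and the rounding rule (rounding the training length down) are assumptions made for illustration only.

```python
def split_train_test(series, train_percentage):
    """Split a time series chronologically: the beginning trains, the end tests."""
    if not 0 < train_percentage < 100:
        raise ValueError("train_percentage must lie strictly between 0 and 100")
    # Number of leading samples assigned to the training set.
    # Rounding down is an assumption; the toolbox may round differently.
    n_train = int(len(series) * train_percentage / 100)
    return series[:n_train], series[n_train:]


# Hypothetical load measurements, used only to show the split.
load = [3.1, 2.9, 3.4, 3.8, 4.0, 4.2, 3.9, 3.5, 3.2, 3.0]
train, test = split_train_test(load, 70)
print(len(train), len(test))
```

Because the split is chronological rather than random, the test data always lie after the training data in time, which mirrors how a forecasting model is used in practice.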
4.1.7 Help
Shows the help file of the Forecasting toolbox.
4.2 Control elements for ’Forecasting’
Forecast type:
Use to select the forecasting model to be created.
Energy forecasting:
Check in case specific energy forecasting options are to be used.
Regression technique:
Use to select the data mining technique to be used (e.g., Polynomial or ANN).
Desired output time series:
Use to select the time series for which a forecast is to be obtained.
Forecast horizon:
Use the field to write the model’s forecast horizon in timesteps.
Figure 4.1: Control elements for Forecasting - Point Forecast - Polynomial
Input time series:
Use the field to write the indexes that correspond—within the project—to the time series that will
be used as input.
Feature selection:
Check if a Wrapper feature selection is to be applied during the training.
Lags:
Use the field to write the lags, in timesteps, of the input time series’ past values that the models
are going to use as features. For example, if values with lags ranging from 0 to 24 are used,
{0:1:24} has to be written in the edit field. If values with specific lags are required, this can be
written, for instance, as {0,12,24} or {0}. The former specifies that the values used are those with
lags of 0, 12, and 24; the latter states that only values with a lag of 0 are used. This edit field
also allows an independent definition of features for each input time series. For example, writing
{0:1:24};{0,12,24};0 specifies different lags to be taken from the first, second, and third time
series. If the independent definition is used, the lags of each input time series have to be defined
explicitly.
Energy forecasting:
Use to select the type of energy time series being forecast (e.g., photovoltaic power, load, wind
power).
Degree:
Use the field to write the maximal degree that the polynomial models are allowed to have.
P>0:
Check if a constraint assuring only positive forecast values is to be applied.
Features:
Use the field to write the number of features that will be selected.
Set a0 to zero:
Check if the offset of the polynomial models is to be set equal to zero.
Eliminate night values:
Use to select the approach for identifying night values. The selected approach is used to remove
night values from the training set and to set them automatically to zero when the models are applied.
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
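The notation accepted by the Lags field described above follows the MATLAB colon convention start:step:stop, with the stop value included. The following Python sketch expands such strings into explicit lag lists. It is only an illustration of the notation, not the toolbox's parser: the function names parse_lag_group and parse_lags are hypothetical, and positive step sizes are assumed.

```python
def parse_lag_group(text):
    """Expand one lag group such as '{0:1:24}', '{0,12,24}' or '0' into a list of ints."""
    body = text.strip().strip("{}")
    lags = []
    for token in body.split(","):
        parts = [int(p) for p in token.split(":")]
        if len(parts) == 1:
            # Single lag, e.g. '12'.
            lags.append(parts[0])
        elif len(parts) == 2:
            # 'start:stop' with an implicit step of 1, stop included.
            lags.extend(range(parts[0], parts[1] + 1))
        elif len(parts) == 3:
            # 'start:step:stop' as in MATLAB, stop included (positive step assumed).
            start, step, stop = parts
            lags.extend(range(start, stop + 1, step))
        else:
            raise ValueError(f"cannot parse lag token {token!r}")
    return lags


def parse_lags(field):
    """Return one lag list per ';'-separated group of the edit field."""
    return [parse_lag_group(group) for group in field.split(";")]


per_series = parse_lags("{0:1:24};{0,12,24};0")
print([len(lags) for lags in per_series])
```

In this sketch the independent definition yields one list per input time series, here 25 lags for the first series, three for the second, and a single lag of 0 for the third.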
Figure 4.2: Control elements for Forecasting - Point Forecast - Artificial Neural Network (MLP)
Forecast type:
Use to select the forecasting model to be created.
Energy forecasting:
Check in case specific energy forecasting options are to be used.
Regression technique:
Use to select the data mining technique to be used (e.g., Polynomial or ANN).
Desired output time series:
Use to select the time series for which a forecast is to be obtained.
Forecast horizon:
Use the field to write the model’s forecast horizon in timesteps.
Input time series:
Use the field to write the indexes that correspond—within the project—to the time series that will
be used as input.
Feature selection:
Check if a Wrapper feature selection is to be applied during the training.
Lags:
Use the field to write the lags, in timesteps, of the input time series’ past values that the models
are going to use as features. For example, if values with lags ranging from 0 to 24 are used,
{0:1:24} has to be written in the edit field. If values with specific lags are required, this can be
written, for instance, as {0,12,24} or {0}. The former specifies that the values used are those with
lags of 0, 12, and 24; the latter states that only values with a lag of 0 are used. This edit field
also allows an independent definition of features for each input time series. For example, writing
{0:1:24};{0,12,24};0 specifies different lags to be taken from the first, second, and third time
series. If the independent definition is used, the lags of each input time series have to be defined
explicitly.
Energy forecasting:
Use to select the type of energy time series being forecast (e.g., photovoltaic power, load, wind
power).
Hidden layers:
Use the field to write the number of hidden layers of the neural networks to be trained.
Neurons:
Use the field to write the number of neurons within the hidden layers of the neural networks to be
trained.
P>0:
Check if a constraint assuring only positive forecast values is to be applied.
Features:
Use the field to write the number of features that will be selected.
Eliminate night values:
Use to select the approach for identifying night values. The selected approach is used to remove
night values from the training set and to set them automatically to zero when the models are applied.
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
Figure 4.3: Control elements for Forecasting - Polynomial Quantile Regression (QR) w. Pinball Loss - Energy forecasting
Forecast type:
Use to select the forecasting model to be created.
Energy forecasting:
Check in case specific energy forecasting options are to be used.
Desired output time series:
Use to select the time series for which a forecast is to be obtained.
Forecast horizon:
Use the field to write the model’s forecast horizon in timesteps.
Input time series:
Use the field to write the indexes that correspond—within the project—to the time series that will
be used as input.
Feature selection:
Check if a Wrapper feature selection is to be applied during the training.
Lags:
Use the field to write the lags, in timesteps, of the input time series’ past values that the models
are going to use as features. For example, if values with lags ranging from 0 to 24 are used,
{0:1:24} has to be written in the edit field. If values with specific lags are required, this can be
written, for instance, as {0,12,24} or {0}. The former specifies that the values used are those with
lags of 0, 12, and 24; the latter states that only values with a lag of 0 are used. This edit field
also allows an independent definition of features for each input time series. For example, writing
{0:1:24};{0,12,24};0 specifies different lags to be taken from the first, second, and third time
series. If the independent definition is used, the lags of each input time series have to be defined
explicitly.
Energy forecasting:
Use to select the type of energy time series being forecast (e.g., photovoltaic power, load, wind
power).
Degree:
Use the field to write the maximal degree that the polynomial models are allowed to have.
P>0:
Check if a constraint assuring only positive forecast values is to be applied.
Features:
Use the field to write the number of features that will be selected.
Set a0 to zero:
Check if the offset of the polynomial models is to be set equal to zero.
Eliminate night values:
Use to select the approach for identifying night values. The selected approach is used to remove
night values from the training set and to set them automatically to zero when the models are applied.
Quantiles:
Use the field to write the probabilities that correspond to the quantile regressions to be estimated
(written values have to be between 0 and 1).
Quantiles constraint:
Check if constraints for avoiding quantile crossing are to be applied.
Reg:
Use to write the regularization value used during the application of the constraints for avoiding
quantile crossing. If the field is left empty the value is set to .
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
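For readers unfamiliar with the pinball loss named in Figure 4.3 and with the quantile crossing that the Quantiles constraint option is meant to avoid, the following Python sketch shows the standard textbook definitions. It is not taken from the toolbox; the function names pinball_loss and has_quantile_crossing and the example values are assumptions for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss of the forecasts y_pred for the probability q (0 < q < 1)."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def has_quantile_crossing(forecasts):
    """Check whether a lower quantile forecast exceeds a higher one at any timestep.

    forecasts maps each probability q to its list of forecast values.
    """
    levels = sorted(forecasts)
    for lower_q, upper_q in zip(levels, levels[1:]):
        for low, high in zip(forecasts[lower_q], forecasts[upper_q]):
            if low > high:
                return True
    return False


loss = pinball_loss([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], 0.9)
crossing = has_quantile_crossing({0.1: [1.0, 2.0], 0.9: [2.0, 1.5]})
print(loss, crossing)
```

With q = 0.9, forecasting too low is penalized nine times more heavily than forecasting too high, which is what pushes the fitted model toward the 0.9 quantile.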
Figure 4.4: Control elements for Forecasting - Quantile Regressions (QR) with NNQF - Polynomial
Forecast type:
Use to select the forecasting model to be created.
Energy forecasting:
Check in case specific energy forecasting options are to be used.
Regression technique:
Use to select the data mining technique to be used (e.g., Polynomial or ANN).
Desired output time series:
Use to select the time series for which a forecast is to be obtained.
Forecast horizon:
Use the field to write the model’s forecast horizon in timesteps.
Input time series:
Use the field to write the indexes that correspond—within the project—to the time series that will
be used as input.
Feature selection:
Check if a Wrapper feature selection is to be applied during the training.
Lags:
Use the field to write the lags, in timesteps, of the input time series’ past values that the models
are going to use as features. For example, if values with lags ranging from 0 to 24 are used,
{0:1:24} has to be written in the edit field. If values with specific lags are required, this can be
written, for instance, as {0,12,24} or {0}. The former specifies that the values used are those with
lags of 0, 12, and 24; the latter states that only values with a lag of 0 are used. This edit field
also allows an independent definition of features for each input time series. For example, writing
{0:1:24};{0,12,24};0 specifies different lags to be taken from the first, second, and third time
series. If the independent definition is used, the lags of each input time series have to be defined
explicitly.
Quantile features:
Check if an independent feature selection for each quantile regression is to be applied.
Energy forecasting:
Use to select the type of energy time series being forecast (e.g., photovoltaic power, load, wind
power).
Degree:
Use the field to write the maximal degree that the polynomial models are allowed to have.
P>0:
Check if a constraint assuring only positive forecast values is to be applied.
Features:
Use the field to write the number of features that will be selected.
Set a0 to zero:
Check if the offset of the polynomial models is to be set equal to zero.
Eliminate night values:
Use to select the approach for identifying night values. The selected approach is used to remove
night values from the training set and to set them automatically to zero when the models are applied.
k:
Use the field to write the number of nearest neighbors used by the nearest neighbors quantile filter.
Distance:
Defines the distance function used by the nearest neighbors quantile filter.
Weights:
Defines the weights applied to the features during the distance calculation of the nearest neighbors
quantile filter.
Quantiles:
Use the field to write the probabilities that correspond to the quantile regressions to be estimated
(written values have to be between 0 and 1).
Quantiles constraint:
Check if constraints for avoiding quantile crossing are to be applied.
Reg:
Use to write the regularization value used during the application of the constraints for avoiding
quantile crossing. If the field is left empty the value is set to .
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
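The roles of the k, Distance, and Weights fields listed above can be made concrete with a rough Python sketch of a nearest neighbors quantile filter. It follows the general idea only: each training target is replaced by an empirical quantile of the targets of its k nearest neighbors in feature space, so that an ordinary regression can then be trained on the filtered targets. The details are assumptions and not the toolbox's implementation, namely the weighted Euclidean distance, the nearest-rank quantile, counting a sample as its own neighbor, and all function names.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors (assumed distance)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def empirical_quantile(values, q):
    """Nearest-rank empirical quantile of a list of values (assumed quantile rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def nnqf_targets(features, targets, q, k, weights=None):
    """Replace each target by the q-quantile of the targets of its k nearest neighbors."""
    if weights is None:
        weights = [1.0] * len(features[0])
    filtered = []
    for x in features:
        # Indices of all samples sorted by distance to x; x itself is included.
        by_distance = sorted(
            range(len(features)),
            key=lambda j: weighted_distance(x, features[j], weights),
        )
        neighbours = by_distance[:k]
        filtered.append(empirical_quantile([targets[j] for j in neighbours], q))
    return filtered


# Hypothetical one-dimensional features and targets.
features = [[0.0], [1.0], [2.0], [10.0]]
targets = [1.0, 2.0, 3.0, 100.0]
median_targets = nnqf_targets(features, targets, q=0.5, k=3)
upper_targets = nnqf_targets(features, targets, q=0.9, k=3)
print(median_targets, upper_targets)
```

A larger k smooths the filtered targets over a wider neighborhood, while the weights decide which features dominate the notion of similarity.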
Figure 4.5: Control elements for Forecasting - Quantile Regressions (QR) with NNQF - Artificial Neural Network (MLP)
Forecast type:
Use to select the forecasting model to be created.
Energy forecasting:
Check in case specific energy forecasting options are to be used.
Regression technique:
Use to select the data mining technique to be used (e.g., Polynomial or ANN).
Desired output time series:
Use to select the time series for which a forecast is to be obtained.
Forecast horizon:
Use the field to write the model’s forecast horizon in timesteps.
Input time series:
Use the field to write the indexes that correspond—within the project—to the time series that will
be used as input.
Feature selection:
Check if a Wrapper feature selection is to be applied during the training.
Lags:
Use the field to write the lags, in timesteps, of the input time series’ past values that the models
are going to use as features. For example, if values with lags ranging from 0 to 24 are used,
{0:1:24} has to be written in the edit field. If values with specific lags are required, this can be
written, for instance, as {0,12,24} or {0}. The former specifies that the values used are those with
lags of 0, 12, and 24; the latter states that only values with a lag of 0 are used. This edit field
also allows an independent definition of features for each input time series. For example, writing
{0:1:24};{0,12,24};0 specifies different lags to be taken from the first, second, and third time
series. If the independent definition is used, the lags of each input time series have to be defined
explicitly.
Quantile features:
Check if an independent feature selection for each quantile regression is to be applied.
Energy forecasting:
Use to select the type of energy time series being forecast (e.g., photovoltaic power, load, wind
power).
Hidden layers:
Use the field to write the number of hidden layers of the neural networks to be trained.
Neurons:
Use the field to write the number of neurons within the hidden layers of the neural networks to be
trained.
P>0:
Check if a constraint assuring only positive forecast values is to be applied.
Features:
Use the field to write the number of features that will be selected.
Eliminate night values:
Use to select the approach for identifying night values. The selected approach is used to remove
night values from the training set and to set them automatically to zero when the models are applied.
k:
Use the field to write the number of nearest neighbors used by the nearest neighbors quantile filter.
Distance:
Defines the distance function used by the nearest neighbors quantile filter.
Weights:
Defines the weights applied to the features during the distance calculation of the nearest neighbors
quantile filter.
Quantiles:
Use the field to write the probabilities that correspond to the quantile regressions to be estimated
(written values have to be between 0 and 1).
Quantiles constraint:
Check if constraints for avoiding quantile crossing are to be applied.
Reg:
Use to write the regularization value used during the application of the constraints for avoiding
quantile crossing. If the field is left empty the value is set to .
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
Figure 4.6: Control elements for Forecasting - Interval Forecasts
Forecast type:
Use to select the forecasting model to be created.
Desired output time series:
Use to select the time series for which a forecast is to be obtained.
Lower bound time series:
Use the field to write the indexes of the time series that will form the lower bounds of the interval
forecasts.
Upper bound time series:
Use the field to write the indexes of the time series that will form the upper bounds of the interval
forecasts.
Corresponding interval probabilities:
Use the field to write the probabilities of the interval forecasts being created.
Starting timestep:
Use the field to write the timestep in which the interval forecasts begin.
Ending timestep:
Use the field to write the timestep in which the interval forecasts end.
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
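One simple way to inspect an interval forecast built from the lower and upper bound time series above is to count how often the observed values fall inside the interval and to compare that share with the stated interval probability. The Python sketch below does this; it is not a toolbox function. The name interval_coverage, the example data, and the 1-based inclusive timestep convention (chosen to mirror MATLAB indexing) are assumptions.

```python
def interval_coverage(observed, lower, upper, start, end):
    """Share of observations inside [lower, upper] for timesteps start..end.

    Timesteps are 1-based and inclusive here, an assumption made to mirror
    MATLAB indexing.
    """
    hits = 0
    for t in range(start - 1, end):
        if lower[t] <= observed[t] <= upper[t]:
            hits += 1
    return hits / (end - start + 1)


# Hypothetical observed series and interval bounds.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.0, 0.0, 4.0, 3.0, 3.0]
upper = [2.0, 1.0, 5.0, 5.0, 6.0]
coverage = interval_coverage(observed, lower, upper, start=2, end=5)
print(coverage)
```

For a well calibrated interval forecast the empirical coverage should be close to the interval probability entered in the corresponding field.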
Figure 4.7: Control elements for Forecasting - Parametric Probabilistic Forecast with QRs
Forecast type:
Use to select the forecasting model to be created.
TSs containing the quantile estimates:
Use to write the indexes that correspond to the time series containing the quantile estimates that
will be used to find the best fitting parametric distributions.
Max CDF argument:
Note that the method offering this option requires estimates of non-parametric CDFs.
Use this field to define the maximal argument that the non-parametric CDFs are allowed to have.
If left empty, this value is calculated independently for each timestep (the recommended
setting).
Min CDF argument:
Note that the method offering this option requires estimates of non-parametric CDFs.
Use this field to define the minimal argument that the non-parametric CDFs are allowed to have.
If left empty, this value is calculated independently for each timestep (the recommended
setting).
Corresponding CDF values:
Use the field to write the corresponding probabilities of the quantile estimates that will be used to
find the best fitting parametric distributions.
Parametric CDF to be fitted:
The method containing this option tests various parametric distributions for every timestep and
selects the best fitting one (the default setting, Find Best Fit). However, if the user requires the
same parametric distribution for every timestep, this option can be used. It allows the selection
of the parametric distribution whose parameters are to be estimated for each timestep using a
method similar to the method of moments.
Technique to find best fit:
Use to select the method for finding the parametric CDFs that best fit the quantile estimates at each
timestep. The default option is Least Squares, which bases its selection on the mean squared error.
TS representing an indicator function:
Use the field to write the index of a time series indicating the timesteps in which the estimation
of the uncertainty is not required (e.g., night values of a photovoltaic power time series). The
indicator time series should only contain values between zero and one, and it should be one for
the timesteps in which no uncertainty description is needed. The field can be left empty if an
uncertainty description is to be obtained for all timesteps.
Apply quantiles of the parametric CDF fitted:
Check if new time series containing the quantiles of the parametric CDFs estimated at each
timestep are to be obtained and saved.
All quantiles <=:
Use the field to write the maximal allowed value for all quantile estimates. In other words, all
estimates that are greater than the given threshold are replaced by the threshold. If the field is left
empty no constraint is applied.
All quantiles >=:
Use the field to write the minimal allowed value for all quantile estimates. In other words, all
estimates that are lower than the given threshold are replaced by the threshold. If the field is left
empty no constraint is applied.
Compare to optimal fit:
Check if the estimated parametric CDFs should be compared with those that could be obtained
by fitting the CDFs to the quantiles used.
Percentage of training data for validation:
Defines the percentage of training data for a model validation by Forecasting - Validate Forecasting
Model.
Plot validation results:
Switches the plotting of results of model validation by Forecasting - Validate Forecasting Model.
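The Least Squares selection described above, together with the ranking stored in the rank field of cdf_bestfit, can be pictured with the following Python sketch for a single timestep. It scores a few candidate distributions by the mean squared error between their CDF values at the quantile estimates and the corresponding probabilities, then sorts them. The candidate set, the function names, and the assumption that a mean and standard deviation are already available for the timestep are all illustrative and not taken from the toolbox.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def logistic_cdf(x, mu, sigma):
    # Scale chosen so that the logistic distribution has standard deviation sigma.
    scale = sigma * math.sqrt(3.0) / math.pi
    return 1.0 / (1.0 + math.exp(-(x - mu) / scale))


def uniform_cdf(x, mu, sigma):
    # A uniform distribution on [mu - h, mu + h] has standard deviation h / sqrt(3).
    half_width = math.sqrt(3.0) * sigma
    return min(1.0, max(0.0, (x - (mu - half_width)) / (2.0 * half_width)))


# Hypothetical candidate set; the distributions tested by the toolbox may differ.
CANDIDATES = {"normal": normal_cdf, "logistic": logistic_cdf, "uniform": uniform_cdf}


def rank_candidates(quantile_values, probabilities, mu, sigma):
    """Return (name, mse, difference to best mse) tuples, best fit first."""
    scores = {}
    for name, cdf in CANDIDATES.items():
        errors = [(cdf(x, mu, sigma) - p) ** 2
                  for x, p in zip(quantile_values, probabilities)]
        scores[name] = sum(errors) / len(errors)
    ranking = sorted(scores.items(), key=lambda item: item[1])
    best_mse = ranking[0][1]
    return [(name, mse, mse - best_mse) for name, mse in ranking]


# Quantile estimates that happen to match a standard normal distribution.
probabilities = [0.1, 0.25, 0.5, 0.75, 0.9]
quantile_values = [-1.2816, -0.6745, 0.0, 0.6745, 1.2816]
ranking = rank_candidates(quantile_values, probabilities, mu=0.0, sigma=1.0)
print([name for name, _, _ in ranking])
```

The third entry of each tuple plays the role of the difference to the best fitting distribution, which is zero for the first entry by construction.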
Figure 4.8: Control elements for Forecasting - Scenario Forecast with QRs
Forecast type:
Use to select the forecasting model to be created.
Begin scenario forecast at k =:
Use the field to write the timestep in which the first scenario forecast should begin.
Number of scenarios per scenario forecast:
Use the field to write the number of scenarios that will form each scenario forecast. A scenario
is a possible realization of a time series’ future; a scenario forecast is a collection of possible
scenarios.
Scenario length:
Use the field to write the length (in timesteps) of the scenarios that will be created.
Number of scenario forecasts:
Use the field to write the number of scenario forecasts that will be calculated.
Timesteps for new scenario forecast:
Use the field to write the number of timesteps between consecutive scenario forecasts, i.e.,
after how many timesteps a new scenario forecast is to be obtained.
TS representing indicator function:
Use the field to write the index of a time series indicating the timesteps in which the estimation
of the uncertainty is not required (e.g., night values of a photovoltaic power time series). The
indicator time series should only contain values between zero and one, and it should be one for
the timesteps in which no uncertainty description is needed. The field can be left empty if an
uncertainty description is to be obtained for all timesteps.
Max CDF argument:
Note that the method offering this option requires estimates of non-parametric CDFs.
Use this field to define the maximal argument that the non-parametric CDFs are allowed to have.
If left empty, this value is calculated independently for each timestep (the recommended
setting).
Min CDF argument:
Note that the method offering this option requires estimates of non-parametric CDFs.
Use this field to define the minimal argument that the non-parametric CDFs are allowed to have.
If left empty, this value is calculated independently for each timestep (the recommended
setting).
Apply quantiles of the scenario forecasts:
Check if new time series containing the empirical quantiles of the scenario values at each timestep
are to be obtained and saved.
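The empirical quantiles mentioned above can be illustrated with the following Python sketch. It is not the toolbox code: the function names are hypothetical, and the linear interpolation between order statistics is an assumption, since the manual does not specify which quantile definition is used.

```python
def empirical_quantile(values, q):
    """Empirical q-quantile using linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def scenario_quantiles(scenarios, q):
    """Return one q-quantile per timestep, taken across all scenarios.

    scenarios -- list of scenarios, each a list with one value per timestep
    """
    # zip(*scenarios) groups the values of all scenarios timestep by timestep.
    return [empirical_quantile(column, q) for column in zip(*scenarios)]


scenarios = [
    [1.0, 4.0, 2.0],
    [2.0, 6.0, 2.0],
    [3.0, 8.0, 5.0],
]
median = scenario_quantiles(scenarios, 0.5)
print(median)  # [2.0, 6.0, 2.0]
```

Each call yields a new time series with the same length as the scenarios, which corresponds to the quantile time series that this option saves.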
All quantiles <=:
Use the field to write the maximal allowed value for all quantile estimates. In other words, all
estimates that are greater than the given threshold are replaced by the threshold. If the field is left
empty no constraint is applied.
All quantiles >=:
Use the field to write the minimal allowed value for all quantile estimates. In other words, all
estimates that are lower than the given threshold are replaced by the threshold. If the field is left
empty no constraint is applied.
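The two quantile constraints above amount to clipping the estimates. A minimal Python sketch under that reading follows; the function name and argument names are hypothetical, and an unset bound (None) stands for an empty field.

```python
def constrain_quantiles(estimates, upper=None, lower=None):
    """Clip quantile estimates to the optional bounds.

    upper -- 'All quantiles <=' (None: field left empty, no constraint)
    lower -- 'All quantiles >=' (None: field left empty, no constraint)
    """
    constrained = []
    for value in estimates:
        if upper is not None and value > upper:
            value = upper   # estimates above the threshold become the threshold
        if lower is not None and value < lower:
            value = lower   # estimates below the threshold become the threshold
        constrained.append(value)
    return constrained


# Example: normalized power that must stay within [0, 1].
print(constrain_quantiles([-0.2, 0.4, 1.3], upper=1.0, lower=0.0))
# [0.0, 0.4, 1.0]
```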
Percentage of training data for validation:
Defines the percentage of the training data used for model validation by Forecasting - Validate
Forecasting Model.
Plot validation results:
Switches the plotting of the model validation results by Forecasting - Validate Forecasting Model on or off.
Figure 4.9: Control elements for Forecasting - Hierarchical Probabilistic Forecasts with QRs
Forecast type:
Use to select the forecasting model to be created.
Folder to save regressions:
Use the field to write the folder in which all trained regression models (i.e., forecasting models)
are saved.
Time series to be forecast:
Use the field to write the indexes of all time series whose aggregated probabilistic forecast
is required.
Correlation threshold:
Use the field to write the minimal correlation that two time series must have to be
assumed dependent. If the field is left empty, all considered time series are assumed to be
dependent.
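The effect of the correlation threshold can be sketched as follows. This Python illustration is not the toolbox code: the function names are hypothetical, the Pearson coefficient is an assumption (the manual does not name the correlation measure), and the signed value is compared with the threshold. Indexes here are zero-based, unlike the indexes entered in the GUI.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def dependent_pairs(series, threshold=None):
    """Return index pairs of time series that are treated as dependent.

    threshold=None mimics an empty field: all pairs are assumed dependent.
    """
    pairs = []
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            if threshold is None or pearson(series[i], series[j]) >= threshold:
                pairs.append((i, j))
    return pairs


series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.0],
    [4.0, 1.0, 3.0, 2.0],
]
print(dependent_pairs(series, threshold=0.8))  # [(0, 1)]
```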
Correlation models regression technique:
Use to select the data mining technique for estimating the correlation models. These models are
regressions that describe the relationship between the future values of correlated time series.
Correlation model parameters:
Use the field to write the hyperparameters necessary for training the correlation models. In the case
of polynomials two numbers are needed; the first is the maximal allowed degree and the second is
the number of features to be selected. Analogously, in the case of ANNs two numbers are needed;
the first is the number of hidden layers and the second is the number of hidden neurons in each
hidden layer.
Percentage of training data for validation:
Defines the percentage of the training data used for model validation by Forecasting - Validate
Forecasting Model.
Plot validation results:
Switches the plotting of the model validation results by Forecasting - Validate Forecasting Model on or off.
5 Plugins
Negativ -> Null or NaN (Neg2ZeroNaN): Replace negative values with 0 or NaNs
Function name: plugin_ts_negativ_to_zeroNaN.m
Type: TS
Time series: 1 inputs, 1 outputs, Segments possible: none
Single features: 0 inputs, 0 outputs
Images: 0 inputs, 0 outputs
Direct callback: none
Number of parameters: 1
Parameter: Value for replacing the negative values (1: replace with zero, 2: replace with
NaN)
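The behavior described for this plugin can be sketched in Python as follows. This is only an illustration of the described replacement rule, not the content of plugin_ts_negativ_to_zeroNaN.m; the function name and the `mode` argument are hypothetical stand-ins for the plugin parameter.

```python
import math


def negatives_to_zero_or_nan(values, mode=1):
    """Replace negative values; mode 1 uses zero, mode 2 uses NaN."""
    if mode not in (1, 2):
        raise ValueError("mode must be 1 (zero) or 2 (NaN)")
    replacement = 0.0 if mode == 1 else math.nan
    # Non-negative values (and existing NaNs) are passed through unchanged.
    return [replacement if value < 0 else value for value in values]


print(negatives_to_zero_or_nan([1.0, -2.0, 3.0], mode=1))  # [1.0, 0.0, 3.0]
```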
Periodic Integral (PeriodInt): Integrate a time series for a given period of time
Function name: plugin_ts_integral_over_period.m
Type: TS
Time series: 1 inputs, 1 outputs, Segments possible: yes
Single features: 0 inputs, 0 outputs
Images: 0 inputs, 0 outputs
Direct callback: none
Number of parameters: 1
Parameter: Period, Reset Integral (Number of timesteps defining a period, Boolean
defining if the integral has to be set to zero again after the given period)
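One possible reading of the periodic integral with the reset option enabled is a running sum that restarts at the beginning of every period. The Python sketch below illustrates only this reading and is not the content of plugin_ts_integral_over_period.m: the function name is hypothetical, a unit sampling time is assumed, and the behavior without reset is left out because the manual does not describe it.

```python
def integral_with_reset(values, period):
    """Running sum of a time series that restarts every `period` timesteps.

    Assumptions: unit sampling time, and the integral is set back to zero
    at the first timestep of each period.
    """
    integral = []
    running = 0.0
    for k, value in enumerate(values):
        if k % period == 0:
            running = 0.0   # a new period begins: reset the integral
        running += value
        integral.append(running)
    return integral


print(integral_with_reset([1.0, 2.0, 3.0, 4.0, 5.0], period=2))
# [1.0, 3.0, 3.0, 7.0, 5.0]
```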
Remove NaNs with linear Interpolation (NaNfree): Replaces NaN values with values estimated
using a linear interpolation
Function name: plugin_nan_behandlung.m
Type: TS
Time series: 1 inputs, 1 outputs, Segments possible: none
Single features: 0 inputs, 0 outputs
Images: 0 inputs, 0 outputs
Direct callback: none
Number of parameters: 0
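Linear interpolation of NaN values can be sketched as follows. This Python illustration is not the content of plugin_nan_behandlung.m; the function name is hypothetical, and leaving leading or trailing NaNs untouched is an assumption, since the manual does not describe how gaps at the edges are handled.

```python
import math


def interpolate_nans(values):
    """Fill interior NaN gaps by linear interpolation between valid neighbors."""
    result = list(values)
    valid = [i for i, value in enumerate(result) if not math.isnan(value)]
    # Walk over consecutive valid samples and fill the gap between them.
    for left, right in zip(valid, valid[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            result[i] = result[left] + weight * (result[right] - result[left])
    return result


nan = math.nan
print(interpolate_nans([2.0, nan, 4.0, nan, nan, nan, 8.0, nan]))
# [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, nan]
```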
Shift Min -> Zero (ShiftMin2Zero): Add a bias so that the minimum of the time series is zero
Function name: plugin_ts_shift_min2zero.m
Type: TS
Time series: 1 inputs, 1 outputs, Segments possible: none
Single features: 0 inputs, 0 outputs
Images: 0 inputs, 0 outputs
Direct callback: none
Number of parameters: 0
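The shift described for this plugin can be sketched in Python as follows. This is an illustration only, not the content of plugin_ts_shift_min2zero.m; the function name is hypothetical, and ignoring NaN values when searching for the minimum is an assumption.

```python
import math


def shift_min_to_zero(values):
    """Subtract the minimum so that the smallest value of the series is zero."""
    # Assumption: NaN values are ignored when determining the minimum.
    minimum = min(value for value in values if not math.isnan(value))
    return [value - minimum for value in values]


print(shift_min_to_zero([-2.0, 0.5, 3.0]))  # [0.0, 2.5, 5.0]
```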
Bibliography
[1] R. R. Appino. Scheduling of Energy Storage using Probabilistic Forecasts and Energy-based
Aggregated Models. PhD thesis, Karlsruhe Institute of Technology, 2019.
[2] R. R. Appino, J. Á. González Ordiano, R. Mikut, T. Faulwasser, and V. Hagenmeyer. On the use
of probabilistic forecasts in scheduling of renewable energy sources coupled to storages. Applied
Energy, 210:1207–1218, 2018.
[3] R. R. Appino, J. Á. González Ordiano, N. Munzke, R. Mikut, T. Faulwasser, and V. Hagenmeyer.
Assessment of a scheduling strategy for dispatching prosumption of an industrial campus. In
Proc., Internationaler ETG-Kongress 2019, 08. – 09.05.2019, Esslingen am Neckar, pages 289–
294. VDE-Verlag, 2019.
[4] R. R. Appino, M. Muñoz-Ortiz, J. Á. González Ordiano, R. Mikut, V. Hagenmeyer, and
T. Faulwasser. Reliable dispatch of renewable generation via charging of time-varying PEV
populations. IEEE Transactions on Power Systems, 34(2):1558–1568, 2019.
[5] J. Á. González Ordiano. New Data-Driven Probabilistic Forecasting Methods with Applications in
Energy Systems. PhD thesis, Karlsruher Institut für Technologie (KIT), 2019.
[6] J. Á. González Ordiano, L. Gröll, R. Mikut, and V. Hagenmeyer. Probabilistic energy
forecasting using the nearest neighbors quantile filter and quantile regression. International
Journal of Forecasting, 2019.
[7] J. Á. González Ordiano, S. Waczowicz, V. Hagenmeyer, and R. Mikut. Energy forecasting tools
and services. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(2):e1235,
2018.
[8] J. Á. González Ordiano, S. Waczowicz, M. Reischl, R. Mikut, and V. Hagenmeyer. Photovoltaic
power forecasting using simple data-driven models without weather data. Computer Science -
Research and Development, 32(1-2):237–246, 2017.
[9] R. Mikut, A. Bartschat, W. Doneit, J. Á. González Ordiano, B. Schott, J. Stegmaier,
S. Waczowicz, and M. Reischl. The MATLAB toolbox SciXMiner: User’s manual and programmer’s guide.
Technical report, arXiv:1704.03298, 2017.
[10] V. Sharma, U. Cali, V. Hagenmeyer, R. Mikut, and J. Á. González Ordiano. Numerical weather
prediction data free solar power forecasting with neural networks. In Proc., Ninth International
Conference on Future Energy Systems, Karlsruhe, Germany, pages 604–609. ACM, New York,
NY, USA, 2018.