Probabilistic Forecasting with nnetsauce (using Density Estimation, Bayesian inference, Conformal prediction and Vine copulas)
T. Moudiki (thierrymoudiki.github.io)
2024-07-26
1 / 49
Context
Quasi-randomized neural networks (QRNs) applied to time
series lags for forecasting
Uncertainty quantification using Kernel Density Estimation,
Bayesian inference, Conformal prediction and Vine copulas
Implemented in Python package nnetsauce version 0.23.0
2 / 49
Plan
1 - Key components of nnetsauce forecasting
1 - 1 Quasi-randomized neural networks (QRNs)
1 - 2 Uncertainty quantification in forecasting
2 - QRN forecasting with nnetsauce
2 - 1 nnetsauce’s description (Python version)
2 - 2 Install+import Python packages (including nnetsauce)
2 - 3 Import data for the demo
2 - 4 Using the fit + predict interface
2 - 5 Using GPUs
2 - 6 Time series cross-validation
2 - 7 AutoML with LazyMTS
3 / 49
1 - Key components of nnetsauce forecasting
1 - 1 Quasi-randomized neural networks (QRNs)
Figure 1: QRN principle
See also the nnetsauce documentation
A simple case: base learner = Linear Regression (next page)
5 / 49
1 - 1 - Quasi-randomized neural networks (QRNs)
Simple case: base learner = Linear Regression; $y \in \mathbb{R}^n$, to be explained by $X^{(j)}$, $j \in \{1, \ldots, p\}$:

$$y = \beta_0 + \sum_{j=1}^{p} \beta_j X^{(j)} + \sum_{l=1}^{L} \gamma_l \, g\!\left( \sum_{j=1}^{p} W^{(j,l)} X^{(j)} \right) + \epsilon$$
With:
$g$: the activation function (the nonlinearity)
$L$: number of nodes in the hidden layer
$W^{(j,l)}$: hidden-layer weights, pseudo- or quasi-random
Quasi-random: designed to cover the space parsimoniously
secret sauce: layer normalization (centering and scaling twice)
$\beta_j$ and $\gamma_l$: linear model coefficients
$\epsilon$: residuals
A minimal sketch of this construction is given below.
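The sketch is illustrative only and is not nnetsauce's internal implementation; the Sobol quasi-random weights, the tanh activation and the single normalization step are simplifying assumptions.

import numpy as np
from scipy.stats import qmc                       # Sobol sequence for quasi-random weights
from sklearn.linear_model import LinearRegression

def qrn_features(X, L=5, seed=123):
    """Augment X with g(X W), where W is a fixed quasi-random matrix (assumed activation: tanh)."""
    p = X.shape[1]
    W = qmc.Sobol(d=p, scramble=True, seed=seed).random(L).T   # shape (p, L)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)                  # centering and scaling
    H = np.tanh(Xs @ W)                                        # hidden layer: g(sum_j W^(j,l) X^(j))
    return np.column_stack([Xs, H])

# Usage: a plain linear model fitted on the augmented design matrix
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
regr = LinearRegression().fit(qrn_features(X), y)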
6 / 49
1 - 1 - Quasi-randomized neural networks (QRNs)
QRNs applied to time series
Response y = the most recent time series observations
Covariates X = the time series lags
base learner: can be any Machine Learning model
Multivariate forecasting case: the base learner is shared by all the time series (a sketch of the lag construction is given below)
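For illustration, a minimal sketch of how a univariate series can be turned into a supervised (y, X) problem through its lags; this describes the general idea, not nnetsauce's exact preprocessing.

import numpy as np

def make_lags(series, lags=3):
    """Return (X, y) where row t of X holds the `lags` values preceding y[t]."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    X = np.column_stack([s[lags - k: n - k] for k in range(1, lags + 1)])
    y = s[lags:]
    return X, y

# Usage: X, y = make_lags(np.arange(10.0), lags=3)  ->  X[0] == [2., 1., 0.], y[0] == 3.0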
7 / 49
1 - 2 Uncertainty quantification in forecasting
Point forecasts/Uncertainty quantification
Point forecasts: cool, but not very informative on their own. How wrong can we be, given the assumptions we made? In other words, how “certain” can we be about the forecast?
Uncertainty quantification is needed: prediction intervals and/or predictive simulations.
prediction intervals: point forecast +/- a term (with a level of confidence); a minimal sketch is given after this list
predictive simulations: future scenarios for the variables of interest
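As an illustration of the "point forecast +/- a term" idea, a generic Gaussian prediction interval built from residuals (a minimal sketch, not nnetsauce's own routine):

import numpy as np
from scipy.stats import norm

def gaussian_pi(point_forecast, residuals, level=95):
    """Symmetric interval: point forecast +/- z * sigma, with sigma estimated from residuals."""
    z = norm.ppf(0.5 + level / 200)      # e.g. 1.96 for level=95
    sigma = np.std(residuals, ddof=1)
    return point_forecast - z * sigma, point_forecast + z * sigma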
8 / 49
1 - 2 Uncertainty quantification in forecasting
In nnetsauce
Based on:
Bayesian priors
In-sample residuals = model fit - true observation on the
whole training set
Calibrated residuals = model fit - true observation on a
held-out calibration set
(Vine) Copulas (since nnetsauce v0.23.0)
Calibrated residuals used in nnetsauce for methods based on sequential split conformal prediction (more on this later)
9 / 49
1 - 2 Uncertainty quantification in forecasting
Short focus on sequential split conformal prediction (see also https://github.com/thierrymoudiki/2024-07-17-scp-block-bootstrap)
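A minimal sketch of the split conformal idea for a generic scikit-learn style regressor; the half/half calibration split, the absolute-residual score and the omitted finite-sample correction are simplifying assumptions, and nnetsauce's sequential version may differ in the details.

import numpy as np

def split_conformal_interval(model, X, y, X_future, alpha=0.05):
    """Calibrate |residuals| on the second (most recent) half of the data, then widen the forecast."""
    n = len(y)
    n_train = n // 2                                   # sequential split: first half train, second half calibration
    model.fit(X[:n_train], y[:n_train])
    calib_resid = np.abs(y[n_train:] - model.predict(X[n_train:]))
    q = np.quantile(calib_resid, 1 - alpha)            # conformal quantile of the calibrated residuals
    point = model.fit(X, y).predict(X_future)          # refit on all data, then point forecast
    return point - q, point + q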
10 / 49
1 - 2 Uncertainty quantification in forecasting
Short focus on copulas (source vine-copula.org)
Figure 2: Copulas
11 / 49
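To illustrate why copulas matter here, a sketch that simulates joint forecast errors across series with a Gaussian copula, used as a simpler stand-in for the vine copulas available in nnetsauce; `residuals` is assumed to be an (n_obs, n_series) array.

import numpy as np
from scipy import stats

def gaussian_copula_sample(residuals, n_sims=250, seed=123):
    """Sample joint residual scenarios that preserve the dependence between series."""
    rng = np.random.default_rng(seed)
    u = stats.rankdata(residuals, axis=0) / (residuals.shape[0] + 1)   # pseudo-observations in (0, 1)
    z = stats.norm.ppf(u)                                              # normal scores
    corr = np.corrcoef(z, rowvar=False)                                # estimated dependence structure
    z_sim = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_sims)
    u_sim = stats.norm.cdf(z_sim)
    # back to the residual scale through each series' empirical quantiles
    return np.column_stack([np.quantile(residuals[:, j], u_sim[:, j])
                            for j in range(residuals.shape[1])])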
1 - 2 Uncertainty quantification in forecasting
Recap
In nnetsauce version 0.23.0:
Via a Bayesian base learner
Via a conformalized base learner
Via in-sample residuals for methods based on:
parametric residuals distribution inference (gaussian)
density estimation and simulation of residuals (kde)
bootstrap resampling (bootstrap and block-bootstrap)
vine copulas (vine-*)
Via calibrated residuals for methods based on sequential split conformal prediction (SCP) (scp*-kde, scp*-bootstrap, scp*-block-bootstrap, scp*-vine-*)
12 / 49
2 - QRN forecasting with nnetsauce
2 - 1 nnetsauce’s description (Python version)
General-purpose Machine Learning using Randomized and
Quasi-Randomized neural networks
GitHub: https://github.com/Techtonique/nnetsauce
PyPI: https://pypi.org/project/nnetsauce/
Conda: https://anaconda.org/conda-forge/nnetsauce
Tasks:
Classification
Regression
Univariate/Multivariate time series forecasting
14 / 49
2 - 1 nnetsauce’s description (Python version) (cont’d)
Simple interface for each model:
fit: fitting model to training data
predict: model inference on unseen data
GPU version speeds up matrix multiplications using JAX (not magical)
Classes MTS and DeepMTS for time series forecasting
DeepMTS seems to be better suited for nearly stationary data (but I encourage you to try it and tell me)
Automated Machine Learning (AutoML) with classes LazyMTS and LazyDeepMTS
Cross-validation
15 / 49
2 - 2 Install+import Python packages (including nnetsauce)
pip install nnetsauce
pip install git+https://github.com/Techtonique/mlsauce.git --verbose
import nnetsauce as ns # import the package
import mlsauce as ms
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import Ridge
from statsmodels.tsa.seasonal import STL
sns.set_theme(style="darkgrid")
16 / 49
2 - 3 Import data for the demo
Input format (univariate time series)
Figure: data-type-1 (univariate input format)
See examples in
https://github.com/Techtonique/datasets/tree/main/time_series
17 / 49
2 - 3 Import data for the demo
Input format (multivariate time series)
Figure 3: data-type-2
See examples in
https://github.com/Techtonique/datasets/tree/main/time_series
18 / 49
2 - 3 Import data for the demo
Univariate: Monthly anti-diabetic drug sales in Australia from 1992 to 2008
url = "https://raw.githubusercontent.com/Techtonique/"
url += "datasets/main/time_series/univariate/"
url += "a10.csv"
df_a10 = pd.read_csv(url)
df_a10.index = pd.DatetimeIndex(df_a10.date)  # must have
df_a10.drop(columns=['date'], inplace=True)
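As a side note, the STL import from the earlier setup can be used to inspect trend and seasonality before forecasting; the column name "value" and the monthly period are assumptions based on this dataset.

from statsmodels.tsa.seasonal import STL   # already imported above

res = STL(df_a10["value"], period=12).fit()   # seasonal-trend decomposition of the monthly series
res.plot()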
19 / 49
df_a10.plot()
[Plot: monthly anti-diabetic drug sales ('value' column), 1992-2008]
20 / 49
2 - 3 Import data for the demo (cont’d)
Multivariate: Heater vs. ice cream sales data set
url = "https://raw.githubusercontent.com/Techtonique/"
url += "datasets/main/time_series/multivariate/"
url += "ice_cream_vs_heater.csv"
df_temp = pd.read_csv(url)
df_temp.index = pd.DatetimeIndex(df_temp.date)  # must have
# first-order difference
df_icecream = df_temp.drop(columns=['date']).diff().dropna()
21 / 49
df_icecream.plot()
[Plot: first-order differences of the heater and icecream series, 2005-2019]
22 / 49
2 - 4 Using the fit + predict interface
A few examples of probabilistic forecasting with nnetsauce:
Gaussian
Bayesian (Gaussian prior on base learner)
Kernel Density Estimation (KDE) and sequential split
conformal prediction (SCP)
Conformalized base learner: TweedieRegressor + SCP
Vine Copula (combined with SCP)
See also the docs for the exact specification:
https://techtonique.github.io/nnetsauce/nnetsauce.html#MTS
23 / 49
2 - 4 Using the fit + predict interface
Gaussian
from sklearn.ensemble import BaggingRegressor
regr = ns.MTS(obj=BaggingRegressor(),  # base learner
              type_pi="gaussian",      # type of pred. int.
              lags=20,                 # number of time series lags
              show_progress=False)
regr.fit(df_icecream);  # fit the model
regr.predict(h=30);  # 30-steps ahead forecast
24 / 49
2 - 4 Using the fit + predict interface
regr.plot("heater", type_plot="pi")  # plot pred. int.
[Plot: prediction intervals for heater (history plus 30-step-ahead forecast)]
25 / 49
2 - 4 Using the fit + predict interface
Bayesian (Gaussian prior)
from sklearn.linear_model import BayesianRidge
regr = ns.MTS(obj=BayesianRidge(),  # base learner
              lags=15,              # no. of time series lags
              show_progress=False)
regr.fit(df_a10);  # fit the model
regr.predict(h=40, return_std=True);  # 40-steps ahead
26 / 49
2 - 4 Using the fit + predict interface
regr.plot(type_plot="pi")
[Plot: prediction intervals for the input time series (history plus 40-step-ahead forecast)]
27 / 49
2 - 4 Using the fit + predict interface
SCP-KDE
from sklearn.linear_model import LassoLarsIC
regr = ns.MTS(obj=LassoLarsIC(),   # base learner
              type_pi="scp-kde",   # type of pred. int.
              replications=250,    # no. of sample paths
              kernel='gaussian',   # density kernel
              lags=15,             # no. of time series lags
              show_progress=False)
regr.fit(df_a10);  # fit the model
regr.predict(h=40);  # 40-steps ahead
28 / 49
2 - 4 Using the fit + predict interface
regr.plot(type_plot="spaghetti")
[Plot: spaghetti plot, 250 simulations of the input time series]
29 / 49
2 - 4 Using the fit + predict interface
Conformalized base learner
from sklearn.linear_model import TweedieRegressor
obj0 = ns.PredictionInterval(obj=TweedieRegressor(),
                             method="splitconformal",
                             type_split="sequential",
                             level=95)
regr = ns.MTS(obj=obj0,
              lags=20,
              show_progress=False)
regr.fit(df_icecream);
regr.predict(h=30, return_pi=True);
30 / 49
2 - 4 Using the fit + predict interface
regr.plot("heater", type_plot="pi")  # plot one series
[Plot: prediction intervals for heater (history plus 30-step-ahead forecast)]
31 / 49
2 - 4 Using the fit + predict interface
Vine copula + sequential split conformal
import nnetsauce as ns
from sklearn.linear_model import TweedieRegressor
regr = ns.MTS(obj=TweedieRegressor(),
              lags=25,                 # no. of time series lags
              type_pi="scp-vine-tll",  # vine copula spec.
              replications=250,        # no. of sample paths
              show_progress=False)
regr.fit(df_icecream);
regr.predict(h=30);
32 / 49
2 - 4 Using the fit + predict interface
regr.plot("heater", type_plot="spaghetti")
[Plot: spaghetti plot, 250 simulations of heater]
33 / 49
2 - 4 Using the fit + predict interface
regr.plot("icecream", type_plot="spaghetti")
[Plot: spaghetti plot, 250 simulations of icecream]
34 / 49
2 - 5 Using GPUs
Public notebook on GitHub (https://bit.ly/45RchgD)
Simulated multivariate time series: 100 series, 10000
observations
Ran on Kaggle notebooks, with accelerator GPU P100
import numpy as np
import pandas as pd
import nnetsauce as ns
from sklearn.linear_model import Ridge
from time import time
# generate_synthetic_mts is defined in the public notebook linked above
df = generate_synthetic_mts(n_steps=10000,
                            n_series=100,
                            amplitude=40, seed=14531)
df_ = df.diff().dropna()
35 / 49
2 - 5 Using GPUs (cont’d)
Example 1 on CPU, using Ridge regression
regr = Ridge()
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 nodes_sim="uniform",
                 backend="cpu",  # specify backend
                 verbose=1)
start = time()
obj_MTS.fit(df_)
print(f"Elapsed: {time()-start}")
Elapsed: 64.46652388572693
36 / 49
2 - 5 Using GPUs (cont’d)
Example 2 on GPU (uses JAX behind the scenes), using
Ridge regression
regr = Ridge()
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 nodes_sim="uniform",
                 backend="gpu",  # specify backend
                 verbose=1)
start = time()
obj_MTS.fit(df_)
print(f"Elapsed: {time()-start}")
Elapsed: 40.53069853782654
37% time gain
37 / 49
2 - 5 Using GPUs (cont’d)
Example 3 on GPU (uses JAX behind the scenes), with mlsauce's GPU implementation of Ridge regression as the base learner
regr = ms.RidgeRegressor(reg_lambda=1.0, backend="gpu")
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 nodes_sim="uniform",
                 backend="gpu",  # specify backend
                 verbose=1)
start = time()
obj_MTS.fit(df_)
print(f"Elapsed: {time()-start}")
Elapsed: 23.551459312438965
63% time gain
Can also use xgboost with tree_method='gpu_hist', for example
38 / 49
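For instance, a hedged sketch of that xgboost option used as a base learner; it assumes xgboost is installed and a CUDA GPU is available (recent xgboost versions prefer device="cuda" with tree_method="hist" over the deprecated "gpu_hist").

import nnetsauce as ns
import xgboost as xgb

regr = ns.MTS(xgb.XGBRegressor(tree_method="gpu_hist"),  # GPU-accelerated trees
              lags=15,
              show_progress=False)
# then, as before: regr.fit(df_); regr.predict(h=30)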
2 - 6 Time series cross-validation
Two methods: fixed window and increasing window (a sketch of both splitting schemes follows the increasing-window figure)
[Figure: fixed window cross-validation]
39 / 49
2 - 6 Time series cross-validation (cont’d)
[Figure: increasing window cross-validation]
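For illustration, a sketch of how the two splitting schemes generate train/test indices, under the assumption that they behave like nnetsauce's initial_window / horizon / fixed_window parameters; the helper name is hypothetical.

def ts_splits(n, initial_window, horizon, fixed_window=False):
    """Yield (train_indices, test_indices) pairs for rolling-origin evaluation."""
    for start in range(n - initial_window - horizon + 1):
        train_start = start if fixed_window else 0   # fixed window slides, increasing window grows
        train_end = start + initial_window
        yield range(train_start, train_end), range(train_end, train_end + horizon)

# Usage: len(list(ts_splits(n=202, initial_window=100, horizon=5))) == 98 splits,
# consistent with the 98 samples reported for the macro-data example below
# (assuming 202 rows after differencing).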
40 / 49
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce
import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import Ridge
from statsmodels.tsa.base.datetools import dates_from_str
# some example data
mdata = sm.datasets.macrodata.load_pandas().data
# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgovt', 'tbilrate', 'cpi']]
mdata.index = pd.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
41 / 49
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
obj_MTS = ns.MTS(Ridge(), lags=3,
                 n_hidden_features=7,
                 replications=100,
                 seed=24, verbose=0,
                 type_pi="scp2-block-bootstrap",
                 show_progress=False)
cv = obj_MTS.cross_val_score(data,
                             verbose=0,
                             initial_window=100,
                             horizon=5,
                             level=95,
                             fixed_window=False,  # True for rolling
                             show_progress=False,
                             scoring="coverage")[1]
42 / 49
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
Average coverage on 98 samples
print(cv.mean)
## 92.51700680272108
43 / 49
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
obj_MTS = ns.MTS(Ridge(), lags=3,
                 n_hidden_features=7,
                 replications=100,
                 seed=24, verbose=0,
                 type_pi="scp2-block-bootstrap",
                 show_progress=False)
cv = obj_MTS.cross_val_score(data,
                             verbose=0,
                             initial_window=100,
                             horizon=5,
                             level=95,
                             fixed_window=True,  # False for increasing
                             show_progress=False,
                             scoring="coverage")[1]
44 / 49
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
Average coverage on 98 samples
print(cv.mean)
## 92.44897959183673
45 / 49
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
# split data into training/testing set
n = data.shape[0]
max_idx_train = int(np.floor(n * 0.9))  # cast to int so iloc receives integer indices
training_index = np.arange(0, max_idx_train)
testing_index = np.arange(max_idx_train, n)
df_train = data.iloc[training_index, :]
df_test = data.iloc[testing_index, :]
# Train + predict on 3 ML models
regr_mts = ns.LazyMTS(lags=25,
                      type_pi="scp2-kde",
                      kernel="gaussian",
                      replications=250,
                      estimators=["Ridge",
                                  "ElasticNet",
                                  "RandomForestRegressor"],
                      show_progress=False);
models, predictions = regr_mts.fit(df_train, df_test);
46 / 49
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
(cont’d)
print(models[['RMSE','COVERAGE']])
## RMSE COVERAGE
## Model
## MTS(ElasticNet) 0.20 87.30
## MTS(RandomForestRegressor) 0.20 88.89
## MTS(Ridge) 0.28 80.95
47 / 49
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
(cont’d)
regr_mts = ns.LazyMTS(lags=25,
                      type_pi="scp2-kde",
                      kernel="gaussian",
                      replications=250,
                      estimators="all",  # the change
                      show_progress=False);
models, predictions = regr_mts.fit(df_train, df_test);
48 / 49
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
(cont’d)
print(models.sort_values(by='COVERAGE',
ascending=False)[['COVERAGE']].head(5))
## COVERAGE
## Model
## MTS(AdaBoostRegressor) 92.06
## MTS(BaggingRegressor) 92.06
## MTS(ExtraTreeRegressor) 90.48
## MTS(KNeighborsRegressor) 90.48
## MTS(ElasticNetCV) 88.89
49 / 49