Probabilistic Forecasting with nnetsauce (using
Density Estimation, Bayesian inference, Conformal
prediction and Vine copulas)
T. Moudiki (thierrymoudiki.github.io)
2024-07-26
Context
▶Quasi-randomized neural networks (QRNs) applied to time
series lags for forecasting
▶Uncertainty quantification using Kernel Density Estimation,
Bayesian inference, Conformal prediction and Vine copulas
▶Implemented in Python package nnetsauce version 0.23.0
Plan
▶1 - Key components of nnetsauce forecasting
▶1 - 1 Quasi-randomized neural networks (QRNs)
▶1 - 2 Uncertainty quantification in forecasting
▶2 - QRN forecasting with nnetsauce
▶2 - 1 nnetsauce’s description (Python version)
▶2 - 2 Install+import Python packages (including nnetsauce)
▶2 - 3 Import data for the demo
▶2 - 4 Using the fit + predict interface
▶2 - 5 Using GPUs
▶2 - 6 Time series cross-validation
▶2 - 7 AutoML with LazyMTS
1 - 1 - Quasi-randomized neural networks (QRNs)
Simple case: base learner = linear regression; y ∈ R^n, to be
explained by X^{(j)}, j ∈ {1, ..., p}:

y = \beta_0 + \sum_{j=1}^{p} \beta_j X^{(j)} + \sum_{l=1}^{L} \gamma_l \, g\!\left( \sum_{j=1}^{p} W^{(j,l)} X^{(j)} \right) + \epsilon

(a small numerical sketch follows the list of notations below)
With:
▶g: activation function → nonlinearity
▶L: number of nodes in the hidden layer
▶W^{(j,l)}: hidden layer weights, pseudo/quasi-random
▶Quasi-random: designed to cover the space parsimoniously
▶secret sauce: “Layer normalization” (centering and scaling twice)
▶β_j and γ_l: linear model coefficients
▶ϵ: residuals
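To make the construction concrete, here is a minimal, illustrative sketch (not nnetsauce's internal code): pseudo-random weights W map the standardized inputs to L hidden features through g, and a plain linear model is then fitted on the original plus hidden features. A true quasi-random (e.g. Sobol) sequence could replace the uniform draws.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(123)
n, p, L = 100, 3, 5                    # observations, covariates, hidden nodes
X = rng.normal(size=(n, p))            # toy covariates
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

X_std = StandardScaler().fit_transform(X)        # centering/scaling, first pass
W = rng.uniform(size=(p, L))                     # (pseudo-)random hidden weights, not trained
H = np.tanh(X_std @ W)                           # g = tanh -> nonlinear hidden features
Z = np.column_stack([X_std, StandardScaler().fit_transform(H)])  # second pass

qrn = LinearRegression().fit(Z, y)               # only the linear coefficients are learned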
1 - 1 - Quasi-randomized neural networks (QRNs)
QRNs applied to time series
▶Response y = most recent time series observations
▶Covariates X = time series lags (see the lag-matrix sketch below)
▶base learner: can be any Machine Learning model
▶Multivariate forecasting case: the base learner is shared by all the time series
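As an illustration of how lags become covariates, a small sketch (hypothetical helper, not nnetsauce's internal code) turning a univariate series into a supervised regression problem:

import numpy as np

def make_lag_matrix(series, lags):
    # response = observation at time t, covariates = the `lags` previous observations
    series = np.asarray(series)
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    y = series[lags:]
    return X, y

X, y = make_lag_matrix(np.arange(10.0), lags=3)
# X[0] = [0., 1., 2.] explains y[0] = 3.0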
1 - 2 Uncertainty quantification in forecasting
Point forecasts/Uncertainty quantification
▶Point forecasts: cool, but not very informative. Based on the assumptions that we made, how wrong can we be? That is, how “certain” can we be about the forecast?
▶Uncertainty quantification needed: prediction intervals and/or predictive simulations.
▶prediction intervals: point forecast +/- a term (at a given confidence level); see the sketch after this list
▶predictive simulations: future scenarios for the variables of interest
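A minimal sketch of a prediction interval under a Gaussian assumption on the residuals (illustrative values; nnetsauce computes this from the fitted model's residuals):

import numpy as np
from scipy import stats

point_forecast = 10.0                                   # toy point forecast
residuals = np.array([-0.5, 0.2, 0.3, -0.1, 0.4])       # toy in-sample residuals
level = 95                                              # confidence level in %

z = stats.norm.ppf(1 - (1 - level / 100) / 2)           # about 1.96 for 95%
half_width = z * np.std(residuals, ddof=1)
lower, upper = point_forecast - half_width, point_forecast + half_width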
1 - 2 Uncertainty quantification in forecasting
In nnetsauce
▶Based on:
▶Bayesian priors
▶In-sample residuals = model fit - true observation on the
whole training set
▶Calibrated residuals = model fit - true observation on a
held-out calibration set
▶(Vine) Copulas (since nnetsauce v0.23.0)
▶Calibrated residuals are used in nnetsauce for methods based on sequential split conformal prediction (more on this later); a minimal sketch of the split follows.
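A minimal, illustrative sketch of the sequential split conformal idea (toy data, not nnetsauce's internal code): fit on the first part of the training set, compute calibrated residuals on the sequentially held-out part, and use their quantile to widen the point forecast.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                           # toy lagged design matrix, in time order
y = X @ rng.normal(size=5) + rng.normal(scale=0.2, size=200)

split = 150                                             # sequential split: no shuffling
model = Ridge().fit(X[:split], y[:split])
calib_resid = y[split:] - model.predict(X[split:])      # calibrated residuals

level = 95
q = np.quantile(np.abs(calib_resid), level / 100)       # conformal quantile
point = model.predict(X[-1:])                           # "next-step" forecast (illustrative)
lower, upper = point - q, point + q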
1 - 2 Uncertainty quantification in forecasting
Recap
▶In nnetsauce version 0.23.0:
▶Via a Bayesian base learner
▶Via a conformalized base learner
▶Via in-sample residuals for methods based on:
▶parametric inference of the residuals’ distribution (gaussian)
▶density estimation and simulation of residuals (kde)
▶bootstrap resampling (bootstrap and block-bootstrap)
▶vine copulas (vine-*)
▶Via calibrated residuals for methods based on sequential split conformal prediction (SCP) (scp*-kde, scp*-bootstrap, scp*-block-bootstrap, scp*-vine-*)
The parenthesized identifiers are values of MTS’s type_pi argument; a short example follows.
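For instance, a minimal sketch selecting one of the in-sample residual methods above through type_pi (here kde; the toy DataFrame is a placeholder and the identifiers are assumed to map one-to-one to the method names listed):

import numpy as np
import pandas as pd
import nnetsauce as ns
from sklearn.linear_model import Ridge

# toy univariate series standing in for a real data set
df = pd.DataFrame({"value": np.random.default_rng(0).normal(size=100).cumsum()},
                  index=pd.date_range("2000-01-01", periods=100, freq="MS"))

regr = ns.MTS(obj=Ridge(),          # any scikit-learn-like base learner
              type_pi="kde",        # in-sample residual method from the list above
              kernel="gaussian",    # density kernel
              replications=250,     # number of simulated sample paths
              lags=15,
              show_progress=False)
regr.fit(df)
regr.predict(h=20)                  # 20-steps-ahead predictive simulations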
2 - QRN forecasting with nnetsauce
2 - 1 nnetsauce’s description (Python version)
▶General-purpose Machine Learning using Randomized and
Quasi-Randomized neural networks
▶GitHub: https://github.com/Techtonique/nnetsauce
▶PyPI: https://pypi.org/project/nnetsauce/
▶Conda: https://anaconda.org/conda-forge/nnetsauce
▶Tasks:
▶Classification
▶Regression
▶Univariate/Multivariate time series forecasting
2 - 1 nnetsauce’s description (Python version) (cont’d)
▶Simple interface for each model:
▶fit: fitting model to training data
▶predict: model inference on unseen data
▶GPU version optimizes matrix multiplications using JAX (not magical)
▶Classes MTS and DeepMTS for time series forecasting
▶DeepMTS seems to be more suited for nearly stationary data (but I encourage you to try it and tell me); a tentative usage sketch follows this list
▶Automated Machine Learning (AutoML) with classes LazyMTS and LazyDeepMTS
▶Cross-validation
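A tentative DeepMTS sketch, assuming it accepts the same base learner and lags arguments as MTS plus an n_layers argument for stacking hidden layers (check the documentation above for the exact signature):

import nnetsauce as ns
from sklearn.linear_model import Ridge

# assumed interface: obj and lags as in ns.MTS, plus n_layers (to be confirmed)
regr = ns.DeepMTS(obj=Ridge(),
                  n_layers=3,       # assumed: number of stacked hidden layers
                  lags=15,
                  show_progress=False)
# regr.fit(df); regr.predict(h=20)  # df: a DataFrame with a DatetimeIndex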
2 - 2 Install+import Python packages (including nnetsauce)
pip install nnetsauce
pip install git+https://github.com/Techtonique/mlsauce.git --verbose
import nnetsauce as ns # import the package
import mlsauce as ms
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import Ridge
from statsmodels.tsa.seasonal import STL
sns.set_theme(style="darkgrid")
2 - 3 Import data for the demo
Univariate: Monthly anti-diabetic drug sales in Australia from 1992 to 2008
url ="https://raw.githubusercontent.com/Techtonique/"
url += "datasets/main/time_series/univariate/"
url += "a10.csv"
df_a10 =pd.read_csv(url)
df_a10.index =pd.DatetimeIndex(df_a10.date) # must have
df_a10.drop(columns=['date'], inplace=True)
df_a10.plot()
[Figure: line plot of the a10 series (value vs. date, 1993-2007)]
2 - 3 Import data for the demo (cont’d)
Multivariate: Heater vs. ice cream sales data set
url ="https://raw.githubusercontent.com/Techtonique/"
url += "datasets/main/time_series/multivariate/"
url += "ice_cream_vs_heater.csv"
df_temp =pd.read_csv(url)
df_temp.index =pd.DatetimeIndex(df_temp.date) # must have
# first other difference
df_icecream =df_temp.drop(columns=['date']).diff().\
dropna()
df_icecream.plot()
[Figure: line plot of the differenced heater and icecream series (2005-2019)]
2 - 4 Using the fit + predict interface
A few examples of probabilistic forecasting with nnetsauce:
▶Gaussian
▶Bayesian (Gaussian prior on base learner)
▶Kernel Density Estimation (KDE) and sequential split
conformal prediction (SCP)
▶Conformalized base learner: TweedieRegressor + SCP
▶Vine Copula (combined with SCP)
See also the documentation for the exact specifications:
https://techtonique.github.io/nnetsauce/nnetsauce.html#MTS
2 - 4 Using the fit + predict interface
▶Gaussian
from sklearn.ensemble import BaggingRegressor
regr = ns.MTS(obj=BaggingRegressor(),  # base learner
              type_pi="gaussian",      # type of pred. int.
              lags=20,                 # number of time series lags
              show_progress=False)
regr.fit(df_icecream);   # fit the model
regr.predict(h=30);      # 30-steps ahead forecast
2 - 4 Using the fit + predict interface
regr.plot("heater", type_plot="pi")  # plot pred. int.
[Figure: prediction intervals for heater]
2 - 4 Using the fit + predict interface
▶Bayesian (Gaussian prior)
from sklearn.linear_model import BayesianRidge
regr = ns.MTS(obj=BayesianRidge(),  # base learner
              lags=15,              # no. of time series lags
              show_progress=False)
regr.fit(df_a10);                     # fit the model
regr.predict(h=40, return_std=True);  # 40-steps ahead
2 - 4 Using the fit + predict interface
regr.plot(type_plot="pi")
[Figure: prediction intervals for the input time series]
2 - 4 Using the fit + predict interface
▶SCP-KDE
from sklearn.linear_model import LassoLarsIC
regr = ns.MTS(obj=LassoLarsIC(),   # base learner
              type_pi="scp-kde",   # type of pred. int.
              replications=250,    # no. of sample paths
              kernel='gaussian',   # density kernel
              lags=15,             # no. of time series lags
              show_progress=False)
regr.fit(df_a10);    # fit the model
regr.predict(h=40);  # 40-steps ahead
2 - 4 Using the fit + predict interface
regr.plot(type_plot="spaghetti")
[Figure: 250 simulations of the input time series]
2 - 4 Using the fit + predict interface
▶Conformalized base learner
from sklearn.linear_model import TweedieRegressor
obj0 = ns.PredictionInterval(obj=TweedieRegressor(),
                             method="splitconformal",
                             type_split="sequential",
                             level=95)
regr = ns.MTS(obj=obj0,
              lags=20,
              show_progress=False)
regr.fit(df_icecream);
regr.predict(h=30, return_pi=True);
2 - 4 Using the fit + predict interface
regr.plot("heater", type_plot="pi")  # plot one series
[Figure: prediction intervals for heater]
2 - 4 Using the fit + predict interface
▶Vine copula + sequential split conformal
import nnetsauce as ns
from sklearn.linear_model import TweedieRegressor
regr = ns.MTS(obj=TweedieRegressor(),
              lags=25,                 # no. of time series lags
              type_pi="scp-vine-tll",  # vine copula spec.
              replications=250,        # no. of sample paths
              show_progress=False)
regr.fit(df_icecream);
regr.predict(h=30);
2 - 4 Using the fit + predict interface
regr.plot("heater", type_plot="spaghetti")
[Figure: 250 simulations of heater]
2 - 4 Using the fit + predict interface
regr.plot("icecream", type_plot="spaghetti")
[Figure: 250 simulations of icecream]
2 - 5 Using GPUs
▶Public notebook on GitHub (https://bit.ly/45RchgD)
▶Simulated multivariate time series: 100 series, 10000
observations
▶Ran on Kaggle notebooks, with a P100 GPU accelerator
import numpy as np
import pandas as pd
import nnetsauce as ns
from sklearn.linear_model import Ridge
from time import time
# generate_synthetic_mts is a helper defined in the linked notebook
df = generate_synthetic_mts(n_steps=10000,
                            n_series=100,
                            amplitude=40, seed=14531)
df_ = df.diff().dropna()
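A hypothetical sketch of what such a generator might look like (noisy sinusoids; not the notebook's exact code):

import numpy as np
import pandas as pd

def generate_synthetic_mts(n_steps, n_series, amplitude=1.0, seed=0):
    # hypothetical stand-in: sinusoids with random periods plus Gaussian noise
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    data = {f"series{i}": amplitude * np.sin(2 * np.pi * t / rng.integers(20, 200))
                          + rng.normal(scale=amplitude / 10, size=n_steps)
            for i in range(n_series)}
    index = pd.date_range("2000-01-01", periods=n_steps, freq="D")
    return pd.DataFrame(data, index=index)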
2 - 5 Using GPUs (cont’d)
▶Example 1 on CPU, using Ridge regression
regr = Ridge()
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 nodes_sim="uniform",
                 backend="cpu",  # specify backend
                 verbose=1)
start = time()
obj_MTS.fit(df_)
print(f"Elapsed: {time()-start}")
Elapsed: 64.46652388572693
2 - 5 Using GPUs (cont’d)
▶Example 2 on GPU (uses JAX behind the scenes), using
Ridge regression
regr = Ridge()
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 nodes_sim="uniform",
                 backend="gpu",  # specify backend
                 verbose=1)
start = time()
obj_MTS.fit(df_)
print(f"Elapsed: {time()-start}")
Elapsed: 40.53069853782654
▶About 37% time gain (64.5 s → 40.5 s)
2 - 5 Using GPUs (cont’d)
▶Example 3 on GPU (uses JAX behind the scenes), using a Ridge regression that also runs on GPU (mlsauce implementation)
regr = ms.RidgeRegressor(reg_lambda=1.0, backend="gpu")
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 nodes_sim="uniform",
                 backend="gpu",  # specify backend
                 verbose=1)
start = time()
obj_MTS.fit(df_)
print(f"Elapsed: {time()-start}")
Elapsed: 23.551459312438965
▶About 63% time gain (64.5 s → 23.6 s)
▶Can also use e.g. xgboost with tree_method='gpu_hist' (see the sketch below)
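A minimal sketch, assuming xgboost is installed and a CUDA GPU is available (tree_method='gpu_hist' is the historical flag; recent xgboost releases prefer device='cuda' with tree_method='hist'):

import nnetsauce as ns
from xgboost import XGBRegressor

regr = XGBRegressor(tree_method="gpu_hist", n_estimators=100)  # GPU-trained base learner
obj_MTS = ns.MTS(regr,
                 lags=15,
                 n_hidden_features=5,
                 verbose=1)
# obj_MTS.fit(df_); obj_MTS.predict(h=10)   # df_ as defined above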
2 - 6 Time series cross-validation
Two methods: fixed window and increasing window (see the index-split sketch below)
[Figure: fixed (rolling) window scheme]
2 - 6 Time series cross-validation (cont’d)
[Figure: increasing (expanding) window scheme]
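To make the two schemes concrete, a small illustrative sketch (hypothetical helper, not nnetsauce's internal code) generating the successive training/validation index ranges:

def cv_splits(n_obs, initial_window, horizon, fixed_window):
    # fixed_window=True  -> training window of constant length (fixed/rolling window)
    # fixed_window=False -> training window grows at each step (increasing window)
    start, end = 0, initial_window
    while end + horizon <= n_obs:
        yield list(range(start, end)), list(range(end, end + horizon))
        end += 1
        if fixed_window:
            start += 1

# first two splits of each scheme on 10 observations
print(list(cv_splits(10, initial_window=5, horizon=2, fixed_window=True))[:2])
print(list(cv_splits(10, initial_window=5, horizon=2, fixed_window=False))[:2])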
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce
import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import Ridge
from statsmodels.tsa.base.datetools import dates_from_str
# some example data
mdata = sm.datasets.macrodata.load_pandas().data
# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgovt', 'tbilrate', 'cpi']]
mdata.index = pd.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
obj_MTS = ns.MTS(Ridge(), lags=3,
                 n_hidden_features=7,
                 replications=100,
                 seed=24, verbose=0,
                 type_pi="scp2-block-bootstrap",
                 show_progress=False)
cv = obj_MTS.cross_val_score(data,
                             verbose=0,
                             initial_window=100,
                             horizon=5,
                             level=95,
                             fixed_window=False,  # True for rolling
                             show_progress=False,
                             scoring="coverage")[1]
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
Average coverage on 98 samples
print(cv.mean)
## 92.51700680272108
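“Coverage” is the empirical coverage rate of the prediction intervals, i.e. the percentage of held-out observations falling inside them; a hypothetical helper (not nnetsauce's internal code) illustrating the metric:

import numpy as np

def empirical_coverage(y_true, lower, upper):
    # percentage of observations falling inside [lower, upper]
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return 100 * np.mean((y_true >= lower) & (y_true <= upper))

# empirical_coverage([1.0, 2.0, 3.0], [0.5, 2.5, 2.0], [1.5, 3.0, 4.0]) -> 66.67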
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
obj_MTS = ns.MTS(Ridge(), lags=3,
                 n_hidden_features=7,
                 replications=100,
                 seed=24, verbose=0,
                 type_pi="scp2-block-bootstrap",
                 show_progress=False)
cv = obj_MTS.cross_val_score(data,
                             verbose=0,
                             initial_window=100,
                             horizon=5,
                             level=95,
                             fixed_window=True,  # False for increasing
                             show_progress=False,
                             scoring="coverage")[1]
2 - 6 Time series cross-validation (cont’d)
Example in nnetsauce (cont’d)
Average coverage on 98 samples
print(cv.mean)
## 92.44897959183673
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
# split data into training/testing set
n = data.shape[0]
max_idx_train = int(np.floor(n * 0.9))  # integer index, as required by .iloc
training_index = np.arange(0, max_idx_train)
testing_index = np.arange(max_idx_train, n)
df_train = data.iloc[training_index, :]
df_test = data.iloc[testing_index, :]
# Train + predict on 3 ML models
regr_mts = ns.LazyMTS(lags=25,
                      type_pi="scp2-kde",
                      kernel="gaussian",
                      replications=250,
                      estimators=["Ridge",
                                  "ElasticNet",
                                  "RandomForestRegressor"],
                      show_progress=False);
models, predictions = regr_mts.fit(df_train, df_test);
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
(cont’d)
print(models[['RMSE','COVERAGE']])
## RMSE COVERAGE
## Model
## MTS(ElasticNet) 0.20 87.30
## MTS(RandomForestRegressor) 0.20 88.89
## MTS(Ridge) 0.28 80.95
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
(cont’d)
regr_mts = ns.LazyMTS(lags=25,
                      type_pi="scp2-kde",
                      kernel="gaussian",
                      replications=250,
                      estimators="all",  # the change
                      show_progress=False);
models, predictions = regr_mts.fit(df_train, df_test);
2 - 7 Automated Machine Learning (AutoML) with LazyMTS
(cont’d)
print(models.sort_values(by='COVERAGE',
ascending=False)[['COVERAGE']].head(5))
## COVERAGE
## Model
## MTS(AdaBoostRegressor) 92.06
## MTS(BaggingRegressor) 92.06
## MTS(ExtraTreeRegressor) 90.48
## MTS(KNeighborsRegressor) 90.48
## MTS(ElasticNetCV) 88.89