
Short-Term Temperature Forecasting on a Several Hours Horizon

Louis Desportes¹, Pierre Andry¹, Inbar Fijalkow¹, and Jérôme David²

¹ ETIS, Univ Paris Seine, Univ Cergy-Pontoise, ENSEA, CNRS, 95000 Cergy-Pontoise, France
{louis.desportes,pierre.andry,inbar.fijalkow}@ensea.fr
² ZenT, 95000 Neuville sur Oise, France
jerome.david@zent-eco.com

Abstract. Outside temperature is an important quantity in building control. It enables improvement in inhabitant energy consumption forecasting and heating requirement prediction. However, most previous works on outside temperature forecasting require either a lot of computation or a lot of different sensors. In this paper we forecast outside temperature at a multiple-hour horizon knowing only the last 24 h of temperature and the computed clear-sky irradiance up to the prediction horizon. We propose the use of different neural networks that predict directly at each hour of the horizon instead of using the one-hour forecast to predict the next. We show that the most precise one uses one-dimensional convolutions, and that the error is distributed across the year, the biggest error factor we found being the unknown cloudiness at the beginning of the day. Our findings suggest that the improvement seen is not due to better trend accuracy but only to better precision.

Keywords: Forecast · Temperature · Smart building · CNN

1 Introduction

1.1 Motivation: EcobioH2 Building

Today's buildings' energy consumption is decreasing thanks to progress in materials and appliance energy management, so that in the near future we could envision positive energy buildings, i.e. buildings that produce more energy than they consume. This implies using a local source of energy, like solar panels on the rooftop or a windmill, rather than power from the grid. However, local energy sources are usually intermittent. Therefore, energy storage, such as batteries, is required for the low-to-no-production periods, together with a way to control storage/usage periods. This implies knowing in advance the energy production

The research reported in this publication is part of the EcobioH2 project supported by EcoBio and ADEME, the French agency for environment and energy. This project is funded by the PIA, the French national investment plan for innovation.

© Springer Nature Switzerland AG 2019
I. V. Tetko et al. (Eds.): ICANN 2019, LNCS 11730, pp. 525–536, 2019.
https://doi.org/10.1007/978-3-030-30490-4_42


and demand. Both being highly influenced by the external climate, our study focuses on temperature forecasting.

The EcobioH2 project [3,4] intends to be the first low-footprint building in France using hydrogen fuel cells for energy storage and a neural network for its control. The 6-storey building of approximately 10 000 square meters will host retail, cultural, lodging, office and digital activities. It will have solar panels on its rooftop to produce energy and hybrid hydrogen energy storage to store it. The EcobioH2 project requires a temperature forecast to design an energy control and monitoring system balancing local energy needs and energy production.

1.2 Temperature Forecast

Temperature forecasting is required to refine electricity load forecasts [9]. Heat has an important influence on appliance consumption and on the need for cooling/heating. Moreover, weather fluctuations cause behavior shifts among inhabitants.

Knowing what the outside temperature will be in the next 6 to 24 h, we can predict, and take into account, how much energy will be needed for heating or cooling the building. The aim of this paper is to investigate temperature forecasting on a several-hours horizon with a limited number of sensors. In particular, neither wind speed nor wet-bulb thermometers will be available. The exploitation of our predictor should also not require too large an amount of data.

1.3 Related Works

Different methods have been proposed for short-term temperature forecasting. [6] uses abductive networks, a method that links multiple Volterra series together in order to ease network interpretation. However, this method uses one network for each prediction hour, which induces a huge complexity. Better methods have since been found; they are outlined below.

Based upon a physical model of temperature, [15] uses Volterra series to propose probabilistic forecasts. The authors propose to use hidden Markov models and the Viterbi algorithm to account for the cloudiness variation. They predict at short terms of 15 min and 30 min and still have a lot of parameters to learn (2 Volterra series, 2 HMMs, 1 autoregressive filter).

[9] uses the hidden state of echo-state networks to account for cloudiness. This implies a long and complex convergence for the network. Furthermore, the authors only predict at a one-hour horizon, whereas we want a several-hours horizon, or a one-day horizon using a completely different network depending on the hour the forecast is made.

Simpler artificial neural network forecasting methods have also been proposed, such as [11]. The authors use many different sensors (wet-bulb temperature, wind speed, humidity, pressure, ...) that we don't have on site. [7] needs the last 10 years of values of a given day as an input of their network and only predicts the values of the next day, while we need a forecast for several hours of the current day.


The closest work to ours is [16], detailed in [17]. After training, it predicts temperature using only the last 24 h of temperature measures and computed irradiance data. This prediction happens only at the next hour. The authors then use the network's one-hour prediction to predict the temperature in 2 h, yielding a propagation error that increases with the forecast horizon. Moreover, this prediction method preprocesses the data before feeding it to the network. This preprocessing might limit the network's learning capacity.

In the sequel, we will investigate different neural network architectures to improve the several-hours-horizon forecast.

2 Model and Problem Setting

We want to forecast the outside temperature Tout for each hour up to a horizon of H = 6 h or H = 24 h. At the trained network input, we use only the N = 24 last hours of temperature values Tout(t−N+1 : t) and the computed irradiance Ith between t−N+1 and t+H, where t is the current instant. In other words, we want to find the function f such that the H-hours temperature forecast is given by:

$$\hat{T}_{out}(t+1:t+H) = f\big(T_{out}(t-N+1:t);\ I_{th}(t-N+1:t+H)\big),$$

where the mean squared error

$$\mathrm{MSE} = \sum_{t=1}^{T} \sum_{h=1}^{H} \big(T_{out}(t+h) - \hat{T}_{out}(t+h)\big)^2$$

is minimal. This problem is shown in Fig. 1.

We denote X(t+1 : t+H) = (X(t+1), X(t+2), ..., X(t+H)) the vector containing the H values of X between hours t+1 and t+H with a time step of 1 h, and identically for X(t−N+1 : t). The computed irradiance Ith is the clear-sky irradiance, i.e. the power received if the sky does not have any cloud. Ith can be computed using the equations found in [12]. Tout is given in K, and Ith in W/m².

[Fig. 1. Temperature forecast scheme: f maps Tout(t−N+1 : t) and Ith(t−N+1 : t+H) to the forecast of Tout(t+1 : t+H).]
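The input/output windows defined above can be sketched as a data-preparation step. This is an illustrative NumPy sketch, not the authors' code; the function name `make_samples` is our own, and only the shapes follow the problem statement (N = 24 past temperatures, N + H irradiances, H targets).

```python
import numpy as np

def make_samples(t_out, i_th, n=24, horizon=6):
    """Slice two hourly series into (inputs, target) samples.

    For each instant t, the inputs are the last n temperatures
    T_out(t-n+1 : t) and the clear-sky irradiance I_th(t-n+1 : t+horizon);
    the target is T_out(t+1 : t+horizon).
    """
    xs_t, xs_i, ys = [], [], []
    for t in range(n - 1, len(t_out) - horizon):
        xs_t.append(t_out[t - n + 1 : t + 1])            # n past temperatures
        xs_i.append(i_th[t - n + 1 : t + 1 + horizon])   # n + horizon irradiances
        ys.append(t_out[t + 1 : t + 1 + horizon])        # horizon future temperatures
    return np.array(xs_t), np.array(xs_i), np.array(ys)
```

With one year of hourly data, this yields one training sample per hour, each pairing a 24-value temperature history and a 30-value irradiance window with a 6-value target.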

3 Temperature Forecasting Using Neural Networks

In the sequel, we will use neural networks to learn the function f as defined in Fig. 1, from Tout and Ith alone.

Our work is based upon [17]. This method preprocesses the input data as

$$y_1 = \big(I_{th}(t+1);\ \overline{T}_{out}(t);\ \max(T_{out}(t-N+1:t));\ \min(T_{out}(t-N+1:t));\ T_{out}(t);\ T_{out}(t-1)\big)$$

where $\overline{T}_{out}(t)$ is the mean of Tout(t−N+1 : t). This preprocessed input y1 is then fed to a hidden neural network layer with bias b2 and a tanh activation function such that $y_2 = \tanh(y_1 \times W_2 + b_2)$. The hidden layer output is then fed to the output layer Dense(1) with bias b3 and no activation function, giving the one-value output $\hat{T}_{out}(t+1) = y_2 \times w_3 + b_3$. This method is displayed in Fig. 2. We suspect that using $\hat{T}_{out}(t+1)$ as an input of the neural network to predict Tout(t+2) may induce error propagation.

[Fig. 2. [17] network structure: Ith(t−N+h : t+h−1) and Tout(t−N+h : t+h−1) → Preprocess → Dense → Dense(1) → forecast of Tout(t+1), with the prediction delayed and fed back as an input for the next horizon.]
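The preprocessing and one-hour forward pass of [17] described above can be sketched as follows. This is a minimal NumPy illustration with random, untrained weights; the function name `preprocess_y1` is ours, and the 5 hidden neurons follow Table 1.

```python
import numpy as np

def preprocess_y1(t_out_past, i_th_next):
    """Six-feature input vector y1 of [17] for one sample.

    t_out_past: the last N=24 hourly temperatures T_out(t-N+1 : t), in K;
    i_th_next:  the computed clear-sky irradiance I_th(t+1), in W/m^2.
    """
    return np.array([
        i_th_next,            # I_th(t+1)
        t_out_past.mean(),    # mean of the last 24 h
        t_out_past.max(),     # max of the last 24 h
        t_out_past.min(),     # min of the last 24 h
        t_out_past[-1],       # T_out(t)
        t_out_past[-2],       # T_out(t-1)
    ])

# One-hour forward pass with random (untrained) weights,
# 5 hidden neurons as in Table 1:
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(6, 5)), np.zeros(5)
w3, b3 = rng.normal(size=5), 0.0
y1 = preprocess_y1(np.linspace(280.0, 285.0, 24), 100.0)
t_hat = np.tanh(y1 @ W2 + b2) @ w3 + b3   # one-value output for t+1
```

Rolling this single-output network forward, its own output would replace the newest temperature at each step, which is exactly the feedback loop suspected of propagating error.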

3.1 Multi-horizon

In order to avoid error propagation, we propose to adapt the network proposed in [17] to directly forecast up to the t+H horizon, with each t+n, n ∈ [1, H], as an output of the network.

We preprocess the data in the same way as [17]. However, to ensure our network has the same input information as theirs when run over H horizons, we add Ith(t+2 : t+H) to the output of the preprocessing:

$$y_1 = \big(I_{th}(t+1:t+H);\ \overline{T}_{out}(t);\ \max(T_{out}(t-N+1:t));\ \min(T_{out}(t-N+1:t));\ T_{out}(t);\ T_{out}(t-1)\big).$$

Contrary to [9], we don't want to train as many networks as the number of outputs. This means that our network's Dense layer is common to all prediction horizons. The formula for this layer is the same; only the dimension changes. The output is made of H values, one for each horizon time step. This yields:

$$\hat{T}_{out}(t+1:t+H) = y_2 \times W_3 + b_3.$$

This network, named preprocess multih, is shown in Fig. 3.

[Fig. 3. preprocess multih network structure: Ith(t−N+1 : t+H) and Tout(t−N+1 : t) → Preprocess → Dense → Dense(H) → forecast of Tout(t+1 : t+H).]
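The multi-horizon forward pass can be sketched as below: one shared hidden layer and H output units, with no prediction ever fed back as an input. This is our illustrative NumPy sketch with random weights, not the trained network; the hidden width of 50 follows Table 1, and the input size H + 5 matches the preprocessing above (H irradiance values plus five temperature features).

```python
import numpy as np

H, HIDDEN = 6, 50                      # horizon and hidden width (Table 1)

def preprocess_multih(y1, W2, b2, W3, b3):
    """Shared hidden layer, then one output unit per horizon step."""
    y2 = np.tanh(y1 @ W2 + b2)         # hidden layer, tanh activation
    return y2 @ W3 + b3                # H forecast values produced at once

n_in = H + 5                           # I_th(t+1:t+H) plus 5 temperature features
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(n_in, HIDDEN)), np.zeros(HIDDEN)
W3, b3 = rng.normal(size=(HIDDEN, H)), np.zeros(H)
pred = preprocess_multih(rng.normal(size=n_in), W2, b2, W3, b3)
```

Because all H outputs come from the same hidden representation in a single pass, an error at horizon t+1 cannot accumulate into the forecast for t+2.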


3.2 Raw Input

In order to understand whether the preprocessing proposed by [17] limits the method's performance, we propose to remove the input preprocessing and to feed the network with the raw inputs: $y_r = (I_{th}(t-N+1:t+H);\ T_{out}(t-N+1:t))$. These raw inputs are sent to Dense and Dense(H) layers using the same formulas as the previous network preprocess multih, with different dimensions. Figure 4 shows this network, named raw multih.

[Fig. 4. raw multih structure: Ith(t−N+1 : t+H) and Tout(t−N+1 : t) → Dense → Dense(H) → forecast of Tout(t+1 : t+H).]

3.3 Convolutions

[Fig. 5. conv multih structure: Ith(t−N+1 : t+H) and Tout(t−N+1 : t) each feed C convolutional filters (filterI1 ... filterIC and filterT1 ... filterTC); the resulting convI and convT outputs feed Dense then Dense(H), producing the forecast of Tout(t+1 : t+H).]

Next, we investigate the usage of a convolutional layer, enabling the network to do a better analysis of the inputs [14], because convolutions let the network account for local temporal correlations. In doing so, we adopt a similar approach as in audio processing [13]. In our case a 1D convolution should be sufficient since our signal seems to have a slow frequency evolution. To our knowledge, it is the first attempt to apply such a solution to temperature forecasting.

Each raw input is fed to a separate convolutional layer with bias, no activation and no padding. For each convolution c in C, the number of convolutions, the formulas are

$$conv_{I_c} = I_{th}(t-N+1:t+H) * filter_{I_c} + b_{I_c} \quad\text{and}\quad conv_{T_c} = T_{out}(t-N+1:t) * filter_{T_c} + b_{T_c},$$

with $*$ the convolution operator. The output of those two convolutional layers is flattened and concatenated into a one-dimensional vector y1 to be sent to the hidden layer. The hidden layer Dense and the output layer Dense(H) use the same formulas as raw multih and preprocess multih. This method, depicted in Fig. 5, is named conv multih.
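The per-input convolutions above can be sketched in NumPy. This is an illustrative sketch with random weights, not the trained model; the sizes C = 24 filters of kernel size K = 3 follow Table 1. Note that `np.convolve` flips its kernel, so we flip it back to obtain the cross-correlation that neural-network "conv" layers actually compute.

```python
import numpy as np

N, H, C, K = 24, 6, 24, 3   # history, horizon, filter count, kernel size (Table 1)

def conv_multih_features(i_th, t_out, f_i, b_i, f_t, b_t):
    """One 1D convolution per input and per filter (bias, no activation,
    no padding), flattened and concatenated into the hidden-layer input y1."""
    feats = []
    for c in range(C):
        # "valid" mode = no padding: output lengths are (N+H-K+1) and (N-K+1)
        feats.append(np.convolve(i_th, f_i[c][::-1], mode="valid") + b_i[c])
        feats.append(np.convolve(t_out, f_t[c][::-1], mode="valid") + b_t[c])
    return np.concatenate(feats)

rng = np.random.default_rng(0)
f_i, b_i = rng.normal(size=(C, K)), np.zeros(C)
f_t, b_t = rng.normal(size=(C, K)), np.zeros(C)
y1 = conv_multih_features(rng.normal(size=N + H), rng.normal(size=N),
                          f_i, b_i, f_t, b_t)
# y1 has length C * ((N+H-K+1) + (N-K+1)) = 24 * (28 + 22) = 1200 and
# then feeds the same Dense / Dense(H) stack as preprocess_multih.
```

Keeping the two input series in separate convolutional branches means the filters applied to irradiance never mix with those applied to temperature before the dense layers.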

3.4 Linear Predictor

For the sake of comparison, we want to measure the benefit of the neural networks with regard to linear forecasts. We will call the linear method linear raw multih. In this case, the same raw inputs, Ith(t−N+1 : t+H) and Tout(t−N+1 : t), are fed to the output layer with bias b and no activation function:

$$\hat{T}_{out}(t+1:t+H) = y_r \times W + b.$$
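A linear predictor of this form can also be fitted in closed form by ordinary least squares rather than gradient descent. The sketch below shows this on synthetic stand-in data (the input width 54 = (24 + 6) irradiances + 24 temperatures matches the raw inputs); the paper itself trains this model by stochastic gradient descent, so this is only an equivalent illustration.

```python
import numpy as np

# Ordinary least squares fit of T_hat(t+1:t+H) = y_r W + b; the bias b
# is absorbed into W through an appended constant column.
rng = np.random.default_rng(0)
n_samples, H, n_in = 200, 6, 54
X = rng.normal(size=(n_samples, n_in))   # stand-in for the raw inputs y_r
Y = rng.normal(size=(n_samples, H))      # stand-in for T_out(t+1 : t+H)
Xb = np.hstack([X, np.ones((n_samples, 1))])
Wb, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
pred = Xb @ Wb                           # fitted linear forecasts
```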

4 Available Datasets

There are many datasets available that take their data from weather stations around the world. The World Meteorological Organization (WMO) has its own set of weather stations, composed of an aggregation of weather stations from country-specific meteorological organizations. Information about the current weather status is broadcast using the synoptic code, also known as "code synop" [1]. Each station has its own diffusion schedule, from every hour to every 6 h.

The aeronautic industry also has its own weather records, called METAR (METeorological Aerodrome Report) [2]. Each airport makes its own report and broadcasts it every half hour.

The US National Renewable Energy Laboratory (NREL) built weather stations to study solar radiation. Their data [8] is freely available on each project website and includes the local temperature.

Other datasets take their sources from satellite observations. They only use weather stations to calibrate the interpretation of their imagery. Satellite imagery has the benefit of covering more locations instead of a few discrete measure points.

In this work, we use the NASA MERRA-2 [10] data. Those data are composed of an aggregation of different worldwide observations with a 1 h frequency. The dataset is packed with clear-sky irradiance and is freely available for specific locations and the years 2005–2006.

In the following experiments we use the MERRA-2 dataset for the town of Avignon (43.95°, 4.817°) in France, the location of the EcobioH2 building, obtained for free through the SoDa HelioClim-3 Archives [5]. We use the data of 2005 as the train set and of 2006 as the validation and test set. Using the same data for test and validation is acceptable since we did not reach overfitting in any of our trainings, and therefore could not optimize the number of epochs on the validation dataset.


5 Used Metrics

We pose T = Tout(t+1 : t+H) to improve readability, and N is the sample size. Next, we recall the equations of the different metrics.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(T_i - \hat{T}_i\big)^2} \qquad (1)$$

RMSE is our primary metric. It lends itself easily to interpretation as an error interval since it is expressed in the same unit as the output variable.

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\big(T_i - \hat{T}_i\big)^2}{\sum_{i=1}^{N}\big(T_i - \overline{T}\big)^2} \qquad (2)$$

R² allows us to know how much of the signal is predicted, giving us an idea of our room for improvement.

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big|T_i - \hat{T}_i\big| \qquad (3)$$

MAE is used in many works. It can be interpreted as an error interval but does not penalize far-off predictions.
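The three metrics are straightforward to implement; a NumPy sketch (function names are ours) matching Eqs. (1)–(3):

```python
import numpy as np

def rmse(t, t_hat):
    """Root mean squared error, Eq. (1); same unit as the data."""
    return float(np.sqrt(np.mean((t - t_hat) ** 2)))

def r2(t, t_hat):
    """Coefficient of determination, Eq. (2)."""
    ss_res = np.sum((t - t_hat) ** 2)
    ss_tot = np.sum((t - np.mean(t)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(t, t_hat):
    """Mean absolute error, Eq. (3)."""
    return float(np.mean(np.abs(t - t_hat)))
```

Squaring before averaging is what makes RMSE penalize far-off predictions more heavily than MAE does.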

6 Results

Using the training data, we run a stochastic gradient descent algorithm in order to find the different parameters Wi and bi for the different networks and hyperparameter combinations. Then we evaluate each hyperparameter combination on the train set and select the best one. The result is displayed in Table 1.

Table 1. Best hyper-parameters found for each network

Algorithms        | Epochs | Learning rate | Batch size | Hidden neurons | Conv sizes | Number of conv
[17]              | 50 k   | 0.001         | 8          | 5              | –          | –
preprocess multih | 50 k   | 0.001         | 8          | 50             | –          | –
linear raw multih | 150 k  | 0.001         | 32         | –              | –          | –
raw multih        | 150 k  | 0.001         | 32         | 14             | –          | –
conv multih       | 50 k   | 0.001         | 8          | 60             | 3          | 24

Then we predict on the test set to obtain the RMSE, MAE and R². While the error values of RMSE and MAE are in Kelvin, they are equal to the error values in degrees Celsius.


In Table 2 we see that the error of [17] increases far more as the horizon grows. This suggests that using $\hat{T}_{out}(t+1)$ as an input of the neural network to predict Tout(t+2) does indeed induce error propagation, and therefore that our approach of forecasting multiple horizons directly is the right one.

Table 2. RMSE (K) for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

[17] 0.475 0.996 1.433 1.753 1.962 2.084

preprocess multih 0.412 0.715 0.962 1.171 1.354 1.515

linear raw multih 0.409 0.756 1.030 1.256 1.413 1.540

raw multih 0.375 0.644 0.882 1.098 1.289 1.428

conv multih 0.340 0.602 0.846 1.053 1.236 1.380

The same table exhibits that, according to the RMSE metric, the best precision at every horizon is achieved with the conv multih network. We explain this result by the ability of convolutions to characterize the sky cloudiness. Table 3 is provided to enable comparison with other works that use the MAE metric.

Table 3. MAE (K) for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

[17] 0.342 0.761 1.125 1.396 1.568 1.665

preprocess multih 0.293 0.531 0.729 0.895 1.038 1.169

linear raw multih 0.294 0.565 0.785 0.964 1.089 1.182

raw multih 0.277 0.481 0.660 0.832 0.982 1.091

conv multih 0.2413 0.440 0.629 0.792 0.936 1.048

Table 4 indicates that we predict most of the signal. The fact that even the most basic predictor, linear raw multih, gives excellent results validates the way we stated the problem of temperature forecasting. However, even a small forecasting improvement can be useful since it can be leveraged by other predictors.

Table 4. R² in percentage for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6
[17] 99.72 98.77 97.44 96.17 95.20 94.59
preprocess multih 99.79 99.36 98.85 98.29 97.72 97.14
linear raw multih 99.79 99.29 98.68 98.03 97.51 97.05
raw multih 99.82 99.48 99.03 98.50 97.93 97.46
conv multih 99.86 99.55 99.11 98.62 98.10 97.63

7 Analysis

We analyze in more detail the results of the proposed conv multih, since it is the most precise one, to understand its weaknesses. In Fig. 6, we plot the RMSE against the prediction hour. We see that there is a spike in error at the beginning of the day, from 5 am to 8 am. Since cloudiness is defined as

$$c = 1 - \frac{I_{real}}{I_{expected}} = 1 - \frac{I_{local}}{I_{clear\ sky}},$$

and since the clear-sky irradiance is zero before sunrise, we cannot have any cloudiness information before sunrise. Hence, this spike is due to the insufficient information regarding the upcoming cloudiness of the day.
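The definition above makes the pre-sunrise gap concrete: wherever the clear-sky irradiance is zero, the ratio is undefined and no cloudiness can be inferred. A small NumPy sketch (the function name is ours):

```python
import numpy as np

def cloudiness(i_local, i_clear_sky):
    """c = 1 - I_local / I_clear_sky; undefined (NaN) where the
    clear-sky irradiance is zero, i.e. before sunrise."""
    i_local = np.asarray(i_local, dtype=float)
    i_clear_sky = np.asarray(i_clear_sky, dtype=float)
    c = np.full_like(i_clear_sky, np.nan)   # night hours stay NaN
    day = i_clear_sky > 0.0
    c[day] = 1.0 - i_local[day] / i_clear_sky[day]
    return c
```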

Fig. 6. RMSE in Kelvin, depending on the forecast hour for the best network

We did the same analysis regarding the month of the instant t (Fig. 8) and the evolution of the error (RMSE) during the year (Fig. 7). No other spike can be seen; the error is evenly shared across the year.

Fig. 7. RMSE in Kelvin, versus the day of the year for t + 1 and t + 6


Fig. 8. RMSE in Kelvin, versus the month for the best network

The location of the prediction has a great influence on the prediction error. We see in Table 5 that Nice, a city by the sea to the south of Avignon, has better results than Avignon. This is due to the climatic conditions of the city, Nice having far fewer clouds than Avignon. We used Nice as a comparison point since it is the location used in [17].

Table 5. RMSE (K) of conv multih for the cities of Nice and Avignon

Location t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

Nice 0.2171 0.4006 0.5635 0.7004 0.8049 0.8916

Avignon 0.3397 0.6021 0.8460 1.0527 1.2355 1.3800

The goal of forecasting is to be precise (RMSE). It should be noted that this precision can be slightly improved by letting the training continue even when the gradient is small, as can be seen on a logarithmic scale in Fig. 9. Still, this improvement costs a lot more computation per error unit.

We also want to know whether the prediction is reliable. For this reason, we introduce the trend accuracy:

$$\mathrm{trend}_k(x) = \begin{cases} 0 & \text{if } |x| \le k \\ \mathrm{sign}(x) & \text{otherwise} \end{cases}$$

$$\mathrm{trend\ accuracy}_k = \frac{\mathrm{count}\big(\mathrm{trend}_k(\hat{T}_i) = \mathrm{trend}_k(T_i)\big)}{N} \qquad (4)$$

That is, the accuracy of the network in forecasting whether the temperature will rise, fall or stay constant. We choose k = 0.3 as the interval for the constant class. From our results (Table 2), this value seems to be the lowest standard deviation we could have; thus values in this interval can be seen as a stable trend.
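One plausible implementation of Eq. (4), applying the three-way classifier to the hour-to-hour variations of the measured and forecast series (our reading of "rise, fall or stay constant"; function names are ours):

```python
import numpy as np

def trend(x, k=0.3):
    """Class of a variation x: 0 (stable) if |x| <= k, else sign(x)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= k, 0.0, np.sign(x))

def trend_accuracy(t, t_hat, k=0.3):
    """Fraction of hour-to-hour variations classified identically
    (rise / fall / stable) in the forecast and in the measurements."""
    return float(np.mean(trend(np.diff(t), k) == trend(np.diff(t_hat), k)))
```

Unlike RMSE, this score ignores the magnitude of the error: a forecast can be several degrees off and still score perfectly as long as it rises and falls at the right times.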


Fig. 9. Normalized MSE of the best network throughout learning

In Table 6 we see that, while conv multih is always very close to the best accuracy, it is rarely the best one. When preprocessing is removed, the accuracy values lie in a very small interval. This shows that the improvement seen in RMSE is not due to an improvement in trend accuracy but solely to an improvement in precision. Keep in mind that, the categorization of trends being a bit arbitrary, variation is to be expected, so the differences seen may not be significant.

Table 6. Trend accuracy (.3) for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

[17] 0.834 0.710 0.659 0.665 0.683 0.695

preprocess multih 0.835 0.735 0.690 0.694 0.683 0.689

linear raw multih 0.809 0.724 0.704 0.704 0.699 0.705

raw multih 0.811 0.729 0.714 0.697 0.699 0.697

conv multih 0.848 0.756 0.713 0.698 0.696 0.695

8 Conclusion

In this paper, we proposed several neural networks for temperature forecasting based solely on the previous 24 h of temperature and the computed irradiance. We showed that convolutional neural networks are a good tool for temperature forecasting. The proposed networks display a precision improvement over linear predictors and non-linear ones. However, progress should be made to account for cloudiness at sunrise and to improve the prediction accuracy. Our solution has the main advantages of not propagating the forecasting error through time and of having the best forecast precision.


References

1. International Codes, Volume I.1, Annex II to the WMO Technical Regulations: Part A - Alphanumeric Codes (2011–2018). https://library.wmo.int/doc_num.php?explnum_id=5708
2. Meteorological Service for International Air Navigation (Annex 3) (2013). https://www.icao.int/Meetings/METDIV14/Documents/an03_cons_secured.pdf
3. ECOBIO H2 – ADEME, March 2019. https://www.ademe.fr/ecobio-h2
4. EcobioH2 - ETIS, February 2019. https://ecobioh2.ensea.fr
5. HelioClim-3 Archives for Free - www.soda-pro.com, March 2019. http://www.soda-pro.com/web-services/radiation/helioclim-3-archives-for-free. Accessed 11 Mar 2019
6. Abdel-Aal, R.: Hourly temperature forecasting using abductive networks. Eng. Appl. Artif. Intell. 17(5), 543–556 (2004). https://doi.org/10.1016/j.engappai.2004.04.002
7. Abhishek, K., Singh, M., Ghosh, S., Anand, A.: Weather forecasting model using artificial neural network. Procedia Technol. 4, 311–318 (2012). https://doi.org/10.1016/j.protcy.2012.05.047. 2012 C3IT
8. Andreas, A.M.: NREL: Measurement and Instrumentation Data Center (MIDC), March 2019. https://midcdmz.nrel.gov. Accessed 2 Apr 2019
9. Deihimi, A., Orang, O., Showkati, H.: Short-term electric load and temperature forecasting using wavelet echo state networks with neural reconstruction. Energy 57, 382–401 (2013). https://doi.org/10.1016/j.energy.2013.06.007
10. Gelaro, R., McCarty, W., Suárez, M.J., Todling, R., et al.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim., June 2017. https://doi.org/10.1175/JCLI-D-16-0758.1
11. Hayati, M., Mohebi, Z.: Application of artificial neural networks for temperature forecasting. Int. J. Elect. Comput. Energ. Electron. Commun. Eng. 1(4), 662–666 (2007). https://doi.org/10.5281/zenodo.1070987
12. Ineichen, P.: Quatre années de mesures d'ensoleillement à Genève. Ph.D. thesis, 19 July 1983. https://doi.org/10.13097/archive-ouverte/unige:17467
13. Korzeniowski, F., Widmer, G.: A fully convolutional deep auditory model for musical chord recognition. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6, September 2016. https://doi.org/10.1109/MLSP.2016.7738895
14. LeCun, Y., Bengio, Y., et al.: Convolutional networks for images, speech, and time series. Handb. Brain Theor. Neural Netw. 3361(10) (1995)
15. Ramakrishna, R., Bernstein, A., Dall'Anese, E., Scaglione, A.: Joint probabilistic forecasts of temperature and solar irradiance. In: IEEE ICASSP 2018. https://doi.org/10.1109/ICASSP.2018.8462496
16. Salque, T., Marchio, D., Riederer, P.: Neural predictive control for single-speed ground source heat pumps connected to a floor heating system for typical French dwelling. Building Serv. Eng. Res. Technol. 35(2), 182–197 (2014). https://doi.org/10.1177/0143624413480370
17. Salque, T.: Méthode d'évaluation des performances annuelles d'un régulateur prédictif de PAC géothermiques sur banc d'essai semi-virtuel. Ph.D. thesis (2013). http://www.theses.fr/2013ENMP0095, ENMP 2013