
Short-Term Temperature Forecasting on a Several Hours Horizon

Louis Desportes¹, Pierre Andry¹, Inbar Fijalkow¹, and Jérôme David²

¹ ETIS, Univ Paris Seine, Univ Cergy-Pontoise, ENSEA, CNRS, 95000 Cergy-Pontoise, France
{louis.desportes,pierre.andry,inbar.fijalkow}@ensea.fr
² ZenT, 95000 Neuville sur Oise, France
jerome.david@zent-eco.com

Abstract. Outside temperature is an important quantity in building control. It enables improvement in inhabitant energy consumption forecasting and heating requirement prediction. However, most previous works on outside temperature forecasting require either a lot of computation or a lot of different sensors. In this paper we forecast outside temperature at a multiple-hour horizon knowing only the last 24 h of temperature and the computed clear-sky irradiance up to the prediction horizon. We propose the use of different neural networks that predict directly at each hour of the horizon instead of using the one-hour forecast to predict the next. We show that the most precise one uses one-dimensional convolutions, and that the error is distributed across the year, the biggest error factor we found being the unknown cloudiness at the beginning of the day. Our findings suggest that the improvement seen is not due to better trend accuracy but only to better precision.

Keywords: Forecast · Temperature · Smart building · CNN

1 Introduction

1.1 Motivation: EcobioH2 Building

Today's buildings' energy consumption is decreasing thanks to progress in materials and appliance energy management, so that in the near future we could envision positive energy buildings, i.e. buildings that produce more energy than they consume. This implies using a local source of energy, like solar panels on the rooftop or a windmill, rather than power from the grid. However, local energy sources are usually intermittent. Therefore, energy storage, such as batteries, is required for the low-to-no-production periods, together with a way to control storage/usage periods. This implies knowing in advance the energy production

The research reported in this publication is part of the EcobioH2 project supported by EcoBio and ADEME, the French agency for environment and energy. This project is funded by the PIA, the French national investment plan for innovation.

© Springer Nature Switzerland AG 2019
I. V. Tetko et al. (Eds.): ICANN 2019, LNCS 11730, pp. 525–536, 2019.
https://doi.org/10.1007/978-3-030-30490-4_42


and demand. Both being highly influenced by the external climate, our study focuses on temperature forecasting.

The EcobioH2 project [3,4] intends to be the first low-footprint building in France using hydrogen fuel cells for energy storage and a neural network for its control. The 6-storey building of approximately 10 000 square meters will host retail, cultural, lodging, office and digital activities. It will have solar panels on its rooftop to produce energy and hybrid hydrogen energy storage to store it. The EcobioH2 project requires a temperature forecast to design an energy control and monitoring system balancing local energy needs and energy production.

1.2 Temperature Forecast

Temperature forecasting is required to refine electricity load forecasts [9]. Heat has an important influence on appliance consumption and on the need for cooling/heating. Moreover, weather fluctuations cause behavior shifts among inhabitants.

Knowing what the outside temperature will be in the next 6 to 24 h, we can predict, and take into account, how much energy will be needed for heating or cooling the building. The aim of this paper is to investigate temperature forecasting on a several-hours horizon with a limited number of sensors. In particular, neither wind speed nor wet-bulb thermometers will be available. The exploitation of our predictor should also not require too large an amount of data.

1.3 Related Works

Different methods have been proposed for short-term temperature forecasting. [6] uses abductive networks, a method that links multiple Volterra series together in order to ease network interpretation. However, this method uses one network for each prediction hour, which induces a huge complexity. Better methods have since been found; they are outlined below.

Based upon a physical model of temperature, [15] uses Volterra series to propose probabilistic forecasts. The authors propose to use hidden Markov models and the Viterbi algorithm to account for the cloudiness variation. They predict at short terms of 15 min and 30 min and still have a lot of parameters to learn (2 Volterra series, 2 HMMs, 1 autoregressive filter).

[9] uses the hidden state of echo-state networks to account for cloudiness. This implies a long and complex convergence for the network. Furthermore, the authors only predict at a one-hour horizon, whereas we want a several-hours horizon, or a one-day horizon using a completely different network depending on the hour the forecast is made.

Simpler artificial neural network forecasting methods have also been proposed, such as [11]. The authors use many different sensors (wet-bulb temperature, wind speed, humidity, pressure, ...) that we don't have on site. [7] needs the last 10 years of values of a given day as an input of their network and only predicts the values of the next day, while we need a forecast for several hours of the current day.


The closest work to ours is [16], detailed in [17]. After training, it predicts temperature using only the last 24 h of temperature measures and computed irradiance data. This prediction happens only at the next hour. The authors then use the network's one-hour prediction to predict the temperature in 2 h, yielding a propagation error that increases with the forecast horizon. Moreover, this prediction method preprocesses the data before feeding it to the network. This preprocessing might limit the network's learning capacity.

In the sequel, we will investigate different neural network architectures to improve the several-hours-horizon forecast.

2 Model and Problem Setting

We want to forecast the outside temperature Tout for each hour up to a horizon of H = 6 h or H = 24 h. At the trained network input, we use only the N = 24 last hours of temperature values Tout(t−N+1 : t) and the computed irradiance Ith between t−N+1 and t+H, where t is the current instant. In other words, we want to find the function f such that the H-hours temperature forecast is given by:

$$\hat{T}_{out}(t+1:t+H) = f\big(T_{out}(t-N+1:t);\ I_{th}(t-N+1:t+H)\big),$$

where the mean squared error

$$\mathrm{MSE} = \sum_{t=1}^{T} \sum_{h=1}^{H} \big(T_{out}(t+h) - \hat{T}_{out}(t+h)\big)^2$$

is minimal. This problem is shown in Fig. 1.

We denote X(t+1 : t+H) = (X(t+1), X(t+2), ..., X(t+H)) the vector containing the H values of X between hours t+1 and t+H with a time step of 1 h, and identically for X(t−N+1 : t). The computed irradiance Ith is the clear-sky irradiance, i.e. the power received if the sky does not have any cloud. Ith can be computed using the equations found in [12]. Tout is given in K, and Ith in W/m².

[Fig. 1. Temperature forecast scheme: f maps Tout(t−N+1 : t) and Ith(t−N+1 : t+H) to the forecast of Tout(t+1 : t+H).]
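The input/output windows defined above can be sketched as a data-preparation step. This is an illustrative NumPy sketch, not the authors' code; the function name `make_samples` is our own, and only the shapes follow the problem statement (N = 24 past temperatures, N + H irradiances, H targets).

```python
import numpy as np

def make_samples(t_out, i_th, n=24, horizon=6):
    """Slice two hourly series into (inputs, target) samples.

    For each instant t, the inputs are the last n temperatures
    T_out(t-n+1 : t) and the clear-sky irradiance I_th(t-n+1 : t+horizon);
    the target is T_out(t+1 : t+horizon).
    """
    xs_t, xs_i, ys = [], [], []
    for t in range(n - 1, len(t_out) - horizon):
        xs_t.append(t_out[t - n + 1 : t + 1])            # n past temperatures
        xs_i.append(i_th[t - n + 1 : t + 1 + horizon])   # n + horizon irradiances
        ys.append(t_out[t + 1 : t + 1 + horizon])        # horizon future temperatures
    return np.array(xs_t), np.array(xs_i), np.array(ys)
```

With one year of hourly data, this yields one training sample per hour, each pairing a 24-value temperature history and a 30-value irradiance window with a 6-value target.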

3 Temperature Forecasting Using Neural Networks

In the sequel, we will use neural networks to learn the function f as defined in Fig. 1, from Tout and Ith alone.

Our work is based upon [17]. This method preprocesses the input data as

$$y_1 = \big(I_{th}(t+1);\ \overline{T}_{out}(t);\ \max(T_{out}(t-N+1:t));\ \min(T_{out}(t-N+1:t));\ T_{out}(t);\ T_{out}(t-1)\big)$$

where $\overline{T}_{out}(t)$ is the mean of Tout(t−N+1 : t). This preprocessed input y1 is then fed to a hidden neural network layer with bias b2 and a tanh activation function such that $y_2 = \tanh(y_1 \times W_2 + b_2)$. The hidden layer output is then fed to the output layer Dense(1) with bias b3 and no activation function, giving the one-value output $\hat{T}_{out}(t+1) = y_2 \times w_3 + b_3$. This method is displayed in Fig. 2. We suspect that using $\hat{T}_{out}(t+1)$ as an input of the neural network to predict Tout(t+2) may induce error propagation.

[Fig. 2. [17] network structure: Ith(t−N+h : t+h−1) and Tout(t−N+h : t+h−1) → Preprocess → Dense → Dense(1) → forecast of Tout(t+1), with the prediction delayed and fed back as an input for the next horizon.]
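The preprocessing and one-hour forward pass of [17] described above can be sketched as follows. This is a minimal NumPy illustration with random, untrained weights; the function name `preprocess_y1` is ours, and the 5 hidden neurons follow Table 1.

```python
import numpy as np

def preprocess_y1(t_out_past, i_th_next):
    """Six-feature input vector y1 of [17] for one sample.

    t_out_past: the last N=24 hourly temperatures T_out(t-N+1 : t), in K;
    i_th_next:  the computed clear-sky irradiance I_th(t+1), in W/m^2.
    """
    return np.array([
        i_th_next,            # I_th(t+1)
        t_out_past.mean(),    # mean of the last 24 h
        t_out_past.max(),     # max of the last 24 h
        t_out_past.min(),     # min of the last 24 h
        t_out_past[-1],       # T_out(t)
        t_out_past[-2],       # T_out(t-1)
    ])

# One-hour forward pass with random (untrained) weights,
# 5 hidden neurons as in Table 1:
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(6, 5)), np.zeros(5)
w3, b3 = rng.normal(size=5), 0.0
y1 = preprocess_y1(np.linspace(280.0, 285.0, 24), 100.0)
t_hat = np.tanh(y1 @ W2 + b2) @ w3 + b3   # one-value output for t+1
```

Rolling this single-output network forward, its own output would replace the newest temperature at each step, which is exactly the feedback loop suspected of propagating error.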

3.1 Multi-horizon

In order to avoid error propagation, we propose to adapt the network proposed in [17] to directly forecast up to the t+H horizon, with each t+n, n ∈ [1, H], as an output of the network.

We preprocess the data in the same way as [17]. However, to ensure our network has the same input information as theirs when run over H horizons, we add Ith(t+2 : t+H) to the output of the preprocessing:

$$y_1 = \big(I_{th}(t+1:t+H);\ \overline{T}_{out}(t);\ \max(T_{out}(t-N+1:t));\ \min(T_{out}(t-N+1:t));\ T_{out}(t);\ T_{out}(t-1)\big).$$

Contrary to [9], we don't want to train as many networks as the number of outputs. This means that our network's Dense layer is common to all prediction horizons. The formula for this layer is the same; only the dimension changes. The output is made of H values, one for each horizon time step. This yields:

$$\hat{T}_{out}(t+1:t+H) = y_2 \times W_3 + b_3.$$

This network, named preprocess multih, is shown in Fig. 3.

[Fig. 3. preprocess multih network structure: Ith(t−N+1 : t+H) and Tout(t−N+1 : t) → Preprocess → Dense → Dense(H) → forecast of Tout(t+1 : t+H).]
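The multi-horizon forward pass can be sketched as below: one shared hidden layer and H output units, with no prediction ever fed back as an input. This is our illustrative NumPy sketch with random weights, not the trained network; the hidden width of 50 follows Table 1, and the input size H + 5 matches the preprocessing above (H irradiance values plus five temperature features).

```python
import numpy as np

H, HIDDEN = 6, 50                      # horizon and hidden width (Table 1)

def preprocess_multih(y1, W2, b2, W3, b3):
    """Shared hidden layer, then one output unit per horizon step."""
    y2 = np.tanh(y1 @ W2 + b2)         # hidden layer, tanh activation
    return y2 @ W3 + b3                # H forecast values produced at once

n_in = H + 5                           # I_th(t+1:t+H) plus 5 temperature features
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(n_in, HIDDEN)), np.zeros(HIDDEN)
W3, b3 = rng.normal(size=(HIDDEN, H)), np.zeros(H)
pred = preprocess_multih(rng.normal(size=n_in), W2, b2, W3, b3)
```

Because all H outputs come from the same hidden representation in a single pass, an error at horizon t+1 cannot accumulate into the forecast for t+2.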


3.2 Raw Input

In order to understand whether the preprocessing proposed by [17] limits the method's performance, we propose to remove the input preprocessing and to feed the network with the raw inputs: $y_r = (I_{th}(t-N+1:t+H);\ T_{out}(t-N+1:t))$. These raw inputs are sent to Dense and Dense(H) layers using the same formulas as the previous network preprocess multih, with different dimensions. Figure 4 shows this network, named raw multih.

[Fig. 4. raw multih structure: Ith(t−N+1 : t+H) and Tout(t−N+1 : t) → Dense → Dense(H) → forecast of Tout(t+1 : t+H).]

3.3 Convolutions

[Fig. 5. conv multih structure: Ith(t−N+1 : t+H) and Tout(t−N+1 : t) each feed C convolutional filters (filterI1 ... filterIC and filterT1 ... filterTC); the resulting convI and convT outputs feed Dense then Dense(H), producing the forecast of Tout(t+1 : t+H).]

Next, we investigate the usage of a convolutional layer, enabling the network to do a better analysis of the inputs [14], because convolutions let the network account for local temporal correlations. In doing so, we adopt a similar approach as in audio processing [13]. In our case a 1D convolution should be sufficient since our signal seems to have a slow frequency evolution. To our knowledge, it is the first attempt to apply such a solution to temperature forecasting.

Each raw input is fed to a separate convolutional layer with bias, no activation and no padding. For each convolution c in C, the number of convolutions, the formulas are

$$conv_{I_c} = I_{th}(t-N+1:t+H) * filter_{I_c} + b_{I_c} \quad\text{and}\quad conv_{T_c} = T_{out}(t-N+1:t) * filter_{T_c} + b_{T_c},$$

with $*$ the convolution operator. The output of those two convolutional layers is flattened and concatenated into a one-dimensional vector y1 to be sent to the hidden layer. The hidden layer Dense and the output layer Dense(H) use the same formulas as raw multih and preprocess multih. This method, depicted in Fig. 5, is named conv multih.
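The per-input convolutions above can be sketched in NumPy. This is an illustrative sketch with random weights, not the trained model; the sizes C = 24 filters of kernel size K = 3 follow Table 1. Note that `np.convolve` flips its kernel, so we flip it back to obtain the cross-correlation that neural-network "conv" layers actually compute.

```python
import numpy as np

N, H, C, K = 24, 6, 24, 3   # history, horizon, filter count, kernel size (Table 1)

def conv_multih_features(i_th, t_out, f_i, b_i, f_t, b_t):
    """One 1D convolution per input and per filter (bias, no activation,
    no padding), flattened and concatenated into the hidden-layer input y1."""
    feats = []
    for c in range(C):
        # "valid" mode = no padding: output lengths are (N+H-K+1) and (N-K+1)
        feats.append(np.convolve(i_th, f_i[c][::-1], mode="valid") + b_i[c])
        feats.append(np.convolve(t_out, f_t[c][::-1], mode="valid") + b_t[c])
    return np.concatenate(feats)

rng = np.random.default_rng(0)
f_i, b_i = rng.normal(size=(C, K)), np.zeros(C)
f_t, b_t = rng.normal(size=(C, K)), np.zeros(C)
y1 = conv_multih_features(rng.normal(size=N + H), rng.normal(size=N),
                          f_i, b_i, f_t, b_t)
# y1 has length C * ((N+H-K+1) + (N-K+1)) = 24 * (28 + 22) = 1200 and
# then feeds the same Dense / Dense(H) stack as preprocess_multih.
```

Keeping the two input series in separate convolutional branches means the filters applied to irradiance never mix with those applied to temperature before the dense layers.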

3.4 Linear Predictor

For the sake of comparison, we want to measure the benefit of the neural networks with regard to linear forecasts. We will call the linear method linear raw multih. In this case, the same raw inputs, Ith(t−N+1 : t+H) and Tout(t−N+1 : t), are fed to the output layer with bias b and no activation function:

$$\hat{T}_{out}(t+1:t+H) = y_r \times W + b.$$
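A linear predictor of this form can also be fitted in closed form by ordinary least squares rather than gradient descent. The sketch below shows this on synthetic stand-in data (the input width 54 = (24 + 6) irradiances + 24 temperatures matches the raw inputs); the paper itself trains this model by stochastic gradient descent, so this is only an equivalent illustration.

```python
import numpy as np

# Ordinary least squares fit of T_hat(t+1:t+H) = y_r W + b; the bias b
# is absorbed into W through an appended constant column.
rng = np.random.default_rng(0)
n_samples, H, n_in = 200, 6, 54
X = rng.normal(size=(n_samples, n_in))   # stand-in for the raw inputs y_r
Y = rng.normal(size=(n_samples, H))      # stand-in for T_out(t+1 : t+H)
Xb = np.hstack([X, np.ones((n_samples, 1))])
Wb, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
pred = Xb @ Wb                           # fitted linear forecasts
```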

4 Available Datasets

There are many datasets available that take their data from weather stations around the world. The World Meteorological Organization (WMO) has its own set of weather stations, composed of an aggregation of weather stations from country-specific meteorological organizations. Information about the current weather status is broadcast using the synoptic code, also known as "code synop" [1]. Each station has its own diffusion schedule, from every hour to every 6 h.

The aeronautic industry also has its own weather records, called METAR (METeorological Aerodrome Report) [2]. Each airport makes its own report and broadcasts it every half hour.

The US National Renewable Energy Laboratory (NREL) built weather stations to study solar radiation. Their data [8] is freely available on each project website and includes the local temperature.

Other datasets take their sources from satellite observations. They only use weather stations to calibrate the interpretation of their imagery. Satellite imagery has the benefit of covering more locations instead of a few discrete measure points.

In this work, we use the NASA MERRA-2 [10] data. Those data are composed of an aggregation of different worldwide observations with a 1 h frequency. The dataset is packed with clear-sky irradiance and is freely available for specific locations and the years 2005–2006.

In the following experiments we use the MERRA-2 dataset for the town of Avignon (43.95°, 4.817°) in France, the location of the EcobioH2 building, obtained for free through the SoDa HelioClim-3 Archives [5]. We use the data of 2005 as the train set and of 2006 as the validation and test set. Using the same data for test and validation is acceptable since we did not reach overfitting in any of our trainings, and therefore could not optimize the number of epochs on the validation dataset.


5 Used Metrics

We pose T = Tout(t+1 : t+H) to improve readability, and N is the sample size. Next, we recall the equations of the different metrics.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(T_i - \hat{T}_i\big)^2} \qquad (1)$$

RMSE is our primary metric. It lends itself easily to interpretation as an error interval since it is expressed in the same unit as the output variable.

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\big(T_i - \hat{T}_i\big)^2}{\sum_{i=1}^{N}\big(T_i - \overline{T}\big)^2} \qquad (2)$$

R² allows us to know how much of the signal is predicted, giving us an idea of our room for improvement.

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big|T_i - \hat{T}_i\big| \qquad (3)$$

MAE is used in many works. It can be interpreted as an error interval but does not penalize far-off predictions.
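The three metrics are straightforward to implement; a NumPy sketch (function names are ours) matching Eqs. (1)–(3):

```python
import numpy as np

def rmse(t, t_hat):
    """Root mean squared error, Eq. (1); same unit as the data."""
    return float(np.sqrt(np.mean((t - t_hat) ** 2)))

def r2(t, t_hat):
    """Coefficient of determination, Eq. (2)."""
    ss_res = np.sum((t - t_hat) ** 2)
    ss_tot = np.sum((t - np.mean(t)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(t, t_hat):
    """Mean absolute error, Eq. (3)."""
    return float(np.mean(np.abs(t - t_hat)))
```

Squaring before averaging is what makes RMSE penalize far-off predictions more heavily than MAE does.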

6 Results

Using the training data, we run a stochastic gradient descent algorithm in order to find the different parameters Wi and bi for the different networks and hyperparameter combinations. Then we evaluate each hyperparameter combination on the train set and select the best one. The result is displayed in Table 1.

Table 1. Best hyper-parameters found for each network

Algorithms        | Epochs | Learning rate | Batch size | Hidden neurons | Conv sizes | Number of conv
[17]              | 50 k   | 0.001         | 8          | 5              | –          | –
preprocess multih | 50 k   | 0.001         | 8          | 50             | –          | –
linear raw multih | 150 k  | 0.001         | 32         | –              | –          | –
raw multih        | 150 k  | 0.001         | 32         | 14             | –          | –
conv multih       | 50 k   | 0.001         | 8          | 60             | 3          | 24

Then we predict on the test set to obtain the RMSE, MAE and R². While the error values of RMSE and MAE are in Kelvin, they are equal to the error values in degrees Celsius.


In Table 2 we see that the error of [17] increases far more as the horizon grows. This suggests that using $\hat{T}_{out}(t+1)$ as an input of the neural network to predict Tout(t+2) does indeed induce error propagation, and therefore that our approach of forecasting multiple horizons directly is the right one.

Table 2. RMSE (K) for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

[17] 0.475 0.996 1.433 1.753 1.962 2.084

preprocess multih 0.412 0.715 0.962 1.171 1.354 1.515

linear raw multih 0.409 0.756 1.030 1.256 1.413 1.540

raw multih 0.375 0.644 0.882 1.098 1.289 1.428

conv multih 0.340 0.602 0.846 1.053 1.236 1.380

The same table exhibits that, according to the RMSE metric, the best precision at every horizon is achieved with the conv multih network. We explain this result by the ability of convolutions to characterize the sky cloudiness. Table 3 is provided to enable comparison with other works that use the MAE metric.

Table 3. MAE (K) for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

[17] 0.342 0.761 1.125 1.396 1.568 1.665

preprocess multih 0.293 0.531 0.729 0.895 1.038 1.169

linear raw multih 0.294 0.565 0.785 0.964 1.089 1.182

raw multih 0.277 0.481 0.660 0.832 0.982 1.091

conv multih 0.2413 0.440 0.629 0.792 0.936 1.048

Table 4 indicates that we predict most of the signal. The fact that even the most basic predictor, linear raw multih, gives excellent results validates the way we stated the problem of temperature forecasting. However, even a small forecasting improvement can be useful since it can be leveraged by other predictors.

Table 4. R² in percentage for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6
[17] 99.72 98.77 97.44 96.17 95.20 94.59
preprocess multih 99.79 99.36 98.85 98.29 97.72 97.14
linear raw multih 99.79 99.29 98.68 98.03 97.51 97.05
raw multih 99.82 99.48 99.03 98.50 97.93 97.46
conv multih 99.86 99.55 99.11 98.62 98.10 97.63

7 Analysis

We analyze in more detail the results of the proposed conv multih, since it is the most precise one, to understand its weaknesses. In Fig. 6, we plot the RMSE against the prediction hour. We see that there is a spike in error at the beginning of the day, from 5 am to 8 am. Since cloudiness is defined as

$$c = 1 - \frac{I_{real}}{I_{expected}} = 1 - \frac{I_{local}}{I_{clear\ sky}},$$

and since the clear-sky irradiance is zero before sunrise, we cannot have any cloudiness information before sunrise. Hence, this spike is due to the insufficient information regarding the upcoming cloudiness of the day.
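The definition above makes the pre-sunrise gap concrete: wherever the clear-sky irradiance is zero, the ratio is undefined and no cloudiness can be inferred. A small NumPy sketch (the function name is ours):

```python
import numpy as np

def cloudiness(i_local, i_clear_sky):
    """c = 1 - I_local / I_clear_sky; undefined (NaN) where the
    clear-sky irradiance is zero, i.e. before sunrise."""
    i_local = np.asarray(i_local, dtype=float)
    i_clear_sky = np.asarray(i_clear_sky, dtype=float)
    c = np.full_like(i_clear_sky, np.nan)   # night hours stay NaN
    day = i_clear_sky > 0.0
    c[day] = 1.0 - i_local[day] / i_clear_sky[day]
    return c
```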

Fig. 6. RMSE in Kelvin, depending on the forecast hour for the best network

We did the same analysis regarding the month of the instant t (Fig. 8) and the evolution of the error (RMSE) during the year (Fig. 7). No other spike can be seen; the error is evenly shared across the year.

Fig. 7. RMSE in Kelvin, versus the day of the year for t + 1 and t + 6


Fig. 8. RMSE in Kelvin, versus the month for the best network

The location of the prediction has a great influence on the prediction error. We see in Table 5 that Nice, a city by the sea to the south of Avignon, has better results than Avignon. This is due to the climatic conditions of the city, Nice having far fewer clouds than Avignon. We used Nice as a comparison point since it is the location used in [17].

Table 5. RMSE (K) of conv multih for the cities of Nice and Avignon

Location t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

Nice 0.2171 0.4006 0.5635 0.7004 0.8049 0.8916

Avignon 0.3397 0.6021 0.8460 1.0527 1.2355 1.3800

The goal of forecasting is to be precise (RMSE). It should be noted that this precision can be slightly improved by letting the training continue even when the gradient is small, as can be seen on a logarithmic scale in Fig. 9. Still, this improvement costs a lot more computation per error unit.

We also want to know whether the prediction is reliable. For this reason, we introduce the trend accuracy:

$$\mathrm{trend}_k(x) = \begin{cases} 0 & \text{if } |x| \le k \\ \mathrm{sign}(x) & \text{otherwise} \end{cases}$$

$$\mathrm{trend\ accuracy}_k = \frac{\mathrm{count}\big(\mathrm{trend}_k(\hat{T}_i) = \mathrm{trend}_k(T_i)\big)}{N} \qquad (4)$$

That is, the accuracy of the network in forecasting whether the temperature will rise, fall or stay constant. We choose k = 0.3 as the interval for the constant class. From our results (Table 2), this value seems to be the lowest standard deviation we could have; thus values in this interval can be seen as a stable trend.
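One plausible implementation of Eq. (4), applying the three-way classifier to the hour-to-hour variations of the measured and forecast series (our reading of "rise, fall or stay constant"; function names are ours):

```python
import numpy as np

def trend(x, k=0.3):
    """Class of a variation x: 0 (stable) if |x| <= k, else sign(x)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= k, 0.0, np.sign(x))

def trend_accuracy(t, t_hat, k=0.3):
    """Fraction of hour-to-hour variations classified identically
    (rise / fall / stable) in the forecast and in the measurements."""
    return float(np.mean(trend(np.diff(t), k) == trend(np.diff(t_hat), k)))
```

Unlike RMSE, this score ignores the magnitude of the error: a forecast can be several degrees off and still score perfectly as long as it rises and falls at the right times.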


Fig. 9. Normalized MSE of the best network throughout learning

In Table 6 we see that, while conv multih is always very close to the best accuracy, it is rarely the best one. When preprocessing is removed, the accuracy values lie in a very small interval. This shows that the improvement seen in RMSE is not due to an improvement in trend accuracy but solely to an improvement in precision. Keep in mind that, the categorization of trends being a bit arbitrary, variation is to be expected, so the differences seen may not be significant.

Table 6. Trend accuracy (.3) for each horizon and network

Algorithms t + 1 t + 2 t + 3 t + 4 t + 5 t + 6

[17] 0.834 0.710 0.659 0.665 0.683 0.695

preprocess multih 0.835 0.735 0.690 0.694 0.683 0.689

linear raw multih 0.809 0.724 0.704 0.704 0.699 0.705

raw multih 0.811 0.729 0.714 0.697 0.699 0.697

conv multih 0.848 0.756 0.713 0.698 0.696 0.695

8 Conclusion

In this paper, we proposed several neural networks for temperature forecasting based solely on the previous 24 h of temperature and the computed irradiance. We showed that convolutional neural networks are a good tool for temperature forecasting. The proposed networks display a precision improvement over linear predictors and non-linear ones. However, progress should be made to account for cloudiness at sunrise and to improve the prediction accuracy. Our solution has the main advantages of not propagating the forecasting error through time and of having the best forecast precision.


References

1. International Codes, Volume I.1, Annex II to the WMO Technical Regulations: Part A - Alphanumeric Codes (2011–2018). https://library.wmo.int/doc_num.php?explnum_id=5708
2. Meteorological Service for International Air Navigation (Annex 3) (2013). https://www.icao.int/Meetings/METDIV14/Documents/an03_cons_secured.pdf
3. ECOBIO H2 – ADEME, March 2019. https://www.ademe.fr/ecobio-h2
4. EcobioH2 - ETIS, February 2019. https://ecobioh2.ensea.fr
5. HelioClim-3 Archives for Free - www.soda-pro.com, March 2019. http://www.soda-pro.com/web-services/radiation/helioclim-3-archives-for-free. Accessed 11 Mar 2019
6. Abdel-Aal, R.: Hourly temperature forecasting using abductive networks. Eng. Appl. Artif. Intell. 17(5), 543–556 (2004). https://doi.org/10.1016/j.engappai.2004.04.002
7. Abhishek, K., Singh, M., Ghosh, S., Anand, A.: Weather forecasting model using artificial neural network. Procedia Technol. 4, 311–318 (2012). https://doi.org/10.1016/j.protcy.2012.05.047. 2012 C3IT
8. Andreas, A.M.: NREL: Measurement and Instrumentation Data Center (MIDC), March 2019. https://midcdmz.nrel.gov. Accessed 2 Apr 2019
9. Deihimi, A., Orang, O., Showkati, H.: Short-term electric load and temperature forecasting using wavelet echo state networks with neural reconstruction. Energy 57, 382–401 (2013). https://doi.org/10.1016/j.energy.2013.06.007
10. Gelaro, R., McCarty, W., Suárez, M.J., Todling, R., et al.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim., June 2017. https://doi.org/10.1175/JCLI-D-16-0758.1
11. Hayati, M., Mohebi, Z.: Application of artificial neural networks for temperature forecasting. Int. J. Elect. Comput. Energ. Electron. Commun. Eng. 1(4), 662–666 (2007). https://doi.org/10.5281/zenodo.1070987
12. Ineichen, P.: Quatre années de mesures d'ensoleillement à Genève. Ph.D. thesis, 19 July 1983. https://doi.org/10.13097/archive-ouverte/unige:17467
13. Korzeniowski, F., Widmer, G.: A fully convolutional deep auditory model for musical chord recognition. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6, September 2016. https://doi.org/10.1109/MLSP.2016.7738895
14. LeCun, Y., Bengio, Y., et al.: Convolutional networks for images, speech, and time series. Handb. Brain Theor. Neural Netw. 3361(10) (1995)
15. Ramakrishna, R., Bernstein, A., Dall'Anese, E., Scaglione, A.: Joint probabilistic forecasts of temperature and solar irradiance. In: IEEE ICASSP 2018. https://doi.org/10.1109/ICASSP.2018.8462496
16. Salque, T., Marchio, D., Riederer, P.: Neural predictive control for single-speed ground source heat pumps connected to a floor heating system for typical French dwelling. Building Serv. Eng. Res. Technol. 35(2), 182–197 (2014). https://doi.org/10.1177/0143624413480370
17. Salque, T.: Méthode d'évaluation des performances annuelles d'un régulateur prédictif de PAC géothermiques sur banc d'essai semi-virtuel. Ph.D. thesis (2013). http://www.theses.fr/2013ENMP0095, ENMP 2013