All content in this area was uploaded by Xiaodong Jia on Aug 06, 2019

Content may be subject to copyright.

Gaussian Process Regression for Numerical Wind Speed Prediction Enhancement

Haoshu Cai^a, Xiaodong Jia^a,*, Jianshe Feng^a, Yuan-Ming Hsu^a, Wenzhe Li^a, Jay Lee^a

^a NSF I/UCR Center for Intelligent Maintenance Systems, Department of Mechanical Engineering, University of Cincinnati, PO Box 210072, Cincinnati, Ohio 45221-0072, USA

Abstract:

This paper studies the application of a Multi-Task Gaussian Process (MTGP) regression model to enhance numerical predictions of wind speed. In the proposed method, a Support Vector Regressor (SVR) is first used to fuse the predictions from Numerical Weather Predictors (NWP). The purpose of this regressor is to map the numerical predictions at coarse geographical nodes to the desired turbine location. The SVR prediction output is then further enhanced by the MTGP regression model. Validation on real-world data from a large-scale off-shore wind farm shows that the proposed method significantly improves the prediction accuracy of the NWP at both short-term horizons (1~6 hours ahead) and long-term horizons (7~24 hours ahead). More importantly, the short-term prediction accuracy after enhancement is comparable to, or even better than, that of cutting-edge statistical models for short-term extrapolation.

Keywords: Wind speed prediction, Multi-Task Gaussian Process, Gaussian Process Regression, Support Vector Machine, Time Series Prediction, Forecasting

Corresponding Author:

Xiaodong Jia; E-mail: jiaxg@mail.uc.edu; Tel: (513)556-3412; Fax: (513)556-3390

Address: 560 Baldwin Hall, University of Cincinnati, PO Box 210072, Cincinnati, OH 45221

1. Introduction

Driven by the demand for renewable energy, large numbers of wind power generators have been erected over the past decade [1-3]. However, one limitation that impedes the further development of wind farms is their high Operation and Maintenance (O&M) cost, which is essentially caused by the uncertainty of wind power production. To better adapt to inconsistent wind conditions and reduce costs, wind speed (WS) prediction is identified as one of the key inputs for wind farm power dispatching and maintenance planning. In current practice, both the short-term WS prediction for the future 0-6 hours and the long-term prediction for the future 7-24 hours are important inputs for meeting the requirements of power grid dispatching, and the power output of each wind turbine is optimized based on the predicted wind conditions [4, 5]. For maintenance planning, the short-term WS prediction within the future 6 hours also serves as a critical input to schedule the maintenance activities for the next day and to minimize the production loss [6-9]. To this end, investigations into advanced analytics for WS prediction hold great economic and academic value.

WS prediction is intrinsically challenging due to the intermittent fluctuations in WS under intricate meteorological conditions. To better predict the WS at different time horizons, short-term and long-term predictions are made by different types of data-driven models. Short-term prediction is largely based on statistical approaches, which extrapolate the WS series by modeling its time evolution. Examples include Auto-Regressive (AR) and Auto-Regressive Moving Average (ARMA) models, Artificial Neural Networks (ANN), and the Kalman Filter (KF) or Unscented Kalman Filter (UKF), as discussed in [10-13]. To enhance the performance and robustness of these prediction algorithms, various combined models have also been proposed in the literature. Monfared et al. [12] utilize fuzzy logic and ANN to model WS estimation based on the statistical properties of the input time series. Kani et al. [14] use ANN and a Markov Chain to capture patterns in WS time series data. Chen et al. [15] integrate Support Vector Regression (SVR) with KF to realize dynamic state estimation. Santamaría-Bonfil et al. [16, 17] tune the parameters of an SVR model using heuristic algorithms such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). For long-term WS prediction, the outputs of Numerical Weather Predictors (NWP) are generally preferred, since the accuracy of statistical models deteriorates very quickly as the prediction horizon grows. The NWP results are usually given at coarse geographical grids and may exhibit systematic prediction bias over complex terrain. To address these issues, the NWP results are used as reference predictions and regression models are employed to post-process the NWP outputs. For example, KF is explored as a post-processing method in [18] to correct the systematic bias in the NWP predictions. Similarly, a practical KF-based methodology is used to improve the NWP predictions in [19], and the results are validated on two years of data.

A review of the related research shows that statistical models for time series extrapolation give rather satisfactory accuracy at short-term horizons (1~6 hours ahead); however, their accuracy deteriorates very quickly when the prediction horizon exceeds 6 hours. For long-term prediction (7 hours ahead or more), predictions from NWP are necessary to guarantee the prediction accuracy. However, because the NWP results are given at coarse geographical grids and because of the complexity of wind dynamics, the direct NWP output is normally biased relative to the prediction target. This bias is more prominent in complex terrain and dynamic wind environments. Although the KF structure in [18] and [19] mitigates the prediction bias of NWP to some extent, it still fails to yield satisfactory results at short-term prediction horizons. Moreover, the KF relies on several prior assumptions: the regression coefficients need to be known beforehand, the noise term needs to follow a Gaussian distribution, and the dynamic process being modeled must be linear.

To address these challenges and limitations, this work proposes a novel methodology that uses the Multi-Task Gaussian Process (MTGP) to post-process the numerical weather predictions. One major contribution of the proposed method is that it not only enhances the prediction accuracy at long-term horizons (7~24 hours ahead) but also significantly improves the accuracy at short-term horizons (1~6 hours ahead). In particular, the short-term prediction accuracy of the proposed method is found to be even better than that of statistical models specifically proposed for short-term extrapolation. In the proposed method, the spatial correlation of wind speed between the NWP prediction grids and the turbine nacelle position is first modeled by SVR. Subsequently, the SVR prediction is further enhanced by MTGP. The superiority of the proposed method is demonstrated by benchmarking against the cutting-edge technique for short-term prediction in [15] and the recent techniques for NWP enhancement in [18, 19]. To the authors' knowledge, this is the first time that MTGP has been applied to wind speed prediction. Moreover, few prediction methods in the literature achieve good accuracy at both short-term and long-term prediction horizons.

The rest of this paper is organized as follows. Section 2 reviews the technical background. Section 3 presents the proposed methodology. Section 4 reports the experimental results, the comparison with several benchmarks, and the discussion. Finally, conclusions and future work are presented in Section 5.

2. Technical Backgrounds

2.1. Standard Gaussian Process Regression

Gaussian Process Regression (GPR) is a non-parametric method that can model systems of arbitrary complexity. In many prediction problems, GPR is preferred because it provides an uncertainty estimate along with each prediction. GPR models a time series using a Gaussian prior that is parameterized by a mean function (MF) and a covariance function (CovF), as described below:

$$\mathbf{y} = f(\mathbf{x}) \sim \mathcal{N}\big(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big) \qquad (1)$$

In Eq. (1), $\mathbf{x}$ and $\mathbf{y}$ denote the input and output in the training dataset, and $f$ is known as the latent variable in the GPR model. In most applications, the mean function $m(\mathbf{x})$ in Eq. (1) is set to 0, and the CovF $k(\mathbf{x}, \mathbf{x}')$, which describes the similarity between input data points, is the key ingredient in GPR, since data points with similar inputs are likely to have similar target values [20]. One of the most frequently used kernel functions in the current literature, the squared exponential (SE), is shown below:

$$k_{SE}(d) = \theta_1^2 \exp\left(-\frac{d^2}{2\theta_2^2}\right) \qquad (2)$$

where $d$ in Eq. (2) is the Euclidean distance between two indices $\mathbf{x}$ and $\mathbf{x}'$. The parameters $\theta_1$ and $\theta_2$ in Eq. (2) are the hyper-parameters that need to be optimized.

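During model training, the negative log marginal likelihood (NLML) in Eq. (3) is minimized so that the hyper-parameters in the kernel matrix can be estimated.

$$\mathrm{NLML} = -\log p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\theta}) = \frac{1}{2}\log\left|\mathbf{K} + \sigma_n^2\mathbf{I}\right| + \frac{1}{2}\mathbf{y}^T\left(\mathbf{K} + \sigma_n^2\mathbf{I}\right)^{-1}\mathbf{y} + \frac{n}{2}\log(2\pi) \qquad (3)$$

Here $\mathbf{K} = \mathbf{K}(\mathbf{x}, \mathbf{x})$ is the kernel matrix of the training inputs, $\sigma_n^2$ is the noise variance and $n$ is the number of training points. The unknown hyper-parameter vector $\boldsymbol{\theta}$ in Eq. (3) is determined by minimizing the NLML. The optimization problem for parameter estimation is written as:

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \; -\log p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\theta}) \qquad (4)$$

Since the NLML is differentiable with respect to the hyper-parameters, it can be minimized by off-the-shelf gradient-based optimization algorithms, such as gradient descent. The NLML is generally not convex, so the optimizer may converge to a local minimum.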
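The training step around Eq. (3) and Eq. (4) can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it assumes NumPy, a zero mean function, and an SE kernel with hyper-parameter names `theta1`, `theta2` and a noise level `sigma_n` chosen here for illustration. The toy data and the coarse grid search (standing in for a gradient-based optimizer) are also assumptions.

```python
import numpy as np

def se_kernel(xa, xb, theta1, theta2):
    """Squared exponential kernel of Eq. (2) for two 1-D arrays of time indices."""
    d = xa[:, None] - xb[None, :]          # pairwise differences
    return theta1**2 * np.exp(-d**2 / (2.0 * theta2**2))

def nlml(x, y, theta1, theta2, sigma_n):
    """NLML of Eq. (3) for a zero-mean GP, evaluated via a Cholesky factor."""
    n = len(x)
    K = se_kernel(x, x, theta1, theta2) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))             # log|K|
    return 0.5 * log_det + 0.5 * (y @ alpha) + 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical toy series: a noisy sine sampled at 30 hourly indices.
rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)
y = np.sin(0.3 * x) + 0.1 * rng.standard_normal(30)

# Coarse grid search standing in for a gradient-based optimizer (Eq. 4).
grid = [(t1, t2) for t1 in (0.5, 1.0, 2.0) for t2 in (1.0, 3.0, 10.0)]
theta_hat = min(grid, key=lambda th: nlml(x, y, th[0], th[1], sigma_n=0.1))
print("selected (theta1, theta2):", theta_hat)
```

Working through the Cholesky factor avoids forming the explicit inverse and gives the log-determinant almost for free, which is the usual choice when the NLML has to be evaluated many times during optimization.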

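After model training, the predictive distribution of GPR at the testing data points $\mathbf{x}_*$ can be described as:

$$\mathbf{f}_* \mid \mathbf{x}, \mathbf{y}, \mathbf{x}_* \sim \mathcal{N}\big(\bar{\mathbf{f}}_*,\, \mathrm{cov}(\mathbf{f}_*)\big) \qquad (5)$$

$$\bar{\mathbf{f}}_* = m(\mathbf{x}_*) + \mathbf{K}(\mathbf{x}_*, \mathbf{x})\left(\mathbf{K}(\mathbf{x}, \mathbf{x}) + \sigma_n^2\mathbf{I}\right)^{-1}\big(\mathbf{y} - m(\mathbf{x})\big) \qquad (6)$$

$$\mathrm{cov}(\mathbf{f}_*) = \mathbf{K}(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{K}(\mathbf{x}_*, \mathbf{x})\left(\mathbf{K}(\mathbf{x}, \mathbf{x}) + \sigma_n^2\mathbf{I}\right)^{-1}\mathbf{K}(\mathbf{x}, \mathbf{x}_*) \qquad (7)$$

where $\bar{\mathbf{f}}_*$ is the prediction result and $\mathrm{cov}(\mathbf{f}_*)$ quantifies the prediction uncertainty.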

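When the mean function $m(\mathbf{x}) = 0$, the mean of the GPR predictive distribution in Eq. (6) is a linear combination of the target variable $\mathbf{y}$ in the training set. Under this condition, the mean of the predictive distribution can be rewritten as:

$$\bar{\mathbf{f}}_* = \mathbf{K}(\mathbf{x}_*, \mathbf{x})\left(\mathbf{K}(\mathbf{x}, \mathbf{x}) + \sigma_n^2\mathbf{I}\right)^{-1}\mathbf{y} = \mathbf{W}_{GPR}\,\mathbf{y} \qquad (8)$$

where $\mathbf{W}_{GPR}$ is the weighting matrix for the standard GPR.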

The mean function of GPR is normally set to 0 for trend-free time series. As a non-parametric approach, GPR can model time series or systems of arbitrary complexity when provided with sufficient data. A non-zero mean function is normally employed when a clear trend is observed in the time series or when there is a sound assumption about the trend term. For example, in [21] an exponential trend term is employed as the mean function to better extrapolate the degradation trajectory of a battery cell in the long term. In the present study, the mean function is set to 0, since no consistent trend in wind speed was observed over the prediction horizon.

When using GPR to make predictions, the general steps are as follows:

Step 1: Given the training data $\mathbf{x}$ and $\mathbf{y}$, the hyper-parameters of the GPR model are obtained by minimizing the NLML in Eq. (3);

Step 2: Given the testing time indices $\mathbf{x}_*$, the predictive distribution of GPR is obtained by using Eq. (5)~Eq. (7), with the hyper-parameters obtained from Step 1;

Step 3: The mean of the predictive distribution in Eq. (6) is employed as the predicted value, and the confidence interval is derived from the covariance in Eq. (7). In this study, the error bound is $\bar{\mathbf{f}}_* \pm 2\sqrt{\mathrm{cov}(\mathbf{f}_*)}$.
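Steps 2 and 3 above can be illustrated with a short sketch of Eq. (6) to Eq. (8). It assumes NumPy, a zero mean function, and fixed (not optimized) hyper-parameters; the function names `se_kernel` and `gpr_predict`, the default values, and the toy series are illustrative choices, not taken from the paper.

```python
import numpy as np

def se_kernel(xa, xb, theta1=1.0, theta2=3.0):
    """Squared exponential kernel of Eq. (2); default values are illustrative."""
    d = xa[:, None] - xb[None, :]
    return theta1**2 * np.exp(-d**2 / (2.0 * theta2**2))

def gpr_predict(x, y, x_star, sigma_n=0.1):
    """Zero-mean GPR: predictive mean (Eq. 6/8) and a 2-sigma band from Eq. (7)."""
    K = se_kernel(x, x) + sigma_n**2 * np.eye(len(x))
    K_s = se_kernel(x_star, x)                       # K(x*, x)
    # W_GPR = K(x*, x) (K + sigma_n^2 I)^{-1}; the transpose trick uses symmetry of K.
    W = np.linalg.solve(K, K_s.T).T
    mean = W @ y                                     # Eq. (8)
    cov = se_kernel(x_star, x_star) - W @ K_s.T      # Eq. (7)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # guard against tiny negatives
    return mean, mean - 2.0 * std, mean + 2.0 * std

# Hypothetical toy series observed at hours 0..23, predicted 1 and 17 hours ahead.
x = np.arange(24, dtype=float)
y = np.sin(0.3 * x)
x_star = np.array([24.0, 40.0])
mean, lower, upper = gpr_predict(x, y, x_star)
print(mean, upper - lower)
```

In this toy setting the band is much wider at hour 40 than at hour 24, which mirrors the observation that a purely extrapolating GPR loses confidence quickly as the horizon grows.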

2.2. Multi-Task Gaussian Process Regression

MTGP is an extension of the GPR model to the situation in which the model has multiple outputs, and it can be described as a special case of standard GPR [22]. MTGP was originally proposed in [23], and its superiority in multivariate physiological time-series analysis was demonstrated in [24]. A more recent study that uses MTGP for battery capacity prediction also reports improved results [21]. In the setting of WS prediction, the input of MTGP is the time indices of the WS series, and the output of MTGP is multiple WS series, including the historical WS at the turbine nacelle and the reference series from the NWP model.

The key in MTGP is to capture the correlation across multiple outputs by using the covariance kernel function below:

$$k_{MTGP}(x, x', l, l') = k_c(l, l') \times k_t(x, x') \qquad (9)$$

where $l$ and $l'$ are the indices of the series and there are $m$ series in total; $k_c$ and $k_t$ in Eq. (9) model the correlation across the multiple outputs and the covariance within one series, respectively; and $x$ and $x'$ denote the time indices for tasks $l$ and $l'$. Based on Eq. (9), the kernel matrix of MTGP can be constructed as:

$$\mathbf{K}_{MTGP}(\mathbf{X}, \mathbf{L}, \boldsymbol{\theta}_c, \boldsymbol{\theta}_t) = \mathbf{K}_C(\mathbf{L}, \boldsymbol{\theta}_c) \otimes \mathbf{K}_t(\mathbf{X}, \boldsymbol{\theta}_t) \qquad (10)$$

where $\otimes$ is the Kronecker product, and $\boldsymbol{\theta}_c$ and $\boldsymbol{\theta}_t$ are the hyper-parameters in the kernel matrix. In Eq. (10), $\mathbf{K}_C$ is a similarity matrix that models the correlation or similarity across multiple series, and $\mathbf{K}_t$ is an $n \times n$ symmetric matrix that models the covariance across the time indices of the $l$-th series, where $n$ is the number of time indices of the $l$-th series. Therefore, $\mathbf{K}_C$ is an $m \times m$ matrix that captures the similarity across multiple output series. To ensure that $\mathbf{K}_C$ is positive semi-definite, $\mathbf{K}_C$ is constructed based on the Cholesky decomposition as below:

$$\mathbf{K}_C = \mathbf{L}\mathbf{L}^T = \begin{bmatrix} k_c(1,1) & k_c(1,2) & \cdots & k_c(1,m)\\ k_c(2,1) & k_c(2,2) & \cdots & k_c(2,m)\\ \vdots & \vdots & \ddots & \vdots\\ k_c(m,1) & k_c(m,2) & \cdots & k_c(m,m)\end{bmatrix} \qquad (11)$$

where $\mathbf{L}$ in Eq. (11) is a lower triangular matrix. The elements of $\mathbf{K}_C$ represent the similarity level between each pair of the reference series. According to the description in [23, 24], these elements of $\mathbf{K}_C$ can be interpreted as correlation coefficients.
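The construction in Eq. (10) and Eq. (11) can be made concrete with a small numerical sketch. It assumes NumPy, two series ($m = 2$) that share the same five time indices, and illustrative values for the lower triangular factor and the SE hyper-parameters; none of these numbers come from the paper.

```python
import numpy as np

def se_kernel(xa, xb, theta1=1.0, theta2=3.0):
    """Temporal SE kernel; hyper-parameter values are illustrative."""
    d = xa[:, None] - xb[None, :]
    return theta1**2 * np.exp(-d**2 / (2.0 * theta2**2))

# Lower triangular factor for m = 2 series (illustrative values only).
L = np.array([[1.0, 0.0],
              [0.8, 0.6]])
K_C = L @ L.T                     # Eq. (11): symmetric and PSD by construction

x = np.arange(5, dtype=float)     # time indices shared by both series
K_t = se_kernel(x, x)             # temporal covariance, n x n
K_mtgp = np.kron(K_C, K_t)        # Eq. (10): (m*n) x (m*n) kernel matrix

print(K_C)                        # off-diagonal entry is the inter-series similarity
print(K_mtgp.shape)
```

Parameterizing $\mathbf{K}_C$ through its factor means an optimizer can vary the entries of the factor freely without ever producing an invalid covariance. The Kronecker form applies when all series share the same time indices; when the series have different lengths, the kernel matrix is assembled block by block instead.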

3. Methodology

3.1. Using MTGP for NWP enhancement

When using MTGP for time series prediction, all training and testing procedures are the same as for traditional GPR, except for the construction of the kernel matrix. To illustrate the kernel matrix in MTGP, consider the case of two output tasks. In this scenario, the MTGP model can be constructed as:

$$\begin{bmatrix}\mathbf{y}_r \\ \mathbf{y}_p\end{bmatrix} \sim \mathcal{N}\big(\mathbf{0},\, \mathbf{K}_{MTGP}(\mathbf{x},\mathbf{x})\big) = \mathcal{N}\left(\mathbf{0},\, \begin{bmatrix}\rho_{rr}\mathbf{K}(\mathbf{x}_r,\mathbf{x}_r) & \rho_{rp}\mathbf{K}(\mathbf{x}_r,\mathbf{x}_p)\\ \rho_{pr}\mathbf{K}(\mathbf{x}_p,\mathbf{x}_r) & \rho_{pp}\mathbf{K}(\mathbf{x}_p,\mathbf{x}_p)\end{bmatrix}\right) \qquad (12)$$

where $\mathbf{y}_r \in \mathbb{R}^{n_r \times 1}$ and $\mathbf{y}_p \in \mathbb{R}^{n_p \times 1}$ are two outputs with different dimensionality, and $\mathbf{x}_r$ and $\mathbf{x}_p$ are the model inputs for the two output tasks. The input dimensionality of $\mathbf{x}_r$ and $\mathbf{x}_p$ should be the same, although their lengths may differ. In the 1D case, or time series prediction, $\mathbf{x}_r$ and $\mathbf{x}_p$ are the time indices of the two time series. $\rho_{rr}$, $\rho_{rp}$, $\rho_{pr}$ and $\rho_{pp}$ denote the correlation coefficients of the two output series; these four coefficients are treated as part of the hyper-parameters in MTGP and are obtained by optimizing the NLML during model training. It is also important to note that $\rho_{rp} = \rho_{pr}$ always holds due to Eq. (11).

The application of MTGP to NWP enhancement is illustrated in Fig. 1. In Fig. 1, $\mathbf{y}_r$ serves as the reference wind speed series given by the NWP, and its time indices are written as $\mathbf{x}_r = \{t-k, \ldots, t, \ldots, t+24\}$. $\mathbf{y}_p$ in Eq. (12) corresponds to the measured wind speed series at the turbine nacelle, which is to be extrapolated over the future 24 hours; its time indices are $\mathbf{x}_p = \{t-k, \ldots, t\}$, where $k$ is the length of the time window for model construction. By using $\mathbf{y}_r$, $\mathbf{y}_p$, $\mathbf{x}_r$ and $\mathbf{x}_p$ as training data, an MTGP model is constructed and a better prediction at times $\mathbf{x}_p^* = \{t+1, \ldots, t+24\}$ can be obtained. The predictive distribution follows directly from Eq. (5)~Eq. (7).

Fig. 1 MTGP Enhancement for Reference-based Prediction

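Like the standard GPR, the prediction result of MTGP can also be interpreted as a linear combination of the historical observations, which can be written as:

$$\bar{\mathbf{f}}_p^* = \mathbf{K}_{MTGP}(\mathbf{x}_p^*, \mathbf{x})\left(\mathbf{K}_{MTGP}(\mathbf{x},\mathbf{x}) + \sigma_n^2\mathbf{I}\right)^{-1}\begin{bmatrix}\mathbf{y}_r\\ \mathbf{y}_p\end{bmatrix} = \mathbf{W}_{MTGP}\begin{bmatrix}\mathbf{y}_r\\ \mathbf{y}_p\end{bmatrix},$$

where

$$\mathbf{K}_{MTGP}(\mathbf{x},\mathbf{x}) = \begin{bmatrix}\rho_{rr}\mathbf{K}(\mathbf{x}_r,\mathbf{x}_r) & \rho_{rp}\mathbf{K}(\mathbf{x}_r,\mathbf{x}_p)\\ \rho_{pr}\mathbf{K}(\mathbf{x}_p,\mathbf{x}_r) & \rho_{pp}\mathbf{K}(\mathbf{x}_p,\mathbf{x}_p)\end{bmatrix} \in \mathbb{R}^{(n_r+n_p)\times(n_r+n_p)}$$

$$\mathbf{K}_{MTGP}(\mathbf{x}_p^*, \mathbf{x}) = \begin{bmatrix}\rho_{pr}\mathbf{K}(\mathbf{x}_p^*,\mathbf{x}_r) & \rho_{pp}\mathbf{K}(\mathbf{x}_p^*,\mathbf{x}_p)\end{bmatrix} \in \mathbb{R}^{24\times(n_r+n_p)} \qquad (13)$$

and $n_r$ and $n_p$ are the lengths of $\mathbf{y}_r$ and $\mathbf{y}_p$. From Eq. (13), one can see that the prediction output of the MTGP model is simply a linear combination of $\mathbf{y}_r$ and $\mathbf{y}_p$. Consequently, MTGP is employed as a novel approach to further enhance the prediction accuracy of NWP in this investigation.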
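A minimal numerical sketch of the two-task predictive mean in Eq. (13) is given below. It assumes NumPy, an SE temporal kernel, and fixed coefficients `rho = (rho_rr, rho_rp, rho_pp)` instead of values optimized through the NLML; the synthetic daily-cycle signal, the constant reference bias, and all function names are illustrative assumptions.

```python
import numpy as np

def se_kernel(xa, xb, theta1=1.0, theta2=3.0):
    """Temporal SE kernel; hyper-parameter values are illustrative."""
    d = xa[:, None] - xb[None, :]
    return theta1**2 * np.exp(-d**2 / (2.0 * theta2**2))

def mtgp_predict(x_r, y_r, x_p, y_p, x_star, rho, sigma_n=0.1):
    """Predictive mean of Eq. (13) for two tasks, with rho_pr = rho_rp."""
    rho_rr, rho_rp, rho_pp = rho
    K = np.block([
        [rho_rr * se_kernel(x_r, x_r), rho_rp * se_kernel(x_r, x_p)],
        [rho_rp * se_kernel(x_p, x_r), rho_pp * se_kernel(x_p, x_p)],
    ])
    K_star = np.hstack([rho_rp * se_kernel(x_star, x_r),
                        rho_pp * se_kernel(x_star, x_p)])
    y = np.concatenate([y_r, y_p])
    K_noisy = K + sigma_n**2 * np.eye(len(y))
    W = np.linalg.solve(K_noisy, K_star.T).T    # W_MTGP (K_noisy is symmetric)
    return W @ y, W

def truth(x):
    """Hypothetical zero-mean wind signal with a 24-hour cycle."""
    return 3.0 * np.sin(2.0 * np.pi * x / 24.0)

t, k = 48, 48
x_r = np.arange(t - k, t + 25, dtype=float)     # reference covers past and future
x_p = np.arange(t - k, t + 1, dtype=float)      # measurements cover the past only
x_star = np.arange(t + 1, t + 25, dtype=float)  # 24 hours to predict
y_r = truth(x_r) + 0.5                          # reference with a constant bias
y_p = truth(x_p)

mean, W = mtgp_predict(x_r, y_r, x_p, y_p, x_star, rho=(1.0, 0.9, 1.0))
rmse = np.sqrt(np.mean((mean - truth(x_star)) ** 2))
print(W.shape, rmse)
```

The matrix `W` has one row per predicted hour and one column per training observation, so each prediction is an explicit weighted sum of reference values and past measurements.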

Because the prediction result of MTGP is a linear combination of $\mathbf{y}_r$ and $\mathbf{y}_p$, the MTGP model enhances the NWP results at both short-term and long-term horizons. The $\rho_{pp}\mathbf{K}(\mathbf{x}_p, \mathbf{x}_p)$ term in the kernel matrix mainly contributes to the short-term prediction accuracy, since $\rho_{pp}$ is more dominant than $\rho_{pr}$ at short-term horizons. For long-term extrapolation, the prediction uses $\mathbf{y}_r$ as a reference, which is why the prediction accuracy does not deteriorate significantly at long-term horizons. More importantly, the algorithm achieves enhanced prediction accuracy at both short-term and long-term horizons mainly because MTGP automatically decides the trade-off between the NWP output and the extrapolation of the measured wind speed at the turbine nacelle. This is achieved by optimizing $\rho_{pp}$ and $\rho_{pr}$ in the model training phase. Therefore, the model is expected to be superior to NWP at long-term horizons and superior to statistical models at short-term horizons.

To summarize the discussion, the algorithm proposed for NWP enhancement based on MTGP is described below. It is important to note that the proposed method requires the MTGP model to be re-trained at each prediction step. This implies that the proposed model is not affected by the bias known as seasonal effects, since a new model is derived at each prediction step using only data from the recent past.

Algorithm 1. Using MTGP for NWP output enhancement

At a certain time step $t$:

Step 1: Initialize $\mathbf{y}_r$, $\mathbf{y}_p$, $\mathbf{x}_r$, $\mathbf{x}_p$, $\mathbf{x}_p^*$ and the time window length $k$ as: $\mathbf{x}_r = \{t-k, \ldots, t, \ldots, t+24\}$, $\mathbf{x}_p^* = \{t+1, \ldots, t+24\}$, $\mathbf{x}_p = \{t-k, \ldots, t\}$; $\mathbf{y}_r \in \mathbb{R}^{n_r \times 1}$ is the NWP prediction at times $\mathbf{x}_r$, and $\mathbf{y}_p \in \mathbb{R}^{n_p \times 1}$ is the measured wind speed at the turbine nacelle during times $\mathbf{x}_p$.

Step 2: Construct the MTGP as in Eq. (12) and obtain the optimized hyper-parameters $\rho_{rr}$, $\rho_{rp}$, $\rho_{pp}$ and the other hyper-parameters in the kernel function. The optimal hyper-parameters are obtained by minimizing the NLML.

Step 3: Obtain the predictive distribution of MTGP. The prediction mean is given by Eq. (13).

Step 4: Propagate to time $t+1$ and repeat Step 1~Step 3.
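The index bookkeeping of Step 1 and the rolling re-training of Step 4 can be sketched as follows. This is a plain-Python illustration under assumed names (`mtgp_time_indices`, `horizon`) and an assumed window length of 48 hours; the model fitting of Steps 2 and 3 is deliberately left out.

```python
def mtgp_time_indices(t, k, horizon=24):
    """Time indices used in Step 1 of Algorithm 1 at time step t."""
    x_r = list(range(t - k, t + horizon + 1))     # reference: t-k, ..., t+24
    x_p = list(range(t - k, t + 1))               # measurements: t-k, ..., t
    x_p_star = list(range(t + 1, t + horizon + 1))  # targets: t+1, ..., t+24
    return x_r, x_p, x_p_star

# Rolling loop: the window slides by one hour and the model would be
# re-trained from scratch at every step (Steps 2 and 3 omitted here).
for t in range(100, 103):
    x_r, x_p, x_p_star = mtgp_time_indices(t, k=48)
    print(t, (x_r[0], x_r[-1]), (x_p[0], x_p[-1]), (x_p_star[0], x_p_star[-1]))
```

With a window of $k$ hours the reference series always has $k + 25$ points and the measured series $k + 1$ points, which fixes the size of the kernel matrix in Eq. (12) at every step.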

3.2. The proposed methodology and implementation

The goal of this research is to predict WS over the future 24 hours. The given data comprises the WS at the turbine nacelle collected by the Supervisory Control and Data Acquisition (SCADA) system and the weather forecast data from the NWP model. The NWP data is given as the 1-hour average WS at each forecast grid node at three different heights: 10 m, 50 m and 90 m. In this research, the NWP data issued today and one day before is used to build the model. Therefore, at any specific point in time, the available data includes the SCADA data up to the present and the NWP data updated today and one day ago. Spatially, the NWP data includes the WS prediction at the 9 grid nodes shown in Fig. 2, which have a rather coarse spatial resolution of roughly 9.5 kilometers. The Wind Turbine (WT) is located within the area covered by these grid nodes, as shown in Fig. 2, and the heights of the turbines are unknown.

Fig. 2 WT Position and Weather Forecast Grid Position.

Fig. 3 Statement of the WS prediction problem

The proposed methodology is illustrated in Fig. 3. In Fig. 3, the time resolution of the NWP data is 1 hour. As mentioned before, the NWP data reported today and one day before is utilized. Therefore, the dimensionality of the numerical weather predictions is 9 nodes × 3 heights × 2 forecast issues = 54. As shown in Fig. 3, a time window of historical data is needed to establish the SVR model between all 54 numerical wind predictions and the measured wind speed series. The purpose of this SVR model is to give a reference prediction series $\mathbf{y}_r$ for the future 24 hours. This prediction is based solely on the NWP outputs and is subsequently enhanced by the MTGP model. In the MTGP enhancement step, the detailed procedures for model construction and prediction are described in Algorithm 1. It is important to highlight that the SVR model and the MTGP model are re-trained at each time step using the historical data. Therefore, the seasonal effect of the wind speed distribution is not a concern in the present model, because the prediction is based on the NWP prediction output and the extrapolation of the wind series in the recent past.

To evaluate the performance of the proposed method, Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) are used as criteria for prediction accuracy. Suppose the WS data from the current time point to the next 24 hours is $\mathbf{x}' = \{x'_1, \ldots, x'_{24}\}$ and the predicted WS data is denoted $\mathbf{x}^* = \{x^*_1, \ldots, x^*_{24}\}$; RMSE and MAPE at a certain prediction horizon are calculated as follows:

$$RMSE_h = \sqrt{\frac{\sum_{i=1}^{m}\left(x^*_{i,h} - x'_{i,h}\right)^2}{m}} \qquad (14)$$

$$MAPE_h = \frac{1}{m}\sum_{i=1}^{m}\frac{\left|x^*_{i,h} - x'_{i,h}\right|}{x'_{i,h}} \qquad (15)$$

where $m$ represents the number of prediction steps and $h$ denotes the prediction horizon.
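As a worked illustration of Eq. (14) and Eq. (15), the sketch below computes both metrics for one horizon in plain Python. The function name `rmse_mape` and the three sample values are hypothetical. The function returns MAPE as a fraction; reporting it as a percentage would require multiplying by 100, which is an assumption about presentation rather than something stated in the equations.

```python
import math

def rmse_mape(predicted, actual):
    """RMSE (Eq. 14) and MAPE (Eq. 15) over m prediction steps at one horizon."""
    m = len(actual)
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / m)
    # MAPE divides by the actual value, so it is undefined when actual == 0.
    mape = sum(abs(p - a) / a for p, a in zip(predicted, actual)) / m
    return rmse, mape

# Hypothetical wind speeds in m/s at one fixed horizon h.
predicted = [5.0, 7.0, 9.0]
actual = [4.0, 8.0, 10.0]
rmse, mape = rmse_mape(predicted, actual)
print(rmse, mape)   # rmse is 1.0; mape is (0.25 + 0.125 + 0.1) / 3
```

Because MAPE normalizes by the measured wind speed, periods of very low wind inflate it, which is one reason the two metrics can rank methods differently at the same horizon.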

To better demonstrate the improvements made by the proposed methodology, the methods listed in Table 1 are benchmarked. In Table 1, the Best NWP model uses RMSE to select the single series of NWP data with the highest forecast accuracy. The SVR-NWP model uses an SVR model to fuse the NWP data for prediction, as in Step 1 of Fig. 3. SVR+UKF focuses on short-term prediction and is a dynamic methodology proposed in [15].

To compare with other peer algorithms for NWP enhancement, the KF structure discussed in [18, 19] is implemented and benchmarked. In those works, the bias of the NWP is modeled as a high-order polynomial:

$$e_t = c_{0,t} + c_{1,t}\,x^*_t + c_{2,t}\,{x^*_t}^2 + c_{3,t}\,{x^*_t}^3 + v_t \qquad (16)$$

where $e_t$ is the scalar bias of the NWP at time $t$, $c_{0,t}$, $c_{1,t}$, $c_{2,t}$ and $c_{3,t}$ are the polynomial coefficients, and $v_t$ is a Gaussian noise term. The above equation is implemented in a KF structure as shown below:

$$\mathbf{c}_t = \mathbf{c}_{t-1} + \mathbf{w}_t, \qquad e_t = \left[1,\, x^*_t,\, {x^*_t}^2,\, {x^*_t}^3\right]\mathbf{c}_t + v_t \qquad (17)$$

where $\mathbf{c}_t = \left[c_{0,t},\, c_{1,t},\, c_{2,t},\, c_{3,t}\right]^T$ and $\mathbf{w}_t$ is the noise term in the coefficient update.
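One update of the KF structure in Eq. (17) can be sketched as below. This is a generic Kalman filter step written with NumPy under stated assumptions: the coefficient noise covariance `Q`, the observation noise variance `R`, the initial covariance, the normalized wind speed input, and the synthetic linear bias are all illustrative and are not taken from [18, 19].

```python
import numpy as np

def kf_bias_step(c, P, x_nwp, e_obs, Q, R):
    """One Kalman step for Eq. (17): random-walk coefficients, cubic observation."""
    H = np.array([1.0, x_nwp, x_nwp**2, x_nwp**3])  # observation row [1, x, x^2, x^3]
    P_pred = P + Q                                  # predict: c_t = c_{t-1} + w_t
    S = H @ P_pred @ H + R                          # innovation variance (scalar)
    K = P_pred @ H / S                              # Kalman gain, shape (4,)
    c_new = c + K * (e_obs - H @ c)                 # correct with the observed bias
    P_new = P_pred - np.outer(K, H @ P_pred)        # covariance update
    return c_new, P_new

# Synthetic check: the "true" bias is 0.5 + 0.8 * x for a normalized wind speed x.
c = np.zeros(4)
P = 10.0 * np.eye(4)
Q = 1e-6 * np.eye(4)
R = 0.01
for t in range(200):
    x_t = 0.3 + 0.9 * ((37 * t) % 100) / 100.0      # deterministic sweep of [0.3, 1.2)
    e_t = 0.5 + 0.8 * x_t
    c, P = kf_bias_step(c, P, x_t, e_t, Q, R)

x_new = 0.8
bias_hat = np.array([1.0, x_new, x_new**2, x_new**3]) @ c
print(bias_hat)   # estimated bias at x_new; the synthetic true value is 1.14
```

The corrected forecast would then be the NWP value minus the estimated bias. Note that the filter needs `Q` and `R` to be chosen in advance, which is one of the prior assumptions discussed in the introduction.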

To summarize, the benchmarking algorithms are tabulated in Table 1. The prediction accuracies of the algorithms in Table 1 are benchmarked on real-world data in the next section.

Table 1 List of benchmarking algorithms

Prediction horizons covered: Short-term prediction (1~6 hours ahead); Long-term prediction (6~24 hours ahead); Short and long-term prediction (1~24 hours ahead)

Algorithms: SVR+UKF [15]; Best NWP+KF [19]; SVR-NWP+KF [19]; SVR-NWP; Best NWP; SVR-NWP+MTGP (the proposed method)

4. Results and Discussions

The performance of the proposed method is validated on off-shore wind farm data collected within half a year. The dataset includes the WS series at the turbine nacelle collected by SCADA and the NWP forecast data described above. For the KF-based methods in [18], the data from January and February 2017 is used for cross-validation and model training, while the data from March and September 2017 is used for model performance testing. For the other methods, which do not need pre-training, the data from March and September is used for model testing and result benchmarking. At the beginning of the analysis, all time series from the NWP and the anemometer measurements at the turbine nacelle are synchronized and pre-processed to a 1-hour interval.

To demonstrate the advantages of the proposed methodology, the prediction model is validated on two different turbines, and the results are benchmarked in Table 2 ~ Table 5. Table 2 and Table 3 show the prediction results for WT #1 in April and September. Table 4 and Table 5 show the prediction results for WT #2 during the same months. The best prediction accuracy at each prediction horizon is highlighted in bold. In general, the proposed model, SVR-NWP+MTGP, yields the best results compared with all the others.

Comparing the prediction accuracies of the different methods at different prediction horizons, the following findings are highlighted. (1) The SVR+UKF model proposed in [15] demonstrates improved prediction accuracy at short-term horizons (1~6 hours ahead). However, its extrapolation accuracy deteriorates very quickly at long-term horizons (7~24 hours ahead). (2) Best NWP and SVR-NWP demonstrate good accuracy at long-term horizons. However, their short-term accuracies are not comparable to those of the statistical models. (3) Best NWP+KF and SVR-NWP+KF demonstrate enhanced prediction accuracy compared with Best NWP and SVR-NWP at short-term horizons, especially 1~3 hours ahead. This finding indicates that the KF structure in [18, 19] can effectively enhance the prediction accuracy, as expected. However, the performance of such post-processing at long-term prediction horizons is quite unstable; on some occasions, it makes the prediction accuracy even worse. (4) The proposed method, SVR-NWP+MTGP, demonstrates excellent accuracy at both short-term and long-term horizons. Its prediction accuracies at short-term horizons are comparable to those of the SVR+UKF model and even better on some occasions. More importantly, the long-term prediction accuracy of the proposed method is better than that of Best NWP, SVR-NWP, Best NWP+KF and SVR-NWP+KF. In addition, the improvements made by the proposed method are consistent across different turbine locations and different months.

The superiority of the proposed method is further illustrated in Fig. 4 and Fig. 5, which compare the RMSE and MAPE of the different methods. Fig. 4 and Fig. 5 show the validation results on two different wind turbines in two different months. At both short-term and long-term prediction horizons, the proposed method gives the best accuracy in terms of both RMSE and MAPE. It is highlighted that the short-term prediction performance of the proposed method is comparable with the state-of-the-art approach SVR+UKF in [15], while its long-term prediction accuracy is superior to that of the KF structure presented in [19] for NWP post-processing.

Table 2 Turbine #1 Comparison Result of April

Method | 1-h MAPE | 1-h RMSE | 4-h MAPE | 4-h RMSE | 6-h MAPE | 6-h RMSE | 12-h MAPE | 12-h RMSE | 18-h MAPE | 18-h RMSE | 24-h MAPE | 24-h RMSE
SVR+UKF [15] | 21.578 | 1.320 | 42.362 | 2.960 | 46.781 | 3.504 | 56.992 | 4.516 | 57.390 | 4.780 | 59.211 | 5.192
Best NWP | 50.810 | 3.729 | 57.339 | 3.951 | 59.534 | 4.009 | 60.912 | 3.993 | 57.215 | 3.874 | 48.689 | 3.551
SVR-NWP | 32.462 | 2.404 | 45.674 | 3.302 | 49.068 | 3.571 | 52.059 | 3.994 | 52.630 | 4.225 | 55.041 | 4.404
SVR-NWP+KF [19] | 30.179 | 2.069 | 43.264 | 2.747 | 46.452 | 2.896 | 50.164 | 3.109 | 51.725 | 3.274 | 55.526 | 3.346
Best NWP+KF [19] | 37.949 | 2.689 | 49.353 | 3.450 | 50.793 | 3.364 | 56.417 | 3.834 | 58.062 | 3.915 | 58.855 | 3.821
SVR-NWP+MTGP (the proposed method) | 18.891 | 1.281 | 41.312 | 2.481 | 44.645 | 2.669 | 49.831 | 2.872 | 52.026 | 2.991 | 51.488 | 3.013

Table 3 Turbine #1 Comparison Result of September

Method | 1-h MAPE | 1-h RMSE | 4-h MAPE | 4-h RMSE | 6-h MAPE | 6-h RMSE | 12-h MAPE | 12-h RMSE | 18-h MAPE | 18-h RMSE | 24-h MAPE | 24-h RMSE
SVR+UKF [15] | 24.493 | 1.437 | 40.850 | 2.656 | 46.770 | 3.342 | 57.907 | 4.440 | 64.258 | 5.051 | 69.382 | 5.488
Best NWP | 61.311 | 3.448 | 64.257 | 3.512 | 63.280 | 3.488 | 60.932 | 3.474 | 64.637 | 3.561 | 59.814 | 3.332
SVR-NWP | 34.850 | 2.267 | 43.685 | 2.709 | 45.656 | 2.807 | 48.943 | 2.959 | 49.246 | 2.999 | 50.100 | 3.077
SVR-NWP+KF [19] | 35.316 | 2.585 | 46.288 | 2.826 | 49.599 | 2.940 | 56.513 | 3.206 | 58.604 | 3.312 | 58.544 | 3.387
Best NWP+KF [19] | 37.598 | 2.670 | 44.393 | 3.147 | 46.075 | 3.321 | 48.928 | 3.365 | 48.732 | 3.386 | 53.402 | 3.593
SVR-NWP+MTGP (the proposed method) | 22.588 | 1.442 | 42.071 | 2.577 | 45.660 | 2.780 | 48.101 | 2.853 | 48.635 | 2.919 | 50.383 | 3.109

Table 4 Turbine #2 Comparison Result of April

Method | 1-h MAPE | 1-h RMSE | 4-h MAPE | 4-h RMSE | 6-h MAPE | 6-h RMSE | 12-h MAPE | 12-h RMSE | 18-h MAPE | 18-h RMSE | 24-h MAPE | 24-h RMSE
SVR+UKF [15] | 20.522 | 1.284 | 42.853 | 2.597 | 48.267 | 3.030 | 58.061 | 3.993 | 62.880 | 4.487 | 60.653 | 4.612
Best NWP | 54.510 | 3.511 | 59.528 | 3.663 | 61.808 | 3.727 | 64.229 | 3.722 | 62.155 | 3.659 | 52.734 | 3.380
SVR-NWP | 32.938 | 2.256 | 43.780 | 3.067 | 46.679 | 3.328 | 50.568 | 3.682 | 50.279 | 3.824 | 52.941 | 3.978
SVR-NWP+KF [19] | 31.651 | 1.900 | 43.678 | 2.577 | 46.279 | 2.750 | 49.773 | 2.962 | 51.621 | 3.057 | 53.239 | 3.115
Best NWP+KF [19] | 36.036 | 2.563 | 49.136 | 3.424 | 50.643 | 3.379 | 54.012 | 3.545 | 54.849 | 3.508 | 58.816 | 3.899
SVR-NWP+MTGP (the proposed method) | 18.317 | 1.240 | 38.614 | 2.274 | 41.994 | 2.466 | 46.547 | 2.656 | 47.635 | 2.755 | 48.145 | 2.797

Table 5 Turbine #2 Comparison Result of September

Method | 1-h MAPE | 1-h RMSE | 4-h MAPE | 4-h RMSE | 6-h MAPE | 6-h RMSE | 12-h MAPE | 12-h RMSE | 18-h MAPE | 18-h RMSE | 24-h MAPE | 24-h RMSE
SVR+UKF [15] | 23.349 | 1.387 | 41.875 | 2.618 | 45.514 | 3.124 | 55.323 | 3.994 | 65.124 | 4.764 | 71.706 | 5.212
Best NWP | 59.158 | 3.441 | 60.907 | 3.460 | 62.713 | 3.609 | 60.261 | 3.532 | 64.388 | 3.649 | 59.738 | 3.461
SVR-NWP | 33.385 | 2.343 | 41.175 | 2.757 | 43.057 | 2.855 | 45.717 | 2.954 | 46.806 | 3.018 | 48.089 | 3.098
SVR-NWP+KF [19] | 33.409 | 2.268 | 42.550 | 2.787 | 45.107 | 2.921 | 50.296 | 3.139 | 50.496 | 3.210 | 50.851 | 3.271
Best NWP+KF [19] | 34.456 | 2.539 | 41.002 | 3.071 | 43.953 | 3.334 | 46.819 | 3.460 | 46.340 | 3.465 | 51.382 | 3.614
SVR-NWP+MTGP (the proposed method) | 21.661 | 1.452 | 39.950 | 2.564 | 44.061 | 2.848 | 47.951 | 3.062 | 47.378 | 3.041 | 48.272 | 3.146

(a)

(b)

(c)

(d)

Fig. 4 Benchmarking of Turbine #1 prediction. (a) RMSE in April; (b) MAPE in April; (c)

RMSE in September; (d) MAPE in September.

309

13

Fig. 5 Benchmarking of Turbine #2 prediction. (a) RMSE in April; (b) MAPE in April; (c) RMSE in September; (d) MAPE in September.

Fig. 6 Detailed Results and Error Bounds of SVR-NWP+MTGP model. (a) 1-hour Ahead Prediction; (b) 6-hour Ahead Prediction; (c) 12-hour Ahead Prediction; (d) 24-hour Ahead Prediction.

Fig. 6 shows detailed results of the SVR-NWP+MTGP model for a segment randomly selected from the testing dataset. Several findings are highlighted below: (1) In general, the proposed model fits the ground truth well. (2) Shorter prediction horizons lead to smaller error bounds. At long-term horizons, the proposed model is still able to describe the overall trend of the WS data. (3) In most cases, the true WS value falls within the predicted error bounds, except for the wind gust around time points 300-350. For the 1-hour ahead prediction, however, the proposed model predicts this wind gust very well.
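The third finding can be checked numerically by counting how often the measured value lies inside the predicted band. The sketch below assumes the error bounds are the predictive mean plus or minus two predictive standard deviations, which is a common choice for Gaussian process output but is an assumption here. The function name `coverage` and all sample values are hypothetical.

```python
def coverage(actual, mean, std, k=2.0):
    """Return (fraction of points inside mean +/- k*std, indices outside)."""
    if not (len(actual) == len(mean) == len(std)) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    outside = [
        i
        for i, (y, m, s) in enumerate(zip(actual, mean, std))
        if abs(y - m) > k * s
    ]
    return 1.0 - len(outside) / len(actual), outside


# Hypothetical values: index 3 mimics a gust that the forecast misses.
actual = [6.1, 6.8, 7.4, 12.5, 7.9]
pred_mean = [6.0, 7.0, 7.5, 8.0, 8.0]
pred_std = [0.5, 0.5, 0.6, 0.8, 0.9]

fraction_inside, missed = coverage(actual, pred_mean, pred_std)
print(f"inside the band: {fraction_inside:.0%}, missed indices: {missed}")
```

Listing the indices outside the band makes it easy to see whether the misses cluster in one interval, as with the gust described above, or are spread across the test period.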

Fig. 7 compares the error distributions of the benchmarking models and the proposed model over the two months. The key points are as follows: (1) SVR+UKF and the proposed method give the smallest errors in the 1-hour ahead prediction, with the proposed method slightly better than SVR+UKF; (2) the proposed method also gives the best result in the 6-hour ahead prediction, whereas the prediction error of SVR+UKF starts to increase at this horizon; (3) in the 12-hour and 24-hour ahead predictions, SVR-NWP and SVR-NWP+MTGP give the smallest errors; (4) comparing SVR+UKF, Best NWP and the proposed method shows that the proposed method keeps the advantage of SVR+UKF at the short-term horizons and the advantage of NWP at the long-term horizons; (5) comparing Best NWP, SVR-NWP and SVR-NWP+MTGP shows that the MTGP mainly reduces the prediction error at the short-term horizons, while at the long-term horizons its results are slightly better than SVR-NWP and significantly better than NWP; (6) compared with the recent models in the literature, SVR+UKF [15] and NWP+KF [19], the proposed method gives the best overall prediction accuracy within the 24-hour ahead prediction range.

Fig. 7 Probabilistic Histograms of Residues with the Distribution Mean and 2σ Interval. Columns: 1-hour Ahead, 6-hour Ahead, 12-hour Ahead, 24-hour Ahead. Rows: SVR+UKF [15], Best NWP, Best NWP+KF [19], SVR-NWP, SVR-NWP+MTGP.
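The quantities plotted in a residual histogram of this kind can be sketched as follows: the residuals, their mean, a two-standard-deviation interval around the mean, and bin probabilities that sum to one. The sign convention (prediction minus measurement), the equal-width binning, and the function name `residual_summary` are assumptions made for illustration, not details taken from the paper.

```python
import statistics


def residual_summary(actual, predicted, n_bins=5):
    """Return (mean, (mean - 2*sigma, mean + 2*sigma), bin probabilities)."""
    # Assumed sign convention: residual = prediction - measurement.
    residuals = [p - a for a, p in zip(actual, predicted)]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)  # population standard deviation

    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all residuals identical: everything goes in bin 0

    counts = [0] * n_bins
    for r in residuals:
        # Clamp so the maximum residual falls in the last bin.
        idx = min(int((r - lo) / width), n_bins - 1)
        counts[idx] += 1

    probabilities = [c / len(residuals) for c in counts]
    return mu, (mu - 2 * sigma, mu + 2 * sigma), probabilities


# Hypothetical measured and predicted values (illustration only).
mu, interval, probs = residual_summary([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 6.0])
print(f"mean = {mu:.3f}, 2-sigma interval = ({interval[0]:.3f}, {interval[1]:.3f})")
print("bin probabilities:", probs)
```

A mean far from zero indicates a systematic bias in the forecast, while a wide interval indicates large scatter, so the two numbers summarize different failure modes of each benchmark.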

Finally, the execution time of the proposed model is discussed. The proposed methodology and the benchmarking methods are run on a PC with 32 GB of RAM and a 3.50 GHz CPU under Windows 10 Enterprise. At each time point, the WS data and NWP data of the last 120 hours are included in the testing procedure of all the models. The proposed model is run 10 times on the data from March and September.

Fig. 8 shows the average execution time and the prediction accuracy for different time block lengths. The average execution time refers to the average prediction time at each time point. The time block length denotes the number of hours of past WS and NWP data that are taken into consideration, as shown in Fig. 3. The results indicate that the execution time grows with the time block length. Meanwhile, the prediction accuracy in terms of RMSE and MAPE improves as more WS and NWP data are fed into the model. However, once the time block length reaches 96 hours, the average execution time keeps growing while RMSE and MAPE converge to a stable level. Therefore, a time block of 96 hours is recommended to achieve the optimal prediction performance while saving execution time.

Fig. 8 Average Execution Time and Prediction Accuracy of Time Block Length. (a) Average Execution Time vs. Time Block Length; (b) RMSE of Different Time Block Length; (c) MAPE of Different Time Block Length.
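A time block sweep of this kind can be organized as in the sketch below, which records the average time per prediction and the RMSE for each candidate block length. The predictor `window_mean` is a deliberately trivial placeholder and not the SVR-NWP+MTGP model; the function names, the synthetic hourly series, and the candidate lengths are all hypothetical. Only the structure of the experiment is illustrated, so the printed numbers do not reproduce Fig. 8.

```python
import math
import time


def sweep_block_lengths(series, block_lengths, predict, horizon=1):
    """For each block length, return (avg seconds per prediction, RMSE)."""
    results = {}
    # Start at the longest block so every length is scored on the same points.
    start = max(block_lengths)
    for length in block_lengths:
        sq_err, elapsed, n = 0.0, 0.0, 0
        for t in range(start, len(series) - horizon + 1):
            window = series[t - length:t]  # the last `length` hours before t
            t0 = time.perf_counter()
            y_hat = predict(window, horizon)
            elapsed += time.perf_counter() - t0
            sq_err += (series[t + horizon - 1] - y_hat) ** 2
            n += 1
        results[length] = (elapsed / n, math.sqrt(sq_err / n))
    return results


def window_mean(window, horizon):
    """Placeholder predictor (NOT the MTGP): mean of the time block."""
    return sum(window) / len(window)


# Synthetic hourly wind speed with a daily cycle (illustration only).
series = [8.0 + 2.0 * math.sin(2 * math.pi * h / 24) for h in range(240)]

for length, (seconds, error) in sweep_block_lengths(
    series, [24, 48, 96], window_mean
).items():
    print(f"block = {length:3d} h  avg time = {seconds:.2e} s  RMSE = {error:.3f}")
```

Scoring every block length on the same set of time points keeps the accuracy comparison fair, since a longer block would otherwise be evaluated on fewer and later points than a shorter one.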

5. Conclusion

In this paper, a novel WS prediction method is proposed. The effectiveness and superiority of the proposed method are validated on a dataset collected from an off-shore wind farm. The results support the following conclusions: (1) the proposed method can be effectively employed to improve the prediction accuracy of numerical weather prediction; (2) the proposed method combines the advantages of time series extrapolation methods at short-term horizons with the advantages of NWP at long-term horizons; (3) the proposed method reports improved prediction accuracy compared with the recently proposed SVR+UKF [15] and NWP+KF [19] models.

In future work, the proposed method will be integrated into commercial software for practical use, and sparse Gaussian process methods will be explored to further improve the computational efficiency.

References:

[1] I. Colak, S. Sagiroglu, and M. Yesilbudak, "Data mining and wind power prediction: A literature review," Renewable Energy, vol. 46, pp. 241-247, 2012.
[2] J. Jung and R. P. Broadwater, "Current status and future advances for wind speed and power forecasting," Renewable and Sustainable Energy Reviews, vol. 31, pp. 762-777, 2014.
[3] X. Jia, C. Jin, M. Buzza, Y. Di, D. Siegel, and J. Lee, "A deviation based assessment methodology for multiple machine health patterns classification and fault detection," Mechanical Systems and Signal Processing, vol. 99, pp. 244-261, 2018.
[4] J. Jin, D. Zhou, P. Zhou, S. Qian, and M. Zhang, "Dispatching strategies for coordinating environmental awareness and risk perception in wind power integrated system," Energy, vol. 106, pp. 453-463, 2016.
[5] X. Zhang, W. Cai, and Z. Gan, "Optimal dispatching strategies of active power for DFIG wind farm based on GA algorithm," in Control and Decision Conference (CCDC), 2016 Chinese, 2016, pp. 6094-6099.
[6] P. Eecen, H. Braam, L. Rademakers, and T. Obdam, "Estimating costs of operations and maintenance of offshore wind farms," in European Wind Energy Conference and Exhibition, Milan, Italy, 2007.
[7] A. Kovács, G. Erdős, L. Monostori, and Z. J. Viharos, "Scheduling the maintenance of wind farms for minimizing production loss," IFAC Proceedings Volumes, vol. 44, pp. 14802-14807, 2011.
[8] A. Kovács, G. Erdős, Z. J. Viharos, and L. Monostori, "A system for the detailed scheduling of wind farm maintenance," CIRP Annals-Manufacturing Technology, vol. 60, pp. 497-501, 2011.
[9] X. Jia, C. Jin, M. Buzza, W. Wang, and J. Lee, "Wind turbine performance degradation assessment based on a novel similarity metric for machine performance curves," Renewable Energy, vol. 99, pp. 1191-1201, 2016.
[10] H. Liu, H.-q. Tian, and Y.-f. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, pp. 415-424, 2012.
[11] O. B. Shukur and M. H. Lee, "Daily wind speed forecasting through hybrid KF-ANN model based on ARIMA," Renewable Energy, vol. 76, pp. 637-647, 2015.
[12] M. Monfared, H. Rastegar, and H. M. Kojabadi, "A new strategy for wind speed forecasting using artificial intelligent methods," Renewable Energy, vol. 34, pp. 845-848, 2009.
[13] X. Jia, Y. Di, J. Feng, Q. Yang, H. Dai, and J. Lee, "Adaptive virtual metrology for semiconductor chemical mechanical planarization process using GMDH-type polynomial neural networks," Journal of Process Control, vol. 62, pp. 44-54, 2018.
[14] S. A. Pourmousavi Kani and M. M. Ardehali, "Very short-term wind speed prediction: A new artificial neural network–Markov chain model," Energy Conversion and Management, vol. 52, pp. 738-745, 2011.
[15] K. Chen and J. Yu, "Short-term wind speed prediction using an unscented Kalman filter based state-space support vector regression approach," Applied Energy, vol. 113, pp. 690-705, 2014.
[16] G. Santamaría-Bonfil, A. Reyes-Ballesteros, and C. Gershenson, "Wind speed forecasting for wind farms: A method based on support vector regression," Renewable Energy, vol. 85, pp. 790-809, 2016.
[17] S. Salcedo-Sanz, E. G. Ortiz-García, Á. M. Pérez-Bellido, A. Portilla-Figueras, and L. Prieto, "Short term wind speed prediction based on evolutionary support vector regression algorithms," Expert Systems with Applications, vol. 38, pp. 4052-4057, 2011.
[18] P. Louka, G. Galanis, N. Siebert, G. Kariniotakis, P. Katsafados, I. Pytharoulis, et al., "Improvements in wind speed forecasts for wind power prediction purposes using Kalman filtering," Journal of Wind Engineering and Industrial Aerodynamics, vol. 96, pp. 2348-2362, 2008.
[19] F. Cassola and M. Burlando, "Wind speed and wind energy forecast through Kalman filtering of Numerical Weather Prediction model output," Applied Energy, vol. 99, pp. 154-166, 2012.
[20] C. E. Rasmussen, "Gaussian processes for machine learning," 2006.
[21] R. R. Richardson, M. A. Osborne, and D. A. Howey, "Gaussian process regression for forecasting battery state of health," Journal of Power Sources, vol. 357, pp. 209-219, 2017.
[22] H. Liu, J. Cai, and Y.-S. Ong, "Remarks on Multi-Output Gaussian Process Regression," Knowledge-Based Systems, 2018.
[23] E. V. Bonilla, K. M. Chai, and C. Williams, "Multi-task Gaussian process prediction," in Advances in Neural Information Processing Systems, 2008, pp. 153-160.
[24] R. Dürichen, M. A. Pimentel, L. Clifton, A. Schweikard, and D. A. Clifton, "Multitask Gaussian processes for multivariate physiological time-series analysis," IEEE Transactions on Biomedical Engineering, vol. 62, pp. 314-322, 2015.