
Association Rules for Clustering Algorithms for Data Mining of Temporal Power Ramp Balance

Conference Paper · October 2015
DOI: 10.13140/RG.2.1.2987.5682
Conference: IEEE Cyberworlds 2015
Nurseda Yildirim at University of Zaragoza
Bahri Uzunoglu at Uppsala University
Nurseda Yildirim and Bahri Uzunoğlu
Department of Engineering Sciences, Division of Electricity
Centre for Renewable Electric Energy Conversion
Uppsala University, The Ångström Laboratory
Box 534, 751 21 Uppsala, Sweden,
e-mail: nursedayildirim@iyte.edu.tr
e-mail: bahri.uzunoglu@angstrom.uu.se
e-mail: bahriuzunoglu@computationalrenewables.com
Abstract—Power ramp estimation is of utmost importance for wind power plants and is the focus of this paper. Power ramps are caused by the intermittent supply of wind power generation. This is an important problem for the power system, which must keep load and generation in balance at all times; any imbalance leads to price volatility and grid security issues that can create power stability problems and, in turn, financial losses.
In this study, K-means clustering and association rules from the Apriori algorithm are implemented to analyze and predict wind power ramp occurrences based on 10-minute temporal SCADA power data from the records of the Ayyildiz wind farm. Power ramps are computed from this data. Five wind turbines were clustered based on temporal data, with no dissimilarity measure in space. The power ramp data are analyzed by the K-means algorithm to calculate cluster means and cluster labels. Association rule mining was then employed to analyze temporal ramp occurrences between wind turbines. The impact of each turbine on the other turbines was analyzed as a separate transaction at each time step. Operational rules based on these transactions are discovered by the Apriori association rule algorithm for operation room decision making. The discovered association rules can support the decision making of the power system operator.
Index Terms—Data mining, big data, power ramp, clustering,
association rules, apriori algorithm
I. INTRODUCTION
Power ramp estimation is very important for wind power plants and is the focus of this paper. Due to the intermittency of wind speed, the power level can vary stochastically. An abrupt, volatile change in the power level is defined as a ramp event. The power ramp rate (PRR), the instantaneous change in power level, is given by the gradient of the power production, and ramp events are identified from anomalies in this first derivative. Power ramp up and ramp down refer to positive and negative ramps respectively; negative ramps affect power system security and contingency and also the reserve electricity market, with financial consequences [1] [2]. Power production is always positive or zero; however, because power can change in either direction, the PRR can be negative or positive. A higher absolute value of the PRR indicates a faster power surge (or drop).
Power ramp rate applications can employ available historical data, such as SCADA and meteorological mast data, to create forecasts and prediction models; some of this data is collected because industry regulations make it mandatory. In these massive databases, knowledge discovery problems can be addressed with data mining methods. Grid service infrastructure has been one application of data mining methodology [3]. In the analysis of power ramp events, data is available with physical, spatial and temporal parameters. Large, high-frequency data sets in particular can be exploited with the associative prediction rules of data mining that are developed in this paper for decision making in the operation room. To understand the relationships between ramp occurrences, the location of each turbine was investigated previously by the authors [4] [5] based on spatial and temporal effects. This work extends that analysis to associative prediction rules with physical and temporal parameters, which are in turn extended to a decision making algorithm for the power system operation room.
A temporal physical parameter space study by Kamath [6] demonstrated that some atmospheric physical parameters taken from historical SCADA data are more important than others for ramp occurrences. At the end of that study, the author derived a set of helper keys for control room operators. Control room operators face two main ramp-related issues. Positive ramps occur when wind power production increases unexpectedly because of a rising trend in wind speed over a short period; this can lead to financial losses and imbalance in the grid, so operators must balance the load by decreasing the production of other power plants. For negative ramps, operators should have enough backup power. In another study [7], an Apriori algorithm was employed to correct predicted wind speed values for the Hexi Corridor area of China. In that study, meteorological variables that affect predicted wind speeds, such as temperature, pressure and humidity, were clustered according to rules between these parameters. The cluster means of each group (four main sets, matching the number of meteorological towers) were found to decrease the prediction error [7]. Association rules from the Apriori algorithm have also been applied in other areas of the wind power industry, such as alarm data cleaning, operation and maintenance, and fault detection from alarm data [8] [9].
The methodology, including the K-means algorithm and the definition of the association rule mining procedure with support, lift and rules, is summarized in Section II. The input database and parameters are detailed in Section III. Findings and the list of rules are given in Section IV. The discussion is presented in Section V.
II. ASSOCIATION RULES FOR CLUSTERING ALGORITHMS
Clustering algorithms optimize an objective function $F$. The function value changes with the partition of the data into $K$ clusters $C_1, C_2, C_3, \ldots, C_K$:

$$F : Q_K(\Omega) \rightarrow \mathbb{R} \qquad (1)$$

Herein $Q_K(\Omega)$ is the set of all partitions of the data $\Omega = \{w_1, w_2, w_3, \ldots, w_m\}$ into $K$ non-empty clusters.
A. Clustering algorithm - K-means
The K-means algorithm finds a locally optimal solution of the clustering criterion $F$, which depends on the sum of the distances between each element and its nearest cluster center (centroid) [10]. It can be formulated as follows, where $K$ is the number of clusters, $K_i$ is the number of objects in cluster $i$, $w_{ij}$ is the $j$th object of the $i$th cluster and $\bar{w}_i$ is the centroid of cluster $i$. The objective function $F$ is defined over the partitions of the data into clusters $C_1, C_2, C_3, \ldots, C_K$:

$$F : Q_K(\Omega) \rightarrow \mathbb{R} \qquad (2)$$

where $Q_K(\Omega)$ is the set of all partitions of the data $\Omega = \{w_1, w_2, w_3, \ldots, w_m\}$ into $K$ non-empty clusters. The objective function $F$ can be expanded as [10]

$$F(\{C_1, \ldots, C_K\}) = \sum_{i=1}^{K} \sum_{j=1}^{K_i} \lVert w_{ij} - \bar{w}_i \rVert \qquad (3)$$

$$\bar{w}_i = \frac{1}{K_i} \sum_{j=1}^{K_i} w_{ij}, \qquad i = 1, \ldots, K.$$
The conventional K-means algorithm pseudo code [11] can be summarized as follows:
1) Decide the optimal cluster size (number of clusters). A suitable cluster size gives a meaningful partition without losing information in overly large clusters or creating too many small clusters. Various approaches to this problem exist. One is an entropy calculation, which is not employed in this study [12]. Another is the calculation and optimization of indexes, which are discussed in [12]. In this study the NbClust package in R is employed to perform this optimization task, based on several indexes discussed in [12]. The final cluster size is chosen by majority rule, that is, the most frequent result among 30 different cluster size decision methods [12].
2) Initialize the partition $(C_1, C_2, \ldots, C_K)$ of the database via the Forgy algorithm [11].
3) Compute the centroid of each cluster.
4) Reassign each $w_i$ to the closest cluster centroid.
5) Recalculate the centroid $\bar{w}_i$ of each cluster.
6) Repeat until no cluster membership changes during a complete iteration, then stop.
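Steps 2 to 6 above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation (the study itself used R): the function names `kmeans_forgy` and `objective` are hypothetical, NumPy is assumed to be available, and Forgy initialization is interpreted here as choosing K distinct observations as the initial centroids.

```python
import numpy as np


def kmeans_forgy(data, n_clusters, max_iter=100, seed=0):
    """Conventional K-means with Forgy initialization (illustrative sketch).

    data: array of shape (n_samples, n_features).
    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Forgy initialization (assumed form): K distinct observations become centroids.
    start = rng.choice(len(data), size=n_clusters, replace=False)
    centroids = data[start].copy()
    labels = np.full(len(data), -1)

    for _ in range(max_iter):
        # Assign every observation to its closest centroid.
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop when a complete pass changes no cluster membership.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for i in range(n_clusters):
            members = data[labels == i]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[i] = members.mean(axis=0)
    return labels, centroids


def objective(data, labels, centroids):
    """Sum of distances between each element and its cluster centroid, as in (3)."""
    data = np.asarray(data, dtype=float)
    return float(np.linalg.norm(data - centroids[labels], axis=1).sum())
```

Because the result depends on the random initial centroids, the algorithm only reaches a local optimum of (3); running it with several seeds and keeping the partition with the smallest `objective` value is a common safeguard.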
In the next section, we introduce the Apriori algorithm, which postprocesses the cluster labels obtained for each turbine to generate association rules for operational decisions.
B. Association rules - Apriori algorithm
To illustrate the three-layer process of the Apriori algorithm, let us define an array $P_{t,1\ldots15} = \{C_1^{t,1\ldots5} \ldots C_3^{t,1\ldots5}\}$ that forms the rows of the matrix below. Each single entry $C_k^{t,l}$ in the row for time step $t$ is a binary attribute called an item, where $t$ represents the time step, $l$ represents the turbine number and $k$ represents the cluster label. There are therefore always 15 items in this example, and the item set of 15 is denoted by $I$.

$$P_{1\ldots N,1\ldots15} = \begin{array}{c|ccc} & k=1 & k=2 & k=3 \\ \hline t=1 & C_1^{1,1\ldots5} & C_2^{1,1\ldots5} & C_3^{1,1\ldots5} \\ \vdots & \vdots & \vdots & \vdots \\ t=N & C_1^{N,1\ldots5} & C_2^{N,1\ldots5} & C_3^{N,1\ldots5} \end{array}$$
Each row $P_{t,1\ldots15}$ is a transaction, so there are as many transactions as time steps. Each transaction holds, in binary form, the K-means cluster labels of the ramps of each turbine at that time step. Each row entry that represents a new transaction contains subsets of items such as $X$ and $Y$, both of size less than 15, where $X, Y \subset I$ and $X \cap Y = \emptyset$. As a result, a directional association rule can be defined as $X \rightarrow Y$. Association rules are defined for each transaction based on its original row entry, so each original row entry defines a new association rule. However, this creates a large number of association rules, so further filtering of the associations is required.
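The encoding of cluster labels into transactions can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `labels_to_transactions` is hypothetical, item names follow the "PRR<turbine>=<label>" pattern used in Fig. 4, and the example rows are the five cluster-label rows listed in Fig. 3b.

```python
def labels_to_transactions(label_rows, n_clusters=3):
    """Turn per-time-step cluster labels into item sets and binary rows.

    label_rows: one list per time step holding the cluster label
    (1..n_clusters) of each turbine.
    """
    n_turbines = len(label_rows[0])
    # Column order: all turbines for k=1, then k=2, then k=3
    # (15 items for five turbines and three clusters).
    items = [f"PRR{turbine}={k}"
             for k in range(1, n_clusters + 1)
             for turbine in range(1, n_turbines + 1)]
    # One transaction (item set) per time step.
    transactions = [
        frozenset(f"PRR{turbine}={label}" for turbine, label in enumerate(row, start=1))
        for row in label_rows
    ]
    # Binary form: 1 where the item is present in the transaction.
    binary = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, transactions, binary


# Cluster labels of the five transactions listed in Fig. 3b.
fig3b_labels = [
    [2, 2, 2, 2, 2],
    [2, 3, 2, 3, 2],
    [2, 3, 3, 3, 3],
    [3, 3, 3, 3, 3],
    [3, 3, 3, 2, 3],
]
items, transactions, binary = labels_to_transactions(fig3b_labels)
```

Each binary row contains exactly five ones, one per turbine, because every turbine carries exactly one cluster label at each time step.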
Support is a measure that, together with a user-defined threshold, filters out candidate association rules that occur too rarely to be significant for decisions. It is computed by the function supp(), which counts the frequency of occurrence of an association rule. In the context of association rules, support is the number of occurrences of the association rule over all rows. Mathematically, the support count $\sigma(X \cup Y)$ of a rule with itemsets $X$ and $Y$ can be stated as [13]

$$\sigma(X \cup Y) = |\{P_{t,1\ldots15} \mid X \cup Y \subseteq P_{t,1\ldots15},\ P_{t,1\ldots15} \in P_{1\ldots N,1\ldots15}\}|$$

where the symbol $|\cdot|$ denotes the number of elements in a set. If $N$ is the total number of transactions (time steps in our case), the support of $X \rightarrow Y$ is defined as

$$\mathrm{supp}(X \rightarrow Y) = \mathrm{supp}(X \cup Y) = \frac{\sigma(X \cup Y)}{N}. \qquad (4)$$
Confidence is the ratio of the support of a candidate association rule, $\mathrm{supp}(X \cup Y)$, to the support of its left-hand item subset, $\mathrm{supp}(X)$. It is defined by the function conf():

$$\mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}. \qquad (5)$$
The difference between confidence and support must be highlighted again: confidence measures the strength of an association rule, whereas support indicates its statistical significance. Filtering by a support threshold is an important tool for deciding whether a rule is worth consideration or should be eliminated [14].
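Lift is another criterion, denoted by the function lift(), defined as the ratio of the confidence to the support of the right-hand item subset. It is used to strengthen the filter when there are several association rules:

$$\mathrm{lift}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)} = \frac{\mathrm{conf}(X \rightarrow Y)}{\mathrm{supp}(Y)}. \qquad (6)$$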
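The three measures in (4) to (6) can be computed directly from a list of transactions. The sketch below is illustrative only: the function names are hypothetical and the toy transactions are made-up values, not the Ayyildiz data.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`, as in (4)."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """supp(X ∪ Y) / supp(X), as in (5). Assumes supp(X) > 0."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """conf(X → Y) / supp(Y), as in (6). Assumes supp(Y) > 0."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Toy transactions (hypothetical values, not the Ayyildiz data).
toy = [
    frozenset({"PRR2=2", "PRR4=2", "PRR5=2"}),
    frozenset({"PRR2=2", "PRR4=2", "PRR5=2"}),
    frozenset({"PRR2=2", "PRR4=3", "PRR5=3"}),
    frozenset({"PRR2=3", "PRR4=3", "PRR5=3"}),
]
X, Y = {"PRR2=2", "PRR4=2"}, {"PRR5=2"}
print(support(X | Y, toy), confidence(X, Y, toy), lift(X, Y, toy))
```

In this toy set the rule holds in two of four transactions, so its support is 0.5; every transaction containing X also contains Y, so its confidence is 1.0; and since Y alone appears in half of the transactions, the lift is 2.0.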
After the generation of item sets, a counting phase identifies the frequency of item groups in each item set class. For an ideally strong relationship between X and Y, both support and confidence must be high. Larger lift values indicate stronger associations between items (stronger rules). From the counted occurrences of each candidate, its support value is calculated as the ratio of the number of candidate repetitions to the size of the observation set. In the final step, the support value of each candidate is compared with a pre-defined support threshold, such as 50%. Candidates that meet or exceed this threshold are selected.
The data mining and rule discovery methodologies discussed here are by nature dependent on the type of input transaction data. A transaction is an indivisible unit of activity that represents a change in the database (DB). Each transaction is a binary set of items. For the association rule process, the input was transaction data with a time dimension, so the number of time steps is the number of transactions in this example. Other objects, such as physical variables or spatial or temporal dimensions, can also be employed. Based on the transaction data, the association rule process, specifically the Apriori algorithm, computes the frequent object groups of the transactions in the database through numerous iterations [13].
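The iterative generate, count and filter procedure described above can be sketched as a level-wise search. This is a simplified illustration of the general Apriori idea, not the implementation used in the study; the function name `apriori_frequent_itemsets` and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search for itemsets whose support reaches `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: supp(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    size = 2
    while current:
        # Join step: unions of frequent (size-1)-itemsets with exactly `size` items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        # Counting step: keep candidates that meet or exceed the threshold.
        current = {c: supp(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        size += 1
    return frequent


# Toy transactions (hypothetical values, not the Ayyildiz data).
toy = [
    {"PRR2=2", "PRR4=2", "PRR5=2"},
    {"PRR2=2", "PRR4=2", "PRR5=2"},
    {"PRR2=2", "PRR4=3", "PRR5=3"},
    {"PRR2=3", "PRR4=3", "PRR5=3"},
]
frequent = apriori_frequent_itemsets(toy, min_support=0.5)
```

The prune step relies on the fact that an itemset cannot be frequent unless all of its subsets are frequent, which is what keeps the number of counted candidates manageable on large SCADA-derived transaction sets.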
III. CASE STUDY - AYYILDIZ
Ayyildiz is a wind farm with five VESTAS V90-3.0 MW turbines at 80 m. The wind farm is located in the town of Bandirma, close to the Sea of Marmara, as seen in Figure 1, with open sea in the west and east directions. The main wind direction at this wind farm is from north to north-east, as seen in Figure 2b. Power production values of the five turbines were recorded at 10-minute intervals throughout 2013. The 2013 power production values are used for the PRR analysis, and the data was scaled for unit conversion as necessary [11].
Fig. 1: Ayyildiz wind farm location details. (a) Location of the wind farm site province in Turkey. (b) Location of the wind farm site city. (c) Location of the wind farm site.
SCADA data was provided by TUBITAK (Scientific and Technological Research Council of Turkey). Power ramp rate (PRR) values are calculated from the SCADA power records as the ratios given in [15], which are equivalent to discrete derivatives of power:

$$PRR = \frac{|Power(t+10) - Power(t)|}{10} \qquad (7)$$
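A minimal sketch of this calculation for one turbine is given below, assuming NumPy and a power series sampled every 10 minutes. The function name `power_ramp_rate` and the readings are hypothetical. Equation (7) takes the absolute value, while the text also speaks of positive and negative ramps, so the sketch exposes a `signed` option as an assumption rather than as the authors' procedure.

```python
import numpy as np


def power_ramp_rate(power, interval_minutes=10.0, signed=False):
    """Discrete derivative of a power series sampled every `interval_minutes`.

    With signed=False this follows (7) and returns the magnitude of the change;
    signed=True keeps the direction (ramp up > 0, ramp down < 0).
    """
    power = np.asarray(power, dtype=float)
    ramp = np.diff(power) / interval_minutes
    return ramp if signed else np.abs(ramp)


# Hypothetical 10-minute power readings for one turbine (units as recorded).
readings = [1200.0, 900.0, 1500.0, 1500.0]
print(power_ramp_rate(readings))               # magnitudes, as in (7)
print(power_ramp_rate(readings, signed=True))  # signed ramps
```

A series of n readings yields n - 1 ramp values, one per pair of consecutive time steps.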
Fig. 2: Ayyildiz wind farm details. (a) Wind farm layout with objects; triangles are turbines, the circle is the met mast. The legend shows the height. (b) Wind rose of the wind farm.
The PRR values are the inputs to the K-means algorithm.
IV. RESULTS AND RULES
The NbClust package of R is employed to determine the optimal number of clusters [12]. The maximum cluster size was set to three throughout the analysis. The clustering results are illustrated in Figure 3. Figure 3a shows the clusters in temporal scales on the two main principal component axes of the covariance matrix of the data. Herein the principal component analysis (PCA) results are used only to define the two main directions for a two-dimensional visualization of the cluster data. The clustering analysis itself is executed via the K-means algorithm defined in the theory section. The relationship and similarities between PCA and K-means at the derivation level have also been exploited at the algorithmic level, using PCA for model reduction of K-means [16]. This is not the approach taken here; the PCA principal directions, namely dc1 and dc2, are used in Figure 3a for visualization purposes only. The visualization in Figure 3a in component direction 1 (dc1) and direction 2 (dc2) clearly shows the clustering of the temporal PRR data around three clusters. The results are also presented in Figure 3b as PRR values with their cluster labels.

Time           PRR of turbine                    Cluster label of turbine
(Transaction)  1      2      3      4      5     1   2   3   4   5
1              -336   -12    -108   -96    -204  2   2   2   2   2
2              312    -288   96     -192   -84   2   3   2   3   2
3              -96    384    372    336    534   2   3   3   3   3
4              216    120    120    336    -102  3   3   3   3   3
5              -288   -48    -120   36     -90   3   3   3   2   3

Fig. 3: Clustering results. (a) K-means cluster results for PRR (principal component analysis based illustration). (b) K-means clustering results for cluster size k=3.
The results of the association rules from the Apriori algorithm are presented in Figure 4. Table 4b summarizes some of the rules $X \rightarrow Y$. The strongest rule by lift is rule 1,
$X \rightarrow Y: \{PRR2=2, PRR4=2\} \rightarrow \{PRR5=2\}$,
with a confidence of 0.8655 and a lift of 4.7215 in Table 4b; rule 3 has a higher confidence (0.95069) but a lift close to one (1.1980). Rule 1 is also shown visually in Figure 4a, where the left-hand side (LHS), $X$, is on the columns of the grid. Consider the example $X = \{PRR2=2, PRR4=2\}$, which is the first entry in the top left corner. Its label reads "2 (PRR2=2 +2)". Read from left to right, the leading index 2 shows that this LHS was represented two times in different association rules. The first entry in parentheses, PRR2=2, means that for the ramp of the second turbine, PRR2, the cluster label was 2. The last entry denotes that there was one more item similar to the first one. The rows denote the right-hand side (RHS), $Y$; for this rule, $Y = \{PRR5=2\}$. This links Table 4b to Figure 4a. The same relations can be observed for the other rules.
V. DISCUSSION AND CONCLUSION
The proposed method can be applied to general cases and is not site specific; one example wind farm was selected here for study. SCADA data recorded at 10-minute intervals can generate large databases for wind farms over several years.
Rule    Rule (X → Y)                                Support  Confidence  Lift
rule 1  PRR2=2, PRR4=2 → PRR5=2                     0.1192   0.8655      4.7215
rule 2  PRR5=2 → PRR2=2                             0.1305   0.7118      3.6663
rule 3  PRR1=3, PRR2=3, PRR4=3, PRR5=3 → PRR3=3     0.6149   0.95069     1.1980

Fig. 4: Some of the association rules. (a) Association rules X → Y mapping figure. (b) Association rules X → Y mapping table.
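The values in Table 4b can be cross-checked against definitions (5) and (6). The sketch below is an illustrative consistency check only, with hypothetical helper names; it uses nothing beyond the numbers printed in the table. Since $\{PRR5=2\}$ is the right-hand side of rule 1 and the left-hand side of rule 2, the support of that single item can be recovered from either rule, and the two estimates should agree.

```python
# Values copied from the rules table in Fig. 4b.
rules = {
    "rule 1": {"support": 0.1192, "confidence": 0.8655, "lift": 4.7215},
    "rule 2": {"support": 0.1305, "confidence": 0.7118, "lift": 3.6663},
    "rule 3": {"support": 0.6149, "confidence": 0.95069, "lift": 1.1980},
}


def implied_lhs_support(rule):
    """supp(X) = supp(X ∪ Y) / conf(X → Y), rearranged from (5)."""
    return rule["support"] / rule["confidence"]


def implied_rhs_support(rule):
    """supp(Y) = conf(X → Y) / lift(X → Y), rearranged from (6)."""
    return rule["confidence"] / rule["lift"]


# {PRR5=2} is the RHS of rule 1 and the LHS of rule 2.
supp_prr5_from_rule1 = implied_rhs_support(rules["rule 1"])
supp_prr5_from_rule2 = implied_lhs_support(rules["rule 2"])
print(round(supp_prr5_from_rule1, 4), round(supp_prr5_from_rule2, 4))
```

Both estimates come out near 0.183, so the reported support, confidence and lift values of rules 1 and 2 are mutually consistent up to rounding.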
Herein the authors generated a set of operational rules for ramp occurrences in the Ayyildiz wind farm based on the clustering and association rule algorithms of data mining. First, the wind turbines were clustered based on their power ramp rates from SCADA data, using K-means clustering. These clusters were then passed to the Apriori association rule algorithm, from which new operational decision rules could be derived for the wind farm operator. The rules provide operational information that can be exploited in the operator's decision making.
The approach employed here could also help improve the optimization of wind farm layout in repowering decisions. One example rule indicated that if a positive ramp occurs in turbine five, a positive ramp can also occur in turbine two, which can help predict temporal ramps at different spatial locations.
Clustering algorithms and machine learning applications could discover hidden rules between complex data parameters in the big data of these wind farms for operational decision making.
Acknowledgements
We would like to acknowledge the financial support and the
data provided by TUBITAK and Computational Renewables
LLC for the duration of this study.
REFERENCES
[1] B. Uzunoglu and D. Bayazit, “A generic resampling particle filter joint
parameter estimation for electricity prices with jump diffusion,” in
European Energy Market (EEM), 2013 10th International Conference
on the. IEEE, 2013, pp. 1–7.
[2] M. A. Ülker, “Balancing of wind power: Optimization of power systems
which include wind power systems,” 2011.
[3] C. Aflori and M. Craus, “Grid implementation of the apriori algorithm,”
Advances in Engineering Software, vol. 38, no. 5, pp. 295–300, 2007.
[4] B. Uzunoglu and A. Albayrak, “Data mining of wind data generated
by cfd solutions,” in CFD and Optimization. ECCOMAS Antalya
TURKEY, 2011.
[5] N. Yildirim and B. Uzunoglu, “Spatial clustering for temporal power
ramp balance and wind power estimation,” in Greentech. IEEE, 2015.
[6] C. Kamath, “Associating weather conditions with ramp events in
wind power generation,” in Power Systems Conference and Exposition
(PSCE), 2011 IEEE/PES. IEEE, 2011, pp. 1–8.
[7] Z. Guo, D. Chi, J. Wu, and W. Zhang, “A new wind speed forecasting
strategy based on the chaotic time series modelling technique and the
apriori algorithm,” Energy Conversion and Management, vol. 84, pp.
140–151, 2014.
[8] C. Tong and P. Guo, “Data mining with improved apriori algorithm
on wind generator alarm data,” in Control and Decision Conference
(CCDC), 2013 25th Chinese. IEEE, 2013, pp. 1936–1941.
[9] A. Kusiak and A. Verma, “Prediction of status patterns of wind turbines:
A data-mining approach,” Journal of Solar Energy Engineering, vol.
133, no. 1, p. 011008, 2011.
[10] J. M. Pena, J. A. Lozano, and P. Larranaga, “An empirical comparison of
four initialization methods for the k-means algorithm,” Pattern recognition
letters, vol. 20, no. 10, pp. 1027–1040, 1999.
[11] G. Gan, C. Ma, and J. Wu, Data clustering: theory, algorithms, and
applications. Siam, 2007, vol. 20.
[12] M. Charrad, N. Ghazzali, V. Boiteau, and A. Niknafs, “Nbclust: An r
package for determining the relevant number of clusters in a data set,”
Journal of Statistical Software, vol. 61, no. 6, pp. 1–36, 2014.
[13] P.-N. Tan and V. Kumar, “Chapter 6. association analysis: Basic concepts
and algorithms,” Introduction to Data Mining. Addison-Wesley. ISBN,
vol. 321321367, 2005.
[14] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules
between sets of items in large databases,” in ACM SIGMOD Record,
vol. 22, no. 2. ACM, 1993, pp. 207–216.
[15] A. Kusiak and H. Zheng, “Data mining for prediction of wind farm
power ramp rates,” in Sustainable Energy Technologies, 2008. ICSET
2008. IEEE International Conference on. IEEE, 2008, pp. 1099–1103.
[16] C. Ding and X. He, “K-means clustering via principal component
analysis,” in Proceedings of the twenty-first international conference on
Machine learning. ACM, 2004, p. 29.