ScienceDirect

Available online at www.sciencedirect.com

Procedia Computer Science 156 (2019) 357–366

1877-0509 © 2019 The Authors. Published by Elsevier Ltd.

This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Peer-review under responsibility of the scientiﬁc committee of the 8th International Young Scientist Conference on Computational Science.

10.1016/j.procs.2019.08.212

www.elsevier.com/locate/procedia

8th International Young Scientist Conference on Computational Science

Pattern Recognition in Non-Stationary Environmental Time Series

Using Sparse Regression

Irina Deeva^a,∗, Nikolay O. Nikitin^a, Anna V. Kaluyzhnaya^a

^a ITMO University, 49 Kronverksky Pr., St. Petersburg, 197101, Russian Federation

Abstract

Various real-world environmental management tasks require hindcasts and forecasts of natural events (wind, ocean waves and currents, sea ice, etc.) obtained with data-driven techniques for metocean process simulation. The models can be fitted individually to specific fragments of the non-stationary multivariate time series to reproduce a metocean environment with the desired characteristics. In this paper, an approach based on LASSO regularised regression is proposed for environmental time series clustering. It allows identifying situations with a specific interaction between variables, which can be interpreted through the values of the regression coefficients. A weather generator was used to produce synthetic time series similar to both the general dataset and the identified clusters. The obtained results can be used to improve the identification and interpretation of computationally lightweight environmental models.

Keywords: data-driven models; metocean simulation; time series clustering; synthetic data; pattern mining

1. Introduction

Current offshore development tasks raise the problem of structural optimisation for nearshore constructions. One of the main external concerns for this task is the set of environmental factors [7], especially extreme events and univariate and multivariate anomalies in the atmospheric, hydrodynamic or wave processes [22].

To obtain a more reliable solution and decrease the environmental risks, a long-term climate dataset for metocean variables at specific points is required. It can be obtained from a historical reanalysis: global (for example, the ERA Interim reanalysis [6] for 1979-2019) or regional (for example, the pan-Arctic reanalysis GLORYS2v4 [10] for 1992-2015). However, their spatio-temporal coverage and resolution are often insufficient for a correct representation of the processes in small-scale domains [19]. Satellite-based observations have many similar restrictions.

∗Corresponding author. Tel.: +7-928-292-0889

E-mail address: iriny.deeva@gmail.com

A possible solution to this problem is long-term simulation using regional high-resolution configurations of coupled numerical metocean models. However, these models are computationally very expensive and often require an expert to tune the configuration for specific regional conditions. Another problem of the simulated data is the rare occurrence of some events that should be taken into account during the optimisation of coastal and nearshore constructions.

A promising solution is to build an artificial environment for the structure being optimised using synthetic environmental datasets with the desired characteristics. Various data-driven models can be applied to generate the dataset and preserve the physical consistency between the variables (for example, sea surface height, wave height, wind speed, etc.). However, the quality of a data-driven model can be increased by segmenting the time series into different metocean events (storms, calm periods, swells).

This approach is widely applied to reproduce metocean events with specific properties (for example, flood-causing cyclones [28]). The segmentation can be considered an unsupervised learning task. The labelled dataset it produces can then be used to train a classification model for metocean events.

In this paper, regression model-based clustering is proposed to solve this task. Special attention is paid to the interpretability of the obtained results. This is important because the data generation process should be understandable to the domain expert to ensure the correctness of the dataset.

The Kara Sea domain was used as a case study. To generate the reference long-term dataset, an offline-coupled system of pan-Arctic ocean, ice, atmospheric and wave models was configured and executed on the Lomonosov-2 supercomputer. We conducted a set of experiments to compare different approaches to sub-model identification.

This paper is structured as follows. Sec. 2 describes the problem statement and the mathematical formalisation of the synthetic data generation task. Sec. 3 provides an overview of existing work in the field of meteorological time series clustering and pattern recognition. Sec. 4 describes the metocean datasets and the methods that were used for clustering and searching for patterns in the source data. Sec. 5 presents the conducted experiments and their analysis. Finally, Sec. 6 summarises the paper and generalises the obtained experimental results.

2. Problem statement

The initial problem in using data-driven models for environmental tasks is the multidimensional relation among many processes. An additional problem is the non-stationarity of the processes and, in the general case, the lack of information about the type of non-stationarity. Under such limitations, any prior assumption about the model of the process is very uncertain. This uncertainty affects the choice of appropriate mathematical methods and makes the resulting conclusions biased (e.g. when methods for Gaussian processes are used although the target process is far from Gaussian). To overcome this issue, it seems reasonable to take a first step that identifies intervals with homogeneous statistical characteristics and splits the initial data into them. For environmental data, homogeneous statistical characteristics are usually associated with certain weather patterns. Such homogeneous chunks are more suitable for identifying the structure of a prediction model. In this article, we investigate the possibility of identifying patterns with a homogeneous structure of the approximation function. Sparse polynomial regression with L1 regularisation (LASSO) was chosen as the class of approximation functions. This choice is motivated by the ability of curvilinear regression to approximate some kinds of non-linearity and by the possibility of discovering its structure by neglecting the insignificant terms.

The fitness function with the L1 penalty for the LASSO model can be written as follows:

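$$\min_{\beta \in \mathbb{R}^p} \left\{ (Y - X\beta)^T (Y - X\beta) + \lambda \|\beta\|_1 \right\}, \quad \text{where } \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|. \tag{1}$$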

In formulation (1), X is the matrix of predictors (including curvilinear terms), Y is the response vector (the predicted variable), and λ is the degree of penalisation. As a proof of concept, Fig. 1 presents an example of the visible separation of the LASSO coefficients into patterns and their alignment with different environmental situations.
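To make formulation (1) concrete, the following minimal sketch minimises exactly that objective with cyclic coordinate descent in plain Python. It is an illustration under stated assumptions, not the solver used in the paper: the function names, the synthetic data and the value of λ are all hypothetical, and the curvilinear terms are represented by a single squared predictor.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(X, y, beta, lam):
    """Value of Eq. (1): squared residual norm plus lam times the L1 norm of beta."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for row, yi in zip(X, y)]
    return sum(r * r for r in residuals) + lam * sum(abs(b) for b in beta)


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise Eq. (1) by cyclic coordinate descent (illustrative solver)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores beta[j].
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            # The factor 1/2 comes from differentiating the unscaled squared loss.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


# Synthetic stand-in data (an assumption): y depends on x and x^2,
# while the third predictor is unrelated to the response.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    unrelated = rng.uniform(-1.0, 1.0)
    X.append([x, x * x, unrelated])
    y.append(2.0 * x + 1.5 * x * x + rng.gauss(0.0, 0.05))

beta_hat = lasso_coordinate_descent(X, y, lam=10.0)
print(beta_hat)
```

In this toy setting the coefficient of the unrelated predictor should be driven exactly to zero while the other two are only slightly shrunk, which is the sparsity property that the clustering in the coefficient space relies on.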

Fig. 1. An example of the interaction between a wave-breaker and the metocean environment. The red rectangles highlight the coefficients for the open water period (without ice coverage); the small blue rectangles inside them highlight the anomalously negative values of the regression coefficients that indicate the existence of a specific pattern.

3. Related work

The clustering of environmental data is a special case of multivariate non-stationary time series clustering [13]. The clustering and classification of time series is a challenging problem that is widely studied but still has no universal solution [1]. Weather data form multidimensional time series, which are clustered with dedicated methods. For example, in [12] the application of the Finite Element Method is considered in the context of time series analysis. The approach allows the identification of hidden regimes in time series characterised by a set of model-specific parameters and a model distance functional. Other approaches can also be applied, such as deep neural networks [9] for classification, and the autoregressive metric [5] and the dynamic time warping metric [17] for clustering.

The common approach to the interpretation of multivariate time series and the clusters in them relies on dimensionality reduction and visualisation techniques [26]. However, the internal structure of some types of data-driven models (regression-based models, tree-based models) can also be interpreted directly [21]. For this reason, the model-based approach to time series clustering can be chosen as an appropriate method according to several studies [11,34,4]. Dimensionality reduction methods can also be applied to the parameters of weather models [15], [14].

A separate important issue is determining the number of clusters for the clustering of weather time series. Since the number of hidden modes (clusters) in a time series is not known in advance, a metric can be set to evaluate the quality of the identified clusters for different numbers of clusters [2].

There are also several approaches to the task of synthetic environment identification that can be applied to the time series in the identified clusters. The first one is based on the application of data-driven or statistical models for the restoration of the main variable from a set of other variables. Stochastic models [29], autoregressive models [3] and neural networks [30] are widely applied in this area. The main disadvantage of many methods is the insufficient quality of the reconstruction of rare events and the unreliable preservation of data consistency, whereas metocean data are characterised by a strong physical dependence between the different environmental processes represented as variables in a dataset.

The other approach is weather generators [40]. They allow creating consistent time series based on an existing environmental dataset. A promising approach is to identify the multi-year variability using ARIMA or wavelet-based models [36] and then apply Markov chain models [35] to reconstruct the daily variability (most weather generators do not represent hourly variability; however, a few techniques support hourly simulations too [25,33]).

The application of data-driven models to wind wave forecasting is also widely discussed in the literature. They can be applied both to forecasting [41] and to the extension of real observations with synthetic data [30]. Usually, wind variables shifted in time and space are used to reproduce the corresponding wave characteristics [20] at a point. Neural networks can also be applied to reproduce the whole wave field in a region [18]. To increase the quality of the simulation for a specific group of events, a problem-specific model can be applied instead of common data-driven wave models. This specific model can differ in structure or in the training dataset.

The described methods can be used to generate synthetic external data for various simulations, but the specific task of building an artificial environment requires a more flexible and interpretable approach. Based on the results of these studies, we decided to base the research on linear regression and build a proof-of-concept model for non-stationary metocean time series based on interpretable clustering.

4. Data and methods

4.1. Dataset overview

We chose the Arctic region as a case study for several reasons. First of all, the dynamics of the sea ice cover change the wind-wave interaction, which makes the reliable identification of a data-driven model more complicated. Also, the insufficient spatio-temporal coverage of the observations and reanalyses in polar regions makes the synthetic data generation task highly relevant. Finally, the climate changes of the last decades make it necessary to take long-term trends into account during data analysis.

The reference dataset was obtained using a system of state-of-the-art high-resolution computer models. It includes the NEMO ocean model [24], the LIM3 ice model [32], the WRF atmospheric model [31] and the Wave Watch III model [38]. The configuration was specifically adapted to the regional Arctic simulation as described in [16]. The time resolution of all models is 1 hour. The 30-year run was executed on the Lomonosov-2 supercomputer.

The final datasets contain 8 variables that describe wave-wind processes (Datetime, Wind speed, Wind direction, Sea surface height, Significant wave height, Wave peak period, Mean wave period, Ice concentration). As a case study, 18 points were chosen in a small area of the Kara Sea near the Ob Bay coast. Their locations and the sea depths at the points are presented in Fig. 2. To analyse the variability at different scales, we prepared daily- and yearly-averaged time series.

Fig. 2. The locations and indexes of the selected data points in the Kara Sea. The colormap represents the sea depth in each grid cell.

To obtain a synthetic dataset that is similar to the real (model-based) one, the stochastic weather generator was configured using the R package weathermen obtained from the repository [39]. It uses ARIMA and Markov chain models to generate a dataset of the desired length. The source code was modified to support various input and output variables (the reference package was strictly configured for a specific set of variables only).

4.2. Model-based clustering and pattern mining

Metocean multivariate time series are characterised by specific annual and inter-annual patterns that can be identified using statistical techniques [37]. The application of clustering methods also allows identifying local sea wave events (storms, calm periods, swells, ice and non-ice periods) even without manual pre-labelling. At the same time, the clustering results should be interpretable for the domain expert.

To reach these aims, we decided to build a data-driven model with the significant wave height (hs) as the target variable and a set of predictor variables that includes wind, ice and wave characteristics at the target and neighbouring points.

To reduce the dimension of the original data and select the most significant predictors, we used the LASSO method with the alpha parameter (the constant that multiplies the L1 term) equal to 0.1. It can be seen from Fig. 3 that in some cases the coefficients of the predictors turn to zero, which indicates the insignificance of this component of the model in those periods and can therefore be associated with some weather pattern.

Fig. 3. Coefficients of LASSO. In some cases the coefficients of the predictors turn to zero, which indicates the insignificance of this component of the model and can therefore be associated with some weather pattern.

The concept of the model-based clustering of time series is illustrated in Fig. 4a, which also demonstrates the chunk concept that was built into the algorithm as a sliding window of pre-defined size.

At first, all observations were divided into equal chunks. It should be noted that the quality of clustering depended on the size of the chunk. The metric used was the adjusted Rand score, which computes a similarity measure between two clusterings by considering all pairs of samples and counting the pairs that are assigned to the same or to different clusters in the predicted and real clusterings. During the experiments, it was found that the most efficient chunk size is 72 hours; the values of the metric for different chunk sizes can be seen in Fig. 4b. The architecture of the proposed solution is presented in Fig. 5. The density estimations for the distribution of the coefficients in different environmental cases are presented in Fig. 6.
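The pair-counting metric described above can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard adjusted Rand index; the function name and the toy label lists are assumptions, and the paper does not state which implementation or which reference labels were used when comparing chunk sizes.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare

    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Pairs of samples grouped together in both labelings, in the
    # reference labeling only, and in the predicted labeling only.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Agreement expected by chance, and the largest attainable agreement.
    expected = together_true * together_pred / comb(n, 2)
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels for six chunks.
reference = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]  # same partition, different label names
partial = [0, 0, 1, 1, 2, 2]     # only partly agrees with the reference

score_same = adjusted_rand_index(reference, relabelled)
score_partial = adjusted_rand_index(reference, partial)
print(score_same, score_partial)
```

The score does not depend on how the clusters are named, so it can compare the cluster labels obtained for a candidate chunk size with a set of reference labels; a chunk size could then be chosen as the one with the highest score.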

Fig. 4. The visualisation of the model-based clustering approach. The upper plot contains the time series from the dataset, the lower plot contains the regression coefficients identified for every subset.

Fig. 5. The proposed workflow for the interpretable generation of case-specific synthetic environmental datasets.

Fig. 6. The density estimation for the distribution of the regression coefficient values in different cases: storm and calm sea. The difference between the distributions supports the idea of coefficient-based clustering for the identification of specific environmental cases.

5. Experimental studies

Several experiments were conducted to examine the feasibility of regression-based time series clustering in different cases. The main aspects to be validated are the practical possibility of identifying interpretable clusters and the effectiveness of the model trained on the real dataset for the simulated synthetic time series.

5.1. Chunk-based clustering using k-means

This experiment was aimed at testing the possibility of clustering weather patterns in the parameter space of a linear regression built on the data at a single point. The wave height was taken as the response of the linear regression, and the three parameters selected by the LASSO were taken as predictors: wind speed, wave peak period and mean wave period.

Then, a linear regression model was trained on each chunk, and its coefficients were stored in chunk order. The resulting visualisation in the space of model coefficients, however, demonstrates that it is rather difficult to identify explicit patterns purely visually. Therefore, the obtained coefficients were divided using the K-means method. This separation revealed two clusters, which again may indicate the presence of two periods in the observations: ice and ice-free.
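The chunk-wise procedure can be sketched as follows: one least-squares fit per 72-hour chunk, followed by k-means on the resulting coefficient vectors. This is a minimal plain-Python illustration, not the code from the repository [27]. The data are made up (two artificial regimes with different coefficients, three independent random predictors standing in for wind speed and the two wave periods), and all function names, the deterministic k-means initialisation and the regime coefficients are assumptions.

```python
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve_linear_system(XtX, Xty)


def chunk_coefficients(X, y, chunk_size):
    """Fit one regression per non-overlapping chunk, in time order."""
    return [fit_ols(X[s:s + chunk_size], y[s:s + chunk_size])
            for s in range(0, len(y) - chunk_size + 1, chunk_size)]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centres = [points[0][:]]
    while len(centres) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(farthest[:])
    labels = [0] * len(points)
    for _ in range(n_iter):
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centres]
            labels[i] = dists.index(min(dists))
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up hourly data: ten 72-hour chunks for each of two regimes.
rng = random.Random(1)
CHUNK = 72
regimes = [[0.30, 0.10, 0.05]] * 10 + [[0.05, 0.00, 0.00]] * 10
X, y = [], []
for beta in regimes:
    for _ in range(CHUNK):
        row = [rng.uniform(0, 15), rng.uniform(2, 8), rng.uniform(2, 8)]
        X.append(row)
        y.append(sum(b * v for b, v in zip(beta, row)) + rng.gauss(0, 0.05))

coefs = chunk_coefficients(X, y, CHUNK)
labels = kmeans(coefs, k=2)
print(labels)
```

On this synthetic series the chunks of the first regime and those of the second regime should receive different labels, because their fitted coefficient vectors lie far apart. Real metocean chunks are less cleanly separated, which is consistent with the difficulty of identifying patterns visually noted above.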

5.2. Dimensionality reduction and density-based clustering

The second subset of experiments was aimed at checking the effectiveness of clustering under different ice conditions. The t-SNE [23] dimensionality reduction method was used to project the coefficient space to 2D. Then, the DBSCAN [8] algorithm was applied for the clustering, because it estimates not just the labels but also the number of clusters. The results of processing the full dataset are presented in Fig. 7a, where two clear clusters are highlighted.

Fig. 7. The results of tSNE clustering for a) full dataset b) ice-free period
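The density-based step can be illustrated with a minimal plain-Python DBSCAN applied to 2D points. The sketch below is an assumption-laden illustration rather than the implementation used in the experiments: the t-SNE projection is not reproduced, and a made-up set of 2D points (two dense groups and one isolated point) stands in for the embedded regression coefficients. The values of `eps` and `min_samples` are hypothetical.

```python
def region_query(points, index, eps):
    """Indices of all points within distance eps of points[index] (itself included)."""
    px, py = points[index]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps * eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point, with -1 marking noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


# Made-up 2D points standing in for the embedded coefficients:
# two dense 5x5 grids and one isolated point.
group_a = [(0.1 * ix, 0.1 * iy) for ix in range(5) for iy in range(5)]
group_b = [(5.0 + 0.1 * ix, 5.0 + 0.1 * iy) for ix in range(5) for iy in range(5)]
points = group_a + group_b + [(10.0, -10.0)]

labels = dbscan(points, eps=0.15, min_samples=4)
n_clusters = len(set(labels) - {-1})
print(n_clusters, labels[-1])
```

For these points the sketch should report two clusters and mark the isolated point as noise. The number of clusters is a result of the density parameters rather than an input, which is the property that motivated the choice of DBSCAN above.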

To demonstrate the interpretation of the clusters, the time series with cluster-specific markup are presented in Fig. 8a. However, it is still hard to identify the sub-clusters for the ice-free period. Therefore, the ice period (when the wave height equals zero) was removed from the time series and the data were clustered again. As can be seen in Fig. 7b, the density-based clustering still yields two clear groups. To interpret the clusters, the time series for 3 years were visualised again in Fig. 8b. It can be seen that the ice melting and freezing cases are covered by a smaller cluster in the reduced space, which allows separating them from both the ice-covered and the ice-free cases using only the obtained regression coefficients.

Fig. 8. The clustered time series: a) for the full dataset. The green colour represents the cluster for ice-free and freeze-melt periods, the red colour represents the cluster for the ice-covered sea; b) for the ice-free period in the dataset. The green colour represents the cluster for the ice-free period, the red colour represents freeze-melt periods.

5.3. Cluster-specific synthetic data generation

The next experiment consists of synthetic data generation based on one of the clusters. The modified weather generator described above was initialised with the ice melt-freeze cluster (green colour in Fig. 7b). Then, the data from the identified cluster were used as the training set for the generator. The comparison of the obtained synthetic data with the real data and the generic synthetic data is presented in Fig. 9.

Fig. 9. The estimation of the distribution density for environmental variables from the cluster-based synthetic data (red), real cluster data (blue) and

generic synthetic data (black).

It can be seen that the distribution of the variables in the common time series differs from both the cluster data and the synthetic cluster-based values. At the same time, the real and synthetic data for the cluster are similar enough.

6. Conclusion

In the paper, an analysis of different approaches to data-driven model identification and interpretation was presented. The time series for each spatial point were separated into chunks, and a LASSO regression was fitted in each chunk. The optimal chunk size of 72 hours was estimated using a set of experiments with the adjusted Rand index metric. Several interpretable clusters were recognised and analysed by comparing the distributions of the variables in the clusters. Then, synthetic time series were generated for the identified cluster.

The obtained results confirm the practical effectiveness of the clustering in the regression coefficient space described in the paper. The proposed approach to metocean time series clustering allows increasing the quality of synthetic data generation for specific cases.

All source code and materials used in the paper are available in the repository [27].

Acknowledgements

This research is financially supported by the Russian Science Foundation, Agreement #19-71-00150.

The research is carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.

References

[1] Aghabozorgi, S., Shirkhorshidi, A.S., Wah, T.Y., 2015. Time-series clustering–a decade review. Information Systems 53, 16–38.

[2] Andrew, C., Julie, K., L., 2012. Data clustering reveals climate impacts on local wind phenomena. Journal of Applied Meteorology and

Climatology 51, 1547–1557.

[3] Biller, B., Nelson, B.L., 2003. Modeling and generating multivariate time-series input processes using a vector autoregressive technique. ACM

Transactions on Modeling and Computer Simulation (TOMACS) 13, 211–237.

[4] Chamroukhi, F., Samé, A., Aknin, P., Govaert, G., 2011. Model-based clustering with hidden markov model regression for time series with regime changes, in: The 2011 International Joint Conference on Neural Networks, IEEE. pp. 2814–2821.

[5] Corduas, M., Piccolo, D., 2008. Time series clustering and classiﬁcation by the autoregressive metric. Computational statistics & data analysis

52, 1860–1872.

[6] Dee, D.P., Uppala, S., Simmons, A., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M., Balsamo, G., Bauer, d.P., et al., 2011.

The era-interim reanalysis: Conﬁguration and performance of the data assimilation system. Quarterly Journal of the royal meteorological

society 137, 553–597.

[7] Ehlers, S., Kujala, P., Veitch, B., Khan, F., Vanhatalo, J., 2014. Scenario based risk management for arctic shipping and operations, in:

ASME 2014 33rd International Conference on Ocean, Oﬀshore and Arctic Engineering, American Society of Mechanical Engineers. pp.

V010T07A006–V010T07A006.

[8] Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with

noise., in: Kdd, pp. 226–231.

[9] Fawaz, H.I., Forestier, G., Weber, J., Idoumghar, L., Muller, P.A., 2019. Deep learning for time series classiﬁcation: a review. Data Mining

and Knowledge Discovery , 1–47.

[10] Garric, G., Parent, L., Greiner, E., Drévillon, M., Hamon, M., Lellouche, J.M., Régnier, C., Desportes, C., Le Galloudec, O., Bricaud, C., et al., 2017. Performance and quality assessment of the global ocean eddy-permitting physical reanalysis glorys2v4, in: EGU General Assembly Conference Abstracts, p. 18776.

[11] Grun, B., 2018. Model-based clustering. arXiv preprint arXiv:1807.01987 .

[12] Horenko, I., 2010a. Finite element approach to clustering of multidimensional time series. SIAM J. Sci. Comput. 32, 62–83.

[13] Horenko, I., 2010b. On clustering of non-stationary meteorological time series. Dynamics of Atmospheres and Oceans 49, 164–187.

[14] Horenko, I., Johannes, S.E., Christof, S., 2006. Set-oriented dimension reduction: Localizing principal component analysis via hidden markov

models. Lecture Notes in Bioinformatics 4216, 98–115.

[15] Horenko, I., Rupert, K., Stamen, D., Christof, S., 2008. Automated generation of reduced stochastic weather models i: simultaneous dimension

and model reduction for time series analysis. SIAM Mult. Mod. Sim. 6, 1125–1145.

[16] Hvatov, A., Nikitin, N.O., Kalyuzhnaya, A.V., Kosukhin, S.S., 2018. Adaptation of nemo-lim3 model for multigrid high resolution arctic

simulation. arXiv preprint arXiv:1810.03657 .

[17] Izakian, H., Pedrycz, W., Jamal, I., 2015. Fuzzy clustering of time series data using dynamic time warping distance. Engineering Applications

of Artiﬁcial Intelligence 39, 235–244.

[18] James, S.C., Zhang, Y., O’Donncha, F., 2018. A machine learning framework to forecast wave conditions. Coastal Engineering 137, 1–10.

[19] Jianping, T., Ming, Z., Bingkai, S., 2007. The eﬀects of model resolution on the simulation of regional climate extreme events. Journal of

Meteorological Research 21, 129–140.

[20] Kambekar, A., Deo, M., 2010. Wave simulation and forecasting using wind time history and data-driven methods. Ships and Oﬀshore

Structures 5, 253–266.

[21] Lipton, Z.C., 2016. The mythos of model interpretability. arXiv preprint arXiv:1606.03490 .

[22] Lopatoukhin, L., Boukhanovsky, A., 2009. Multivariable extremes of metocean events (extreme and freak as the examples), in: EGU General Assembly Conference Abstracts, p. 2098.

[23] Maaten, L.v.d., Hinton, G., 2008. Visualizing data using t-sne. Journal of machine learning research 9, 2579–2605.

[24] Madec, G., et al., 2015. Nemo ocean engine.


[25] Mezghani, A., Hingray, B., 2009. A combined downscaling-disaggregation weather generator for stochastic generation of multisite hourly weather variables over complex terrain: Development and multi-scale validation for the upper rhone river basin. Journal of Hydrology 377, 245–260.

[26] Nguyen, M., Purushotham, S., To, H., Shahabi, C., 2017. m-tsne: A framework for visualizing high-dimensional multivariate time series. arXiv preprint arXiv:1708.07942.

[27] Nikitin, N., Deeva, I., 2019. The source code for the experimental studies presented in the paper. URL: https://github.com/Anaxagor/Wave_wind_model.git.

[28] Nikitin, N.O., Spirin, D.S., Visheratin, A.A., Kalyuzhnaya, A.V., 2016. Statistics-based models of flood-causing cyclones for the baltic sea region. Procedia Computer Science 101, 272–281.

[29] Papalexiou, S.M., 2018. Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency. Advances in water resources 115, 234–252.

[30] Peres, D., Iuppa, C., Cavallaro, L., Cancelliere, A., Foti, E., 2015. Significant wave height record extension by neural networks and reanalysis wind data. Ocean Modelling 94, 128–140.

[31] Powers, J.G., Klemp, J.B., Skamarock, W.C., Davis, C.A., Dudhia, J., Gill, D.O., Coen, J.L., Gochis, D.J., Ahmadov, R., Peckham, S.E., et al., 2017. The weather research and forecasting model: Overview, system efforts, and future directions. Bulletin of the American Meteorological Society 98, 1717–1737.

[32] Rousset, C., Vancoppenolle, M., Madec, G., Fichefet, T., Flavoni, S., Barthélemy, A., Benshila, R., Chanut, J., Levy, C., Masson, S., Vivier, F., 2015. The louvain-la-neuve sea ice model lim3.6: global and regional capabilities. Geoscientific Model Development 8, 2991–3005. URL: https://www.geosci-model-dev.net/8/2991/2015/, doi:10.5194/gmd-8-2991-2015.

[33] Safeeq, M., Fares, A., 2011. Accuracy evaluation of climgen weather generator and daily to hourly disaggregation methods in tropical conditions. Theoretical and applied climatology 106, 321–341.

[34] Samé, A., Chamroukhi, F., Govaert, G., Aknin, P., 2011. Model-based clustering and segmentation of time series with changes in regime. Advances in Data Analysis and Classification 5, 301–321.

[35] Shamshad, A., Bawadi, M., Hussin, W.W., Majid, T., Sanusi, S., 2005. First and second order markov chain models for synthetic generation of wind speed time series. Energy 30, 693–708.

[36] Steinschneider, S., Brown, C., 2013. A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water resources research 49, 7205–7220.

[37] Stopa, J.E., Cheung, K.F., Tolman, H.L., Chawla, A., 2013. Patterns and cycles in the climate forecast system reanalysis wind and wave data. Ocean Modelling 70, 207–220.

[38] Tolman, H.L., et al., 2009. User manual and system documentation of wavewatch iii tm version 3.14. Technical note, MMAB Contribution 276, 220.

[39] Walker, J.D., 2019. Introduction to the weathergen package. URL: http://walkerjeffd.github.io/weathergen/.

[40] Wilks, D.S., Wilby, R.L., 1999. The weather generation game: a review of stochastic weather models. Progress in physical geography 23, 329–357.

[41] Zamani, A., Solomatine, D., Azimian, A., Heemink, A., 2008. Learning from data for wind–wave forecasting. Ocean engineering 35, 953–962.