Steps of the kWNN algorithm carried out in the cluster. 

Source publication
Article
Full-text available
A new approach to big data forecasting based on the k-weighted nearest neighbours algorithm is introduced in this work. The algorithm has been developed for distributed computing under the Apache Spark framework. Every phase of the algorithm is explained, along with how the optimal values of the input parameters required for the a...

Contexts in source publication

Context 1
... join(RDD_w, RDD_h) finally makes a prediction from the RDD in which every possible window of w past values and its next h values has been formed. This is shown in Figure 4, where every RDD_i represents a variable across the cluster, while rdd_ij represents the chunk of RDD_i held by executor j of the cluster. Figure 5 presents, in more detail, the steps of the prediction carried out in the cluster. RDD_1 gathers the windows of w past values along with their next h values, as seen in Figure 5(a). This RDD needs to be split into training and test sets each time h values are predicted, since the TRS is updated by adding these h real values from the TES, as explained in Figure 6. It should be mentioned that around 70% of the values form the TRS and around 30% form the ...
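The window construction and the rolling ~70/30 split described above can be sketched in plain Python as a single-machine analogue of the distributed RDD construction (function and variable names here are illustrative, not taken from the paper's implementation):

```python
# Single-machine sketch of forming every window of w past values together
# with its next h values (the pairs the algorithm stores in the joined RDD),
# plus the approximate 70/30 split into training (TRS) and test (TES) sets.

def build_windows(series, w, h):
    """Return (window, next_h) pairs for every valid position in the series."""
    pairs = []
    for i in range(len(series) - w - h + 1):
        window = series[i:i + w]          # w past values
        next_h = series[i + w:i + w + h]  # the h values that follow them
        pairs.append((window, next_h))
    return pairs

series = list(range(10))          # toy time series: 0..9
pairs = build_windows(series, w=3, h=2)

# around 70% of the pairs form the training set (TRS), the rest the test set (TES)
split = int(0.7 * len(pairs))
trs, tes = pairs[:split], pairs[split:]
```

In the distributed version each executor would build the pairs for its own chunk of the series; the slicing above only mirrors the logical result of that construction.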
Context 3
... in the following transformation, the distance from the pattern to every window of w values is measured in a map function and added as a new column, as shown in Figure ...
Context 4
... Figure 5(b), a filter function is applied in order to select the TRS, with n < m. Afterwards, the w past values preceding the h values to predict are selected as the pattern whose k nearest neighbours are to be searched for, as shown in Figure ...
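Assuming the training set holds the first n of the m windows, the filter-and-pattern step can be sketched as follows (a list slice stands in for Spark's filter transformation; the names are hypothetical):

```python
# Sketch of the filter step in Figure 5(b)-(c): keep only the training
# portion (TRS) of the series, then take its last w values as the pattern
# whose k nearest neighbours will be searched for.

def select_pattern(series, n, w):
    """series: full time series; n: size of the TRS (n < len(series));
    w: window length. Returns the w values preceding the values to predict."""
    trs_values = series[:n]   # filter: training-set values only
    return trs_values[-w:]    # the pattern of w past values

pattern = select_pattern(list(range(10)), n=7, w=3)  # -> [4, 5, 6]
```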
Context 5
... the whole RDD is sorted in ascending order of distance with a sortBy function, and only the k windows with the closest distances to the pattern are selected with a take operation, as shown in Figures 5(d) and 5(e), with ...
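Put together, the distance map, the ascending sortBy and the take of the k closest windows can be mimicked on a single machine as below. The inverse-distance weighting used to turn the neighbours' next h values into a forecast is an illustrative choice, not necessarily the exact scheme of the paper:

```python
import math

def knn_predict(pairs, pattern, k, h):
    """pairs: list of (window, next_h) tuples; pattern: the w past values
    of the h values to predict. Returns a forecast of h values."""
    # map step: attach the Euclidean distance to the pattern as a new field
    with_dist = [(math.dist(window, pattern), next_h)
                 for window, next_h in pairs]
    # sortBy + take: ascending by distance, keep only the k closest windows
    neighbours = sorted(with_dist, key=lambda t: t[0])[:k]
    # inverse-distance weighted average of the neighbours' next h values
    # (illustrative weighting; the small constant avoids division by zero)
    weights = [1.0 / (d + 1e-9) for d, _ in neighbours]
    total = sum(weights)
    return [sum(wgt * nxt[j] for wgt, (_, nxt) in zip(weights, neighbours)) / total
            for j in range(h)]
```

On Spark, `with_dist` corresponds to a map transformation over the distributed windows, while `sorted(...)[:k]` corresponds to the sortBy-then-take sequence quoted above, which collects only the k nearest neighbours to the driver.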

Citations

... Historical averaging (HA) [40], autoregressive integrated moving average (ARIMA) [41], and vector autoregressive (VAR) [42] models are all classical statistical methods for traffic forecasting, and they rely on linear time-series assumptions. Some machine-learning-based traffic forecasting models are support vector regression (SVR) [43], k-nearest neighbor (KNN) [44], and random forest regression (RFR) [45], which capture complex nonlinear relationships in traffic data. As traffic flow data are complex and nonlinear, machine-learning-based methods perform better than statistical forecasting methods, but machine-learning methods require some expertise for manual design. ...
Article
Full-text available
Accurate traffic flow forecasting is crucial for urban traffic control, planning, and detection. Most existing spatial-temporal modeling methods overlook the hidden dynamic correlations between road network nodes and the time series nonstationarity while synchronously capturing complex long- and short-term spatial-temporal dependencies. To this end, this paper proposes an Attention-based Spatial-Temporal Synchronous Graph Convolutional Network (AST-SGCN) to capture complex spatial-temporal correlations over long and short terms. Specifically, we design a self-attention mechanism that utilizes spatial-temporal synchronous computation to efficiently mine dynamic spatial-temporal correlations with changes in traffic and enhance computational efficiency. Then, we construct a residual adaptive adjacency matrix, which includes historical data and node vectors, to stimulate the information transfer of spatial-temporal graph nodes and mine the hidden spatial-temporal dependencies through the graph convolution layer. Next, we establish a Fourier transform layer (FTL) to handle the nonstationary data. Finally, we develop a spatial-temporal hybrid stacking module for capturing complex long-term spatial-temporal correlations, within which two layers of graph convolution and one layer of self-attention are deployed. Extensive experimental results on three real-world traffic flow datasets demonstrate that our AST-SGCN model outperforms the comparable models.
... Altman (1992) has noted that using a weighted average of the selected samples can help reduce bias in traditional KNN methods. More recently, KNN has been shown to be effective in making predictions using large volumes of time series data (Sorjamaa et al., 2007; Talavera-Llames et al., 2018). ...
... Decomposition helps in separating the time series into its components, including trend, seasonality, and residuals (random variations). This enables the examination and analysis of each component separately, gaining insights into their individual contributions to the overall pattern (Talavera-Llames et al., 2018). ...
Article
Full-text available
Stock price prediction remains a complex challenge in financial markets. This study introduces a novel Long Short-Term Memory (LSTM) model optimized by Sand Cat Swarm Optimization (SCSO) for stock price prediction. The research evaluates multiple algorithms including ANN, LSTM variants, Auto-ARIMA, Gradient Boosted Trees, DeepAR, N-BEATS, N-HITS, and the proposed LSTM-SCSO using DAX index data from 2018 to 2023. Model performance was assessed through Mean Squared Error, Mean Absolute Error, Mean Absolute Percentage Error, and out-of-sample R2 metrics. Statistical significance was validated using Model Confidence Set analysis with 5000 bootstrap replications. Results demonstrate LSTM-SCSO's superior performance across all evaluation metrics. The model achieved an annualized return of 66.25% compared to the DAX index's 47.45%, with a Sharpe ratio of 2.9091. The integration of technical indicators and macroeconomic variables enhanced the model's predictive capabilities. These findings establish LSTM-SCSO as an effective tool for stock price prediction, offering practical value for investment decision-making.
... Both static and dynamic ensembles were analyzed, obtaining promising results in effectiveness. The k-weighted nearest neighbors algorithm, originally proposed for single-machine environments (Troncoso et al., 2002, 2007), was adapted for big data using distributed computation (Talavera-Llames et al., 2018). The authors developed a distributed computation scheme to find the neighbors. ...
... Because of continual technological developments in the hardware development industry, large-scale data gathering and storage are now easily feasible. A vast amount of historical data can be used to train different machine learning (ML) algorithms and to identify distinctive patterns and relationships between data in order to optimize different systems and procedures [8,9]. Table 2 represents the control parameters that are set for a particular mode, such as tidal volume in volume control mode or pressure control in pressure control mode. ...
Article
Full-text available
Ventilation mode is one of the most crucial ventilator settings, selected and set by knowledgeable critical care therapists in a critical care unit. The application of a particular ventilation mode must be patient-specific and patient-interactive. The main aim of this study is to provide a detailed outline regarding ventilation mode settings and determine the best machine learning method to create a deployable model for the appropriate selection of ventilation mode on a per breath basis. Per-breath patient data is utilized, preprocessed and finally a data frame is created consisting of five feature columns (inspiratory and expiratory tidal volume, minimum pressure, positive end-expiratory pressure, and previous positive end-expiratory pressure) and one output column (output column consisted of modes to be predicted). The data frame has been split into training and testing datasets with a test size of 30%. Six machine learning algorithms were trained and compared for performance, based on the accuracy, F1 score, sensitivity, and precision. The output shows that the Random-Forest Algorithm was the most precise and accurate in predicting all ventilation modes correctly, out of the all the machine learning algorithms trained. Thus, the Random-Forest machine learning technique can be utilized for predicting optimal ventilation mode setting, if it is properly trained with the help of the most relevant data. Aside from ventilation mode, control parameter settings, alarm settings and other settings may also be adjusted for the mechanical ventilation process utilizing appropriate machine learning, particularly deep learning approaches.
... Since all computations are done in memory in Spark, an application can execute up to 100 times faster than in frameworks such as Hadoop. Programming algorithms in this field have also been developed regarding the manner and location of data distribution, parallel complexity and fault tolerance on the Python or R platforms [42]. ...
Article
Due to the advancement in communication networks, metering and smart control systems, as well as the prevalent use of Internet-based structures, new forms of power systems have seen moderate changes with respect to several aspects of contradictory Cyber–Physical Power Systems (CPPSs). These structures usually have connections between power sections and cyber parts. CPPSs confront newly emerging issues including stability, resiliency, reliability, vulnerability and also security. Studying, analyzing and providing solutions to mitigate or solve these problems highly depend on accurate modeling methods and examining the interaction mechanisms associated with the cyber-security of Smart Grids (SGs). This paper aims to systematically summarize different methods and techniques and to review corresponding solution approaches in cyber-security in energy systems. In the first step, we discuss the interactive features of cyber-security; then, their modeling and mechanisms are reviewed and summarized in detail. Furthermore, the characteristics and applicability of different cyber-attack models are technically discussed and analyzed. The cutting-edge cyber security approaches such as blockchain and quantum computing in SGs and power systems are stated, and recent research directions are highlighted. The decisive problem-solving approaches and defense mechanisms are presented. Finally, some points regarding the role of cyber-security in the future of SGs are presented.
... In Refs. [28,29] multi-step forecasting was monitored by the ML models using the Spark environment. Specifically, Ref. [28] used H iterations to compute the multistep prediction, while [29] implemented multivariate regression models using ML libraries. ...
... [28,29] multi-step forecasting was monitored by the ML models using the Spark environment. Specifically, Ref. [28] used H iterations to compute the multistep prediction, while [29] implemented multivariate regression models using ML libraries. As a result, the H technique was not scalable for forecasting. ...
Article
Full-text available
Telecommunication companies collect a deluge of subscriber data without retrieving substantial information. Exploratory analysis of this type of data will facilitate the prediction of varied information that can be geographical, demographic, financial, or any other. Prediction can therefore be an asset in the decision-making process of telecommunications companies, but only if the information retrieved follows a plan with strategic actions. The exploratory analysis of subscriber data was implemented in this research to predict subscriber usage trends based on historical time-stamped data. The predictive outcome was unknown but approximated using the data at hand. We have used 730 data points selected from the Insights Data Storage (IDS). These data points were collected from the hourly statistic traffic table and subjected to exploratory data analysis to predict the growth in subscriber data usage. The Auto-Regressive Integrated Moving Average (ARIMA) model was used to forecast. In addition, we used the normal Q-Q, correlogram, and standardized residual metrics to evaluate the model. This model showed a p-value of 0.007. This result supports our hypothesis predicting an increase in subscriber data growth. The ARIMA model predicted a growth of 3 Mbps with a maximum data usage growth of 14 Gbps. In the experimentation, ARIMA was compared to the Convolutional Neural Network (CNN) and achieved the best results with the UGRansome data. The ARIMA model performed better with execution speed by a factor of 43 for more than 80,000 rows. On average, it takes 0.0016 s for the ARIMA model to execute one row, and 0.069 s for the CNN to execute the same row, thus making the ARIMA 43× (0.069/0.0016) faster than the CNN model. These results provide a road map for predicting subscriber data usage so that telecommunication companies can be more productive in improving their Quality of Experience (QoE).
This study provides a better understanding of the seasonality and stationarity involved in subscriber data usage’s growth, exposing new network concerns and facilitating the development of novel predictive models.
... Learning-Based PLF Methods. PLF methods have remained popular for their promising results in forecasting the power load consumption in residential buildings [13,14], subways [15], industries [16], and households [17,18]. The majority of these methods are based on traditional approaches. ...
... Suppose ỹ_i indicates the predicted values for n energy-consumption predictions and y_i indicates the observed values; then equations (14) to (16) give the MSE, RMSE, and MAE formulations, where the RMSE is the square root of the MSE. Similarly, to measure the forecasting performance and compute the correctness of the proposed method, MAPE is used, which gives the absolute error as a percentage and computes the mean of the error, as given in equation (17). ...
Article
Full-text available
Over the decades, a rapid upsurge in electricity demand has been observed due to overpopulation and technological growth. The optimum production of energy is mandatory to preserve it and improve the energy infrastructure using the power load forecasting (PLF) method. However, the complex energy systems’ transition towards more robust and intelligent system will ensure its momentous role in the industrial and economical world. The extraction of deep knowledge from complex energy data patterns requires an efficient and computationally intelligent deep learning-based method to examine the future electricity demand. Stand by this, we propose an intelligent deep learning-based PLF method where at first the data collected from the house through meters are fed into the pre-assessment step. Next, the sequence of refined data is passed into a modified convolutional long short-term memory (ConvLSTM) network that captures the spatiotemporal correlations from the sequence and generates the feature maps. The generated feature map is forward propagated into a deep gated recurrent unit (GRU) network for learning, which provides the final PLF. We experimentally proved that the proposed method revealed promising results using mean square error (MSE) and root mean square error (RMSE) and outperformed state of the art using the competitive power load dataset.(Github Code). (Github code: https://github.com/FathUMinUllah3797/ConvLSTM-Deep_GRU).
... relationships in the data [18, 25-27]. However, deep learning techniques are acquiring great relevance nowadays to solve a large number of applications in multiple areas due to the enhancements in computational capabilities [33, 40, 41]. ...
Article
Full-text available
Nowadays, electricity is a basic commodity necessary for the well-being of any modern society. Due to the growth in electricity consumption in recent years, mainly in large cities, electricity forecasting is key to the management of an efficient, sustainable and safe smart grid for the consumer. In this work, a deep neural network is proposed to address the electricity consumption forecasting in the short-term, namely, a long short-term memory (LSTM) network due to its ability to deal with sequential data such as time-series data. First, the optimal values for certain hyper-parameters have been obtained by a random search and a metaheuristic, called coronavirus optimization algorithm (CVOA), based on the propagation of the SARS-Cov-2 virus. Then, the optimal LSTM has been applied to predict the electricity demand with 4-h forecast horizon. Results using Spanish electricity data during nine years and half measured with 10-min frequency are presented and discussed. Finally, the performance of the proposed LSTM using random search and the LSTM using CVOA is compared, on the one hand, with that of recently published deep neural networks (such as a deep feed-forward neural network optimized with a grid search) and temporal fusion transformers optimized with a sampling algorithm, and, on the other hand, with traditional machine learning techniques, such as a linear regression, decision trees and tree-based ensemble techniques (gradient-boosted trees and random forest), achieving the smallest prediction error below 1.5%.
... These thoughtful and considerate personalized learning services are accompanied by the emergence of the network, and it has promoted the rapid growth of students' knowledge ability and the healthy development of personality. Therefore, it has become a necessary research topic to design a digital learning environment that conforms to cognitive psychology and is suitable for individuals to conduct individualized learning, so as to make it more conducive to learners' efficient and fast learning [10-18]. ...
Article
Full-text available
In order to meet the needs of the current personalized education, improve the shortcomings of the current digital learning system in personalized learning, and introduce the service concept into education, the development and research of the personalized learning system based on the analysis of user behavior patterns has become the development of digital learning. This paper studies a language learning method, that includes obtaining the user’s language learning interaction data to determine the user's language level: wherein, the user’s language level data includes the user’s initial language level and the current language level; initial learning model: according to the current language level, an adaptive algorithm is used to update the initial learning model, and the user performs linguistics further according to the updated learning model. In addition, in order to realize the clustering analysis of Chinese language online learning users’ learning behavior, since the Fuzzy C-means (FCM) clustering results are easily affected by the selection of their initial cluster centers, a Harmony Search (HS)-FCM-based Chinese language learning user learning behavior clustering analysis is proposed. The participation dimension, focus dimension, regularity dimension, interaction dimension, and academic performance are selected as the analysis indicators of learning behavior. The learner level is divided into 5 levels, namely excellent, good, medium, qualified, and poor. Compared with HSFCM, and decision tree, it is found that the algorithm Improved HS (HIS)-FCM in this paper has higher clustering accuracy, faster convergence speed, and lower fitness, which provides new opportunities for learner level division and optimization of course learning.