... This method not only improves computational efficiency but also provides superior classification accuracy compared to conventional DTW methods [30]. Such advancements are critical as they address the computational challenges associated with DTW's quadratic time complexity, which can hinder its application in large datasets [31]. Furthermore, the integration of DTW with machine learning techniques has opened new avenues for its application. ...
Energy stock price prediction is a pivotal challenge in financial forecasting, characterized by high volatility and complexity influenced by geopolitical factors, regulatory shifts, and sector-specific issues. Traditional methods often struggle to account for the intricate dependencies and temporal patterns present in energy stock data. To address these limitations, this study introduces a hybrid model that integrates a Graph Convolutional Network (GCN) with an attention-enhanced Long Short-Term Memory (LSTM) architecture. By employing a graph structure derived from Dynamic Time Warping (DTW), the GCN captures inter-stock relationships, while the attention mechanism within the LSTM component refines the modelling of temporal dynamics, allowing the model to focus on the most relevant historical information. Experimental evaluations across multiple energy stocks show that this combined LSTMGC model significantly outperforms conventional approaches, including Linear Regression, GRU, MLP, and standalone LSTMs, when assessed using Mean Squared Error (MSE) and R-squared (R²) metrics. By jointly leveraging spatial and temporal dependencies, as well as the selective attention mechanism, the proposed framework enhances predictive accuracy and reliability, offering valuable insights for investors and policymakers navigating the evolving energy market.
Crowdsourced testing has gained prominence in the field of software testing due to its ability to effectively address the challenges posed by the fragmentation problem in mobile app testing. The inherent openness of crowdsourced testing brings diversity to testing outcomes. However, it also presents challenges for app developers in inspecting a substantial quantity of test reports. To help app developers inspect the bugs in crowdsourced test reports as early as possible, crowdsourced test report prioritization has emerged as an effective technology by establishing a systematic optimal report inspection sequence. Nevertheless, crowdsourced test reports consist of app screenshots and textual descriptions, but current prioritization approaches mostly rely on textual descriptions, and some may add vectorized image features at the image-as-a-whole level or widget level. They still lack precision in accurately characterizing the distinctive features of crowdsourced test reports. In terms of prioritization strategy, prevailing approaches adopt simple prioritization based on features that are merely combined via weighted coefficients, without adequately considering the semantics, which may result in biased and ineffective outcomes.
In this paper, we propose ENCREPRIOR, an enhanced crowdsourced test report prioritization approach via image-and-text semantic understanding and feature integration. ENCREPRIOR extracts distinctive features from crowdsourced test reports. For app screenshots, ENCREPRIOR considers the structure (i.e., GUI layout) and the contents (i.e., GUI widgets), viewing the app screenshot from the macroscopic and microscopic perspectives, respectively. For textual descriptions, ENCREPRIOR considers the Bug Description and Reproduction Step as the bug context. During the prioritization, we do not directly merge the features with weights to guide the prioritization. Instead, in order to comprehensively consider the semantics, we adopt a prioritize-reprioritize strategy. This practice combines different features by considering their individual ranks. The reports are first prioritized on four features separately. Then, the ranks on the four sequences are used to lexicographically reprioritize the test reports with an integration of features from app screenshots and textual descriptions. Results of an empirical study show that ENCREPRIOR outperforms the representative baseline approach DEEPPRIOR by 15.61% on average, ranging from 2.99% to 63.64% on different apps, and the newly proposed features and prioritization strategy all contribute to the excellent performance of ENCREPRIOR.
Abrupt changes in speed due to external disturbances can occur during human gait. Insight into the gait compensation mechanism is necessary to examine gait stability in the context of unexpected speed changes. The present study investigated the effect of speed changes on overall gait patterns under three different walking speed conditions. Dynamic time warping (DTW) and cross-correlation techniques were used, revealing that similarity of gait kinematics decreased at slower gait speed compared to preferred and faster speeds. These results suggest that individuals may encounter difficulties in maintaining gait stability at slower speed than normal and faster speeds. Additionally, at the moment when speed changes occur, the similarity of gait patterns significantly decreases, as indicated by higher DTW and cross-correlation values. This insight could assist clinicians and physical therapists in gaining an understanding of overall gait patterns during unexpected speed changes and developing assistive devices to prevent slips, trips, and falls.
This paper presents a comprehensive study of the application of different machine learning techniques to the prediction of turbine faults at an early stage. As the need to ensure optimal turbine operation through early-stage fault detection grows, the development of precise predictive models using machine learning techniques is also advancing rapidly. This study primarily focuses on the development and evaluation of predictive models capable of anticipating fault phases using a range of classification algorithms. It assesses the effectiveness of different classification methods, including Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, and Convolutional Neural Networks (CNNs), in predicting distinct turbine phases. These algorithms are evaluated on an experimental dataset encompassing various stages of faults. The results show that the CNN accurately predicted particular turbine fault phases, achieving 80% accuracy with a short time series. The deep learning capabilities of CNNs capture intricate spatial patterns in turbine data, enabling them to discern fault indicators that elude traditional machine learning methods. The insights gained from this research help extend the application of machine learning approaches in the field of turbine fault detection, paving the way for more precise and dependable early failure detection systems.
Due to the openness of the crowdsourced testing paradigm, crowdworkers submit a massive number of spotty, duplicate test reports, which hinders developers from effectively reviewing the reports and detecting bugs. Test report clustering is widely used to alleviate this problem and improve the effectiveness of crowdsourced testing. Existing clustering methods basically rely on the analysis of textual descriptions. A few methods are independently supplemented by analyzing screenshots in test reports as pixel sets, leaving out the semantics of app screenshots from the widget perspective. Further, ignoring the semantic relationships between screenshots and textual descriptions may lead to imprecise analysis of test reports, which in turn negatively affects clustering effectiveness. This paper proposes a semi-supervised crowdsourced test report clustering approach, namely SemCluster. SemCluster extracts features from app screenshots and textual descriptions respectively, forming the structure feature, the content feature, the bug feature, and the reproduction steps feature. Clustering is principally conducted on the basis of these four features. Further, in order to avoid the bias of specific individual features, SemCluster exploits the semantic relationships between app screenshots and textual descriptions to form semantic binding rules as guidance for clustering crowdsourced test reports. Experiment results show that SemCluster outperforms state-of-the-art approaches on six widely used metrics by 10.49% -- 200.67%, illustrating its excellent effectiveness.
Given the ubiquity of time series data, the data mining community has spent significant time investigating the best time series similarity measure to use for various tasks and domains. After more than a decade of extensive efforts, there is increasing evidence that Dynamic Time Warping (DTW) is very difficult to beat. Given that, recent efforts have focused on making the intrinsically slow DTW algorithm faster. For the similarity-search task, an important subroutine in many data mining algorithms, significant progress has been made by replacing the vast majority of expensive DTW calculations with cheap-to-compute lower bound calculations. However, these lower bound based optimizations do not directly apply to clustering, and thus for some realistic problems, clustering with DTW can take days or weeks. In this work, we show that we can mitigate this untenable lethargy by casting DTW clustering as an anytime algorithm. At the heart of our algorithm is a novel data-adaptive approximation to DTW which can be quickly computed, and which produces approximations to DTW that are much better than the best currently known linear-time approximations. We demonstrate our ideas on real world problems, showing that we can get virtually all the accuracy of a batch DTW clustering algorithm in a fraction of the time. Keywords: DTW, clustering, anytime algorithm.
In time series mining, the Dynamic Time Warping (DTW) distance is a commonly and widely used similarity measure. Since the computational complexity of the DTW distance is quadratic, various kinds of warping constraints, lower bounds and abstractions have been developed to speed up time series mining under DTW distance.
In this contribution, we propose a novel Lucky Time Warping (LTW) distance, with linear time and space complexity, which uses a greedy algorithm to accelerate distance calculations for nearest neighbor classification. The results show that, compared to the Euclidean distance (ED) and (un)constrained DTW distance, our LTW distance trades classification accuracy against computational cost reasonably well, and therefore can be used as a fast alternative for nearest neighbor time series classification.
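The abstract does not spell out LTW's exact procedure, but the greedy idea it describes can be sketched: start at the first pair of points and repeatedly take whichever of the three DTW moves (diagonal, down, right) has the cheapest local cost, instead of optimizing the whole path. A minimal Python sketch, assuming squared local costs and a diagonal-first tie break, which are illustrative choices rather than the authors' exact algorithm:

```python
import numpy as np

def greedy_warp_distance(x, y):
    """Greedy, linear-time approximation to DTW: from (0, 0), repeatedly take
    the cheapest of the three DTW moves instead of optimizing globally.
    Each step advances i + j, so the loop runs at most len(x) + len(y) times."""
    n, m = len(x), len(y)
    i = j = 0
    total = (x[0] - y[0]) ** 2
    while i < n - 1 or j < m - 1:
        if i == n - 1:            # only horizontal moves remain
            j += 1
        elif j == m - 1:          # only vertical moves remain
            i += 1
        else:
            diag = (x[i + 1] - y[j + 1]) ** 2
            down = (x[i + 1] - y[j]) ** 2
            right = (x[i] - y[j + 1]) ** 2
            best = min(diag, down, right)
            if best == diag:      # prefer the diagonal on ties
                i += 1; j += 1
            elif best == down:
                i += 1
            else:
                j += 1
        total += (x[i] - y[j]) ** 2
    return np.sqrt(total)
```

Because the greedy path is one valid warping path, this sketch never underestimates the true DTW distance; the trade-off the abstract describes is exactly this loss of optimality in exchange for linear time and space.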
Time-series data naturally arise in countless domains, such as meteorology, astrophysics, geology, multimedia, and economics. Similarity search is very popular, and DTW (Dynamic Time Warping) is one of the two prevailing distance measures. Although DTW incurs a heavy computation cost, it provides scaling along the time axis. In this paper, we propose FTW (Fast search method for dynamic Time Warping), which guarantees no false dismissals in similarity query processing. FTW efficiently prunes away a significant portion of the search cost. Experiments on real and synthetic sequence data sets reveal that FTW is significantly faster than the best existing method, up to 222 times.
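FTW's own approximation scheme is not detailed in the abstract, but the general pruning idea it builds on can be illustrated with the well-known LB_Keogh lower bound, a different bound that likewise guarantees no false dismissals: the cheap bound is computed first, and the expensive exact DTW only when the bound cannot rule a candidate out. A sketch, assuming same-length sequences and a Sakoe-Chiba band of radius r:

```python
import numpy as np

def dtw_banded(x, y, r):
    """Exact DTW restricted to a Sakoe-Chiba band of radius r (equal lengths)."""
    n = len(x)
    cost = np.full((n + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, n])

def lb_keogh(q, c, r):
    """LB_Keogh: an O(n) lower bound on banded DTW; it never overestimates."""
    total = 0.0
    for i, qi in enumerate(q):
        window = c[max(0, i - r): i + r + 1]   # envelope of c around step i
        lo, hi = min(window), max(window)
        if qi > hi:
            total += (qi - hi) ** 2
        elif qi < lo:
            total += (qi - lo) ** 2
    return np.sqrt(total)

def nn_search(query, candidates, r):
    """1-NN search: a candidate is skipped only when its lower bound already
    exceeds the best exact distance found, so no true match is dismissed."""
    best_idx, best_d = -1, np.inf
    for k, c in enumerate(candidates):
        if lb_keogh(query, c, r) < best_d:     # bound cannot prune this one
            d = dtw_banded(query, c, r)        # pay for the exact distance
            if d < best_d:
                best_idx, best_d = k, d
    return best_idx, best_d
```

The speedups reported for methods of this family come from the fraction of candidates for which the exact quadratic-time computation is skipped entirely.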
The Dynamic Time Warping (DTW) distance measure is a technique that has long been known in the speech recognition community. It allows a non-linear mapping of one signal to another by minimizing the distance between the two. A decade ago, DTW was introduced into the data mining community as a utility for various time series tasks, including classification, clustering, and anomaly detection. The technique has flourished, particularly in the last three years, and has been applied to a variety of problems in various disciplines. In spite of DTW's great success, there are still several persistent "myths" about it. These myths have caused confusion and led to much wasted research effort. In this work, we dispel these myths with the most comprehensive set of time series experiments ever conducted.
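For reference, the non-linear mapping described above can be written as a small dynamic program. This is the textbook O(nm) formulation with squared local costs, not any particular paper's variant:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW between two 1-D sequences."""
    n, m = len(x), len(y)
    # cost[i, j] = minimal cumulative cost to align x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # x[i-1] repeats
                                 cost[i, j - 1],      # y[j-1] repeats
                                 cost[i - 1, j - 1])  # one-to-one match
    return np.sqrt(cost[n, m])

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
print(dtw_distance(a, b))  # 0.0: b is a time-warped copy of a
```

Note that the sequences may differ in length, and the warping absorbs the repeated initial sample in `b`; a lock-step measure like Euclidean distance could not even be applied here.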
Dynamic Time Warping (DTW) has a quadratic time and space complexity that limits its use to small time series. In this paper we introduce FastDTW, an approximation of DTW that has a linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution. We prove the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW by comparing it to two other types of existing approximate DTW algorithms: constraints (such as Sakoe-Chiba Bands) and abstraction. Our results show a large improvement in accuracy over existing methods.
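FastDTW's full multilevel recursion is more involved than the abstract can convey; the sketch below shows only its first ingredient, coarsening the series by averaging adjacent points, and omits the path projection and radius-limited refinement steps. The quadratic-time dtw_distance used for comparison is the textbook formulation, not the paper's code:

```python
import numpy as np

def dtw_distance(x, y):
    """Exact quadratic-time DTW (squared local costs, sqrt at the end)."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = (x[i - 1] - y[j - 1]) ** 2 + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

def coarsen(x):
    """Halve the resolution by averaging adjacent pairs of points."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:          # drop a trailing point so the pairing is exact
        x = x[:-1]
    return x.reshape(-1, 2).mean(axis=1)

t = np.linspace(0, 2 * np.pi, 64)
x, y = np.sin(t), np.sin(t + 0.3)
approx = dtw_distance(coarsen(x), coarsen(y))  # roughly a quarter of the DP cells
exact = dtw_distance(x, y)
```

In FastDTW proper, the warp path found at the coarse level is projected back to the fine level and only a narrow corridor around it is re-evaluated, which is what yields the overall linear time and space bound.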
The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on thirty-eight time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.
A recently introduced primitive for time series data mining, unsupervised shapelets (u-shapelets), has demonstrated significant potential for time series clustering. In contrast to approaches that consider the entire time series to compute pairwise similarities, the u-shapelets technique allows considering only relevant subsequences of time series. Moreover, u-shapelets allow us to bypass the apparent chicken-and-egg paradox of defining relevance with reference to the clustering itself. U-shapelets have several advantages over rival methods. First, they are defined even when the time series are of different lengths; for example, they allow clustering datasets containing a mixture of single heartbeats and multi-beat ECG recordings. Second, u-shapelets mitigate sensitivity to irrelevant data such as noise, spikes, dropouts, etc. Finally, u-shapelets have demonstrated the ability to provide additional insights into the data. Unfortunately, the state-of-the-art algorithms for u-shapelet search are intractable, and so their advantages have only been demonstrated on tiny datasets. We propose a simple approach to speed up u-shapelet discovery by two orders of magnitude, without any significant loss in clustering quality.
Dynamic time warping (DTW) has proven itself to be an exceptionally strong distance measure for time series. DTW in combination with one-nearest neighbor, one of the simplest machine learning methods, has been difficult to convincingly outperform on the time series classification task. In this paper, we present a simple technique for time series classification that exploits DTW’s strength on this task. But instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features which are then given to a standard machine learning method. We experimentally show that our technique improves over one-nearest neighbor DTW on 31 out of 47 UCR time series benchmark datasets. In addition, this method can be easily extended to be used in combination with other methods. In particular, we show that when combined with the symbolic aggregate approximation (SAX) method, it improves over it on 37 out of 47 UCR datasets. Thus the proposed method also provides a mechanism to combine distance-based methods like DTW with feature-based methods like SAX. We also show that combining the proposed classifiers through ensembles further improves the performance on time series classification.
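The feature construction described above can be sketched simply: map each series to its vector of DTW distances to a set of reference series, then hand that matrix to any standard vector-based learner. The reference set and the squared-cost DTW below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def dtw_distance(x, y):
    """Textbook quadratic-time DTW with squared local costs."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = (x[i - 1] - y[j - 1]) ** 2 + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

def dtw_features(series, references):
    """Turn each series into a fixed-length vector of DTW distances to the
    reference set; any feature-based classifier can then consume the matrix."""
    return np.array([[dtw_distance(s, r) for r in references] for s in series])

train = [np.sin(np.linspace(0, 6, 30)), np.cos(np.linspace(0, 6, 30))]
X = dtw_features(train, train)
print(X.shape)  # (2, 2); the diagonal is zero since each series matches itself
```

Using the training set itself as the reference set, as here, is the simplest choice; the resulting matrix is a drop-in feature representation for methods that cannot consume raw variable-length series.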
The study and comparison of sequences of characters from a finite alphabet is relevant to various areas of science, notably molecular biology. The measurement of sequence similarity involves the consideration of the different possible sequence alignments in order to find an optimal one for which the “distance” between sequences is minimum. By associating a path in a lattice to each alignment, a geometric insight can be brought into the problem of finding an optimal alignment. This problem can then be solved by applying a dynamic programming algorithm. However, the computational effort grows rapidly with the number N of sequences to be compared (O(ℓ^N), where ℓ is the mean length of the sequences to be compared). It is proved here that knowledge of the measure of an arbitrarily chosen alignment can be used in combination with information from the pairwise alignments to considerably restrict the size of the region of the lattice in consideration. This reduction implies fewer computations and less memory space needed to carry out the dynamic programming optimization process. The observations also suggest new variants of the multiple alignment problem.
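The two-sequence case of this lattice-path dynamic program is the familiar pairwise alignment recurrence, the building block whose results are used to restrict the N-dimensional lattice. A minimal sketch with unit gap and mismatch costs (the cost values are illustrative assumptions):

```python
def alignment_distance(a, b, gap=1, mismatch=1):
    """Pairwise global alignment distance (Needleman-Wunsch style DP):
    D[i][j] is the minimal cost of aligning a[:i] with b[:j], i.e. the
    cheapest lattice path from the origin to cell (i, j)."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap                      # align a[:i] against gaps only
    for j in range(1, m + 1):
        D[0][j] = j * gap                      # align b[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match / mismatch (diagonal)
                          D[i - 1][j] + gap,      # gap in b (vertical step)
                          D[i][j - 1] + gap)      # gap in a (horizontal step)
    return D[n][m]
```

With N sequences the table becomes an N-dimensional lattice of roughly ℓ^N cells, which is what makes the pruning result described above valuable.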
This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using a time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies, and the superiority of the symmetric form algorithm is established. A new technique, called slope constraint, is successfully introduced, in which the slope of the warping function is restricted so as to improve discrimination between words in different categories. The characteristics of the effective slope constraint are qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP algorithms previously applied to spoken word recognition by different research groups. The experiments show that the present algorithm gives no more than about two-thirds the errors of even the best conventional algorithm.