# Eamonn Keogh's research while affiliated with University of California and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (366)

Time series data remains a perennially important datatype considered in data mining. In the last decade there has been an increasing realization that time series data can be best understood by reasoning about time series subsequences on the basis of their similarity to other subsequences: the two most familiar such time series concepts being motifs...

The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using intra-similarity join (i.e., self-join) or join a time series with another time series using inter-similarity join. By invoking either or both types of join...

In recent years, time series motif discovery has emerged as perhaps the most important primitive for many analytical tasks, including clustering, classification, rule discovery, segmentation, and summarization. In parallel, it has long been known that Dynamic Time Warping (DTW) is superior to other similarity measures such as Euclidean Distance und...

Query-based similarity search is a useful exploratory tool that has been used in many areas such as music, economics, and biology to find common patterns and behaviors. Existing query-based search systems allow users to search large time series collections, but these systems are not very robust and they often fail to find similar patterns. In this...

The first question a data analyst asks when confronting a new dataset is often, “Show me some representative/typical data.” Answering this question is simple in many domains, with random samples or aggregate statistics of some kind. Surprisingly, it is difficult for large time series datasets. The major difficulty is not time or space complexity, b...

Over the last decade, time series motif discovery has emerged as a useful primitive for many downstream analytical tasks, including clustering, classification, rule discovery, segmentation, and summarization. In parallel, there has been an increased understanding that Dynamic Time Warping (DTW) is the best time series similarity measure in a host o...

Data series motif discovery represents one of the most useful primitives for data series mining, with applications to many domains, such as robotics, entomology, seismology, medicine, and climatology. The state-of-the-art motif discovery tools still require the user to provide the motif length. Yet, in several cases, the choice of motif...

In the last fifteen years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, the state-of-the-art motif and discord discovery tools still require the user to provid...

Chickens are the most important poultry species in the world. Globally, industrial-scale production systems account for most of the poultry meat and eggs produced. The welfare of these birds matters for both ethical and economic reasons. From an ethical perspective, poultry have a sufficient degree of awareness to suffer pain if their health is poo...

In the last 15 years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, the state-of-the-art motif and discord discovery tools still require the user to provide the...

The recently introduced data structure, the Matrix Profile, annotates a time series by recording the location of and distance to the nearest neighbor of every subsequence. This information trivially provides answers to queries for both time series motifs and time series discords, perhaps two of the most frequently used primitives in time series dat...
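The definition above can be illustrated with a brute-force sketch. It is not the authors' algorithm (the published methods are far faster); it is a naive quadratic baseline that assumes z-normalized Euclidean distance and an exclusion zone of half the subsequence length to skip trivial matches. The names `znorm` and `naive_matrix_profile` are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(ts, m):
    """Return (profile, index) for a self-join with subsequence length m.

    profile[i] is the distance from subsequence i to its nearest
    non-trivial neighbor; index[i] is the location of that neighbor.
    """
    n_subs = len(ts) - m + 1
    if n_subs < 2:
        raise ValueError("time series too short for this subsequence length")
    subs = [znorm(ts[i:i + m]) for i in range(n_subs)]
    excl = max(1, m // 2)  # assumed exclusion zone, not taken from the source
    profile = [math.inf] * n_subs
    index = [-1] * n_subs
    for i in range(n_subs):
        for j in range(n_subs):
            if abs(i - j) < excl:
                continue  # skip the subsequence itself and near-overlaps
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


# Toy series with the shape [0, 1, 3, 1, 0] planted at positions 0 and 11.
ts = [0, 1, 3, 1, 0, 5, 2, 7, 4, 9, 6, 0, 1, 3, 1, 0, 8, 3]
profile, index = naive_matrix_profile(ts, m=5)
motif_start = min(range(len(profile)), key=profile.__getitem__)
discord_start = max(range(len(profile)), key=profile.__getitem__)
```

Under these assumptions, the lowest profile value marks a motif occurrence (with `index` pointing at its partner) and the highest value marks a discord candidate.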

At their core, many time series data mining algorithms reduce to reasoning about the shapes of time series subsequences. This requires an effective distance measure, and for the last two decades most algorithms have used Euclidean distance or DTW as their core subroutine. We argue that these distance measures are not as robust as the community seems to belie...

Time series classification is an important task in its own right, and it is often a precursor to further downstream analytics. To date, virtually all works in the literature have used either shape-based classification using a distance measure or feature-based classification after finding some suitable features for the domain. It seems to be underap...

The discovery of conserved (repeated) patterns in time series is arguably the most important primitive in time series data mining. Called time series motifs, these primitive patterns are useful in their own right, and are also used as inputs into classification, clustering, segmentation, visualization, and anomaly detection algorithms. Recently the...

The UCR time series archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one data set from the archive. The original incarnation of the archive had sixteen data sets but since that time, it has gone through periodic expansions. The...

The article Domain agnostic online semantic segmentation for multi-dimensional time series, written by Shaghayegh Gharghabi, Chin-Chia Michael Yeh, Yifei Ding, Wei Ding, Paul Hibbing, Samuel LaMunion, Andrew Kaplan, Scott E. Crouter, and Eamonn Keogh, was originally published electronically on the publisher’s internet portal (currently SpringerLink) on...

Research into time series classification has tended to focus on the case of series of uniform length. However, it is common for real-world time series data to have unequal lengths. Differing time series lengths may arise from a number of fundamentally different mechanisms. In this work, we identify and evaluate two classes of such mechanisms -- var...

Time series motifs were introduced in 2002 and have since become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work, we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence patterns, suc...

In manufacturing, a golden batch is an idealized realization of the perfect process to produce the desired item, typically represented as a multidimensional time series of pressures, temperatures, flow-rates and so forth. The golden batch is sometimes produced from first-principle models, but it is typically created by recording a batch produced by...

Time series are one of the most common data types in nature. Given this fact, dozens of query-by-sketching, query-by-example, and query-algebra systems have been proposed to allow users to search large time series collections. However, none of these systems have seen widespread adoption. We argue that there are two reasons why this is so. The first rea...

Unsupervised semantic segmentation in the time series domain is a much studied problem due to its potential to detect unexpected regularities and regimes in poorly understood data. However, the current techniques have several shortcomings, which have limited the adoption of time series semantic segmentation beyond academic settings for four primary...

We present a new method to accelerate the process of matched filtering (template matching) of seismic waveforms by efficient calculation of (cross-) correlation coefficients. The cross-correlation method is commonly used to analyze seismic data, for example, to detect repeating or similar seismic waveform signals, earthquake swarms, foreshocks, aft...
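For orientation, the quantity being accelerated can be written as a naive sliding correlation. This is only a baseline sketch of template matching, not the accelerated method the abstract describes; the function name `sliding_correlation` and the convention of returning 0.0 for flat windows are assumptions.

```python
import math


def sliding_correlation(signal, template):
    """Pearson correlation of the template with every window of the signal."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    out = []
    for i in range(len(signal) - m + 1):
        win = signal[i:i + m]
        w_mean = sum(win) / m
        w_dev = [x - w_mean for x in win]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0.0 or w_norm == 0.0:
            # Correlation is undefined for a flat window; 0.0 is an assumed convention.
            out.append(0.0)
        else:
            dot = sum(p * q for p, q in zip(t_dev, w_dev))
            out.append(dot / (t_norm * w_norm))
    return out


# The template shape appears at offset 2 and, scaled by two, at offset 7.
signal = [0, 0, 1, 3, 1, 0, 0, 2, 6, 2, 0]
template = [1, 3, 1]
cc = sliding_correlation(signal, template)
detections = [i for i, c in enumerate(cc) if c > 0.99]
```

Because the coefficient is normalized, the scaled copy at offset 7 scores as highly as the exact copy at offset 2. The cost of this naive version grows with the product of signal length and template length, which is the expense that faster methods aim to reduce.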

Since its introduction, unsupervised representation learning has attracted a lot of attention from the research community, as it has been demonstrated to be highly effective and easy to apply in tasks such as dimension reduction, clustering, visualization, information retrieval, and semi-supervised learning. In this work, we propose a novel unsupervised...

In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded, until 2015 when it increased in size from 45 datasets to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable length series, but it st...

The UCR Time Series Archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one dataset from the archive. The original incarnation of the archive had sixteen datasets but since that time, it has gone through periodic expansions. The l...

Similarity search is the core procedure for several time series mining tasks. While different distance measures can be used for this purpose, there is clear evidence that Dynamic Time Warping (DTW) is the most suitable distance function for a wide range of application domains. Despite its quadratic complexity, research efforts have proposed a s...

Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is l...
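The role of the warping parameter w can be seen in a minimal sketch of banded DTW. This is a textbook dynamic-programming formulation, not code from the paper; it assumes w is expressed in samples as the maximum allowed index offset, uses squared pointwise costs, and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b, w):
    """DTW distance between sequences a and b under a warping window w.

    w is the maximum allowed |i - j| between aligned indices. With w = 0
    and equal-length inputs this reduces to Euclidean distance.
    """
    n, m = len(a), len(b)
    w = max(w, abs(n - m))  # the band must be wide enough to reach the corner
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j - 1]
                                 cost[i][j - 1],      # repeat a[i - 1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


# Two copies of the same bump, offset by one sample.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
rigid = dtw_distance(a, b, w=0)    # no warping allowed
elastic = dtw_distance(a, b, w=1)  # one sample of warping allowed
```

In this toy case a window of one sample is enough to align the two bumps exactly, while no warping leaves a nonzero distance. In the supervised setting mentioned above, a candidate w would typically be scored by the cross-validated accuracy of a nearest-neighbor classifier built on a function like this one.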

Since their introduction over a decade ago, time series motifs have become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence pat...

Most algorithms for music data mining and retrieval analyze the similarity between feature sets extracted from the raw audio. A conventional approach to assess similarities within or between recordings is to create similarity matrices. However, this method requires quadratic space for each comparison and typically requires a costly post-processing...

In the last fifteen years, data series motif discovery has emerged as one of the most useful primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, the state-of-the-art motif discovery tools still require the user to provide the motif length. Yet, in...

Data series motif discovery represents one of the most useful primitives for data series mining, with applications to many domains, such as robotics, entomology, seismology, medicine, and climatology. The state-of-the-art motif discovery tools still require the user to provide the motif length. Yet, in several cases, the choice of motif...

Domain-specific distances preferred by analysts for exploring similarities among time series tend to be “point-to-point” distances. Unfortunately, this point-wise nature limits their ability to perform meaningful comparisons between sequences of different lengths and with temporal misalignments. Analysts instead need “elastic” alignment tools...

The discovery of time series motifs has emerged as one of the most useful primitives in time series data mining. Researchers have shown its utility for exploratory data mining, summarization, visualization, segmentation, classification, clustering, and rule discovery. Although there has been more than a decade of extensive research, there is still...

Time series motifs are approximately repeated subsequences found within a longer time series. They have been in the literature since 2002, but recently they have begun to receive significant attention in research and industrial communities. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series...

The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences. The lack of progress...

Time series motif discovery has emerged as perhaps the most used primitive for time series data mining, and has seen applications to domains as diverse as robotics, medicine and climatology. There has been recent significant progress on the scalability of motif discovery. However, we believe that the current definitions of motif discovery are limit...

In academic settings over the last decade, there has been significant progress in time series classification. However, much of this work makes assumptions that are simply unrealistic for deployed industrial applications. Examples of these unrealistic assumptions include the following: assuming that data subsequences have a single fixed length, are...

In recent years, the research community, inspired by its success in dealing with single-dimensional time series, has turned its attention to dealing with multidimensional time series. There is now a plethora of techniques for indexing, classification, and clustering of multidimensional time series. However, we argue that the difficulty of explorat...

In the last 5 years there have been a large number of new time series classification algorithms proposed in the literature. These algorithms have been evaluated on subsets of the 47 data sets in the University of California, Riverside time series classification archive. The archive has recently been expanded to 85 data sets, over half of which have...