Simple Algorithms for Peak Detection in Time-Series

Girish Keshav Palshikar
Tata Research Development and Design Centre (TRDDC)
54B Hadapsar Industrial Estate
Pune 411013, India.
Email: gk.palshikar@tcs.com
Abstract: Identifying and analyzing peaks (or spikes) in a given time-series is important in many
applications. Peaks indicate significant events such as sudden increase in price/volume, sharp
rise in demand, bursts in data traffic etc. While it is easy to visually identify peaks in a small
univariate time-series, there is a need to formalize the notion of a peak to avoid subjectivity and
to devise algorithms to automatically detect peaks in any given time-series. The latter is
important in applications such as data center monitoring where thousands of large time-series
indicating CPU/memory utilization need to be analyzed in real-time. A data point in a time-series
is a local peak if (a) it is a large and locally maximum value within a window, which is not
necessarily large nor globally maximum in the entire time-series; and (b) it is isolated i.e., not
too many points in the window have similar values. Not all local peaks are true peaks; a local
peak is a true peak if it is a reasonably large value even in the global context. We offer different
formalizations of the notion of a peak and propose corresponding algorithms to detect peaks in
the given time-series. We experimentally compare the effectiveness of these algorithms.
Keywords: Time-series, Peak detection, Burst detection, Spike detection
1. INTRODUCTION
Identifying and analyzing peaks (also called spikes) in a given time-series is important in
many applications, because peaks are useful topological features of a time-series. In power
distribution data, peaks indicate sudden high demands. In server CPU utilization data, peaks
indicate sharp increase in workload. In network data, peaks correspond to bursts in traffic. In
financial data, peaks indicate abrupt rise in price or volume. Troughs can be considered as
inverted peaks and are equally important in many applications. Many other application areas
e.g., bioinformatics (Azzini et al (2004)), mass spectrometry (Coombes et al (2005)), signal
processing (Jordanov, Hall and Kastner (2002), Harmer et al (2008)), image processing (Ma,
van Genderen and Beukelman (2005)), astrophysics (Zhu and Shasha (2003)) require peak
detection. We distinguish between peaks (which are high values with sharp rise followed quickly
by sharp fall implying a narrow base width) and bursts (which are relatively wide contiguous
regions of high values). Thus a burst consists of a wide region of high values with sharp falls on
either side, whereas a peak is a very narrow region (only a few points) of high values with sharp
falls on either side. We formalize these notions below.
After the peaks are detected, analysis of these peaks consists of many tasks such as identifying
periodicity of peaks (Vlachos, Meek, Vagena and Gunopulos (2004)), forecasting the time of
occurrence and value of the next peak (Choi, Park, Kim and Kim (1996)) and identifying
dependencies among peaks of two or more time-series (e.g., in a multivariate time-series).
While it is easy to visually identify peaks in a small univariate time-series, there is a need to
formalize the notion of a peak to avoid subjectivity and to devise algorithms to automatically
detect peaks in any given time-series. The latter is important in applications such as data center
monitoring where thousands of large time-series indicating CPU/memory utilization of
thousands of servers need to be analyzed in real-time.
In this paper, we propose several different ways of formalizing the notion of a peak. We present
several different algorithms, each based on a specific formalization of the notion of a peak, to
detect all peaks in the given time-series. We discuss experimental evaluation of these algorithms.
We also provide a comparison of the proposed algorithms among each other and with those in
the related literature.
2. RELATED WORK
Peak detection is a common task in time-series analysis and signal processing. Standard
approaches to peak detection include (i) using smoothing and then fitting a known function (e.g.,
a polynomial) to the time-series; and (ii) matching a known peak shape to the time-series.
Another common approach to peak-trough detection is to detect zero-crossings (i.e., local
maxima) in the differences (slope sign change) between a point and its neighbours. However,
this detects all peaks-troughs, whether strong or not. To reduce the effects of noise, it is required
that the local signal-to-noise ratio (SNR) should be over a certain threshold; see Nijm et al
(2007) and Jordanov, Hall and Kastner (2002). The key question now is how to set the correct
threshold so as to minimize false positives. Ma, van Genderen and Beukelman (2005) compute
the threshold automatically by adapting it to the noise levels in the time-series as h = (max +
abs_avg)/2 + K * abs_dev, where max is the maximum value in the time-series, abs_avg is the
average of the absolute values in the time-series, abs_dev is the mean absolute deviation and K is
a user-specified constant.
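As an illustration, the adaptive threshold described above can be sketched in Python as follows. The function names (`adaptive_threshold`, `local_maxima`, `strong_peaks`) are hypothetical, the mean absolute deviation is assumed to be taken about the arithmetic mean, and pairing the threshold with a slope sign change test is an assumption made here for illustration rather than the exact procedure of Ma, van Genderen and Beukelman (2005).

```python
def adaptive_threshold(series, K):
    """Threshold h = (max + abs_avg) / 2 + K * abs_dev, as stated in the text."""
    n = len(series)
    abs_avg = sum(abs(x) for x in series) / n
    mean = sum(series) / n
    # Assumption: "mean absolute deviation" is measured about the arithmetic mean.
    abs_dev = sum(abs(x - mean) for x in series) / n
    return (max(series) + abs_avg) / 2 + K * abs_dev


def local_maxima(series):
    """Indices where the slope changes from positive to non-positive."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i] - series[i - 1] > 0 and series[i + 1] - series[i] <= 0
    ]


def strong_peaks(series, K):
    """Local maxima whose value exceeds the adaptive threshold."""
    h = adaptive_threshold(series, K)
    return [i for i in local_maxima(series) if series[i] > h]


series = [0, 1, 0, 5, 0, 1, 0]
print(local_maxima(series))       # [1, 3, 5]
print(strong_peaks(series, K=1))  # [3]
```

In this small example all three local maxima are found by the slope test, but only the value 5 at index 3 exceeds the threshold. A large K can push h above the maximum of the series, in which case no peak is reported.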
Azzini et al (2004) analyze peaks in gene expression microarray time-series data (for malaria
parasite Plasmodium falciparum) using multiple methods; each method assigns a score to every
point in the time-series. In one method, the score is the rate of change (i.e., the derivative)
computed at each point. In another method, the score is computed as the fraction of the area
under the candidate peak. The top 10 candidate peaks are selected for each method; peaks detected
by multiple methods are chosen as true peaks. The detected peaks are used to identify genes;
SVMs are then used to assign a functional group to each identified gene.
Key problems in peak detection are noise in the data and the fact that peaks occur with different
amplitudes (strong and weak peaks) and at different scales, which result in a large number of
false positives among detected peaks. Based on the observation that peaks in mass spectroscopy
data have characteristic shapes, Du, Kibbe and Lin (2006) propose a continuous wavelet
transform (CWT) based pattern-matching algorithm for peak detection. A 2D array of CWT
coefficients is computed (using a Mexican Hat mother wavelet which has the basic shape like a
peak) for the time-series at multiple scales and “ridges” in this wavelet space representation are
systematically examined to identify peaks. Coombes et al (2005) and Lange et al (2006) present
other approaches for peak detection using wavelets and their applications to analyze
spectroscopy data.
Zhu and Shasha (2003) propose a wavelet-based burst (not peak) detection algorithm. The
wavelet coefficients (as well as window statistics such as averages) for Haar wavelets are
organized in a special data structure called the shifted wavelet tree (SWT). Each level in the tree
corresponds to a resolution or time scale and each node corresponds to a window. By
automatically scanning windows of different sizes and different time resolutions, the bursts can
be elastically detected (appropriate window size is automatically decided). Zhu and Shasha
(2003) apply their technique to detecting Gamma Ray bursts in real-time in the Milagro
astronomical telescope, which vary widely in their strength and duration (from minutes to days).
Harmer et al (2008) propose a momentum-based algorithm to detect peaks. The idea is to compute
velocity (i.e., rate of change) and momentum (i.e., product of value and velocity) at various
points. A “ball” dropped from a previously detected peak will gain momentum as it climbs down
and lose momentum as it climbs the next peak; the point where it comes to rest (loses all its
momentum) is the next peak. Simple analogs of the laws in Newtonian mechanics are proposed
(e.g., friction) to compute changes in momentum as the ball traverses the time-series.
Vlachos et al (2004) describe a moving average based algorithm for burst (not peak) detection;
our peak function S2 is closely related to this algorithm. The time-series is smoothed using a
moving average filter and values which are larger than x times the standard deviation of the
entire (smoothed) time-series are considered as peaks; x is typically between 1.5 and 2.0. The
extent of smoothing is decided using domain knowledge (e.g., 30 points for daily data). See also
Vlachos et al (2008) for closely related work, application to burst detection in real-time
streaming data and analysis of correlations between bursts.
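A minimal Python sketch of this moving-average scheme is given below. The names `moving_average` and `bursts` are hypothetical, a trailing window and the population standard deviation are assumed, and the cutoff is taken literally from the description above (x times the standard deviation of the smoothed series); adding the mean of the smoothed series to the cutoff is a common variant.

```python
from statistics import pstdev


def moving_average(series, window):
    """Trailing moving average; shorter windows are used at the start."""
    out = []
    running = 0.0
    for i, x in enumerate(series):
        running += x
        if i >= window:
            running -= series[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out


def bursts(series, window, x=1.5):
    """Indices whose smoothed value exceeds x standard deviations."""
    smoothed = moving_average(series, window)
    cutoff = x * pstdev(smoothed)  # assumption: population standard deviation
    return [i for i, v in enumerate(smoothed) if v > cutoff]


series = [0, 0, 0, 0, 9, 9, 9, 0, 0, 0, 0]
print(bursts(series, window=3))  # [5, 6, 7]
```

Because the window is trailing, the reported indices lag the raw burst (indices 4 to 6 in the example) by about one point; a centred window would remove this lag at the cost of needing future values.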
3. PROBLEM FORMALIZATION
Let T = x1, x2, …, xN be a given univariate, uniformly sampled time-series containing N values.
Without loss of generality, the time instants are assumed to be 1, 2, …, N. Let xi be the
ith point in T. Let S be a given peak function, which
associates a score (which is a non-negative real number) S(i, xi, T) with the ith element xi of the given
time-series T. A given point xi in T is a peak if S(i, xi, T) > θ, where θ is a user-specified (or
suitably calculated) threshold value. The important question is: how to compute the function S?
We provide different characterizations of the peak function S.
We begin with the observation that a peak is clearly a local phenomenon, although a local peak
may not be accepted as a true peak in the light of other peaks in the time-series. A data point in a
time-series is a local peak if (a) it is a large and locally maximum value within a window; the
value need not necessarily be large nor globally maximum in the entire time-series; and (b) it is
isolated i.e., not too many points in the window have similar values. Not all local peaks are true
peaks; a local peak is a true peak if it is a reasonably large value even in the global context. We
offer different formalizations of the notion of a peak and propose corresponding algorithms to
detect peaks in the given time-series.
We first propose several different ways to compute the function S, which captures the
“spikiness” of the point xi in the local context. We then discuss how locally detected peaks (using
the function S) can be validated in the time-series as a whole. In the following, we assume that k
> 0 is a given integer. Let N+(k,i,T) = <xi+1, xi+2, …, xi+k> denote the sequence of k right temporal
neighbours of xi, i.e., the k points immediately following the ith point xi in T. N−(k,i,T) is defined
similarly as the sequence of k left (previous) temporal neighbours of xi. Let N(k,i,T) = N+(k,i,T) ∘
N−(k,i,T) denote the sequence of 2k points around the ith point (without the ith point itself) in T (∘
denotes concatenation). Let N'(k,i,T) = N+(k,i,T) ∘ {xi} ∘ N−(k,i,T). For clarity, the definitions
below generally assume that k < i < N − k; each definition can be easily modified to cover other
values of i towards the beginning and end of the time-series.
1. For a given point xi in T, the following function S1 computes the average of (i) the maximum
among the signed distances of xi from its k left neighbours and (ii) the maximum among the
signed distances of xi from its k right neighbours. Low values of k (e.g., 3 to 5) are usually
suitable, if most peaks are “thin”. Values of S1(k,i,xi,T) indicate the “significance” of the
height of the peak at the ith time instant.
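$$S_1(k,i,x_i,T) = \frac{\max\{x_i - x_{i-1},\, x_i - x_{i-2},\, \ldots,\, x_i - x_{i-k}\} + \max\{x_i - x_{i+1},\, x_i - x_{i+2},\, \ldots,\, x_i - x_{i+k}\}}{2}$$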
2. Function S2 computes the average of (i) the average of the signed distances of xi from its k
left neighbours and (ii) the average of the signed distances of xi from its k right neighbours.
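$$S_2(k,i,x_i,T) = \frac{\dfrac{(x_i - x_{i-1}) + (x_i - x_{i-2}) + \cdots + (x_i - x_{i-k})}{k} + \dfrac{(x_i - x_{i+1}) + (x_i - x_{i+2}) + \cdots + (x_i - x_{i+k})}{k}}{2}$$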
3. Function S3 computes the average signed distance of the ith value xi in T from the average
value of its k temporal neighbours.
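$$S_3(k,i,x_i,T) = \frac{\left(x_i - \dfrac{x_{i-1} + x_{i-2} + \cdots + x_{i-k}}{k}\right) + \left(x_i - \dfrac{x_{i+1} + x_{i+2} + \cdots + x_{i+k}}{k}\right)}{2}$$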
4. Entropy of any sequence of M values A = <a1, a2, …, aM> is defined as follows:
$$H_w(A) = -\sum_{i=1}^{M} p_w(a_i)\,\log(p_w(a_i))$$
where pw(ai) is an estimate of the probability density at ai. The kernel density technique (also
called Parzen window) can be used (Wand and Jones (1995)) to estimate the probability
density p(ai) at ith value ai in the given sequence A:
$$p_w(a_i) = \frac{1}{M\,|a_i - a_{i+w}|} \sum_{j=1}^{M} K\!\left(\frac{a_i - a_j}{|a_i - a_{i+w}|}\right)$$
where K is a suitable kernel function and w > 0 is a given integer. Subscript w in H and p
indicates the width parameter used in kernel density estimation. Epanechnikov and Gaussian
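are two well-known kernel functions (defined below):
$$K_{\text{Epanechnikov}}(x) = \begin{cases} \frac{3}{4}\,(1 - x^2) & \text{if } |x| < 1 \\ 0 & \text{otherwise} \end{cases} \qquad K_{\text{Gaussian}}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$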
Function S4 computes the difference in the entropy of the two sequences N(k,i,T) and
N'(k,i,T), which gives an idea of how “influential” or significant xi is in this window.
A Gaussian kernel is used to compute the density estimate.
$$S_4(k,w,i,x_i,T) = H_w(N(k,i,T)) - H_w(N'(k,i,T))$$
5. Another idea is that a peak would be an “outlier” when considered in the local context of a
window of 2k points around it. While there are a large number of sophisticated approaches
for outlier detection (Barnett and Lewis (1994)), considering the need for efficiency and
ability to work with small data (2k points), we use either one of the following well-known
techniques. Let m, s denote the mean and standard deviation of the 2k data points in N(k,i,T)
around xi.
(a) The ith point xi is a peak if (i) xi ≥ m and (ii) |xi − m| ≥ 3s. Assuming that the 2k values in
N(k,i,T) are normally distributed with mean m and standard deviation s, by the well-known
normal probability rule, P[−3s < xi − m < 3s] = 0.997. Hence, if |xi − m| ≥ 3s then
the value xi is clearly rare. Since the data in N(k,i,T) may not always be normally
distributed, we propose the following non-parametric technique.
(b) Chebyshev Inequality states that for a random variable X with mean μ and standard
deviation σ, and for any positive number h, P[|X − μ| < hσ] ≥ 1 − 1/h², i.e., P[|X − μ| ≥ hσ]
≤ 1/h². Applying this to our case (and using m and s as estimators of μ and σ), P[|xi − m| ≥
hs] ≤ 1/h²; e.g., h = 3 gives P[|xi − m| ≥ 3s] ≤ 0.111. Chebyshev Inequality is non-parametric,
i.e., it does not assume any particular distribution for the values of the random
variable X. Another decision rule for whether xi is a peak or not is as follows: the ith point
xi is a peak if (i) xi ≥ m and (ii) |xi − m| ≥ hs, for some suitably chosen h > 0.
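The five peak functions above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: all function names are hypothetical, indices are 0-based with k ≤ i < len(T) − k, T is a Python list, and the population standard deviation is used for S5. For S4 the sketch swaps in a fixed Gaussian bandwidth `bw` in place of the data-dependent width |a_i − a_{i+w}| of the density estimate, because that width can be zero on flat data; the sign and scale of the resulting S4 values depend on this choice and should be checked on the data at hand.

```python
import math
from statistics import mean, pstdev


def windows(T, i, k):
    """Left and right neighbour lists of T[i]."""
    return T[i - k:i], T[i + 1:i + k + 1]


def s1(T, i, k):
    left, right = windows(T, i, k)
    return (max(T[i] - x for x in left) + max(T[i] - x for x in right)) / 2


def s2(T, i, k):
    left, right = windows(T, i, k)
    return (mean(T[i] - x for x in left) + mean(T[i] - x for x in right)) / 2


def s3(T, i, k):
    left, right = windows(T, i, k)
    return ((T[i] - mean(left)) + (T[i] - mean(right))) / 2


def entropy(A, bw):
    """H(A) = -sum p(a) log p(a) with a Gaussian kernel density estimate.

    Assumption: fixed bandwidth bw > 0 instead of |a_i - a_{i+w}|.
    """
    norm = len(A) * bw * math.sqrt(2 * math.pi)

    def p(a):
        return sum(math.exp(-0.5 * ((a - b) / bw) ** 2) for b in A) / norm

    return -sum(p(a) * math.log(p(a)) for a in A)


def s4(T, i, k, bw=1.0):
    left, right = windows(T, i, k)
    return entropy(left + right, bw) - entropy(left + [T[i]] + right, bw)


def s5_is_peak(T, i, k, h=3.0):
    """Outlier rule: T[i] lies at least h standard deviations above the window mean."""
    left, right = windows(T, i, k)
    m, s = mean(left + right), pstdev(left + right)
    # Strict "> m" (a small deviation from the rule in the text) avoids
    # flagging every point of a perfectly flat window, where s == 0.
    return T[i] > m and T[i] - m >= h * s


T = [1, 2, 1, 2, 10, 2, 1, 2, 1]
print(s1(T, 4, 3))          # 9.0
print(s5_is_peak(T, 4, 3))  # True
print(s5_is_peak(T, 3, 3))  # False
```

Note that the S2 and S3 sketches return the same value up to rounding on any input, since the average of the differences from the neighbours equals the difference from the neighbours' average.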
Using each of the above peak functions, we can easily write an algorithm to detect all
peaks in the given time-series T. We show below the algorithm that uses the peak
function S1 (other peak detection algorithms are very similar, except that each uses a
different peak function). The peak function S1 computes its value at each point using the
local window (context) of size 2k around that point. All points where the peak function
has a positive value are candidate peaks. We rule out some of these locally detected peaks
using the global context (time-series as a whole) as follows. We compute the mean m
and standard deviation s of all positive values of the peak function and then retain only
those points xi in the time-series which satisfy the condition S1(k,i,xi,T) − m > h * s,
where h is a user-specified constant. A simple post-processing step (used in all algorithms)
removes peaks that are “too near” to each other (e.g., within the same
window of size k).
algorithm peak1 // a peak detection algorithm that uses peak function S1
input T = x1, x2, …, xN, N // input time-series of N points
input k // window size around the peak
input h // typically 1 ≤ h ≤ 3
output O // set of peaks detected in T
begin
O = ∅ // initially empty
for (i = 1; i ≤ N; i++) do
a[i] = S1(k,i,xi,T); // compute peak function value for each of the N points in T
end for
Compute the mean m and standard deviation s of all positive values in array a;
for (i = 1; i ≤ N; i++) do // remove local peaks which are “small” in global context
if (a[i] > 0 && (a[i] − m) > (h * s)) then O = O ∪ {xi}; end if
end for
Order peaks in O in terms of increasing index in T
// retain only one peak out of any set of peaks within distance k of each other
for every adjacent pair of peaks xi and xj in O do
if |j − i| ≤ k then remove the smaller value of {xi, xj} from O end if
end for
end
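The pseudocode above can be sketched as runnable Python as follows. The sketch makes several assumptions that are not fixed by the pseudocode: indices are 0-based, the first and last k points are skipped rather than handled with truncated windows, the population standard deviation is used, and the "too near" rule is applied in a single left-to-right pass that keeps the larger of two peaks within distance k. The names `s1` and `peak1` are illustrative.

```python
from statistics import mean, pstdev


def s1(T, i, k):
    """Average of the largest signed distances to the k left and k right neighbours."""
    left, right = T[i - k:i], T[i + 1:i + k + 1]
    return (max(T[i] - x for x in left) + max(T[i] - x for x in right)) / 2


def peak1(T, k, h):
    """Return the indices of the peaks detected in T with peak function S1."""
    n = len(T)
    # Local step: score every interior point.
    a = {i: s1(T, i, k) for i in range(k, n - k)}
    positive = [v for v in a.values() if v > 0]
    if not positive:
        return []
    # Global step: keep scores that stand out among all positive scores.
    m, s = mean(positive), pstdev(positive)
    candidates = [i for i in sorted(a) if a[i] > 0 and a[i] - m > h * s]
    # Post-processing: keep one peak among peaks within distance k.
    kept = []
    for i in candidates:
        if kept and i - kept[-1] <= k:
            if T[i] > T[kept[-1]]:
                kept[-1] = i  # the later peak is larger, so it replaces the earlier one
        else:
            kept.append(i)
    return kept


T = [0, 1, 0, 1, 0, 1, 0, 9, 0, 1, 0, 1, 0, 1, 0]
print(peak1(T, k=2, h=1))  # [7]
```

In this example the small oscillations all receive a positive S1 score of 1, so they are candidate local peaks, but only the value 9 at index 7 survives the global filter.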
4. EXPERIMENTAL EVALUATION
In this section, we present a quick comparison of the proposed algorithms on a sample time-
series. The time-series consists of annual sunspot data for years 1700 to 2008 and is obtained
from the following web-site:
ftp://ftp.ngdc.noaa.gov/STP/SOLAR_DATA/SUNSPOT_NUMBERS/YEARLY.PLT
As seen in Fig. 1, entropy-based peak function S4 has detected all peaks; S5 has also done quite well but
the other peak functions have missed some peaks. Note that there are no false positives. Fig. 2
shows a much noisier time-series, where we have got similar results (S4, S5 worked well).
Fig. 1. Peaks detected in the annual sunspot number time-series using proposed algorithms (first k
and last k points are not analyzed for peaks). Panels: (a) S1: k=5, h=1.5; (b) S2: k=5, h=1.5;
(c) S3: k=5, h=1.5; (d) S4: k=5, w=5, h=1.5; (e) S5: k=5, h=1.5.
Fig. 2. Peaks detected in a time-series of 480 points using proposed algorithms (first k and last k
points are not analyzed for peaks). Panels: (a) S1: k=5, h=1.5; (b) S2: k=5, h=1.5;
(c) S3: k=5, h=1.5; (d) S4: k=15, w=3; (e) S5: k=10, h=1.5.
5. CONCLUSIONS AND FURTHER WORK
In this paper, we have proposed a formal characterization of the notion of a peak in a time-series
and have presented several algorithms for peak detection. We also presented a quick
experimental evaluation of the proposed algorithms. The algorithms work on the raw time-series
data and do not need any pre-processing such as smoothing, thereby eliminating some subjective
aspects. We are working on a more in-depth evaluation as well as on deploying the peak detection
techniques in different applications. Often an element of experimentation is involved in choosing
the right values of the parameters (e.g., k) of the proposed peak detection algorithms. We have
identified some useful heuristics to automatically select the right parameter values. We are
working on peak detection in an online setting, which is important in some applications.
Acknowledgements. I thank Prof. Harrick Vin for his guidance and encouragement throughout this work. Thanks to
Manoj Jain, Navneet Rao, Shivam Sahai and other colleagues in TRDDC for their help and useful discussions.
Sincere thanks to Dr. Manasee Palshikar for her support.
References
Azzini I., Dell’Anna R., Ciocchetta F., Demichelis F., Sboner A., Blanzieri E., Malossini A.
(2004), Simple Methods for Peak Detection in Time Series Microarray Data, Proc.
CAMDA’04 (Critical Assessment of Microarray Data).
Barnett V., Lewis T. (1994), Outliers in Statistical Data, 3/e, Wiley Publishers.
Choi J.-G., Park J.-K., Kim K.-H., Kim J.-C. (1996), A Daily Peak Load Forecasting System
using a Chaotic Time Series, Proc. Int. Conf. on Intelligent Systems Applications to Power
Systems, pp. 283-287.
Coombes K.R. et al. (2005), Improved Peak Detection and Quantification of Mass Spectrometry
Data Acquired from Surface-enhanced Laser Desorption and Ionization by Denoising Spectra
with the Undecimated Discrete Wavelet Transform, Proteomics, 5, pp. 4107-4117.
Du P., Kibbe W.A., Lin S.M. (2006), Improved Peak Detection in Mass Spectrum by
Incorporating Continuous Wavelet Transform-based Pattern Matching, Bioinformatics, vol. 22,
no. 17, pp. 2059-2065.
Jordanov V.T., Hall D.L., Kastner M. (2002), Digital Peak Detector with Noise Threshold,
Proc. IEEE Nuclear Science Symposium Conference, vol. 1, pp. 140-142.
Harmer K., Howells G., Sheng W., Fairhurst M., Deravi F. (2008), A Peak-Trough Detection
Algorithm Based on Momentum, Proc. IEEE Congress on Image and Signal Processing
(CISP), pp. 454-458.
Kleinberg J. (2002), Bursty and Hierarchical Structure in Streams, Proc. 8th ACM SIGKDD
Conf., ACM Press, pp. 91-101.
Lange E., Gröpl C., Reinert K., Kohlbacher O., Hildebrandt A. (2006), High Accuracy Peak
Picking of Proteomics Data using Wavelet Techniques, Proc. Pacific Symposium
on Biocomputing 2006, Maui, Hawaii, USA, pp. 243-254.
Ma M., van Genderen A., Beukelman P. (2005), Developing and Implementing Peak
Detection for Real-Time Image Registration, Proc. 16th Annual Workshop on Circuits, Systems
and Signal Processing (proRISC2005), pp. 641-652.
Nijm G. M., Sahakian A. V., Swiryn S., Larson A. C. (2007), Comparison of Signal Peak
Detection Algorithms for Self-Gated Cardiac Cine MRI, Computers in Cardiology 2007.
Vlachos M., Meek C., Vagena Z., Gunopulos D. (2004), Identification of Similarities,
Periodicities and Bursts for Online Search Queries, Proc. SIGMOD 2004 Conf., ACM Press,
pp. 131-142.
Vlachos M., Wu K.-L., Chen S.-K., Yu P.S. (2008), Correlating Burst Events on Streaming
Stock Market Data, Data Mining and Knowledge Discovery, vol. 16, pp. 109-133.
Wand M.P., Jones M.C. (1995), Kernel Smoothing, Chapman and Hall.
Zhu Y., Shasha D. (2003), Efficient Elastic Burst Detection in Data Streams, Proc. SIGKDD
2003 Conf., ACM Press, pp. 336-345.
... Instead, a different approach was taken. The acceleration magnitude (A m ) was used to perform peak detection, extracting local maximum values above a specific threshold within a particular time range [37]. The extracted positions, referred to as peak points, and their corresponding values, peak values, were obtained. ...
... Instead, a different approach was taken. The acceleration magnitude ( ) was used to perform peak detection, extracting local maximum values above a specific threshold within a particular time range [37]. The extracted positions, referred to as peak points, and their corresponding values, peak values, were obtained. ...
Article
Full-text available
The monitoring of pre-weaned calf behavior is crucial for ensuring health, welfare, and optimal growth. This study aimed to develop and validate a machine learning-based technique for the simultaneous monitoring of multiple behaviors in pre-weaned beef calves within a cow–calf contact (CCC) system using collar-mounted sensors integrating accelerometers and gyroscopes. Three complementary models were developed to classify feeding-related behaviors (natural suckling, feeding, rumination, and others), postural states (lying and standing), and coughing events. Sensor data, including tri-axial acceleration and tri-axial angular velocity, along with video recordings, were collected from 78 beef calves across two farms. The LightGBM algorithm was employed for behavior classification, and model performance was evaluated using a confusion matrix, the area under the receiver operating characteristic curve (AUC-ROC), and Pearson’s correlation coefficient (r). Model 1 achieved a high performance in recognizing natural suckling (accuracy: 99.10%; F1 score: 96.88%; AUC-ROC: 0.999; r: 0.997), rumination (accuracy: 97.36%; F1 score: 95.07%; AUC-ROC: 0.995; r: 0.990), and feeding (accuracy: 95.76%; F1 score: 91.89%; AUC-ROC: 0.990; r: 0.987). Model 2 exhibited an excellent classification of lying (accuracy: 97.98%; F1 score: 98.45%; AUC-ROC: 0.989; r: 0.982) and standing (accuracy: 97.98%; F1 score: 97.11%; AUC-ROC: 0.989; r: 0.983). Model 3 achieved a reasonable performance in recognizing coughing events (accuracy: 88.88%; F1 score: 78.61%; AUC-ROC: 0.942; r: 0.969). This study demonstrates the potential of machine learning and collar-mounted sensors for monitoring multiple behaviors in calves, providing a valuable tool for optimizing production management and early disease detection in the CCC system
... Peak prediction differs from the traditional and well-studied peak detection task [8] in which the goal is to identify the peaks (i.e., local or global maxima) of a query time series. In addition, it differs from load forecasting [9], in which the goal is predicting the complete power demand for the following days, not focusing on specific values representing high-demand periods. ...
... Specifically, we consider 7 days ahead in our experiments. Since the peak represents an observation from a time series that has not yet been observed at the prediction time, this differentiates our peak forecasting task from conventional peak detection [8] or finding local maxima in time series [23]. In other words, in load peak forecasting, we are interested in predicting ahead of time (e.g., 7 days) what the maximum energy load of a customer will be in the following days and when it will occur. ...
... Therefore, we detect such anomalies and remove it. Furthermore, extraction of RR is applied using local peak function [24] approach. ...
... To extract the RR from the filtered CSI graph, we used local peak function [24] to find the number of peak in an interval and scaled it to be a minute resulted as a bpm. There is a single parameter for local peak that is a minimum distance before the next peak. ...
Article
Full-text available
Sleep apnea, characterized by breathing interruptions or slow breathing at night, can cause various health issues. Detecting respiratory rate (RR) using Wireless Fidelity (Wi-Fi) can identify sleep disorders without physical contact avoiding sleep disruption. However, traditional methods using Network Interface Cards (NICs) like the Intel Wi-Fi Link 5300 NIC are often costly and limited in channel state information (CSI) resolution. Our study introduces an effective strategy using the affordable ESP32 single-board computer for tracking RR through detailed analysis of Wi-Fi signal CSI. We developed a technique correlating Wi-Fi signal fluctuations with RR, employing signal processing methods—Hampel Filtering, Gaussian Filtering, Linear Interpolation, and Butterworth Low Pass Filtering—to accurately extract relevant signals. Additionally, noise from external movements is mitigated using a Z-Score for anomaly detection approach. We also implemented a local peak function to count peaks within an interval, scaling it to bpm for RR identification. RR measurements were conducted at different rates—Normal (12–16 bpm), Fast (>16 bpm), and Slow (<12 bpm)—to assess the effectiveness in both normal and sleep apnea conditions. Tested on data from 8 participants with distinct body types and genders, our approach demonstrated accuracy by comparing modeled sleep RR against actual RR measurements from the Vernier Respiration Monitor Belt. Optimal parameter settings yielded an overall average mean absolute deviation (MAD) of 2.60 bpm, providing the best result for normal breathing (MAD = 1.38). Different optimal settings were required for fast (MAD = 1.81) and slow breathing (MAD = 2.98). The results indicate that our method effectively detects RR using a low-cost approach under different parameter settings.
... To calculate the RR intervals in the vital signal acquired from radar, it is necessary to detect a series of peaks within the selected vital signal. Simple algorithms for peak detection in time-series data have been used in previous studies [41]. Instead, we adopted a new peak detection algorithm, which is a modified and simplified version of one previously used in clutter detection for people counting [29]. ...
Article
Full-text available
Mental distress-induced imbalances in autonomic nervous system activities adversely affect the electrical stability of the cardiac system, with heart rate variability (HRV) identified as a related indicator. Traditional HRV measurements use electrocardiography (ECG), but impulse radio ultra-wideband (IR-UWB) radar has shown potential in HRV measurement, although it is rarely applied to psychological studies. This study aimed to assess early high levels of mental distress using HRV indices obtained using radar through modified signal processing tailored to reduce phase noise and improve positional accuracy. We conducted 120 evaluations on 15 office workers from a software startup, with each 5 min evaluation using both radar and ECG. Visual analog scale (VAS) scores were collected to assess mental distress, with evaluations scoring 7.5 or higher classified as high-mental distress group, while the remainder formed the control group. Evaluations indicating high levels of mental distress showed significantly lower HRV compared to the control group, with radar-derived indices correlating strongly with ECG results. The radar-based analysis demonstrated a significant ability to differentiate high mental distress, supported by receiver operating characteristic (ROC) analysis. These findings suggest that IR-UWB radar could be a supportive tool for distinguishing high levels of mental stress, offering clinicians complementary diagnostic insights.
... The threshold for peak detection was determined by setting it at values higher than the average plus two standard deviations, following a common practice in the field (Aiello et al. 2021;Palshikar 2009). ...
Preprint
Crisis events elicit emotional contagion and influence public risk perception, bringing out the societal response to emergencies. This study investigates emotional contagion and public risk perception on social media platforms during crisis events within the Chinese context. We analyze approximately 21 million unique COVID-19-related Sina Weibo posts from January 1st, 2020, to May 31st, 2020, utilizing the Weibo Five Basic Mood Lexicon and Simplified Chinese LIWC dictionary to calculate word frequency. Employing change-point detection and ANOVAs, we explore the temporal and spatial characteristics of emotional changes and public risk perception. Our findings reveal significant emotional shifts during the early stages of the pandemic, consistent with heightened perceived risk levels. Moreover, the emotional impact and risk perception of the epidemic reveal temporal consistency and independence from geographical location. Cultural disparities may contribute to the divergent findings between China and the Western context. This study underscores the significance of context-specific insights and offers implications for risk communication, public engagement, and policymaking in future crises.
Chapter
Environmental spatiotemporal data analytics (ESTDA) is a field that combines environmental science, data science, and geographic information systems to explore the relationship between environmental phenomena and their spatial and temporal variability. ESTDA has been used to address a wide range of environmental issues, such as climate change, pollution, biodiversity loss, and natural disasters. The goal of this field is to identify patterns, trends, and anomalies in environmental data that can help scientists and policymakers make informed decisions. ESTDA relies on a variety of analytical techniques, including statistical models, machine learning algorithms, remote sensing, and spatial analysis. These techniques allow researchers to extract meaningful information from large and complex datasets, including environmental monitoring networks, satellite imagery, and citizen science data. By analysing these data, ESTDA can provide insights into the drivers of environmental change, the impacts of human activities on the environment, and the effectiveness of environmental policies and management strategies. Overall, ESTDA has the potential to improve our understanding of environmental systems and inform more effective environmental decision-making. However, it also faces a number of challenges, such as data quality and availability, computational limitations, and need for interdisciplinary collaboration. Addressing these challenges will be crucial for the continued advancement of ESTDA and its potential to contribute to sustainable development and conservation efforts. The chapter aims at addressing the overall concept of ESTDA, its issues, and challenges.
Conference Paper
Full-text available
A new peak picking algorithm for the analysis of mass spectrometric (MS) data is presented. It is independent of the underlying machine or ionization method, and is able to resolve highly convoluted and asymmetric signals. The method uses the multiscale nature of spectrometric data by first detecting the mass peaks in the wavelet-transformed signal before a given asymmetric peak function is fitted to the raw data. In an optional third stage, the resulting fit can be further improved using techniques from nonlinear optimization. In contrast to currently established techniques (e.g. SNAP, Apex) our algorithm is able to separate overlapping peaks of multiply charged peptides in ESI-MS data of low resolution. Its improved accuracy with respect to peak positions makes it a valuable preprocessing method for MS-based identification and quantification experiments. The method has been validated on a number of different annotated test cases, where it compares favorably in both runtime and accuracy with currently established techniques. An implementation of the algorithm is freely available in our open source framework OpenMS (www.open-ms.de).
Citation: Eva Lange, Clemens Gröpl, Oliver Kohlbacher, Andreas Hildebrandt. High-accuracy peak picking of proteomics data. In: Christian G. Huber, Oliver Kohlbacher, Knut Reinert (eds.), Computational Proteomics, Dagstuhl Seminar Proceedings 05471, ISSN 1862-4405. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2006. URL: http://drops.dagstuhl.de/opus/volltexte/2006/535. Keywords: mass spectrometry, peak detection, peak picking.
Article
Full-text available
We address the problem of monitoring and identification of correlated burst patterns in multi-stream time series databases. We follow a two-step methodology: first we identify the burst sections in our data and subsequently we store them for easy retrieval in an efficient in-memory index. The burst detection scheme imposes a variable threshold on the examined data and takes advantage of the skewed distribution that is typically encountered in many applications. The detected bursts are compacted into burst intervals and stored in an interval index. The index facilitates the identification of correlated bursts by performing very efficient overlap operations on the stored burst regions. We present the merits of the proposed indexing scheme through a thorough analysis of its complexity. We also manifest the real-time response of our burst indexing technique, and demonstrate the usefulness of the approach for correlating surprising volume trading events using historical stock data of the NY stock exchange. While the focus of this work is on financial data, the proposed methods and data-structures can find applications for anomaly or novelty detection in telecommunication, network traffic and medical data.
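The two steps described in this abstract (thresholding, then compacting bursts into intervals that can be tested for overlap) can be illustrated with a minimal Python sketch. The abstract does not give the threshold rule, so the one used here (mean plus k standard deviations of a trailing moving average) is an assumption, as are the function names `moving_average`, `burst_intervals` and `overlap` and the parameters `window` and `k`. A plain list of intervals stands in for the paper's interval index.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Trailing moving average; early points average over the shorter prefix."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def burst_intervals(values, window=3, k=1.5):
    """Return closed index intervals (start, end) where the smoothed series
    exceeds an assumed threshold of mean + k * standard deviation."""
    ma = moving_average(values, window)
    threshold = mean(ma) + k * pstdev(ma)  # assumed rule, not taken from the paper
    intervals = []
    start = None
    for i, v in enumerate(ma):
        if v > threshold:
            if start is None:
                start = i  # a new burst begins here
        elif start is not None:
            intervals.append((start, i - 1))  # the burst ended at the previous point
            start = None
    if start is not None:
        intervals.append((start, len(ma) - 1))  # burst runs to the end of the series
    return intervals


def overlap(a, b):
    """Number of time steps shared by two closed intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
```

Two streams could then be compared by computing `overlap` for each pair of their burst intervals. A real index would avoid the pairwise scan, which is the point of the in-memory structure the abstract mentions.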
Conference Paper
Full-text available
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts') where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state-of-the-art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time-series. Finally we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time-series. We conclude the presentation with the description of a tool that uses the described methods, and serves as an interactive exploratory data discovery tool for the MSN query database.
Article
Full-text available
Motivation: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. Results: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. 
Availability: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
Conference Paper
Self-gating (SG) is a cardiac MRI technique to synchronize data acquisition to the cardiac cycle based upon MR signal triggers as opposed to conventional ECG triggers. Fourteen healthy subjects underwent cardiac MRI scans in four different orientations: two chamber, three chamber, four chamber, and short axis. SG trigger times were computed using two methods, first difference and template matching, and ECG trigger times were also recorded for comparison. The root-mean-square (RMS) error was used to evaluate performance, defined as the variability relative to the mean difference between SG trigger times and ECG trigger times. The mean RMS error was lower for template matching than first difference approach for all scan orientations; the improvement in RMS error was statistically significant for all orientations except short axis. In conclusion, compared to the first difference approach, template matching improved the accuracy of trigger detection for two, three, and four chamber SG cardiac MRI scans.
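The error measure in this abstract, "variability relative to the mean difference", reads as the root-mean-square deviation of the SG-minus-ECG trigger time differences around their mean. The short Python sketch below follows that reading. The function name `rms_error` and the 1/n normalization are assumptions, since the abstract does not spell out the formula.

```python
from math import sqrt


def rms_error(sg_times, ecg_times):
    """RMS deviation of (SG - ECG) trigger time differences about their mean.

    Assumes paired, equal-length sequences and 1/n normalization.
    """
    diffs = [s - e for s, e in zip(sg_times, ecg_times)]
    mean_diff = sum(diffs) / len(diffs)
    return sqrt(sum((d - mean_diff) ** 2 for d in diffs) / len(diffs))
```

Under this definition a constant delay between SG and ECG triggers contributes no error. Only the jitter around that delay is penalized.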
Article
Mass spectrometry is being used to find disease-related patterns in mixtures of proteins derived from biological fluids. Questions have been raised about the reproducibility and reliability of peak quantifications using this technology. We collected nipple aspirate fluid from breast cancer patients and healthy women, pooled them into a quality control sample, and produced 24 replicate SELDI spectra. We developed a novel algorithm to process the spectra, denoising with the undecimated discrete wavelet transform (UDWT), and evaluated it for consistency and reproducibility. UDWT efficiently decomposes spectra into noise and signal. The noise is consistent and uncorrelated. Baseline correction produces isolated peak clusters separated by flat regions. Our method reproducibly detects more peaks than the method implemented in Ciphergen software. After normalization and log transformation, the mean coefficient of variation of peak heights is 10.6%. Our method to process spectra provides improvements over existing methods. Denoising using the UDWT appears to be an important step toward obtaining results that are more accurate. It improves the reproducibility of quantifications and supplies tools for investigation of the variations in the technology more carefully. Further study will be required, because we do not have a gold standard providing an objective assessment of which peaks are present in the samples.
Conference Paper
This paper presents a simple, yet novel, approach to peak-trough detection using a rudimentary model of Newtonian mechanics. Based on the line-searching technique also employed in artificial neural network technology to determine global minima, the momentum is used to find both peaks and troughs of a signal. This algorithm provides a fast alternative to traditional techniques: it uses contextual information to determine prominent peaks and troughs without requiring smoothing or thresholding.