Article

Spectrum estimation with missing observations


Abstract

Without Abstract


... Since the Lomb-Scargle method [10,11] is widely used as a benchmark, it is included in the comparison below to demonstrate its bias. Missing data samples in equidistant data streams have also been investigated extensively in [20][21][22][23][24][25][26][27][28], including specific cases of correlated data gaps. These derivations depend strictly on the specific cases of missing data and are not robust against changes in the spectral content of the data gaps. ...
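For orientation, the Lomb-Scargle benchmark referred to above is available in SciPy. The sketch below applies it to a synthetic equidistant record with roughly 30% of the samples removed at random; the signal, gap rate, and frequency grid are illustrative assumptions, not taken from the cited comparison.

```python
import numpy as np
from scipy.signal import lombscargle

# Hypothetical gappy record: an equidistant grid with ~30% missing samples.
rng = np.random.default_rng(0)
t_full = np.arange(1000.0)               # nominal sampling instants
keep = rng.random(t_full.size) > 0.3     # True where a sample survived
t = t_full[keep]
x = np.sin(2 * np.pi * 0.05 * t) + rng.standard_normal(t.size)

# Angular frequencies up to the nominal Nyquist rate (pi rad per sample).
omega = np.linspace(0.01, np.pi, 500)
pgram = lombscargle(t, x - x.mean(), omega)  # remove the mean beforehand
```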
... The present article introduces bias-free estimators for the covariance function and the power spectral density from equidistant data sets with interruptions of arbitrary spectral composition. It combines three known but rarely used methods: a) weighted averaging, taken from [25], but without any spectral or time windowing, so that the spectrum is not modulated by filtering or windowing; b) restriction of the domain of the covariance function, mentioned briefly in [34] as an appropriate means of reducing the estimation variance of the spectral estimates; and c) correction of the covariance estimate after removal of the estimated mean value, adapted to the weighted average so that it works with gapped data [35][36][37]. ...
... If all samples x_i in a particular realization of the signal are given, then the covariance function at any lag distance k can be estimated by averaging over all pairs of samples in the data set that are separated by the distance k. With missing samples, one can either reconstruct and fill the gaps, at the price of a bias as mentioned above, or restrict the averaging to those pairs for which both x_i and x_{i+k} are available, as used, e.g., in [25]. This method has the potential to yield bias-free covariance estimates even from signals with gaps. ...
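A minimal sketch of this pair-averaging idea, assuming a zero-mean signal and a boolean validity mask (the cited works add weighting and mean-removal corrections on top of this):

```python
import numpy as np

def gappy_autocovariance(x, valid, max_lag):
    """Average x[i] * x[i+k] over pairs where both samples are valid.

    Assumes a zero-mean signal; invalid samples are zeroed so they
    contribute nothing to the sums. Lags with no valid pair yield NaN.
    """
    x = np.where(valid, x, 0.0)
    n = x.size
    cov = np.full(max_lag + 1, np.nan)
    for k in range(max_lag + 1):
        pairs = valid[: n - k] & valid[k:]        # both ends available
        if pairs.any():
            cov[k] = np.dot(x[: n - k], x[k:]) / pairs.sum()
    return cov
```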
Article
Full-text available
Nonparametric estimation of the covariance function and the power spectral density of uniformly spaced data from stationary stochastic processes with missing samples is investigated. Several common methods are tested for their systematic and random errors under variations in the distribution of the missing samples. In addition to random and independent outliers, the influence of longer and hence correlated data gaps on the performance of the various estimators is also investigated. The aim is to construct a bias-free estimation routine for the covariance function and the power spectral density from stationary stochastic processes with missing samples, with optimum use of the available information in terms of low estimation variance and mean square error, independently of the spectral composition of the data gaps. The proposed procedure is a combination of three methods that allow bias-free estimation of the desired statistical functions with efficient use of the available information: weighted averaging over valid samples, derivation of the covariance estimate for the entire data set with restriction of the domain of the covariance function in a post-processing step, and appropriate correction of the covariance estimate after removal of the estimated mean value. The procedures abstain from interpolation of missing samples as well as from block subdivision. Spectral estimates are obtained from covariance functions and vice versa using the Wiener–Khinchin theorem.
... Missing data samples in equidistant data streams have also been investigated broadly by Jones (1962a,b); Parzen (1963); Scheinok (1965); Bloomfield (1970); Jones (1971, 1972); Ghazal and Elhassanein (2006); Munteanu et al. (2016). Plantier et al. (2012) state that Eq. (14) (in that publication) would be an approximation. However, it is the explicit and exact formulation for the signal model used by Nobach et al. (1998), namely the random sampling of a continuous time. ...
... In the present article a non-parametric and bias-free method is introduced, which is i) simple to realize, ii) efficient in using the available information and iii) universal for various statistical properties of the data gaps. It is a combination of three methods: a) one taken from Jones (1971), but without any spectral or time windowing; b) derivations as in Vogelsang and Yang (2016), adapted to the two-sided autocovariance function including weighting and transferred to the cross-covariance case (see details in Nobach (2023)); and c) one mentioned only briefly as an appropriate means for spectral estimation in Bartlett (1948). This combination allows bias-free estimation of the variance of the signal with data gaps, its covariance function and the corresponding power spectrum, independent of the spectral content of the data gaps, including those cases where the mean value is estimated and subtracted from the data. ...
Preprint
Full-text available
Signal processing of uniformly spaced data from stationary stochastic processes with missing samples is investigated. In addition to randomly and independently occurring outliers, correlated data gaps are also investigated. Non-parametric estimators for the mean value, the signal variance, the autocovariance and cross-covariance functions and the corresponding power spectral densities are given, which are bias-free independently of the spectral composition of the data gaps. Bias-free estimation is obtained by averaging only over valid samples from the data set. The procedures abstain from interpolation of missing samples. An appropriate bias correction is used for cases where the estimated mean value is subtracted from the data. Spectral estimates are obtained from covariance functions using the Wiener-Khinchin theorem.
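As a sketch of the last step, the route from an estimated one-sided autocovariance to a power spectral density via the Wiener-Khinchin theorem can be written as a plain FFT, assuming a real stationary signal; none of the paper's weighting or bias-correction steps are included here.

```python
import numpy as np

def psd_from_autocovariance(cov):
    """PSD from a one-sided autocovariance cov[k], k = 0..K, via the
    Wiener-Khinchin theorem: FFT of the symmetrized covariance."""
    # Arrange C(0), ..., C(K), C(K-1), ..., C(1) for a circular FFT.
    two_sided = np.concatenate([cov, cov[-2:0:-1]])
    psd = np.fft.rfft(two_sided).real         # imaginary part is ~0
    freqs = np.fft.rfftfreq(two_sided.size)   # cycles per sample
    return freqs, psd
```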
... Each tag recorded the depth (pressure) and temperature every other hour for six days and every twelfth hour on the seventh day. To have values at regular intervals, the ten unobserved values on the seventh day are replaced by interpolated values (cubic-spline method; SAS, 1993, Proc Expand) before spectral analysis (Jones, 1971), and otherwise by repetitions of the last observation. With two-hour resolution, the highest detectable frequency corresponds to a four-hour cycle (the Nyquist frequency; Priestley, 1981). ...
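A Python analogue of that gap-filling step (the study itself used SAS Proc Expand; the two-hourly record below is synthetic and its diurnal shape is an assumption for illustration):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic two-hourly depth record over one week, with gaps on day 7.
t = np.arange(0.0, 168.0, 2.0)                   # hours
depth = 50.0 + 10.0 * np.sin(2 * np.pi * t / 24.0)
observed = np.ones(t.size, dtype=bool)
observed[73:83] = False        # ten unobserved values on the seventh day

# Cubic-spline interpolation over the observed points fills the gaps.
spline = CubicSpline(t[observed], depth[observed])
depth_filled = np.where(observed, depth, spline(t))
```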
... Thus act(a)=100% means that the DDR interval, [dmin(a), dmax(a)], coincides with an FVR interval for some neutral buoyancy level d (Figure 7(a)). The actual neutral buoyancy depth throughout the day is not known from the observations, but the daily ascent and descent behaviour (see Results) and the significance of the upper limit m of FVR as a barrier (Harden Jones and Scholes, 1981, 1985; Arnold and Greer Walker, 1992) make it reasonable to assume that d is inside the DDR and to choose m=dmin(a). The main purpose is to measure the vertical activity level, so that the same act(a) value means roughly the same effort or physiological strain whether the DDR is narrow in shallow or wide in deep water. ...
Article
Full-text available
Stensholt, B. K. 2001. Cod migration patterns in relation to temperature: analysis of storage tag data. - ICES Journal of Marine Science, 58: 770-793. Bivariate time-series of depth (pressure) and temperature with two-hour intervals from 19 data storage tags (DST) attached to adult Northeast Arctic cod (Gadus morhua L.) released from mid-March are analysed. The interplay between migration behaviour, physiological limitation factors, environment, and ecology in the Barents Sea is investigated using geometrical and statistical methods. Thermo-stratification is identified using r(t), the ratio between temperature and depth change over each record interval. Vertical activity, act(a), in relation to physiological limitations on pressure change is measured with the ratio of the daily depth range to the free vertical range. Cycles are detected by spectral analysis. The analysis supports conclusions from large-scale studies. Cod migrate along stable thermal paths until they reach a front area (or feeding ground), where the vertical activity increases and the records of depth, temperature, and r(t) change pattern, level and range. The (semi-)diurnal vertical migration (DVM) occurs seasonally in some fish, mainly in areas with a large temperature gradient. In 11 out of 12 tags where DVM is detected, this occurs during summer and autumn. In seven out of 11 tags where semi-diurnal tidal cycles are detected in the temperature series together with a significant reduction in vertical migration, this occurs during April. In some tags diurnal or semi-diurnal cycles appear in both depth and temperature series. © 2001 International Council for the Exploration of the Sea
... However, adaptations are necessary, since random sampling on a continuous domain has a different spectral composition than equidistant sampling with missing samples. Independent and randomly distributed missing samples with otherwise equidistant sampling have also been investigated [31, 33–38]. Correlated data gaps have been investigated only for very specific cases [31, 39–42], without options for generalization or without satisfactory bias correction. ...
Article
Full-text available
The prediction and correction of systematic errors in direct spectral estimation from irregularly sampled data taken from a stochastic process is investigated. Different sampling schemes are investigated, which lead to such an irregular sampling of the observed process. Two kinds of sampling schemes are considered: on the one hand, stochastic sampling with non-equidistant sampling intervals drawn from a continuous distribution; on the other hand, nominally equidistant sampling with missing individual samples, yielding a discrete distribution of sampling intervals. For both distributions of sampling intervals, continuous and discrete, different sampling rules are investigated. First, purely random and independent sampling times are considered. This is given only in those cases where the occurrence of one sample at a certain time has no influence on other samples in the sequence. This excludes any preferred delay intervals or external selection processes, which would introduce correlations between the sampling instances. Second, sampling schemes with interdependency and thus correlation between the individual sampling instances are investigated. This is given whenever the occurrence of one sample in any way influences further sampling instances, e.g., through recovery times after one instance, preferences for certain sampling intervals (including, e.g., sampling jitter), or any external source with correlation influencing the validity of samples. A bias-free estimation of the spectral content of the observed random process from such irregularly sampled data is the goal of this investigation.
... We call such a time series with missing observations gappy. Despite several works on spectral density function (SDF) estimation for short-range dependent time series (e.g., Parzen 1963; Scheinok 1965; Bloomfield 1970; Jones 1971; Dunsmuir and Robinson 1981; Dahlhaus 1987), the theory for estimating the SDF of a gappy LRD time series has remained unsolved, limiting the ability to further provide a useful spectral-based estimate of the LRD parameter. In this paper, we build upon the work of Mondal and Percival (2010), who derive consistent and asymptotically normal estimates of wavelet variances for gappy time series, including LRD ones. ...
Article
Full-text available
Knowledge of the long-range dependence (LRD) parameter is critical to studies of self-similar behavior. However, statistical estimation of the LRD parameter becomes difficult when the observed data are masked by short-range dependence and other noises or are gappy in nature (i.e., some values are missing in an otherwise regular sampling). Currently there is a lack of theory for spectral- and wavelet-based estimators of the LRD parameter for gappy data. To address this, we estimate the LRD parameter for gappy Gaussian semiparametric time series based upon undecimated wavelet variances. We develop estimation methods by using novel estimators of the wavelet variances, providing asymptotic theory for the joint distribution of the wavelet variances and our estimator of the LRD parameter. We introduce sandwich estimators to compute standard errors for our estimates. We demonstrate the efficacy of our methods using Monte Carlo simulations and provide guidance on practical issues such as how to select the range of wavelet scales. We demonstrate the methodology using two applications: one for gappy Arctic sea-ice draft data and another for gap-free and gappy daily average temperature data collected at 17 locations in south central Sweden.
... Jones appears to have been the first, starting in the 1960s, to take an interest in this problem [Jones, 1962], [Jones, 1972], [Marvasti, 1984]. Thus, the signal X11(t11) is written as: ...
Thesis
Many compression methods turn the original signal into an irregularly sampled one. Moreover, some systems, by their very design, can only collect irregularly sampled signals. In order to process these signals, various reconstruction methods have appeared over the last four decades. An original way of treating such irregularly sampled signals is to take into account their main characteristic, namely the variable time interval between two consecutive samples. In this work, we set out to develop processing tools for irregularly sampled signals without resorting to a full reconstruction of a regularly sampled signal. To obtain an irregularly sampled signal, we developed a signal compression method, called variable-step sampling, that combines a high compression ratio with reconstruction fidelity and thus generates an irregularly sampled signal. The original idea is to process this irregularly sampled signal directly. We have therefore redefined the following methods and tools: simple operations; statistical operations; spectral analysis by Fourier transform; low-pass and band-pass filtering; identification; decomposition. All of these tools and methods were applied to event detection in the electrocardiogram. Compression by variable-step sampling reduces the storage memory required for the ECG. On the one hand, extracting characteristic information from each ECG cycle makes it possible to collect and track the temporal evolution of time intervals, amplitudes, waveforms, energies, and so on. On the other hand, each cardiac cycle of the variable-step-sampled ECG is decomposed over bases comprising one healthy normal beat and one beat of each arrhythmia. The evolution over time of the decomposition coefficients can be combined with the evolution of the previously described parameters in order to fuse them, study their variations, and improve decision-making with respect to arrhythmia.
... Hurvich and Ray (1995) argued by simulation that the GPH estimator is consistent only when d < 1, and Kim and Phillips (1999) showed this theoretically. In the same context, Velasco (1999a) established consistency. To solve the non-consistency problem, Hurvich and Ray (1995) and Velasco (1999a) suggested the use of data tapering, which was first proposed by Cooley and Tukey (1965) and discussed by Cooley et al. (1967) and Jones (1971). This technique has also been used by many authors, such as Hurvich and Chen (2000), Giraitis and Robinson (2003), Sibbertsen (2004), and Olhede et al. (2004), among many others. ...
Article
In this thesis, we consider two classes of long memory processes: stationary long memory processes and non-stationary long memory processes. We are devoted to the study of their probabilistic properties, estimation methods, forecast methods and statistical tests. Stationary long memory processes have been extensively studied over the past decades. It has been shown that some long memory processes have the property of self-similarity, which is important for parameter estimation. We review the self-similar properties of continuous-time and discrete-time long memory processes. We establish the propositions that a stationary long memory process is asymptotically second-order self-similar, while a stationary short memory process is not. We then extend the results to specific long memory processes such as k-factor GARMA processes and k-factor GIGARCH processes. We also investigate the self-similar properties of some heteroscedastic models and of processes with switches and jumps. We review parameter estimation methods for stationary long memory processes, including parametric methods (for example, maximum likelihood estimation and approximate maximum likelihood estimation) and semiparametric methods (for example, the GPH method, the Whittle method and the Robinson method). The consistency and asymptotic normality of the estimators are also investigated. Testing the fractionally integrated order of seasonal and non-seasonal unit roots of a stochastic stationary long memory process is quite important for economic and financial time series modeling. The widely used Robinson test (1994) is applied to various well-known long memory models. Via Monte Carlo experiments, we study and compare the performance of this test for several sample sizes, which provides a good reference for practitioners who want to apply Robinson's test. In practice, seasonality and time-varying long-range dependence can often be observed, and thus some kind of non-stationarity exists inside economic and financial data sets. To take this kind of phenomenon into account, we review the existing non-stationary processes and propose a new class of non-stationary stochastic process: the locally stationary k-factor Gegenbauer process. We describe a procedure for consistently estimating the time-varying parameters with the help of the discrete wavelet packet transform (DWPT). The consistency and asymptotic normality of the estimates are proved. The robustness of the algorithm is investigated through a simulation study. We also propose a forecast method for this new non-stationary long memory process. Applications and forecasts based on the error correction term in the error correction model of the Nikkei Stock Average 225 (NSA 225) index and the West Texas Intermediate (WTI) crude oil price follow.
... Working with non-equidistantly distributed time series can affect the spectrum of the time series. Several authors, such as Jones (1971) and Parzen (1963), have studied time series with missing data. However, we used linearly interpolated data for our projection to prevent such errors. ...
Article
This thesis deals with projection methods of soil temperature and soil moisture into depth by a forward model based on near-surface time-series. In addition, thermal as well as hydraulic soil parameters are estimated by using the Levenberg-Marquardt algorithm. For soil temperature, two analytical projection methods are compared, which use the transfer function and the Fourier transform approach, respectively. In each case, additional mathematical strategies are required to improve the projection results, e.g., by adding an integral over the initial profile or applying the Tukey window to the time-series. The resulting projected temperature matches field measurements very well; especially for the transfer function method a residuum down to ±0.05 °C is achieved. Further, the uniqueness of the parameter space is evaluated and the temporal evolution of the thermal diffusivity is estimated through both projection methods. The projection of near-surface soil moisture is realized numerically by a finite volume scheme due to the strong non-linearity of the system. On the basis of synthetic data, the conditions are explored under which accurate estimations of the hydraulic parameters are feasible. One of these conditions is found to be that the water content range should be larger than 0.5 times the porosity. The corresponding relative parameter error is found to be on the order of 10^-5. Furthermore, the study shows that an accurate estimation is more feasible for soils with a lower saturated conductivity and steep functions of the hydraulic properties. This is typically the case for soils with a higher sand content. Data from a field site is then used to verify the findings of the synthetic study and to discuss limitations, e.g., ponding water at a soil layer interface.
... are obtained from the actual observations y* only, that is, ignoring the missing locations. This is a two-dimensional nonstationary analogue of Parzen's approach to estimating the spectral density with missing observations, which solves the problem by computing a modified finite Fourier transform with a taper equal to the indicator function (Priestley 1981, p. 586; see also Jones 1971). ...
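A one-dimensional sketch of that taper-as-indicator idea (a hypothetical helper; the raw estimate remains biased by the spectral window of the gap pattern, which the cited treatments go on to correct or smooth):

```python
import numpy as np

def indicator_tapered_periodogram(x, observed):
    """Periodogram with the indicator of the observed samples as taper:
    missing locations contribute zero to the finite Fourier transform,
    and the estimate is normalized by the number of observed points."""
    x0 = np.where(observed, x - x[observed].mean(), 0.0)
    pgram = np.abs(np.fft.rfft(x0)) ** 2 / observed.sum()
    freqs = np.fft.rfftfreq(x0.size)     # cycles per sample
    return freqs, pgram
```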
Article
Full-text available
This article presents a model-based thin-plate smoothing method for optimal signal extraction and interpolation of missing data in spatial datasets. The method is based on a spectral EM algorithm where the two steps can be carried out in the frequency domain. In essence, the approach allows both dimensions to be treated separately from each other effectively rendering a likelihood that is easy to evaluate. As a result the algorithm is computationally inexpensive, in terms of both memory size and computing time, while allowing us to obtain an analytic expression for the asymptotic variance of the signal-to-noise ratio with which to construct confidence intervals of the missing data. Some numerical Monte Carlo simulations and a real data example using remotely sensed global aerosol optical thickness data illustrate the results given. Supplemental materials (Matlab computer code and dataset) are available online.
... Except for an ARFIMA(1, d, 0) and for d = 1.8, the order p = 2 also gives a positive bias, since the estimate converges to the order p. These observations were not verified for m = n^0.8. The same conclusions can be obtained with m = n^0.8. ...
Article
In this paper, we study, through a Monte Carlo simulation, the effect of the order p of the Zhurbenko-Kolmogorov taper on the properties of semiparametric estimators, using a periodogram with Fourier frequencies λ_{j,n}, where j is a multiple of the order p. For p ≥ [d + 1/2] + 1, we obtain the consistency and asymptotic normality of the estimators; for p = [d + 1/2] + 1, however, the variance is smaller than in the preceding case. We also show that it is necessary to choose an optimal value of the truncation parameter m. Considering the ARFIMA(1, d, 0) and ARFIMA(0, d, 1) models, we show that the autoregressive part as well as the moving-average part play an important role in the bias and variance of these estimators. We finally carry out an empirical application using four monthly seasonally adjusted logarithm CPI series.
... A moving window is used to localize high and low correlations in the time series. The method used ignores missing data while still taking the observation times into account (Jones, 1971). ...
Article
Full-text available
The municipality of Arnhem has set up a monitoring program to investigate water quality in storm sewers as well as the efficiency of several end-of-pipe solutions for stormwater treatment. Four pilot installations have been set up: a sand filter, a lamella separator, a soil filter and a helophyte filter. Flow rates and water levels are measured in real time. The obtained data are tested for reliability, accuracy and completeness by automated tests; logical and statistical tests are used. During the project, the necessity of data validation was demonstrated. The detected problems proved to be diverse. Heavy noise in flow data was detected with a check on autocorrelations; the total measured volume was largely distorted by the noise. Multiple errors in data logger software, causing faulty flow data, were discovered. Suspicious data were sometimes caused by unexpected disturbance of the conditions. Data validation must take place regularly from the very beginning of a monitoring project in order to stay alert to any such disturbances.
Book
Spectral Analysis for Univariate Time Series, by Donald B. Percival (Cambridge Core, Statistics for Physical Sciences and Engineering).
Article
Consider a zero-mean and second-order stationary time series of interest {X_t} that cannot be observed directly. Instead, an amplitude-modulated time series {Y_t} := {A_t U_t X_t} is observed, where {A_t} is a stationary Bernoulli time series and {U_t} is a time series of independent variables satisfying P(U_t = 0) = 0 and μ := E{U_t} ≠ 0. The time series {A_t} creates missing observations when A_t = 0, and U_t modulates the non-missed X_t. There is bad and good news about spectral analysis of amplitude-modulated time series. The bad news is that, in general, consistent estimation of the spectral density is impossible. The good news is that the spectral shape (which is the spectral density minus (2π)^{-1} E{X_t^2}) multiplied by the factor μ^2 may be consistently estimated. This article, for the first time in the literature, explores the classical problem of sequential nonparametric estimation of the scaled shape with assigned mean integrated square error. It proposes an adaptive sequential estimator that solves the problem and whose mean stopping time matches the performance of a minimax oracle that knows the underlying spectral density and the amplitude-modulating mechanism. The asymptotic theory is complemented by numerical examples.
Article
Estimation with assigned risk is a classical statistical problem, and the theory is well developed for the case of directly observed (no missing) data. In this article a more complicated problem of estimation of the spectral density in presence of missing data is considered. First, the corresponding theory of sequential estimation with minimal expected stopping time is developed. Then it is shown that a two‐stage estimator may be used and it yields the minimal stopping time. Sample size of the first stage may be deterministic and in order smaller than a minimal stopping time, and then the first stage defines the size of the second stage. Furthermore, the estimator adapts to unknown smoothness of an underlying spectral density and to an underlying missing mechanism.
Article
Three related estimators are considered for the parametrized spectral density of a discrete-time process X(n), n = 1, 2, …, when observations are not available for all the values n = 1(1)N. Each of the estimators is obtained by maximizing a frequency domain approximation to a Gaussian likelihood, although they do not appear to be the most efficient estimators available because they do not fully utilize the information in the process a(n) which determines whether X(n) is observed or missed. One estimator, called M3, assumes that the second-order properties of a(n) are known; another, M2, lets these be known only up to an unknown parameter vector; the third, M1, requires no model for a(n). Under representative sets of conditions, which allow for both deterministic and stochastic a(n), the strong consistency and asymptotic normality of M1, M2, and M3 are established. The conditions needed for consistency when X(n) is an autoregressive moving-average process are discussed in more detail. It is also shown that in general M1 and M3 are equally efficient asymptotically and M2 is never more efficient, and may be less efficient, than M1 and M3.
Chapter
In the three papers by Drs. Buccheri, Hertz and Norris, and Krolik, the basic problem considered is the detection of periodicities in signals when the signals come in various forms. These signals can be arrival times of radio pulsars (point processes), x-ray binaries, or high-energy gamma-rays. The detection of periodicities is a classical problem in the time series literature. Schuster introduced the "periodogram" in 1898 for the detection of periodicities. Several decades later, Fisher (1929), in a classical paper [see also Whittle (1954), Grenander and Rosenblatt (1957)], derived an exact test for testing the significance of the largest peak, second largest, etc. These tests assume that the signal is observed against the background of Gaussian white noise. This methodology has been extended by Hannan (1961) and Priestley (1962) to cover the situation where the noise is a colored (Gaussian) series. Throughout these developments, the assumption has been that the data form a discrete-parameter, equally spaced time series. We briefly describe these tests; see Priestley (1981) for details.
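As a concrete illustration of the Fisher (1929) test mentioned in this chapter, here is a sketch under the Gaussian white-noise null, using the classical closed-form p-value for the largest of m periodogram ordinates:

```python
import numpy as np
from scipy.special import comb

def fisher_g_test(x):
    """Fisher's exact test for the largest periodogram peak of a real
    series under a Gaussian white-noise null. Returns (g, p-value)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    I = np.abs(np.fft.rfft(x)) ** 2 / n      # periodogram ordinates
    I = I[1:(n + 1) // 2]                    # drop the zero and Nyquist bins
    m = I.size
    g = I.max() / I.sum()                    # Fisher's g statistic
    k = np.arange(1, int(1.0 / g) + 1)
    p = np.sum((-1.0) ** (k - 1) * comb(m, k) * (1.0 - k * g) ** (m - 1))
    return g, min(p, 1.0)
```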
Chapter
The Direct Quadratic Spectrum Estimation (DQSE) method was defined by Marquardt and Acuff (1982). Some of its theoretical properties were explored, and the method was illustrated with several numerical examples. The DQSE method is versatile in handling data that have irregular spacing or missing values; it is computationally stable, robust to isolated outlier observations in irregularly spaced data, capable of fine frequency resolution, makes maximum use of all available data, and is easy to implement on a computer. Moreover, DQSE, coupled with irregularly spaced data, can provide a powerful diagnostic tool, because irregularly spaced data are inherently resistant to the aliasing problems that often limit equally spaced data.
Chapter
In Section II some examples of, and models for, missing and irregular data patterns will be reviewed. In Section III nonparametric estimation of covariances and spectra will be discussed and some general large sample theory for the estimates of covariances will be given. Various estimation schemes for parametric models will be reviewed in Section IV. Sections V and VI are concerned with estimation based on maximizing the Gaussian likelihood. The effect of various amounts and types of missing data on the distributional properties of these estimates is investigated via simulations and large sample theory. In particular the large sample information matrix for the first order autoregression is derived and it is indicated that the autoregressive coefficient can sometimes be more efficiently estimated by taking M observations over N > M time points than by taking M consecutive observations. In Section VII some pollution data is reanalyzed.
Chapter
Several different theoretical researchers have shown that alias-free spectral estimation is possible for stationary random processes if randomly timed instantaneous samples of the random process are available. There is, however, no such thing as a physically realizable alias-free spectrum estimator, because the information concerning the higher frequencies of the measured process is largely contained in the subset of random samples that lie sufficiently close together, and it requires the exact sampling instants to be known without uncertainty. Physical measuring devices always exhibit a small but finite dead time following a measurement instant, during which no additional measurement can be made, and/or there is always finite time jitter, or uncertainty, in the measurements. In spite of these effects, which produce bias errors, useful practical estimators have been developed over the past ten years which do not require the Nyquist criterion with respect to the mean sample rate. Such techniques have been developed by ourselves and others for computer analysis of burst-counter laser velocimeter data.
Chapter
1. Statistical Inference. Statistics is part of the methodology of science—pure and applied. It is pertinent to the various goals of science proper: explanation and understanding, prediction and control, discovery and application, justification, classification. Various writers have set down block diagrams illustrating how scientific enquiry proceeds and how statistics impinges on that process. We mention Bartlett (1967), Box (1976), Mohr (1977) and Parzen (1980). An early writer was Kempthorne (1952), who set down (essentially) the following diagram.
Article
There are many occasions throughout structural health monitoring (SHM) in which collected data sets contain missing observations; such instances may occur as a result of failed communications or packet losses in a wireless sensor network or as a result of sensing and sampling methods, e.g., mobile sensing. By implementing modified Expectation and Maximization steps, Structural Identification using Expectation Maximization (STRIDE) is capable of processing data in these circumstances and is the first modal identification technique to formally accept data with missing observations. This paper presents the STRIDE algorithm, a statistical perspective of missing data, and new STRIDE equations that account for missing observations. Expectation step (E-step) equations are given explicitly for both partially observed time steps and those fully not observed. The maximization step (M-step) provides state-space parameter updates in terms of available observations and missing data state variable statistics. This paper also discusses the performance and convergence behavior of STRIDE with missing data. Finally, two applications are presented to exemplify common use in the contexts of Network Reliability and Mobile Sensing, both using data collected at Golden Gate Bridge. This paper demonstrates that sensor network data containing a significant amount of missing observations can be used to achieve a comprehensive modal identification. A successful real-world identification with simulated mobile sensors quantifies the preservation of spatial information, establishing benefits of this type of network and emphasizing an inquiry for future SHM implementations.
Article
The paper surveys principal methods of time series analysis in view of applications in biology. The main emphasis is the detection and measurement of rhythms. Methods of smoothing, regression, spectral estimation, periodogram analysis, complex demodulation and autoregressive model fitting, among others, are discussed.
Article
This paper surveys the current state of large sample theory for estimation in stationary discrete time series observed at unequally spaced times. The paper considers the nonparametric estimation of sample covariances, correlations and spectra and traces the development of consistency and asymptotic normality results for these quantities. The second part of the paper discusses the estimation of finite parameter models for stationary time series. Consistent, but inefficient, methods based on sample covariances and on spectra are discussed and compared. The final part of the paper reports on the recent results concerning a general central limit theorem (for the estimate obtained by a single iteration from a consistent estimate) for Gaussian data. The essential condition on the sampling times is that the finite sample information matrix, when divided by the sample size, has a limit which is non-singular and has finite norm. This condition will be illustrated by considering examples of periodic sampling, of random but time homogeneous sampling, of sparse early sampling, and of asymptotically stationary sampling. The effect on efficiency of the design of sampling pattern will be briefly discussed. Finally it will be indicated what extra complexity of proof is needed to handle the non-Gaussian case.
Article
A brief review of time series analysis with missing observations and unequally spaced data, in both the time and frequency domains, is presented. The exact likelihood for Gaussian ARMA processes can be calculated using Kalman recursive estimation, and non-linear optimization programs can be used to calculate the maximum likelihood estimates of the parameters, including the observational error variance. This algorithm readily handles missing observations. For unequally spaced data it is necessary to consider continuous time models. This paper considers the problem of fitting a continuous time autoregression (all-pole spectrum) with observational error. If sampled at equally spaced time points, this process would be ARMA(p, p), so for p > 1 this gives the possibility of a more parsimonious representation. Using a state space representation, the exact likelihood function can be calculated by first performing an orthogonal rotation on the state vector, producing an uncoupled equation of state with a diagonal state transition matrix. This greatly simplifies the recursive calculation of the likelihood for equally or unequally spaced data. Non-linear optimization involving constraints to ensure a stationary solution produces maximum likelihood estimates of the parameters and observational error variance. Numerical examples are presented of fitting continuous time models to yearly sunspot numbers and producing forecasts, and of fitting models to unequally spaced respiration data.
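A scalar sketch of such a Kalman-recursion likelihood, written for a discrete AR(1) state observed with error; the model and parameter names are assumptions for illustration, whereas the paper treats the more general continuous-time autoregression:

```python
import numpy as np

def ar1_loglik_with_missing(y, observed, phi, q, r):
    """Exact Gaussian log-likelihood of x_t = phi*x_{t-1} + w_t,
    y_t = x_t + v_t (Var w = q, Var v = r, |phi| < 1) via the Kalman
    filter; time steps with missing y run the prediction step only."""
    x, P = 0.0, q / (1.0 - phi ** 2)    # stationary initial state
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:                        # prediction step, always run
            x, P = phi * x, phi ** 2 * P + q
        if observed[t]:                  # update step only when observed
            S = P + r                    # innovation variance
            e = y[t] - x                 # innovation
            loglik += -0.5 * (np.log(2.0 * np.pi * S) + e ** 2 / S)
            K = P / S                    # Kalman gain
            x, P = x + K * e, (1.0 - K) * P
    return loglik
```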
Article
Techniques for reliably estimating the power spectral density function for both small and large samples of a stationary stochastic process are described. These techniques have been particularly successful in cases where the range of the spectrum is large. The methods are resistant to a moderate amount of contaminated or erroneous data and are well suited for use with auxiliary tests for stationarity and normality. Part I is concerned with background and theoretical considerations while examples from the development and analysis of the WT4 waveguide medium will be discussed in Part II, next issue.
Conference Paper
There are innumerable situations where the data observed from a non-stationary random field are collected with missing values. In this work a consistent estimate of the evolutionary spectral density is given where some observations are randomly missing.
Article
Despite the geophysical importance of solar ultraviolet radiation, specific aspects of its temporal variations have not yet been adequately determined experimentally, nor are the mechanisms for the variability completely understood. Satellite observations have verified the reality of solar ultraviolet irradiance variations over time scales of days and months, and model calculations have confirmed the association of these short‐term variations with the evolution and rotation of regions of enhanced magnetic activity on the solar disc. However, neither rocket nor satellite measurements have yet been made with sufficient accuracy and regularity to establish unequivocally the nature of the variability over the longer time of the 11‐year solar cycle. The comparative importance for the long‐term variations of local regions of enhanced magnetic activity and global scale activity perturbations is still being investigated. Solar ultraviolet irradiance variations over both short and long time scales are reviewed, with emphasis on their connection to solar magnetic activity. Correlations with ground‐based measures of solar variability are examined because of the importance of the ground‐based observations as historical proxies of ultraviolet irradiance variations. Current problems in understanding solar ultraviolet irradiance variations are discussed, and the measurements planned for solar cycle 22, which may resolve these problems, are briefly described.
Article
Evolution of the ocean is considered from the view of evolving moments of probability distributions of possible oceans. The result is to anticipate statistical mechanical forcings that are qualitatively different than those considered by conventional ocean modeling. Simplified practical implementation of statistical mechanical forcings within context of conventional modeling shows significant impact upon model results with the suggestion of improvement in areas of chronic model deficiency.
Article
Conventional methods of computing spectra require constant sampling rates and therefore must be modified to accommodate the randomly sampled data from the laser velocimeter. Four approaches that provide estimates of the power spectra from randomly sampled data are evaluated with respect to accuracy and computational speed. Simulated data of varying spectral content are used as input. An estimate of the correlation function that resolves the random time distribution into equidistant time intervals provides the best compromise between computational speed and accuracy for laser velocimeter data.
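The correlation-function approach the abstract singles out is commonly realized as slot correlation: lag products are sorted into equidistant lag bins and averaged per bin. A minimal sketch, assuming sorted sample times (the O(n^2) pair loop and the simple nearest-bin rule are simplifications):

```python
import numpy as np

def slotted_autocovariance(t, x, slot_width, n_slots):
    """Slot correlation for randomly sampled data: every product
    x_i * x_j is assigned to the lag bin nearest to t_j - t_i, and the
    bins are averaged, giving a covariance estimate on a regular grid.
    Assumes t is sorted in ascending order."""
    x = x - x.mean()
    sums = np.zeros(n_slots + 1)
    counts = np.zeros(n_slots + 1, dtype=int)
    for i in range(t.size):
        slots = np.rint((t[i:] - t[i]) / slot_width).astype(int)
        ok = slots <= n_slots
        np.add.at(sums, slots[ok], x[i] * x[i:][ok])
        np.add.at(counts, slots[ok], 1)
    return sums / np.maximum(counts, 1)   # empty bins stay zero
```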
Article
Characteristics of temporal variability in the California Current system are analyzed using a 30-month time series of Coastal Zone Color Scanner (CZCS) imagery. About 20-25% of the variance is produced by a periodic annual cycle with peak values in winter. Analysis of ship-based chlorophyll measurements indicates that the winter peak is only characteristic of the upper portion of the euphotic zone and the total water column chlorophyll peaks during the spring upwelling season. Satellite studies of intra-annual variability are modulated by strong 5- to 6-day oscillation in the availability of usable imagery, resulting from a combination of satellite orbital dynamics, which produces images of the study area roughly 4 out of every 6 days, and an oscillation in cloud cover, which controls the availability of clear imagery. The cloud cover oscillation, which is also present in coastal winds, undoubtedly affects the ocean surface and biases the data obtained by satellites. Analysis of data using a 5-day time step indicates that the predominant mode of nonseasonal variability is characterized by imphase fluctuations throughout the southern and central California coastal region.
Article
The analysis method for a regression model with unequally spaced time series error is presented, which is based on the relationship between the Green function of the continuous system and the autoregression parameters of the time series. The conditional maximum likelihood estimation and exact maximum likelihood estimation of the parameters of the regression model with unequally spaced correlated error are discussed in detail. The method is not only suitable for time series with missing observations but also applicable to irregularly sampled data in the social and natural sciences. The method can also combine regression with autoregression and improve the precision of analysis and forecasting. Numerical examples are given at the end, which illustrate the performance of the new method.
Article
Differences in the temporal behavior of the ultraviolet irradiance at 205 nm and the 10.7-cm radio flux, the ultraviolet irradiance at 121.6 nm, model calculations of the 205-nm irradiance derived from Ca II K plage emission, and the sunspot-blocking function are examined during a 5-year period near the maximum of solar cycle 21. Because of solar rotation the dominant variance in each of these time series occurs at 27 days, but real temporal differences arise because the five solar time series are each formed at different heights within the solar atmosphere and are associated with a variety of solar active region phenomena having different spatial and temporal characteristics on the solar disc. These differences may be important if the ground-based solar activity time series are used instead of the measured UV irradiances in correlation studies of solar variability with atmospheric parameters such as ozone densities and temperature. Recognizing the presence of autocorrelation in the UV irradiance time series is also important in solar terrestrial correlation studies, since it complicates the use of classical statistical techniques for estimating the significance of the results.
Article
The power spectra of the daily peak electron content measured at Hawaii are estimated via covariance estimations, bivariate autoregressive estimations and fast Fourier transforms for a year of data close to minimum solar activity (1965) and a year of data close to maximum solar activity (1969). The strong peaks at about 6 days and 15 days in the 1965 and 1969 power spectra, respectively, suggest an influence of the interplanetary magnetic sector structure on the electron content at low latitude (21.3°N, geographic). The daily solar flux (Sa) at 2800 MHz for 1965 and 1969 is analysed similarly. The decrease in energy content in the period range of 3-7 days in the 1969 Sa power spectrum supports this point of view.
Article
Hourly carbon dioxide concentrations at the south pole were obtained by nondispersive infrared analyzers for the years 1975-1978 and 1980-1982. A spectral analysis of the ambient CO2 variability showed very little power for periods shorter than 5 days. Our data showed good agreement with other data sets for the range of the annual fluctuation from 1977 to 1982 and disagreements for 1976. The estimated annual CO2 increase (about 0.6 to 2 ppm yr⁻¹) and ranges of seasonal fluctuation were insensitive to the data selection methods used. After 1979, seasonal fluctuations apparently decreased.
Article
Oceanographic observations taken at Ocean Station "P" (50°N, 145°W) provide one of the few long oceanographic time series. The intermittent nature of the sampling program at Station "P," however, presents problems for a standard time-series analysis. In this paper the seriousness of this aspect of the data is discussed and spectral estimates are obtained for the potential temperature, potential density anomaly, and dynamic height anomaly at several depths. We found that the lowest frequency estimates of dynamic height anomaly are approximately an order of magnitude less than those obtained from a similar station in the North Atlantic. This is consistent with a more rapid falloff of energy with depth at Station "P".
Article
A comparison of several methods for spectral estimation of a univariate process with equi-spaced samples, including maximum entropy, linear predictive, and autoregressive techniques, is made. The comparison is conducted via simulation for situations both with and without bad (or missing) data points. The case of bad data points required extensions of existing techniques in the literature and is documented fully here in the form of processing equations and FORTRAN programs. It is concluded that the maximum entropy (Burg) technique is as good as any of the methods considered for the univariate case. The methods considered are particularly advantageous for short data segments. This report also reviews several available techniques for spectral analysis under different states of knowledge and presents the interrelationships of the various approaches in a consistent notation. Hopefully, this non-rigorous presentation will clarify this method of spectral analysis for readers who are not experts in the field.
Article
The paper deals with testing of periodicity for time series with missing observations. Two schemes of missing observations are considered: regularly missing observations and observations missing randomly according to the Bernoulli model.
Article
This paper provides an example of some initial steps in signal analysis applied to a simple aquatic ecosystem in the form of a small artificial pond. Irradiation, water temperature and dissolved oxygen concentration were continuously recorded over a two-year period. Following the definition and discussion of several related parameters, an analysis procedure for trends and forced annual patterns was proposed and carried out. The annual pattern of photosynthetic fixation of solar energy is characterized by the annual level transfer efficiency and, for the periodic part of the process, by the transfer gain and phase shift. The possible role of temperature is briefly discussed.
Article
Irradiation, water temperature and dissolved oxygen of a small artificial pond were continuously recorded over a two year period. Following the elimination of previously estimated mean trends and forced annual patterns (LINGEMAN 1980) a spectral analysis procedure is proposed and executed for the residual signals of total diurnal irradiation, diel mean temperature and diurnal gross primary production. The residual power spectrum of diurnal irradiation was shown to be essentially the one of a white noise. Several higher frequencies were significantly present in both the signals of temperature and production. Some consideration is given to the cross correlation between the latter two parameters.
Article
A process generated by a stochastic differential equation driven by pure noise is sampled at irregular intervals. A model for the sampled sequence is deduced. We describe a maximum likelihood procedure for estimating the parameters and establish the strong consistency and asymptotic normality of the estimates. The use of the model in prediction is considered. Simplifications in the case of periodic sampling are explored.
Article
Full-text available
A class of spectral windows depending on one parameter is presented and shown to include many of the common windows. The mean square rate of convergence of the associated spectral density estimators is calculated in terms of this parameter for spectral densities which are locally Lipschitz continuous. The class is shown to include certain data tapers and data windows corresponding to missing observations. This is true also for the kernels of (C, α) summability, which provide means for estimating the spectral density when the covariance function is periodic.
Article
The effect of regularly missed observations on the estimation of parameters of an autoregressive (AR) process is investigated by using the frequency domain method. For first-order AR processes, numerical results are shown to illustrate the behavior of the variances of the estimates under missed observations. In some cases, we can positively utilize the concept of missed observations to decrease the variances if the number of observations is fixed but the time instants at which the observations are made can be changed.
Article
Classical statistical tests for trend, both parametric and nonparametric, assume independence of observations, a condition rarely encountered in time series obtained using moderate to high sampling frequencies. A method is developed for summarizing the power of the parametric t test and the nonparametric Spearman's rho and Mann-Whitney tests against step and linear trends in a dimensionless "trend number" which is a function of trend magnitude, standard deviation of the time series, and sample size. For the case of dependent observations, use of an equivalent independent sample size rather than the actual sample size is shown to enable use of the same trend number developed for the independent case. An important related result is the existence of an upper limit on power (trend detectability) over a fixed time horizon, regardless of the number of samples taken, for a lag-1 Markov process.
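For a lag-1 Markov process, the equivalent independent sample size invoked here is commonly approximated as n' = n(1 - rho)/(1 + rho), with rho the lag-1 autocorrelation; the sketch below uses this standard large-sample form, which may differ from the paper's exact expression:

```python
def equivalent_sample_size(n, rho):
    """Equivalent independent sample size of n observations from a
    lag-1 Markov process with autocorrelation rho (large-sample form)."""
    return n * (1.0 - rho) / (1.0 + rho)

# e.g., 1000 dependent samples with rho = 0.6 carry roughly the trend
# information of 250 independent ones.
print(equivalent_sample_size(1000, 0.6))   # 250.0
```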
Article
Full-text available
This article discusses the sampling of stationary discrete-time stochastic processes at fixed but unequally spaced time points. The pattern of the sampling times is periodic with a cycle of p time units. One of the major problems is to determine, given p, the minimum number of sampling points required per cycle in order to estimate the covariances at all lags. The second problem is to find a pattern for distributing the sampling points within the cycle that allows the estimation of all covariances. A discussion of the references describing the statistical properties of the estimates of covariances and spectra in this sampling situation is given.
Article
This chapter presents the review of various approaches to power spectrum estimation. Many time series in the natural sciences and economics contain very strong periodic effects, and their detection has been the objective of some of the earliest investigations of time series. An important periodic effect will manifest itself in a readily identifiable spectral peak at the corresponding frequency, its influence on the process being measured by the magnitude of the peak. The presence of spectral peaks leads, however, to serious difficulties in power spectrum estimation. In some applications, such as in seismography, the objective is to distinguish between two stationary time series or to classify a series and the power spectrum is a convenient discriminator. The Wiener–Kolmogorov theory of prediction and smoothing leads to frequency-domain formulas that require power spectrum estimates for their practical implementation. Because of their nonparametric nature, spectrum estimates are unlikely to be accurate or reliable unless based on a substantial amount of data.
Article
The paper deals with the problem in which, given two or more data series, at least one of them is not completely known. We assume that the series can be modelled according to the linear hypothesis, and that there is some correlation among them. In this paper we develop two models for estimating the unknown values of that data series (or those data series).
Article
The measurement of power spectra is a problem of steadily increasing importance which appears to some to be primarily a problem in statistical estimation. Others may see it as a problem of instrumentation, recording and analysis which vitally involves the ideas of transmission theory. Actually, ideas and techniques from both fields are needed. When they are combined, they provide a basis for developing the insight necessary (i) to plan both the acquisition of adequate data and sound procedures for its reduction to meaningful estimates and (ii) to interpret these estimates correctly and usefully. This account attempts to provide and relate the necessary ideas and techniques in reasonable detail — Part I of this article appeared in the January, 1958 issue of THE BELL SYSTEM TECHNICAL JOURNAL.
Article
Shapiro and Silverman (1960) have shown that in certain additive random sampling schemes (such as Poisson sampling), the entire covariance function is uniquely determined by the correlation sequence, thus eliminating aliasing. This enables one to estimate the entire spectral density of a continuous process by discrete sampling, and suggests the possibility of constructing sampling patterns which will also eliminate aliasing.
Article
An efficient method for the calculation of the interactions of a 2^m factorial experiment was introduced by Yates and is widely known by his name. The generalization to 3^m was given by Box et al. (1). Good (2) generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series. In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices, where m is proportional to log N. This results in a procedure requiring a number of operations proportional to N log N rather than N^2. These methods are applied here to the calculation of complex Fourier series. They are useful in situations where the number of data points is, or can be chosen to be, a highly composite number. The algorithm is here derived and presented in a rather different form. Attention is given to the choice of N. It is also shown how special advantage can be obtained in the use of a binary computer with N = 2^m and how the entire calculation can be performed within the array of N data storage locations used for the given Fourier coefficients. Consider the problem of calculating the complex Fourier series X(j) = Σ_{k=0}^{N-1} A(k)·W^{jk}, j = 0, 1, …, N - 1, (1) where W = e^{2πi/N}.
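A recursive radix-2 sketch of the algorithm for Eq. (1), using the convention W = e^{2πi/N}; this version is for exposition only, with np.fft.fft being the practical choice:

```python
import numpy as np

def fft_radix2(a):
    """Cooley-Tukey radix-2 FFT of Eq. (1): X(j) = sum_k A(k) W^{jk}
    with W = exp(2*pi*i/N); len(a) must be a power of two."""
    a = np.asarray(a, dtype=complex)
    n = a.size
    if n == 1:
        return a
    even = fft_radix2(a[0::2])     # A(0), A(2), ...
    odd = fft_radix2(a[1::2])      # A(1), A(3), ...
    w = np.exp(2j * np.pi * np.arange(n // 2) / n)   # twiddle factors W^j
    # Butterfly: X(j) = E(j) + W^j O(j), X(j + n/2) = E(j) - W^j O(j).
    return np.concatenate([even + w * odd, even - w * odd])
```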
Article
Estimating the spectral density of a discrete stationary stochastic process is studied for the case when the observations consist of repeated groups of α equally spaced observations followed by β missed observations (α > β). The asymptotic variance of the estimate is derived for normally distributed variables. It is found that this variance depends not only on the value of the spectral density being estimated, but also on the spectral density at the harmonic frequencies brought in by the periodic method of sampling. Curves are presented for β = 1, showing the increase in the standard deviation and the effective decrease in sample size as a function of α.