Article

Cramér–Karhunen–Loève representation and harmonic principal component analysis of functional time series

Authors:
Victor M. Panaretos, Shahin Tavakoli

Abstract

We develop a doubly spectral representation of a stationary functional time series, and study the properties of its empirical version. The representation decomposes the time series into an integral of uncorrelated frequency components (Cramér representation), each of which is in turn expanded in a Karhunen–Loève series. The construction is based on the spectral density operator, the functional analogue of the spectral density matrix, whose eigenvalues and eigenfunctions at different frequencies provide the building blocks of the representation. By truncating the representation at a finite level, we obtain a harmonic principal component analysis of the time series, an optimal finite dimensional reduction of the time series that captures both the temporal dynamics of the process, as well as the within-curve dynamics. Empirical versions of the decompositions are introduced, and a rigorous analysis of their large-sample behaviour is provided, that does not require any prior structural assumptions such as linearity or Gaussianity of the functional time series, but rather hinges on Brillinger-type mixing conditions involving cumulants.
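In outline, and in notation standard for this literature rather than quoted verbatim from the paper, the two layers of the decomposition read

X_t = ∫_{−π}^{π} e^{iωt} dZ_ω,    F_ω = (2π)^{−1} ∑_{h∈ℤ} e^{−iωh} R_h = ∑_{n≥1} λ_n(ω) φ_n(ω) ⊗ φ_n(ω),

where Z_ω is an orthogonal-increment process and R_h is the lag-h autocovariance operator. Expanding each frequency increment in the eigenfunctions φ_n(ω) of F_ω and truncating at level K gives the harmonic principal component approximation

X_t ≈ ∫_{−π}^{π} e^{iωt} ∑_{n=1}^{K} ⟨dZ_ω, φ_n(ω)⟩ φ_n(ω).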


... where the convergence holds in the appropriate operator norm. For processes whose spectral density operator F_ω has absolutely summable eigenvalues, Panaretos and Tavakoli (2013a) derived a functional Cramér representation and showed that the eigenfunctions of F_ω yield a harmonic principal component analysis, providing an optimal finite-dimensional representation of the time series. This was later relaxed in Tavakoli (2014) to processes with only weak spectral density operators implicitly defined by (1.1). ...
... where the convergence holds in ‖·‖₁. Panaretos and Tavakoli (2013a) showed that a zero-mean weakly stationary functional time series X satisfying condition (2.1) admits a functional spectral representation of the form ...
... An important ingredient in establishing this classical theorem is the existence of an isometric isomorphism that allows one to identify a weakly stationary time series on the integers with an orthogonal-increment process on [−π, π]. As already mentioned, a generalization of the Cramér representation to weakly stationary functional time series was first considered by Panaretos and Tavakoli (2013a), but is restricted to processes for which the assumption ∑_{h∈ℤ} ‖C_h‖₁ < ∞ holds. In this section, we shall use the established functional Herglotz's theorem (Theorem 3) to derive a functional Cramér representation that can be seen as a true generalization of the classical theorem to the function space. ...
Preprint
In this article, we prove Herglotz's theorem for Hilbert-valued time series. This requires the notion of an operator-valued measure, which we shall make precise for our setting. Herglotz's theorem for functional time series makes it possible to generalize existing results that are central to frequency domain analysis on the function space. In particular, we use this result to prove the existence of a functional Cramér representation of a large class of processes, including those with jumps in the spectral distribution and long-memory processes. We furthermore obtain an optimal finite-dimensional reduction of the time series under weaker assumptions than available in the literature. The results of this paper therefore enable Fourier analysis for processes of which the spectral density operator does not necessarily exist.
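For orientation, the two statements fit together schematically as follows, with ν the operator-valued measure of the preprint's setting and notation assumed rather than quoted:

C_h = ∫_{−π}^{π} e^{iωh} dν(ω),  h ∈ ℤ    (functional Herglotz),
X_t = ∫_{−π}^{π} e^{iωt} dZ_ω    (functional Cramér),

where Z is an orthogonal-increment Hilbert-valued process whose increments have second-order structure governed by ν. When ν is absolutely continuous, its density is the spectral density operator; the point of the preprint is that the representation survives without that assumption.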
... Functional data carry infinite-dimensional intrinsic variation and in order to exploit this rich source of information, it is important to optimally extract defining characteristics to finite dimension via techniques such as functional PCA (FPCA). In the case of stationary dependent functional data, the shape and smoothness properties of the random curves are completely encoded by the spectral density operator, which has been shown to allow for an optimal lower dimension representation via dynamic FPCA (see e.g., Panaretos and Tavakoli, 2013a;Hörmann et al., 2015). Since the assumption of weak stationarity is often too restrictive, we aim to provide the building blocks for statistical inference of nonstationary functional time series and for the development of techniques such as time-varying dynamic FPCA. ...
... We start with the main assumptions needed for a frequency domain characterization of local stationarity. In contrast to Panaretos and Tavakoli (2013a), who only consider transfer functions in B_2, we also prove the more general case of transfer functions in B_∞, as it includes the important case of functional autoregressive processes. The necessary results are proved in section B2.3 of the Appendix. ...
... The proof is relegated to section A1.1 of the Appendix. For p = 2, the proposition yields a time-varying version of the corresponding result of Panaretos and Tavakoli (2013a). The more general case p = ∞ also includes linear models introduced by Bosq (2000) and Hörmann and Kokoszka (2010), as well as the important class of time-varying functional autoregressive processes, which we discuss in detail in the next section. ...
Preprint
The literature on time series of functional data has focused on processes of which the probabilistic law is either constant over time or constant up to its second-order structure. Especially for long stretches of data it is desirable to be able to weaken this assumption. This paper introduces a framework that will enable meaningful statistical inference of functional data of which the dynamics change over time. We put forward the concept of local stationarity in the functional setting and establish a class of processes that have a functional time-varying spectral representation. Subsequently, we derive conditions that allow for fundamental results from nonstationary multivariate time series to carry over to the function space. In particular, time-varying functional ARMA processes are investigated and shown to be functional locally stationary according to the proposed definition. As a side result, we establish a Cramér representation for an important class of weakly stationary functional processes. Important in our context is the notion of a time-varying spectral density operator, of which the properties are studied and uniqueness is derived. Finally, we provide a consistent nonparametric estimator of this operator and show it is asymptotically Gaussian using a weaker tightness criterion than what is usually deemed necessary.
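Schematically, and modeled on Dahlhaus's scalar theory of local stationarity rather than quoted from the preprint, a functional time-varying spectral representation takes the form

X_{t,T} = ∫_{−π}^{π} e^{iωt} A^{(T)}_t(ω) dZ_ω,    A^{(T)}_t(ω) ≈ A_{t/T}(ω),

where the transfer operators A_u(ω) vary smoothly in rescaled time u = t/T, and the time-varying spectral density operator at time u and frequency ω is built from A_u(ω) in analogy with the stationary case.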
... These methods are usually called the dynamic versions of functional principal component analysis, abbreviated as DFPCA. In addition, Panaretos and Tavakoli (2013); Hörmann et al. (2015); Tan et al. (2024) introduced a more general DFPCA for dimension reduction using dynamic Karhunen-Loève expansions. ...
... Compared to conventional KL expansions, each component in the dynamic KL expansion is a convolution of functional filters and dynamic FPC scores, where the functional filters are obtained from the eigenfunctions of the spectral density functions. This kind of convolution structure is more general than the conventional KL expansion, leading to optimal dimension reduction for stationary FTS (Panaretos and Tavakoli, 2013; Hörmann et al., 2015; Tan et al., 2024). ...
... Overall, we can employ static FPCA for dimension reduction of FTS, though it may be overly simplistic and not optimal in capturing serial dependencies within the data. Meanwhile, DFPCA (Panaretos and Tavakoli, 2013; Hörmann et al., 2015; Tan et al., 2024) accounts for serial dependencies via dynamic KL expansions and obtains optimal dimension reductions, yet it may not provide a parsimonious representation, and some practical issues remain in its estimation procedures. Intuitively, FTS with a more complex dependence structure may require a more complicated representation; however, such complexity should be minimized to avoid redundancy in dimension reduction. ...
Preprint
Full-text available
Functional time series (FTS) are increasingly available from diverse real-world applications such as finance, traffic, and environmental science. To analyze such data, it is common to perform dimension reduction on FTS, converting serially dependent random functions to vector time series for downstream tasks. Traditional methods like functional principal component analysis (FPCA) and dynamic FPCA (DFPCA) can be employed for the dimension reduction of FTS. However, these methods may either not be theoretically optimal or be too redundant to represent serially dependent functional data. In this article, we introduce a novel dimension reduction method for FTS based on dynamic FPCA. Through a new concept called optimal functional filters, we unify the theories of FPCA and dynamic FPCA, providing a parsimonious and optimal representation for FTS adapting to its serial dependence structure. This framework is referred to as principal analysis via dependency-adaptivity (PADA). Under a hierarchical Bayesian model, we establish an implementation procedure of PADA for dimension reduction and prediction of irregularly observed FTS. We establish the statistical consistency of PADA in achieving parsimonious and optimal dimension reduction and demonstrate its effectiveness through extensive simulation studies. Finally, we apply our method to daily PM2.5 concentration data, validating the effectiveness of PADA for analyzing FTS data.
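To make the dynamic Karhunen–Loève machinery concrete, here is a minimal NumPy sketch of dynamic/harmonic FPC scores for a discretized functional time series, in the spirit of the filtering constructions in Panaretos and Tavakoli (2013) and Hörmann et al. (2015): smooth the periodogram operators to estimate the spectral density operator, eigendecompose it frequency by frequency, and convolve the series with the resulting filters. The function name, smoothing kernel, bandwidth, and truncation choices are illustrative, not taken from any of the cited papers.

import numpy as np

def dynamic_fpc_scores(X, n_components=2, bandwidth=5, max_lag=20):
    """Dynamic (harmonic) FPC scores for a functional time series.

    X : (T, p) array of T curves sampled on a common grid of p points.
    Returns an array of shape (T, n_components) of real-valued scores.
    A rough sketch: smoothing and truncation choices are ad hoc.
    """
    T, p = X.shape
    Xc = X - X.mean(axis=0)                       # centre the curves
    omegas = 2.0 * np.pi * np.arange(T) / T       # Fourier frequencies
    # Functional discrete Fourier transform of the series
    D = np.fft.fft(Xc, axis=0) / np.sqrt(2.0 * np.pi * T)        # (T, p)
    # Periodogram operators I(w) = D(w) D(w)^*, one p x p matrix per frequency
    I = np.einsum('ti,tj->tij', D, D.conj())                     # (T, p, p)
    # Smooth across neighbouring frequencies (boxcar kernel) to estimate F(w)
    F = np.empty_like(I)
    for k in range(T):
        idx = np.arange(k - bandwidth, k + bandwidth + 1) % T
        F[k] = I[idx].mean(axis=0)
    # Leading eigenfunctions of F(w), frequency by frequency.
    # NB: a real implementation must align eigenvector phases across
    # frequencies; this sketch ignores that subtlety.
    phi = np.empty((T, p, n_components), dtype=complex)
    for k in range(T):
        _, V = np.linalg.eigh(F[k])               # eigenvalues ascending
        phi[k] = V[:, ::-1][:, :n_components]
    # Filter coefficients psi_{m,l} as inverse Fourier transforms of the
    # eigenfunctions, truncated at |l| <= max_lag; scores are convolutions
    lags = np.arange(-max_lag, max_lag + 1)
    scores = np.zeros((T, n_components))
    for m in range(n_components):
        psi = {l: (phi[:, :, m].T @ np.exp(1j * omegas * l)) / T for l in lags}
        for t in range(T):
            s = 0.0 + 0.0j
            for l in lags:                        # sum_l <X_{t+l}, psi_{m,l}>
                if 0 <= t + l < T:
                    s += Xc[t + l] @ np.conj(psi[l])
            scores[t, m] = s.real
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))            # 200 curves on 50 grid points
    print(dynamic_fpc_scores(X).shape)            # (200, 2)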
... In this paper, we address such a generalization valid in the functional context. Related issues have been recently considered in [21,22,26] where, in particular, the authors derive a functional version of the Cramér representation which relies on a spectral density operator defined under strong assumptions on the covariance structure of the time series. Under the same assumption, [22] introduced filters whose transfer functions are valued in a restricted set of operators, and this was later generalized to bounded-operator-valued transfer functions in [26], Section 2.5 (see also [27], Appendix B.2.3). ...
... Related issues have been recently considered in [21,22,26] where, in particular, the authors derive a functional version of the Cramér representation which relies on a spectral density operator defined under strong assumptions on the covariance structure of the time series. Under the same assumption, [22] introduced filters whose transfer functions are valued in a restricted set of operators, and this was later generalized to bounded-operator-valued transfer functions in [26], Section 2.5 (see also [27], Appendix B.2.3). An application of this spectral theory to dimension reduction is proposed by means of a harmonic functional principal component analysis (see [14,22]). ...
... Under the same assumption, [22] introduced filters whose transfer functions are valued in a restricted set of operators, and this was later generalized to bounded-operator-valued transfer functions in [26], Section 2.5 (see also [27], Appendix B.2.3). An application of this spectral theory to dimension reduction is proposed by means of a harmonic functional principal component analysis (see [14,22]). A more general approach is adopted in [28] where the authors provide a definition of operator-valued measures from which they derive a functional version of the Herglotz theorem, the functional Cramér representation, the definition of linear filters with bounded-operator-valued transfer functions and a harmonic functional principal component analysis in the case where the spectral measure has finitely many discontinuities. ...
Article
Full-text available
The spectral theory for weakly stationary processes valued in a separable Hilbert space has seen renewed interest in the past decade. Here we follow earlier approaches which fully exploit the normal Hilbert module property of the time domain. The key point is to build the Gramian–Cramér representation as an isomorphic mapping from the modular spectral domain to the modular time domain. We also discuss the general Bochner theorem and provide useful results on the composition and inversion of lag-invariant linear filters. Finally, we derive the Cramér–Karhunen–Loève decomposition and harmonic functional principal component analysis, which are established without relying on additional assumptions.
... Their spectrum provides a singular system separating the stochastic and functional fluctuations of X, allowing for optimal finite dimensional approximations and functional PCA via the Karhunen-Loève expansion. And, that same singular system arises as the natural means of regularisation for inference problems (such as regression and testing) which are ill-posed in infinite dimensions (Panaretos and Tavakoli [41], Wang et al. [56]). ...
... Once a Fréchet mean of a given sample of covariance operators is found, the second-order statistical analysis is to understand the variation of the sample around this mean. The optimal value of the Fréchet functional gives a coarse measure of variance (as a sum of squared distances of the observations from their mean), but it is desirable to find a parsimonious representation for the main sources/paths of variation in the sample, analogous to Principal Component Analysis (PCA) in Euclidean spaces [32] and functional versions thereof in Hilbert spaces [41]. ...
Preprint
Covariance operators are fundamental in functional data analysis, providing the canonical means to analyse functional variation via the celebrated Karhunen–Loève expansion. These operators may themselves be subject to variation, for instance in contexts where multiple functional populations are to be compared. Statistical techniques to analyse such variation are intimately linked with the choice of metric on covariance operators, and the intrinsic infinite-dimensionality of these operators. In this paper, we describe the manifold geometry of the space of trace-class infinite-dimensional covariance operators and associated key statistical properties, under the recently proposed infinite-dimensional version of the Procrustes metric. We identify this space with that of centred Gaussian processes equipped with the Wasserstein metric of optimal transportation. The identification allows us to provide a complete description of those aspects of this manifold geometry that are important in terms of statistical inference, and establish key properties of the Fréchet mean of a random sample of covariances, as well as generative models that are canonical for such metrics and link with the problem of registration of functional data.
... These too contribute to the ill-posedness of the problem, which is now doubly ill-posed: one needs to solve an operator deconvolution problem, where the "Fourier division" step is replaced with the solution of an integral equation. To account for these two layers of ill-posedness, one needs to consider the frequency domain framework (Panaretos and Tavakoli (2013a), Panaretos and Tavakoli (2013b)), and it turns out that the operator that needs to be inverted as part of the normal equations is now the spectral density operator of the process {X_t}, ...
... the Fourier transform of the lag t autocovariance operators R_t of {X_t}. Just as estimation in the i.i.d. case is based on the spectral truncation or the ridge regularisation of the covariance operator, estimation in the time series case can be based on the spectral truncation or ridge regularisation of the spectral density operator (achieved by harmonic or dynamic PCA, see Panaretos and Tavakoli (2013b) and Hörmann, Kidziński and Hallin (2015)). The spectral truncation approach was recently considered and studied by Hörmann, Kidziński and Kokoszka (2015), and indeed this appears to be the first contribution to the theory of time series regression without any structural assumptions further to weak dependence (to be contrasted with the functional regression of linear processes, which is much better understood, see Bosq (2012)). ...
Preprint
The functional linear model extends the notion of linear regression to the case where the response and covariates are i.i.d. elements of an infinite-dimensional Hilbert space. The unknown to be estimated is a Hilbert-Schmidt operator, whose inverse is by definition unbounded, rendering the problem of inference ill-posed. In this paper, we consider the more general context where the sample of response/covariate pairs forms a weakly dependent stationary process in the respective product Hilbert space: simply stated, the case where we have a regression between functional time series. We consider a general framework of potentially nonlinear processes, exploiting recent advances in the spectral analysis of time series. Our main result is the establishment of the rate of convergence for the corresponding estimators of the regression coefficients, the latter forming a summable sequence in the space of Hilbert-Schmidt operators. In a sense, our main result can be seen as a generalisation of the classical functional linear model rates to the case of time series, and rests only upon cumulant mixing conditions. While the analysis becomes considerably more involved in the dependent case, the rates are strikingly comparable to those of the i.i.d. case, but at the expense of an additional factor caused by the necessity to estimate the spectral density operator at a nonparametric rate, as opposed to the parametric rate for covariance operator estimation.
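The "operator deconvolution" alluded to above has a compact frequency-domain statement. Schematically, and with notation assumed rather than quoted, for a lagged regression Y_t = ∑_h b_h(X_{t−h}) + ε_t the transfer operator B(ω) = ∑_h b_h e^{−iωh} solves the normal equations

B(ω) F^X_ω = F^{YX}_ω,    ω ∈ [−π, π],

so recovering the coefficients b_h requires inverting, at each frequency, a truncated or ridge-regularized version of the spectral density operator F^X_ω; this is exactly where harmonic or dynamic PCA enters.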
... Regarding stationarity assumptions, Horváth, Kokoszka, and Rice (2014) extended KPSS statistics to test the null hypothesis of FTS stationarity. In contrast, Panaretos and Tavakoli (2013) and Hörmann, Kidziński, and Hallin (2015) utilized Fourier analysis to introduce dynamic FPCA for stationary FTS. Vector autoregressive (VAR) models for forecasting FTS were introduced by Aue, Norinho, and Hörmann (2015). ...
... As observed in FPCA methodology, the FPCs solely rely on information from the autocovariance operator at lag 0. However, to leverage information from other lags, the concept of dynamic FPCA was introduced by Hörmann et al. (2015) and Panaretos and Tavakoli (2013). Moreover, the existing FPCA method has been developed for independent observations, which is a serious weakness when we are dealing with FTS data. ...
Article
Full-text available
Functional time series (FTS) analysis has emerged as a potent framework for modeling and forecasting time‐dependent data with functional attributes. In this comprehensive review, we navigate through the intricate landscape of FTS methodologies, meticulously surveying the core principles of univariate FTS and delving into the nuances of multivariate FTS. The journey commences with an exploration of the foundational aspects of univariate FTS analysis. We delve into representation, estimation, and modeling, spotlighting the effectiveness of various parametric and nonparametric models at our disposal. The stage then transitions to multivariate FTS analysis, where we confront the intricacies posed by high‐dimensional data. We explore strategies for dimensionality reduction, forecasting, and the integration of diverse parametric and nonparametric models within the multivariate realm. We also highlight commonly used R packages for modeling and forecasting FTS and multivariate FTS. Acknowledging the dynamic evolution of the field, we dissect challenges and chart future directions, paving a course for refinement and innovation. Through a fusion of multivariate statistics, functional analysis, and time series forecasting, this review underscores the interdisciplinary essence of FTS analysis. It not only reveals past accomplishments, but also illuminates the potential of FTS in unraveling insights and facilitating well‐informed decisions across diverse domains. This article is categorized under: Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
... which encapsulates the second-order stochastic dynamics of the process. Panaretos and Tavakoli [32,33] introduced a frequency-domain approach to functional time series analysis. Their method develops a spectral representation for stationary sequences that simultaneously captures both the within-curve dynamics (the dynamics of the curve {X_0(τ) : τ ∈ [0, 1]}) and the between-curve dynamics (the dynamics of the sequence {X_t : t ∈ ℤ}). ...
... Given an m-vector whose components are stationary functional time series, say (X^(1), …, X^(m)), one can postulate a model where each X^(j) is generated conditionally on the realisation of (a random) spectral density F^(j), via a Cramér–Karhunen–Loève expansion [32]. Conversely, one may conduct a PCA on the spectral densities, i.e. the flows of spectral density operators F^(1), … ...
Preprint
Full-text available
We develop a statistical framework for conducting inference on collections of time-varying covariance operators (covariance flows) over a general, possibly infinite-dimensional, Hilbert space. We model the intrinsically non-linear structure of covariances by means of the Bures–Wasserstein metric geometry. We make use of the Riemannian-like structure induced by this metric to define a notion of mean and covariance of a random flow, and develop an associated Karhunen–Loève expansion. We then treat the problem of estimation and construction of functional principal components from a finite collection of covariance flows, observed fully or irregularly. Our theoretical results are motivated by modern problems in functional data analysis, where one observes operator-valued random processes – for instance when analysing dynamic functional connectivity and fMRI data, or when analysing multiple functional time series in the frequency domain. Nevertheless, our framework is also novel in finite dimensions (the matrix case), and we demonstrate what simplifications can be afforded then. We illustrate our methodology by means of simulations and data analyses.
... Since we do not perform such a preliminary FPCA approximation, the common component does not get lost with our method. A similar point can be made if one performs, as a preliminary step, the usual FPCA projection of each series based on the lag-0 covariance operators E[x_{i0} ⊗ x_{i0}] (as opposed to the long-run covariance), or even a more sophisticated dynamic (or harmonic) FPCA projection (Panaretos and Tavakoli, 2013a; Hörmann, Kidziński and Hallin, 2015). ...
... That factor model is only a particular case of the general (also called generalized) dynamic factor model introduced by Forni et al. (2000), where factors are loaded in a fully dynamic way via filters (see Hallin and Lippi (2013) for the advantages of that generalized approach). An extension of the general dynamic factor model to the functional setting is, of course, highly desirable, but requires a more general representation result involving filters with operatorial coefficients and the concept of functional dynamic or harmonic principal components (Panaretos and Tavakoli, 2013a; Hörmann, Kidziński and Hallin, 2015). (ii) Another crucial issue is the analysis of volatilities. ...
Article
In this paper, which consists of two parts (Part I: representation results; Part II: estimation and forecasting methods), we set up the theoretical foundations for a high‐dimensional functional factor model approach in the analysis of large cross‐sections (panels) of functional time series (FTS). In Part I, we establish a representation result stating that, under mild assumptions on the covariance operator of the cross‐section, we can represent each FTS as the sum of a common component driven by scalar factors loaded via functional loadings, and a mildly cross‐correlated idiosyncratic component. Our model and theory are developed in a general Hilbert space setting that allows for mixed panels of functional and scalar time series. We then turn, in Part II, to the identification of the number of factors, and the estimation of the factors, their loadings, and the common components. We provide a family of information criteria for identifying the number of factors, and prove their consistency. We provide average error bounds for the estimators of the factors, loadings, and common components; our results encompass the scalar case, for which they reproduce and extend, under weaker conditions, well‐established similar results. Under slightly stronger assumptions, we also provide uniform bounds for the estimators of factors, loadings, and common components, thus extending existing scalar results. Our consistency results in the asymptotic regime where the number N of series and the number T of time observations diverge thus extend to the functional context the "blessing of dimensionality" that explains the success of factor models in the analysis of high‐dimensional (scalar) time series. We provide numerical illustrations that corroborate the convergence rates predicted by the theory, and provide a finer understanding of the interplay between N and T for estimation purposes. We conclude with an application to forecasting mortality curves, where we demonstrate that our approach outperforms existing methods.
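Schematically, the Part I representation result says that each functional series in the panel splits as (symbols are for illustration, not the paper's notation)

X_{it} = χ_{it} + ξ_{it},    χ_{it} = ∑_{k=1}^{K} b_{ik} u_{kt},

with a small number K of scalar factors u_{kt}, functional loadings b_{ik} in the underlying Hilbert space, and idiosyncratic components ξ_{it} that are only mildly cross-correlated.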
... Particularly, the asymptotic normality of the functional discrete Fourier transform (fDFT) of the curve data is proved (see also [38]). In [32], a harmonic principal component analysis of functional time series, based on a Karhunen–Loève-like decomposition in the temporal functional spectral domain, is proposed: the so-called Cramér–Karhunen–Loève representation (see also [35]; [36]). Some recent applications in the context of functional regression are obtained in [33]. ...
... From equation (32), for ω ∈ [−π, π], the projection ...
Preprint
This paper introduces the statistical analysis of Jacobi frequency varying Long Range Dependence (LRD) functional time series in connected and compact two-point homogeneous spaces. The convergence to zero, in the Hilbert-Schmidt operator norm, of the integrated bias of the periodogram operator is proved under alternative conditions to the ones considered in Ruiz-Medina (2022). Under this setting of conditions, weak consistency of the minimum contrast parameter estimator of the LRD operator holds. The case where the projected manifold process can display Short Range Dependence (SRD) and LRD at different manifold scales is also analyzed. The estimation of the spectral density operator is addressed in this case. The performance of both estimation procedures is illustrated in the simulation study undertaken within the families of multifractionally integrated spherical functional autoregressive-moving average (SPHARMA) processes.
... and their corresponding norms, which are respectively equivalent to ‖·‖_{L²(S²×S²)} and ‖(D ⊗ D)·‖_{L²(S²×S²)}. Accordingly, the set {ψ_j ⊗ ψ_j} forms an orthonormal basis for L²(S²×S²) endowed with (34) and an orthogonal basis for H_p endowed with (35). All the results can then be extended. ...
... Note that cumulant mixing conditions arise naturally in this context, since we are essentially dealing with moment-based estimators. See for instance [34,41]. ...
... The sought representation is in terms of a Fourier series built using the eigenfunctions {ϕ_k} of the integral operator R with kernel Cov(X(t), X(s)). Such a finite-dimensional representation is key in functional data analysis: not only does it serve as a basis for motivating methodology by analogy to multivariate statistics, but it constitutes the canonical means of regularization in regression, testing, and prediction, which are all ill-posed inverse problems when dealing with functional data; see Panaretos and Tavakoli [18] for an account of the genesis and evolution of functional PCA and Wang et al. [22] for an overview of its manifold applications in functional data analysis. ...
Preprint
Functional data analyses typically proceed by smoothing, followed by functional PCA. This paradigm implicitly assumes that rough variation is due to nuisance noise. Nevertheless, relevant functional features such as time-localised or short scale fluctuations may indeed be rough relative to the global scale, but still smooth at shorter scales. These may be confounded with the global smooth components of variation by the smoothing and PCA, potentially distorting the parsimony and interpretability of the analysis. The goal of this paper is to investigate how both smooth and rough variations can be recovered on the basis of discretely observed functional data. Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, we develop identifiability conditions for the recovery of the two corresponding covariance operators. The key insight is that they should possess complementary forms of parsimony: one smooth and finite rank (large scale), and the other banded and potentially infinite rank (small scale). Our conditions elucidate the precise interplay between rank, bandwidth, and grid resolution. Under these conditions, we show that the recovery problem is equivalent to rank-constrained matrix completion, and exploit this to construct estimators of the two covariances, without assuming knowledge of the true bandwidth or rank; we study their asymptotic behaviour, and then use them to recover the smooth and rough components of each functional datum by best linear prediction. As a result, we effectively produce separate functional PCAs for smooth and rough variation.
... Panaretos and Tavakoli [13] for a motivation. The spectral representations in Theorems 2 and 3 are directly related to this methodology. ...
Preprint
Many dynamical phenomena display a cyclic behavior, in the sense that time can be partitioned into units within which distributional aspects of a process are homogeneous. In this paper, we introduce a class of models - called conjugate processes - allowing the sequence of marginal distributions of a cyclic, continuous-time process to evolve stochastically in time. The connection between the two processes is given by a fundamental compatibility equation. Key results include Laws of Large Numbers in the presented framework. We provide a constructive example which illustrates the theory, and give a statistical implementation to risk forecasting in financial data.
... They have an established place in FDA research with readily available R and MATLAB implementations. In principle, other basis systems could be used, especially those custom-developed for time series of functions, see, e.g., Hörmann et al. [22] and Panaretos and Tavakoli [36], or even those going beyond linear dimension reduction, see Li and Song [30]. In each of these cases, our general approach could be applied to the resulting scores, but new asymptotic justifications and numerical implementations would have to be developed. ...
Preprint
Functional panels are collections of functional time series, and arise often in the study of high frequency multivariate data. We develop a portmanteau style test to determine if the cross-sections of such a panel are independent and identically distributed. Our framework allows the number of functional projections and/or the number of time series to grow with the sample size. A large sample justification is based on a new central limit theorem for random vectors of increasing dimension. With a proper normalization, the limit is standard normal, potentially making this result easily applicable in other FDA context in which projections on a subspace of increasing dimension are used. The test is shown to have correct size and excellent power using simulated panels whose random structure mimics the realistic dependence encountered in real panel data. It is expected to find application in climatology, finance, ecology, economics, and geophysics. We apply it to Southern Pacific sea surface temperature data, precipitation patterns in the South-West United States, and temperature curves in Germany.
... Hence FPCA might not be optimal for functional time series. In Hörmann et al. [11] and Panaretos and Tavakoli [12] an optimal dimension reduction for dependent data is introduced. They propose a filtering technique based on a frequency domain approach, which reduces the dimension in such a way that the score vectors form a multivariate time series with diagonal lagged covariance matrices. ...
Preprint
This work is devoted to functional ARMA(p, q) processes and approximating vector models based on functional PCA in the context of prediction. After deriving sufficient conditions for the existence of a stationary solution to both the functional and the vector model equations, the structure of the approximating vector model is investigated. The stationary vector process is used to predict the functional process. A bound for the difference between the vector and functional best linear predictors is derived. The paper concludes by applying functional ARMA processes for the modeling and prediction of highway traffic data.
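For reference, the functional ARMA(p, q) equation studied there can be written in standard operator form (a sketch; the paper's precise operator-norm conditions are what guarantee a stationary solution):

X_t = ∑_{j=1}^{p} Φ_j(X_{t−j}) + ε_t + ∑_{k=1}^{q} Θ_k(ε_{t−k}),

where Φ_1, …, Φ_p and Θ_1, …, Θ_q are bounded linear operators on the function space and (ε_t) is a Hilbert-space-valued white noise.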
... The consistency of the weighted periodogram operator, in the sense of the integrated mean square error, as well as pointwise, in the Hilbert-Schmidt operator norm, is also derived under SRD. In [21], a harmonic principal component analysis of functional time series in the temporal functional spectral domain is derived, based on a Karhunen–Loève-like decomposition, the so-called Cramér–Karhunen–Loève representation (see also [26]; [27]). In the context of functional regression, some applications are presented in [22] and [30]. ...
Preprint
A statistical hypothesis test for long range dependence (LRD) in manifold-supported functional time series is formulated in the spectral domain. The proposed test statistic operator is based on the weighted periodogram operator. It is assumed that the elements of the spectral density operator family are invariant with respect to the group of isometries of the manifold. A Central Limit Theorem is derived to obtain the asymptotic Gaussian distribution of the proposed test statistics operator under the null hypothesis. The rate of convergence to zero, in the Hilbert–Schmidt operator norm, of the bias of the integrated empirical second and fourth order cumulant spectral density operators is established under the alternative hypothesis. The consistency of the test is derived from the consistency, in the sense of the integrated mean square error, of the weighted periodogram operator under LRD. Our proposal to implement, in practice, the testing approach is based on the temporal-frequency-varying Karhunen–Loève expansion obtained here for invariant random Hilbert-Schmidt kernels on manifolds. A simulation study illustrates the main results regarding asymptotic normality and consistency, and the empirical size and power properties of the proposed testing approach.
... If X is Gaussian, then the η j are independent and standard normal. Non-Gaussian examples are given by elliptical models and factor models, see for instance Lopes [35] and Jirak and Wahl [26] for concrete computations or Hörmann et al. [19] and Panaretos and Tavakoli [42] for the context of functional data analysis. ...
Article
Full-text available
Given finite i.i.d. samples in a Hilbert space with zero mean and trace-class covariance operator Σ, the problem of recovering the spectral projectors of Σ naturally arises in many applications. In this paper, we consider the problem of finding distributional approximations of the spectral projectors of the empirical covariance operator Σ̂, and offer a dimension-free framework where the complexity is characterized by the so-called relative rank of Σ. In this setting, novel quantitative limit theorems and bootstrap approximations are presented subject to mild conditions in terms of moments and spectral decay. In many cases, these even improve upon existing results in a Gaussian setting.
... The approach of going to the Fourier domain analysis bypasses this issue. There are further related papers dealing with Fourier domain analysis: e.g., Aue and van Delft [17] and Panaretos and Tavakoli [18,19]. ...
... The traditional FPCA method, also known as static FPCA, was developed for independent observations, a major drawback of working with FTS. The dynamic FPCA approach is an improved alternative for FTS forecasting (see, e.g., Hörmann et al., 2015; Panaretos and Tavakoli, 2013; Rice and Shang, 2017). Antoniadis et al. (2006, 2016) execute one-step-ahead prediction using a nonparametric wavelet kernel; pointwise prediction intervals are produced using a re-sampling approach. Some other contributions, such as Raña et al. (2016) and Vilar et al. (2018), use model-based bootstrap procedures for constructing pointwise prediction intervals for one-step-ahead prediction. ...
Preprint
Full-text available
We consider modeling and forecasting high-dimensional functional time series (HDFTS), which can be cross-sectionally correlated and temporally dependent. We present a novel two-way functional median polish decomposition, which is robust against outliers, to decompose HDFTS into deterministic and time-varying components. A functional time series forecasting method, based on dynamic functional principal component analysis, is implemented to produce forecasts for the time-varying components. By combining the forecasts of the time-varying components with the deterministic components, we obtain forecast curves for multiple populations. Illustrated by the age- and sex-specific mortality rates in the US, France, and Japan, which contain 51 states, 95 departments, and 47 prefectures, respectively, the proposed model delivers more accurate point and interval forecasts in forecasting multi-population mortality than several benchmark methods.
... That factor model is only a particular case of the general (also called generalized) dynamic factor model introduced by Forni et al. (2000), where factors are loaded in a fully dynamic way via filters (see Hallin and Lippi (2013) for the advantages of that generalized approach). An extension of the general dynamic factor model to the functional setting is, of course, highly desirable, but requires a more general representation result involving filters with operatorial coefficients and the concept of functional dynamic or harmonic principal components (Panaretos and Tavakoli, 2013; Hörmann et al., 2015). (iii) The problem of identifying the number of factors is only briefly considered here, and requires further attention. ...
Article
This paper is the second one in a set of two laying the theoretical foundations for a high‐dimensional functional factor model approach in the analysis of large cross‐sections (panels) of functional time series (FTS). Part I establishes a representation result by which, under mild assumptions on the covariance operator of the cross‐section, any FTS admits a canonical representation as the sum of a common and an idiosyncratic component; common components are driven by a finite and typically small number of scalar factors loaded via functional loadings, while idiosyncratic components are only mildly cross‐correlated. Building on that representation result, Part II deals with the identification of the number of factors, their estimation, the estimation of their loadings and the common components, and the resulting forecasts. We provide a family of information criteria for identifying the number of factors, and prove their consistency. We provide average error bounds for the estimators of the factors, loadings, and common components; our results encompass the scalar case, for which they reproduce and extend, under weaker conditions, well‐established similar results. Under slightly stronger assumptions, we also provide uniform bounds for the estimators of factors, loadings, and common components, thus extending existing scalar results. Our consistency results in the asymptotic regime where the number N of series and the number T of time points diverge thus extend to the functional context the "blessing of dimensionality" that explains the success of factor models in the analysis of high‐dimensional (scalar) time series. We provide numerical illustrations that corroborate the convergence rates predicted by the theory, and provide a finer understanding of the interplay between N and T for estimation purposes. We conclude with an application to forecasting mortality curves, where our approach outperforms existing methods.
... Very recently, the problem of investigating sequences of serially-dependent spherical random fields and their second-order structure has also been addressed using a functional time series approach in [5,8,6,7]. Functional time series analysis usually refers to methods and techniques developed for collections of Hilbert-valued random variables indexed by the integers, where the index is interpreted as time (see for instance [22,23]). In this framework, the assumption of spatial isotropy can be relaxed (see, e.g., [7]), while the hypothesis of temporal stationarity is usually preserved. ...
Preprint
Full-text available
In this paper, we introduce the concept of an isotropic Hilbert-valued spherical random field, thus extending the notion of isotropic spherical random field to an infinite-dimensional setting. We then establish a spectral representation theorem and a functional Schoenberg's theorem. Following some key results established for the real-valued case, we prove consistency and a quantitative central limit theorem for the sample power spectrum operators in the high-frequency regime.
... The term "Cramér–Karhunen–Loève" was coined in Panaretos and Tavakoli (2013a). This amounts to giving a rigorous meaning to the formula ...
Thesis
The analysis of electrical load curves collected by smart meters is a key step for many energy management tasks ranging from consumption forecasting and load monitoring to customers characterization and segmentation. In this context, researchers from EDF R&D are interested in extracting significant information from the daily electrical load curves in order to compare the consumption behaviors of different buildings. The strategy followed by the group which hosted my doctorate is to use physical and deterministic models based on information such as the room size, the insulating materials or weather data, or to extract hand-designed patterns from the electrical load curves based on the knowledge of experts. Given the growing amount of data collected, the interest of the group in statistical or data-driven methods has increased significantly in recent years. These approaches should provide new solutions capable of exploiting massive data without relying on expensive processing and expert knowledge. My work fits directly into this trend by proposing two modeling approaches: the first approach is based on functional time series and the second one is based on non-negative tensor factorization. This thesis is split into three main parts. In the first part, we present the industrial context and the practical objective of the thesis, as well as an exploratory analysis of the data and a discussion on the two modeling approaches proposed. In the second part, we follow the first modeling approach and provide a thorough study of the spectral theory for functional time series. Finally, the second modeling approach based on non-negative tensor factorization is presented in the third part.
... This enables the approximation of the curves, using orthogonal PCs which involve lagged observations, based on information available at nearby observations. Panaretos and Tavakoli (2013) proposed a similar idea for extending dynamic PCA to FTS, in which the reconstructed curves are based on a stochastic integration with respect to some orthogonal-increment functional stochastic process. Neither of these recent developments accounts for non-stationarity in the covariance structure of an FTS over time. ...
Article
Full-text available
Outgassing of carbon dioxide (CO₂) from river surface waters, estimated using partial pressure of dissolved CO₂, has recently been considered an important component of the global carbon budget. However, little is still known about the high-frequency dynamics of CO₂ emissions in small-order rivers and streams. To analyse such highly dynamic systems, we propose a time-varying functional principal components analysis (FPCA) for non-stationary functional time series (FTS). This time-varying FPCA is performed in the frequency domain to investigate how the variability and auto-covariance structures in a FTS change over time. This methodology, and the associated proposed inference, enables investigation of the changes over time in the variability structure of the diurnal profiles of the partial pressure of CO₂ and identification of the drivers of those changes. By means of a simulation study, the performance of the time-varying dynamic FPCs is investigated under different scenarios of complete and incomplete FTS. Although the time-varying dynamic FPCA has been applied here to study the daily processes of consuming and producing CO₂ in a small catchment of the river Dee in Scotland, this methodology can be applied more generally to any dynamic time series. Supplementary materials accompanying this paper appear online.
... It allows the assessment of the eigenvalues associated with the eigenfunctions (extension of the Hilbert-Schmidt theorem [57]) of χ, according to its approximation based on the above basis decomposition. From linear algebra, the resulting FPCA is expressed by X following the Karhunen–Loève decomposition [58] ...
Preprint
Full-text available
Computational modeling of the manufacturing process of Lithium-Ion Battery (LIB) composite electrodes based on mechanistic approaches allows predicting the influence of manufacturing parameters on electrode properties. However, ensuring that the calculated properties match well with experimental data is typically time- and resource-consuming. In this work, we tackled this issue by proposing a functional data-driven framework combining Functional Principal Component Analysis and K-Nearest Neighbors algorithms. This aims first to recover the early numerical values of a mechanistic electrode manufacturing simulation to predict whether the observable being calculated is prone to match or not (screening step), and in a second step to recover additional numerical values of the ongoing mechanistic simulation iterations to predict the mechanistic simulation result (forecasting step). We demonstrated this approach in the context of LIB manufacturing through non-equilibrium molecular dynamics (NEMD) simulations, aiming to capture the rheological behavior of electrode slurries. We discuss our novel methodology in full detail, and we report that the expected mechanistic simulation results can be obtained 11 times faster than running the complete mechanistic simulation, while being accurate enough from an experimental point of view, with an F1 score of 0.90 and an R² of 0.96 in validation. This paves the way towards a powerful tool to drastically reduce the utilization of computational resources while running mechanistic simulations of battery electrode manufacturing.
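The Karhunen–Loève decomposition invoked in the entry above takes the standard FDA form (standard notation; the truncation level K is chosen in practice by, e.g., explained variance):

X(τ) = μ(τ) + ∑_{k≥1} ξ_k φ_k(τ) ≈ μ(τ) + ∑_{k=1}^{K} ξ_k φ_k(τ),    ξ_k = ∫ (X(τ) − μ(τ)) φ_k(τ) dτ,

where the φ_k are the eigenfunctions of the covariance operator and the scores ξ_k are uncorrelated, with variances equal to the corresponding eigenvalues.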
... Note that cumulant mixing conditions arise naturally in this context, since we are essentially dealing with moment-based estimators. See for instance [34,41]. ...
Preprint
Full-text available
We propose nonparametric estimators for the second-order central moments of spherical random fields within a functional data context. We consider a measurement framework where each field among an identically distributed collection of spherical random fields is sampled at a few random directions, possibly subject to measurement error. The collection of fields could be i.i.d. or serially dependent. Though similar setups have already been explored for random functions defined on the unit interval, the nonparametric estimators proposed in the literature often rely on local polynomials, which do not readily extend to the (product) spherical setting. We therefore formulate our estimation procedure as a variational problem involving a generalized Tikhonov regularization term. The latter favours smooth covariance/autocovariance functions, where the smoothness is specified by means of suitable Sobolev-like pseudo-differential operators. Using the machinery of reproducing kernel Hilbert spaces, we establish representer theorems that fully characterize the form of our estimators. We determine their uniform rates of convergence as the number of fields diverges, both for the dense (increasing number of spatial samples) and sparse (bounded number of spatial samples) regimes. We moreover validate and demonstrate the practical feasibility of our estimation procedure in a simulation setting, assuming a fixed number of samples per field. Our numerical estimation procedure leverages the sparsity and second-order Kronecker structure of our setup to reduce the computational and memory requirements by approximately three orders of magnitude compared to what a naive implementation would require.
... The asymptotic normality of the functional discrete Fourier transform of the curve data was previously proved, under suitable functional cumulant mixing conditions and summability in time of the trace norm of the elements of the covariance operator family (see also Tavakoli [71]). In Panaretos and Tavakoli [62], a Karhunen–Loève-like decomposition in the temporal functional spectral domain is derived, the so-called Cramér–Karhunen–Loève representation, providing a harmonic principal component analysis of functional time series (see also some recent applications in the context of functional regression in Pham and Panaretos [64], and Rubin and Panaretos [66]). In addition, Rubin and Panaretos [67] propose simulation techniques based on the Cramér–Karhunen–Loève representation. ...
Preprint
COVID-19 incidence is analyzed for the provinces of some Spanish Communities during the period February–October 2020. Two infinite-dimensional regression approaches are tested. The first one is implemented in the regression framework introduced in Ruiz-Medina, Miranda and Espejo (2019). Specifically, a Bayesian framework is adopted in the estimation of the pure point spectrum of the temporal autocorrelation operator, characterizing the second-order structure of a surface sequence. The second approach is formulated in the context of spatial curve regression. A nonparametric estimator of the spectral density operator, based on the spatial periodogram operator, is computed to approximate the spatial correlation between curves. Dimension reduction is achieved by projection onto the empirical eigenvectors of the long-run spatial covariance operator. Cross-validation procedures are implemented to test the performance of the two functional regression approaches.
... In high-dimensional data analysis, dimension-reduction techniques, such as principal component analysis (PCA) (see Jolliffe and Cadima, 2016, for an extensive review) or factor analysis (Gaskin and Happell, 2014), are applied first. In modeling functional time series, to incorporate the temporal dependence within each curve, functional versions of dynamic principal component analysis (Panaretos and Tavakoli, 2013a; Rice and Shang, 2017) and functional versions of factor models (Hays et al., 2012; Liebl et al., 2013; Jungbacker et al., 2014; Kokoszka et al., 2015; Martínez-Hernández et al., 2020) are adopted. ...
Preprint
Full-text available
This paper proposes a two-fold factor model for high-dimensional functional time series (HDFTS), which enables the modeling and forecasting of multi-population mortality under the functional data framework. The proposed model first decomposes the HDFTS into functional time series with lower dimensions (common feature) and a system of basis functions specific to different cross-sections (heterogeneity). Then the lower-dimensional common functional time series are further reduced into low-dimensional scalar factor matrices. The dimensionally reduced factor matrices can reasonably convey useful information in the original HDFTS. All the temporal dynamics contained in the original HDFTS are extracted to facilitate forecasting. The proposed model can be regarded as a general case of several existing functional factor models. Through a Monte Carlo simulation, we demonstrate the performance of the proposed method in model fitting. In an empirical study of the Japanese subnational age-specific mortality rates, we show that the proposed model produces more accurate point and interval forecasts in modeling multi-population mortality than the existing functional factor models. The financial impact of the improvements in forecasts is demonstrated through comparisons in life annuity pricing practices.
... There are two general ways in the literature to reduce dimensions and capture the temporal dependence among functions simultaneously. On the one hand, Panaretos and Tavakoli (2013), Hörmann et al. (2015), Rice and Shang (2017), and Martínez-Hernández et al. (2020) adopt a dynamic functional principal component analysis (FPCA) approach to incorporate temporal dependence in the long-run covariance function. Hays et al. (2012), Liebl (2013), Jungbacker et al. (2014), and Kokoszka et al. (2015) develop functional dynamic factor models, in which correlated functions are reduced to a smaller set of latent dynamic factors. ...
Article
Full-text available
Financial data (e.g., intraday share prices) are recorded almost continuously and thus take the form of a series of curves over the trading days. Those sequentially collected curves can be viewed as functional time series. When we have a large number of highly correlated shares, their intraday prices can be viewed as high-dimensional functional time series (HDFTS). In this paper, we propose a new approach to forecasting multiple financial functional time series that are highly correlated. The difficulty of forecasting high-dimensional functional time series lies in the "curse of dimensionality." What complicates this problem is modeling the autocorrelation in the price curves and the comovement of multiple share prices simultaneously. To address these issues, we apply a matrix factor model to reduce the dimension. The matrix structure is maintained, as the information contained in the rows and columns of a matrix is interrelated. An application to the constituent stocks in the Dow Jones index shows that our approach can improve both dimension reduction and forecasting results when compared with various existing methods.
... These monographs, as well as a great number of theory/methods papers, assume that one can observe the complete sample path (see, e.g., Dauxois et al. [3], Hall and Hosseini-Nasab [7], Hörmann and Kokoszka [10,11], Hall and Horowitz [6], Panaretos and Tavakoli [13,14], Delaigle and Hall [4]). This "complete observation" framework encompasses processes that can be non-smooth, even nowhere differentiable, such as diffusion processes. ...
Preprint
Full-text available
Functional data are typically modeled as sampled paths of smooth stochastic processes in order to mitigate the fact that they are often observed discretely and noisily, occasionally irregularly and sparsely. The required smoothness allows for the use of smoothing techniques but excludes many stochastic processes, most notably diffusion processes. Such processes would otherwise be well within the realm of functional data analysis, at least under complete observation. In this short note we demonstrate that a simple modification of existing methods allows for the functional data analysis of processes with nowhere differentiable sample paths, even when these are discretely and noisily observed, including under irregular and sparse designs. By way of simulation it is shown that this is not a theoretical curiosity, but can work well in practice, hinting at potential closer links with the field of diffusion inference.
... that is, the Y_{ℓ,m}'s are eigenfunctions of R_t and the C_ℓ(t)'s are the associated eigenvalues [see Michel (2013), Theorem 7.3 and Corollary 7.4]. Consequently, the autocovariance kernels can be expanded in L²(S² × S²; ℂ) as ...
Article
Full-text available
In this paper, we focus on isotropic and stationary sphere-cross-time random fields. We first introduce the class of spherical functional autoregressive-moving average processes (SPHARMA), which extend in a natural way the spherical functional autoregressions (SPHAR) recently studied in Caponera and Marinucci (Ann Stat 49(1):346–369, 2021) and Caponera et al. (Stoch Process Appl 137:167–199, 2021); more importantly, we then show that SPHAR and SPHARMA processes of sufficiently large order can be exploited to approximate every isotropic and stationary sphere-cross-time random field, thus generalizing to this infinite-dimensional framework some classical results on real-valued stationary processes. Further characterizations in terms of functional spectral representation theorems and Wold-like decompositions are also established.
Preprint
We consider nonparametric estimation of a covariance function on the unit square, given a sample of discretely observed fragments of functional data. When each sample path is only observed on a subinterval of length $\delta < 1$, one has no statistical information on the unknown covariance outside a $\delta$-band around the diagonal. The problem seems unidentifiable without parametric assumptions, but we show that nonparametric estimation is feasible under suitable smoothness and rank conditions on the unknown covariance. This remains true even when observation is discrete, and we give precise deterministic conditions on how fine the observation grid needs to be relative to the rank and fragment length for identifiability to hold true. We show that our conditions translate the estimation problem to a low-rank matrix completion problem, construct a nonparametric estimator in this vein, and study its asymptotic properties. We illustrate the numerical performance of our method on real and simulated data.
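To make the matrix completion connection concrete, here is a generic numpy sketch: a low-rank covariance observed only on a band around the diagonal is completed by hard-impute iterations (fill the unobserved entries with the current estimate, then re-project to low rank). This stands in for, and does not reproduce, the paper's estimator.

```python
import numpy as np

def complete_banded_covariance(C_band, mask, rank, n_iter=200):
    """Complete a covariance matrix observed only where mask == 1,
    assuming low rank: hard-impute via truncated eigendecomposition."""
    C_hat = C_band.copy()
    for _ in range(n_iter):
        filled = mask * C_band + (1 - mask) * C_hat  # keep observed band
        w, V = np.linalg.eigh(filled)
        idx = np.argsort(w)[::-1][:rank]             # leading eigenpairs
        C_hat = (V[:, idx] * w[idx]) @ V[:, idx].T
    return C_hat

# example: a rank-2 covariance observed on a band of half-width 15 out of 60
m, delta = 60, 15
t = np.linspace(0, 1, m)
phi = np.stack([np.ones(m), np.sqrt(3) * (2 * t - 1)])   # two smooth factors
C = phi.T @ np.diag([1.0, 0.5]) @ phi
mask = (np.abs(np.subtract.outer(np.arange(m), np.arange(m))) <= delta).astype(float)
C_hat = complete_banded_covariance(C * mask, mask, rank=2)
print(np.max(np.abs(C_hat - C)))   # small if the completion succeeded
```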
Preprint
We consider infinite-dimensional Hilbert space-valued random variables that are assumed to be temporally dependent in a broad sense. We prove a central limit theorem for the moving block bootstrap and for the tapered block bootstrap, and show that these block bootstrap procedures also provide consistent estimators of the long run covariance operator. Furthermore, we consider block bootstrap-based procedures for fully functional testing of the equality of mean functions between several independent functional time series. We establish validity of the block bootstrap methods in approximating the distribution of the statistic of interest under the null and show consistency of the block bootstrap-based tests under the alternative. The finite sample behaviour of the procedures is investigated by means of simulations. An application to a real-life dataset is also discussed.
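A minimal numpy sketch of the moving block bootstrap for the sample mean function follows; the block length, replicate count, and the confidence-band usage in the trailing comments are illustrative choices rather than the paper's recommendations.

```python
import numpy as np

def moving_block_bootstrap_mean(X, block_len, n_boot=500, seed=0):
    """Moving block bootstrap for a functional time series.
    X: (T, p) array, each row one curve on a grid of p points.
    Returns replicates of sqrt(T) * (bootstrap mean - sample mean)."""
    rng = np.random.default_rng(seed)
    T, p = X.shape
    n_blocks = int(np.ceil(T / block_len))
    mean = X.mean(axis=0)
    reps = np.empty((n_boot, p))
    for b in range(n_boot):
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:T]
        reps[b] = np.sqrt(T) * (X[idx].mean(axis=0) - mean)
    return reps

# pointwise 95% band for the mean curve of a (T, p) sample X:
#   lo, hi = np.percentile(moving_block_bootstrap_mean(X, 10), [2.5, 97.5], axis=0)
#   band = (X.mean(0) - hi / np.sqrt(len(X)), X.mean(0) - lo / np.sqrt(len(X)))
```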
Preprint
Within the framework of functional data analysis, we develop principal component analysis for periodically correlated time series of functions. We define the components of the above analysis, including periodic, operator-valued filters, score processes, and the inversion formulas. We show that these objects are defined via convergent series under a simple condition requiring summability of the Hilbert–Schmidt norms of the filter coefficients, and that they possess optimality properties. We explain how the Hilbert space theory reduces to an approximate finite-dimensional setting, which is implemented in a custom-built R package. A data example and a simulation study show that the new methodology is superior to existing tools when the functional time series exhibit periodic characteristics.
Article
Functional data have been gaining increasing popularity in the field of time series analysis. However, modeling heterogeneous multivariate functional time series remains a research gap. To fill it, this paper proposes a time-varying functional state space model (TV-FSSM). It uses functional decomposition to extract features of the functional observations, where the decomposition coefficients are regarded as latent states that evolve according to a tensor autoregressive model. This two-layer structure can, on the one hand, efficiently extract continuous functional features and, on the other, provide a flexible and generalized description of data heterogeneity across time points. An expectation-maximization (EM) framework is developed for parameter estimation, where regularization and constraints are incorporated for better model interpretability. As the sample size grows, an incremental learning version of the EM algorithm is given to efficiently update the model parameters. Some model properties, including identifiability conditions, convergence issues, time complexities, and bounds on its one-step-ahead prediction errors, are also presented. Extensive experiments on both real and synthetic datasets are performed to evaluate the predictive accuracy and efficiency of the proposed framework.
Article
Full-text available
Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a dataset on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available through a code and data supplement and on GitHub.
Article
We consider time series with a seasonal component that varies randomly in length and shape. The shape parameters of the seasonal process, as well as the noise component, are stationary and exhibit long-range dependence. A functional limit theorem for the estimated parameter process leads to asymptotic inference under suitable conditions on the observational grid. The model is motivated by a study of the effect of body positioning on respiratory muscles during weaning (Walterspacher et al. 2017).
Article
Full-text available
In this work, we develop two stable estimators for solving linear functional regression (LFR) problems. It is well known that such a problem is an ill-posed stochastic inverse problem, so special attention has to be devoted to the stability issue when designing an estimator. Our proposed estimators are based on combining a stable least-squares technique with a random projection of the slope function $\beta_0(\cdot) \in L^2(J)$, where $J$ is a compact interval. Moreover, these estimators have the advantage of a fairly good convergence rate with reasonable computational load, since the involved random projections are generally performed over a fairly small-dimensional subspace of $L^2(J)$. More precisely, the first estimator is given as a least-squares solution of a regularized minimization problem over a finite-dimensional subspace of $L^2(J)$. In particular, we give an upper bound for the empirical risk error as well as the convergence rate of this estimator. The second proposed stable LFR estimator is based on combining the least-squares technique with a dyadic decomposition of the i.i.d. samples of the stochastic process associated with the LFR model. In particular, we provide an $L^2$-risk error of this second LFR estimator. Finally, we provide some numerical simulations on synthetic as well as real data that illustrate the results of this work. These results indicate that our proposed estimators are competitive with some existing and popular LFR estimators.
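A minimal numpy sketch of the random projection idea follows: the slope function is constrained to a small random subspace spanned by Gaussian combinations of a Fourier basis, and the resulting finite-dimensional problem is solved by ordinary least squares. The basis, subspace dimension, and uniform grid are assumptions made here; the paper's stabilized estimators differ in their details.

```python
import numpy as np

def lfr_random_projection(X, y, t, dim=5, seed=0):
    """Estimate the slope in y_i = <beta0, X_i> + eps_i with beta0 restricted
    to a random dim-dimensional subspace of L^2(J).  X: (n, m) curves on a
    uniform grid t; returns beta_hat evaluated on the same grid."""
    rng = np.random.default_rng(seed)
    dt = t[1] - t[0]
    k = np.arange(1, 21)
    B = np.sqrt(2) * np.sin(np.pi * np.outer(k, t))   # (20, m) Fourier-sine basis
    V = rng.normal(size=(dim, 20)) @ B                # random smooth directions
    Z = X @ V.T * dt                                  # design matrix of <X_i, v_j>
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef @ V                                   # beta_hat(t)
```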
Article
Full-text available
This paper addresses the estimation of the second-order structure of a manifold cross-time random field (RF) displaying spatially varying long-range dependence (LRD), adopting the functional time series framework introduced in Ruiz-Medina (Fract Calc Appl Anal 25:1426–1458, 2022). Conditions for the asymptotic unbiasedness of the integrated periodogram operator in the Hilbert–Schmidt operator norm are derived beyond structural assumptions. Weak-consistent estimation of the long-memory operator is achieved under a semiparametric functional spectral framework in the Gaussian context. The case where the projected manifold process can display short-range dependence (SRD) and LRD at different manifold scales is also analyzed. The performance of both estimation procedures is illustrated in a simulation study, in the context of multifractionally integrated spherical functional autoregressive moving average (SPHARMA(p,q)) processes.
Article
Full-text available
We propose a broad class of models for time series of curves (functions) that can be used to quantify near long-range dependence or near unit root behavior. We establish fundamental properties of these models and rates of consistency for the sample mean function and the sample covariance operator. The latter plays a role analogous to sample cross-covariances for multivariate time series, but is far more important in the functional setting because its eigenfunctions are used in principal component analysis, which is a major tool in functional data analysis. It is used for dimension reduction and feature extraction. We also establish a central limit theorem for functions following our model. Both the consistency rates and the normalizations in the CLT are nonstandard. They reflect the local unit root behavior and the long memory structure at moderate lags.
Article
Full-text available
The computational simulation of the manufacturing process of lithium-ion battery composite electrodes based on mechanistic models allows capturing the influence of manufacturing parameters on electrode properties. However, ensuring that these properties match experimental data is typically computationally expensive. In this work, we tackled this costly procedure by proposing a functional data-driven framework, aiming first to use the early numerical values calculated by a molecular dynamics simulation to predict whether the observable being calculated is likely to match our range of experimental values, and, in a second step, to use additional values from the ongoing simulation to predict its final result. We demonstrated this approach in the context of the calculation of electrode slurry viscosities. We report that for various electrode chemistries, the expected mechanistic simulation results can be obtained 11 times faster than with the complete simulations, while remaining accurate, with an $R^2$ score equal to 0.96.
Article
Breathing effort in mechanical ventilation is commonly estimated by airway pressure. More advanced methods involve transdiaphragmatic pressure measurements (Pdi) or surface electromyography (sEMG) of the respiratory muscles. To study whether inspiratory efforts may be predicted by the noninvasive sEMG method, a model is proposed for time series with a stochastically changing periodic component. The model can be interpreted as a functional time series or a process based on a state space representation, with a flexible temporal dependence structure in the parameter process and the residuals, including long memory, short memory, or antipersistence. An application to Pdi and sEMG measurements shows the potential usefulness of the method in the context of monitoring patients undergoing mechanical ventilation.
Article
COVID-19 incidence is analyzed in the provinces of the Spanish Communities of the Iberian Peninsula during the period February–October 2020. Two infinite-dimensional regression approaches, surface regression and spatial curve regression, are proposed. In the first one, Bayesian maximum a posteriori (MAP) estimation is adopted in the approximation of the pure point spectrum of the temporal regression residual autocorrelation operator. Thus, an alternative to the moment-based estimation methodology developed in Ruiz-Medina, Miranda and Espejo (2019) is derived. Additionally, spatial curve regression is considered. A nonparametric estimator of the spectral density operator, based on the spatial periodogram operator, is computed to approximate the spatial correlation between curves. Dimension reduction is achieved by projection onto the empirical eigenvectors of the long-run spatial covariance operator. Cross-validation procedures are implemented to test the performance of the two functional regression approaches.
Article
Full-text available
We develop the basic building blocks of a frequency domain framework for drawing statistical inferences on the second-order structure of a stationary sequence of functional data. The key element in such a context is the spectral density operator, which generalises the notion of a spectral density matrix to the functional setting, and characterises the second-order dynamics of the process. Our main tool is the functional Discrete Fourier Transform (fDFT). We derive an asymptotic Gaussian representation of the fDFT, thus allowing the transformation of the original collection of dependent random functions into a collection of approximately independent complex-valued Gaussian random functions. Our results are then employed in order to construct estimators of the spectral density operator based on smoothed versions of the periodogram kernel, the functional generalisation of the periodogram matrix. The consistency and asymptotic law of these estimators are studied in detail. As immediate consequences, we obtain central limit theorems for the mean and the long-run covariance operator of a stationary functional time series. Our results do not depend on structural modelling assumptions, but only functional versions of classical cumulant mixing conditions, and are shown to be stable under discrete observation of the individual curves.
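To fix ideas, here is a minimal numpy sketch of the pipeline this abstract describes: compute the fDFT on a grid, form rank-one periodogram kernels, and smooth them over neighbouring fundamental frequencies. The rectangular frequency window and the absence of a data taper are simplifications assumed here.

```python
import numpy as np

def spectral_density_operator(X, h=5):
    """Estimate the spectral density operator of a stationary functional time
    series by averaging periodogram kernels over 2h+1 neighbouring
    fundamental frequencies.  X: (T, p) array of curves on a grid of p points.
    Returns the frequencies and a (T, p, p) array of estimated kernels."""
    T, p = X.shape
    Xc = X - X.mean(axis=0)
    # functional discrete Fourier transform at omega_k = 2*pi*k/T
    fdft = np.fft.fft(Xc, axis=0) / np.sqrt(2 * np.pi * T)   # (T, p)
    # periodogram kernel at omega_k: rank-one operator fdft_k (x) conj(fdft_k)
    I = fdft[:, :, None] * np.conj(fdft[:, None, :])         # (T, p, p)
    F_hat = np.zeros_like(I)
    for j in range(-h, h + 1):                               # circular smoothing
        F_hat += np.roll(I, -j, axis=0)
    return 2 * np.pi * np.arange(T) / T, F_hat / (2 * h + 1)
```

Eigendecomposing each estimated kernel (e.g., with np.linalg.eigh, since the estimates are Hermitian) then yields the frequency-wise eigenvalues and eigenfunctions that act as the building blocks of the harmonic principal component analysis.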
Article
Full-text available
This paper reviews recent research on dependent functional data. After providing an introduction to functional data analysis, we focus on two types of dependent functional data structures: time series of curves and spatially distributed curves. We review statistical models, inferential methodology, and possible extensions. The paper is intended to provide a concise introduction to the subject with plentiful references. 1. Introduction. Functional data analysis (FDA) is a relatively new branch of statistics, going back to the early 1990s, but its mathematical foundations are rooted in much earlier developments in the theory of operators in a Hilbert space and in functional analysis. In the most basic setting, the sample consists of curves $X_1(t), \dots, X_n(t)$, $t \in \mathcal{T}$. The set $\mathcal{T}$ is typically an interval of the line. In increasingly many applications, it is however a subset of the plane, or a sphere, or even a 3D region. In those cases, the data are surfaces over a region, or more general functions over some domain, hence the term functional data. This survey is concerned mostly with the analysis of curves, but some references to more general functions are given in Section 1.1. Functional data are high-dimensional data, as, in a statistical model, each function consists of infinitely many values $X(t)$. In traditional statistics, the data consist of a sample of scalars or vectors. For example, for each survey participant, we may record age, gender, income, and education level. The data point thus has dimension four; it is a vector with quantitative and categorical entries. High-dimensional data typically have dimension comparable to or larger than the sample size. As they are often analyzed using regression models in which the sample size is denoted by $n$ and the number of explanatory variables by $p$, high-dimensional data often fall into the "large $p$, small $n$" paradigm, but clearly they form a much broader class, with a great deal of work focusing on covariance matrices based on a sample of $p$-dimensional vectors. A distinctive feature of functional data is that the curves or surfaces are assumed to be smooth in some sense; if $s$ is close to $t$, the values $X(s)$ and $X(t)$ should be similar. In the "large $p$, small $n$" paradigm, there need not be any natural ordering of the covariates or any natural measure of distance between them. The analysis often focuses on the selection of a small number of relevant covariates (the variable selection problem). In FDA, the analysis involves obtaining a smooth, low-dimensional representation of each curve. FDA views each curve in a sample as a separate statistical object. In this sense, FDA is part of object data analysis, in which data points are not scalars or vectors but structures modeled by complex mathematical objects, for example, by graphs. Some references are given in Section 1.1. However, even curves are far more complicated structures than scalars or vectors. The curves are characterized not only by magnitude but also by shape. The shape of a random curve plays a role analogous to the dependence between the coordinates of a random vector. Human growth curves provide a well-known example. Suppose that there are $n$ randomly selected subjects of the same gender. Let $H_i(t)$ be the height of the $i$th subject measured at time $t$ from birth. The measurement times are generally different for different subjects. Using methods of FDA, we can construct continuous and differentiable curves $H_i(t)$.

The shapes and magnitudes of these curves give us an idea about the variability in the process of growth, rather than just about the variability of the final height, which could be assessed using the final heights alone. Some data can be very naturally viewed as curves. For example, if the height measurements are available at a fairly regular and sufficiently dense grid of times, it is easy to visualize them as curves, even though it is not immediately obvious how to compute derivatives of such curves. In many situations, the measurement points are extremely dense. For example, physical instruments may return an observation every five seconds, so in a day we will have 17,280 values. A day is a natural time domain in many applications, and the problem is to replace the 17,280 values available in a day by a smaller, more manageable set of numbers. This is generally possible due to the assumption of some smoothness. At the other extreme are sparse longitudinal data. Such data often arise in medical research. For example, a measurement can be made on a patient only several times during the course of treatment. Yet we know that the quantity that is measured exists at any time, so it is a curve that is observed only at a few sparse time points. References to the relevant functional methodology are given in Section 1.1. Growth curves or sparse observations on a sample of patients can be viewed as independent curves drawn from a population of interest. A large body of research in FDA has been motivated by various problems arising in such a setting. At the same time, many functional data sets, most notably in the physical and environmental sciences, arise from long records of observations. An example is presented in Figure 1, which shows seven consecutive functional observations (curves). These curves show a very rough periodic pattern, but modeling periodicity is difficult, as this pattern is, in fact, severely disturbed several times a month due to ionospheric storms. The 24 h period must however enter into any statistical model, as it is caused by the rotation of the Earth. It is thus natural in this context to treat the long continuous record as consisting of consecutive curves, each defined over a 24 h time interval. Space physics researchers have long been associating the occurrence of enhancements on a given day with physical phenomena in near-Earth space. This gives additional support to treating these data as a time series of curves of evolving shape, which we will call a functional time series. Similar functional series arise, for example, in urban pollution monitoring studies.
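As a tiny illustration of the curve-splitting step just described, a long record sampled every five seconds can be folded into a days-by-grid array of daily curves; the file name below is hypothetical.

```python
import numpy as np

record = np.loadtxt("long_record.txt")   # hypothetical file: one value per 5 s
per_day = 17280                          # 86400 s / 5 s = 17,280 values per day
days = len(record) // per_day
curves = record[:days * per_day].reshape(days, per_day)  # row d = day d's curve
```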
Article
Full-text available
This paper describes a technique for principal components analysis of data consisting of $n$ functions each observed at $p$ argument values. This problem arises particularly in the analysis of longitudinal data in which some behavior of a number of subjects is measured at a number of points in time. In such cases information about the behavior of one or more derivatives of the function being sampled can often be very useful, as for example in the analysis of growth or learning curves. It is shown that the use of derivative information is equivalent to a change of metric for the row space in classical principal components analysis. The reproducing kernel for the Hilbert space of functions plays a central role, and defines the best interpolating functions, which are generalized spline functions. An example is offered of how sensitivity to derivative information can reveal interesting aspects of the data.
Article
Full-text available
Linear processes on functional spaces were born about fifteen years ago. And this original topic went through the same fast development as the other areas of functional data modeling such as PCA or regression. They aim at generalizing to random curves the classical ARMA models widely known in time series analysis. They offer a wide spectrum of models suited to the statistical inference on continuous time stochastic processes within the paradigm of functional data. Essentially designed to improve the quality and the range of prediction, they give birth to challenging theoretical and applied problems. We propose here a state of the art which emphasizes recent advances and we present some promising perspectives based on our experience in this area.
Article
We develop methods for the analysis of a collection of curves which are stochastically modelled as independent realizations of a random function with an unknown mean and covariance structure. We propose a method of estimating the mean function non‐parametrically under the assumption that it is smooth. We suggest a variant on the usual form of cross‐validation for choosing the degree of smoothing to be employed. This method of cross‐validation, which consists of deleting entire sample curves, has the advantage that it does not require that the covariance structure be known or estimated. In the estimation of the covariance structure, we are primarily concerned with models in which the first few eigenfunctions are smooth and the eigenvalues decay rapidly, so that the variability is predominantly of large scale. We propose smooth nonparametric estimates of the eigenfunctions and a suitable method of cross‐validation to determine the amount of smoothing. Our methods are applied to data on the gaits of a group of 5‐year‐old children.
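A minimal numpy sketch of the curve-deletion cross-validation follows: the mean function is smoothed by penalized least squares on a grid with a second-difference roughness penalty (one simple smoother choice, not necessarily the authors'), and the penalty is chosen by deleting one entire curve at a time.

```python
import numpy as np

def loo_curve_cv(X, lams):
    """Choose a roughness penalty for the smoothed mean of (n, m) curves X by
    cross-validation that deletes entire sample curves."""
    n, m = X.shape
    D = np.diff(np.eye(m), n=2, axis=0)       # second-difference operator (m-2, m)
    P = D.T @ D
    scores = []
    for lam in lams:
        S = np.linalg.solve(np.eye(m) + lam * P, np.eye(m))   # smoother matrix
        err = 0.0
        for i in range(n):                    # predict curve i from the others
            mu_i = S @ X[np.arange(n) != i].mean(axis=0)
            err += np.sum((X[i] - mu_i) ** 2)
        scores.append(err)
    return lams[int(np.argmin(scores))]
```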
Article
1. In generalized harmonic analysis as developed by Wiener [7,8] and Bochner [1] we are concerned with a measurable complex-valued function f(t) of the real variable t (which may be thought of as representing time), and it is assumed that the limit
$$\varphi(t) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(\tau + t)\,\overline{f(\tau)}\, d\tau \tag{1}$$
exists for all real t. If, in addition, $\varphi(t)$ is continuous at the particular point $t = 0$, it is continuous for all real t and may be represented by a Fourier–Stieltjes integral
$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\, d\Phi(x) \tag{2}$$
where $\Phi(x)$ is real, bounded and never decreasing.
Article
Contents: Preliminaries; Measures with finite variation; σ-additive measures; Measures with finite semivariation; Integration with respect to a measure with finite semivariation; Strong additivity; Extension of measures; Applications.
Article
This book offers a predominantly theoretical coverage of statistical prediction, with some potential applications discussed, when data and/or parameters belong to a large or infinite-dimensional space. It develops the theory of statistical prediction; non-parametric estimation by adaptive projection, with applications to tests of fit and prediction; and the theory of linear processes in function spaces, with applications to the prediction of continuous-time processes. This work is in the Wiley-Dunod Series co-published between Dunod (www.dunod.com) and John Wiley and Sons, Ltd.
Article
This book will be most useful to applied mathematicians, communication engineers, signal processors, statisticians, and time series researchers, both applied and theoretical. Readers should have some background in complex function theory and matrix algebra and should have successfully completed the equivalent of an upper division course in statistics.
Article
Let X⊂ℝ n and let K be a trace class operator on L 2 (X) with corresponding kernel K(x,y)∈L 2 (X×X). An integral formula for tr K, proven by Duffo for continuous kernels, is generalized for arbitrary trace class kernels. This formula is shown to be equivalent to one involving the factorization of K into a product of Hilbert-Schmidt operators. The formula and its derivation yield two new necessary conditions for traceability of a Hilbert-Schmidt kernel, and these conditions are also shown to be sufficient for positive operators. The proofs make use of the boundedness of the Hardy-Littlewood maximal function on L 2 (ℝ n ).
Book
This course develops professional skill in stochastic calculus and its application to problems in finance. Students are expected to have had some graduate-level experience with probability and real analysis. The course tries to attend to modeling issues, but much of the effort is focused on mathematical foundations; students who get the most out of it are those with an appreciation, or even an appetite, for mathematical proofs and problem solving. The course begins with simple random walk and the analysis of gambling games. This material is used to motivate the theory of martingales, and, after reaching a decent level of confidence with discrete processes, the course takes up the more demanding development of continuous-time stochastic processes, especially Brownian motion. The construction of Brownian motion is given in detail, and enough material on the subtle properties of Brownian paths is developed so that the student should evolve a good sense of when intuition can be trusted and when it cannot. The course then takes up the Itô integral and aims to provide a development that is honest and complete without being pedantic. With the Itô integral in hand, the course focuses more on models. Stochastic processes of importance in finance and economics are developed in concert with the tools of stochastic calculus that are needed in order to solve problems of practical importance. The financial notion of replication is developed, and the Black–Scholes PDE is derived by three different methods. The course then introduces enough of the theory of the diffusion equation to be able to solve the Black–Scholes PDE and prove the uniqueness of the solution. The foundations for the martingale theory of arbitrage pricing are then prefaced by a well-motivated development of the martingale representation theorems and Girsanov theory. Arbitrage pricing is then revisited, and the notions of attainability and completeness are developed in order to give a clear view of the fundamental formula for the pricing of contingent claims. Text: Stochastic Calculus and Financial Applications, J. M. Steele (Springer, 2003).
Article
Contents: Models; Observables; The general problem; Testing hypotheses; The singular case; Linear estimation; Quadratic estimation; Markov processes; Pattern recognition; Research problems: inference in processes.
Article
We extend the theory of principal components to random variables X with values in a separable Hilbert space and prove optimal properties well known for finite-dimensional spaces. Further, we give an estimate of the variance operator D from a series of independent observations and prove the strong consistency of the estimate for nuclear D. In the case where D has simple eigenvalues, we find that, with probability 1, the limiting properties of the sample principal components computed from such an estimate of D are those of the exact principal components.
Article
Functional data analysis, or FDA, is a relatively new and rapidly growing area of statistics. A substantial part of the interest in the field derives from new types of data that are generated through the application of new technologies. Statistical methodologies, such as linear regression, which are effectively finite-dimensional in conventional statistical settings, become infinite-dimensional in the context of functional data. As a result, the convergence rates of estimators based on functional data can be relatively slow, and so there is substantial interest in methods for dimension reduction, such as principal components analysis (PCA). However, although the statistical development of PCA for FDA has been underway for approximately two decades, relatively high-order theoretical arguments have been largely absent. This makes it difficult to assess the impact that, for example, eigenvalue spacings have on properties of eigenvalue estimators, or to develop concise first-order limit theory for linear functional regression. This paper shows how to overcome these hurdles. It develops rigorous arguments that underpin stochastic expansions of estimators of eigenvalues and eigenfunctions, and shows how to use them to answer statistical questions. The theory is based on arguments from operator theory, made more challenging by the requirement of statisticians that closeness of functions be measured in the L∞, rather than L2, metric. The statistical implications of the properties we develop have been discussed elsewhere, but the theoretical arguments that lie behind them have not been presented before.
Article
This study deals with the simultaneous nonparametric estimation of n curves, or observations of a random process corrupted by noise, whose sample paths belong to a finite-dimensional functional subspace. The estimation, by means of B-splines, leads to a new kind of functional principal components analysis. Asymptotic rates of convergence are given for the mean and the eigenelements of the empirical covariance operator. Heuristic arguments show that a well-chosen smoothing parameter may improve the estimation of the subspace containing the sample paths of the process. Finally, simulations suggest that the estimation method studied here is advantageous when there is a small number of design points.
Article
  This work proposes an extension of the functional principal components analysis (FPCA) or Karhunen–Loève expansion, which can take into account non-parametrically the effects of an additional covariate. Such models can also be interpreted as non-parametric mixed effect models for functional data. We propose estimators based on kernel smoothers and a data-driven selection procedure of the smoothing parameters based on a two-step cross-validation criterion. The conditional FPCA is illustrated with the analysis of a data set consisting of egg laying curves for female fruit flies. Convergence rates are given for estimators of the conditional mean function and the conditional covariance operator when the entire curves are collected. Almost sure convergence is also proven when one observes discretized noisy sample paths only. A simulation study allows us to check the good behaviour of the estimators.
Chapter
When either data or the models for them involve functions, and when only weak assumptions about these functions such as smoothness are permitted, familiar statistical methods must be modified and new approaches developed in order to take advantage of this smoothness. The first part of the article considers some general issues such as characteristics of functional data, uses of derivatives in functional modelling, estimation of phase variation by the alignment or registration of curve features, the nature of error, and so forth. The second section describes functional versions of traditional methods such as principal components analysis and linear modelling, and also mentions purely functional approaches that involve working with and estimating differential equations in the functional data analysis process.
Article
From the results of convergence by sampling in linear principal component analysis (of a random function in a separable Hilbert space), the limiting distribution is given for the principal values and the principal factors. These results can be explicitly written in the normal case. Some applications to statistical inference are investigated.
Chapter
Most statistical analyses involve one or more observations taken on each of a number of individuals in a sample, with the aim of making inferences about the general population from which the sample is drawn. In an increasing number of fields, these observations are curves or images. Curves and images are examples of functions, since an observed intensity is available at each point on a line segment, a portion of a plane, or a volume. For this reason, we call observed curves and images ‘functional data,’ and statistical methods for analyzing such data are described by the term ‘functional data analysis.’ It is the smoothness of the processes generating functional data that differentiates this type of data from more classical multivariate observations. This smoothness means that we can work with the information in the derivatives of functions or images. This article includes several illustrative examples.
Article
We consider the sampling problem for functional PCA (fPCA), where the simplest example is the case of taking time samples of the underlying functional components. More generally, we model the sampling operation as a continuous linear map from $\mathcal{H}$ to $\mathbb{R}^m$, where the functional components are assumed to lie in some Hilbert subspace $\mathcal{H}$ of $L^2$, such as a reproducing kernel Hilbert space of smooth functions. This model includes time and frequency sampling as special cases. In contrast to the classical approach in fPCA, in which access to entire functions is assumed, having a limited number m of functional samples places limitations on the performance of statistical procedures. We study these effects by analyzing the rate of convergence of an M-estimator for the subspace spanned by the leading components in a multi-spiked covariance model. The estimator takes the form of regularized PCA, and hence is computationally attractive. We analyze the behavior of this estimator within a nonasymptotic framework, and provide bounds that hold with high probability as a function of the number of statistical samples n and the number of functional samples m. We also derive lower bounds showing that the rates obtained are minimax optimal.
Article
This paper is concerned with inference based on the mean function of a functional time series, which is defined as a collection of curves obtained by splitting a continuous time record, e.g. into daily or annual curves. We develop a normal approximation for the functional sample mean, and then focus on the estimation of the asymptotic variance kernel. Using these results, we develop and asymptotically justify a testing procedure for the equality of means in two functional samples exhibiting temporal dependence. Evaluated by means of a simulation study and an application to real data sets, this two-sample procedure enjoys good size and power in finite samples. We provide the details of its numerical implementation.
Article
The principal components analysis of functional data is often enhanced by the use of smoothing. It is shown that an attractive method of incorporating smoothing is to replace the usual $L^2$-orthonormality constraint on the principal components by orthonormality with respect to an inner product that takes account of the roughness of the functions. The method is easily implemented in practice by making use of appropriate function transforms (Fourier transforms for periodic data) and standard principal components analysis programs. Several alternative possible interpretations of the smoothed principal components as obtained by the method are presented. Some theoretical properties of the method are discussed: the estimates are shown to be consistent under appropriate conditions, and asymptotic expansion techniques are used to investigate their bias and variance properties. These indicate that the form of smoothing proposed is advantageous under mild conditions, indeed milder than those for existing methods of smoothed functional principal components analysis. The choice of smoothing parameter by cross-validation is discussed. The methodology of the paper is illustrated by an application to a biomechanical data set obtained in the study of the behaviour of the human thumb-forefinger system.
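A numpy sketch of the transform-then-PCA recipe for periodic data follows: curves are reweighted in the Fourier domain by $(1 + \alpha\omega^4)^{-1/2}$, the half-inverse of the roughness operator $I + \alpha D^4$ (our reading of a penalty on second derivatives), ordinary PCA is applied, and the components are mapped back with the same weights, which makes them orthonormal in the modified inner product.

```python
import numpy as np

def smoothed_fpca(X, alpha, n_comp=3):
    """Smoothed PCA of periodic curves: orthonormality is taken in the inner
    product <f, g> + alpha * <f'', g''>.  X: (n, m) curves on an equispaced
    periodic grid on [0, 1).  Returns components and their variances."""
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    omega = 2 * np.pi * np.fft.fftfreq(m, d=1.0 / m)   # angular frequencies
    w = (1 + alpha * omega ** 4) ** -0.5               # half-inverse weights
    Z = np.real(np.fft.ifft(np.fft.fft(Xc, axis=1) * w, axis=1))
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # ordinary PCA of Z
    comps = np.real(np.fft.ifft(np.fft.fft(Vt[:n_comp], axis=1) * w, axis=1))
    return comps, s[:n_comp] ** 2 / n                  # back-transformed components
```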
Article
The autoregressive model in a Banach space (ARB) contains many continuous-time processes used in practice, for example, processes that satisfy linear stochastic differential equations of order $k$, a very particular case being the Ornstein–Uhlenbeck process. In this paper we study empirical estimators for ARB processes. In particular we show that, under some regularity conditions, the empirical mean is asymptotically optimal with respect to a.s. convergence and convergence of order 2. Limits in distribution and the law of the iterated logarithm are also presented. Concerning the empirical covariance operator, we note that if $(X_n,\ n \in \mathbb{Z})$ is ARB then $(X_n \otimes X_n,\ n \in \mathbb{Z})$ is AR in a suitable space of linear operators. This fact allows us to interpret the empirical covariance operator as a sample mean of an AR process and to derive similar results for it.
Article
We prove that, for the main modes of stochastic convergence (law of large numbers, CLT, deviation principles, LIL), asymptotic results for self-adjoint random operators yield equivalent results for their eigenvalues and associated projectors. Statistical applications are mentioned.
Article
Let "..." be a linear process with values in a Hilbert space H. We prove a central limit theorem for the vector of empirical covariance operators of the random variables X at orders 0 to h in the space of Hilbert-Schmidt operators.
Article
Functional data analysis is intrinsically infinite dimensional; functional principal component analysis reduces dimension to a finite level, and points to the most significant components of the data. However, although this technique is often discussed, its properties are not as well understood as they might be. We show how the properties of functional principal component analysis can be elucidated through stochastic expansions and related results. Our approach quantifies the errors that arise through statistical approximation, in successive terms of orders $n^{-1/2}$, $n^{-1}$, $n^{-3/2}, \dots$, where $n$ denotes sample size. The expansions show how spacings among eigenvalues impact on statistical performance. The term of size $n^{-1/2}$ illustrates first-order properties and leads directly to limit theory which describes the dominant effect of spacings. Thus, for example, spacings are seen to have an immediate, first-order effect on properties of eigenfunction estimators, but only a second-order effect on eigenvalue estimators. Our results can be used to explore properties of existing methods, and also to suggest new techniques. In particular, we suggest bootstrap methods for constructing simultaneous confidence regions for an infinite number of eigenvalues, and also for individual eigenvalues and eigenvectors.
Article
If a problem in functional data analysis is low dimensional then the methodology for its solution can often be reduced to relatively conventional techniques in multivariate analysis. Hence, there is intrinsic interest in assessing the finite dimensionality of functional data. We show that this problem has several unique features. From some viewpoints the problem is trivial, in the sense that continuously distributed functional data which are exactly finite dimensional are immediately recognizable as such, if the sample size is sufficiently large. However, in practice, functional data are almost always observed with noise, for example, resulting from rounding or experimental error. Then the problem is almost insolubly difficult. In such cases a part of the average noise variance is confounded with the true signal and is not identifiable. However, it is possible to define the unconfounded part of the noise variance. This represents the best possible lower bound to all potential values of average noise variance and is estimable in low noise settings. Moreover, bootstrap methods can be used to describe the reliability of estimates of unconfounded noise variance, under the assumption that the signal is finite dimensional. Motivated by these ideas, we suggest techniques for assessing the finiteness of dimensionality. In particular, we show how to construct a critical point such that, if the distribution of our functional data has fewer than $q - 1$ degrees of freedom, then we should be willing to assume that the average variance of the added noise is at least that critical point. If this level seems too high then we must conclude that the dimension is at least $q - 1$. We show that simpler, more conventional techniques, based on hypothesis testing, are generally not effective.
Article
The use of principal component methods to analyze functional data is appropriate in a wide range of different settings. In studies of "functional data analysis," it has often been assumed that a sample of random functions is observed precisely, in the continuum and without noise. While this has been the traditional setting for functional data analysis, in the context of longitudinal data analysis a random function typically represents a patient, or subject, who is observed at only a small number of randomly distributed points, with nonnegligible measurement error. Nevertheless, essentially the same methods can be used in both these cases, as well as in the vast number of settings that lie between them. How is performance affected by the sampling plan? In this paper we answer that question. We show that if there is a sample of n functions, or subjects, then estimation of eigenvalues is a semiparametric problem, with root-n consistent estimators, even if only a few observations are made of each function, and if each observation is encumbered by noise. However, estimation of eigenfunctions becomes a nonparametric problem when observations are sparse. The optimal convergence rates in this case are those which pertain to more familiar function-estimation settings. We also describe the effects of sampling at regularly spaced points, as opposed to random points. In particular, it is shown that there are often advantages in sampling randomly. However, even in the case of noisy data there is a threshold sampling rate (depending on the number of functions treated) above which the rate of sampling (either randomly or regularly) has negligible impact on estimator performance, no matter whether eigenfunctions or eigenvectors are being estimated.
L. Horváth, P. Kokoszka, Inference for Functional Data with Applications, Springer Series in Statistics, Springer, New York, NY, 2012.
A. Mas, Estimation d'opérateurs de corrélation de processus linéaires fonctionnels: lois limites, déviations modérées, Ph.D. Thesis, Université Paris VI, 2000.
K. Karhunen, Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Annales Academiae Scientiarum Fennicae, Series A I (37) (1947) 79.
Horváth, Estimation of the mean of functional time series and a two-sample problem.