Article

Robust scalar-on-function partial quantile regression

Taylor & Francis
Journal of Applied Statistics

Abstract

Compared with the conditional mean regression-based scalar-on-function regression model, the scalar-on-function quantile regression is robust to outliers in the response variable. However, it is susceptible to outliers in the functional predictor (called leverage points). This is because the influence function of the regression quantiles is bounded in the response variable but unbounded in the predictor space. The leverage points may alter the eigenstructure of the predictor matrix, leading to poor estimation and prediction results. This study proposes a robust procedure to estimate the model parameters in the scalar-on-function quantile regression method and produce reliable predictions in the presence of both outliers and leverage points. The proposed method is based on a functional partial quantile regression procedure. We propose a weighted partial quantile covariance to obtain functional partial quantile components of the scalar-on-function quantile regression model. After the decomposition, the model parameters are estimated via a weighted loss function, where the robustness is obtained by iteratively reweighting the partial quantile components. The estimation and prediction performance of the proposed method is evaluated by a series of Monte-Carlo experiments and an empirical data example. The results compare favorably with several existing methods. The method is implemented in an R package robfpqr.
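The robustness to response outliers mentioned in the abstract comes from the check (pinball) loss that quantile regression minimizes. The following is a minimal sketch in Python, not the robfpqr implementation: the function names `check_loss` and `constant_quantile_fit` are hypothetical, and the model is reduced to an intercept-only fit so that no optimization library is needed.

```python
import numpy as np

def check_loss(residuals, tau):
    """Check (pinball) loss: sum of r * (tau - 1[r < 0]) over residuals r."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0).astype(float))))

def constant_quantile_fit(y, tau):
    """Intercept-only quantile fit (illustrative): try each observed value
    as the constant and keep the one with the smallest check loss."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

# Ten well-behaved responses plus one gross outlier in the response.
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100], dtype=float)

median_fit = constant_quantile_fit(y, tau=0.5)  # minimizer of the check loss
mean_fit = float(y.mean())                      # minimizer of squared loss

print("median fit:", median_fit)
print("mean fit:  ", mean_fit)
```

The median fit stays with the bulk of the data while the mean is pulled toward the outlier. This only illustrates boundedness in the response; it says nothing about leverage points in the predictor, which is the problem the abstract addresses.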


... Our method should not be mistaken for classical (scalar-on-scalar) quantile regression (Koenker, 2005), nor for scalar-on-function quantile regression (Cardot, Crambes and Sarda, 2005; Chen and Müller, 2012; Kato, 2012; Li et al., 2022; Ghosal et al., 2023; Yan, Li and Niu, 2023; Beyaztas, Tez and Shang, 2024), function-on-scalar quantile regression (Yang, 2020; Yang et al., 2020; Liu, Li and Morris, 2020; Zhang et al., 2022; Liu, Li and Morris, 2023), or function-on-function quantile regression (Beyaztas, Shang and Alin, 2022; Zhu et al., 2023; Mutis et al., 2024; Beyaztas, Shang and Saricam, 2024). Although the latter bears certain similarities to the proposed model, the conditional distribution of the functional response is characterized through quantiles, i.e., the predictor and response themselves are not quantile curves. ...
Article
This paper introduces a new objective measure for assessing treatment response in asthmatic patients using computed tomography (CT) imaging data. For each patient, CT scans were obtained before and after one year of monoclonal antibody treatment. Following image segmentation, the Hounsfield unit (HU) values of the voxels were encoded through quantile functions. It is hypothesized that patients with improved conditions after treatment will exhibit better expiration, reflected in higher HU values and an upward shift in the quantile curve. To objectively measure treatment response, a novel linear regression model on quantile functions is developed, drawing inspiration from Verde and Irpino (2010). Unlike their framework, the proposed model is parametric and incorporates distributional assumptions on the errors, enabling statistical inference. The model allows for the explicit calculation of regression coefficient estimators and confidence intervals, similar to conventional linear regression. The corresponding data and R code are available on GitHub to facilitate the reproducibility of the analyses presented.
Article
In this study, we propose a function-on-function linear quantile regression model that allows for more than one functional predictor to establish a more flexible and robust approach. The proposed model is first transformed into a finite-dimensional space via the functional principal component analysis paradigm in the estimation phase. It is then approximated using the estimated functional principal component functions, and the estimated parameter of the quantile regression model is constructed based on the principal component scores. In addition, we propose a Bayesian information criterion to determine the optimum number of truncation constants used in the functional principal component decomposition. Moreover, a stepwise forward procedure and the Bayesian information criterion are used to determine the significant predictors for inclusion in the model. We employ a nonparametric bootstrap procedure to construct prediction intervals for the response functions. The finite sample performance of the proposed method is evaluated via several Monte Carlo experiments and an empirical data example, and the results produced by the proposed method are compared with the ones from existing models.
Article
A function-on-function linear quantile regression model, where both the response and predictors consist of random curves, is proposed by extending the classical quantile regression setting into the functional data to characterize the entire conditional distribution of functional response. In this paper, a functional partial quantile regression approach, a quantile regression analog of the functional partial least squares regression, is proposed to estimate the function-on-function linear quantile regression model. A partial quantile covariance function is first used to extract the functional partial quantile regression basis functions. The extracted basis functions are then used to obtain the functional partial quantile regression components and estimate the final model. Although the functional random variables belong to an infinite-dimensional space, they are observed in a finite set of discrete-time points in practice. Thus, in our proposal, the functional forms of the discretely observed random variables are first constructed via a finite-dimensional basis function expansion method. The functional partial quantile regression constructed using the functional random variables is approximated via the partial quantile regression constructed using the basis expansion coefficients. The proposed method uses an iterative procedure to extract the partial quantile regression components. A Bayesian information criterion is used to determine the optimum number of retained components. The proposed functional partial quantile regression model allows for more than one functional predictor in the model. However, the true form of the proposed model is unspecified, as the relevant predictors for the model are unknown in practice. Thus, a forward variable selection procedure is used to determine the significant predictors for the proposed model. Moreover, a case-sampling-based bootstrap procedure is used to construct pointwise prediction intervals for the functional response. 
The predictive performance of the proposed method is evaluated using several Monte Carlo experiments under different data generation processes and error distributions. The finite-sample performance of the proposed method is compared with the functional partial least squares method. Through an empirical data example, air quality data are analyzed to demonstrate the effectiveness of the proposed method.
Article
It is known that functional single-index regression models can achieve better prediction accuracy than functional linear models or fully nonparametric models, when the target is to predict a scalar response using a function-valued covariate. However, the performance of these models may be adversely affected by extremely large values or skewness in the response. In addition, they are not able to offer a full picture of the conditional distribution of the response. Motivated by the use of the previous day's trajectories of PM10 concentrations to predict the maximum PM10 concentration of the current day, a functional single-index quantile regression model is proposed to address those issues. A generalized profiling method is employed to estimate the model. Simulation studies are conducted to investigate the finite sample performance of the proposed estimator. We apply the proposed framework to predict the maximal value of PM10 concentrations based on the intraday PM10 concentrations of the previous day.
Article
Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.
Article
Background: The Paced Auditory Serial Addition Test (PASAT) is a useful cognitive test in patients with multiple sclerosis (MS), assessing sustained attention and information processing speed. However, the neural underpinnings of performance in the test are controversial. We aimed to study the neural basis of PASAT performance by using structural magnetic resonance imaging (MRI) in a series of 242 patients with MS. Methods: PASAT (3-s) was administered together with a comprehensive neuropsychological battery. Global brain volumes and total T2-weighted lesion volumes were estimated. Voxel-based morphometry and lesion symptom mapping analyses were performed. Results: Mean PASAT score was 42.98 ± 10.44; results indicated impairment in 75 cases (31.0%). PASAT score was correlated with several clusters involving the following regions: bilateral precuneus and posterior cingulate, bilateral caudate and putamen, and bilateral cerebellum. Voxel-based lesion symptom mapping showed no significant clusters. Region of interest–based analysis restricted to white matter regions revealed a correlation with the left cingulum, corpus callosum, bilateral corticospinal tracts, and right arcuate fasciculus. Correlations between PASAT scores and global volumes were weak. Conclusion: PASAT score was associated with regional volumes of the posterior cingulate/precuneus and several subcortical structures, specifically the caudate, putamen, and cerebellum. This emphasises the role of both cortical and subcortical structures in cognitive functioning and information processing speed in patients with MS.
Article
We propose a new measure related to tail dependence in terms of correlation: the quantile correlation coefficient of random variables X, Y. The quantile correlation is defined by the geometric mean of two quantile regression slopes of X on Y and Y on X in the same way that the Pearson correlation is related to the regression coefficients of Y on X and X on Y. The degree of tail dependent association in X, Y, if any, is well reflected in the quantile correlation. The quantile correlation makes it possible to measure sensitivity of a conditional quantile of a random variable with respect to change of the other variable. The properties of the quantile correlation are similar to those of the correlation. This enables us to interpret it from the perspective of correlation, on which tail dependence is reflected. We construct measures for tail dependent correlation and tail asymmetry and develop statistical tests for them. We prove asymptotic normality of the estimated quantile correlation and limiting null distributions of the proposed tests, which is well supported in finite samples by a Monte-Carlo study. The proposed quantile correlation methods are illustrated by analyzing a birth weight data set and a stock return data set.
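The geometric-mean relation described in this abstract can be written out. The Pearson identity below is standard; the quantile version is only a sketch in assumed notation, where \beta_{Y|X}(\tau) and \beta_{X|Y}(\tau) denote the \tau-th quantile regression slopes of Y on X and of X on Y, and the sign convention is an assumption carried over from the Pearson case rather than something stated in the abstract.

```latex
% Pearson correlation as a signed geometric mean of the two least-squares slopes
b_{Y|X} = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}, \qquad
b_{X|Y} = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(Y)}, \qquad
\rho = \operatorname{sign}(b_{Y|X}) \sqrt{b_{Y|X}\, b_{X|Y}}

% Assumed quantile analogue: replace the least-squares slopes by quantile regression slopes
\rho_\tau = \operatorname{sign}\bigl(\beta_{Y|X}(\tau)\bigr)
            \sqrt{\beta_{Y|X}(\tau)\, \beta_{X|Y}(\tau)}
```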
Article
Objectives: The presence and monitoring of cognitive impairment is frequently overlooked in a disease such as multiple sclerosis (MS), which has the potential to affect the physical, social, and socioeconomic lives of individuals in early adulthood. The purpose of this study was to establish Paced Auditory Serial Addition Test (PASAT) normative data for the healthy Turkish population. Patients and methods: Three hundred eighty-five healthy volunteers were enrolled. Thirty-two subgroups were established, comprising four age groups (18-25, 26-35, 36-45, and 46-55), four education groups (5 years of education, 8 years, 11 years and 15 years) and two gender groups (male and female). The PASAT test was applied to the entire study group. Results: PASAT score decreased with age, although the difference between the age groups did not achieve statistical significance. A very strong, significant correlation was found between education level and PASAT performance. PASAT scores increased with the number of years of education. Conclusion: This study provides normal PASAT values in the Turkish population on the basis of age, gender, and level of education. These data can be used as control values in clinical practice.
Article
We propose a prediction procedure for the functional linear quantile regression model by using partial quantile covariance techniques and develop a simple partial quantile regression (SIMPQR) algorithm to efficiently extract the partial quantile regression (PQR) basis for estimating functional coefficients. We further extend our partial quantile covariance techniques to functional composite quantile regression (CQR) by defining a partial composite quantile covariance. There are three major contributions. (1) We define partial quantile covariance between two scalar variables through linear quantile regression. We compute the PQR basis by sequentially maximizing the partial quantile covariance between the response and projections of functional covariates. (2) In order to efficiently extract the PQR basis, we develop a SIMPQR algorithm analogous to simple partial least squares (SIMPLS). (3) Under the homoscedasticity assumption, we extend our techniques to partial composite quantile covariance and use it to find the partial composite quantile regression (PCQR) basis. The SIMPQR algorithm is then modified to obtain the SIMPCQR algorithm. Two simulation studies show the superiority of our proposed methods. Two real data sets, from the ADHD-200 sample and ADNI, are analyzed using our proposed methods.
Article
We introduce the functional generalized additive model (FGAM), a novel regression model for association studies between a scalar response and a functional predictor. We model the link-transformed mean response as the integral with respect to t of F{X(t), t} where F(·, ·) is an unknown regression function and X(t) is a functional covariate. Rather than having an additive model in a finite number of principal components as by Müller and Yao (2008), our model incorporates the functional predictor directly and thus our model can be viewed as the natural functional extension of generalized additive models. We estimate F(·, ·) using tensor-product B-splines with roughness penalties. A pointwise quantile transformation of the functional predictor is also considered to ensure each tensor-product B-spline has observed data on its support. The methods are evaluated using simulated data and their predictive performance is compared with other competing scalar-on-function regression alternatives. We illustrate the usefulness of our approach through an application to brain tractography, where X(t) is a signal from diffusion tensor imaging at position, t, along a tract in the brain. In one example, the response is disease-status (case or control) and in a second example, it is the score on a cognitive test. The FGAM is implemented in R in the refund package. There are additional supplementary materials available online.
Article
This paper deals with a scalar response conditioned by a functional random variable. The main goal is to estimate nonparametrically the quantiles of such a conditional distribution when the sample is considered as an α-mixing sequence. Firstly, a kernel type estimator for the conditional cumulative distribution function (cond-cdf) is introduced. Afterwards, we derive an estimate of the quantiles by inverting this estimated cond-cdf, and asymptotic properties are stated. This approach can be applied in time series analysis. For that, the whole observed time series has to be split into a set of functional data, and the functional conditional quantile approach can be used both to forecast and to build confidence prediction bands. The El Niño time series illustrates this.
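The two steps in this abstract, a kernel estimate of the conditional cdf followed by its inversion, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's estimator: it uses a uniform kernel, an L2 distance approximated on a common grid, and a plain indicator in the response direction; the function names and the toy data are hypothetical.

```python
import numpy as np

def l2_distance(x0, curves, grid):
    """Approximate L2 distance between curve x0 and each row of `curves`
    on a common grid, using the trapezoid rule."""
    diff_sq = (curves - x0) ** 2
    dt = np.diff(grid)
    integral = np.sum(0.5 * (diff_sq[:, 1:] + diff_sq[:, :-1]) * dt, axis=1)
    return np.sqrt(integral)

def conditional_quantile(x0, curves, y, grid, tau, h):
    """Invert a kernel-weighted empirical cdf of y given the curve x0.
    Uniform kernel (assumption): weight 1 if distance <= h, else 0."""
    w = (l2_distance(x0, curves, grid) <= h).astype(float)
    if w.sum() == 0:
        raise ValueError("no curves within bandwidth h of x0")
    order = np.argsort(y, kind="stable")
    cdf = np.cumsum(w[order]) / w.sum()       # estimated cond-cdf at sorted y
    idx = int(np.searchsorted(cdf, tau, side="left"))  # first cdf value >= tau
    return float(y[order][idx])

# Toy data: three curves near zero and two far away (constant curves).
grid = np.linspace(0.0, 1.0, 11)
levels = np.array([0.0, 0.1, -0.1, 10.0, 10.2])
curves = levels[:, None] * np.ones_like(grid)
y = np.array([1.0, 2.0, 3.0, 100.0, 200.0])
x0 = np.zeros_like(grid)

q_local = conditional_quantile(x0, curves, y, grid, tau=0.5, h=1.0)
q_global = conditional_quantile(x0, curves, y, grid, tau=0.5, h=100.0)
print(q_local, q_global)
```

With a small bandwidth only the three nearby curves contribute, so the estimated conditional median ignores the responses attached to distant curves; with a very large bandwidth the estimate reduces to the unconditional sample median.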
Article
This paper proposes an informative exploratory tool, the functional boxplot, for visualizing functional data, as well as its generalization, the enhanced functional boxplot. Based on the center outwards ordering induced by band depth for functional data, the descriptive statistics of a functional boxplot are: the envelope of the 50% central region, the median curve and the maximum non-outlying envelope. In addition, outliers can be detected in a functional boxplot by the empirical rule of 1.5 times the 50% central region, analogous to the rule for classical boxplots. The construction of a functional boxplot is illustrated on a series of sea surface temperatures related to the El Niño phenomenon and its outlier detection performance is explored by simulations. As applications, the functional boxplot and enhanced functional boxplot are demonstrated on children growth data and spatio-temporal U.S. precipitation data for nine climatic regions, respectively.
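The construction described in this abstract can be sketched in a few lines. The code below is an illustrative approximation rather than the authors' implementation: it uses the band depth based on pairs of curves, takes the deepest half of the sample as the 50% central region, and flags any curve that leaves the envelope inflated by 1.5 times its width. The function names, the choice of pairs, and the toy data are assumptions.

```python
import numpy as np
from itertools import combinations

def band_depth(curves):
    """Band depth from pairs: fraction of curve pairs whose band
    (pointwise min/max) contains a given curve over the whole grid."""
    n = curves.shape[0]
    pairs = list(combinations(range(n), 2))
    depth = np.zeros(n)
    for j, k in pairs:
        lower = np.minimum(curves[j], curves[k])
        upper = np.maximum(curves[j], curves[k])
        depth += np.all((curves >= lower) & (curves <= upper), axis=1)
    return depth / len(pairs)

def functional_boxplot(curves, factor=1.5):
    """Median curve, 50% central envelope, and indices of flagged outliers."""
    depth = band_depth(curves)
    order = np.argsort(-depth, kind="stable")          # deepest curve first
    n_central = int(np.ceil(curves.shape[0] / 2))
    central = curves[order[:n_central]]
    lower, upper = central.min(axis=0), central.max(axis=0)
    width = upper - lower
    outside = (curves < lower - factor * width) | (curves > upper + factor * width)
    return {
        "median": curves[order[0]],
        "lower": lower,
        "upper": upper,
        "outliers": np.flatnonzero(outside.any(axis=1)),
    }

# Toy data: eight vertically shifted sine curves plus one far-away curve.
t = np.linspace(0.0, 1.0, 20)
levels = np.array([0, 1, 2, 3, 4, 5, 6, 7, 30], dtype=float)
curves = levels[:, None] + np.sin(2 * np.pi * t)

result = functional_boxplot(curves)
print("outlier indices:", result["outliers"])
```

In this toy sample the curves never cross, so depth depends only on vertical rank: the middle curve is the median and only the curve shifted by 30 falls outside the fences.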
Article
  Motivated by the conditional growth charts problem, we develop a method for conditional quantile analysis when predictors take values in a functional space. The method proposed aims at estimating conditional distribution functions under a generalized functional regression framework. This approach facilitates balancing of model flexibility and the curse of dimensionality for the infinite dimensional functional predictors. Its good performance in comparison with other methods, both for sparsely and for densely observed functional covariates, is demonstrated through theory as well as in simulations and an application to growth curves, where the method proposed can, for example, be used to assess the entire growth pattern of a child by relating it to the predicted quantiles of adult height.
Article
We develop fast fitting methods for generalized functional linear models. The functional predictor is projected onto a large number of smooth eigenvectors and the coefficient function is estimated using penalized spline regression; confidence intervals based on the mixed model framework are obtained. Our method can be applied to many functional data designs including functions measured with and without error, sparsely or densely sampled. The methods also extend to the case of multiple functional predictors or functional predictors with a natural multilevel structure. The approach can be implemented using standard mixed effects software and is computationally fast. The methodology is motivated by a study of white-matter demyelination via diffusion tensor imaging (DTI). The aim of this study is to analyze differences between various cerebral white-matter tract property measurements of multiple sclerosis (MS) patients and controls. While the statistical developments proposed here were motivated by the DTI study, the methodology is designed and presented in generality and is applicable to many other areas of scientific research. An online appendix provides R implementations of all simulations.
Article
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural magnetic resonance imaging (MRI).
Article
We introduce models for the analysis of functional data observed at multiple time points. The dynamic behavior of functional data is decomposed into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability and measurement error. The model can be viewed as the functional analog of the classical longitudinal mixed effects model where random effects are replaced by random processes. Methods have wide applicability and are computationally feasible for moderate and large data sets. Computational feasibility is assured by using principal component bases for the functional processes. The methodology is motivated by and applied to a diffusion tensor imaging (DTI) study designed to analyze differences and changes in brain connectivity in healthy volunteers and multiple sclerosis (MS) patients. An R implementation is provided.
Article
Inflammatory demyelination and axon damage in the corpus callosum are prominent features of multiple sclerosis (MS) and may partially account for impaired performance on complex tasks. The objective of this article was to characterize quantitative callosal MRI abnormalities and their association with disability. In 69 participants with MS and 29 healthy volunteers, lesional and extralesional callosal MRI indices were estimated via diffusion tensor tractography. Expanded Disability Status Scale (EDSS) and MS Functional Composite (MSFC) scores were recorded in 53 of the participants with MS. All tested callosal MRI indices were diffusely abnormal in MS. EDSS score was correlated only with age (r = 0.51). Scores on the overall MSFC and its paced serial auditory addition test (PASAT) and 9-hole peg test components were correlated with callosal fractional anisotropy (r = 0.27, 0.35, and 0.31, respectively) and perpendicular diffusivity (r = -0.29, -0.30, and -0.31) but not with overall callosal volume or callosal lesion volume; the PASAT score was more weakly correlated with callosal magnetization-transfer ratio (r = 0.21). Anterior callosal abnormalities were associated with impaired PASAT performance and posterior abnormalities with slow performance on the 9-hole peg test. In conclusion, abnormalities in the corpus callosum can be assessed with quantitative MRI and are associated with cognitive and complex upper-extremity dysfunction in MS.
Article
The need for a measure of severity of concussion apart from duration of post-traumatic amnesia is examined. The paced auditory serial-addition test, a measure of rate of information processing, is presented as a convenient test for estimating individual performance during recovery. Procedures for administration and control data are given, and the programme used for managing the rehabilitation of concussion patients described.
Article
This paper deals with a linear model of regression on quantiles when the explanatory variable takes values in some functional space and the response is scalar. We propose a spline estimator of the functional coefficient that minimizes a penalized L1 type criterion. Then, we study the asymptotic behavior of this estimator. The penalization is of primary importance to get existence and convergence.
Article
The scalar‐on‐function regression model has become a popular analysis tool to explore the relationship between a scalar response and multiple functional predictors. Most of the existing approaches to estimate this model are based on the least‐squares estimator, which can be seriously affected by outliers in empirical datasets. When outliers are present in the data, it is known that the least‐squares‐based estimates may not be reliable. This paper proposes a robust functional partial least squares method, allowing a robust estimate of the regression coefficients in a scalar‐on‐multiple‐function regression model. In our method, the functional partial least squares components are computed via the partial robust M‐regression. The predictive performance of the proposed method is evaluated using several Monte Carlo experiments and two chemometric datasets: glucose concentration spectrometric data and sugar process data. The results produced by the proposed method compare favorably with some of the classical functional or multivariate partial least squares and functional principal component analysis methods.
Article
Partial least squares (PLS) is a dimensionality reduction technique used as an alternative to ordinary least squares (OLS) in situations where the data is colinear or high dimensional. Both PLS and OLS provide mean based estimates, which are extremely sensitive to the presence of outliers or heavy tailed distributions. In contrast, quantile regression is an alternative to OLS that computes robust quantile based estimates. In this work, the multivariate PLS is extended to the quantile regression framework, obtaining a theoretical formulation of the problem and a robust dimensionality reduction technique that we call fast partial quantile regression (fPQR), that provides quantile based estimates. An efficient implementation of fPQR is also derived, and its performance is studied through simulation experiments and the chemometrics well known biscuit dough dataset, a real high dimensional example.
Book
This book establishes the link between inequality studies and quantile regression. Using clear statistical explanations and rich empirical examples, the book creates a bridge between the new and conventional modeling frameworks.
Article
Function-on-function regression is employed to model the relationship between two stochastic processes. Widely used strategies for fitting this model include functional partial least squares algorithms, which typically require iterative eigen-decomposition. Here we introduce a route of functional partial least squares based upon Krylov subspace. Our route can be expressed in two forms equivalent to each other in exact arithmetic: one is non-iterative with explicit expressions of the estimator and prediction, facilitating the theoretical derivation and potential extensions; the other one stabilizes numerical outputs. The consistency of estimation and prediction is established under regularity conditions. It is highlighted that our proposal is competitive in terms of both estimation and prediction accuracy but consumes much less execution time.
Article
The main purpose of this paper is to estimate, semi-parametrically, the quantiles of a conditional distribution when the response is a real-valued random variable subject to a right-censorship phenomenon and the predictor takes values in an infinite dimensional space. We assume that the explanatory and the response variables are linked through a single-index structure. First, we introduce a kernel-type estimator of the conditional quantile when the data are supposed to be selected from an underlying stationary and ergodic process. Then, under some general conditions, the uniform almost-complete convergence rate as well as the asymptotic distribution of the estimator are established. A numerical study, including simulated and real data application, is performed to illustrate the validity and the finite-sample performance of the considered estimator.
Article
To respond to the compelling air pollution programs, shipping companies are nowadays setting up on their fleets modern multisensor systems that stream massive amounts of observational data, which can be considered as varying over a continuous domain. Motivated by this context, a novel procedure is proposed, which extends classical multivariate techniques to the monitoring of multivariate functional data and a scalar quality characteristic related to them. The proposed procedure is shown to be also applicable in real time and is illustrated by means of a real‐case study in the maritime field on the continuous monitoring of operating conditions (i.e., the multivariate functional data) and total CO2 emissions (i.e., the scalar quality characteristic) at each voyage of a cruise ship. The real‐time monitoring is particularly helpful for promptly supporting managerial decision making by indicating if and when an anomaly occurs during the navigation.
Article
Quantile regression for functional partially linear model in ultra-high dimensions is proposed and studied. By focusing on the conditional quantiles, where conditioning is on both multiple random processes and high-dimensional scalar covariates, the proposed model can lead to a comprehensive description of the scalar response. To select and estimate important variables, a double penalized functional quantile objective function with two nonconvex penalties is developed, and the optimal tuning parameters involved can be chosen by a two-step technique. Based on the difference convex analysis (DCA), the asymptotic properties of the resulting estimators are established, and the convergence rate of the prediction of the conditional quantile function can be obtained. Simulation studies demonstrate a competitive performance against the existing approach. A real application to Alzheimer's Disease Neuroimaging Initiative (ADNI) data is used to illustrate the practicality of the proposed model.
Article
Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, allowing the adaptation of a large body of existing methods for these novel tasks. The framework encompasses many existing as well as new models. It includes regression for ‘generalized’ functional data, mean regression, quantile regression as well as generalized additive models for location, shape and scale (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates as well as (functional) random effects and allows flexible choices of bases—particularly splines and functional principal components—and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models are implemented in R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study serves to illustrate the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online.
Article
We propose a regularized partially functional quantile regression model where the response variable is scalar while the explanatory variables involve both infinite-dimensional predictor processes viewed as functional data, and high-dimensional scalar covariates. Despite extensive work focusing on functional linear models, little effort has been devoted to the development of robust methodologies that tackle the scenarios of non-normal errors. This motivates our proposal of functional quantile regression that seeks an alternative and robust solution to least squares type procedures within the partially functional regression framework. We focus on estimating and selecting the important variables in the high-dimensional covariates, which is complicated by the infinite-dimensional functional predictor. We establish the asymptotic properties of the resulting shrinkage estimator, and empirical illustrations are given by simulation and an application to a brain imaging dataset.
Article
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images and so on are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorising the basic model types as linear, non-linear and non-parametric. We discuss publicly available software packages and illustrate some of the procedures by application to a functional magnetic resonance imaging data set.
Book
Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators provides a uniquely broad compendium of the key mathematical concepts and results that are relevant for the theoretical development of functional data analysis (FDA). The self-contained treatment of selected topics of functional analysis and operator theory includes reproducing kernel Hilbert spaces, singular value decomposition of compact operators on Hilbert spaces and perturbation theory for both self-adjoint and non self-adjoint operators. The probabilistic foundation for FDA is described from the perspective of random elements in Hilbert spaces as well as from the viewpoint of continuous time stochastic processes. Nonparametric estimation approaches including kernel and regularized smoothing are also introduced. These tools are then used to investigate the properties of estimators for the mean element, covariance operators, principal components, regression function and canonical correlations. A general treatment of canonical correlations in Hilbert spaces naturally leads to FDA formulations of factor analysis, regression, MANOVA and discriminant analysis. This book will provide a valuable reference for statisticians and other researchers interested in developing or understanding the mathematical aspects of FDA. It is also suitable for a graduate level special topics course.
Article
Least squares estimation of the functional linear regression model with scalar response is an ill-posed problem due to the infinite dimension of the functional predictor. Dimension reduction approaches such as principal component regression or partial least squares regression are proposed and widely used in applications. In both cases the interpretation of the model could be difficult because of the roughness of the coefficient regression function. In this paper, two penalized estimations of this model based on modifying the partial least squares criterion with roughness penalties for the weight functions are proposed. One introduces the penalty in the definition of the norm in the functional space, and the other one in the cross-covariance operator. A simulation study and several applications on real data show the efficiency of the penalized approaches with respect to the non-penalized ones.
Article
In this paper we study statistical inference in functional quantile regression for scalar response and a functional covariate. Specifically, we consider a linear functional quantile model where the effect of the covariate on the quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the level of quantile. The objective is to test that the regression parameter is constant across several quantile levels of interest. The parameter function is estimated by combining ideas from functional principal component analysis and quantile regression. We establish asymptotic properties of the parameter function estimator, for a single quantile level as well as for a set of quantile levels. An adjusted Wald testing procedure is proposed for this hypothesis of interest and its chi-square asymptotic null distribution is derived. The testing procedure is investigated numerically in simulations involving sparse and noisy functional covariates and in the capital bike share study application. The proposed approach is easy to implement and the R code is published online.
Article
Prediction of Ozone pollution is currently an important field of research, mainly with the goal of prevention. Many statistical methods have already been used to study data dealing with pollution. For example, Ghattas (1999) used a regression tree approach, while a functional approach has been proposed by Damon and Guillas (2002) and by Aneiros-Perez, Cardot, Estevez-Perez and Vieu (2004). Pollution data now often consist of hourly measurements of pollutants and meteorological data. These variables are then comparable to curves known at some discretization points, usually called functional data in the literature, Ramsay and Silverman (1997). Many examples of such data have already been studied in various fields, Franck and Friedman (1993), Ramsay and Silverman (2002), Ferraty and Vieu (2002). It then seems natural to propose some models that take into account the fact that the variables are functions of time. The data we study here were provided by the ORAMIP (Observatoire Régional de l'Air en Midi-Pyrénées), which is an air observatory located in the city of Toulouse (France). We are interested in a pollutant like Ozone. We consider the prediction of the maximum of pollution for a day (maximum of Ozone) knowing the Ozone temporal evolution the day before. To do this, we consider two models. The first one is the functional linear model introduced by Ramsay and Dalzell (1993). It is based on the prediction of the conditional mean. The second one is a generalization of the linear model for quantile regression introduced by Koenker and Bassett (1978) when the covariates are curves. It consists in forecasting the conditional median. More generally, we introduce this model for the α-conditional quantile, with α ∈ (0, 1). This allows us to give prediction intervals. For both models, a spline estimator of the functional coefficient is introduced, in a way similar to Cardot, Ferraty and Sarda (2003).
This work is divided into four parts. First, we give a brief statistical description and analysis of the data, in particular by the use of principal components analysis (PCA), to study the general behaviour of the variables. Secondly, we present the functional linear model and we propose a spline estimator of the functional coefficient. Similarly, we propose in the third part a spline estimator of the functional coefficient for the α-conditional quantile. In both models, we describe the algorithms that have been implemented to obtain the spline estimator. We also extend these algorithms to the case where there are several functional predictors by the use of a backfitting algorithm. Finally, these approaches are illustrated using the real pollution data provided by the ORAMIP.
Article
A general framework for smooth regression of a functional response on one or multiple functional predictors is proposed. Using the mixed model representation of penalized regression expands the scope of function-on-function regression to many realistic scenarios. In particular, the approach can accommodate a densely or sparsely sampled functional response as well as multiple functional predictors that are observed on the same or different domains than the functional response, on a dense or sparse grid, and with or without noise. It also allows for seamless integration of continuous or categorical covariates and provides approximate confidence intervals as a by-product of the mixed model inference. The proposed methods are accompanied by easy to use and robust software implemented in the pffr function of the R package refund. Methodological developments are general, but were inspired by and applied to a diffusion tensor imaging brain tractography dataset.
Article
The goal of our article is to provide a transparent, robust, and computationally feasible statistical approach for testing in the context of scalar-on-function linear regression models. Assuming linearity between response and predictors, we are interested in testing for the necessity of functional effects. Our methods are motivated by and applied to a large longitudinal study involving diffusion tensor imaging of intracranial white matter tracts in a susceptible cohort. In the context of this study, we conduct hypothesis tests that are motivated by anatomical knowledge and support recent findings regarding the relationship between cognitive impairment and white matter demyelination. R code and data are in the examples of refund::rlrt.pfr(). Supplementary materials for this article are available online.
Article
This paper studies estimation in partial functional linear quantile regression in which the dependent variable is related to both a vector of finite length and a function-valued random variable as predictor variables. The slope function is estimated by the functional principal component basis. The asymptotic distribution of the estimator of the vector of slope parameters is derived and the global convergence rate of the quantile estimator of the unknown slope function is established under a suitable norm. It is shown that this rate is optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. The convergence rate of the mean squared prediction error for the proposed estimators is also established. Finite sample properties of our procedures are studied through Monte Carlo simulations. A real data example about Berkeley growth data is used to illustrate our proposed methodology. © 2014, Science China Press and Springer-Verlag Berlin Heidelberg.
Article
The theory and practice of statistical methods in situations where the available data are functions (instead of real numbers or vectors) is often referred to as Functional Data Analysis (FDA). This subject has become increasingly popular from the end of the 1990s and is now a major research field in statistics. The aim of this expository paper is to offer a short tutorial as well as a partial survey of the state of the art in FDA theory. Both the selection of topics and the references list are far from exhaustive. Many interesting ideas and references have been left out for the sake of brevity and readability. In summary, this paper provides: (a) A discussion on the nature and treatment of the functional data. (b) A review of some probabilistic tools especially suited for FDA. (c) A discussion about how the usual centrality parameters, mean, median and mode, can be defined and estimated in the functional setting. (d) Short accounts of the main ideas and current literature on regression, classification, dimension reduction and bootstrap methods in FDA. (e) Some final comments regarding software for FDA.
Article
The topic of this paper is related to quantile regression when the covariate is a function. The estimator we are interested in, based on the Support Vector Machine method, was introduced in Crambes et al. (2011). We improve the results obtained in this former paper, giving a rate of convergence in probability of the estimator. In addition, we give a practical method to construct the estimator, solution of a penalized L1-type minimization problem, using an Iterative Reweighted Least Squares procedure. We evaluate the performance of the estimator in practice through simulations and a real data set study.
Article
This paper deals with nonparametric estimation of conditional quantile regression when the explanatory variable X takes its values in a bounded subspace of a functional space X and the response Y takes its values in a compact subset of the space Y≔R. The functional observations, X1,…,Xn, are projected onto a finite dimensional subspace having a suitable orthonormal system. The Xi’s will be characterized by their coordinates in this basis. We perform the Support Vector Machine Quantile Regression approach in finite dimension with the selected coefficients. Then we establish weak consistency of this estimator. The various parameters needed for the construction of this estimator are automatically selected by data-splitting and by penalized empirical risk minimization.
Article
In this paper, we propose two important measures, quantile correlation (QCOR) and quantile partial correlation (QPCOR). We then apply them to quantile autoregressive (QAR) models, and introduce two valuable quantities, the quantile autocorrelation function (QACF) and the quantile partial autocorrelation function (QPACF). This allows us to extend the classical Box-Jenkins approach to quantile autoregressive models. Specifically, the QPACF of an observed time series can be employed to identify the autoregressive order, while the QACF of residuals obtained from the fitted model can be used to assess the model adequacy. We not only demonstrate the asymptotic properties of QCOR, QPCOR, QACF, and QPACF, but also show the large sample results of the QAR estimates and the quantile version of the Ljung-Box test. Simulation studies indicate that the proposed methods perform well in finite samples, and an empirical example is presented to illustrate their usefulness.
Article
There are many chemometric applications, such as spectroscopy, where the objective is to explain a scalar response from a functional variable (the spectrum) whose observations are functions of wavelengths rather than vectors. In this paper, PLS regression is considered for estimating the linear model when the predictor is a functional random variable. Due to the infinite dimension of the space to which the predictor observations belong, they are usually approximated by curves/functions within a finite dimensional space spanned by a basis of functions. We show that PLS regression with a functional predictor is equivalent to finite multivariate PLS regression using expansion basis coefficients as the predictor, in the sense that, at each step of the PLS iteration, the same prediction is obtained. In addition, from the linear model estimated using the basis coefficients, we derive the expression of the PLS estimate of the regression coefficient function from the model with a functional predictor. The results provided by this functional PLS approach are compared with those given by functional PCR and discrete PLS and PCR using different sets of simulated and spectrometric data.
Chapter
When either data or the models for them involve functions, and when only weak assumptions about these functions such as smoothness are permitted, familiar statistical methods must be modified and new approaches developed in order to take advantage of this smoothness. The first part of the article considers some general issues such as characteristics of functional data, uses of derivatives in functional modelling, estimation of phase variation by the alignment or registration of curve features, the nature of error, and so forth. The second section describes functional versions of traditional methods such as principal components analysis and linear modelling, and also mentions purely functional approaches that involve working with and estimating differential equations in the functional data analysis process.
Book
Quantile regression is gradually emerging as a unified statistical methodology for estimating models of conditional quantile functions. By complementing the exclusive focus of classical least squares regression on the conditional mean, quantile regression offers a systematic strategy for examining how covariates influence the location, scale and shape of the entire response distribution. This monograph is the first comprehensive treatment of the subject, encompassing models that are linear and nonlinear, parametric and nonparametric. The author has devoted more than 25 years of research to this topic. The methods in the analysis are illustrated with a variety of applications from economics, biology, ecology and finance. The treatment will find its core audiences in econometrics, statistics, and applied mathematics in addition to the disciplines cited above.
Article
Partial least squares regression (PLSR) is a method of finding a reliable predictor of the response variable when there are more regressors than observations. It does so by eliciting a small number of components from the regressors that are inherently informative about the response. Quantile regression (QR) estimates the quantiles of the response distribution by regression functions of the covariates, and so gives a fuller description of the response than does the usual regression for the mean value of the response. We extend QR to partial quantile regression (PQR) when there are more regressors than observations. For each percentile the method provides a low dimensional approximation to the joint distribution of the covariates and response with a given coverage probability and which, under further linearity assumptions, estimates the corresponding quantile of the conditional distribution. The methodology parallels the procedure for PLSR using a quantile covariance that is appropriate for predicting a quantile rather than the usual covariance which is appropriate for predicting a mean value. The analysis suggests a new measure of risk associated with the quantile regressions. Examples are given that illustrate the methodology and the benefits accrued, based on simulated data and the analysis of spectrometer data.
Article
The regression quantile estimate introduced by Koenker and Bassett in 1978 may not be robust when the predictors contain leverage points. We define estimates which are free of this drawback, and furthermore attain the maximum breakdown point for this problem. Simulations show them to behave generally better than competing robust quantile estimates.
Article
Partial least squares (PLS) regression on an L2-continuous stochastic process is an extension of the finite set case of predictor variables. The existence of the PLS components as eigenvectors of some operator and the convergence properties of the PLS approximation are proved. The results of an application to stock-exchange data are compared with those obtained by other methods.
Article
Partial Least Squares (PLS) is a standard statistical method in chemometrics. It can be considered as an incomplete, or “partial”, version of the Least Squares estimator of regression, applicable when high or perfect multicollinearity is present in the predictor variables. The Least Squares estimator is well-known to be an optimal estimator for regression, but only when the error terms are normally distributed. In the absence of normality, and in particular when outliers are in the data set, other more robust regression estimators have better properties. In this paper a “partial” version of M-regression estimators will be defined. If an appropriate weighting scheme is chosen, partial M-estimators become entirely robust to any type of outlying points, and are called Partial Robust M-estimators. It is shown that partial robust M-regression outperforms existing methods for robust PLS regression in terms of statistical precision and computational speed, while keeping good robustness properties. The method is applied to a data set consisting of EPXMA spectra of archaeological glass vessels. This data set contains several outliers, and the advantages of partial robust M-regression are illustrated. Applying partial robust M-regression yields much smaller prediction errors for noisy calibration samples than PLS. On the other hand, if the data follow perfectly well a normal model, the loss in efficiency to be paid for is very small.
Article
This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. Here we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases as the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider an estimator for the slope function based on the principal component basis. An estimator for the conditional quantile function is obtained by a plug-in method. Since the so-constructed plug-in estimator does not necessarily satisfy the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function. We establish rates of convergence for these estimators under suitable norms, showing that these rates are optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. Empirical choice of the cutoff level is studied by using simulations.
Article
We studied functional status of MS patients in a geographically based cohort in Olmsted County, Minnesota. The 162 definite MS patients who were alive and residing in the study area on December 1, 1991, constituted the MS prevalence disability cohort. We identified 179 cases of definite or probable MS, providing an overall sex- and age-adjusted prevalence rate of 167.5 per 100,000. Median duration of MS from onset was 15.4 years, and median age on prevalence date was 47.5 years. The Minimal Record of Disability for MS determined the degree of impairment, disability, and handicap of the entire cohort within 4 months of the prevalence date. The frequency of Expanded Disability Status Scale scores of the MS prevalence cohort showed a bimodal distribution with peaks at 1 and 6.5 (3.5 [1 to 9.5], median [range]). Approximately one-third of the cohort had marked paraparesis, paraplegia, or quadriplegia. One-fourth of all patients needed intermittent or almost constant catheterization for bladder dysfunction. Few patients (3.7%) reported severe decrease in mentation or dementia requiring supervision. Many patients (53.1%) were working full-time. Most patients (72.2%) maintained their usual financial standard without external support. There were no differences in level of impairment, disability, or handicap observed between the subgroup of 122 patients (75.3%) who are incident cases (onset of disease as residents of Olmsted County) compared with the entire prevalence cohort. This geographically based study of MS demonstrates that the functional status is more favorable than previously recognized.