Publications (10) · 10.99 Total Impact
ABSTRACT: We describe a new approach to analyze chirp syllables of free-tailed bats from two regions of Texas in which they are predominant: Austin and College Station. Our goal is to characterize any systematic regional differences in the mating chirps and to assess whether individual bats have signature chirps. The data are analyzed by modeling spectrograms of the chirps as responses in a Bayesian functional mixed model. Given the variable chirp lengths, we compute the spectrograms on a relative time scale interpretable as the relative chirp position, using a variable window overlap based on chirp length. We use 2D wavelet transforms to capture correlation within the spectrogram in our modeling and to obtain adaptive regularization of the estimates and inference for the region-specific spectrograms. Our model includes random effect spectrograms at the bat level to account for correlation among chirps from the same bat, and to assess relative variability in chirp spectrograms within and between bats. The modeling of spectrograms using functional mixed models is a general approach for the analysis of replicated nonstationary time series, such as our acoustical signals, to relate aspects of the signals to various predictors while accounting for between-signal structure. This can be done on raw spectrograms when all signals are of the same length, and on spectrograms defined on a relative time scale for signals of variable length, in settings where defining correspondence across signals based on relative position is sensible.
Journal of the American Statistical Association 06/2013; 108(502):514–526. DOI:10.1080/01621459.2013.793118 · 1.98 Impact Factor
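The relative-time-scale idea above can be sketched in a few lines: instead of a fixed hop size, choose window start positions so that every signal, whatever its length, yields the same number of frames, making frame index interpretable as relative chirp position. This is a minimal illustration of the concept, not the authors' code; the function name and parameters (`n_frames`, `win_len`) are illustrative.

```python
import numpy as np

def relative_time_spectrogram(signal, n_frames=64, win_len=128):
    """Magnitude spectrogram with a fixed number of time frames, so that
    frame k corresponds to relative position k/n_frames within the signal.
    The window overlap adapts to the signal length (illustrative sketch)."""
    n = len(signal)
    # start positions spread evenly so n_frames windows span the signal
    starts = np.linspace(0, n - win_len, n_frames).astype(int)
    window = np.hanning(win_len)
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # frequency x relative time

# two signals of different lengths map to spectrograms of identical shape
short = np.sin(2 * np.pi * 50 * np.linspace(0, 0.5, 4000))
long_ = np.sin(2 * np.pi * 50 * np.linspace(0, 1.0, 8000))
assert relative_time_spectrogram(short).shape == relative_time_spectrogram(long_).shape
```

With equal-shaped spectrograms in hand, signals of variable length become comparable responses for a functional model.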
ABSTRACT: When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
The American Statistician 11/2011; 65(4):223–228. DOI:10.1198/tas.2011.11052 · 0.92 Impact Factor
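The instability described above is easy to reproduce. The sketch below uses the plain Lasso (scikit-learn has no built-in SCAD or Adaptive Lasso, but the abstract notes the same remarks apply to the Lasso) with sparse, deliberately weak signals; rerunning 10-fold cross-validation with different fold splits gives noticeably different numbers of selected variables. The simulation design is ours, chosen only to mimic the weak-signal setting.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.standard_normal((n, p))
# sparse truth with weak signals, loosely mimicking the SNP setting
beta = np.zeros(p)
beta[:5] = 0.25
y = X @ beta + rng.standard_normal(n)

counts = []
for seed in range(20):
    # same data each time; only the random 10-fold split changes
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    fit = LassoCV(cv=cv).fit(X, y)
    counts.append(int(np.sum(fit.coef_ != 0)))

print(min(counts), max(counts))  # the spread illustrates the instability
```

A single run of 10-fold cross-validation is thus one draw from a rather variable distribution of model sizes.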
Article: Longitudinal functional principal component modeling via Stochastic Approximation Monte Carlo.
ABSTRACT: The authors consider the analysis of hierarchical longitudinal functional data based upon a functional principal components approach. In contrast to standard frequentist approaches to selecting the number of principal components, the authors perform model averaging using a Bayesian formulation. A relatively straightforward reversible jump Markov chain Monte Carlo formulation has poor mixing properties, and in simulated data it often becomes trapped at the wrong number of principal components. To overcome this, the authors show how to apply Stochastic Approximation Monte Carlo (SAMC) to this problem, a method that has the potential to explore the entire space and does not become trapped in local extrema. The combination of reversible jump methods and SAMC in hierarchical longitudinal functional data is simplified by a polar coordinate representation of the principal components. The approach is easy to implement, does well in simulated data in determining the distribution of the number of principal components, and has good frequentist estimation properties. Empirical applications are also presented.
Canadian Journal of Statistics 06/2010; 38(2):256–270. DOI:10.1002/cjs.10062 · 0.65 Impact Factor
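The polar coordinate representation mentioned above can be illustrated with the standard hyperspherical map: d−1 angles determine a unit vector in R^d, so a principal component loading vector can be parameterized by unconstrained angles while automatically staying on the unit sphere. This is a generic sketch of that device, not the authors' sampler.

```python
import numpy as np

def angles_to_unit_vector(theta):
    """Map d-1 polar (hyperspherical) angles to a unit vector in R^d.
    Parameterizing a principal component this way keeps it unit-norm by
    construction, which simplifies trans-dimensional MCMC moves
    (illustrative sketch, not the paper's implementation)."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size + 1
    v = np.ones(d)
    for i, t in enumerate(theta):
        v[i] *= np.cos(t)        # component i picks up cos(theta_i)
        v[i + 1:] *= np.sin(t)   # later components accumulate sin(theta_i)
    return v

v = angles_to_unit_vector([0.3, 1.1, 2.0])
assert np.isclose(np.linalg.norm(v), 1.0)
```

Because the angles are unconstrained, proposals in angle space never leave the space of valid (unit-norm) loadings.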
ABSTRACT: Hierarchical functional data are widely seen in complex studies where subunits are nested within units, which in turn are nested within treatment groups. We propose a general framework of functional mixed effects models for such data: within-unit and within-subunit variations are modeled through two separate sets of principal components, and the subunit-level functions are allowed to be correlated. Penalized splines are used to model both the mean functions and the principal component functions, where roughness penalties are used to regularize the spline fit. An EM algorithm is developed to fit the model, and the specific covariance structure of the model is exploited for computational efficiency, avoiding the storage and inversion of large matrices. Our dimension reduction with principal components provides an effective solution to the difficult tasks of modeling the covariance kernel of a random function and modeling the correlation between functions. The proposed methodology is illustrated using simulations and an empirical data set from a colon carcinogenesis study. Supplemental materials are available online.
Journal of the American Statistical Association 03/2010; 105(489):390–400. DOI:10.1198/jasa.2010.tm08737 · 1.98 Impact Factor
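The dimension-reduction step at the heart of the model above is functional PCA: eigendecompose the sample covariance of curves observed on a common grid and keep the leading eigenfunctions. The sketch below shows only that basic step, without the penalized-spline smoothing, the hierarchical levels, or the EM algorithm of the paper; the simulated data are ours.

```python
import numpy as np

def fpca(curves, n_pc=2):
    """Basic functional PCA: eigendecompose the sample covariance of
    curves (rows) on a common grid and return the mean curve plus the
    top n_pc eigenvalues and eigenfunctions (illustrative sketch)."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / (curves.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_pc]     # take the largest
    return mean, vals[order], vecs[:, order]

# simulated curves: a fixed mean shape plus one dominant mode of variation
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
scores = rng.standard_normal((100, 1))
curves = np.sin(2 * np.pi * t) + scores @ np.cos(2 * np.pi * t)[None, :] \
         + 0.1 * rng.standard_normal((100, 50))
_, vals, _ = fpca(curves)
assert vals[0] > 5 * vals[1]   # first component captures the real variation
```

In the paper this decomposition is applied separately at the unit and subunit levels, with spline smoothing of the eigenfunctions.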
ABSTRACT: We consider the problem of score testing for certain low-dimensional parameters of interest in a model that could include finite but high-dimensional secondary covariates and associated nuisance parameters. We investigate the possibility of a potential gain in power by reducing the dimensionality of the secondary variables via oracle estimators such as the Adaptive Lasso. As an application, we use a recently developed framework for score tests of association of a disease outcome with an exposure of interest in the presence of a possible interaction of the exposure with other cofactors of the model. We derive the local power of such tests and show that if the primary and secondary predictors are independent, then having an oracle estimator does not improve the local power of the score test. Conversely, if they are dependent, there is the potential for power gain. Simulations are used to validate the theoretical results and to explore the extent of correlation needed between the primary and secondary covariates to observe an improvement in the power of the test when using the oracle estimator. Our conclusions are likely to hold more generally beyond the model of interactions considered here.
The International Journal of Biostatistics 01/2010; 6(1):Article 12. DOI:10.2202/1557-4679.1231 · 0.74 Impact Factor
ABSTRACT: Recently (Martinez et al. 2010), we compared calcium ion (Ca2+) signaling between two exposures, where the data present as movies, or, more prosaically, time series of images. They described novel uses of singular value decompositions (SVD) and weighted versions of them (WSVD) to extract the signals from such movies, in a way that is semi-automatic and tuned closely to the actual data and their many complexities. These complexities include the following. First, the images themselves are of no interest: all interest focuses on the behavior of individual cells across time, and thus the cells need to be segmented in an automated manner. Second, the cells themselves have 100+ pixels, so that they form 100+ curves measured over time, and data compression is required to extract the features of these curves. Third, some of the pixels in some of the cells are subject to image saturation due to bit depth limits, and this saturation needs to be accounted for if one is to normalize the images in a reasonably unbiased manner. Finally, the Ca2+ signals have oscillations or waves that vary with time, and these signals need to be extracted. Thus, they showed how to use multiple weighted and standard singular value decompositions to detect, extract and clarify the Ca2+ signals. In this paper, we show how this signal extraction lends itself to a cluster analysis of the cell behavior, which reveals distinctly different patterns of behavior.
Statistical Modelling and Regression Structures, 01/2010: pages 419–430
ABSTRACT: Time series associated with single-molecule experiments and/or simulations contain a wealth of multiscale information about complex biomolecular systems. We demonstrate how a collection of penalized splines (P-splines) can be useful in quantitatively summarizing such data. In this work, functions estimated using P-splines are associated with stochastic differential equations (SDEs). It is shown how quantities estimated in a single SDE summarize fast-scale phenomena, whereas variation between curves associated with different SDEs partially reflects noise induced by motion evolving on a slower time scale. P-splines assist in "semiparametrically" estimating nonlinear SDEs in situations where a time-dependent external force is applied to a single-molecule system. The P-splines introduced simultaneously use function and derivative scatterplot information to refine curve estimates. We refer to the approach as the PuDI (P-splines using Derivative Information) method. It is shown how generalized least squares ideas fit seamlessly into the PuDI method. Applications demonstrating how utilizing uncertainty information/approximations along with generalized least squares techniques improves PuDI fits are presented. Although the primary application here is in estimating nonlinear SDEs, the PuDI method is applicable to situations where both unbiased function and derivative estimates are available.
SIAM Journal on Multiscale Modeling and Simulation 01/2010; 8(4):1562–1580. DOI:10.1137/090768102 · 1.63 Impact Factor
SIAM Journal on Multiscale Modeling and Simulation 01/2010; 8:2097. DOI:10.1137/100809921 · 1.63 Impact Factor
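The core of the PuDI idea — stacking function-value and derivative scatterplots into one penalized least-squares problem — can be sketched directly. This minimal version uses a B-spline basis with a second-difference (P-spline) penalty and ordinary least squares; the weighting/GLS refinements and the SDE context of the paper are omitted, and the parameter choices are illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline

def pudi_fit(x, y, dy, knots, k=3, lam=0.1):
    """Penalized-spline fit using both function values y and derivative
    values dy observed at points x: stack the two design matrices into one
    least-squares problem with a second-difference penalty on the spline
    coefficients (minimal sketch of the PuDI idea, unweighted)."""
    n_basis = len(knots) - k - 1
    spl = BSpline(knots, np.eye(n_basis), k)
    B = spl(x)                                 # function-value design matrix
    Bd = spl.derivative()(x)                   # derivative design matrix
    D = np.diff(np.eye(n_basis), 2, axis=0)    # second-difference penalty
    lhs = B.T @ B + Bd.T @ Bd + lam * D.T @ D
    rhs = B.T @ y + Bd.T @ dy
    return np.linalg.solve(lhs, rhs)

# recover sin(2*pi*x) from function and derivative observations
x = np.linspace(0, 1, 60)
knots = np.r_[[0.0] * 3, np.linspace(0, 1, 8), [1.0] * 3]
coef = pudi_fit(x, np.sin(2 * np.pi * x), 2 * np.pi * np.cos(2 * np.pi * x), knots)
fit = BSpline(knots, coef, 3)(x)
assert np.max(np.abs(fit - np.sin(2 * np.pi * x))) < 0.1
```

Replacing the two identity weight matrices with inverse-covariance weights gives the generalized least squares variant the abstract mentions.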

Article: Use of multiple singular value decompositions to analyze complex intracellular calcium ion signals
ABSTRACT: We compare calcium ion (Ca(2+)) signaling between two exposures; the data present as movies, or, more prosaically, time series of images. This paper describes novel uses of singular value decompositions (SVD) and weighted versions of them (WSVD) to extract the signals from such movies, in a way that is semi-automatic and tuned closely to the actual data and their many complexities. These complexities include the following. First, the images themselves are of no interest: all interest focuses on the behavior of individual cells across time, and thus the cells need to be segmented in an automated manner. Second, the cells themselves have 100+ pixels, so that they form 100+ curves measured over time, and data compression is required to extract the features of these curves. Third, some of the pixels in some of the cells are subject to image saturation due to bit depth limits, and this saturation needs to be accounted for if one is to normalize the images in a reasonably unbiased manner. Finally, the Ca(2+) signals have oscillations or waves that vary with time, and these signals need to be extracted. Thus, our aim is to show how to use multiple weighted and standard singular value decompositions to detect, extract and clarify the Ca(2+) signals. Our signal extraction methods then lead to simple although finely focused statistical methods to compare Ca(2+) signals across experimental conditions.
The Annals of Applied Statistics 12/2009; 3(4):1467–1492. DOI:10.1214/09-AOAS253 · 1.46 Impact Factor
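The basic SVD step behind the method above is simple: stack a cell's 100+ pixel curves into a pixels-by-time matrix and take the leading right singular vector as the cell's common temporal signal. The sketch below shows only that unweighted step on simulated curves; the weighting for saturated pixels and the segmentation from the paper are omitted.

```python
import numpy as np

def dominant_signal(pixel_curves):
    """Extract a cell's common temporal signal as the leading right
    singular vector of its (pixels x time) curve matrix -- the basic
    unweighted SVD step (saturation weighting omitted)."""
    centered = pixel_curves - pixel_curves.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]   # unit-norm time course shared across the pixels

# 120 pixel curves sharing one oscillation, with pixel-specific amplitudes
rng = np.random.default_rng(2)
t = np.linspace(0, 10, 200)
truth = np.sin(2 * np.pi * 0.5 * t)
curves = rng.uniform(0.5, 2.0, (120, 1)) * truth \
         + 0.2 * rng.standard_normal((120, 200))
sig = dominant_signal(curves)
assert abs(np.corrcoef(sig, truth)[0, 1]) > 0.95  # sign of an SVD is arbitrary
```

The extracted time courses, one per cell, are then the inputs to the comparisons (and, in the book chapter above, the cluster analysis) across experimental conditions.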
ABSTRACT: When employing model selection methods such as the Lasso and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, e.g., m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well, with the Adaptive Lasso being an oracle method. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using the Adaptive Lasso with 10-fold cross-validation is a random variable that has considerable and surprising variation. Similar remarks apply to the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the Adaptive Lasso.
Publication Stats
53 Citations
10.99 Total Impact Points
Institutions

2013

University of Texas MD Anderson Cancer Center
 Division of Radiation Oncology
Houston, Texas, United States


2010

Texas A&M University
 Department of Statistics
College Station, TX, United States
