About
47 Publications
4,177 Reads
655 Citations
Introduction
Bump Hunting of High Dimensional Data;
Regularization and Variance Stabilization of High Dimensional Data;
Variable Selection;
Statistical Computing;
Applied Probability and Statistics;
Data Mining
Skills and Expertise
Publications (47)
Principal Components Analysis is a widely used technique for dimension reduction and characterization of variability in multivariate populations. Our interest lies in studying when and why the rotation to principal components can be used effectively within a response-predictor set relationship in the context of mode hunting. Specifically focusing o...
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not re...
The search for structures in real datasets, e.g., in the form of bumps, components, classes or clusters, is important, as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without pre-...
We propose a new method to find modes based on active information. We develop an algorithm called active information mode hunting (AIMH) that, when applied to the whole space, will say whether there are any modes present and where they are. We show AIMH is consistent and, given that information increases where probability decreases, it helps to ove...
We introduce a survival/risk bump hunting framework to build a bump hunting model with a censored time-to-event response. Our method, called Survival Bump Hunting, relies on a rule-induction method based on recursive peelings that uses specific survival peeling criteria such as hazard ratio or log-rank test statistics. To validate our model and imp...
Principal components analysis has been used to reduce the dimensionality of datasets for a long time. In this paper, we will demonstrate that in mode detection the components of smallest variance, the pettiest components, are more important. We prove that for a multivariate normal or Laplace distribution, we obtain boxes of optimal volume by implem...
Background
In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets.
Results
We detail our dpGSEA method and show its effe...
Principal component analysis has been used to reduce dimensionality of datasets for a long time. In this paper, we will demonstrate that in mode detection the components of smallest variance, the pettiest components, are more important. We prove that when the data follows a multivariate normal distribution, by implementing "pettiest component analy...
The genomics revolution also spawned the dawn of precision medicine. As in the National Research Council definition, if its promise is fully realized, then more accurate decisions about individual patient treatment and outcomes will be possible. Disparities researchers have also begun looking to the precision medicine paradigm with the ho...
Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variable interactions associated with clinical time-to-event outcomes, we borrowed established concepts from random survival forest...
Glioblastoma multiforme (GBM) is a genomically complex and aggressive primary adult brain tumor, with a median survival time of 12-14 months. The heterogeneous nature of this disease has made the identification and validation of prognostic biomarkers difficult. Using reverse phase protein array data from 203 primary untreated GBM patients, we have...
Among the problems posed by high-dimensional datasets (the so-called p ≫ n paradigm) are that variable-specific estimators of variances are not reliable and test statistics have low power, both due to a lack of degrees of freedom. In addition, variance is observed to be a function of the mean. We introduce a non-parametric adaptive regularization pro...
We present an implementation in the R language for statistical computing of our recent non-parametric joint adaptive mean-variance regularization and variance stabilization procedure. The method is specifically suited for handling difficult problems posed by high-dimensional multivariate datasets (p ≫ n paradigm), such as in 'omics'-type data, amon...
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with...
Principal Components Analysis is a widely used technique for dimension reduction and characterization of variability in multivariate populations. Our interest lies in studying when and why the rotation to principal components can be used effectively within a response-predictor set relationship in the context of mode hunting. Specifically focusing o...
Background
To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, Apc(Min/+), we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of Apc(Min/+) vs. wild-type littermates, kept on low vs. high-fat d...
We show that if we have an orthogonal basis ($u_1,\ldots,u_p$) in a $p$-dimensional vector space, and select $p+1$ vectors $v_1,\ldots, v_p$ and $w$ such that the vectors traverse the origin, then the probability of $w$ being closer to all the vectors in the basis than to $v_1,\ldots, v_p$ is at least 1/2 and converges as $p$ increases to infinity...
Purpose. The incidence of liver neoplasms is rising in the USA. The purpose of this study was to determine metabolic profiles of liver tissue during early cancer development. Methods. We used the rabbit VX2 model of liver tumors (LT) and a control group consisting of sham animals implanted with Gelfoam into their livers (LG). After two weeks from impla...
DEFB4/103A encoding β-defensin 2 and 3, respectively, inhibit CXCR4-tropic (X4) viruses in vitro. We determined whether DEFB4/103A Copy Number Variation (CNV) influences time-to-X4 and time-to-AIDS outcomes.
We utilized samples from a previously published Multicenter AIDS Cohort Study (MACS), which provides a longitudinal account of viral tropism in...
Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed...
Background
Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature...
Figure S1. Scatter plots of protein spectral counts vs. protein MASCOT scores (left-hand-side) and protein MASCOT scores vs. protein marginal inclusion probabilities (right-hand-side) in all AP-MS control and bait experiments. Figure S2. Scatter plot of peptide PROPHET probabilities (Prob) onto the peptide MASCOT scores (Score) in all AP-MS control...
Data set and Database Search. Determination of Protein Spectral Counts, Protein MASCOT Scores, and Protein Marginal Inclusion Probabilities. Raw Input Dataset Structure. Initial Pre-filtering. Derivation of Marginal and Joint Inclusion Probabilities of Indicator Prey Proteins. Identification of Reproducible Experimental Replicates and Reproducible...
Biological validation of ROCS protein-protein interaction (PPI) scoring results for the Specific Prey Proteins between SAINT (Posterior Probability PSAINT), ComPASS (D-score) and our method ROCS (C-score) in all AP-MS bait experiments (each on a separate Excel tab-sheet in a single file).
Comparison of SAINT protein-protein interaction (PPI) scoring for the Specific Prey Proteins at the different ROCS procedural stages “N” (SAINT-only), “R”, and “S” (ROCS-SAINT) in all AP-MS bait experiments (each on a separate Excel tab-sheet in a single file).
ROCS lists of Indicator Prey Proteins (IPI) and Reproducible Experimental Replicates (RER) in all AP-MS control and bait experiments (each on a single Excel tab-sheet in a single file).
Comparison of protein-protein interaction (PPI) scoring for the Specific Prey Proteins between SAINT (Posterior Probability PSAINT), ComPASS (D-score) and our method ROCS (C-score) in all AP-MS bait experiments (each on a separate Excel tab-sheet in a single file).
The question of molecular heterogeneity and of tumoral phenotype in cancer remains unresolved. To understand the underlying molecular basis of this phenomenon, we analyzed genome-wide expression data of colon cancer metastasis samples, as these tumors are the most advanced and hence would be anticipated to be the most likely heterogeneous group of...
To define a panel of novel protein biomarkers of renal disease.
Adults with type 1 diabetes in the Coronary Artery Calcification in Type 1 Diabetes study who were initially free of renal complications (n = 465) were followed for development of micro- or macroalbuminuria (MA) and early renal function decline (ERFD, annual decline in estimated glomer...
Allogeneic hematopoietic stem cell transplantation (SCT) is the only curative therapy for many malignant and nonmalignant conditions. Idiopathic pneumonia syndrome (IPS) is a frequently fatal complication that limits successful outcomes. Preclinical models suggest that IPS represents an immune mediated attack on the lung involving elements of both...
Dendritic cells (DC) direct the magnitude, polarity and effector function of the adaptive immune response. DC express toll-like receptors (TLR), antigen capturing and processing machinery, and costimulatory molecules, which facilitate innate sensing and T cell activation. Once activated, DC can efficiently migrate to lymphoid tissue and prime T cel...
Supplemental Figure S3. The top metabolic functions affected by E1A in quiescent cells. The stacked bar chart displays for each canonical pathway the number of genes that were found significantly up-regulated (red), and down-regulated (green) by Bayesian model selection. The molecules/genes in a given pathway that were not found in our list of sign...
Supplemental Figure S2. Validation of E1A-modulated genes identified by microarray analysis. For validation, a set of nine genes (CDC6, CCNE1, MCM3, E2F1, MCM7, SKP2, UNG, PLK1, and CSF1) that were significantly regulated by E1A, as determined by BAM analysis, were tested by qRT-PCR. Left panel: volcano plot of absolute BAM Zcut values plotted vers...
Supplemental Figure S4. Expression of E1A in E1A-inducible BALB/c 3T3 cells after treatment with Dox. Nuclear extracts were prepared from E1A-inducible cells (Clone 13, passages 12, 18 and 31) after treatment with Dox (100 ng/ml) for 6 h. Extracts were then subjected to western blot analysis using M73, an antibody specific for E1A (4,5). The membra...
Supplemental Table S1. Entire list of significant genes differentially expressed in quiescent cells after E1A expression. Listing was generated by using the Bayesian ANOVA for microarrays (BAM). Transcripts are ranked by statistical significance (Zcut values) from top to bottom. Direction of change is indicated by the Zcut sign. Generalized-log-fol...
Supplemental Figure S1. Departure from normality of the distribution of differentially expressed genes. This figure maps differentially expressed genes in quiescent cells after E1A induction onto a normal quantile-quantile plot. Genes found significantly up- and down-regulated by BAM analysis (2,401 total, i.e. 1,174 up-regulated and 1,227 down-regula...
Adenoviruses force quiescent cells to re-enter the cell cycle to replicate their DNA, and for the most part, this is accomplished after they express the E1A protein immediately after infection. In this context, E1A is believed to inactivate cellular proteins (e.g., p130) that are known to be involved in the silencing of E2F-dependent genes that are...
Crooked tail (Cd) mice bear a gain-of-function mutation in Lrp6, a co-receptor for canonical WNT signaling, and are a model of neural tube defects (NTDs), preventable with dietary folic acid (FA) supplementation. Whether the FA response reflects a direct influence of FA on LRP6 function was tested with prenatal supplementation in LRP6-deficient emb...
Diabetes mellitus is estimated to affect approximately 24 million people in the United States and more than 150 million people worldwide. There are numerous end organ complications of diabetes, the onset of which can be delayed by early diagnosis and treatment. Although assays for diabetes are well founded, tests for its complications lack sufficie...
Standard genetic mapping techniques scan chromosomal segments for location of genetic linkage and association signals. The majority of these methods consider only correlations at single markers and/or phenotypes with explicit detailing of the genetic structure. These methods tend to be limited by their inability to consider the effect of large numb...
The study of the cascade of events of induction and sequential gene activation that takes place during human embryonic development is hindered by the unavailability of postimplantation embryos at different stages of development. Spontaneous differentiation of human embryonic stem cells (hESCs) can occur by means of the formation of embryoid bodies...
Human embryonic stem cells (ESC) are undifferentiated and are endowed with the capacities of self-renewal and pluripotential differentiation. Adult stem cells renew their own tissue, but whether they can transdifferentiate to other tissues is still controversial. To understand the genetic program that underlies the pluripotency of stem cells, we co...
The synthetic immunomodulator AS101 [ammonium trichloro(dioxoethylene-o,o')tellurate] was previously found to protect cancer patients from chemotherapy-induced bone marrow toxicity and alopecia. Here we show that AS101 induces hair growth in nude and normal mice. AS101 possesses the dual ability to both induce anagen and retard spontaneous catagen i...
To gain insight into the transformation of epidermal cells into squamous carcinoma cells (SCC), we compared the response to ultraviolet B radiation (UVB) of normal human epidermal keratinocytes (NHEK) versus their transformed counterpart, SCC, using biological and molecular profiling. DNA microarray analyses (Affymetrix, approximately 12,000 genes)...
The therapeutic potential of monoclonal antibodies for treating a variety of severe or life-threatening diseases is high. Although intravenous infusion appears the simplest and most obvious mode of administration, it is not applicable to many long-term treatments. It might be advantageously replaced by gene/cell therapies, however, rendering treatm...