ABSTRACT: Principal Components Analysis is a widely used technique for dimension
reduction and characterization of variability in multivariate populations. Our
interest lies in studying when and why the rotation to principal components can
be used effectively within a response-predictor set relationship in the context
of mode hunting. Specifically focusing on the Patient Rule Induction Method
(PRIM), we first develop a fast version of this algorithm (fastPRIM) under
normality which facilitates the theoretical studies to follow. Using basic
geometrical arguments, we then demonstrate how the PC rotation of the predictor
space alone can in fact generate improved mode estimators. Simulation results
are used to illustrate our findings.
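
As a rough illustration of what "PC rotation of the predictor space alone" means, the following is a minimal sketch for the two-predictor case, using the closed-form principal-axis angle of a 2x2 covariance matrix. The function names and the demo data are assumptions made for this example; it is not the fastPRIM procedure from the abstract, only the rotation step, and the response variable is deliberately not used.

```python
import math
import random

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def pc_rotate_2d(x1, x2):
    """Rotate two predictors onto their principal axes.

    Only the predictors are used; the response plays no role here.
    Returns (pc1, pc2, theta), where theta is the rotation angle in radians.
    """
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    c1 = [v - m1 for v in x1]  # centered first predictor
    c2 = [v - m2 for v in x2]  # centered second predictor
    s11, s22, s12 = covariance(c1, c1), covariance(c2, c2), covariance(c1, c2)
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [cos_t * a + sin_t * b for a, b in zip(c1, c2)]
    pc2 = [-sin_t * a + cos_t * b for a, b in zip(c1, c2)]
    return pc1, pc2, theta

# Hypothetical demo data: two correlated Gaussian predictors.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x1 = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
x2 = [0.8 * zi + 0.3 * rng.gauss(0, 1) for zi in z]
pc1, pc2, theta = pc_rotate_2d(x1, x2)
```

After the rotation the two coordinates are uncorrelated and the first carries the larger variance, so a box-shaped search such as PRIM would then operate on axes aligned with the main directions of predictor variability.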
ABSTRACT: To determine how diets high in saturated fat could increase polyp formation in the mouse model of intestinal neoplasia, ApcMin/+, we conducted large-scale metabolome analysis and association study of colon and small intestine polyp formation from plasma and liver samples of ApcMin/+ vs. wild-type littermates, kept on low vs. high-fat diet. Label-free mass spectrometry was used to quantify untargeted plasma and acyl-CoA liver compounds, respectively. Differences in contrasts of interest were analyzed statistically by unsupervised and supervised modeling approaches, namely Principal Component Analysis and Linear Model of analysis of variance. Correlation between plasma metabolite concentrations and polyp numbers was analyzed with a zero-inflated Generalized Linear Model.
BMC Systems Biology 06/2014; 8(1):72. · 2.98 Impact Factor
ABSTRACT: We show that if we have an orthogonal base ($u_1,\ldots,u_p$) in a
$p$-dimensional vector space, and select $p+1$ vectors $v_1,\ldots, v_p$ and
$w$ such that the vectors traverse the origin, then the probability of $w$
being closer to all the vectors in the base than to $v_1,\ldots, v_p$ is at
least 1/2 and converges, as $p$ increases to infinity, to the probability that a
standard normal variable falls in the interval [-1,1]; i.e., $\Phi(1)-\Phi(-1)\approx0.6826$. This result has
relevant consequences for Principal Components Analysis in the context of
regression and other learning settings, if we take the orthogonal base as the
direction of the principal components.
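
The limiting value quoted above can be checked numerically. The sketch below evaluates $\Phi(1)-\Phi(-1)$ through the error function from the Python standard library; the helper names are assumptions made for this example. It only reproduces the constant, not the probabilistic argument of the abstract. The computed value is about 0.68269, which the abstract quotes as 0.6826.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def central_mass(k: float = 1.0) -> float:
    """Probability that a standard normal variable lies in [-k, k]."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

limit = central_mass(1.0)
print(f"{limit:.5f}")
```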
ABSTRACT: Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.
Journal of Proteome Research 07/2012; 11(9):4476-87. · 5.06 Impact Factor
ABSTRACT: The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, and (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
ABSTRACT: BACKGROUND: Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones.
RESULTS: To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Proteins account for measures of protein identification as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criteria for parameter estimation and error-controlled procedures. We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions or in combination with other AP-MS scoring methods to significantly improve inferences.
CONCLUSIONS: Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or especially in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called "ROCS", freely available from the CRAN repository http://cran.r-project.org/.
ABSTRACT: To define a panel of novel protein biomarkers of renal disease.
Adults with type 1 diabetes in the Coronary Artery Calcification in Type 1 Diabetes study who were initially free of renal complications (n = 465) were followed for development of micro- or macroalbuminuria (MA) and early renal function decline (ERFD, annual decline in estimated glomerular filtration rate of ≥3.3%). The label-free proteomic discovery phase was conducted in 13 patients who progressed to MA by the 6-year visit and 11 control subjects, and four proteins (Tamm-Horsfall glycoprotein, α-1 acid glycoprotein, clusterin, and progranulin) identified in the discovery phase were measured by enzyme-linked immunosorbent assay in 74 subjects: group A, normal renal function (n = 35); group B, ERFD without MA (n = 15); group C, MA without ERFD (n = 16); and group D, both ERFD and MA (n = 8).
In the label-free analysis, a model of progression to MA was built using 252 peptides, yielding an area under the curve (AUC) of 84.7 ± 5.3%. In the validation study, ordinal logistic regression was used to predict development of ERFD, MA, or both. A panel including Tamm-Horsfall glycoprotein (odds ratio 2.9, 95% CI 1.3-6.2, P = 0.008), progranulin (1.9, 0.8-4.5, P = 0.16), clusterin (0.6, 0.3-1.1, P = 0.09), and α-1 acid glycoprotein (1.6, 0.7-3.7, P = 0.27) improved the AUC from 0.841 to 0.889.
A panel of four novel protein biomarkers predicted early renal damage in type 1 diabetes. These findings require further validation in other populations for prediction of renal complications and treatment monitoring.
Diabetes care 03/2012; 35(3):549-55. · 7.74 Impact Factor
ABSTRACT: Allogeneic hematopoietic stem cell transplantation (SCT) is the only curative therapy for many malignant and nonmalignant conditions. Idiopathic pneumonia syndrome (IPS) is a frequently fatal complication that limits successful outcomes. Preclinical models suggest that IPS represents an immune mediated attack on the lung involving elements of both the adaptive and the innate immune system. However, the etiology of IPS in humans is less well understood. To explore the disease pathway and uncover potential biomarkers of disease, we performed two separate label-free proteomics experiments defining the plasma protein profiles of allogeneic SCT patients with IPS. Samples obtained from SCT recipients without complications served as controls. The initial discovery study, intended to explore the disease pathway in humans, identified a set of 81 IPS-associated proteins. These data revealed similarities between the known IPS pathways in mice and the condition in humans, in particular in the acute phase response. In addition, pattern recognition pathways were judged to be significant as a function of development of IPS, and from this pathway we chose the lipopolysaccharide-binding protein (LBP) as a candidate molecular diagnostic for IPS, and verified its increase as a function of disease using an ELISA. In a separately designed study, we identified protein-based classifiers that could predict, at day 0 of SCT, patients who: 1) progress to IPS and 2) respond to cytokine neutralization therapy. Using cross-validation strategies, we built highly predictive classifier models of both disease progression and therapeutic response. 
In sum, data generated in this report confirm previous clinical and experimental findings, provide new insights into the pathophysiology of IPS, identify potential molecular classifiers of the condition, and uncover a set of markers potentially of interest for patient stratification as a basis for individualized therapy.
ABSTRACT: DEFB4/103A, encoding β-defensin 2 and 3, respectively, inhibit CXCR4-tropic (X4) viruses in vitro. We determined whether DEFB4/103A Copy Number Variation (CNV) influences time-to-X4 and time-to-AIDS outcomes.
We utilized samples from a previously published Multicenter AIDS Cohort Study (MACS), which provides a longitudinal account of viral tropism in relation to the full spectrum of rates of disease progression. Using traditional models for time-to-event analysis, we investigated the association between DEFB4/103A CNV and the two outcomes, and the interaction between DEFB4/103A CNV and disease progression groups, Fast and Slow.
Time-to-X4 and time-to-AIDS were weakly correlated. There was a stronger relationship between these two outcomes for the fast progressors. DEFB4/103A CNV was associated with time-to-AIDS, but not time-to-X4. The association between higher DEFB4/103A CNV and time-to-AIDS was more pronounced for the slow progressors.
DEFB4/103A CNV was associated with time-to-AIDS in a disease progression group-specific manner in the MACS cohort. Our findings may contribute to enhancing current understanding of how genetic predisposition influences AIDS progression.
Journal of AIDS & Clinical Research 01/2012; 3(10). · 6.83 Impact Factor
ABSTRACT: The question of molecular heterogeneity and of tumoral phenotype in cancer remains unresolved. To understand the underlying molecular basis of this phenomenon, we analyzed genome-wide expression data of colon cancer metastasis samples, as these tumors are the most advanced and hence would be anticipated to be the most heterogeneous group of tumors, potentially exhibiting the maximum amount of genetic heterogeneity. Casting a statistical net around such a complex problem proves difficult because of the high dimensionality and multicollinearity of the gene expression space, combined with the fact that genes act in concert with one another and that not all genes surveyed might be involved. We devise a strategy to identify distinct subgroups of samples and determine the genetic/molecular signature that defines them. This involves use of the local sparse bump hunting algorithm, which provides a better-suited and more biologically faithful transformed space within which to search for bumps. In addition, thanks to the variable selection feature of the algorithm, we derived a novel sparse gene expression signature, which appears to divide all colon cancer patients into two populations: a population whose expression pattern can be molecularly encompassed within the bump and an outlier population that cannot be. Although all patients within any given stage of the disease, including the metastatic group, appear clinically homogeneous, our procedure revealed two subgroups in each stage with distinct genetic/molecular profiles. We also discuss implications of such a finding in terms of early detection, diagnosis and prognosis.
Statistics in Medicine 11/2011; 31(11-12):1203-20. · 2.04 Impact Factor
ABSTRACT: Dendritic cells (DC) direct the magnitude, polarity and effector function of the adaptive immune response. DC express toll-like receptors (TLR), antigen capturing and processing machinery, and costimulatory molecules, which facilitate innate sensing and T cell activation. Once activated, DC can efficiently migrate to lymphoid tissue and prime T cell responses. Therefore, DC play an integral role as mediators of the immune response to multiple pathogens. Elucidating the molecular mechanisms involved in DC activation is therefore central in gaining an understanding of host response to infection. Unfortunately, technical constraints have limited system-wide 'omic' analysis of human DC subsets collected ex vivo. Here we have applied novel proteomic approaches to human myeloid dendritic cells (mDCs) purified from 100 mL of peripheral blood to characterize specific molecular networks of cell activation at the individual patient level, and have successfully quantified over 700 proteins from individual samples containing as little as 200,000 mDCs. The proteomic and network readouts after ex vivo stimulation of mDCs with TLR3 agonists are measured and verified using flow cytometry.
Journal of immunological methods 09/2011; 375(1-2):39-45. · 2.35 Impact Factor
ABSTRACT: Adenoviruses force quiescent cells to re-enter the cell cycle to replicate their DNA, and for the most part, this is accomplished after they express the E1A protein immediately after infection. In this context, E1A is believed to inactivate cellular proteins (e.g., p130) that are known to be involved in the silencing of E2F-dependent genes that are required for cell cycle entry. However, the potential perturbation of these types of genes by E1A relative to their functions in regulatory networks and canonical pathways remains poorly understood.
We have used DNA microarrays analyzed with Bayesian ANOVA for microarray (BAM) to assess changes in gene expression after E1A alone was introduced into quiescent cells from a regulated promoter. Approximately 2,401 genes were significantly modulated by E1A, and of these, 385 and 1033 met the criteria for generating networks and functional and canonical pathway analysis respectively, as determined by using Ingenuity Pathway Analysis software. After focusing on the highest-ranking cellular processes and regulatory networks that were responsive to E1A in quiescent cells, we observed that many of the up-regulated genes were associated with DNA replication, the cell cycle and cellular compromise. We also identified a cadre of up-regulated genes with no previous connection to E1A, including genes that encode components of global DNA repair systems and DNA damage checkpoints. Among the down-regulated genes, we found that many were involved in cell signalling, cell movement, and cellular proliferation. Remarkably, a subset of these was also associated with p53-independent apoptosis, and the putative suppression of this pathway may be necessary in the viral life cycle until sufficient progeny have been produced.
These studies have identified for the first time a large number of genes that are relevant to E1A's activities in promoting quiescent cells to re-enter the cell cycle in order to create an optimum environment for adenoviral replication.
ABSTRACT: The search for structures in real datasets, e.g. in the form of bumps, components, classes or clusters, is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without pre-specifying their total number. A number of related methods already exist, yet are challenged in the context of high dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive non-parametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer micro-array dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online.
Journal of Computational and Graphical Statistics 12/2010; 19(4):900-929. · 1.27 Impact Factor
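
To make the bump hunting idea concrete, here is a minimal sketch of the top-down peeling step commonly associated with PRIM: a box starts as the full data range and repeatedly drops a small fraction of points from whichever face raises the mean response the most. This is a simplified illustration, not the method of the abstract (there is no pasting step, no tree-based partitioning, no dimension reduction, and no meta-parameter estimation). The names `prim_peel`, `alpha` and `min_support`, and the demo data, are assumptions made for this example.

```python
import random

def box_mean(y, idx):
    """Mean response over the observations indexed by idx."""
    return sum(y[i] for i in idx) / len(idx)

def prim_peel(X, y, alpha=0.1, min_support=0.1):
    """Greedy top-down peeling.

    X: list of rows (lists of floats); y: list of floats.
    alpha: fraction of the points currently in the box peeled per step.
    min_support: smallest allowed fraction of all points left in the box.
    Returns (box, inside): [lo, hi] bounds per coordinate and kept indices.
    Assumes no ties in the predictor values.
    """
    n, p = len(X), len(X[0])
    inside = list(range(n))
    box = [[min(r[j] for r in X), max(r[j] for r in X)] for j in range(p)]
    min_points = max(1, int(min_support * n))
    while True:
        k = max(1, int(alpha * len(inside)))  # points removed by one peel
        if len(inside) - k < min_points:
            break
        current = box_mean(y, inside)
        best = None  # (mean after peel, coordinate, side, kept indices)
        for j in range(p):
            order = sorted(inside, key=lambda i: X[i][j])
            for side, kept in (("lo", order[k:]), ("hi", order[:-k])):
                m = box_mean(y, kept)
                if best is None or m > best[0]:
                    best = (m, j, side, kept)
        if best[0] <= current:
            break  # no face improves the box mean any further
        _, j, side, kept = best
        if side == "lo":
            box[j][0] = min(X[i][j] for i in kept)
        else:
            box[j][1] = max(X[i][j] for i in kept)
        inside = kept
    return box, inside

# Hypothetical demo: the response is 1 inside a central square, 0 elsewhere.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(400)]
y = [1.0 if 0.4 < a < 0.6 and 0.4 < b < 0.6 else 0.0 for a, b in X]
box, inside = prim_peel(X, y)
```

The small peeling fraction is what makes the procedure "patient": each step discards little data, so an early poor choice costs less than a single aggressive split would.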
ABSTRACT: Crooked tail (Cd) mice bear a gain-of-function mutation in Lrp6, a co-receptor for canonical WNT signaling, and are a model of neural tube defects (NTDs), preventable with dietary folic acid (FA) supplementation. Whether the FA response reflects a direct influence of FA on LRP6 function was tested with prenatal supplementation in LRP6-deficient embryos. The enriched FA (10 ppm) diet reduced the occurrence of birth defects among all litters compared with the control (2 ppm FA) diet, but did so by increasing early lethality of Lrp6(-/-) embryos while actually increasing NTDs among nulls alive at embryonic days 10-13 (E10-13). Proliferation in cranial neural folds was reduced in homozygous Lrp6(-/-) mutants versus wild-type embryos at E10, and FA supplementation increased proliferation in wild-type but not mutant neuroepithelia. Canonical WNT activity was reduced in LRP6-deficient midbrain-hindbrain at E9.5, demonstrated in vivo by a TCF/LEF-reporter transgene. FA levels in media modulated the canonical WNT response in NIH3T3 cells, suggesting that although FA was required for optimal WNT signaling, even modest FA elevations attenuated LRP5/6-dependent canonical WNT responses. Gene expression analysis in embryos and adults showed striking interactions between targeted Lrp6 deficiency and FA supplementation, especially for mitochondrial function, folate and methionine metabolism, WNT signaling and cytoskeletal regulation that together implicate relevant signaling and metabolic pathways supporting cell proliferation, morphology and differentiation. We propose that FA supplementation rescues Lrp6(Cd/Cd) fetuses by normalizing hyperactive WNT activity, whereas in LRP6-deficient embryos, added FA further attenuates reduced WNT activity, thereby compromising development.
Human Molecular Genetics 12/2010; 19(23):4560-72. · 7.69 Impact Factor
ABSTRACT: Diabetes mellitus is estimated to affect approximately 24 million people in the United States and more than 150 million people worldwide. There are numerous end organ complications of diabetes, the onset of which can be delayed by early diagnosis and treatment. Although assays for diabetes are well founded, tests for its complications lack sufficient specificity and sensitivity to adequately guide these treatment options. In our study, we employed a streptozotocin-induced rat model of diabetes to determine changes in urinary protein profiles that occur during the initial response to the attendant hyperglycemia (e.g. the first two months) with the goal of developing a reliable and reproducible method of analyzing multiple urine samples as well as providing clues to early markers of disease progression. After filtration and buffer exchange, urinary proteins were digested with a specific protease, and the relative amounts of several thousand peptides were compared across rat urine samples representing various times after administration of drug or sham control. Extensive data analysis, including imputation of missing values and normalization of all data, was followed by ANOVA to discover peptides that were significantly changing as a function of time, treatment and interaction of the two variables. The data demonstrated significant differences in protein abundance in urine before observable pathophysiological changes occur in this animal model and as a function of the measured variables. These included decreases in relative abundance of major urinary protein precursor and increases in pro-alpha collagen, the expression of which is known to be regulated by circulating levels of insulin and/or glucose. Peptides from these proteins represent potential biomarkers, which can be used to stage urogenital complications from diabetes. The expression change of a pro-alpha 1 collagen peptide was also confirmed via selected reaction monitoring.
ABSTRACT: Standard genetic mapping techniques scan chromosomal segments for location of genetic linkage and association signals. The majority of these methods consider only correlations at single markers and/or phenotypes with explicit detailing of the genetic structure. These methods tend to be limited by their inability to consider the effect of large numbers of model variables jointly. In contrast, we propose a Bayesian analysis of variance (ANOVA) method to categorize individuals based on similarity of multidimensional profiles and attempt to analyze all variables simultaneously. Using Problem 1 of the Genetic Analysis Workshop 15 data set, we demonstrate the method's utility for joint analysis of gene expression levels and single-nucleotide polymorphism genotypes. We show that the method extracts similar information to that of previous genetic mapping analyses, and suggest extensions of the method for mining unique information not previously found.
ABSTRACT: The study of the cascade of events of induction and sequential gene activation that takes place during human embryonic development is hindered by the unavailability of postimplantation embryos at different stages of development. Spontaneous differentiation of human embryonic stem cells (hESCs) can occur by means of the formation of embryoid bodies (EBs), which resemble certain aspects of early embryos to some extent. Embryonic vascular formation, vasculogenesis, is a sequential process that involves complex regulatory cascades. In this study, changes of gene expression along the development of human EBs for 4 weeks were studied by large-scale gene screening. Two main clusters were identified: one of down-regulated genes such as POU5, NANOG, TDGF1/Cripto (TDGF, teratocarcinoma-derived growth factor-1), LIN28, CD24, TERF1 (telomeric repeat binding factor-1), LEFTB (left-right determination, factor B), and a second of up-regulated genes such as TWIST, WNT5A, WT1, AFP, ALB, NCAM1. Focusing on the vascular system development, genes known to be involved in vasculogenesis and angiogenesis were explored. Up-regulated genes include vasculogenic growth factors such as VEGFA, VEGFC, FIGF (VEGFD), ANG1, ANG2, TGFbeta3, and PDGFB, as well as the related receptors FLT1, FLT4, PDGFRB, TGFbetaR2, and TGFbetaR3, other markers such as CD34, VCAM1, PECAM1, VE-CAD, and transcription factors TAL1, GATA2, and GATA3. The reproducibility of the array data was verified independently and illustrated that many genes known to be involved in vascular development are activated during the differentiation of hESCs in culture. Hence, the analysis of the vascular system can be extended to other differentiation pathways, allocating human EBs as an in vitro model to study early human development.
ABSTRACT: Human embryonic stem cells (ESC) are undifferentiated and are endowed with the capacities of self-renewal and pluripotential differentiation. Adult stem cells renew their own tissue, but whether they can transdifferentiate to other tissues is still controversial. To understand the genetic program that underlies the pluripotency of stem cells, we compared the transcription profile of ESC with that of progenitor/stem cells of human hematopoietic and keratinocytic origins, along with their mature cells to be viewed as snapshots along tissue differentiation. ESC gene profiles show higher complexity with significantly more highly expressed genes than adult cells. We hypothesize that ESC use a strategy of expressing genes that represent various differentiation pathways and selection of only a few for continuous expression upon differentiation to a particular target. Such a strategy may be necessary for the pluripotency of ESC. The progenitors of either hematopoietic or keratinocytic cells also follow the same design principle. Using advanced clustering, we show that many of the ESC expressed genes are turned off in the progenitors/stem cells followed by a further down-regulation in adult tissues. Concomitantly, genes specific to the target tissue are up-regulated toward mature cells of skin or blood.
The FASEB Journal 02/2005; 19(1):147-9. · 5.70 Impact Factor
ABSTRACT: The synthetic immunomodulator AS101 [ammonium trichloro(dioxoethylene-o,o')tellurate] was previously found to protect cancer patients from chemotherapy-induced bone marrow toxicity and alopecia. Here we show that AS101 induces hair growth in nude and normal mice. AS101 possesses the dual ability to both induce anagen and retard spontaneous catagen in the C57BL/6 mouse model. Anagen induced by AS101 is mediated by keratinocyte growth factor (KGF), as it is abrogated both in nude mice co-treated with AS101 plus neutralizing anti-KGF antibodies and in AS101-treated transgenic mice expressing a dominant-negative KGF receptor transgene in basal keratinocytes. AS101 up-regulates KGF expression by activating the ras signaling pathway in cultured fibroblasts. AS101-induced delayed catagen is associated with inhibition of terminal differentiation marker expression both in nude and C57BL/6 mice epidermal follicular keratinocytes and in cultures of primary mouse follicular keratinocytes induced to differentiate. This activity is associated with relatively sustained elevation of p21waf. Delayed expression of terminal differentiation markers was not induced by AS101 in follicular keratinocytes from p21waf knockout mice. Because similar results were obtained with cultures of primary human keratinocytes and fibroblasts, preliminary case report studies revealed substantial hair growth when AS101 was topically applied on three adolescents who had remained alopecic 1-2 years after chemotherapy. The results emphasize the unique mode of action of AS101 and highlight its potential clinical use for treating certain types of alopecia.
The FASEB Journal 03/2004; 18(2):400-2. · 5.70 Impact Factor
ABSTRACT: To gain insight into the transformation of epidermal cells into squamous carcinoma cells (SCC), we compared the response to ultraviolet B radiation (UVB) of normal human epidermal keratinocytes (NHEK) versus their transformed counterpart, SCC, using biological and molecular profiling. DNA microarray analyses (Affymetrix, approximately 12,000 genes) indicated that the major groups of upregulated genes in keratinocytes fall into three categories: (i) antiapoptotic and cell survival factors, including chemokines of the CXC/CC subfamilies (e.g. IL-8, GRO-1, -2, -3, SCYA20), growth factors (e.g. HB-EGF, CTGF, INSL-4), and proinflammatory mediators (e.g. COX-2, S100A9), (ii) DNA repair-related genes (e.g. GADD45, ERCC, BTG-1, Histones), and (iii) ECM proteases (MMP-1, -10). The major downregulated genes are DeltaNp63 and PUMILIO, two potential markers for the maintenance of keratinocyte stem cells. NHEK were found to be more resistant than SCC to UVB-induced apoptosis and this resistance was mainly because of the protection from cell death by secreted survival factors, since it can be transferred from NHEK to SCC cultures by the conditioned medium. Whereas the response of keratinocytes to UVB involved regulation of key checkpoint genes (p53, MDM2, p21(Cip1), DeltaNp63), as well as antiapoptotic and DNA repair-related genes, little or no regulation of these genes was observed in SCC. The effect of UVB on NHEK and SCC resulted in upregulation of 251 and 127 genes, respectively, and downregulation of 322 genes in NHEK and 117 genes in SCC. To further analyse these changes, we used a novel unsupervised coupled two-way clustering method that allowed the identification of groups of genes that clearly partitioned keratinocytes from SCC, including a group of genes whose constitutive expression levels were similar before UVB. 
This allowed the identification of discriminating genes not otherwise revealed by simple static comparison in the absence of UVB irradiation. The implication of the changes in gene profile in keratinocytes for epithelial cancer is discussed.