Yulan Liang

University of Maryland, Baltimore, Baltimore, MD, USA

Publications (25) · 33.11 Total impact

  • Article: Sequential Support Vector Regression with Embedded Entropy for SNP Selection and Disease Classification.
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Comprehensive evaluation of common genetic variations through association of SNP structure with common diseases on the genome-wide scale is currently a hot area in human genome research. For less costly and faster diagnostics, advanced computational approaches are needed to select the minimum SNPs with the highest prediction accuracy for common complex diseases. In this paper, we present a sequential support vector regression model with embedded entropy algorithm to deal with the redundancy for the selection of the SNPs that have best prediction performance of diseases. We implemented our proposed method for both SNP selection and disease classification, and applied it to simulation data sets and two real disease data sets. Results show that on the average, our proposed method outperforms the well known methods of Support Vector Machine Recursive Feature Elimination, logistic regression, CART, and logic regression based SNP selections for disease classification.
    Statistical Analysis and Data Mining 06/2011; 4(3):301-312.
  • Article: Use of supplementary phenotype to identify additional rheumatoid arthritis loci in a linkage analysis of 342 UK affected sibling pair families.
    ABSTRACT: Although rheumatoid arthritis has been shown to have moderately strong genetic component, both linked loci identified in linkage analyses and susceptibility variants from association studies are short of adequately accounting for a comprehensive catalogue of the molecular factors underlying this complex disease. The objective of this study was to use supplementary phenotype based on cumulative hazard of rheumatoid arthritis to identify linkage evidence for new and additional rheumatoid arthritis loci in a genome-wide linkage analysis of 342 affected sibling pair families from the United Kingdom. Using proportional hazards model, we estimated cumulative hazard of rheumatoid arthritis and then used it as a quantitative trait in a non-parametric multipoint variance component linkage analysis with 353 microsatellite markers distributed across the 22 autosomal chromosomes. We identified 3 new loci with genome-wide suggestive linkage evidence for rheumatoid arthritis on 9q21.13, 15p11.1 and 20q13.33. Our results also confirmed previously reported linkage evidence in the HLA-DRB1 region on chromosome 6 and on locus 1q32.1. This study demonstrates the potential for information gain through the use of supplementary phenotypes in genetic study of complex diseases to identify new and additional potential linked loci that are not detected by linkage analysis of traditional phenotypes; and our results provide further evidence of the involvement of multiple loci in the genetic aetiology of rheumatoid arthritis.
    BMC Medical Genetics 12/2009; 10:142. · 2.33 Impact Factor
  • Article: Complex segregation analysis of pedigrees from the Gilda Radner Familial Ovarian Cancer Registry reveals evidence for mendelian dominant inheritance.
    ABSTRACT: Familial component is estimated to account for about 10% of ovarian cancer. However, the mode of inheritance of ovarian cancer remains poorly understood. The goal of this study was to investigate the inheritance model that best fits the observed transmission pattern of ovarian cancer among 7669 members of 1919 pedigrees ascertained through probands from the Gilda Radner Familial Ovarian Cancer Registry at Roswell Park Cancer Institute, Buffalo, New York. Using the Statistical Analysis for Genetic Epidemiology program, we carried out complex segregation analyses of ovarian cancer affection status by fitting different genetic hypothesis-based regressive multivariate logistic models. We evaluated the likelihood of sporadic, major gene, environmental, general, and six types of Mendelian models. Under each hypothesized model, we also estimated the susceptibility allele frequency, transmission probabilities for the susceptibility allele, baseline susceptibility and estimates of familial association. Comparisons between models were carried out using either maximum likelihood ratio test in the case of hierarchical models, or Akaike information criterion for non-nested models. When assessed against sporadic model without familial association, the model with both parent-offspring and sib-sib residual association could not be rejected. Likewise, the Mendelian dominant model that included familial residual association provided the best-fitting for the inheritance of ovarian cancer. The estimated disease allele frequency in the dominant model was 0.21. This report provides support for a genetic role in susceptibility to ovarian cancer with a major autosomal dominant component. This model does not preclude the possibility of polygenic inheritance of combined effects of multiple low penetrance susceptibility alleles segregating dominantly.
    PLoS ONE 02/2009; 4(6):e5939. · 4.09 Impact Factor
  • Article: Bayesian finite Markov mixture model for temporal multi-tissue polygenic patterns.
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Finite mixture models can provide the insights about behavioral patterns as a source of heterogeneity of the various dynamics of time course gene expression data by reducing the high dimensionality and making clear the major components of the underlying structure of the data in terms of the unobservable latent variables. The latent structure of the dynamic transition process of gene expression changes over time can be represented by Markov processes. This paper addresses key problems in the analysis of large gene expression data sets that describe systemic temporal response cascades and dynamic changes to therapeutic doses in multiple tissues, such as liver, skeletal muscle, and kidney from the same animals. Bayesian Finite Markov Mixture Model with a Dirichlet Prior is developed for the identifications of differentially expressed time related genes and dynamic clusters. Deviance information criterion is applied to determine the number of components for model comparisons and selections. The proposed Bayesian models are applied to multiple tissue polygenetic temporal gene expression data and compared to a Bayesian model-based clustering method, named CAGED. Results show that our proposed Bayesian Finite Markov Mixture model can well capture the dynamic changes and patterns for irregular complex temporal data.
    Biometrical Journal 02/2009; 51(1):56-69. · 1.25 Impact Factor
  • Article: Computational Intelligence in Bioinformatics: SNP/Haplotype Data in Genetic Association Study for Common Diseases.
    IEEE Transactions on Information Technology in Biomedicine. 01/2009; 13:841-847.
  • Article: Use of supplementary phenotype to identify additional rheumatoid arthritis loci in a linkage analysis of 342 UK affected sibling pair families
    ABSTRACT: Background: Although rheumatoid arthritis has been shown to have moderately strong genetic component, both linked loci identified in linkage analyses and susceptibility variants from association studies are short of adequately accounting for a comprehensive catalogue of the molecular factors underlying this complex disease. The objective of this study was to use supplementary phenotype based on cumulative hazard of rheumatoid arthritis to identify linkage evidence for new and additional rheumatoid arthritis loci in a genome-wide linkage analysis of 342 affected sibling pair families from the United Kingdom. Methods: Using proportional hazards model, we estimated cumulative hazard of rheumatoid arthritis and then used it as a quantitative trait in a non-parametric multipoint variance component linkage analysis with 353 microsatellite markers distributed across the 22 autosomal chromosomes. Results: We identified 3 new loci with genome-wide suggestive linkage evidence for rheumatoid arthritis on 9q21.13, 15p11.1 and 20q13.33. Our results also confirmed previously reported linkage evidence in the HLA-DRB1 region on chromosome 6 and on locus 1q32.1. Conclusion: This study demonstrates the potential for information gain through the use of supplementary phenotypes in genetic study of complex diseases to identify new and additional potential linked loci that are not detected by linkage analysis of traditional phenotypes; and our results provide further evidence of the involvement of multiple loci in the genetic aetiology of rheumatoid arthritis.
    BMC Medical Genetics. 01/2009;
  • Article: Bayesian models and meta analysis for multiple tissue gene expression data following corticosteroid administration.
    Yulan Liang, Arpad Kelemen
    ABSTRACT: BACKGROUND: This paper addresses key biological problems and statistical issues in the analysis of large gene expression data sets that describe systemic temporal response cascades to therapeutic doses in multiple tissues such as liver, skeletal muscle, and kidney from the same animals. Affymetrix U34A time course gene expression data are obtained from three different tissues: kidney, liver and muscle. Our goal is not only to find the concordance of genes in different tissues and identify the common differentially expressed genes over time, but also to examine the reproducibility of the findings by integrating the results from multiple tissues through meta-analysis, in order to gain a significant increase in the power of detecting differentially expressed genes over time and to find the differences among the three tissues in their response to the drug. RESULTS AND CONCLUSION: A Bayesian categorical model for estimating the proportion of the 'call' is used for pre-screening genes. A Hierarchical Bayesian Mixture Model is further developed for the identification of differentially expressed genes across time and dynamic clusters. Deviance information criterion is applied to determine the number of components for model comparison and selection. The Bayesian mixture model produces the gene-specific posterior probability of differential/non-differential expression and the 95% credible interval, which is the basis for our further Bayesian meta-inference. Meta-analysis is performed in order to identify commonly expressed genes from multiple tissues that may serve as ideal targets for novel treatment strategies and to integrate the results across separate studies. We have found the common expressed genes in the three tissues. However, the up/down/no regulation of these common genes differs at different time points. Moreover, the most differentially expressed genes were found in the liver, then in kidney, and then in muscle.
    BMC Bioinformatics 09/2008; 9:354. · 2.75 Impact Factor
  • Article: Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Recent advances of information technology in biomedical sciences and other applied areas have created numerous large diverse data sets with a high dimensional feature space, which provide us a tremendous amount of information and new opportunities for improving the quality of human life. Meanwhile, great challenges are also created driven by the continuous arrival of new data that requires researchers to convert these raw data into scientific knowledge in order to benefit from it. Association studies of complex diseases using SNP data have become more and more popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high dimensional correlated data and more specific approaches for SNPs data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNPs selection using statistical testing/scoring, statistical modeling and machine learning methods with an emphasis on how to identify interacting loci.
    04/2008;
  • Chapter: Review of Computational Intelligence for Gene-Gene and Gene-Environment Interactions in Disease Mapping
    ABSTRACT: Comprehensive evaluation of common genetic variations through association of SNP structure with common complex disease in the genome-wide scale is currently a hot area in human genome research. Computational science, which includes computational intelligence, has recently become the third method of scientific enquiry besides theory and experimentation. Interest grew fast in developing and applying computational intelligence techniques to disease mapping using SNP and haplotype data. This review provides a coverage of recently developed theories and applications in computational intelligence for gene-gene and gene-environment interactions in complex diseases in genetic association study.
    01/2008: pages 1-16;
  • Chapter: Time Course Gene Expression Classification with Time Lagged Recurrent Neural Network
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Heterogeneous types of gene expressions may provide a better insight into the biological role of gene interaction with the environment, disease development and drug effect at the molecular level. In this chapter, for both exploratory and predictive purposes, a Time Lagged Recurrent Neural Network with trajectory learning is proposed for identifying and classifying the gene functional patterns from the heterogeneous nonlinear time series microarray experiments. The proposed procedures identify gene functional patterns from the dynamics of a state-trajectory learned in the heterogeneous time series and the gradient information over time. Also, the trajectory learning with the Back-propagation through time algorithm can recognize gene expression patterns that vary over time. This may reveal much more information about the regulatory network underlying gene expressions. The analyzed data were extracted from spotted DNA microarrays in the budding yeast expression measurements, produced by Eisen et al. The gene matrix contained 79 experiments over a variety of heterogeneous experiment conditions. The number of recognized gene patterns in our study ranged from two to ten, and the patterns were divided into three cases. Optimal network architectures with different memory structures were selected based on Akaike and Bayesian information criteria using two-way factorial design. The optimal model performance was compared to other popular gene classification algorithms, such as Nearest Neighbor, Support Vector Machine, and Self-Organized Map. The reliability of the performance was verified with multiple iterated runs.
    01/2008: pages 149-163;
  • Article: Bayesian state space models for inferring and predicting temporal gene expression profiles.
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Prediction of gene dynamic behavior is a challenging and important problem in genomic research while estimating the temporal correlations and non-stationarity are the keys in this process. Unfortunately, most existing techniques used for the inclusion of the temporal correlations treat the time course as evenly distributed time intervals and use stationary models with time-invariant settings. This is an assumption that is often violated in microarray time course data since the time course expression data are at unequal time points, where the difference in sampling times varies from minutes to days. Furthermore, the unevenly spaced short time courses with sudden changes make the prediction of genetic dynamics difficult. In this paper, we develop two types of Bayesian state space models to tackle this challenge for inferring and predicting the gene expression profiles associated with diseases. In the univariate time-varying Bayesian state space models we treat both the stochastic transition matrix and the observation matrix time-variant with linear setting and point out that this can easily be extended to nonlinear setting. In the multivariate Bayesian state space model we include temporal correlation structures in the covariance matrix estimations. In both models, the unevenly spaced short time courses with unseen time points are treated as hidden state variables. Bayesian approaches with various prior and hyper-prior models with MCMC algorithms are used to estimate the model parameters and hidden variables. We apply our models to multiple tissue polygenetic affymetrix data sets. Results show that the predictions of the genomic dynamic behavior can be well captured by the proposed models.
    Biometrical Journal 01/2008; 49(6):801-14. · 1.25 Impact Factor
  • Article: Model-based or algorithm-based? Statistical evidence for diabetes and treatments using gene expression.
    ABSTRACT: Gene expression profiles obtained from samples of diabetic and normal rats with and without treatments can be used to identify genes that distinguish normal and diabetic individuals and also to evaluate the effectiveness of drug treatments. This study examines changes in global gene expression in rat muscle caused by streptozotocin-induced diabetes and vanadyl sulfate treatment. We explored model-based and algorithm-based methods with gene screening measures for microarray gene expression data to classify and predict individuals with high risk of diabetes. Results show that the mixed ANOVA model-based approach provides an efficient way to conduct an investigation of the inherent variability in gene expression data and to estimate the effects of experimental factors such as treatments and diseases and their interactions. The algorithm-based weighted voting and neural network classifiers show good classification performance for the diabetes and treatment groups. Although neural network performs better than weighted voting with higher classification rate, the interpretation of weighted voting is more straightforward. The study indicates that the choice of the gene selection procedure is at least as important as the choice of the classification procedure. We conclude that both mixed model-based and algorithm-based approaches provide the statistical evidence of the biological hypotheses that vanadyl sulfate treatment of diabetic animals restores gene expression patterns to normal. Although model-based and algorithm-based methods provide different strengths and perspective for the analysis of the same set of data, in general both can be considered and developed for analyzing factorial design experiments with multiple groups and factors. This study represents a major step towards the discovery of responsible genes related to diabetes and its treatment.
    Statistical Methods in Medical Research 05/2007; 16(2):139-53. · 2.44 Impact Factor
  • Article: Hierarchical Bayesian Neural Network for Gene Expression Temporal Patterns
    Yulan Liang, Arpad Kelemen
    ABSTRACT: There are several important issues to be addressed for gene expression temporal patterns' analysis: first, the correlation structure of multidimensional temporal data; second, the numerous sources of variations with existing high level noise; and last, gene expression mostly involves heterogeneous multiple dynamic patterns. We propose a Hierarchical Bayesian Neural Network model to account for the input correlations of time course gene array data. The variations in absolute gene expression levels and the noise can be estimated with the hierarchical Bayesian setting. The network parameters and the hyperparameters were simultaneously optimized with Monte Carlo Markov Chain simulation. Results show that the proposed model and algorithm can well capture the dynamic feature of gene expression temporal patterns despite the high noise levels, the highly correlated inputs, the overwhelming interactions, and other complex features typically present in microarray data. We test and demonstrate the proposed models with yeast cell cycle temporal data sets. The model performance of Hierarchical Bayesian Neural Network was compared to other popular machine learning methods such as Nearest Neighbor, Support Vector Machine, and Self Organized Map.
    Statistical Applications in Genetics and Molecular Biology 02/2007; 3(1):20-20. · 1.52 Impact Factor
  • Article: Model-based or algorithm-based? Statistical evidence for diabetes and treatments using gene expression
    Yulan Liang, Bamidele Tayo
    ABSTRACT: Gene expression profiles obtained from samples of diabetic and normal rats with and without treatments can be used to identify genes that distinguish normal and diabetic individuals and also to evaluate the effectiveness of drug treatments. This study examines changes in global gene expression in rat muscle caused by streptozotocin-induced diabetes and vanadyl sulfate treatment. We explored model-based and algorithm-based methods with gene screening measures for microarray gene expression data to classify and predict individuals with high risk of diabetes. Results show that the mixed ANOVA model-based approach provides an efficient way to conduct an investigation of the inherent variability in gene expression data and to estimate the effects of experimental factors such as treatments and diseases and their interactions. The algorithm-based weighted voting and neural network classifiers show good classification performance for the diabetes and treatment groups. Although neural network performs better than weighted voting with higher classification rate, the interpretation of weighted voting is more straightforward. The study indicates that the choice of the gene selection procedure is at least as important as the choice of the classification procedure. We conclude that both mixed model-based and algorithm-based approaches provide the statistical evidence of the biological hypotheses that vanadyl sulfate treatment of diabetic animals restores gene expression patterns to normal. Although model-based and algorithm-based methods provide different strengths and perspective for the analysis of the same set of data, in general both can be considered and developed for analyzing factorial design experiments with multiple groups and factors. This study represents a major step towards the discovery of responsible genes related to diabetes and its treatment.
    Statistical Methods in Medical Research 01/2007; 16:139-153. · 2.44 Impact Factor
  • Article: Diabetes-altered gene expression in rat skeletal muscle corrected by oral administration of vanadyl sulfate.
    ABSTRACT: Treatment with vanadium, a representative of a class of antidiabetic compounds, alleviates diabetic hyperglycemia and hyperlipidemia. Oral administration of vanadium compounds in animal models and humans does not cause clinical symptoms of hypoglycemia, a common problem for diabetic patients with insulin treatment. Gene expression, using Affymetrix arrays, was examined in muscle from streptozotocin-induced diabetic and normal rats in the presence or absence of oral vanadyl sulfate treatment. This treatment affected normal rats differently from diabetic rats, as demonstrated by two-way ANOVA of the full array data. Diabetes altered the expression of 133 genes, and the expression of 30% of these genes dysregulated in diabetes was normalized by vanadyl sulfate treatment. For those genes, the ratio of expression in normal animals to the expression in diabetic animals showed a strong negative correlation with the ratio of expression in diabetic animals to the expression in diabetic animals treated with vanadyl sulfate (ρ = -0.85). The genes identified belong to six major metabolic functional groups: lipid metabolism, oxidative stress, muscle structure, protein breakdown and biosynthesis, the complement system, and signal transduction. The identification of oxidative stress genes, coupled with the known oxidative chemistry of vanadium, implicates reactive oxygen species in the action of this class of compounds. These results imply that early transition metals or compounds formed from their chemical interactions with other metabolites may act as general transcription modulators, a role not usually associated with this class of compounds.
    Physiological Genomics 09/2006; 26(3):192-201. · 2.73 Impact Factor
  • Article: Bayesian dynamic multivariate models for inferring gene interaction networks.
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Construction of gene and protein dynamic networks is a challenging and important problem in genomic research, and estimating the temporal correlations and non-stationarity is the key to this process. In this paper, we develop Bayesian dynamic multivariate models to tackle this challenge for inferring the gene network profiles associated with diseases and treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian setting. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Bayesian approaches with various prior and hyper-prior models with MCMC algorithms are used to estimate the model parameters. We apply our models to multiple tissue polygenetic Affymetrix data sets. Preliminary results show that the genomic dynamic behavior can be well captured by the proposed model.
    Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 02/2006; 1:2041-4.
  • Article: Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments.
    Yulan Liang, Arpad Kelemen
    ABSTRACT: Progress in mapping the genome and developments in array technologies have provided large amounts of information for delineating the roles of genes involved in complex diseases and quantitative traits. Since complex phenotypes are determined by a network of interrelated biological traits typically involving multiple inter-correlated genetic and environmental factors that interact in a hierarchical fashion, microarrays hold tremendous latent information. The analysis of microarray data is, however, still a bottleneck. In this paper, we review the recent advances in statistical analyses for associating phenotypes with molecular events underpinning microarray experiments. Classical statistical procedures to analyze phenotypes in genetics are reviewed first, followed by descriptions of the statistical procedures for linking molecular events to measured gene expression phenotypes (microarray-based gene expression) and observed phenotypes such as diseases status. These statistical procedures include (1) prior analysis, such as data quality controls, and normalization analyses for minimizing the effects of experimental artifacts and random noise; (2) gene selections and differentiation procedures based on inferential statistics for the class comparisons; (3) dynamic temporal patterns analysis through exploratory statistics such as unsupervised clustering and supervised classification and predictions; (4) assessing the reliability of microarray studies using real-time PCR and the reproducibility issues from many studies and multiple platforms. In addition, the post analysis to associate the discovered patterns of gene expression to pathway and functional analysis for selected genes are also considered in order to increase our understanding of interconnected gene processes.
    Functional and Integrative Genomics 02/2006; 6(1):1-13. · 2.84 Impact Factor
  • Article: Genome-wide linkage analysis of age at onset of alcohol dependence: a comparison between microsatellites and single-nucleotide polymorphisms.
    ABSTRACT: Using the dataset provided for Genetic Analysis Workshop 14 by the Collaborative Study on the Genetics of Alcoholism, we performed genome-wide linkage analysis of age at onset of alcoholism to compare the utility of microsatellites and single-nucleotide polymorphisms (SNPs) in genetic linkage study. A multipoint nonparametric variance component linkage analysis method was applied to the survival distribution function obtained from semiparametric proportional hazards model of the age at onset phenotype of alcoholism. Three separate linkage analyses were carried out using 315 microsatellites, 2,467 and 9,467 SNPs, spanning the 22 autosomal chromosomes. Heritability of age at onset was estimated to be approximately 12% (p < 0.001). We observed weak correlation, both in trend and strength, of genome-wide linkage signals between microsatellites and SNPs. Results from SNPs revealed more and stronger linkage signals across the genome compared with those from microsatellites. The only suggestive evidence of linkage from microsatellites was on chromosome 1 (LOD of 1.43). Differences in map densities between the two sets of SNPs used in this study did not appear to confer an advantage in terms of strength of linkage signals. Our study provided support for better performance of dense SNP maps compared with the sparse microsatellite maps currently available for linkage analysis of quantitative traits. This better performance could be attributable to precise definition and high map resolutions achievable with dense SNP maps, thus resulting in increased power to detect possible loci affecting given trait or disease.
    BMC Genetics 12/2005; 6 Suppl 1:S12. · 2.47 Impact Factor
  • Article: Learning High Quality Decisions with Neural Networks in "Conscious" Software Agents
    ABSTRACT: Finding suitable jobs for US Navy sailors periodically is an important and ever-changing process. An Intelligent Distribution Agent (IDA) and particularly its constraint satisfaction module take up the challenge to automate the process. The constraint satisfaction module's main task is to provide the bulk of the decision making process in assigning sailors to new jobs in order to maximize Navy and sailor "happiness". We propose Multilayer Perceptron neural network with structural learning in combination with statistical criteria to aid IDA's constraint satisfaction, which is also capable of learning high quality decision making over time. Multilayer Perceptron (MLP) with different structures and algorithms, Feedforward Neural Network (FFNN) with logistic regression and Support Vector Machine (SVM) with Radial Basis Function (RBF) as network structure and Adatron learning algorithm are presented for comparative analysis. Discussion of Operations Research and standard optimization techniques is also provided. The subjective indeterminate nature of the detailer decisions make the optimization problem nonstandard. Multilayer Perceptron neural network with structural learning and Support Vector Machine produced highly accurate classification and encouraging prediction.
    WSEAS TRANSACTIONS ON SYSTEMS Issue. 10/2005; 9(4):1109-2777.
  • Article: Differential and trajectory methods for time course gene expression data.
    ABSTRACT: MOTIVATION: The issue of high dimensionality in microarray data has been, and remains, a hot topic in statistical and computational analysis. Efficient gene filtering and differentiation approaches can reduce the dimensions of data, help to remove redundant genes and noise, and highlight the most relevant genes that are major players in the development of certain diseases or the effect of drug treatment. The purpose of this study is to investigate the efficiency of parametric (including Bayesian and non-Bayesian, linear and non-linear), non-parametric and semi-parametric gene filtering methods through the application of time course microarray data from multiple sclerosis patients being treated with interferon-beta-1a. The analysis of variance with bootstrapping (parametric), class dispersion (semi-parametric) and Pareto (non-parametric) with permutation methods are presented and compared for filtering and finding differentially expressed genes. The Bayesian linear correlated model, the Bayesian non-linear model, and the non-Bayesian mixed effects model with bootstrap were also developed to characterize the differential expression patterns. Furthermore, trajectory-clustering approaches were developed in order to investigate the dynamic patterns and inter-dependency of drug treatment effects on gene expression. RESULTS: Results show that the presented methods performed significantly differently, but all were adequate in capturing a small number of the genes potentially relevant to the disease. The parametric methods, such as the mixed model and the two Bayesian approaches, proved to be more conservative. This may be because these methods are based on overall variation in expression across all time points. The semi-parametric (class dispersion) and non-parametric (Pareto) methods were appropriate in capturing variation in expression from time point to time point, thereby making them more suitable for investigating significant monotonic changes and trajectories of changes in gene expressions in time course microarray data. Also, the non-linear Bayesian model proved to be less conservative than linear Bayesian correlated growth models to filter out the redundant genes, although the linear model showed better fit than the non-linear model (smaller DIC). We also report the trajectories of significant genes, since we have been able to isolate trajectories of genes whose regulations appear to be inter-dependent.
    Bioinformatics 08/2005; 21(13):3009-16. · 5.47 Impact Factor

Institutions

  • 2008–2011
    • University of Maryland, Baltimore
      • Department of Family and Community Health (FCH)
      • Department of Organizational Systems and Adult Health (OSAH)
      Baltimore, MD, USA
  • 2004–2008
    • University at Buffalo, The State University of New York
      • Department of Biostatistics
      Buffalo, NY, USA
  • 2007
    • Niagara University
      Buffalo, NY, USA
  • 2002–2005
    • The University of Memphis
      • Department of Mathematical Sciences
      Memphis, TN, USA