Ina Hoeschele

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Publications (24) · 71.14 Total impact

  • Article: Simulating systems genetics data with SysGenSIM.
    ABSTRACT: SysGenSIM is a software package to simulate Systems Genetics (SG) experiments in model organisms, for the purpose of evaluating and comparing statistical and computational methods and their implementations for analyses of SG data [e.g. methods for expression quantitative trait loci (eQTL) mapping and network inference]. SysGenSIM allows the user to select a variety of network topologies, genetic and kinetic parameters to simulate SG data (genotyping, gene expression and phenotyping) with large gene networks with thousands of nodes. The software is encoded in MATLAB, and a user-friendly graphical user interface is provided. The open-source software code and user manual can be downloaded at: http://sysgensim.sourceforge.net/. Contact: alf@crs4.it.
    Bioinformatics 09/2011; 27(17):2459-62. · 5.47 Impact Factor
  • Article: Sporadic breast cancer patients' germline DNA exhibit an AT-rich microsatellite signature.
    ABSTRACT: Using a custom CGH-like oligonucleotide array to measure the global microsatellite content in the genomes of 72 cancer, cancer-free, and high risk patient and cell line samples (56 germline DNA and 16 in tumor or tumor cell line DNA) we found a unique, reproducible, and statistically significant pattern of 18 motif-specific microsatellite families (out of 962 possible 1-6 mer repeats) in breast cancer patient germline and tumor DNA, but not in germline DNA of cancer-free volunteer controls or in breast cancer patients with BRCA1/2 mutations. These high-similarity A/T rich repetitive motifs were also more pronounced in the germlines and tumors of colon cancer tumor patients (3/6 samples) and microsatellite unstable colon cancer cell lines; however, germline DNA of sporadic breast cancer patients exhibited the largest global content shift for those motifs with extreme AT/GC ratios. These results indicate that global microsatellite variability is complex, suggest the existence of a previously unknown genomic destabilization mechanism in breast cancer patients' germline DNA, and warrant further testing of such microsatellite variability as a predictor of future breast cancer development.
    Genes Chromosomes and Cancer 04/2011; 50(4):275-83. · 3.31 Impact Factor
  • Article: Nonparametric Bayesian variable selection with applications to multiple quantitative trait loci mapping with epistasis and gene-environment interaction.
    ABSTRACT: The joint action of multiple genes is an important source of variation for complex traits and human diseases. However, mapping genes with epistatic effects and gene-environment interactions is a difficult problem because of relatively small sample sizes and very large parameter spaces for quantitative trait locus models that include such interactions. Here we present a nonparametric Bayesian method to map multiple quantitative trait loci (QTL) by considering epistatic and gene-environment interactions. The proposed method is not restricted to pairwise interactions among genes, as is typically done in parametric QTL analysis. Rather than modeling each main and interaction term explicitly, our nonparametric Bayesian method measures the importance of each QTL, irrespective of whether it is mostly due to a main effect or due to some interaction effect(s), via an unspecified function of the genotypes at all candidate QTL. A Gaussian process prior is assigned to this unknown function. In addition to the candidate QTL, nongenetic factors and covariates, such as age, gender, and environmental conditions, can also be included in the unspecified function. The importance of each genetic factor (QTL) and each nongenetic factor/covariate included in the function is estimated by a single hyperparameter, which enters the covariance function and captures any main or interaction effect associated with a given factor/covariate. An initial evaluation of the performance of the proposed method is obtained via analysis of simulated and real data.
    Genetics 09/2010; 186(1):385-94. · 4.01 Impact Factor
  • Article: Genome scan for loci regulating HDL cholesterol levels in Finnish extended pedigrees with early coronary heart disease.
    ABSTRACT: Coronary heart disease (CHD) is the leading cause of mortality in Western societies. Its risk is inversely correlated with plasma high-density lipoprotein cholesterol (HDL-C) levels, and approximately 50% of the variability in these levels is genetically determined. In this study, the aim was to carry out a whole-genome scan for the loci regulating plasma HDL-C levels in 35 well-defined Finnish extended pedigrees (375 members genotyped) with probands having low HDL-C levels and premature CHD. The additive genetic heritability of HDL-C was 43%. A variance component analysis revealed four suggestive quantitative trait loci (QTLs) for HDL-C levels, with the highest LOD score, 3.1, at the chromosomal locus 4p12. Other suggestive LOD scores were 2.1 at 2q33, 2.1 at 6p24 and 2.0 at 17q25. Three suggestive loci for the qualitative low HDL-C trait were found, with a nonparametric multipoint score of 2.6 at the chromosomal locus 10p15.3, 2.5 at 22q11 and 2.1 at 6p12. After correction for statin use, the strongest evidence of linkage was shown on chromosomes 4p12, 6p24, 6p12, 15q22 and 22q11. To search for the underlying gene on chromosome 6, we analyzed two functional and positional candidate genes (peroxisome proliferator-activated receptor-delta (PPARD), and retinoid X receptor beta, (RXRB)), but found no significant evidence of association. In conclusion, we identified seven chromosomal regions for HDL-C regulation exceeding the level for suggestive evidence of linkage.
    European journal of human genetics: EJHG 11/2009; 18(5):604-13. · 3.56 Impact Factor
  • Article: Gaussian process based Bayesian semiparametric quantitative trait loci interval mapping.
    ABSTRACT: In linkage analysis, it is often necessary to include covariates such as age or weight to increase power or avoid spurious false positive findings. However, if a covariate term in the model is specified incorrectly (e.g., a quadratic term misspecified as a linear term), then the inclusion of the covariate may adversely affect power and accuracy of the identification of quantitative trait loci (QTL). Furthermore, some covariates may interact with each other in a complicated fashion. We implement semiparametric models for single and multiple QTL mapping. Both mapping methods include an unspecified function of any covariate found or suspected to have a more complex than linear but unknown relationship with the response variable. They also allow for interactions among different covariates. This analysis is performed in a Bayesian inference framework using Markov chain Monte Carlo. The advantages of our methods are demonstrated via extensive simulations and real data analysis.
    Biometrics 06/2009; 66(1):222-32. · 1.83 Impact Factor
  • Article: Differential protein expression analysis using stable isotope labeling and PQD linear ion trap MS technology.
    Jenny M Armenta, Ina Hoeschele, Iulia M Lazar
    ABSTRACT: An isotope tags for relative and absolute quantitation (iTRAQ)-based reversed-phase liquid chromatography (RPLC)-tandem mass spectrometry (MS/MS) method was developed for differential protein expression profiling in complex cellular extracts. The estrogen positive MCF-7 cell line, cultured in the presence of 17beta-estradiol (E2) and tamoxifen (Tam), was used as a model system. MS analysis was performed with a linear trap quadrupole (LTQ) instrument operated by using pulsed Q dissociation (PQD) detection. Optimization experiments were conducted to maximize the iTRAQ labeling efficiency and the number of quantified proteins. MS data filtering criteria were chosen to result in a false positive identification rate of <4%. The reproducibility of protein identifications was approximately 60%-67% between duplicate, and approximately 50% among triplicate LC-MS/MS runs, respectively. The run-to-run reproducibility, in terms of relative standard deviations (RSD) of global mean iTRAQ ratios, was better than 10%. The quantitation accuracy improved with the number of peptides used for protein identification. From a total of 530 identified proteins (P < 0.001) in the E2/Tam treated MCF-7 cells, a list of 255 proteins (quantified by at least two peptides) was generated for differential expression analysis. A method was developed for the selection, normalization, and statistical evaluation of such datasets. An approximately 2-fold change in protein expression levels was necessary for a protein to be selected as a biomarker candidate. According to this data processing strategy, approximately 16 proteins involved in biological processes such as apoptosis, RNA processing/metabolism, DNA replication/transcription/repair, cell proliferation and metastasis, were found to be up- or down-regulated.
    Journal of the American Society for Mass Spectrometry 03/2009; 20(7):1287-302. · 4.00 Impact Factor
  • Article: Haplotyping methods for pedigrees.
    Guimin Gao, David B Allison, Ina Hoeschele
    ABSTRACT: Haplotypes provide valuable information in the study of diseases, complex traits, population histories, and evolutionary genetics. With the dramatic increase in the number of available single nucleotide polymorphism (SNP) markers, haplotype inference (haplotyping) using observed genotype data has become an important component of genetic studies in general and of statistical gene mapping in particular. Existing haplotyping methods include (1) population-based methods, (2) methods for pooled DNA samples, and (3) methods for family and pedigree data. The methods and computer programs for population data and pooled DNA samples were reviewed recently in the literature. As several authors noted, family and pedigree datasets are abundant and have unique advantages. In the past twenty years, many haplotyping methods for family and pedigree data have been developed. Therefore, in this contribution we review haplotyping methods and the corresponding computer programs suitable for family and pedigree data and discuss their applications and limitations. We explore the connections among these methods, and describe the challenges that remain to be addressed.
    Human Heredity 02/2009; 67(4):248-66. · 1.79 Impact Factor
  • Article: Gene network inference via structural equation modeling in genetical genomics experiments.
    Bing Liu, Alberto de la Fuente, Ina Hoeschele
    ABSTRACT: Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.
    Genetics 04/2008; 178(3):1763-76. · 4.01 Impact Factor
  • Article: Bayesian estimation of genetic parameters for multivariate threshold and continuous phenotypes and molecular genetic data in simulated horse populations using Gibbs sampling.
    Kathrin F Stock, Ottmar Distl, Ina Hoeschele
    ABSTRACT: Requirements for successful implementation of multivariate animal threshold models including phenotypic and genotypic information are not known yet. Here simulated horse data were used to investigate the properties of multivariate estimators of genetic parameters for categorical, continuous and molecular genetic data in the context of important radiological health traits using mixed linear-threshold animal models via Gibbs sampling. The simulated pedigree comprised 7 generations and 40000 animals per generation. Additive genetic values, residuals and fixed effects for one continuous trait and liabilities of four binary traits were simulated, resembling situations encountered in the Warmblood horse. Quantitative trait locus (QTL) effects and genetic marker information were simulated for one of the liabilities. Different scenarios with respect to recombination rate between genetic markers and QTL and polymorphism information content of genetic markers were studied. For each scenario ten replicates were sampled from the simulated population, and within each replicate six different datasets differing in number and distribution of animals with trait records and availability of genetic marker information were generated. (Co)Variance components were estimated using a Bayesian mixed linear-threshold animal model via Gibbs sampling. Residual variances were fixed to zero and a proper prior was used for the genetic covariance matrix. Effective sample sizes (ESS) and biases of genetic parameters differed significantly between datasets. Bias of heritability estimates was -6% to +6% for the continuous trait, -6% to +10% for the binary traits of moderate heritability, and -21% to +25% for the binary traits of low heritability. Additive genetic correlations were mostly underestimated between the continuous trait and binary traits of low heritability, under- or overestimated between the continuous trait and binary traits of moderate heritability, and overestimated between two binary traits. Use of trait information on two subsequent generations of animals increased ESS and reduced bias of parameter estimates more than mere increase of the number of informative animals from one generation. Consideration of genotype information as a fixed effect in the model resulted in overestimation of polygenic heritability of the QTL trait, but increased accuracy of estimated additive genetic correlations of the QTL trait. Combined use of phenotype and genotype information on parents and offspring will help to identify agonistic and antagonistic genetic correlations between traits of interest, facilitating design of effective multiple trait selection schemes.
    BMC Genetics 02/2007; 8:19. · 2.47 Impact Factor
  • Article: Influence of priors in Bayesian estimation of genetic parameters for multivariate threshold models using Gibbs sampling
    Kathrin Stock, Ottmar Distl, Ina Hoeschele
    ABSTRACT: Simulated data were used to investigate the influence of the choice of priors on estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive values, residuals and fixed effects for one continuous trait and liabilities of four binary traits, and QTL effects for one of the liabilities. Within each of four replicates six different datasets were generated which resembled different practical scenarios in horses with respect to number and distribution of animals with trait records and availability of QTL information. (Co)Variance components were estimated using a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in > 50% of flat prior analyses, with indications of potential or near posterior impropriety between about round 10 000 and 100 000. Terminations due to non-positive definite genetic covariance matrix occurred in flat prior analyses of the smallest datasets. Use of a proper prior resulted in improved mixing and convergence of the Gibbs chain. In order to avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.
    Genetics Selection Evolution. 01/2007;
  • Article: Nucleoplasmin facilitates reprogramming and in vivo development of bovine nuclear transfer embryos.
    ABSTRACT: Successful cloning by somatic cell nuclear transfer (NT) involves an oocyte-driven transition in gene expression from an inherited somatic pattern, to an embryonic form, during early development. This reprogramming of gene expression is thought to require the remodeling of somatic chromatin and as such, faulty and/or incomplete chromatin remodeling may contribute to the aberrant gene expression and abnormal development observed in NT embryos. We used a novel approach to supplement the oocyte with chromatin remodeling factors and determined the impact of these molecules on gene expression and development of bovine NT embryos. Nucleoplasmin (NPL) or polyglutamic acid (PGA) was injected into bovine oocytes at different concentrations, either before (pre-NT) or after (post-NT) NT. Pre-implantation embryos were then transferred to bovine recipients to assess in vivo development. Microinjection of remodeling factors resulted in apparent differences in the rate of blastocyst development and in pregnancy initiation rates in both NPL- and PGA-injected embryos, and these differences were dependent on factor concentration and/or the time of injection. Post-NT NPL-injected embryos that produced the highest rate of pregnancy also demonstrated differentially expressed genes relative to pre-NT NPL embryos and control NT embryos, both of which had lower pregnancy rates. Over 200 genes were upregulated following post-NT NPL injection. Several of these genes were previously shown to be downregulated in NT embryos when compared to bovine IVF embryos. These data suggest that addition of chromatin remodeling factors to the oocyte may improve development of NT embryos by facilitating reprogramming of the somatic nucleus.
    Molecular Reproduction and Development 09/2006; 73(8):977-86. · 2.53 Impact Factor
  • Article: Approximating identity-by-descent matrices using multiple haplotype configurations on pedigrees.
    Guimin Gao, Ina Hoeschele
    ABSTRACT: Identity-by-descent (IBD) matrix calculation is an important step in quantitative trait loci (QTL) analysis using variance component models. To calculate IBD matrices efficiently for large pedigrees with large numbers of loci, an approximation method based on the reconstruction of haplotype configurations for the pedigrees is proposed. The method uses a subset of haplotype configurations with high likelihoods identified by a haplotyping method. The new method is compared with a Markov chain Monte Carlo (MCMC) method (Loki) in terms of QTL mapping performance on simulated pedigrees. Both methods yield almost identical results for the estimation of QTL positions and variance parameters, while the new method is much more computationally efficient than the MCMC approach for large pedigrees and large numbers of loci. The proposed method is also compared with an exact method (Merlin) in small simulated pedigrees, where both methods produce nearly identical estimates of position-specific kinship coefficients. The new method can be used for fine mapping with joint linkage disequilibrium and linkage analysis, which improves the power and accuracy of QTL mapping.
    Genetics 10/2005; 171(1):365-76. · 4.01 Impact Factor
  • Article: Genetical genomics analysis of a yeast segregant population for transcription network inference.
    Nan Bing, Ina Hoeschele
    ABSTRACT: Genetic analysis of gene expression in a segregating population, which is expression profiled and genotyped at DNA markers throughout the genome, can reveal regulatory networks of polymorphic genes. We propose an analysis strategy with several steps: (1) genome-wide QTL analysis of all expression profiles to identify eQTL confidence regions, followed by fine mapping of identified eQTL; (2) identification of regulatory candidate genes in each eQTL region; (3) correlation analysis of the expression profiles of the candidates in any eQTL region with the gene affected by the eQTL to reduce the number of candidates; (4) drawing directional links from retained regulatory candidate genes to genes affected by the eQTL and joining links to form networks; and (5) statistical validation and refinement of the inferred network structure. Here, we apply an initial implementation of this strategy to a segregating yeast population. In 65, 7, and 28% of the identified eQTL regions, a single candidate regulatory gene, no gene, or more than one gene was retained in step 3, respectively. Overall, 768 putative regulatory links were retained, 331 of which are the strongest candidate links, as they were retained in the expression correlation analysis and were located within or near an eQTL subregion identified by a multimarker analysis separating multiple linked QTL. One or several biological processes were statistically significantly overrepresented in independent network structures or in highly interconnected subnetworks. Most of the transcription factors found in the inferred network had a putative regulatory link to only one other gene or exhibited cis-regulation.
    Genetics 07/2005; 170(2):533-42. · 4.01 Impact Factor
  • Article: Finite mixture model analysis of microarray expression data on samples of uncertain biological type with application to reproductive efficiency.
    ABSTRACT: Common goals of microarray experiments are the detection of genes that are differentially expressed between several biological types and the construction of classifiers that predict biological type of samples. Here we consider a situation where there is no training data. There is considerable interest in comparing expression profiles associated with successful pregnancies (SP) and unsuccessful pregnancies (UP) in model and farm animals. Successful pregnancy rate is known to be much higher in embryos generated by in vitro fertilization (IVF) than in nuclear transfer (NT) embryos, and higher under induced ovulation for large follicles (LF) than for small follicles (SF). The tasks of identifying genes differentially expressed between SP and UP, and predicting SP for future samples are not well accomplished by comparing IVF and NT, or LF and SF. A suitable method is finite mixture model analysis (FMMA), which models each observed class (IVF and NT, or LF and SF) as a mixture of two distributions, one for SP and one for UP, with different known or unknown proportions (here known to be 0.50 SP for IVF and 0.02 SP for NT). The means of the two distributions differ for the differentially expressed genes, which we identify via a likelihood ratio test. We confirm by simulation that FMMA strongly outperforms hierarchical clustering and linear discriminant analysis using the known class labels (NT, IVF). We apply FMMA to a real data set on IVF and NT embryos, and compute their posterior probabilities of SP, which confirm our prior knowledge of the SP proportions for IVF and NT.
    Veterinary Immunology and Immunopathology 06/2005; 105(3-4):187-96. · 2.08 Impact Factor
  • Article: A note on joint versus gene-specific mixed model analysis of microarray gene expression data.
    Ina Hoeschele, Hua Li
    ABSTRACT: Currently, linear mixed model analyses of expression microarray experiments are performed either in a gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact, when the mixed model equations can be partitioned into unrelated components: One for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous) and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.
    Biostatistics 05/2005; 6(2):183-6. · 2.14 Impact Factor
  • Article: Identification of differentially expressed genes in individual bovine preimplantation embryos produced by nuclear transfer: improper reprogramming of genes required for development.
    ABSTRACT: Using an interwoven-loop experimental design in conjunction with highly conservative linear mixed model methodology using estimated variance components, 18 genes differentially expressed between nuclear transfer (NT)- and in vitro fertilization (IVF)-produced embryos were identified. The set is comprised of three intermediate-filament protein genes (cytokeratin 8, cytokeratin 19, and vimentin), three metabolic genes (phosphoribosyl pyrophosphate synthetase 1, mitochondrial acetoacetyl-coenzyme A thiolase, and alpha-glucosidase), two lysosomal-related genes (prosaposin and lysosomal-associated membrane protein 2), and a gene associated with stress responses (heat shock protein 27) along with major histocompatibility complex class I, nidogen 2, a putative transport protein, heterogeneous nuclear ribonuclear protein K, mitochondrial 16S rRNA, and ES1 (a zebrafish orthologue of unknown function). The three remaining genes are novel. To our knowledge, this is the first report comparing individual embryos produced by NT and IVF using cDNA microarray technology for any species, and it uses a rigorous experimental design that emphasizes statistical significance to identify differentially expressed genes between NT and IVF embryos in cattle.
    Biology of Reproduction 04/2005; 72(3):546-55. · 4.01 Impact Factor
  • Article: Discovery of meaningful associations in genomic data using partial correlation coefficients.
    ABSTRACT: A major challenge of systems biology is to infer biochemical interactions from large-scale observations, such as transcriptomics, proteomics and metabolomics. We propose to use a partial correlation analysis to construct approximate Undirected Dependency Graphs from such large-scale biochemical data. This approach enables a distinction between direct and indirect interactions of biochemical compounds, thereby inferring the underlying network topology. The method is first thoroughly evaluated with a large set of simulated data. Results indicate that the approach has good statistical power and a low False Discovery Rate even in the presence of noise in the data. We then applied the method to an existing data set of yeast gene expression. Several small gene networks were inferred and found to contain genes known to be collectively involved in particular biochemical processes. In some of these networks there are also uncharacterized ORFs present, which lead to hypotheses about their functions. Programs running in MS-Windows and Linux for applying zeroth, first, second and third order partial correlation analysis can be downloaded at: http://mendes.vbi.vt.edu/tiki-index.php?page=Software. Supplementary information can be found at: URL to be decided.
    Bioinformatics 01/2005; 20(18):3565-74. · 5.47 Impact Factor
  • Article: Conditional probability methods for haplotyping in pedigrees.
    ABSTRACT: Efficient haplotyping in pedigrees is important for the fine mapping of quantitative trait locus (QTL) or complex disease genes. To reconstruct haplotypes efficiently for a large pedigree with a large number of linked loci, two algorithms based on conditional probabilities and likelihood computations are presented. The first algorithm (the conditional probability method) produces a single, approximately optimal haplotype configuration, with computing time increasing linearly in the number of linked loci and the pedigree size. The other algorithm (the conditional enumeration method) identifies a set of haplotype configurations with high probabilities conditional on the observed genotype data for a pedigree. Its computing time increases less than exponentially with the size of a subset of the set of person-loci with unordered genotypes and linearly with its complement. The size of the subset is controlled by a threshold parameter. The set of identified haplotype configurations can be used to estimate the identity-by-descent (IBD) matrix at a map position for a pedigree. The algorithms have been tested on published and simulated data sets. The new haplotyping methods are much faster and provide more information than several existing stochastic and rule-based methods. The accuracies of the new methods are equivalent to or better than those of these existing methods.
    Genetics 09/2004; 167(4):2055-65. · 4.01 Impact Factor
  • Article: ATP-binding cassette transporter A1 locus is not a major determinant of HDL-C levels in a population at high risk for coronary heart disease.
    ABSTRACT: ATP-binding cassette transporter A1 (ABCA1) transports cellular cholesterol to lipid-poor apolipoproteins. Mutations in the ABCA1 gene are linked to rare phenotypes, familial hypoalphalipoproteinemia (FHA) and Tangier disease (TD), characterized by markedly decreased plasma high-density lipoprotein cholesterol (HDL-C) levels. The aim was to test if the ABCA1 locus is a major locus regulating HDL-C levels in the homogenous Finnish population with a high prevalence of coronary heart disease (CHD). Firstly, the ABCA1 locus was tested for linkage to HDL-C levels in 35 families with premature CHD and low HDL-C levels. Secondly, 62 men with low HDL-C levels and CHD were screened for the five mutations known to cause FHA. Thirdly, polymorphisms of the ABCA1 gene were tested for an association with HDL-C levels in a population sample of 515 subjects. The ABCA1 locus was not linked to HDL-C levels in the CHD families, and no carriers of the FHA mutations were found. The AA596 genotype was associated with higher HDL-C levels compared with the GG and GA genotypes in the women, but not in the men. The G596A genotypes explained 4% and the A2589G genotypes 3% of the variation in plasma HDL-C levels in women. The data suggest that the ABCA1 locus is of minor importance in the regulation of HDL-C in Finns.
    Atherosclerosis 03/2003; 166(2):285-90. · 3.79 Impact Factor
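
Illustrative note: the Bioinformatics 2005 abstract in the list above describes using partial correlation to separate direct from indirect associations. The sketch below is not the authors' software; it is a minimal, standard-library-only illustration of the textbook first-order partial correlation formula. The function names (pearson, partial_corr_first_order) and the simulated variables x, y and z are assumptions made up for this example.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr_first_order(x, y, z):
    """Correlation of x and y after conditioning on a single variable z.

    Uses the standard first-order formula:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: z drives both x and y, so x and y are only
# indirectly associated (there is no direct x-y link).
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)                            # marginal correlation, large
r_xy_given_z = partial_corr_first_order(x, y, z)  # conditional on z, near zero

print(f"marginal r(x, y)      = {r_xy:.3f}")
print(f"partial  r(x, y | z)  = {r_xy_given_z:.3f}")
```

In this toy setting the marginal correlation is large while the partial correlation is close to zero, which is the kind of evidence one might use to drop an indirect edge from an undirected dependency graph. Higher-order partial correlations (conditioning on two or three variables) follow the same idea but are not shown here.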

Institutions

  • 2010
    • University of North Carolina at Chapel Hill
      • Department of Biostatistics
      Chapel Hill, NC, USA
  • 2009
    • University of Alabama at Birmingham
      • Department of Biostatistics
      Birmingham, AL, USA
  • 2005–2008
    • Virginia Polytechnic Institute and State University
      • Department of Statistics
      • Virginia Bioinformatics Institute
      Blacksburg, VA, USA
  • 2007
    • University of Veterinary Medicine Hannover
      • Institut für Tierzucht und Vererbungsforschung
      Hannover, Lower Saxony, Germany