Mikko J Sillanpää

University of Helsinki, Helsinki, Province of Southern Finland, Finland

Publications (34) · 122.98 Total impact

  • Article: A Decision Rule for Quantitative Trait Locus Detection under the Extended Bayesian LASSO Model.
    Crispin M Mutshinda, Mikko J Sillanpää
    ABSTRACT: Bayesian shrinkage analysis is arguably the state-of-the-art technique for large-scale multiple Quantitative Trait Locus (QTL) mapping. However, when the shrinkage model does not involve indicator variables for marker inclusion, QTL detection remains heavily dependent on significance thresholds derived from phenotype permutation under the null hypothesis of no phenotype-to-genotype association. This approach is computationally intensive and more importantly, the hypothetical data generation at the heart of the permutation-based method violates the Bayesian philosophy. Here we propose a fully Bayesian decision rule for QTL detection under the recently introduced Extended Bayesian LASSO for QTL mapping. Our new decision rule is free of any hypothetical data generation, and relies on the well-established Bayes factors for evaluating the evidence for QTL presence at any locus. Simulation results demonstrate the remarkable performance of our decision rule. An application to real-world data is considered as well.
    Genetics 09/2012; · 4.01 Impact Factor
  • Article: Swift block-updating EM and pseudo-EM procedures for Bayesian shrinkage analysis of quantitative trait loci.
    Crispin M Mutshinda, Mikko J Sillanpää
    ABSTRACT: Virtually all existing expectation-maximization (EM) algorithms for quantitative trait locus (QTL) mapping overlook the covariance structure of genetic effects, even though this information can help enhance the robustness of model-based inferences. Here, we propose fast EM and pseudo-EM-based procedures for Bayesian shrinkage analysis of QTLs, designed to accommodate the posterior covariance structure of genetic effects through a block-updating scheme. That is, updating all genetic effects simultaneously through many cycles of iterations. Simulation results based on computer-generated and real-world marker data demonstrated the ability of our method to swiftly produce sensible results regarding the phenotype-to-genotype association. Our new method provides a robust and remarkably fast alternative to full Bayesian estimation in high-dimensional models where the computational burden associated with Markov chain Monte Carlo simulation is often unwieldy. The R code used to fit the model to the data is provided in the online supplementary material.
    Theoretical and Applied Genetics 07/2012; 125(7):1575-87. · 3.30 Impact Factor
  • Article: Overview of LASSO-related penalized regression methods for quantitative trait mapping and genomic selection.
    Zitong Li, Mikko J Sillanpää
    ABSTRACT: Quantitative trait loci (QTL)/association mapping aims at finding genomic loci associated with the phenotypes, whereas genomic selection focuses on breeding value prediction based on genomic data. Variable selection is a key to both of these tasks as it allows to (1) detect clear mapping signals of QTL activity, and (2) predict the genome-enhanced breeding values accurately. In this paper, we provide an overview of a statistical method called least absolute shrinkage and selection operator (LASSO) and two of its generalizations named elastic net and adaptive LASSO in the contexts of QTL mapping and genomic breeding value prediction in plants (or animals). We also briefly summarize the Bayesian interpretation of LASSO, and the inspired hierarchical Bayesian models. We illustrate the implementation and examine the performance of methods using three public data sets: (1) North American barley data with 127 individuals and 145 markers, (2) a simulated QTLMAS XII data with 5,865 individuals and 6,000 markers for both QTL mapping and genomic selection, and (3) a wheat data with 599 individuals and 1,279 markers only for genomic selection.
    Theoretical and Applied Genetics 05/2012; 125(3):419-35. · 3.30 Impact Factor
  • Article: A hierarchical bayesian approach to multi-trait clinical quantitative trait locus modeling.
    Crispin M Mutshinda, Neli Noykova, Mikko J Sillanpää
    ABSTRACT: Recent advances in high-throughput genotyping and transcript profiling technologies have enabled the inexpensive production of genome-wide dense marker maps in tandem with huge amounts of expression profiles. These large-scale data encompass valuable information about the genetic architecture of important phenotypic traits. Comprehensive models that combine molecular markers and gene transcript levels are increasingly advocated as an effective approach to dissecting the genetic architecture of complex phenotypic traits. The simultaneous utilization of marker and gene expression data to explain the variation in clinical quantitative trait, known as clinical quantitative trait locus (cQTL) mapping, poses challenges that are both conceptual and computational. Nonetheless, the hierarchical Bayesian (HB) modeling approach, in combination with modern computational tools such as Markov chain Monte Carlo (MCMC) simulation techniques, provides much versatility for cQTL analysis. Sillanpää and Noykova (2008) developed a HB model for single-trait cQTL analysis in inbred line cross-data using molecular markers, gene expressions, and marker-gene expression pairs. However, clinical traits generally relate to one another through environmental correlations and/or pleiotropy. A multi-trait approach can improve on the power to detect genetic effects and on their estimation precision. A multi-trait model also provides a framework for examining a number of biologically interesting hypotheses. In this paper we extend the HB cQTL model for inbred line crosses proposed by Sillanpää and Noykova to a multi-trait setting. We illustrate the implementation of our new model with simulated data, and evaluate the multi-trait model performance with regard to its single-trait counterpart. 
The data simulation process was based on the multi-trait cQTL model, assuming three traits with uncorrelated and correlated cQTL residuals, with the simulated data under uncorrelated cQTL residuals serving as our test set for comparing the performances of the multi-trait and single-trait models. The simulated data under correlated cQTL residuals were essentially used to assess how well our new model can estimate the cQTL residual covariance structure. The model fitting to the data was carried out by MCMC simulation through OpenBUGS. The multi-trait model outperformed its single-trait counterpart in identifying cQTLs, with a consistently lower false discovery rate. Moreover, the covariance matrix of cQTL residuals was typically estimated to an appreciable degree of precision under the multi-trait cQTL model, making our new model a promising approach to addressing a wide range of issues facing the analysis of correlated clinical traits.
    Frontiers in genetics. 01/2012; 3:97.
  • Article: Estimation of quantitative trait locus effects with epistasis by variational Bayes algorithms.
    Zitong Li, Mikko J Sillanpää
    ABSTRACT: Bayesian hierarchical shrinkage methods have been widely used for quantitative trait locus mapping. From the computational perspective, the application of the Markov chain Monte Carlo (MCMC) method is not optimal for high-dimensional problems such as the ones arising in epistatic analysis. Maximum a posteriori (MAP) estimation can be a faster alternative, but it usually produces only point estimates without providing any measures of uncertainty (i.e., interval estimates). The variational Bayes method, stemming from the mean field theory in theoretical physics, is regarded as a compromise between MAP and MCMC estimation, which can be efficiently computed and produces the uncertainty measures of the estimates. Furthermore, variational Bayes methods can be regarded as the extension of traditional expectation-maximization (EM) algorithms and can be applied to a broader class of Bayesian models. Thus, the use of variational Bayes algorithms based on three hierarchical shrinkage models including Bayesian adaptive shrinkage, Bayesian LASSO, and extended Bayesian LASSO is proposed here. These methods performed generally well and were found to be highly competitive with their MCMC counterparts in our example analyses. The use of posterior credible intervals and permutation tests are considered for decision making between quantitative trait loci (QTL) and non-QTL. The performance of the presented models is also compared with R/qtlbim and R/BhGLM packages, using a previously studied simulated public epistatic data set.
    Genetics 01/2012; 190(1):231-49. · 4.01 Impact Factor
  • Article: Associations between Variation in CHRNA5-CHRNA3-CHRNB4, Body Mass Index and Blood Pressure in the Northern Finland Birth Cohort 1966.
    ABSTRACT: The CHRNA5-CHRNA3-CHRNB4 gene cluster on 15q25 has consistently been associated with smoking quantity, nicotine dependence and lung cancer. Recent research also points towards its involvement in cardiovascular homeostasis, but studies in large human samples are lacking, especially on the role of the gene cluster in blood pressure regulation. We studied the associations between 18 single nucleotide polymorphisms (SNPs) in CHRNA5-CHRNA3-CHRNB4 and systolic blood pressure (SBP), diastolic blood pressure (DBP), and body mass index (BMI) in 5402 young adults from the Northern Finland Birth Cohort 1966. We observed some evidence for associations between two SNPs and SBP and between six SNPs and BMI; the evidence for associations with DBP was weaker. The associations with the three phenotypes were driven by different loci with low linkage disequilibrium with each other. The associations appeared more pronounced in smokers, such that the smoking-increasing alleles would predict lower SBP and BMI. Each additional copy of the rs1948 G-allele and the rs950776 A-allele reduced SBP on average by -1.21 (95% CI -2.01, -0.40) mmHg in smokers. The variants associated with BMI included rs2036534, rs6495309, rs1996371, rs6495314, rs4887077 and rs11638372 and had an average effect size of -0.38 (-0.68, -0.08) kg/m² per an additional copy of the risk allele in smokers. Formal assessments of interactions provided weaker support for these findings, especially after adjustment for multiple testing. Variation at 15q25 appears to interact with smoking status in influencing SBP and BMI. The genetic loci associated with SBP were in low linkage disequilibrium with those associated with BMI suggesting that the gene cluster might regulate SBP through biological mechanisms that partly differ from those regulating BMI. Further studies in larger samples are needed for more precise evaluation of the possible interactions, and to understand the mechanisms behind.
    PLoS ONE 01/2012; 7(9):e46557. · 4.09 Impact Factor
  • Article: Genetic analysis of complex traits via Bayesian variable selection: the utility of a mixture of uniform priors.
    Timo Knürr, Esa Läärä, Mikko J Sillanpää
    ABSTRACT: A new estimation-based Bayesian variable selection approach is presented for genetic analysis of complex traits based on linear or logistic regression. By assigning a mixture of uniform priors (MU) to genetic effects, the approach provides an intuitive way of specifying hyperparameters controlling the selection of multiple influential loci. It aims at avoiding the difficulty of interpreting assumptions made in the specifications of priors. The method is compared in two real datasets with two other approaches, stochastic search variable selection (SSVS) and a re-formulation of Bayes B utilizing indicator variables and adaptive Student's t-distributions (IAt). The Markov Chain Monte Carlo (MCMC) sampling performance of the three methods is evaluated using the publicly available software OpenBUGS (model scripts are provided in the Supplementary material). The sensitivity of MU to the specification of hyperparameters is assessed in one of the data examples.
    Genetics Research 08/2011; 93(4):303-18. · 1.71 Impact Factor
  • Article: On statistical methods for estimating heritability in wild populations.
    Mikko J Sillanpää
    ABSTRACT: Here we take a look at molecular marker-based heritability estimation suitable for non-model organisms. We address several theoretical issues involved and discuss similarities and differences between our two main approaches: the animal model approach and the shrinkage-estimation based multilocus association approach. Also computational issues and hypothetical example applications for ecologists are considered.
    Molecular Ecology 04/2011; 20(7):1324-32. · 5.52 Impact Factor
  • Article: A bayesian mixed regression based prediction of quantitative traits from molecular marker and gene expression data.
    Madhuchhanda Bhattacharjee, Mikko J Sillanpää
    ABSTRACT: Both molecular marker and gene expression data were considered alone as well as jointly to serve as additive predictors for two pathogen-activity-phenotypes in real recombinant inbred lines of soybean. For unobserved phenotype prediction, we used a bayesian hierarchical regression modeling, where the number of possible predictors in the model was controlled by different selection strategies tested. Our initial findings were submitted for DREAM5 (the 5th Dialogue on Reverse Engineering Assessment and Methods challenge) and were judged to be the best in sub-challenge B3 wherein both functional genomic and genetic data were used to predict the phenotypes. In this work we further improve upon this previous work by considering various predictor selection strategies and cross-validation was used to measure accuracy of in-data and out-data predictions. The results from various model choices indicate that for this data use of both data types (namely functional genomic and genetic) simultaneously improves out-data prediction accuracy. Adequate goodness-of-fit can be easily achieved with more complex models for both phenotypes, since the number of potential predictors is large and the sample size is not small. We also further studied gene-set enrichment (for continuous phenotype) in the biological process in question and chromosomal enrichment of the gene set. The methodological contribution of this paper is in exploration of variable selection techniques to alleviate the problem of over-fitting. Different strategies based on the nature of covariates were explored and all methods were implemented under the bayesian hierarchical modeling framework with indicator-based covariate selection. All the models based in careful variable selection procedure were found to produce significant results based on permutation test.
    PLoS ONE 01/2011; 6(11):e26959. · 4.09 Impact Factor
  • Article: Extended Bayesian LASSO for multiple quantitative trait loci mapping and unobserved phenotype prediction.
    Crispin M Mutshinda, Mikko J Sillanpää
    ABSTRACT: The Bayesian LASSO (BL) has been pointed out to be an effective approach to sparse model representation and successfully applied to quantitative trait loci (QTL) mapping and genomic breeding value (GBV) estimation using genome-wide dense sets of markers. However, the BL relies on a single parameter known as the regularization parameter to simultaneously control the overall model sparsity and the shrinkage of individual covariate effects. This may be idealistic when dealing with a large number of predictors whose effect sizes may differ by orders of magnitude. Here we propose the extended Bayesian LASSO (EBL) for QTL mapping and unobserved phenotype prediction, which introduces an additional level to the hierarchical specification of the BL to explicitly separate out these two model features. Compared to the adaptiveness of the BL, the EBL is "doubly adaptive" and thus, more robust to tuning. In simulations, the EBL outperformed the BL in regard to the accuracy of both effect size estimates and phenotypic value predictions, with comparable computational time. Moreover, the EBL proved to be less sensitive to tuning than the related Bayesian adaptive LASSO (BAL), which introduces locus-specific regularization parameters as well, but involves no mechanism for distinguishing between model sparsity and parameter shrinkage. Consequently, the EBL seems to point to a new direction for QTL mapping, phenotype prediction, and GBV estimation.
    Genetics 11/2010; 186(3):1067-75. · 4.01 Impact Factor
  • Article: Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities.
    ABSTRACT: The genetic code, the binding specificity of all transfer-RNAs, defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.
    Genome Research 04/2010; 20(6):861-73. · 13.61 Impact Factor
  • Article: Bayesian inference of genetic parameters based on conditional decompositions of multivariate normal distributions.
    Jon Hallander, Patrik Waldmann, Chunkao Wang, Mikko J Sillanpää
    ABSTRACT: It is widely recognized that the mixed linear model is an important tool for parameter estimation in the analysis of complex pedigrees, which includes both pedigree and genomic information, and where mutually dependent genetic factors are often assumed to follow multivariate normal distributions of high dimension. We have developed a Bayesian statistical method based on the decomposition of the multivariate normal prior distribution into products of conditional univariate distributions. This procedure permits computationally demanding genetic evaluations of complex pedigrees, within the user-friendly computer package WinBUGS. To demonstrate and evaluate the flexibility of the method, we analyzed two example pedigrees: a large noninbred pedigree of Scots pine (Pinus sylvestris L.) that includes additive and dominance polygenic relationships and a simulated pedigree where genomic relationships have been calculated on the basis of a dense marker map. The analysis showed that our method was fast and provided accurate estimates and that it should therefore be a helpful tool for estimating genetic parameters of complex pedigrees quickly and reliably.
    Genetics 03/2010; 185(2):645-54. · 4.01 Impact Factor
  • Article: Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm.
    ABSTRACT: Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) a maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real data sets (HapMap data). Our results suggest that the PHASE algorithm is a method of choice also on pooled DNA data. The main reason for improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account correlated genealogical histories of the haplotypes by modelling mutations and recombinations. The important questions of efficiency of DNA pooling as well as influence of the pool size on the accuracy of the estimates are also considered. Our results are in line with the earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, in order to provide reliable estimates of population haplotype frequencies.
    Genetics Research 01/2009; 90(6):509-24. · 1.71 Impact Factor
  • Article: Efficient Markov chain Monte Carlo implementation of Bayesian analysis of additive and dominance genetic variances in noninbred pedigrees.
    Patrik Waldmann, Jon Hallander, Fabian Hoti, Mikko J Sillanpää
    ABSTRACT: Accurate and fast computation of quantitative genetic variance parameters is of great importance in both natural and breeding populations. For experimental designs with complex relationship structures it can be important to include both additive and dominance variance components in the statistical model. In this study, we introduce a Bayesian Gibbs sampling approach for estimation of additive and dominance genetic variances in the traditional infinitesimal model. The method can handle general pedigrees without inbreeding. To optimize between computational time and good mixing of the Markov chain Monte Carlo (MCMC) chains, we used a hybrid Gibbs sampler that combines a single site and a blocked Gibbs sampler. The speed of the hybrid sampler and the mixing of the single-site sampler were further improved by the use of pretransformed variables. Two traits (height and trunk diameter) from a previously published diallel progeny test of Scots pine (Pinus sylvestris L.) and two large simulated data sets with different levels of dominance variance were analyzed. We also performed Bayesian model comparison on the basis of the posterior predictive loss approach. Results showed that models with both additive and dominance components had the best fit for both height and diameter and for the simulated data with high dominance. For the simulated data with low dominance, we needed an informative prior to avoid the dominance variance component becoming overestimated. The narrow-sense heritability estimates in the Scots pine data were lower compared to the earlier results, which is not surprising because the level of dominance variance was rather high, especially for diameter. In general, the hybrid sampler was considerably faster than the blocked sampler and displayed better mixing properties than the single-site sampler.
    Genetics 07/2008; 179(2):1101-12. · 4.01 Impact Factor
  • Article: Estimation of additive genetic and environmental sources of quantitative trait variation using data on married couples and their siblings.
    ABSTRACT: Twin studies have been used to understand the sources of genetic and environmental variation in body height, body weight and other common human quantitative traits. However, it is rather unclear whether these two sources of variation could be really separated in practice. Here, we consider a special study design where phenotype data from married couples and their siblings have been collected. The marital status gives information about the shared environment, while siblings give information about both genetic and environmental variation. To dissect sources of variation and to allow some deviations and pedigree errors in the data, we model such data using a robust polygenic model with finite genome length assumption. As a summary, we provide the estimates for age-dependent proportions of total variation which are due to polygenic and environmental effects. Here, these estimates are provided for body height, weight, systolic blood pressure and total serum cholesterol measured from subjects of the Indian Migration Study.
    Genetics Research 07/2008; 90(3):269-79. · 1.71 Impact Factor
  • Article: Mapping quantitative trait loci from a single-tail sample of the phenotype distribution including survival data.
    Mikko J Sillanpää, Fabian Hoti
    ABSTRACT: A new effective Bayesian quantitative trait locus (QTL) mapping approach for the analysis of single-tail selected samples of the phenotype distribution is presented. The approach extends the affected-only tests to single-tail sampling with quantitative traits such as the log-normal survival time or censored/selected traits. A great benefit of the approach is that it enables the utilization of multiple-QTL models, is easy to incorporate into different data designs (experimental and outbred populations), and can potentially be extended to epistatic models. In inbred lines, the method exploits the fact that the parental mating type and the linkage phases (haplotypes) are known by definition. In outbred populations, two-generation data are needed, for example, selected offspring and one of the parents (the sires) in breeding material. The idea is to statistically (computationally) generate a fully complementary, maximally dissimilar, observation for each offspring in the sample. Bayesian data augmentation is then used to sample the space of possible trait values for the pseudoobservations. The benefits of the approach are illustrated using simulated data sets and a real data set on the survival of F2 mice following infection with Listeria monocytogenes.
    Genetics 01/2008; 177(4):2361-77. · 4.01 Impact Factor
  • Article: Estimating genealogies from unlinked marker data: a Bayesian approach.
    ABSTRACT: An issue often encountered in statistical genetics is whether, or to what extent, it is possible to estimate the degree to which individuals sampled from a background population are related to each other, on the basis of the available genotype data and some information on the demography of the population. In this article, we consider this question using explicit modelling of the pedigrees and gene flows at unlinked marker loci, but then restricting ourselves to a relatively recent history of the population, that is, considering the genealogy at most some tens of generations backwards in time. As a computational tool we use a Markov chain Monte Carlo numerical integration on the state space of genealogies of the sampled individuals. As illustrations of the method, we consider the question of relatedness at the level of genes/genomes (IBD estimation), using both simulated and real data.
    Theoretical Population Biology 12/2007; 72(3):305-22. · 1.65 Impact Factor
  • Article: Estimating genealogies from linked marker data: a Bayesian approach.
    Dario Gasbarra, Matti Pirinen, Mikko J Sillanpää, Elja Arjas
    ABSTRACT: Answers to several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow therein. A few examples of such questions are haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure. We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from some population isolate, we explore the state space of their possible ancestral histories under our Bayesian model by using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms in the resulting vast state space with highly dependent variables. The main drawback is the computational complexity that limits the time horizon within which explicit reconstructions can be carried out in practice. The estimates for IBD (identity-by-descent) and haplotype distributions are tested in several settings using simulated data. The results appear to be promising for a further development of the method.
    BMC Bioinformatics 02/2007; 8:411. · 2.75 Impact Factor
  • Article: Association mapping of complex trait loci with context-dependent effects and unknown context variable.
    Mikko J Sillanpää, Madhuchhanda Bhattacharjee
    ABSTRACT: A novel method for Bayesian analysis of genetic heterogeneity and multilocus association in random population samples is presented. The method is valid for quantitative and binary traits as well as for multiallelic markers. In the method, individuals are stochastically assigned into two etiological groups that can have both their own, and possibly different, subsets of trait-associated (disease-predisposing) loci or alleles. The method is favorable especially in situations when etiological models are stratified by the factors that are unknown or went unmeasured, that is, if genetic heterogeneity is due to, for example, unknown genes x environment or genes x gene interactions. Additionally, a heterogeneity structure for the phenotype does not need to follow the structure of the general population; it can have a distinct selection history. The performance of the method is illustrated with simulated example of genes x environment interaction (quantitative trait with loosely linked markers) and compared to the results of single-group analysis in the presence of missing data. Additionally, example analyses with previously analyzed cystic fibrosis and type 2 diabetes data sets (binary traits with closely linked markers) are presented. The implementation (written in WinBUGS) is freely available for research purposes from http://www.rni.helsinki.fi/~mjs/.
    Genetics 12/2006; 174(3):1597-611. · 4.01 Impact Factor
  • Article: Constructing the parental linkage phase and the genetic map over distances <1 cM using pooled haploid DNA.
    Dario Gasbarra, Mikko J Sillanpää
    ABSTRACT: A new statistical approach for construction of the genetic linkage map and estimation of the parental linkage phase based on allele frequency data from pooled gametic (sperm or egg) samples is introduced. This method can be applied for estimation of recombination fractions (over distances <1 cM) and ordering of large numbers (even hundreds) of closely linked markers. This method should be extremely useful in species with a long generation interval and a large genome size such as in dairy cattle or in forest trees; the conifer species have haploid tissues available in megagametophytes. According to Mendelian expectation, two parental alleles should occur in gametes in 1:1 proportions, if segregation distortion does not occur. However, due to mere sampling variation, the observed proportions may deviate from their expected value in practice. These deviations and their dependence along the chromosome can provide information on the parental linkage phase and on the genetic linkage map. Usefulness of the method is illustrated with simulations. The role of segregation distortion as a source of these deviations is also discussed. The software implementing this method is freely available for research purposes from the authors.
    Genetics 02/2006; 172(2):1325-35. · 4.01 Impact Factor

Institutions

  • 2002–2012
    • University of Helsinki
      • Department of Mathematics and Statistics
      Helsinki, Province of Southern Finland, Finland
  • 2011
    • University of Pune
      • Department of Statistics
      Pune, State of Maharashtra, India
  • 2010
    • Iowa State University
      • Department of Animal Science
      Ames, IA, USA
  • 2008
    • Sveriges Lantbruksuniversitet
      • Institutionen för skoglig genetik och växtfysiologi
      Uppsala, Uppsala, Sweden
  • 2006
    • University of Chile
      Santiago, Region Metropolitana de Santiago, Chile