Reconstructing sibling relationships in wild populations.

Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA.
Bioinformatics (Impact Factor: 5.47). 08/2007; 23(13):i49-56. DOI: 10.1093/bioinformatics/btm219
Source: PubMed

ABSTRACT: Reconstruction of sibling relationships from genetic data is an important component of many biological applications. In particular, the growing application of molecular markers (microsatellites) to study wild populations of plants and animals has created the need for new computational methods of establishing pedigree relationships, such as sibgroups, among individuals in these populations. Most current methods for sibship reconstruction from microsatellite data use statistical and heuristic techniques that rely on a priori knowledge about various parameter distributions. Moreover, these methods are designed for data with a large number of sampled loci and small family groups, two conditions that typically do not hold for wild populations. We present a deterministic technique that parsimoniously reconstructs sibling groups using only Mendelian laws of inheritance. We validate our approach using both simulated and real biological data and compare it to other methods. Our method is highly accurate on real data and compares favorably with other methods on simulated data with few loci and large family groups. It is the only method that does not rely on a priori knowledge about the population under study. Thus, our method is particularly appropriate for reconstructing sibling groups in wild populations.
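
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the kind of Mendelian constraint such a method can rest on, not the authors' implementation. It assumes diploid individuals, unlinked loci, and error-free genotypes, and it uses a simple greedy grouping that is not guaranteed to find the fewest groups. The function names (`compatible_at_locus`, `can_be_full_sibs`, `greedy_sibgroups`) and the data layout are hypothetical.

```python
from itertools import combinations_with_replacement, product


def compatible_at_locus(genotypes):
    """Return True if one pair of diploid parents could produce every genotype.

    Each genotype is a pair of alleles, e.g. (1, 2). Two parents carry at most
    four alleles between them, so more than four observed alleles rules out
    a full-sib group immediately.
    """
    if not genotypes:
        return True
    alleles = sorted({a for g in genotypes for a in g})
    if len(alleles) > 4:
        return False
    observed = {tuple(sorted(g)) for g in genotypes}
    # Candidate parents drawn from observed alleles only (assumption: an
    # unobserved parental allele can be ignored without loss of generality).
    candidates = list(combinations_with_replacement(alleles, 2))
    for mother, father in product(candidates, repeat=2):
        offspring = {tuple(sorted((m, f))) for m in mother for f in father}
        if observed <= offspring:
            return True
    return False


def can_be_full_sibs(individuals):
    """Check Mendelian compatibility at every locus.

    Each individual is a tuple of genotypes, one per locus.
    """
    n_loci = len(individuals[0])
    return all(
        compatible_at_locus([ind[k] for ind in individuals])
        for k in range(n_loci)
    )


def greedy_sibgroups(individuals):
    """Greedy heuristic: place each individual in the first compatible group.

    This is NOT a minimum (parsimonious) solution in general; it only
    illustrates how the compatibility test constrains the grouping.
    """
    groups = []
    for idx, ind in enumerate(individuals):
        for group in groups:
            members = [individuals[i] for i in group]
            if can_be_full_sibs(members + [ind]):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


if __name__ == "__main__":
    # Four individuals typed at a single locus (hypothetical data).
    sample = [((1, 2),), ((1, 3),), ((5, 6),), ((2, 3),)]
    print(greedy_sibgroups(sample))
```

Note that the four-allele bound is necessary but not sufficient: the genotypes (1, 1), (2, 2), and (3, 3) involve only three alleles, yet no single pair of parents can produce all three, which is why the sketch enumerates candidate parent pairs explicitly.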

  • ABSTRACT: Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this article, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present (1) several algorithms that are fast and exact in various special cases, and (2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.
    Journal of computational biology: a journal of computational molecular cell biology 08/2012; 19(9):998-1014. · 1.69 Impact Factor
  • ABSTRACT: Studying associations between mating system parameters and fitness in natural populations of trees advances our understanding of how local environments affect seed quality, and thereby helps to predict when inbreeding or multiple paternities should impact on fitness. Indeed, for species that demonstrate inbreeding avoidance, multiple paternities (i.e. the number of male parents per half-sib family) should still vary and regulate fitness more than inbreeding - named here as the 'constrained inbreeding hypothesis'. We test this hypothesis in Eucalyptus gracilis, a predominantly insect-pollinated tree. Fifty-eight open-pollinated progeny arrays were collected from trees in three populations. Progeny were planted in a reciprocal transplant trial. Fitness was measured by family establishment rates. We genotyped all trees and their progeny at eight microsatellite loci. Planting site had a strong effect on fitness, but seed provenance and seed provenance × planting site did not. Populations had comparable mating system parameters and were generally outcrossed, experienced low biparental inbreeding and high levels of multiple paternity. As predicted, seed families that had more multiple paternities also had higher fitness, and no fitness-inbreeding correlations were detected. Demonstrating that fitness was most affected by multiple paternities rather than inbreeding, we provide evidence supporting the constrained inbreeding hypothesis; i.e. that multiple paternity may impact on fitness over and above that of inbreeding, particularly for preferentially outcrossing trees at life stages beyond seed development.
    PLoS ONE 01/2014; 9(2):e90478. · 3.73 Impact Factor
  • ABSTRACT: Many methods have been proposed to reconstruct the pedigree of a sample of individuals from their multilocus marker genotypes. These methods, like those in other fields of statistical inference, may suffer from both type I (falsely related) and type II (falsely unrelated) errors. In sibship reconstruction, type I errors come from the spurious fusion of two or more small sibships into a single sibship, and type II errors originate from the spurious splitting of a large sibship into two or more small sibships. In this study I investigate the tendencies of both types of errors made by the likelihood methods in sibship reconstruction, using both analytical and simulation approaches. I propose an improvement on the likelihood methods to reduce sibship splitting, and thus type II errors, by downscaling the number of inferred siblings sharing the same genotype at a locus. Simulations are then conducted to compare the accuracy of the original and improved likelihood methods in sibship reconstruction of a large sample of individuals in full-sib families of the same small size, the same large size and highly variable sizes, using a variable number of loci with a variable number of alleles per locus. The methods were also applied to the analysis of a salmon data set. I show that my scaling scheme effectively prevents the splitting of large sibships, and greatly reduces type II errors with little increase in type I errors. As a result, it improves the overall accuracy of sibship assignments, except when sibships are expected to be uniformly small or marker information is unrealistically scarce. Heredity advance online publication, 24 April 2013; doi:10.1038/hdy.2013.34.
    Heredity 04/2013; · 4.11 Impact Factor
