Reconstructing sibling relationships in wild populations

Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, United States
Bioinformatics (Impact Factor: 4.98). 08/2007; 23(13):i49-56. DOI: 10.1093/bioinformatics/btm219
Source: PubMed


Reconstruction of sibling relationships from genetic data is an important component of many biological applications. In particular, the growing application of molecular markers (microsatellites) to study wild populations of plants and animals has created the need for new computational methods of establishing pedigree relationships, such as sibgroups, among individuals in these populations. Most current methods for sibship reconstruction from microsatellite data use statistical and heuristic techniques that rely on a priori knowledge about various parameter distributions. Moreover, these methods are designed for data with a large number of sampled loci and small family groups, both of which typically do not hold for wild populations. We present a deterministic technique that parsimoniously reconstructs sibling groups using only Mendelian laws of inheritance. We validate our approach using both simulated and real biological data and compare it to other methods. Our method is highly accurate on real data and compares favorably with other methods on simulated data with few loci and large family groups. It is the only method that does not rely on a priori knowledge about the population under study. Thus, our method is particularly appropriate for reconstructing sibling groups in wild populations.
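The abstract states that the method relies only on Mendelian laws of inheritance. To illustrate what such a constraint looks like, here is a minimal Python sketch of a feasibility test: a set of diploid individuals is accepted as a possible full-sibling group only if, at every locus, some pair of parental genotypes could produce all of the observed offspring genotypes. This is a brute-force check written for illustration, not the paper's 2-allele algorithm. The function names, the (allele, allele) tuple encoding, and the `UNSEEN` placeholder for a parental allele carried by no sampled offspring are all assumptions.

```python
from itertools import combinations_with_replacement

# Hypothetical placeholder for a parental allele that no sampled offspring carries.
UNSEEN = None


def locus_consistent(genotypes):
    """Return True if the diploid genotypes observed at one locus could all be
    offspring of a single father and a single mother (Mendelian inheritance).

    genotypes: list of (allele, allele) tuples, one per individual.
    """
    alleles = sorted({a for g in genotypes for a in g})
    # Two diploid parents carry at most four distinct alleles between them.
    if len(alleles) > 4:
        return False
    candidates = alleles + [UNSEEN]
    parent_genotypes = list(combinations_with_replacement(candidates, 2))
    # Try every (father, mother) pair: each child must get one allele from each.
    for father in parent_genotypes:
        for mother in parent_genotypes:
            if all((a in father and b in mother) or (b in father and a in mother)
                   for a, b in genotypes):
                return True
    return False


def is_full_sib_group(individuals):
    """individuals: list of multilocus genotypes, each a list of (allele, allele)
    tuples in the same locus order. Loci are checked independently."""
    if not individuals:
        return True
    n_loci = len(individuals[0])
    return all(locus_consistent([ind[locus] for ind in individuals])
               for locus in range(n_loci))


print(is_full_sib_group([[(1, 2), (7, 7)], [(3, 4), (7, 8)]]))  # True
print(locus_consistent([(1, 1), (2, 2), (3, 3)]))               # False
```

Because a parent's genotype at one locus does not constrain its genotype at another, each locus is tested on its own. The per-locus search is small: with at most five candidates (four alleles plus `UNSEEN`) there are 15 possible parental genotypes, so at most 15 × 15 father/mother pairs.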



Available from: Bhaskar Dasgupta
  • Source
    • "Specifically, in bioinformatics a great deal of attention is being paid to measuring the distance between clusterings of populations, either natural or experimental, for sibling relationship reconstruction. In practice, thus far the focus has been placed almost exclusively on a unique distance measure (see [17], [6], [31], [19], [18], [30], [2], [8], [5], [10], [3]), namely one relying on maximum matching and denoted MMD in the sequel. After its first appearance [4], this measure was subsequently shown [15] to be computable via the assignment problem (see [20, p. 236] and [12, subsection 4.1]). "
    ABSTRACT: Measuring the distance between partitions is useful for clustering comparison in different fields. For example, in bioinformatics, distances are mostly measured with the maximum matching distance (MMD), although this is algorithmically demanding and fits certain instances poorly. In fact, another distance measure is being tested, namely one based on information theory and termed variation of information (VI). Alternatively, this paper proposes the Hamming distance (HD), which displays a large range and great measurement sensitivity while also relying on a neat Boolean representation of partitions. The novel distance HD is computationally handy and shares important characterizing axioms with VI. Developed from the combinatorial concern of translating the traditional Hamming distance from subset lattices to partition lattices, HD constitutes a valuable computational tool for clustering and information processing wherever a distance between partitions is to be measured.
    Pure Mathematics, Applied Mathematics and Computational Methods, Zakynthos Island, Greece; 07/2015
  • Source
    • "The likelihood methods estimate the probability of the data under different partitions and assign individuals to maximum likelihood sibling groups in a population (Smith et al. 2001; Butler et al. 2004; Konovalov et al. 2004; Jones and Wang 2010). The combinatorial approaches use Mendelian laws of inheritance to find the smallest number of full-sibling groups (Almudevar and Field 1999; Berger-Wolf et al. 2007). For example, the minimum 2-Allele Set Cover approach, based on Mendelian inheritance rules, does not require population allele frequencies and makes no assumptions about a species' mating system (Berger-Wolf et al. 2007). "
    ABSTRACT: Genetic diversity and genealogical relationships among individuals of eight aspen demes in a common garden experiment were studied using microsatellite (SSR) and amplified fragment length polymorphism (AFLP) markers. Moderate to high levels of genetic diversity were observed within all demes for the SSR and AFLP markers. The Shannon index for the AFLP markers was positively and significantly correlated with the average observed heterozygosity for the SSR markers across the demes. Comparative analysis of the numbers of full-sibling groups inferred by different algorithms suggested that the 2-allele algorithm was more accurate than the other methods used in the study. However, in general, the numbers of full-sibling groups inferred by several algorithms, such as a heuristic algorithm, the 2-allele algorithm, and the Simpson, modified Simpson and pairwise score algorithms, were significantly correlated. Among demes, the USA and Swiss demes had the highest and lowest numbers, respectively, of full-sibling groups. This combined approach for sibship reconstruction can also be applied to infer the number of paternal or maternal parents in forest tree plantations when no parental information is available; this approach can improve our understanding of family structure and the extent of genetic diversity within forest tree plantations when no diversity, origin, or parental genotype information is available.
    New Forests 07/2015; DOI:10.1007/s11056-015-9501-9 · 1.83 Impact Factor
  • Source
    • "Family-level mating system parameters were estimated in the same way except that individuals within families were bootstrapped 1000 times to calculate variance estimates. To further investigate the role of the multiple paternities, we estimated the number of full-sib groups within progeny arrays using KINALYZER [57], [58], implementing the 2-allele algorithm, and scaled this value to the size of progeny arrays (kn). Selfed offspring were excluded from this analysis. "
    ABSTRACT: Studying associations between mating system parameters and fitness in natural populations of trees advances our understanding of how local environments affect seed quality, and thereby helps to predict when inbreeding or multiple paternities should impact on fitness. Indeed, for species that demonstrate inbreeding avoidance, multiple paternities (i.e. the number of male parents per half-sib family) should still vary and regulate fitness more than inbreeding - named here as the 'constrained inbreeding hypothesis'. We test this hypothesis in Eucalyptus gracilis, a predominantly insect-pollinated tree. Fifty-eight open-pollinated progeny arrays were collected from trees in three populations. Progeny were planted in a reciprocal transplant trial. Fitness was measured by family establishment rates. We genotyped all trees and their progeny at eight microsatellite loci. Planting site had a strong effect on fitness, but seed provenance and seed provenance × planting site did not. Populations had comparable mating system parameters and were generally outcrossed, experienced low biparental inbreeding and high levels of multiple paternity. As predicted, seed families that had more multiple paternities also had higher fitness, and no fitness-inbreeding correlations were detected. Demonstrating that fitness was most affected by multiple paternities rather than inbreeding, we provide evidence supporting the constrained inbreeding hypothesis; i.e. that multiple paternity may impact on fitness over and above that of inbreeding, particularly for preferentially outcrossing trees at life stages beyond seed development.
    PLoS ONE 02/2014; 9(2):e90478. DOI:10.1371/journal.pone.0090478 · 3.23 Impact Factor
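The first citing snippet describes the maximum matching distance (MMD) between two partitions and notes that it can be computed via the assignment problem. A minimal Python sketch, assuming MMD is taken as the smallest number of individuals that must be moved to turn one partition into the other: build the overlap matrix between blocks, then find the one-to-one pairing of blocks with the largest total overlap. To stay dependency-free, the assignment is solved by exhaustive search over permutations, which is only practical for a handful of groups. For real data, a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would replace that step. The function name and the list-of-sets encoding are illustrative.

```python
from itertools import permutations


def partition_distance(p, q):
    """Minimum number of elements to move so that partition p becomes partition q.

    p, q: lists of sets over the same ground set (e.g. sibling groups).
    """
    n = sum(len(block) for block in p)
    size = max(len(p), len(q))
    # Pad with empty blocks so the overlap matrix is square.
    p = list(p) + [set()] * (size - len(p))
    q = list(q) + [set()] * (size - len(q))
    overlap = [[len(a & b) for b in q] for a in p]
    # Exhaustive assignment: best one-to-one pairing of blocks (small inputs only).
    best = max(sum(overlap[i][j] for i, j in enumerate(perm))
               for perm in permutations(range(size)))
    # Elements outside the matched overlaps are exactly the ones that must move.
    return n - best


print(partition_distance([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}]))  # 1
```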
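The abstract of that same citing paper mentions the variation of information (VI) as an alternative partition distance. VI between partitions X and Y equals H(X) + H(Y) - 2I(X;Y), equivalently H(X|Y) + H(Y|X). The Python sketch below computes it directly from block overlaps using the natural logarithm, so the result is in nats. It does not implement the Hamming distance HD proposed in that paper, whose exact definition is not given in the abstract. The function name and the list-of-sets encoding are illustrative.

```python
from math import log


def variation_of_information(p, q):
    """VI between two partitions p and q, given as lists of sets over the same ground set."""
    n = sum(len(block) for block in p)
    vi = 0.0
    for a in p:
        for b in q:
            r = len(a & b)
            if r:
                pa, pb, pr = len(a) / n, len(b) / n, r / n
                # Contribution of this cell to H(Y|X) + H(X|Y).
                vi -= pr * (log(pr / pa) + log(pr / pb))
    return vi


print(variation_of_information([{1, 2}], [{1}, {2}]))  # log(2), about 0.693
```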
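The second citing snippet describes the combinatorial approach as finding the smallest number of full-sibling groups, via a minimum "2-Allele Set Cover". The covering step can be sketched in Python under a loud assumption: the candidate groups are taken as given, already known to satisfy the Mendelian constraints, and the sketch only picks the fewest of them that together include every individual. The exhaustive search is exact but exponential, so it suits only toy inputs. Because a set cover may place an individual in more than one group, turning it into a partition needs an extra step that this sketch omits. The function name and inputs are hypothetical.

```python
from itertools import combinations


def min_set_cover(universe, candidates):
    """Smallest sub-collection of candidate sets whose union covers the universe.

    universe: set of individual IDs; candidates: list of sets (feasible groups).
    Returns a list of chosen sets, or None if no cover exists.
    """
    universe = set(universe)
    if not universe:
        return []
    # Try covers of size 1, 2, ... so the first hit is a minimum cover.
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if set().union(*combo) >= universe:
                return list(combo)
    return None


groups = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 4}, {2, 5}]
print(min_set_cover({1, 2, 3, 4, 5}, groups))  # [{1, 2, 3}, {4, 5}]
```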
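The aspen abstract correlates a Shannon index for AFLP markers with the average observed heterozygosity for SSR markers. The sketch below computes both summaries for reference. Observed heterozygosity is the fraction of heterozygous genotypes at a locus, averaged across loci. For the dominant AFLP markers, the sketch assumes one common form of the index, -[p ln p + (1 - p) ln(1 - p)], where p is the band-presence frequency at a locus, averaged across loci. The study itself may have used a different estimator, and the function names and data layout are illustrative.

```python
from math import log


def observed_heterozygosity(genotypes_by_locus):
    """Mean fraction of heterozygous individuals across codominant (e.g. SSR) loci.

    genotypes_by_locus: list of loci, each a list of (allele, allele) tuples.
    """
    per_locus = [sum(a != b for a, b in locus) / len(locus)
                 for locus in genotypes_by_locus]
    return sum(per_locus) / len(per_locus)


def shannon_binary(band_presence_by_locus):
    """Mean Shannon index over dominant (band present=1 / absent=0) loci,
    assuming the form -[p ln p + (1 - p) ln(1 - p)] with p = band frequency."""
    total = 0.0
    for bands in band_presence_by_locus:
        p = sum(bands) / len(bands)
        total -= sum(x * log(x) for x in (p, 1 - p) if x > 0)
    return total / len(band_presence_by_locus)


print(observed_heterozygosity([[(1, 2), (1, 1)], [(3, 4), (4, 5)]]))  # 0.75
```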
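The third citing snippet describes bootstrapping individuals within families 1000 times to obtain variance estimates for family-level mating system parameters. A generic sketch of that resampling step in Python: individuals are resampled with replacement, a user-supplied statistic is recomputed on each replicate, and the sample variance of the replicates is returned. The mating system estimator itself is not reproduced here; `statistic` stands in for it, and the function name and fixed seed are illustrative choices.

```python
import random
from statistics import variance


def bootstrap_variance(individuals, statistic, n_boot=1000, seed=0):
    """Bootstrap variance of statistic(individuals) by resampling individuals
    with replacement n_boot times (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    replicates = [statistic(rng.choices(individuals, k=len(individuals)))
                  for _ in range(n_boot)]
    return variance(replicates)


def mean(xs):
    return sum(xs) / len(xs)


print(bootstrap_variance([1, 2, 3, 4], mean))
```

Resampling whole individuals, rather than single alleles, keeps each individual's multilocus genotype intact within a replicate, which matches the snippet's description of bootstrapping individuals within families.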