New Optimization Model and Algorithm for Sibling Reconstruction from Genetic Markers

INFORMS Journal on Computing (Impact Factor: 1.32). 05/2010; 22(2):180-194. DOI: 10.1287/ijoc.1090.0322
Source: DBLP

ABSTRACT With improved tools for collecting genetic data from natural and experimental populations, new opportunities arise to study fundamental biological processes, including behavior, mating systems, adaptive trait evolution, and dispersal patterns. Full use of the newly available genetic data often depends upon reconstructing genealogical relationships of individual organisms, such as sibling reconstruction. This paper presents a new optimization framework for sibling reconstruction from single generation microsatellite genetic data. Our framework is based on assumptions of parsimony and combinatorial concepts of Mendel's inheritance rules. Here, we develop a novel optimization model for sibling reconstruction as a large-scale mixed-integer program (MIP), shown to be a generalization of the set covering problem. We propose a new heuristic approach to efficiently solve this large-scale optimization problem. We test our approach on real biological data as presented in other studies as well as simulated data, and compare our results with other state-of-the-art sibling reconstruction methods. The empirical results show that our approaches are very efficient and outperform other methods while providing the most accurate solutions for two benchmark data sets. The results suggest that our framework can be used as an analytical and computational tool for biologists to better study ecological and evolutionary processes involving knowledge of familial relationships in a wide variety of biological systems.
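
The parsimony idea in the abstract (cover the population with as few Mendelian-consistent sibling groups as possible) can be illustrated on a toy example. The sketch below is not the paper's MIP or its heuristic. It assumes diploid genotypes given as allele pairs, checks consistency by brute force over candidate parent pairs, enumerates every consistent group, and then applies the textbook greedy set-cover rule. The function names and the sample data are hypothetical, and the enumeration is exponential, so it is only suitable for a handful of individuals.

```python
from itertools import combinations, combinations_with_replacement, product


def locus_consistent(genotypes):
    """Return True if some pair of parents could produce every genotype
    at a single locus under Mendelian inheritance (one allele per parent)."""
    alleles = sorted({a for g in genotypes for a in g})
    # Parents built from observed alleles suffice: an unobserved parental
    # allele can be swapped for an observed one without losing offspring.
    parents = list(combinations_with_replacement(alleles, 2))
    wanted = {tuple(sorted(g)) for g in genotypes}
    for p1, p2 in product(parents, repeat=2):
        offspring = {tuple(sorted((x, y))) for x in p1 for y in p2}
        if wanted <= offspring:
            return True
    return False


def group_consistent(group, data):
    """A group is a feasible full-sibling group if every locus is consistent."""
    n_loci = len(next(iter(data.values())))
    return all(
        locus_consistent([data[i][locus] for i in group])
        for locus in range(n_loci)
    )


def greedy_cover(data):
    """Greedy set cover over all consistent groups (toy sizes only)."""
    ids = sorted(data)
    candidates = [
        set(c)
        for r in range(1, len(ids) + 1)
        for c in combinations(ids, r)
        if group_consistent(c, data)
    ]
    uncovered, cover = set(ids), []
    while uncovered:
        # Pick the candidate group that covers the most uncovered individuals.
        best = max(candidates, key=lambda g: len(g & uncovered))
        cover.append(sorted(best & uncovered))
        uncovered -= best
    return cover


# Hypothetical single-locus data: individual -> list of (allele, allele) per locus.
data = {
    "a": [(1, 3)],
    "b": [(2, 4)],
    "c": [(1, 4)],
    "d": [(5, 7)],
    "e": [(6, 8)],
}
print(greedy_cover(data))  # [['a', 'b', 'c'], ['d', 'e']]
```

Greedy set cover is used here only because it is short. It gives no optimality guarantee, which is one reason exact or specialized approaches such as the ones described in the abstract are of interest.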

Available from: Bhaskar Dasgupta, Jul 30, 2015
  • Source
    • "The two-allele condition is tighter and more restricted than its four-allele counterpart, allowing a more accurate reconstruction. The mathematical constraints of the two-allele condition for sibling group were derived in Chaovalitwongse et al. (2010) and are used as the basis of mathematical models in this paper. From the example in Figure 2, shrimps a and b can be included in the same biologically consistent sibling group because they both satisfy constraints (i) and (ii). "
    ABSTRACT: Establishing family relationships, such as parentage and sibling relationships, is fundamental in biological research, especially in wild species, as they are often important to understanding evolutionary, ecological, and behavioral processes. Because it is commonly impossible to determine familial relationships from field observations alone, the reconstruction of sibling relationships often depends on informative genetic markers coupled with accurate sibling reconstruction algorithms. Most studies in the literature reconstruct sibling relationships using methods that are based on either statistical analyses (i.e., likelihood estimation) or combinatorial concepts (i.e., Mendelian inheritance laws) of genetic data. We present a novel computational framework that integrates both combinatorial concepts and statistical analyses into one sibling reconstruction optimization model. To solve this integrated model, we propose a column-generation approach with a branch-and-price method. Under the assumption of parsimonious reconstruction, the master problem is to find the minimum set of sibling groups to cover the tested population. Pricing subproblems, which include both statistical similarity and combinatorial concepts of genetic data, are iteratively solved to generate high-quality sibling group candidates. Tested on real biological data sets, our approach efficiently provides reconstruction results that are more accurate than those provided by other state-of-the-art reconstruction algorithms.
    INFORMS Journal on Computing 02/2015; 27(1):35-47. DOI:10.1287/ijoc.2014.0608 · 1.12 Impact Factor
  • Source
    • "Our approaches enumerated all possible sibling groups by following Mendel's laws and solved a set covering problem to find a minimum set of representative sibling groups, which is based on the parsimony assumption when the actual number of sibling groups is not known a priori. Most recently, Chaovalitwongse et al. (2010) proposed an iterative heuristic approach, IMCS, to solve a new optimization model (2AOM) with the combinatorial constraints to find a partition of maximal sibling groups. "
    ABSTRACT: The capacitated clustering problem (CCP) has been studied in a wide range of applications. In this study, we investigate a challenging CCP in computational biology, namely, the sibling reconstruction problem (SRP). The goal of SRP is to establish the sibling relationships (i.e., groups of siblings) of a population from genetic data. The SRP has attracted growing interest from computational biologists over the past decade, as it is an important and necessary keystone for studies in genetic and population biology. We propose a large-scale mixed-integer formulation of the CCP for SRP that is based on both combinatorial and statistical genetic concepts. The objective is not only to find the minimum number of sibling groups, but also to maximize the degree of similarity of individuals in the same sibling group, while each sibling group is subject to genetic constraints derived from Mendel's laws. We develop a new randomized greedy optimization algorithm to effectively and efficiently solve this SRP. The algorithm consists of two key phases: construction and enhancement. In the construction phase, a greedy approach with randomized perturbation is applied to construct multiple sibling groups iteratively. In the enhancement phase, a two-stage local search with a memory function is used to improve the solution quality with respect to the similarity measure. We demonstrate the effectiveness of the proposed algorithm using real biological data sets and compare it with state-of-the-art approaches in the literature. We also test it on larger simulated data sets. The experimental results show that the proposed algorithm provides the best reconstruction solutions.
    Computers & Operations Research 03/2012; 39(3):609-619. DOI:10.1016/j.cor.2011.04.017 · 1.72 Impact Factor
  • Source
    ABSTRACT: While full sibling group reconstruction from microsatellite data is a well-studied problem, reconstruction of half sibling groups is a much less studied, theoretically challenging, and computationally intensive problem. In this paper, we present two different formulations of the half-sib reconstruction problem and prove their NP-hardness. We also present exact solutions for these formulations and develop heuristics. Using biological and synthetic data sets, we present experimental results and compare them with the leading alternative software, COLONY. We show that our approach is computationally superior and, in terms of quality, allows half-sib group reconstruction in the presence of polygamy (unlike COLONY), which is prevalent in nature.