• ABSTRACT: Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However, their application to complex real-world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes call for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data.
    IEEE Computational Intelligence Magazine 11/2012; 7(4):35-45. · 2.71 Impact Factor
  • ABSTRACT: The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. In order to evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for the generation of complex genetic models has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e., all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict epistasis, which should be the most challenging to detect. This study addresses three goals: (1) classify and characterize pure, strict, two-locus epistatic models, (2) investigate the effect of model 'architecture' on detection difficulty, and (3) explore how adjusting GAMETES constraints influences diversity in the generated models.
    BioData Mining 01/2014; 7:8. · 1.54 Impact Factor
  • ABSTRACT: Sporadic amyotrophic lateral sclerosis (sALS) is a severe neurodegenerative disease that causes progressive motor neuron death. Although the etiology of sALS remains unknown, genetic variants are thought to predispose individuals to the disease. Several recent genome-wide association studies have identified a number of loci that increase sALS susceptibility, but these explain only a small proportion of the disease. To extend the current genetic evidence and to identify novel candidate loci for sALS, we performed a pooling genome-wide association study of 859,311 autosomal single-nucleotide polymorphisms from the Illumina HumanOmniZhongHua-8, combined with pathway analysis, in 250 typical sALS cases, precluding interference from age, clinical course, and phenotype, and 250 control subjects from Chinese Han populations (CHP). The results revealed that 8 novel loci (1p34.3, 3p21.1, 3p22.2, 10p15.2, 22q12.1, 3q13.11, 11q25, and 12q24.33) and 5 previously reported loci, CNTN4 (kgp11325216), ATXN1 (kgp8327591), C9orf72 (kgp6016770), ITPR2 (kgp3041552), and SOD1 (kgp10760302), were associated with sALS in CHP. Furthermore, pathway analysis based on the Gene Set Analysis Toolkit V2 showed that 10 top pathways were strongly associated with sALS in CHP. Among them, the 7 most likely candidate pathways were the phosphatidylinositol signaling system, Wnt signaling pathway, axon guidance, MAPK signaling pathway, neurotrophin signaling pathway, arachidonic acid metabolism, and T-cell receptor signaling pathway. A total of 39 significantly associated genes in these 7 candidate pathways were suggested to be involved in the pathogenesis of sALS in CHP. In conclusion, our results revealed several new loci and pathways related to sALS in CHP and extended the association evidence for some of the loci, genes, and pathways previously identified in other populations. Thus, our data provide new clues for exploring the pathogenesis of sALS.
    Neurobiology of Aging 01/2014; · 5.94 Impact Factor