Article

A genotype calling algorithm for the Illumina BeadArray platform.

Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
Bioinformatics (Impact Factor: 4.62). 11/2007; 23(20):2741-6. DOI: 10.1093/bioinformatics/btm443
Source: PubMed

ABSTRACT Large-scale genotyping relies on unsupervised, automated calling algorithms to assign genotypes to hybridization data. Several such calling algorithms have recently been established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated, easy-to-use software for calling genotypes.
We have introduced a model-based genotype calling algorithm that requires neither prior training data nor computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously, pooling information across individuals to improve the calls. By identifying the optimal coordinates from which to initialize the algorithm, the method can accommodate variations in hybridization intensity that dramatically shift the positions of the genotype clouds. By incorporating perturbation analysis, we obtain a quality metric that measures the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and low accuracy.
The C++ executable for the algorithm described here is available by request from the authors.
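The abstract does not spell out the statistical model, so what follows is only a rough illustration of the ideas it names, not the authors' C++ implementation. It assumes that each SNP's two allele intensities (x for allele A, y for allele B) are reduced to a one-dimensional contrast, that the three genotype clouds are modelled as a Gaussian mixture fitted by EM across all individuals at once (the "pooling"), and that the search for optimal initial coordinates can be crudely approximated by trying a few hypothetical starting points and keeping the fit with the highest log-likelihood. The perturbation step adds small random noise (an assumed `noise_sd`) to the contrasts, refits, and reports how often each call survives. All function names and parameter values here are illustrative assumptions.

```python
import numpy as np

# Hypothetical starting points for the three cluster means (AA, AB, BB) on the
# contrast scale; a crude stand-in for searching for optimal initial coordinates.
CANDIDATE_INITS = ((-0.9, 0.0, 0.9), (-0.6, 0.0, 0.6), (-0.3, 0.2, 0.7))


def contrast(x, y):
    """Map A/B allele intensities to a contrast in [-1, 1]: AA near -1, BB near +1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (y - x) / (x + y)


def _e_step(c, w, mu, var):
    """Posterior cluster memberships and total log-likelihood under the mixture."""
    log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
             - 0.5 * (c[:, None] - mu) ** 2 / var)
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_norm[:, None]), log_norm.sum()


def em_mixture(c, init_means, n_iter=100, min_var=1e-4):
    """Fit a 3-component 1-D Gaussian mixture by EM, pooling all individuals."""
    mu = np.array(init_means, dtype=float)
    var = np.full(3, 0.05)
    w = np.full(3, 1.0 / 3)
    for _ in range(n_iter):
        resp, _ = _e_step(c, w, mu, var)
        nk = resp.sum(axis=0) + 1e-12  # guard against empty clusters
        w = nk / nk.sum()
        mu = (resp * c[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (c[:, None] - mu) ** 2).sum(axis=0) / nk, min_var)
    resp, loglik = _e_step(c, w, mu, var)
    order = np.argsort(mu)  # relabel so column 0 = AA, 1 = AB, 2 = BB
    return mu[order], resp[:, order], loglik


def call_genotypes(x, y, inits=CANDIDATE_INITS):
    """Return per-individual calls (0=AA, 1=AB, 2=BB) and the fitted cluster means."""
    c = contrast(x, y)
    mu, resp, _ = max((em_mixture(c, m) for m in inits), key=lambda fit: fit[2])
    return np.argmax(resp, axis=1), mu


def call_stability(x, y, calls, mu, n_perturb=20, noise_sd=0.02, seed=0):
    """Fraction of perturbed refits in which each individual keeps its call."""
    rng = np.random.default_rng(seed)
    c = contrast(x, y)
    agree = np.zeros(len(c))
    for _ in range(n_perturb):
        noisy = c + rng.normal(0.0, noise_sd, size=c.shape)
        _, resp, _ = em_mixture(noisy, mu)  # restart from the fitted means
        agree += np.argmax(resp, axis=1) == calls
    return agree / n_perturb
```

Restarting each perturbed fit from the already-fitted means keeps the cluster labels aligned between runs. Averaging the per-individual stability over one SNP gives a crude per-SNP score in the spirit of the quality metric described above: poorly separated clouds would tend to produce low averages.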

  • ABSTRACT: Epigenetic modifications such as DNA methylation play a key role in gene regulation and disease susceptibility. However, little is known about the genome-wide frequency, localization, and function of methylation variation and how it is regulated by genetic and environmental factors. We utilized the Multiple Tissue Human Expression Resource (MuTHER) and generated Illumina 450K adipose methylome data from 648 twins. We found that individual CpGs had low variance and that variability was suppressed in promoters. We noted that DNA methylation variation was highly heritable (median h² = 0.34) and that shared environmental effects correlated with metabolic phenotype-associated CpGs. Analysis of methylation quantitative-trait loci (metQTL) revealed that 28% of CpGs were associated with nearby SNPs, and when we overlapped them with adipose expression quantitative-trait loci (eQTL) from the same individuals, we found that 6% of the loci played a role in regulating both gene expression and DNA methylation. These associations were bidirectional, but there were pronounced negative associations for promoter CpGs. Integration of metQTL with adipose reference epigenomes and disease associations revealed significant enrichment of metQTL overlapping metabolic-trait or disease loci in enhancers (the strongest effects were for high-density lipoprotein cholesterol and body mass index [BMI]). We followed up with the BMI SNP rs713586, a cg01884057 metQTL that overlaps an enhancer upstream of ADCY3, and used bisulphite sequencing to refine this region. Our results showed that adipose DNA methylation is widely invariable across the population yet dependent on sequence, and that incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.
    The American Journal of Human Genetics 11/2013; · 10.99 Impact Factor
  • ABSTRACT: Vibrio vulnificus is an aquatic bacterium and an important human pathogen. Strains of V. vulnificus are classified into three different biotypes. The newly emerged biotype 3 has been found to be clonal and restricted to Israel. In the family Vibrionaceae, horizontal gene transfer is the main mechanism responsible for the emergence of new pathogen groups. To better understand the evolution of the bacterium, and in particular to trace the evolution of biotype 3, we performed genome-wide SNP genotyping of 254 clinical and environmental V. vulnificus isolates with a worldwide distribution, recovered over a 30-year period and representing all phylogenetic groups. A custom single-nucleotide polymorphism (SNP) array implemented on the Illumina GoldenGate platform was developed based on 570 SNPs randomly distributed throughout the genome. In general, the genotyping results divided the V. vulnificus species into three main phylogenetic lineages and an additional subgroup, clade B, consisting of environmental and clinical isolates from Israel. Data analysis suggested that 69% of biotype 3 SNPs are similar to SNPs from clade B, indicating that biotype 3 and clade B have a common ancestor. The remaining biotype 3 SNPs were scattered along the biotype 3 genome, probably representing multiple chromosomal segments that may have been horizontally inserted into the clade B recipient core genome from other phylogroups or bacterial species sharing the same ecological niche. The results emphasize the continuous evolution of V. vulnificus and support the view that the emergence of new pathogenic groups within this species is a recurrent phenomenon. Our findings contribute to a broader understanding of the evolution of this human pathogen.
    PLoS ONE 12/2014; 9(12):e114576. · 3.53 Impact Factor
  • ABSTRACT: Parentage control is moving from short tandem repeat (STR)-based to single nucleotide polymorphism (SNP)-based systems. For SNP-based parentage control in cattle, the ISAG-ICAR Committee proposes a set of 100/200 SNPs, but quality criteria are lacking. For German Holstein-Friesian cattle, for which only a limited number of individuals have been evaluated, the exclusion probability is not well-defined. We propose a statistical procedure for excluding single SNPs from parentage control, based on case-by-case evaluation of the GenCall score, to minimize parentage exclusions caused by miscalled genotypes. The exclusion power of the ISAG-ICAR SNPs used for the German Holstein-Friesian population was adjusted based on the results of more than 25 000 individuals. Experimental data were derived from routine genomic selection analyses of the German Holstein-Friesian population using the Illumina BovineSNP50 v2 BeadChip (20 000 individuals) or the EuroG10K variant (7000 individuals). Averages and standard deviations of GenCall scores for the 200 SNPs of the ISAG-ICAR recommended panel were calculated and used to calculate the downward Z-value. Based on minor allele frequencies in the Holstein-Friesian population, one minus the exclusion probability was 1.4 × 10⁻¹⁰ with one parent and 7.2 × 10⁻²⁶ with two parents. Two monomorphic SNPs from the 100-SNP ISAG-ICAR core panel did not contribute. Simulation of 10 000 parentage control combinations, using the GenCall score data from both BeadChips, showed that with a Z-value greater than 3.66, only about 2.5% of parentages were excluded, based on the ISAG-ICAR recommendations (core panel: ≥ 90 SNPs for one parent, ≥ 85 SNPs for two parents). When applied to real data from 1750 single parentage assessments, the optimal threshold was determined to be Z = 5.0, with only 34 censored cases and a reduction to four (0.2%) doubtful parentages.
    About 70 parentage exclusions due to weak genotype calls were avoided, whereas true exclusions (n = 34) were unaffected. Using SNPs for parentage evaluation provides high exclusion power, also for parent identification. SNPs with a low GenCall score show a strong tendency towards intramolecular secondary structures and contribute substantially to false parentage exclusions. We propose a method that controls this error without excluding too many parent combinations from the evaluation.
    Genetics Selection Evolution 12/2015; 47(1):3. · 3.75 Impact Factor
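The parentage abstract describes its GenCall filter only in outline. Assuming the "downward Z-value" is the number of SNP-specific standard deviations by which an individual's GenCall score falls below that SNP's mean score, a minimal sketch of the censoring step for one parent-child pair could look like this. The rule that a SNP is dropped when either animal's score fails, the 0/1/2 genotype coding, and the function and field names are assumptions; the defaults `z_max = 5.0` and `min_snps = 90` are taken from the abstract.

```python
import numpy as np


def downward_z(gencall, snp_mean, snp_sd):
    """Number of SNP-specific SDs by which a GenCall score lies below the SNP mean."""
    mean = np.asarray(snp_mean, dtype=float)
    score = np.asarray(gencall, dtype=float)
    return (mean - score) / np.asarray(snp_sd, dtype=float)


def compare_parentage(child, parent, child_gc, parent_gc, snp_mean, snp_sd,
                      z_max=5.0, min_snps=90):
    """Censor weak calls, then count opposite-homozygote conflicts for one pair.

    Genotypes are coded 0/1/2 (copies of one allele). A SNP is censored when
    either animal's GenCall score has a downward Z-value above z_max
    (an assumed rule). min_snps is the single-parent core-panel minimum.
    """
    child = np.asarray(child)
    parent = np.asarray(parent)
    usable = ((downward_z(child_gc, snp_mean, snp_sd) <= z_max)
              & (downward_z(parent_gc, snp_mean, snp_sd) <= z_max))
    # 0 vs 2 (opposite homozygotes) is a Mendelian inconsistency for a parent-child pair
    conflicts = usable & (np.abs(child - parent) == 2)
    n_used = int(usable.sum())
    return {
        "n_used": n_used,
        "n_conflicts": int(conflicts.sum()),
        "evaluable": n_used >= min_snps,
    }
```

With `z_max = 5.0`, an opposite-homozygote mismatch that sits on a SNP with a weak GenCall score is censored rather than counted as an exclusion, which is the kind of false exclusion the abstract reports avoiding; mismatches on well-scored SNPs still count.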
