A genotype calling algorithm for the Illumina BeadArray platform

Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
Bioinformatics (Impact Factor: 4.98). 11/2007; 23(20):2741-6. DOI: 10.1093/bioinformatics/btm443
Source: PubMed


Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for an integrated and easy-to-use software for calling genotypes.
We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy.
The C++ executable for the algorithm described here is available by request from the authors.
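
The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea, not the authors' C++ code. It assumes each sample is reduced to a one-dimensional contrast value, fits a plain three-component Gaussian mixture by EM (a simplification chosen for brevity), and reads "perturbation analysis" as re-calling slightly jittered data and measuring agreement with the original calls. The names `contrast`, `call_genotypes`, `stability` and `STARTS`, the starting centres, the noise level and the 0.95 posterior threshold are all illustrative assumptions.

```python
import math
import random

# Assumed starting cluster centres on the contrast axis (illustrative values).
STARTS = {"AA": 0.9, "AB": 0.0, "BB": -0.9}


def contrast(x, y):
    """Map paired allele intensities (A = x, B = y, x + y > 0) to [-1, 1]."""
    return (x - y) / (x + y)


def _normal_pdf(v, mean, sd):
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def _posteriors(values, weights, means, sds):
    """Posterior cluster membership probabilities for every sample."""
    out = []
    for v in values:
        dens = [w * _normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        out.append([d / total if total > 0 else 0.0 for d in dens])
    return out


def call_genotypes(values, n_iter=50, threshold=0.95):
    """Fit a three-component 1-D Gaussian mixture by EM, then call genotypes.

    Information is pooled across all samples of one SNP. Samples whose best
    posterior probability is below `threshold` get None (no call).
    """
    labels = list(STARTS)
    means = [STARTS[g] for g in labels]
    sds = [0.1] * 3
    weights = [1.0 / 3.0] * 3
    n = len(values)
    for _ in range(n_iter):
        resp = _posteriors(values, weights, means, sds)  # E-step
        for k in range(3):  # M-step
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            if nk < 1e-8:
                continue  # effectively empty cluster: keep its old centre
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 0.02)  # floor avoids variance collapse
    calls = []
    for r in _posteriors(values, weights, means, sds):
        best = max(range(3), key=lambda k: r[k])
        calls.append(labels[best] if r[best] >= threshold else None)
    return calls


def stability(values, noise_sd=0.02, n_rep=20, seed=0):
    """Fraction of calls unchanged when the data are jittered and re-called."""
    rng = random.Random(seed)
    base = call_genotypes(values)
    agree = 0
    for _ in range(n_rep):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
        agree += sum(b == p for b, p in zip(base, call_genotypes(noisy)))
    return agree / (n_rep * len(values))
```

In this sketch a SNP with three well-separated clusters should score close to 1, while overlapping or shifted clusters would be expected to score lower, which is the kind of signal a stability-based quality metric is meant to expose.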


Available from: Taane G Clark, Jan 10, 2014
  • Source
    • "Genotyping, data calling and automated QC: Samples were assayed on the Illumina Human610-Quad BeadChip using the Infinium HD Super Assay (Illumina, San Diego, CA, USA); beadchips were scanned with an iScan. Intensity data, normalised according to the standard Illumina algorithm, was extracted and genotypes called using Illuminus [16]. Sample call rate was calculated and Illuminus re-run using only the samples with a call rate of at least 90 % (to improve cluster definition). Samples having a call rate of less than 95 % or having autosomal heterozygosity values in the tail of the distribution were excluded."
    ABSTRACT: Warfarin is the most widely used oral anticoagulant worldwide, but it has a narrow therapeutic index which necessitates constant monitoring of anticoagulation response. Previous genome-wide studies have focused on identifying factors explaining variance in stable dose, but have not explored the initial patient response to warfarin, and a wider range of clinical and biochemical factors affecting both initial and stable dosing with warfarin. A prospective cohort of 711 patients starting warfarin was followed up for 6 months with analyses focusing on both non-genetic and genetic factors. The outcome measures used were mean weekly warfarin dose (MWD), stable mean weekly dose (SMWD) and international normalised ratio (INR) > 4 during the first week. Samples were genotyped on the Illumina Human610-Quad chip. Statistical analyses were performed using Plink and R. VKORC1 and CYP2C9 were the major genetic determinants of warfarin MWD and SMWD, with CYP4F2 having a smaller effect. Age, height, weight, cigarette smoking and interacting medications accounted for less than 20 % of the variance. Our multifactorial analysis explained 57.89 % and 56.97 % of the variation for MWD and SMWD, respectively. Genotypes for VKORC1 and CYP2C9*3, age, height and weight, as well as other clinical factors such as alcohol consumption, loading dose and concomitant drugs were important for the initial INR response to warfarin. In a small subset of patients for whom data were available, levels of the coagulation factors VII and IX (highly correlated) also played a role. Our multifactorial analysis in a prospectively recruited cohort has shown that multiple factors, genetic and clinical, are important in determining the response to warfarin. VKORC1 and CYP2C9 genetic polymorphisms are the most important determinants of warfarin dosing, and it is highly unlikely that other common variants of clinical importance influencing warfarin dosage will be found. Both VKORC1 and CYP2C9*3 are important determinants of the initial INR response to warfarin. Other novel variants, which did not reach genome-wide significance, were identified for the different outcome measures, but need replication.
    Full-text · Article · Dec 2016 · Genome Medicine
  • Source
    • "A number of algorithms are available for processing the raw signal of paired allele intensities into discrete genotype calls (AA, AB, BB) for each SNP in each sample. Current methods include: GenCall [11], Illumina’s proprietary method implemented in the GenomeStudio software; GenoSNP [12]; Illuminus [13]; CRLMM [14-16]; Birdseed [17] and BeagleCall [18]. Three new methods have been proposed recently to meet the challenge of calling low frequency/rare variants on the Illumina platform (M3 [7], zCall [19] and OptiCall [8])."
    ABSTRACT: Background: SNP genotyping microarrays have revolutionized the study of complex disease. The current range of commercially available genotyping products contain extensive catalogues of low frequency and rare variants. Existing SNP calling algorithms have difficulty dealing with these low frequency variants, as the underlying models rely on each genotype having a reasonable number of observations to ensure accurate clustering. Results: Here we develop KRLMM, a new method for converting raw intensities into genotype calls that aims to overcome this issue. Our method is unique in that it applies careful between sample normalization and allows a variable number of clusters k (1, 2 or 3) for each SNP, where k is predicted using the available data. We compare our method to four genotyping algorithms (GenCall, GenoSNP, Illuminus and OptiCall) on several Illumina data sets that include samples from the HapMap project where the true genotypes are known in advance. All methods were found to have high overall accuracy (> 98%), with KRLMM consistently amongst the best. At low minor allele frequency, the KRLMM, OptiCall and GenoSNP algorithms were observed to be consistently more accurate than GenCall and Illuminus on our test data. Conclusions: Methods that tailor their approach to calling low frequency variants by either varying the number of clusters (KRLMM) or using information from other SNPs (OptiCall and GenoSNP) offer improved accuracy over methods that do not (GenCall and Illuminus). The KRLMM algorithm is implemented in the open-source crlmm package distributed via the Bioconductor project (
    Full-text · Article · May 2014 · BMC Bioinformatics
  • Source
    • "The samples were genotyped using the Illumina HumanHap610Q array. The normalized intensity data was used by the Illuminus calling algorithm [31] to assign genotypes. No calls were assigned if an individual's most likely genotype was called with a posterior probability threshold of less than 0.95."
    ABSTRACT: Background: A newly-described syndrome called Aneurysm-Osteoarthritis Syndrome (AOS) was recently reported. AOS presents with early onset osteoarthritis (OA) in multiple joints, together with aneurysms in major arteries, and is caused by rare mutations in SMAD3. Because of the similarity of AOS to idiopathic generalized OA (GOA), we hypothesized that SMAD3 is also associated with GOA and tested the hypothesis in a population-based cohort. Methods: Study participants were derived from the Chingford study. Kellgren-Lawrence (KL) grades and the individual features of osteophytes and joint space narrowing (JSN) were scored from radiographs of hands, knees, hips, and lumbar spines. The total KL score, osteophyte score, and JSN score were calculated and used as indicators of the total burden of radiographic OA. Forty-one common SNPs within SMAD3 were genotyped using the Illumina HumanHap610Q array. Linear regression modelling was used to test the association between the total KL score, osteophyte score, and JSN score and each of the 41 SNPs, with adjustment for patient age and BMI. Permutation testing was used to control the false positive rate. Results: A total of 609 individuals were included in the analysis. All were Caucasian females with a mean age of 60.9±5.8. We found that rs3825977, with a minor allele (T) frequency of 20%, in the last intron of SMAD3, was significantly associated with total KL score (β = 0.14, Ppermutation = 0.002). This association was stronger for the total JSN score (β = 0.19, Ppermutation = 0.002) than for total osteophyte score (β = 0.11, Ppermutation = 0.02). The T allele is associated with a 1.47-fold increased odds for people with 5 or more joints to be affected by radiographic OA (Ppermutation = 0.046). Conclusion: We found that SMAD3 is significantly associated with the total burden of radiographic OA. Further studies are required to reveal the mechanism of the association.
    Full-text · Article · May 2014 · PLoS ONE