Conference Paper

On Single-Array Genotype Calling Algorithms

Dept. of Med., Univ. of Chicago, Chicago, IL
DOI: 10.1109/BMEI.2008.107
Conference: BioMedical Engineering and Informatics, 2008. BMEI 2008. International Conference on, Volume: 1
Source: IEEE Xplore

ABSTRACT This paper describes issues in using single-array algorithms for calling genotypes for Affymetrix arrays, and introduces a computationally efficient procedure designed to complement the multi-array algorithms. The new tool is based on ideas from a previously introduced algorithm [9], with modifications that improve accuracy. These modifications are also necessary for handling data from the new arrays, which have a modified design with no perfect-matches. The main gain in accuracy comes from partitioning the probes into homogeneous clusters, based on measures of probe hybridization efficiency calculated from the probe sequence composition and on measures of probe performance calculated from a small training dataset.

  • ABSTRACT: DNA/DNA duplex formation is the basic mechanism used in genome tiling arrays and SNP arrays manufactured by Affymetrix. However, detailed knowledge of the physical process is still lacking. In this study, we show a free energy analysis of DNA/DNA duplex formation on these arrays based on the positional-dependent nearest-neighbor (PDNN) model, which was developed previously for describing DNA/RNA duplex formation on expression microarrays. Our results showed that the two ends of a probe contribute less to the stability of the duplexes and that there is a microarray surface effect on binding affinities. We also showed that the free energy cost of a single mismatch depends on the bases adjacent to the mismatch site, and obtained a comprehensive table of the cost of a single mismatch under all possible combinations of adjacent bases. The mismatch costs were found to be correlated with those determined in aqueous solution. We further demonstrate that the DNA copy number estimated from the SNP array correlates negatively with the target length; this is presumably caused by inefficient PCR amplification for long fragments. These results provide important insights into the molecular mechanisms of microarray technology and have implications for microarray design and the interpretation of observed data.
    Nucleic Acids Research 02/2007; 35(3):e18. DOI:10.1093/nar/gkl1064 · 9.11 Impact Factor
  • ABSTRACT: In a recent paper [Phys. Rev. E 68, 011906 (2003)], Naef and Magnasco suggested that the "bright" mismatches observed in Affymetrix microarray experiments are caused by the fluorescent molecules used to label RNA target sequences, which would impede target-probe hybridization. Their conclusion is based on the observation of "unexpected" asymmetries in the affinities obtained by fitting microarray data from publicly available experiments. We point out here that the observed asymmetry is due to the inequivalence of RNA and DNA, and that the reported affinities are consistent with stacking free energies obtained from melting experiments of unlabeled nucleic acids in solution. The conclusion of Naef and Magnasco is therefore based on an unjustified assumption.
    Physical Review E 07/2006; 73(6 Pt 1):063901; author reply 063902. DOI:10.1103/PhysRevE.73.063901 · 2.29 Impact Factor
  • ABSTRACT: Microarrays have been used extensively to analyze the expression profiles for thousands of genes in parallel. Most of the widely used methods for analyzing Affymetrix Genechip microarray data, including RMA, GCRMA and Model Based Expression Index (MBEI), summarize probe signal intensity data to generate a single measure of expression for each transcript on the array. In contrast, other methods are applied directly to probe intensities, negating the need for a summarization step. In this study, we used the Affymetrix rat genome Genechip to explore variability in probe response patterns within transcripts. We considered a number of possible sources of variability in probe sets, including probe location within the transcript, middle base pair of the probe sequence, probe overlap, sequence homology and affinity. Although affinity, middle base pair and probe location effects may be seen at the gross array level, these factors account for only a small proportion of the variation observed at the gene level. A BLAST search and the presence of probe-by-treatment interactions for selected differentially expressed genes showed high sequence homology for many probes to non-target genes. We suggest that examination and modeling of probe-level intensities can be used to guide researchers in refining their conclusions regarding differentially expressed genes. We discuss implications for probe sequence selection for confirmatory analysis using real-time PCR.
    BMC Bioinformatics 02/2007; 8(1):146. DOI:10.1186/1471-2105-8-146 · 2.58 Impact Factor