Microarray scanner calibration curves: characteristics and implications

National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA.
BMC Bioinformatics (Impact Factor: 2.58). 08/2005; 6 Suppl 2(Suppl 2):S11. DOI: 10.1186/1471-2105-6-S2-S11
Source: PubMed


Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear.
By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations to the existence of ratio underestimation, intensity-dependence of ratio bias, and anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remained. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias.
It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve reproducibility of microarray measurements, they appear less effective in improving accuracy.


Available from: Roger G Perkins
  • Source
    • "Consequently differences in synthesis efficiency/abasic sites between nucleotides explain why forward and reverse DNA probe spots often have significantly different intensities and explain chip-effects. Additionally we explicitly model target DNA concentration, hybridization temperature, mean fragmentation size, wash stringency [15,34,35], and microarray scanner settings [36] which together with errors along the probe sequence give rise to batch effects. Previous studies have reported that different probe sequences have different saturation intensities [34], under our model this is expected given that each probe spot consists of a “forest” of probes and that the number of probes capable of binding target DNA is sequence dependent. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Background DNA microarrays are used both for research and for diagnostics. In research, Affymetrix arrays are commonly used for genome wide association studies, resequencing, and for gene expression analysis. These arrays provide large amounts of data. This data is analyzed using statistical methods that quite often discard a large portion of the information. Most of the information that is lost comes from probes that systematically fail across chips and from batch effects. The aim of this study was to develop a comprehensive model for hybridization that predicts probe intensities for Affymetrix arrays and that could provide a basis for improved microarray analysis and probe development. The first part of the model calculates probe binding affinities to all the possible targets in the hybridization solution using the Langmuir isotherm. In the second part of the model we integrate details that are specific to each experiment and contribute to the differences between hybridization in solution and on the microarray. These details include fragmentation, wash stringency, temperature, salt concentration, and scanner settings. Furthermore, the model fits probe synthesis efficiency and target concentration parameters directly to the data. All the parameters used in the model have a well-established physical origin. Results For the 302 chips that were analyzed the mean correlation between expected and observed probe intensities was 0.701 with a range of 0.88 to 0.55. All available chips were included in the analysis regardless of the data quality. Our results show that batch effects arise from differences in probe synthesis, scanner settings, wash strength, and target fragmentation. We also show that probe synthesis efficiencies for different nucleotides are not uniform. Conclusions To date this is the most complete model for binding on microarrays. This is the first model that includes both probe synthesis efficiency and hybridization kinetics/cross-hybridization. These two factors are sequence dependent and have a large impact on probe intensity. The results presented here provide novel insight into the effect of probe synthesis errors on Affymetrix microarrays; furthermore, the algorithms developed in this work provide useful tools for the analysis of cross-hybridization, probe synthesis efficiency, fragmentation, wash stringency, temperature, and salt concentration on microarray intensities.
    BMC Genomics 12/2012; 13(1):737. DOI:10.1186/1471-2164-13-737 · 3.99 Impact Factor
  • Source
    • "Rather than remove flawed measurements from individual files, flawed probes and probes for which there were not consistent above-background measurement in all samples (discussed next) were removed from the array layout file with Aroma, ensuring consistent removal of the related measurements across all files [52]. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Background Sporadic Amyotrophic Lateral Sclerosis (sALS) is a devastating, complex disease of unknown etiology. We studied this disease with microarray technology to capture as much biological complexity as possible. The Affymetrix-focused BaFL pipeline takes into account problems with probes that arise from physical and biological properties, so we adapted it to handle the long-oligonucleotide probes on our arrays (hence LO-BaFL). The revised method was tested against a validated array experiment and then used in a meta-analysis of peripheral white blood cells from healthy control samples in two experiments. We predicted differentially expressed (DE) genes in our sALS data, combining the results obtained using the TM4 suite of tools with those from the LO-BaFL method. Those predictions were tested using qRT-PCR assays. Results LO-BaFL filtering and DE testing accurately predicted previously validated DE genes in a published experiment on coronary artery disease (CAD). Filtering healthy control data from the sALS and CAD studies with LO-BaFL resulted in highly correlated expression levels across many genes. After bioinformatics analysis, twelve genes from the sALS DE gene list were selected for independent testing using qRT-PCR assays. High-quality RNA from six healthy Control and six sALS samples yielded the predicted differential expression for 7 genes: TARDBP, SKIV2L2, C12orf35, DYNLT1, ACTG1, B2M, and ILKAP. Four of the seven have been previously described in sALS studies, while ACTG1, B2M and ILKAP appear in the context of this disease for the first time. Supplementary material can be accessed at: http://webpages.uncc.edu/~cbaciu/LO-BaFL/supplementary_data.html . Conclusion LO-BaFL predicts DE results that are broadly similar to those of other methods. The small healthy control cohort in the sALS study is a reasonable foundation for predicting DE genes. Modifying the BaFL pipeline allowed us to remove noise and systematic errors, improving the power of this study, which had a small sample size. Each bioinformatics approach revealed DE genes not predicted by the other; subsequent PCR assays confirmed seven of twelve candidates, a relatively high success rate.
    BMC Bioinformatics 09/2012; 13(1). DOI:10.1186/1471-2105-13-244 · 2.58 Impact Factor
  • Source
    • "The log2 fold changes for most genes were consistent in the direction of regulation (down or up). However, there was a slight fold change compression [18] for the data obtained with the expired U133A microarrays. That is, the magnitude of differential expression (fold change) for the same gene measured from the expired U133A microarrays (the X axis) was lower than that from the unexpired U133Plus2 microarrays (the Y axis). "
    [Show abstract] [Hide abstract]
    ABSTRACT: The Affymetrix GeneChip® system is a commonly used platform for microarray analysis but the technology is inherently expensive. Unfortunately, changes in experimental planning and execution, such as the unavailability of previously anticipated samples or a shift in research focus, may render significant numbers of pre-purchased GeneChip® microarrays unprocessed before their manufacturer's expiration dates. Researchers and microarray core facilities wonder whether expired microarrays are still useful for gene expression analysis. In addition, it was not clear whether the two human reference RNA samples established by the MAQC project in 2005 still maintained their transcriptome integrity over a period of four years. Experiments were conducted to answer these questions. Microarray data were generated in 2009 in three replicates for each of the two MAQC samples with either expired Affymetrix U133A or unexpired U133Plus2 microarrays. These results were compared with data obtained in 2005 on the U133Plus2 microarray. The percentage of overlap between the lists of differentially expressed genes (DEGs) from U133Plus2 microarray data generated in 2009 and in 2005 was 97.44%. While there was some degree of fold change compression in the expired U133A microarrays, the percentage of overlap between the lists of DEGs from the expired and unexpired microarrays was as high as 96.99%. Moreover, the microarray data generated using the expired U133A microarrays in 2009 were highly concordant with microarray and TaqMan® data generated by the MAQC project in 2005. Our results demonstrated that microarray data generated using U133A microarrays, which were more than four years past the manufacturer's expiration date, were highly specific and consistent with those from unexpired microarrays in identifying DEGs despite some appreciable fold change compression and decrease in sensitivity. Our data also suggested that the MAQC reference RNA samples, stored at -80°C, were stable over a time frame of at least four years.
    BMC Bioinformatics 10/2010; 11 Suppl 6(Suppl 6):S10. DOI:10.1186/1471-2105-11-S6-S10 · 2.58 Impact Factor
Show more