Evaluation of gene expression data generated from expired Affymetrix GeneChip microarrays using MAQC reference RNA samples
ABSTRACT The Affymetrix GeneChip® system is a commonly used platform for microarray analysis but the technology is inherently expensive. Unfortunately, changes in experimental planning and execution, such as the unavailability of previously anticipated samples or a shift in research focus, may render significant numbers of pre-purchased GeneChip® microarrays unprocessed before their manufacturer's expiration dates. Researchers and microarray core facilities wonder whether expired microarrays are still useful for gene expression analysis. In addition, it was not clear whether the two human reference RNA samples established by the MAQC project in 2005 still maintained their transcriptome integrity over a period of four years. Experiments were conducted to answer these questions.
Microarray data were generated in 2009 in three replicates for each of the two MAQC samples with either expired Affymetrix U133A or unexpired U133Plus2 microarrays. These results were compared with data obtained in 2005 on the U133Plus2 microarray. The percentage of overlap between the lists of differentially expressed genes (DEGs) from U133Plus2 microarray data generated in 2009 and in 2005 was 97.44%. While there was some degree of fold change compression in the expired U133A microarrays, the percentage of overlap between the lists of DEGs from the expired and unexpired microarrays was as high as 96.99%. Moreover, the microarray data generated using the expired U133A microarrays in 2009 were highly concordant with microarray and TaqMan® data generated by the MAQC project in 2005.
Our results demonstrated that microarray data generated using U133A microarrays, which were more than four years past the manufacturer's expiration date, were highly specific and consistent with those from unexpired microarrays in identifying DEGs despite some appreciable fold change compression and decrease in sensitivity. Our data also suggested that the MAQC reference RNA samples, stored at -80°C, were stable over a time frame of at least four years.
SourceAvailable from: Jonathan Daniel WrenBMC Bioinformatics 10/2011; 12 Suppl 10(Suppl 10):S1. DOI:10.1186/1471-2105-12-S10-S1 · 2.67 Impact Factor
[Show abstract] [Hide abstract]
ABSTRACT: Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision, comparability of microarrays, and other various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures in Affymetrix microarrays. Results. We evaluated several cutting edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe the local least squares method with k = 4 is most accurate under the error measures considered. The k-nearest neighbor method with k = 1 has the highest error rate among imputation methods and error measures. Conclusions. We conclude for imputing missing values in Affymetrix microarray datasets, using the MAS 5.0 preprocessing scheme, the local least squares method with k = 4 has the best overall performance and k-nearest neighbor method with k = 1 has the worst overall performance. These results hold true for both 5% and 10% missing values.Advances in Bioinformatics 10/2013; 2013:790567. DOI:10.1155/2013/790567
[Show abstract] [Hide abstract]
ABSTRACT: MOTIVATION: RNA-Seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However in experiments with small sample size the per-gene estimates of the dispersion parameter are unreliable.Method: We propose a simple and effective approach for estimating the dispersions. First, we obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute, and is compatible with the exact test of differential expression. RESULTS: We evaluated the proposed approach using ten simulated and experimental datasets, and compared its performance with that of currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets sSeq performed favorably for experiments with small sample size in sensitivity, specificity and computational time. AVAILABILITY: The open source R-based software package sSeq is available at http://www.stat.purdue.edu/∼dyu/sSeq. CONTACT: Olga Vitek (email@example.com).Bioinformatics 04/2013; 29(10). DOI:10.1093/bioinformatics/btt143 · 4.62 Impact Factor