Statistical methods of background correction for Illumina BeadArray data

Division of Biostatistics, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, USA.
Bioinformatics (Impact Factor: 4.62). 03/2009; 25(6):751-7. DOI: 10.1093/bioinformatics/btp040
Source: PubMed

ABSTRACT: Advances in technology have made many microarray platforms available. Among them, Illumina BeadArrays are relatively new and have captured a significant market share. BeadArray technology generates high-quality data from low sample input at reduced cost. However, analysis methods for Illumina BeadArrays lag far behind those for Affymetrix oligonucleotide arrays and need improvement.
In this article, we consider the problem of background correction for BeadArray data. One distinct feature of BeadArrays is that on each array, background noise is measured by over 1000 negative control bead types conjugated with non-specific oligonucleotide sequences. We extend the robust multi-array analysis (RMA) background correction model to incorporate the information from these negative control beads, and consider three commonly used approaches for parameter estimation: non-parametric estimation, maximum likelihood estimation (MLE), and Bayesian estimation. The proposed approaches, as well as existing background correction methods, are compared through simulation studies and a data example. We find the maximum likelihood and Bayes methods to be the most promising.
Supplementary data are available at Bioinformatics online.
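The RMA-style convolution model extended above (observed intensity X = S + B, with signal S exponentially distributed and background B normally distributed) yields a closed-form background-corrected value E[S | X = x]. A minimal Python sketch of that standard formula follows; the function name and example parameter values are illustrative assumptions, and in practice mu, sigma, and the exponential rate alpha would be estimated, e.g. from the negative control beads:

```python
import numpy as np
from scipy.stats import norm

def rma_background_correct(x, mu, sigma, alpha):
    """Return E[S | X = x] under the convolution model X = S + B,
    where signal S ~ Exponential(rate=alpha) and background
    B ~ Normal(mu, sigma^2). The result is strictly positive, so
    log-transforming corrected intensities is always safe."""
    x = np.asarray(x, dtype=float)
    a = x - mu - sigma**2 * alpha
    b = sigma
    num = norm.pdf(a / b) - norm.pdf((x - a) / b)
    den = norm.cdf(a / b) + norm.cdf((x - a) / b) - 1.0
    return a + b * num / den
```

For example, with a background of mean 100 and SD 20 and a mean signal of 500 (`alpha = 1/500`), an observed intensity of 80 (below the background mean) still maps to a small positive corrected value rather than a negative one, unlike simple background subtraction.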

Available from: Michael D Story, Jul 28, 2015
    • "Compared with similar models previously proposed (Bolstad et al., 2003; Xie et al., 2009), our model takes into account the contribution of noise peaks by the second term in Eqn (6) and the influence of noise on the detection of small signal peaks by the error function in Eqn (7). "
    ABSTRACT: Simultaneous recordings of multiple neuron activities with multi-channel extracellular electrodes are widely used for studying information processing by the brain's neural circuits. In this method, the recorded signals containing the spike events of a number of adjacent or distant neurons must be correctly sorted into spike trains of individual neurons, and a variety of methods have been proposed for this spike sorting. However, spike sorting is computationally difficult because the recorded signals are often contaminated by biological noise. Here, we propose a novel method for spike detection, which is the first stage of spike sorting and hence crucially determines overall sorting performance. Our method utilizes a model of extracellular recording data that takes into account variations in spike waveforms, such as the widths and amplitudes of spikes, by detecting the peaks of band-pass-filtered data. We show that the new method significantly improves the cost-performance of multi-channel electrode recordings by increasing the number of cleanly sorted neurons.
    European Journal of Neuroscience 06/2014; 39(11):1943-1950. DOI:10.1111/ejn.12614 · 3.67 Impact Factor
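The detection stage described in the abstract above — peak-picking on band-pass-filtered data against a noise-scaled threshold — can be sketched as follows. This is a generic baseline illustration, not the authors' model: the function name, the 300–3000 Hz band, and the MAD-based noise estimate are assumptions, and the sketch omits the waveform-variation and noise-peak terms that the quoted model adds.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_spikes(trace, fs, low=300.0, high=3000.0, thresh_sd=4.0):
    """Band-pass filter an extracellular trace, then detect negative-going
    spike peaks exceeding thresh_sd times a robust noise SD estimated as
    median absolute deviation / 0.6745."""
    b, a = butter(3, [low / (fs / 2.0), high / (fs / 2.0)], btype="band")
    filtered = filtfilt(b, a, trace)  # zero-phase, so peak times are not shifted
    noise_sd = np.median(np.abs(filtered)) / 0.6745
    peaks, _ = find_peaks(-filtered, height=thresh_sd * noise_sd)
    return peaks, filtered
```

Using the MAD rather than the sample SD keeps the noise estimate from being inflated by the spikes themselves, which is why threshold detectors of this family scale the threshold that way.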
    • "Other more sophisticated techniques use explicit probability models for this de-convolution. A model with normally-distributed background variation and exponentially distributed expression levels has proven to be the most popular in this field (McGee and Chen (2006), Xie, Wang and Story (2009)). "
    ABSTRACT: Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. However, it is rife with subtleties and pitfalls. Decisions made in preprocessing constrain all later analyses and are typically irreversible. Hence, data analysis becomes a collaborative endeavor by all parties involved in data collection, preprocessing and curation, and downstream inference. Even if each party has done its best given the information and resources available to them, the final result may still fall short of the best possible in the traditional single-phase inference framework. This is particularly relevant as we enter the era of "big data". The technologies driving this data explosion are subject to complex new forms of measurement error. Simultaneously, we are accumulating increasingly massive databases of scientific analyses. As a result, preprocessing has become more vital (and potentially more dangerous) than ever before.
    Bernoulli 09/2013; 19(4). DOI:10.3150/13-BEJSP16 · 1.30 Impact Factor
    • "We note that the latter two distributions are popular within the context of biological statistics and informatics (e.g. Irizarry et al. (2003), Xie et al. (2009), Plancade et al. (2011)). Clearly, the limiting case α, β → ∞ of the normal Laplace is a normal distribution. "
    ABSTRACT: Equations of squared skewness and kurtosis as well as sharp inequalities between these quantities are derived for the normal bilateral gamma (NBG) convolution and the important normal variance gamma (NVG) sub-family. Application to portfolio selection with CARA utility is considered. With the NVG as test return distribution, it is analyzed whether a recent approximate ranking function with cubic mean-variance-skewness-kurtosis trade-off should be preferred to the original Gaussian ranking function with linear mean-variance trade-off or not. Based on an appropriate ranking efficiency measure and a simulation study, one notes, up to some exceptional cases, a systematic efficiency increase of the approximate ranking versus the Gaussian ranking. An empirical data analysis for eight different sets of returns from the Swiss Market and the Standard & Poors 500 stock indices, fitted to the NVG with the moment method, confirms the results from the simulation study. For this, full analytical solutions to the moment equations of the variance gamma and the normal variance gamma turn out to be very useful.