The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis

BMC Bioinformatics (Impact Factor: 2.58). 04/2013; 14(1):124. DOI: 10.1186/1471-2105-14-124
Source: PubMed


Quantile and rank normalizations are two widely used pre-processing techniques designed to remove technological noise present in genomic data. Subsequent statistical analysis, such as gene differential expression analysis, is usually based on normalized expressions. In this study, we find that these normalization procedures can have a profound impact on differential expression analysis, especially in terms of testing power.
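
As a rough illustration of the two procedures named above, the following Python sketch applies them to a genes-by-arrays matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `quantile_normalize` and `rank_normalize` are hypothetical, ties are broken by position rather than averaged, the reference array is the mean of the sorted arrays over all columns, and the rank version returns fractional ranks (rank divided by the number of genes).

```python
import numpy as np


def quantile_normalize(x):
    """Give every array (column) the same empirical distribution.

    Assumption: the reference array is the row-wise mean of the
    column-sorted matrix, and ties are broken by position.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")   # row indices in sorted order, per column
    reference = np.sort(x, axis=0).mean(axis=1)    # r-th element = mean of r-th smallest values
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # The gene holding rank r in array j receives the r-th reference value.
        out[order[:, j], j] = reference
    return out


def rank_normalize(x):
    """Replace each value by its fractional rank within its array (column)."""
    x = np.asarray(x, dtype=float)
    # argsort of argsort yields 0-based ranks; shift to 1-based.
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0) + 1
    return ranks / x.shape[0]


if __name__ == "__main__":
    # Three genes (rows) measured on two arrays (columns); illustrative numbers only.
    demo = np.array([[2.0, 10.0],
                     [1.0, 30.0],
                     [3.0, 20.0]])
    print(quantile_normalize(demo))
    print(rank_normalize(demo))
```

After either transformation every column contains the same set of values, so differences between arrays in overall scale or location are removed by construction.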

We conduct theoretical derivations to show that, for a fixed sample size, the testing power of differential expression analysis based on quantile or rank normalized gene expressions can never reach 100%, no matter how strong the gene differentiation effects are. We perform extensive simulation analyses and find that the results corroborate the theoretical predictions.
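
One intuition consistent with this claim can be sketched numerically: rank-normalized values are bounded, so once a gene is the largest value in every array of one group, increasing its effect further no longer changes the normalized data or the test statistic. The Python sketch below is only an illustration under assumptions, not the derivation or simulation design used in the paper: the Welch t-statistic, the Gaussian noise model, the matrix sizes, and the names `welch_t` and `t_for_effect` are all choices made here for the example.

```python
import numpy as np


def rank_normalize(x):
    """Fractional rank of each value within its array (column); ties by position."""
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0) + 1
    return ranks / x.shape[0]


def welch_t(a, b):
    """Welch two-sample t-statistic for group b versus group a."""
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    return (b.mean() - a.mean()) / np.sqrt(va + vb)


def t_for_effect(delta, m=200, n=5, seed=0):
    """Return (raw, rank-normalized) t-statistics for gene 0 with effect `delta`.

    Assumed setup: m genes, n arrays per group, standard normal noise.
    The same seed gives the same noise for every delta.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(m, 2 * n))   # columns 0..n-1: group A, n..2n-1: group B
    x[0, n:] += delta                 # gene 0 is up-regulated in group B
    raw = welch_t(x[0, :n], x[0, n:])
    r = rank_normalize(x)
    ranked = welch_t(r[0, :n], r[0, n:])
    return raw, ranked


if __name__ == "__main__":
    for delta in (100.0, 1000.0):
        raw, ranked = t_for_effect(delta)
        print(f"delta={delta:g}: raw t={raw:.2f}, rank-normalized t={ranked:.2f}")
```

In this setup the statistic on the raw data keeps growing with the effect size, while the statistic on the rank-normalized data is the same for both effect sizes, because gene 0 already holds the top rank in every group-B array.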

Our finding may explain why genes with well-documented strong differentiation are not always detected in microarray analysis. It provides new insights into microarray experimental design and will help practitioners select proper normalization procedures.



Available from: Xing Qiu, Mar 10, 2014
  • Source
    • ",j denote the ordered gene expression observations in the jth array (j = 1,2,…,n) of the cth (c = A,B) group, the rth (r = 1,2,…,m) element of this reference array is as follows [3]. "
    ABSTRACT: The role of Mithramycin as an anticancer drug has been well studied. Sarcoma is a type of cancer arising from cells of mesenchymal origin. Although the incidence of sarcoma is low, it is vital to understand the role of Mithramycin in controlling tumor progression in sarcoma. In this article, we analyze the global gene expression profile changes induced by Mithramycin in two different sarcoma lines using whole-genome gene expression profiling microarray data. We find that the primary mode of action of Mithramycin is global repression of key cellular processes and gene families such as phosphoproteins, kinases, alternative splicing, regulation of transcription, DNA binding, regulation of histone acetylation, negative regulation of gene expression, chromosome organization or chromatin assembly, and cytoskeleton.
    Genomics Data 11/2014; 3. DOI:10.1016/j.gdata.2014.11.001
  • Source
    • "Normalization of the microRNA RT-qPCR data was performed with rank normalisation by calculation of fractional rank [23]. First, the Cp value of each individual microRNA species was ranked within each sample. "
    ABSTRACT: Fibromyalgia (FM) is characterized by chronic pain and reduced pain threshold. The pathophysiology involves disturbed neuroendocrine function, including impaired function of the growth hormone/insulin-like growth factor-1 axis. Recently, microRNAs have been shown to be important regulatory factors in a number of diseases. The aim of this study was to identify cerebrospinal microRNAs with expression specific for FM and to determine their correlation to pain and fatigue. The genome-wide profile of microRNAs in cerebrospinal fluid was assessed in ten women with FM and eight healthy controls using real-time quantitative PCR. Pain thresholds were examined by algometry. Levels of pain (FIQ pain) were rated on a 0-100 mm scale (fibromyalgia impact questionnaire, FIQ). Levels of fatigue (FIQ fatigue) were rated on a 0-100 mm scale using FIQ and by multidimensional fatigue inventory (MFI-20) general fatigue (MFIGF). Expression levels of nine microRNAs were significantly lower in patients with FM compared to healthy controls. The microRNAs identified were miR-21-5p, miR-145-5p, miR-29a-3p, miR-99b-5p, miR-125b-5p, miR-23a-3p, miR-23b-3p, miR-195-5p, and miR-223-3p. The identified microRNAs with significantly lower expression in FM were assessed with regard to pain and fatigue. miR-145-5p correlated positively with FIQ pain (r=0.709, p=0.022, n=10) and with FIQ fatigue (r=0.687, p=0.028, n=10). To our knowledge, this is the first study to show a disease-specific pattern of cerebrospinal microRNAs in FM. We have identified nine microRNAs in cerebrospinal fluid that differed between FM patients and healthy controls. One of the identified microRNAs, miR-145, was associated with the cardinal symptoms of FM, pain and fatigue.
    PLoS ONE 10/2013; 8(10):e78762. DOI:10.1371/journal.pone.0078762 · 3.23 Impact Factor
  • Source
    ABSTRACT: Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling expression levels for thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts, and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two novel and mutually independent metrics: (1) the number of "uniform" genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) low Spearman correlation between normalized expression profiles of gene pairs. We also defined four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The two most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain 'housekeeping' genes) yielded particularly poor results, surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of "ubiquitous" genes: genes expressed in all the samples studied, but never at very high or very low levels. We demonstrate that these include a "core" of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, which is a prerequisite for the identification of differentially expressed and tissue-specific genes as potential biomarkers.
    PLoS ONE 11/2013; 8(11):e77885. DOI:10.1371/journal.pone.0077885 · 3.23 Impact Factor