ABSTRACT: Microarray gene expression data often contain missing values. Accurate estimation of the missing values is important for downstream data analyses that require complete data. Nonlinear relationships between gene expression levels have not been well utilized in missing value imputation. We propose an imputation scheme based on nonlinear dependencies between genes. Using simulations based on real microarray data, we show that incorporating nonlinear relationships can improve the accuracy of missing value imputation, both in terms of normalized root-mean-squared error and in terms of preserving the list of significant genes in statistical testing. In addition, we study the impact of artificial dependencies introduced by data normalization on the simulation results. Our results suggest that methods relying on global correlation structures may yield overly optimistic simulation results when the data have been subjected to row-wise (gene-wise) mean removal.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 07/2011; 8(3):723-31.
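The abstract evaluates imputation by masking known values in simulation and scoring the estimates with a normalized root-mean-squared error (NRMSE). The sketch below illustrates that kind of evaluation loop on toy data. It rests on assumptions not stated in the abstract: the NRMSE is taken as the RMSE over the masked entries divided by the standard deviation of their true values (one common convention), and the imputer is a plain row-mean baseline, not the nonlinear scheme the paper proposes. The function names `nrmse` and `row_mean_impute` are hypothetical.

```python
import math


def nrmse(true_vals, imputed_vals):
    """RMSE over the masked entries, divided by the std. dev. of the true values.

    Assumed normalization; the abstract does not specify which one is used.
    """
    n = len(true_vals)
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var_true)


def row_mean_impute(matrix, masked):
    """Baseline imputer: replace each masked entry with the mean of the
    observed entries in the same row (gene). Assumes every row keeps at
    least one observed value. This is NOT the paper's nonlinear method.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        observed = [v for j, v in enumerate(row) if (i, j) not in masked]
        row_mean = sum(observed) / len(observed)
        for j in range(len(row)):
            if (i, j) in masked:
                filled[i][j] = row_mean
    return filled


# Toy "complete" expression matrix: rows are genes, columns are arrays.
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Positions (row, column) treated as missing in the simulation.
masked = {(0, 2), (1, 0)}

filled = row_mean_impute(toy, masked)
truth = [toy[i][j] for i, j in sorted(masked)]
guess = [filled[i][j] for i, j in sorted(masked)]
score = nrmse(truth, guess)
```

Under this convention, an imputer that always returns the mean of the true masked values scores exactly 1, so values below 1 indicate estimates that carry information beyond that constant guess.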