Correction of technical bias in clinical microarray data improves concordance with known biological information

Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, MA 02115, USA.
Genome Biology 02/2008; 9(2):R26. DOI:10.1186/gb-2008-9-2-r26

ABSTRACT: The performance of gene expression microarrays has been well characterized using controlled reference samples, but the performance on clinical samples remains less clear. We identified sources of technical bias affecting many genes in concert, thus causing spurious correlations in clinical data sets and false associations between genes and clinical variables. We developed a method to correct for technical bias in clinical microarray data, which increased concordance with known biological relationships in multiple data sets.
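
The abstract does not say how the correction is computed. As a minimal sketch of the general idea, assume each array has one numeric technical quality score (a hypothetical covariate, not one named in the abstract) and that the bias shifts every gene linearly with that score. Regressing each gene on the score and keeping the residuals removes the shared component, so a correlation driven only by the technical factor disappears. This is a generic regression adjustment, not necessarily the authors' method; `pearson`, `regress_out`, and `correct_matrix` are illustrative names.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regress_out(expr, covariate):
    """Residuals of an ordinary least-squares fit of expr on one covariate plus intercept."""
    mc, me = mean(covariate), mean(expr)
    slope = (sum((c - mc) * (e - me) for c, e in zip(covariate, expr))
             / sum((c - mc) ** 2 for c in covariate))
    intercept = me - slope * mc
    return [e - (intercept + slope * c) for c, e in zip(covariate, expr)]


def correct_matrix(matrix, covariate):
    """Regress the (hypothetical) per-array quality score out of every gene row."""
    return [regress_out(row, covariate) for row in matrix]


if __name__ == "__main__":
    quality = [0, 1, 2, 3, 4, 5]         # hypothetical quality score per array
    gene_a = [1, 1, 4, 6, 7, 11]         # biology [1,-1,0,0,-1,1] + 2*quality
    gene_b = [1, 3, 2, 4, 9, 11]         # biology [1,1,-2,-2,1,1] + 2*quality
    print(pearson(gene_a, gene_b))       # about 0.9, driven by the shared score
    corrected = correct_matrix([gene_a, gene_b], quality)
    print(pearson(*corrected))           # 0.0 after correction
```

In the toy data the two genes' biological parts were chosen to be uncorrelated, so the whole raw correlation comes from the shared score. Real data would usually involve several quality metrics entering a multiple regression rather than a single covariate.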

  • ABSTRACT: As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anti-cancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.
    Journal of Biomedical Informatics 12/2010; 43(6):932-44. DOI:10.1016/j.jbi.2010.07.001
  • ABSTRACT: The effects of dipyridamole and dipyridamole-induced ischemia on the traditional spectral parameters of heart rate variability (HRV) were investigated in normal subjects and coronary artery disease (CAD) patients, who underwent a dipyridamole echocardiography test (DET). The relevant spectral parameters (LF and HF powers, LF/HF ratio) were monitored on a beat-to-beat basis, and their variations were linked to the different test epochs and to the pathological events detected by echocardiographic and electrocardiographic changes. A recursive least squares (RLS) identification algorithm, which can track dynamic changes in nonstationary signals, was used for this purpose. Spectral parameters were obtained with a pole-tracking algorithm that efficiently extracts them on a beat-to-beat basis. The estimated parameters provide more information on the status of the autonomic nervous system (ANS) during drug infusion and on its correspondence with the induced ischemia.
    Computers in Cardiology 1994; 10/1994
  • ABSTRACT: Heritable diseases are caused by germ-line mutations that, despite their tissue-wide presence, often lead to tissue-specific pathology. Here, we systematically analyze the link between tissue-specific gene expression and pathological manifestations in many human diseases and cancers. Diseases were systematically mapped to the tissues they affect, using disease-relevant literature in PubMed, to create a disease-tissue covariation matrix of high-confidence associations of >1,000 diseases to 73 tissues. By retrieving >2,000 known disease genes and generating 1,500 disease-associated protein complexes, we analyzed the differential expression of a gene or complex involved in a particular disease in the tissues affected by the disease, compared with nonaffected tissues. When this analysis is scaled to all diseases in our dataset, there is a significant tendency for disease genes and complexes to be overexpressed in the normal tissues where defects cause pathology. In contrast, cancer genes and complexes were not overexpressed in the tissues from which the tumors emanate. We specifically identified a complex involved in XY sex reversal that is testis-specific and down-regulated in ovaries. We also identified complexes in Parkinson disease, cardiomyopathies, and muscular dystrophy syndromes that are similarly tissue specific. Our method represents a conceptual scaffold for organism-spanning analyses and reveals an extensive list of tissue-specific draft molecular pathways, both known and unexpected, that might be disrupted in disease.
    Proceedings of the National Academy of Sciences 12/2008; 105(52):20870-5. DOI:10.1073/pnas.0810772105
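
The Journal of Biomedical Informatics abstract analyzes the "differential expression" of ICA components without giving details. One way to picture this, assuming each component is a vector of gene weights, is to score every sample by the least-squares coefficient of its centered expression on that vector, then compare treated and control scores with Welch's t statistic. This is a hypothetical simplification: it scores one component at a time (fitting all components jointly would use the pseudo-inverse of the gene-by-component matrix) and does not perform the ICA step itself. `component_activity` and `welch_t` are illustrative names.

```python
from statistics import mean, variance


def component_activity(sample, weights):
    """Least-squares coefficient of a sample's centered expression on one component.

    sample and weights are aligned per gene; weights is a (hypothetical)
    gene-weight vector for a single ICA component.
    """
    m = mean(sample)
    centered = [x - m for x in sample]
    return (sum(c * w for c, w in zip(centered, weights))
            / sum(w * w for w in weights))


def welch_t(group1, group2):
    """Welch's t statistic for a difference in mean activity (unequal variances)."""
    se = (variance(group1) / len(group1) + variance(group2) / len(group2)) ** 0.5
    return (mean(group1) - mean(group2)) / se


if __name__ == "__main__":
    weights = [1.0, -1.0, 0.5, -0.5]  # hypothetical component over four genes
    treated = [[5 + k * w for w in weights] for k in (2.0, 2.5, 3.0)]
    control = [[5 + k * w for w in weights] for k in (0.0, 0.5, -0.5)]
    t = welch_t([component_activity(s, weights) for s in treated],
                [component_activity(s, weights) for s in control])
    print(t)  # large positive value: the component is more active after treatment
```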
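
The heart rate variability abstract combines recursive least squares (RLS) identification with pole tracking. A minimal sketch, assuming an autoregressive (AR) model of the RR-interval series and an exponential forgetting factor `lam` (the abstract names neither the model order nor the RLS variant), updates the AR coefficients once per sample and converts the complex pole pair of an AR(2) model into a frequency in Hz. Practical HRV analyses use higher model orders and read LF and HF power from several poles; `rls_ar` and `ar2_pole_frequency` are illustrative names.

```python
import math


def rls_ar(signal, order, lam=0.98, delta=100.0):
    """Track AR coefficients with exponentially weighted recursive least squares.

    Model (assumed): x[n] = a1*x[n-1] + ... + ap*x[n-p] + noise.
    lam is the forgetting factor (0 < lam <= 1); delta scales the initial
    inverse-correlation matrix P = delta * I.
    Returns the coefficient estimate after each update, from index `order` on.
    """
    theta = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    history = []
    for n in range(order, len(signal)):
        phi = [signal[n - k] for k in range(1, order + 1)]  # past samples
        P_phi = [sum(P[i][j] * phi[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(phi[i] * P_phi[i] for i in range(order))
        gain = [v / denom for v in P_phi]
        err = signal[n] - sum(t * p for t, p in zip(theta, phi))  # a priori error
        theta = [t + g * err for t, g in zip(theta, gain)]
        # P <- (P - gain * phi^T P) / lam; phi^T P equals P_phi^T since P is symmetric
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(order)]
             for i in range(order)]
        history.append(theta)
    return history


def ar2_pole_frequency(a1, a2, fs):
    """Frequency in Hz of the complex poles of 1 - a1 z^-1 - a2 z^-2, or None."""
    if a1 * a1 + 4 * a2 >= 0:
        return None  # real poles: no oscillatory pole pair
    radius = math.sqrt(-a2)  # pole modulus
    return math.acos(a1 / (2 * radius)) * fs / (2 * math.pi)


if __name__ == "__main__":
    fs = 4.0  # hypothetical sampling rate, Hz
    w = 2 * math.pi * 0.25 / fs  # 0.25 Hz oscillation
    x = [math.cos(w * n) for n in range(400)]
    a1, a2 = rls_ar(x, order=2)[-1]
    print(a1, a2, ar2_pole_frequency(a1, a2, fs))
```

For a noiseless 0.25 Hz cosine, the exact AR(2) coefficients are a1 = 2cos(π/8) and a2 = -1, so the tracked pole frequency should come out close to 0.25 Hz, inside the conventional HF band (0.15 to 0.4 Hz). A smaller `lam` tracks faster changes at the cost of noisier estimates.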
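
The PNAS abstract compares a disease gene's expression in the tissues the disease affects with its expression in unaffected tissues, then asks whether overexpression holds across all diseases. The abstract does not name the statistics, so this sketch substitutes simple ones: a difference of means per disease gene, and a one-sided sign test across diseases asking whether positive differences are more common than chance (p = 0.5). Tissue names, values, and function names are hypothetical.

```python
from math import comb
from statistics import mean


def affected_minus_unaffected(expression, affected):
    """Mean expression in affected tissues minus mean in all other tissues.

    expression: dict mapping tissue name -> expression of one disease gene
    affected:   set of tissues the disease is mapped to (both groups non-empty)
    """
    inside = [v for t, v in expression.items() if t in affected]
    outside = [v for t, v in expression.items() if t not in affected]
    return mean(inside) - mean(outside)


def sign_test_p(differences):
    """One-sided sign test: P(at least k positives among n nonzero values), p = 0.5."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


if __name__ == "__main__":
    heart_gene = {"heart": 8.0, "liver": 2.0, "brain": 4.0}  # hypothetical values
    print(affected_minus_unaffected(heart_gene, {"heart"}))  # 5.0
    print(sign_test_p([0.3] * 10))  # 1/1024, about 0.00098
```

Ties (zero differences) are dropped before the test, which is the usual convention for the sign test.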

