Determination of the minimum number of microarray experiments for discovery of gene expression patterns.

Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada.
BMC Bioinformatics (Impact Factor: 2.67). 02/2006; 7 Suppl 4(Suppl 4):S13. DOI: 10.1186/1471-2105-7-S4-S13
Source: PubMed

ABSTRACT One type of DNA microarray experiment is discovery of gene expression patterns for a cell line undergoing a biological process over a series of time points. Two important issues with such an experiment are the number of time points, and the interval between them. In the absence of biological knowledge regarding appropriate values, it is natural to question whether the behaviour of progressively generated data may by itself determine a threshold beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a threshold implies a minimum number of microarray experiments, which is important given the cost of these experiments.
We have developed a method for determining the minimum number of microarray experiments (i.e. time points) for temporal gene expression, assuming that the span between time points is given and the hierarchical clustering technique is used for gene expression pattern discovery. The key idea is a similarity measure for two clusterings, expressed as a function of the data for progressive time points. This function is evaluated while the experiments are underway. When the function reaches its maximum, the set of experiments has reached a saturated state, and further experiments do not contribute to the discrimination of patterns.
The method has been verified with two previously published gene expression datasets. For both experiments, the number of time points determined with our method is less than in the published experiments. It is noted that the overall approach is applicable to other clustering techniques.
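The stopping rule described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity measure or linkage is used, so the sketch substitutes the Rand index and a naive single-linkage clustering with Euclidean distance. The function names, the fixed cluster count `k`, and the toy `profiles` data are all hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (Rand index).

    Assumption: stands in for the paper's similarity measure, which the
    abstract does not specify.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def single_linkage(profiles, k):
    """Naive agglomerative clustering (single linkage, Euclidean) into k clusters."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b])) ** 0.5

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        # Merge the two clusters whose closest members are nearest to each other.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: min(
                dist(a, b) for a in clusters[pq[0]] for b in clusters[pq[1]]
            ),
        )
        clusters[p].extend(clusters[q])
        del clusters[q]

    labels = [0] * len(profiles)
    for cluster_id, members in enumerate(clusters):
        for gene in members:
            labels[gene] = cluster_id
    return labels


def similarity_curve(profiles, k, first=2):
    """Map t to the similarity between clusterings on the first t-1 and first t time points."""
    curve = {}
    previous = single_linkage([p[:first] for p in profiles], k)
    for t in range(first + 1, len(profiles[0]) + 1):
        current = single_linkage([p[:t] for p in profiles], k)
        curve[t] = rand_index(previous, current)
        previous = current
    return curve


def first_saturated(curve, max_value=1.0, tol=1e-9):
    """Earliest t at which the similarity reaches its maximum, or None if it never does."""
    for t in sorted(curve):
        if curve[t] >= max_value - tol:
            return t
    return None


# Hypothetical toy data: 4 genes (rows) measured at 5 time points (columns).
profiles = [
    [1.0, 1.2, 1.0, 1.1, 1.0],
    [1.1, 1.3, 3.0, 3.1, 3.0],
    [2.0, 2.1, 3.2, 3.0, 3.1],
    [2.1, 2.0, 1.1, 1.0, 1.1],
]

curve = similarity_curve(profiles, k=2)
for t in sorted(curve):
    print(t, round(curve[t], 3))
print("saturated at time point:", first_saturated(curve))
```

On this toy data the clustering changes when the third time point is added and stays the same afterwards, so the similarity first reaches its maximum at the fourth time point. In a real time course the similarity may dip again after a plateau, so a practical rule would likely require the maximum to hold over several consecutive time points before stopping.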

  • ABSTRACT: The advent of DNA microarray technology has enabled biologists to monitor the expression levels (mRNA) of thousands of genes simultaneously. In this survey, we address various approaches to gene expression data analysis using clustering techniques. We discuss the performance of various existing clustering algorithms under each of these approaches. Proximity measures play an important role in making a clustering technique effective. Therefore, we briefly discuss various proximity measures. Finally, since evaluation of the effectiveness of the clustering techniques over gene data requires validity measures and data sources for numeric data, we discuss them as well.
    Emerging Trends and Applications in Computer Science (NCETACS), 2011 2nd National Conference on; 01/2011
  • ABSTRACT: Over the past 20 years, Omics technologies emerged as the consensual denomination of holistic molecular profiling. These techniques enable parallel measurements of biological -omes, or "all constituents considered collectively", and utilize the latest advancements in transcriptomics, proteomics, metabolomics, imaging, and bioinformatics. The technological accomplishments in increasing the sensitivity and throughput of the analytical devices, the standardization of the protocols and the widespread availability of reagents made the capturing of static molecular portraits of biological systems a routine task. The next generation of time course molecular profiling already allows for extensive molecular snapshots to be taken along the trajectory of time evolution of the investigated biological systems. Such datasets provide the basis for application of the inverse scientific approach. It consists in the inference of scientific hypotheses and theories about the structure and dynamics of the investigated biological system without any a priori knowledge, solely relying on data analysis to unveil the underlying patterns. However, most temporal Omics data still contain a limited number of time points, taken over arbitrary time intervals, through measurements on biological processes shifted in time. The analysis of the resulting short and noisy time series data sets is a challenge. Traditional statistical methods for the study of static Omics datasets are of limited relevance and new methods are required. This chapter discusses such algorithms which enable the application of the inverse analysis approach to short Omics time series.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 719:153-72. DOI:10.1007/978-1-61779-027-0_7 · 1.29 Impact Factor
  • ABSTRACT: Gene expression microarrays have become an important exploratory tool in many screening experiments that aim to discover the genes that change expression in two or more biological conditions and can be used to build molecular profiles for both diagnostic and prognostic use. The still very high costs of microarrays and the difficulty in generating the biological samples are critical issues of microarray-based screening experiments, and the experimental design plays a crucial role in how informative an experiment is going to be. In this chapter, we describe some of the major issues related to the design of either randomized control trials or observational studies and discuss the choice of powerful sample sizes, the selection of informative experimental conditions, and experimental strategies that can minimize confounding. We conclude with a discussion of some of the open problems in the design and analysis of microarray experiments that need further research.
    12/2010: pages 271-290;

