-
04/2008: pages 301 - 319; , ISBN: 9780470397428
-
ABSTRACT: We propose a new strategy to analyse the periodicity of gene expression profiles using Singular Spectrum Analysis (SSA) and Autoregressive (AR) model based spectral estimation. By combining the advantages of SSA and AR modelling, more periodic genes are extracted in the Plasmodium falciparum data set, compared with the classical Fourier analysis technique. We are able to identify more gene targets for new drug discovery, and by checking against the seven well-known malaria vaccine candidates, we have found five additional genes that warrant further biological verification.
International Journal of Bioinformatics Research and Applications 01/2008; 4(3):337-49.
-
ABSTRACT: Eukaryotic promoter prediction is one of the most important problems in DNA sequence analysis, and also one of the most difficult. Although a number of algorithms have been proposed, their performance is still limited by low sensitivity and high false-positive rates. We present a method for improving the performance of promoter region prediction. We focus on selecting the most effective features for different functional regions in DNA sequences. Our feature selection algorithm is based on relative entropy (Kullback-Leibler divergence), and we develop a system that combines it with position-specific information for promoter region prediction. Tests on large genomic sequences and comparisons with PromoterInspector and Dragon Promoter Finder show that our algorithm is efficient and achieves higher sensitivity and specificity in predicting promoter regions.
Physical Review E 05/2007; 75(4 Pt 1):041908. · 2.26 Impact Factor
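The relative-entropy feature selection mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of k-mer frequencies as features, the add-one smoothing, and the function names (`kmer_distribution`, `kl_contributions`, `top_features`) are all assumptions made for illustration. The sketch ranks k-mers by their contribution to the Kullback-Leibler divergence D(P || Q) between a promoter set P and a background set Q.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(sequences, k):
    """Return add-one-smoothed k-mer probabilities over the alphabet ACGT."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous bases
                counts[kmer] += 1
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values()) + len(all_kmers)  # smoothing (an assumption)
    return {kmer: (counts[kmer] + 1) / total for kmer in all_kmers}


def kl_contributions(p, q):
    """Per-k-mer terms p * log2(p / q); their sum is D(P || Q) in bits."""
    return {kmer: p[kmer] * math.log2(p[kmer] / q[kmer]) for kmer in p}


def top_features(promoters, background, k=2, n=3):
    """Return the n k-mers contributing most to D(P || Q), and D(P || Q)."""
    p = kmer_distribution(promoters, k)
    q = kmer_distribution(background, k)
    terms = kl_contributions(p, q)
    ranked = sorted(terms, key=terms.get, reverse=True)
    return ranked[:n], sum(terms.values())
```

Calling `top_features(promoter_seqs, background_seqs, k=2, n=2)` on CG-rich promoter sequences and AT-rich background sequences would rank the CG-type dimers first, since those carry the largest positive divergence terms.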
-
ABSTRACT: Periodogram analysis of time series is widespread in biology. A new challenge in analyzing microarray time-series data is to identify genes that are periodically expressed. This challenge arises because the observed time series usually exhibit non-idealities, such as noise, short length, and unevenly sampled time points. Most methods used in the literature operate on evenly sampled time series and are not suitable for unevenly sampled ones.
For evenly sampled data, methods based on the classical Fourier periodogram are often used to detect periodically expressed genes. Recently, the Lomb-Scargle algorithm has been applied to unevenly sampled gene expression data for spectral estimation. However, since the Lomb-Scargle method assumes a single stationary sinusoidal wave with infinite support, it introduces spurious periodic components in the periodogram for data of finite length. In this paper, we propose a new spectral estimation algorithm for unevenly sampled gene expression data. The new method is based on signal reconstruction in a shift-invariant signal space, where a direct spectral estimation procedure is developed using the B-spline basis. Experiments on simulated noisy gene expression profiles show that our algorithm is superior to the Lomb-Scargle algorithm and the classical Fourier periodogram based method in detecting periodically expressed genes. We have applied our algorithm to Plasmodium falciparum and yeast gene expression data, and the results show that the algorithm is able to detect biologically meaningful periodically expressed genes.
We have proposed an effective method for identifying periodic genes in unevenly sampled microarray time-series gene expression data. The method can also be used as an effective tool for gene expression time-series interpolation or resampling.
BMC Bioinformatics 02/2007; 8:137. · 2.75 Impact Factor
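For context, the Lomb-Scargle baseline that this abstract compares against can be sketched in a few lines. This is the standard normalized Lomb-Scargle periodogram, not the B-spline method the paper proposes; the function names `lomb_scargle` and `dominant_period` and the idea of scoring a fixed list of candidate periods are assumptions made for illustration.

```python
import math


def lomb_scargle(times, values, freqs):
    """Normalized Lomb-Scargle power at each frequency (cycles per time unit).

    Works on unevenly sampled data; frequencies must be positive.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    y = [v - mean for v in values]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, cos_terms))
        ys = sum(a * b for a, b in zip(y, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power


def dominant_period(times, values, periods):
    """Return the candidate period with the highest Lomb-Scargle power."""
    power = lomb_scargle(times, values, [1.0 / p for p in periods])
    best = max(range(len(periods)), key=power.__getitem__)
    return periods[best]
```

A typical use would be to pass the sampling times of one expression profile, its expression values, and a list of candidate periods, then flag the gene as periodic only if the peak power exceeds a significance threshold chosen separately.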
-
BMC Bioinformatics. 01/2007;
-
Proceedings of 5th Asia-Pacific Bioinformatics Conference, APBC 2007, 15-17 January 2007, Hong Kong, China; 01/2007
-
BMC Bioinformatics. 01/2007; 8.
-
ABSTRACT: MOTIVATION: Promoter prediction is important for the analysis of gene regulation. Although a number of promoter prediction algorithms have been reported in the literature, significant improvement in prediction accuracy remains a challenge. In this paper, we propose an effective promoter identification algorithm called PromoterExplorer. In our approach, we analyze the different roles of various features, namely the local distribution of pentamers, positional CpG island features and the digitized DNA sequence, and then combine them to build a high-dimensional input vector. A cascade AdaBoost-based learning procedure is adopted to select the most 'informative' or 'discriminating' features to build a sequence of weak classifiers, which are combined to form a strong classifier so as to achieve better performance. The cascade structure used for identification can also reduce false positives. RESULTS: PromoterExplorer is tested on large-scale DNA sequences from different databases, including the EPD, DBTSS, GenBank and human chromosome 22. Experimental results show that consistent and promising performance can be achieved.
Bioinformatics 12/2006; 22(22):2722-8. · 5.47 Impact Factor
-
ABSTRACT: Cluster analysis of gene expression data from a cDNA microarray is useful for identifying biologically relevant groups of genes. However, finding the natural clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. In order to estimate the number of distinct clusters in the data, we propose a cluster splitting and merging strategy. We have applied the new algorithm to simulated gene expression data for which the correct distribution of genes over clusters is known a priori. The results show that the proposed algorithm can find natural clusters and give the correct number of clusters. The algorithm has also been tested on real gene expression changes during yeast cell cycle, for which the fundamental patterns of gene expression and assignment of genes to clusters are well understood from numerous previous studies. Comparative studies with several clustering algorithms illustrate the effectiveness of our method.
IEEE Transactions on Information Technology in Biomedicine 04/2004; 8(1):5-15. · 1.68 Impact Factor
-
IEEE Transactions on Information Technology in Biomedicine 04/2004; · 1.68 Impact Factor
-
ABSTRACT: Cluster analysis of gene expression data is useful for identifying biologically relevant groups of genes. However, finding the correct clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. In order to estimate the number of distinct clusters in the data, an over-clustering and merging strategy is proposed. For validation, we applied the new algorithm to both simulated gene expression data and real gene expression data (expression changes during yeast cell cycle). The results clearly indicate the effectiveness of our method.
Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia; 01/2004
-
First Asia-Pacific Bioinformatics Conference (APBC 2003), February 4-7, 2003, Adelaide, Australia; 01/2003
-
IEEE Trans. Circuits Syst. Video Techn. 01/2001; 11:1193-1198.
-
ABSTRACT: Image segmentation is arguably the most important step in microarray image analysis. In this work, we propose a new template-based segmentation method for DNA microarray images. Unlike the local segmentation techniques adopted by all the available analysis software, our algorithm segments images from a global point of view. Using the mean shift filtering technique, we first segment the image into homogeneous regions in which the spots appear as distinct local-maximum regions. An initial spot segmentation template is then extracted by morphological H-reconstruction. Finally, a refined spot segmentation template is obtained by histogram analysis. Experimental results show that our algorithm is robust and obtains accurate spot segmentation results. In particular, compared with all the available algorithms, our template-based spot segmentation scheme not only facilitates the downstream intensity extraction step but also helps improve the accuracy of intensity extraction.
01/1970: pages 41-50;
-
ABSTRACT: In this paper, we propose a new feature extraction method and clustering scheme in spectral space for gene expression data. We model each member of a cluster as the sum of the cluster's representative term and an experimental artifact term. More compact clusters, and hence better clustering results, can be obtained by extracting essential features or reducing experimental artifacts. Exploiting the periodicity of gene expression profile data, feature extraction is performed in the DCT domain by soft-thresholding denoising. The clustering process is based on the OPTOC competitive learning strategy. Results on real gene expression profiles show that our method performs better than clustering directly in the original space.
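The DCT-domain soft-thresholding step named in the abstract above can be sketched as follows. This is a generic illustration, not the authors' code: the orthonormal DCT-II convention, the function names (`dct`, `idct`, `soft_threshold`, `denoise`), and leaving the threshold as a caller-supplied parameter are all assumptions, since the abstract does not state how the threshold is chosen.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above (a scaled DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in coeffs]


def denoise(profile, threshold):
    """Denoise one expression profile by soft-thresholding its DCT coefficients."""
    return idct(soft_threshold(dct(profile), threshold))
```

In a clustering pipeline of the kind the abstract describes, the thresholded coefficients returned by `soft_threshold(dct(profile), threshold)` could themselves serve as the feature vector, so the inverse transform is only needed when the denoised profile is wanted in the original time domain.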
-
ABSTRACT: Spectral analysis of DNA microarray gene expression time-series data is important for understanding the regulation of gene expression and gene function of Plasmodium falciparum in the intraerythrocytic developmental cycle. In this paper, we propose a new strategy to analyze the cell cycle regulation of gene expression profiles based on the combination of singular spectrum analysis (SSA) and autoregressive (AR) spectral estimation. Using SSA, we extract the dominant trend of the data and reduce the effect of noise. Based on the AR analysis, high-resolution spectra can be produced. Experimental results show that our method can extract more periodic genes, and the information can be useful for new drug design.
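The AR spectral estimation half of this strategy can be sketched under stated assumptions. The abstract does not say which AR fitting method is used; the sketch below assumes the Yule-Walker equations solved by the Levinson-Durbin recursion, a unit sampling interval, and a caller-chosen model order. The SSA preprocessing step is omitted, and the function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [
        sum(d[i] * d[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err) where a[0] = 1 and the model is
    x[t] + a[1] x[t-1] + ... + a[order] x[t-order] = e[t],
    and err is the estimated variance of e[t].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + reflection * a[m - k]
        new_a[m] = reflection
        a = new_a
        err *= 1.0 - reflection * reflection
    return a, err


def ar_spectrum(x, order, freqs):
    """AR power spectrum at the given frequencies (cycles per sample)."""
    a, err = levinson_durbin(autocorrelation(x, order), order)
    spec = []
    for f in freqs:
        re = sum(a[k] * math.cos(2.0 * math.pi * f * k) for k in range(order + 1))
        im = -sum(a[k] * math.sin(2.0 * math.pi * f * k) for k in range(order + 1))
        spec.append(err / (re * re + im * im))
    return spec
```

Because the spectrum is evaluated from a fitted model rather than from the raw samples, it can be computed on an arbitrarily fine frequency grid, which is the sense in which AR estimates are usually called high resolution for short series.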