-
ABSTRACT: Heart rate variability (HRV) is an important dynamical variable of the cardiovascular function. There have been numerous efforts to determine whether HRV dynamics are chaotic or random, and whether certain complexity measures are capable of distinguishing healthy subjects from patients with certain cardiac disease. In this study, we employ a new multiscale complexity measure, the scale-dependent Lyapunov exponent (SDLE), to characterize the relative importance of nonlinear, chaotic, and stochastic dynamics in HRV of healthy, congestive heart failure (CHF), and atrial fibrillation subjects. We show that while HRV data of all these three types are mostly stochastic, the stochasticity is different among the three groups. Furthermore, we show that for the purpose of distinguishing healthy subjects from patients with CHF, features derived from SDLE are more effective than other complexity measures such as the Hurst parameter, the sample entropy, and the multiscale entropy.
Annals of biomedical engineering 12/2009; 38(3):854-64. · 2.41 Impact Factor
-
ABSTRACT: Heart rate variability (HRV) time series is highly nonlinear and nonstationary. To effectively characterize its complexity, we employ a newly developed multiscale complexity measure, the scale-dependent Lyapunov exponent (SDLE). We derive two readily computable features from the SDLE and show that they can readily distinguish healthy subjects from patients with congestive heart failure (CHF). The same task is evaluated using other complexity measures, including the Hurst parameter, the sample entropy, and the multiscale entropy. It is shown that for the purpose of distinguishing healthy subjects from patients with CHF, the features derived from the SDLE are much more effective than the Hurst parameter, the sample entropy, and the multiscale entropy.
Bioinformatics and Biomedical Engineering, 2008. ICBBE 2008. The 2nd International Conference on; 06/2008
-
ABSTRACT: Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of noise to develop novel analysis methods that are capable of accurately detecting the aberration regions and the boundary break points simultaneously? By analyzing bacterial artificial chromosome (BAC) arrays with an average resolution of 1 Mb, 19 k oligo arrays with an average probe spacing of <100 kb, and 385 k oligo arrays with an average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than in the Gaussian noise case. We further develop a novel method, which optimally exploits the character of the noise and is capable of identifying both aberration regions and boundary break points very accurately. Finally, we propose a new concept, the posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to a detected aberration region and its boundaries.
Nucleic Acids Research 01/2007; 35(5):e35. · 8.03 Impact Factor
-
ABSTRACT: The completion of the human genome and genomes of many other organisms calls for the development of faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Such tools are even more important for sequencing uncompleted genomes of many other organisms, such as floro- and neuro- genomes. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index can also be derived from the recurrence time statistics, which has two salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected expressed sequence tag belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires only approximately 6 · N bytes of memory and a computational time of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge about the consensus sequence of those features, and therefore enables us to carry out analysis of genomes on the whole genomic scale.
12/2006: pages 321-337;
-
ABSTRACT: Biological time series are often highly nonlinear and nonstationary. To effectively characterize the complexity of biological signals, we propose a new multiscale analysis method. It has the distinguishing feature of scale isolation, and thus can readily deal with nonstationarity in biological signals. By analyzing a number of heart rate variability data sets, we show that the method can accurately distinguish between healthy subjects and patients with congestive heart failure. Furthermore, our analysis suggests that the dimension of the dynamics of the cardiovascular system is lower under healthy than under diseased conditions. This is compatible with the observation that a healthy cardiovascular system is a tightly coupled system with coherent functions, while components in a malfunctioning cardiovascular system are somewhat loosely coupled and function incoherently. Therefore, if cardiovascular dynamics could be deterministically chaos-like, it would be more likely to be detected in healthy subjects.
Life Science Systems and Applications Workshop, 2006. IEEE/NLM; 08/2006
-
ABSTRACT: Due to the ubiquity of time series with long-range correlation in many areas of science and engineering, analysis and modeling of such data is an important problem. While the field seems to be mature, three major issues have not been satisfactorily resolved. (i) Many methods have been proposed to assess long-range correlation in time series. Under what circumstances do they yield consistent results? (ii) The mathematical theory of long-range correlation concerns the behavior of the correlation of the time series for very large times. A measured time series is finite, however. How can we relate the fractal scaling break at a specific time scale to important parameters of the data? (iii) An important technique in assessing long-range correlation in a time series is to construct a random walk process from the data, under the assumption that the data are like a stationary noise process. Due to the difficulty in determining whether a time series is stationary or not, however, one cannot be 100% sure whether the data should be treated as a noise or a random walk process. Is there any penalty if the data are interpreted as a noise process while in fact they are a random walk process, and vice versa? In this paper, we seek to gain important insights into these issues by examining three model systems, the autoregressive process of order 1, on-off intermittency, and Lévy motions, and considering an important engineering problem, target detection within sea-clutter radar returns. We also provide a few rules of thumb to safeguard against misinterpretations of long-range correlation in a time series, and discuss relevance of this study to pattern recognition.
Physical Review E 02/2006; 73(1 Pt 2):016117. · 2.26 Impact Factor
-
ABSTRACT: In time series analysis, it has been considered of key importance to determine whether a complex time series measured from a system is regular, deterministically chaotic, or random. Recently, Gottwald and Melbourne have proposed an interesting test for chaos in deterministic systems. Their analyses suggest that the test may be universally applicable to any deterministic dynamical system. In order to fruitfully apply their test to complex experimental data, it is important to understand the mechanism by which the test works, and how it behaves when it is employed to analyze various types of data, including those not from clean deterministic systems. We find that the essence of their test can be described as first constructing a random-walk-like process from the data, then examining how the variance of the random walk scales with time. By applying the test to three sets of data, corresponding to (i) 1/f^α noise with long-range correlations, (ii) edge of chaos, and (iii) weak chaos, we show that the test misclassifies (i) both deterministic and weakly stochastic edge of chaos and weak chaos as regular motions, and (ii) strongly stochastic edge of chaos and weak chaos, as well as 1/f^α noise, as deterministic chaos. Our results suggest that, while the test may be effective in discriminating regular motion from fully developed deterministic chaos, it is not useful for exploratory purposes, especially for the analysis of experimental data with little a priori knowledge. A few speculative comments on the future of multiscale nonlinear time series analysis are made.
Physical Review E 12/2005; 72(5 Pt 2):056207. · 2.26 Impact Factor
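The mechanism named in the abstract above (build a random-walk-like process from the data, then examine how its variance scales with time) can be sketched as follows. This is a simplified illustration only, not the Gottwald and Melbourne test itself: the function name `random_walk_scaling_exponent`, the choice of lags, and the use of a plain cumulative sum are all assumptions made for the example.

```python
import numpy as np

def random_walk_scaling_exponent(data, lags=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate how the variance of a random walk built from `data` grows with lag.

    Hypothetical helper: the walk is the cumulative sum of the mean-removed
    data, and the exponent is the log-log slope of mean-square displacement
    against lag.
    """
    x = np.asarray(data, dtype=float)
    walk = np.cumsum(x - x.mean())  # random-walk-like process
    # Mean-square displacement of the walk at each lag
    msd = [np.mean((walk[lag:] - walk[:-lag]) ** 2) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

# Example input: uncorrelated Gaussian noise with a fixed seed
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(5000)
exponent = random_walk_scaling_exponent(white_noise)
```

For uncorrelated noise the walk is an ordinary diffusion, so the estimated exponent should be close to 1; data with long-range correlations would be expected to give a different slope.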
-
ABSTRACT: With the completion of the human and a few model organisms' genomes, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6 · N bytes of memory and a computational time of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole genomic scale on a PC.
Journal of Bioinformatics and Computational Biology 07/2005; 3(3):677-96.
-
ABSTRACT: Most codon indices used today are based on highly biased nonrandom usage of codons in coding regions. The background of a coding or noncoding DNA sequence, however, is fairly random, and can be characterized as a random fractal. When a gene-finding algorithm incorporates multiple sources of information about coding regions, it becomes more successful. It is thus highly desirable to develop new and efficient codon indices by simultaneously characterizing the fractal and periodic features of a DNA sequence. In this paper, we describe a novel way of achieving this goal. The efficiency of the new codon index is evaluated by studying all of the 16 yeast chromosomes. In particular, we show that the method automatically and correctly identifies which of the three reading frames is the one that contains a gene.
Journal of Biomedicine and Biotechnology 07/2005; 2005(2):139-46. · 2.44 Impact Factor
-
ABSTRACT: Entropy and recurrence times are two of the most important complexity measures for both random fields and nonlinear dynamical systems. We report a fundamental relation between recurrence time distribution and Rényi entropy of arbitrary integer order for both ergodic random fields and ergodic nonlinear dynamical systems, thus providing an elegant and comprehensive characterization for these two important systems. The fundamental relation is obtained by parameterizing the dynamics in the state space by a discrete symbol sequence and collectively considering recurrences to all non-empty sub-regions of the state space. Event detection using recurrence time statistics is also considered, including speech endpoint detection, epileptic seizure detection/prediction from continuous EEG measurements, and gene identification from genomic DNA sequences.
Conference on Information Sciences and Systems. 04/2005;
-
ABSTRACT: Recently, the concept of exponential sensitivity to initial conditions (ESIC) of deterministic chaos has been generalized to power-law sensitivity to initial conditions (PSIC). We describe a general computational procedure to examine PSIC from a time series. Study of noise-free and noisy logistic and Hénon maps at the edge of chaos finds that PSIC cannot be shown from clean scalar time series. However, when there is dynamic noise, motions around the edge of chaos all collapse onto the PSIC attractor, regardless of whether they are simply regular or truly chaotic when noise is absent. Hence, dynamic noise makes PSIC observable. The PSIC concept is further applied to the analysis of long continuous EEG signals with epileptic seizures. It is shown that measures from the PSIC framework are quite effective in detecting seizures, often better than the Lyapunov exponent based methods from the conventional ESIC framework.
Physica A. 01/2005; 35320(05).
-
J. Bioinformatics and Computational Biology. 01/2005; 3:677-696.
-
IEEE Intelligent Systems. 01/2005; 20:34-39.
-
ABSTRACT: Timely detection of unusual and/or unexpected events in natural and man-made systems has deep scientific and practical relevance. We show that the recently proposed conceptually simple and easily calculated measure of permutation entropy can be effectively used to detect qualitative and quantitative dynamical changes. We illustrate our results on two model systems as well as on clinically characterized brain wave data from epileptic patients.
Physical Review E 11/2004; 70(4 Pt 2):046217. · 2.26 Impact Factor
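As a rough illustration of the measure named in the abstract above, permutation entropy can be computed by counting the ordinal patterns of short windows and taking the Shannon entropy of their frequencies. The sketch below is a minimal version under stated assumptions: the function name, the default `order` and `delay`, the normalisation by log(order!), and the tie-breaking rule (ties keep their original order) are choices made for this example, not details taken from the paper.

```python
import math
import random
from collections import Counter

def permutation_entropy(series, order=3, delay=1):
    """Normalised permutation entropy of a sequence (0 = regular, 1 = maximal).

    Hypothetical helper: `order` is the window length and `delay` the spacing
    between the samples in each window.
    """
    n = len(series) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series is too short for the chosen order and delay")
    counts = Counter()
    for i in range(n):
        window = [series[i + j * delay] for j in range(order)]
        # Ordinal pattern: indices of the window sorted by value (stable for ties)
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(math.factorial(order))

# Example inputs: a monotone ramp and seeded uniform noise
ramp_pe = permutation_entropy(list(range(100)))
random.seed(0)
noise_pe = permutation_entropy([random.random() for _ in range(5000)])
```

A monotone ramp contains a single ordinal pattern, so its entropy is zero, while uncorrelated noise visits all patterns about equally and gives a value near one. Tracking this quantity over a sliding window is one way the dynamical changes mentioned in the abstract could be monitored.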
-
ABSTRACT: Summary form only given. When a gene finding algorithm incorporates multiple useful and non-redundant sources of information about coding regions, it becomes more successful. It is thus highly desirable to find new and efficient codon indices. Here we propose a novel codon index, which we call the period-3 fractal deviation (PFD). This is obtained by simultaneously considering two incompatible features of DNA sequences, the period-3 feature in coding regions and the fractal feature in both coding and non-coding regions. These two features are incompatible because period-3 defines a specific scale of three nucleotide bases while fractal means there are no specific scales. The PFD is very different for coding and non-coding sequences, and is reading-frame-dependent. The accuracy of the PFD is evaluated by studying all of the 16 yeast chromosomes. It is found that the percentage accuracy is very high and quite independent of the sliding window size. It is also found that this percentage accuracy is much higher than when period-3 and fractal features are characterized alone, especially when the window size is small. This strongly suggests that the method is not only useful for the study of long genome sequences, but may also be very powerful for the study of short DNA segments. The PFD is complementary to other codon indices, including Fourier measures of period-3. This makes it possible to integrate PFD with other measures. Indeed, integration of the PFD measure with those indices using the Fisher linear discriminant analysis significantly improves the accuracy of protein coding sequence identification. This implies the measure proposed here may be readily incorporated into existing gene finding algorithms. Other salient features of the method are that it is non-parametric, does not require training, and can be fully automated.
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE; 09/2004
-
ABSTRACT: With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or noncoding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE; 09/2004
-
ABSTRACT: With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
Proceedings / IEEE Computational Systems Bioinformatics Conference, CSB. IEEE Computational Systems Bioinformatics Conference 02/2004;
-
3rd International IEEE Computer Society Computational Systems Bioinformatics Conference (CSB 2004), 16-19 August 2004, Stanford, CA, USA; 01/2004
-
ABSTRACT: Principal component analysis (PCA) is a popular data analysis method. One of the motivations for using PCA in practice is to reduce the dimension of the original data by projecting the raw data onto a few dominant eigenvectors with large variance (energy). Due to the ubiquity of 1/f^α noise in science and engineering, in this Letter we use PCA to study the prototypical stochastic model for 1/f^α processes, the fractional Brownian motion (fBm) processes, and find that the eigenvalues from PCA of fBm processes follow a power law, with the exponent being the key parameter defining the fBm processes. We also study random-walk-type processes constructed from DNA sequences, and find that the eigenvalue spectrum from PCA of those random-walk processes also follows power-law relations, with the exponent characterizing the correlation structures of the DNA sequence. In fact, it is observed that PCA can automatically remove linear trends induced by patchiness in the DNA sequence; hence, PCA has a capability similar to that of detrended fluctuation analysis. Implications of the power-law distributed eigenvalue spectrum are discussed.
Physics Letters A.
-
ABSTRACT: In recent years it has been increasingly recognized that noise and determinism may have comparable but different influences on population dynamics. However, no simple analysis methods have been introduced into ecology which can readily characterize those impacts. In this paper, we study a population model with strong periodicity, both with and without noise. The noise-free model generates both quasi-periodic and chaotic dynamics for certain parameter values. Due to the strong periodicity, however, the generated chaotic dynamics have not been satisfactorily described. The dynamics become even more complicated when there is noise. Characterizing the chaotic and stochastic dynamics in this model thus represents a challenging problem. Here we show how the chaotic dynamics can be readily characterized by the direct dynamical test for deterministic chaos developed by [Gao JB, Zheng ZM. Europhys. Lett. 1994;25:485] and how the influence of noise on quasi-periodic motions can be characterized as asymmetric diffusions wandering along the quasi-periodic orbit. It is hoped that the introduced methods will be useful in studying other population models as well as population time series obtained both in field and laboratory experiments.
Chaos, Solitons & Fractals.