Digital Signal Processing in the Analysis of Genomic Sequences

Centro de Estudios de Informática, Facultad de Ingeniería Eléctrica, Universidad Central "Marta Abreu" de Las Villas, Villa Clara, Cuba
Current Bioinformatics (Impact Factor: 0.92). 01/2009; 4(1):28-40. DOI: 10.2174/157489309787158134


Digital Signal Processing (DSP) applications in Bioinformatics have received great attention in recent years, where new effective methods for genomic sequence analysis, such as the detection of coding regions, have been developed. The use of DSP principles to analyze genomic sequences requires defining an adequate representation of the nucleotide bases by numerical values, converting the nucleotide sequences into time series. Once this has been done, all the mathematical tools usually employed in DSP are used in solving tasks such as identification of protein coding DNA regions, identification of reading frames, and others. In this article we present an overview of the most relevant applications of DSP algorithms in the analysis of genomic sequences, showing the main results obtained by using these techniques, analyzing their relative advantages and drawbacks, and providing relevant examples. We finally analyze some perspectives of DSP in Bioinformatics, considering recent research results on algebraic structures of the genetic code, which suggest other new DSP applications in this field, as well as the new field of Genomic Signal Processing.
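
The numerical mapping step described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes the commonly used binary indicator representation (one 0/1 sequence per base) and measures the strength of the 3-periodic pattern by evaluating the discrete Fourier transform at frequency 1/3. The function names, the window length of 351, and the step of 3 are illustrative assumptions.

```python
import cmath

BASES = "ACGT"


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base.

    Assumption: binary indicator mapping; other numerical mappings exist.
    """
    dna = dna.upper()
    return {b: [1 if ch == b else 0 for ch in dna] for b in BASES}


def period3_power(dna):
    """Sum over the four bases of |X_b|^2, with X_b the DFT at frequency 1/3."""
    n = len(dna)
    # Complex exponential with a period of exactly three samples.
    w = [cmath.exp(-2j * cmath.pi * i / 3) for i in range(n)]
    total = 0.0
    for seq in indicator_sequences(dna).values():
        x = sum(s * wi for s, wi in zip(seq, w))
        total += abs(x) ** 2
    return total


def sliding_period3(dna, window=351, step=3):
    """Return (start, power) pairs for each window along the sequence.

    The window and step values are illustrative defaults, not prescribed
    by the article; a multiple of 3 keeps the frequency 1/3 on the DFT grid.
    """
    return [
        (start, period3_power(dna[start:start + window]))
        for start in range(0, len(dna) - window + 1, step)
    ]


if __name__ == "__main__":
    periodic = "ATG" * 40   # strongly 3-periodic toy sequence
    flat = "A" * 120        # no 3-periodic component
    print(period3_power(periodic), period3_power(flat))
```

In such a sketch, peaks in the sliding-window output would mark candidate coding regions. A practical detector would also need a threshold and a way to handle ambiguous bases such as N, neither of which is covered here.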



Available from: Juan V. Lorenzo-Ginori, Sep 01, 2015
  • "to distinguishing exonic and intronic regions is based on digital signal processing (DSP) methods. Main DSP methods include the discrete Fourier transform, digital filters, entropy measures and spectral analysis using parametric models [8]. All these approaches look for a 3-periodic pattern in the occurrences of A, C, G or T. The Fourier transform has been widely used for sequence analysis [9]."
    ABSTRACT: This paper presents a new technique for the detection of short exons in DNA sequences. In this method, we analyze four DNA structural properties, which include the DNA bending stiffness, disrupt energy, free energy, and propeller twist, using the autoregressive (AR) model. The linear prediction matrices for the four features are combined to find the same set of linear prediction coefficients, from which we estimate the spectrum of the DNA sequence and detect exons based on the 1/3 frequency component. To overcome the nonstationarity of DNA sequences, we use moving windows of different sizes in the AR model. Experiments on the human genome show that our multi-feature based method is superior in performance to existing exon detection algorithms.
    Journal on Advances in Signal Processing 01/2011; 2011(1). DOI:10.1155/2011/780794 · 0.78 Impact Factor
  • ABSTRACT: This paper presents a new approach for short gene recognition in DNA sequences. Three DNA structural features are selected from an analysis of fourteen structural features. The feature values are mapped to new values, and the three sets of mapped values generate three DNA signals. These signals are then normalized and combined into one signal. An autoregressive (AR) model is used for power spectral density (PSD) estimation of the signal. The experimental results obtained by this method are shown to be comparable to those of existing exon detection methods that use digital signal processing (DSP). In addition, the computational complexity of the new method is only 1/3 of that of the previously proposed method.
    Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Anchorage, Alaska, USA, October 9-12, 2011; 01/2011
  • ABSTRACT: The main feature of the global repeat map (GRM) algorithm is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of directly mapping a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of the 'magnifying glass' effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
    Nucleic Acids Research 07/2012; 41(1):e47. DOI:10.1093/nar/gks721 · 9.11 Impact Factor