
Digital signal processing in the analysis of genomic sequences

Centro de Estudios de Electrónica y Tecnologías de la Información, Facultad de Ingeniería Eléctrica, Universidad Central "Marta Abreu" de Las Villas, Cuba; Centro de Estudios de Informática, Facultad de Ingeniería Eléctrica, Universidad Central "Marta Abreu" de Las Villas, Villa Clara, Cuba; Biotechnology Group, Instituto Nacional de Investigaciones en Viandas Tropicales, Villa Clara, INIVIT, Cuba
Current Bioinformatics 01/2009; 4:28-40. DOI: 10.2174/157489309787158134

ABSTRACT Digital Signal Processing (DSP) applications in Bioinformatics have received great attention in recent years, and new, effective methods for genomic sequence analysis, such as the detection of coding regions, have been developed. Using DSP principles to analyze genomic sequences requires defining an adequate numerical representation of the nucleotide bases, which converts the nucleotide sequences into time series. Once this has been done, the mathematical tools usually employed in DSP can be applied to tasks such as the identification of protein-coding DNA regions, the identification of reading frames, and others. In this article we present an overview of the most relevant applications of DSP algorithms in the analysis of genomic sequences, showing the main results obtained with these techniques, analyzing their relative advantages and drawbacks, and providing relevant examples. Finally, we analyze some perspectives of DSP in Bioinformatics, considering recent research results on algebraic structures of the genetic code, which suggest other new DSP applications in this field, as well as the new field of Genomic Signal Processing.
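
The two steps named in the abstract, numerical mapping followed by spectral analysis, can be illustrated with a minimal sketch. It assumes the binary indicator (Voss) mapping, which is only one of several possible encodings and is not confirmed here as the article's choice, and it evaluates the DFT power at the 1/3 frequency that is commonly associated with protein-coding regions. The function names (`indicator_sequences`, `dft_power`, `period3_power`) are hypothetical and chosen for illustration only.

```python
import cmath

BASES = "ACGT"


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base.

    Assumption: Voss-style binary indicators; other numerical mappings exist.
    """
    dna = dna.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in dna] for b in BASES}


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real sequence."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def period3_power(dna):
    """Total spectral power at frequency 1/3, summed over the four indicators.

    The sequence is trimmed to a multiple of 3 so that k = N/3 is an integer.
    """
    n = len(dna) - len(dna) % 3
    trimmed = dna[:n]
    return sum(
        dft_power(sig, n // 3) for sig in indicator_sequences(trimmed).values()
    )
```

For example, `period3_power("ATG" * 20)` is large because every base recurs with period 3, whereas `period3_power("A" * 60)` is essentially zero. In practice such a measure would be evaluated over a sliding window along the sequence; the window length is a tuning choice that this sketch does not address.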

  • ABSTRACT: This paper presents a new technique for the detection of short exons in DNA sequences. In this method, we analyze four DNA structural properties (bending stiffness, disrupt energy, free energy, and propeller twist) using the autoregressive (AR) model. The linear prediction matrices for the four features are combined to find a single shared set of linear prediction coefficients, from which we estimate the spectrum of the DNA sequence and detect exons based on the 1/3 frequency component. To overcome the nonstationarity of DNA sequences, we use moving windows of different sizes in the AR model. Experiments on the human genome show that our multi-feature-based method is superior in performance to existing exon detection algorithms.
    EURASIP Journal on Advances in Signal Processing. 01/2011;
  • ABSTRACT: The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient, and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the period-3 pattern in exons, implementation of the 'magnifying glass' effect, identification of complex HOR patterns, identification of inter-tandem transitional dispersed repeat sequences, and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
    Nucleic Acids Research 07/2012; 41(1):e47. · 8.81 Impact Factor
  • ABSTRACT: Biospectrogram is open-source software for the spectral analysis of DNA and protein sequences. The software can fetch (from the NCBI server), import, and manage biological data. The data can be analyzed using Digital Signal Processing (DSP) techniques, since the software allows the user to convert the symbolic data into numerical data using 23 popular encodings, apply popular transformations such as the Fast Fourier Transform (FFT), and export the result. The ability to export both encoding files and transform files as MATLAB .m files gives the user the option of applying a variety of DSP techniques. The user can also perform window analysis (both sliding, in the forward and backward directions, and stagnant) with windows of different sizes, and search for meaningful spectral patterns dynamically with the help of the exported MATLAB file by choosing the time delay in the plot using Biospectrogram. Random encodings and user-defined encodings allow the software to search many possibilities in spectral space. Availability: Biospectrogram is written in Java and is freely available for download from http://www.guptalab.org/biospectrogram. The software has been optimized to run on Windows, Mac OS X, and Linux. A user manual and a YouTube tutorial (product demo) are also available on the website. We are in the process of acquiring an open-source license for it.
    10/2012;
