Conference Paper

Digital Signal Processing Techniques for Gene Finding in Eukaryotes

DOI: 10.1007/978-3-540-69905-7_17
Conference: Image and Signal Processing - 3rd International Conference, ICISP 2008, Cherbourg-Octeville, France, July 1-3, 2008, Proceedings
Source: DBLP


In this paper, we investigate the effects of window shape and length on a DFT-based method for gene and exon prediction in
eukaryotes. We then propose a new gene finding method that combines selected time-domain and frequency-domain methods,
employing the most effective DNA symbolic-to-numeric representation examined to date together with suitable window
shape and length parameters and a signal boosting technique. The new method outperforms major existing
approaches: it achieves relative improvements of 15.1% to 55.9% over
different methods in prediction accuracy of exonic nucleotides at a 5% false positive rate using the GENSCAN test
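
As an illustration of the kind of windowed DFT measure the abstract refers to, the following Python sketch computes the spectral power at frequency 1/3 (the period-3 component associated with codon structure) over a sliding window. It is a minimal sketch under stated assumptions, not the authors' method: the four binary indicator sequences used as the symbolic-to-numeric mapping, the function name period3_power, the set of window shapes, and the default length of 351 bases (the value quoted in a citing paper below) are all choices made for this example. The paper's own representation and signal boosting step are not reproduced.

```python
import numpy as np

# Window shapes offered by this sketch (an assumed, illustrative selection).
WINDOWS = {
    "rectangular": np.ones,
    "hamming": np.hamming,
    "bartlett": np.bartlett,
    "blackman": np.blackman,
}


def period3_power(seq, window_len=351, window="hamming"):
    """Sliding-window power at 1/3 cycles per base.

    Assumption: the sequence is mapped to four binary indicator
    sequences (one per base A, C, G, T) and their powers are summed.
    Returns one value per window start position.
    """
    seq = seq.upper()
    if window not in WINDOWS:
        raise ValueError(f"unknown window shape: {window}")
    if len(seq) < window_len:
        raise ValueError("sequence is shorter than the window")

    # Windowed complex exponential at f = 1/3. When window_len is a
    # multiple of 3 this is exactly DFT bin window_len / 3.
    k = np.arange(window_len)
    kernel = WINDOWS[window](window_len) * np.exp(-2j * np.pi * k / 3)

    power = np.zeros(len(seq) - window_len + 1)
    for base in "ACGT":
        indicator = np.array([c == base for c in seq], dtype=float)
        # Convolving with the reversed kernel correlates each window
        # of the indicator sequence with the kernel.
        spectrum = np.convolve(indicator, kernel[::-1], mode="valid")
        power += np.abs(spectrum) ** 2
    return power
```

Changing the window and window_len arguments gives a quick way to see how window shape and length affect the measure on a sequence of interest. Symbols other than A, C, G, and T contribute nothing in this sketch.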

  • "On the other hand, windows that are too small tend to yield noisy results. Mahmood et al. [23] suggest that a size of 351 bps would provide a good balance between noise and specificity; [10] and [19] also demonstrate why 351 is a solid choice."
    ABSTRACT: This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
    BMC Research Notes 04/2012; 5(1):183. DOI:10.1186/1756-0500-5-183
  • ABSTRACT: The identification of regions of DNA sequences that code for proteins is one of the most fundamental applications in bioinformatics. These protein-coding regions are in contrast to other DNA regions that encode functional RNA molecules, provide structural stability of chromosomes, serve as genetic raw materials, represent molecular fossils, or have no known purpose (sometimes called "junk DNA"). A number of approaches have been suggested for differentiating between the protein-coding and non-protein-coding regions of DNA. A selection of these approaches is based on digital signal processing (DSP) techniques. These DSP techniques rely on the phenomenon that protein-coding regions have a prominent power spectrum peak at frequency f=⅓ arising from the length of codons (three nucleic acids). This article partitions the identification of protein-coding regions into four discrete steps. Based on this partitioning, DSP techniques can be easily described and compared based on their unique implementations of the processing steps. We compare the approaches, and discuss strengths and weaknesses of each in the context of different applications. Our work provides an accessible introduction and comparative review of DSP methods for the identification of protein-coding regions. Additionally, by breaking down the approaches into four steps, we suggest new combinations that may be worthy of future study.
    Journal of computational biology: a journal of computational molecular cell biology 03/2011; 18(4):639-76. DOI:10.1089/cmb.2010.0184 · 1.74 Impact Factor
  • ABSTRACT: The design of new drug delivery systems is strongly dependent on the capability to maximize biocompatibility and reduce immunotoxicity. The minimization of foreign body reactions is one of the critical steps, and toll-like receptors play a pivotal role in sensing exogenous elements and activating a response against them. The complexity of these molecules brings about the need to identify those local structural elements that preferentially interact with different classes of compounds. We have applied Digital Signal Processing (DSP) methods to identify the regions containing these structural elements. DSP analysis has been carried out on ‘wild-type’ and several allelic forms of Toll-like receptor 1 cDNA. DSP has enabled the screening of allele-specific nucleotide domains that could have an effect on allele-specific response against exogenous compounds.
    Bioinformatics and Bioengineering (BIBE), 2013 IEEE 13th International Conference on; 01/2013
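
The first citing abstract above mentions designs based on Goertzel's algorithm. The sketch below shows, in Python rather than MATLAB, how Goertzel's recurrence evaluates the power of a single frequency component (here 1/3 cycles per base, the period-3 peak described in the second abstract) without computing a full FFT. The function name goertzel_power and the use of a binary indicator sequence as input are assumptions made for illustration. The parallel MEX, PARFOR, and GFOR implementations described in that abstract are not reproduced.

```python
import math


def goertzel_power(samples, freq=1 / 3):
    """Power |X(freq)|^2 of `samples` via Goertzel's recurrence.

    `freq` is in cycles per sample. The result matches a DFT bin
    exactly when freq * len(samples) is an integer.
    """
    coeff = 2.0 * math.cos(2.0 * math.pi * freq)
    s_prev = 0.0   # state one step back
    s_prev2 = 0.0  # state two steps back
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude recovered from the final two state values.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


# Hypothetical usage: indicator sequence of base "A" in a 351-base window.
window_seq = "ATG" * 117
indicator = [1.0 if base == "A" else 0.0 for base in window_seq]
power_a = goertzel_power(indicator)
```

Because only one frequency is needed per window, the recurrence uses a single multiplication per sample, which is part of why it is a natural candidate for the per-window parallelism the abstract describes.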
