Exons and introns characterization in nucleic acid sequences by time-frequency analysis
ABSTRACT A current problem in deoxyribonucleic acid (DNA) sequence analysis is determining the exact locations of genes and, in eukaryotes, of the protein-coding regions in the primary mRNA transcript (pre-mRNA). Converting the symbols associated with the nucleotides of these sequences into discrete numerical values yields a signal, so that signal-processing methods can be applied to the localization and annotation of genes. In this work, thermodynamic data on the free energy changes (ΔG°) of DNA or RNA duplex formation are used to convert the symbols of the pre-mRNA nucleotide sequence into numerical values. This study presents an analysis of a large number of gene sequences, based on time-frequency representation techniques, aimed at finding pre-mRNA variables that best characterize coding regions and discriminate them from non-coding regions. Instantaneous frequency variables and instantaneous spectral energy variables in different frequency bands allowed exons and introns to be correctly classified with an accuracy of more than 85%.
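The symbol-to-signal conversion and the band-limited spectral energy mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the per-nucleotide values in `PLACEHOLDER_VALUES` are hypothetical placeholders rather than measured ΔG° data, the function names are invented for illustration, and a plain sliding-window discrete Fourier transform stands in for whatever time-frequency representation the study actually uses. Instantaneous frequency is not covered.

```python
import cmath
import math

# HYPOTHETICAL placeholder values, one per nucleotide symbol. They are NOT
# measured free energies; substitute a real thermodynamic table before use.
PLACEHOLDER_VALUES = {"A": -1.0, "C": -2.0, "G": -2.5, "U": -0.5}


def to_signal(sequence, table=PLACEHOLDER_VALUES):
    """Map each nucleotide symbol to its numerical value."""
    return [table[symbol] for symbol in sequence.upper()]


def band_energy(window, k_lo, k_hi):
    """Sum of squared DFT magnitudes over bins k_lo..k_hi (inclusive)."""
    n = len(window)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(window)
        )
        energy += abs(coeff) ** 2
    return energy


def sliding_band_energy(signal, width, step, k_lo, k_hi):
    """Band energy of each rectangular window moved along the signal."""
    return [
        band_energy(signal[start:start + width], k_lo, k_hi)
        for start in range(0, len(signal) - width + 1, step)
    ]


# Toy sequence with a three-symbol repeat, used only to exercise the code.
signal = to_signal("ACG" * 8)
low_band = sliding_band_energy(signal, width=12, step=3, k_lo=1, k_hi=3)
repeat_bin = sliding_band_energy(signal, width=12, step=3, k_lo=4, k_hi=4)
```

For this toy input, the energy of every window falls in bin 4 (the window width divided by the repeat length) and the lower bins stay empty. Per-window band energies of this kind could then serve as candidate features for a classifier, although the features and classifier used in the study may differ.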