A binary coding method of RNA secondary structure and its application.
ABSTRACT Based on the three classifications of nucleotides, we introduce a binary coding method for RNA secondary structures. With this representation, an RNA secondary structure can be reduced to three binary digit sequences. We also propose coding rules based on the exclusive-OR operation. Using these rules, we can identify mutations between bases or between a base and a base pair, and perform sequence alignment easily.
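The idea of three binary sequences compared by exclusive-OR can be illustrated with a minimal sketch. The abstract does not give the paper's actual code table or its rules for base pairs, so everything below is an assumption: the bit assignments follow the three standard nucleotide classifications (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding), the function names `encode`, `xor_compare` and `mutated_positions` are hypothetical, and only unpaired bases of equal-length sequences are handled.

```python
# Hypothetical bit tables: the abstract does not state the paper's actual assignment.
PURINE = {"A": 1, "G": 1, "C": 0, "U": 0}  # purine = 1, pyrimidine = 0
AMINO = {"A": 1, "C": 1, "G": 0, "U": 0}   # amino = 1, keto = 0
WEAK = {"A": 1, "U": 1, "G": 0, "C": 0}    # weak H-bond = 1, strong = 0


def encode(seq):
    """Reduce an RNA sequence to three binary digit sequences (one per classification)."""
    seq = seq.upper()
    return tuple([table[base] for base in seq] for table in (PURINE, AMINO, WEAK))


def xor_compare(a, b):
    """XOR the three binary tracks of two equal-length sequences, position by position."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    return tuple(
        [x ^ y for x, y in zip(track_a, track_b)]
        for track_a, track_b in zip(encode(a), encode(b))
    )


def mutated_positions(a, b):
    """Return the indices where any of the three XOR tracks is 1, i.e. the bases differ."""
    tracks = xor_compare(a, b)
    return [i for i, bits in enumerate(zip(*tracks)) if any(bits)]


print(mutated_positions("GGAUC", "GGCUC"))  # → [2]
```

Under this assumed table, the XOR pattern at a differing position also indicates the kind of substitution: a zero in the purine/pyrimidine track means both bases are in the same class (a transition such as A to G), while a one means a transversion.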
ABSTRACT: BACKGROUND: In recent years, the important functional roles of RNAs in biological processes have been repeatedly demonstrated. Computing the similarity between two RNAs helps clarify the functional relationship between them. However, because of the long-range correlations in RNA, many efficient methods for detecting protein similarity do not work well for RNA. To understand RNA function comprehensively, a better similarity measure among RNAs should be designed that considers their structural features (base pairs). Current methods for RNA comparison can generally be classified as alignment-based or alignment-free. RESULTS: In this paper, we propose a novel wavelet-based method built on the RNA triple vector curve representation, named multi-scale RNA comparison. First, we designed a novel numerical representation of RNA secondary structure, termed the RNA triple vectors curve (TV-Curve). Second, we constructed a new similarity metric based on the wavelet decomposition of the TV-Curve of an RNA. Finally, we applied our algorithm to the classification of non-coding RNA and to RNA mutation analysis, and compared the results with those of two well-known RNA comparison tools: RNAdistance and RNApdist. The results show the potential of our method in RNA classification and RNA mutation analysis. CONCLUSION: We provide a better visualization and analysis tool for RNA, named TV-Curve, especially suited to long RNAs, which can characterize both sequence and structure features. Additionally, based on the TV-Curve representation of RNAs, we propose a multi-scale similarity measure for RNA comparison that can capture both local and global differences in the sequence and structure information of RNAs. Compared with the well-known RNA comparison approaches, the proposed method is shown to be outstanding and effective in non-coding RNA classification and RNA mutation analysis. The numerical experiments indicate that our proposed method can capture relationships among RNAs more efficiently and with greater subtlety.
BMC Bioinformatics 10/2012; 13(1):280. · 2.75 Impact Factor