Identification of homologous microRNAs in 56 animal genomes
MicroRNAs (miRNAs) are endogenous non-protein-coding RNAs of approximately 22 nucleotides. Thousands of miRNA genes have been identified, computationally and/or experimentally, in a variety of organisms, which suggests that miRNA genes are widely shared and distributed among species. Here, we first scanned the genome sequences of 56 bilaterian animal species with unique miRNA sequence patterns to locate candidate miRNAs. The regions surrounding these candidate miRNAs were then extracted and folded, and features of their secondary structures were calculated. Using a support vector machine (SVM) classifier with these features, we identified an additional 13,091 orthologous or paralogous candidate pre-miRNAs, together with their corresponding candidate mature miRNAs. Stem-loop RT-PCR and deep sequencing were used to experimentally validate the predictions in human, medaka and rabbit. Our prediction pipeline allows the rapid and effective discovery of homologous miRNAs in a large number of genomes.
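The first two steps of the pipeline, pattern scanning and extraction of the surrounding region, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `scan_candidates`, the `Candidate` record, the regular-expression form of the pattern, and the 40-nt flank are all assumptions made for illustration, and only the forward strand is scanned. Folding and SVM classification are left out because the abstract does not specify the tools or features used.

```python
import re
from typing import Iterator, NamedTuple


class Candidate(NamedTuple):
    start: int         # 0-based start of the pattern hit in the genome
    end: int           # exclusive end of the pattern hit
    region_start: int  # 0-based start of the extracted window
    region: str        # hit plus flanks, i.e. the sequence that would be folded


def scan_candidates(genome: str, pattern: str, flank: int = 40) -> Iterator[Candidate]:
    """Yield every pattern hit (overlapping hits included) with its flanking window.

    `pattern` is a hypothetical regular expression standing in for a
    miRNA sequence pattern; `flank` is an assumed window size.
    """
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    regex = re.compile(f"(?=({pattern}))")
    for match in regex.finditer(genome):
        start, end = match.start(1), match.end(1)
        lo = max(0, start - flank)          # clip the window at the sequence start
        hi = min(len(genome), end + flank)  # clip the window at the sequence end
        yield Candidate(start, end, lo, genome[lo:hi])


if __name__ == "__main__":
    # Toy genome: a 22-nt let-7-like sequence embedded in filler.
    toy_genome = "C" * 50 + "TGAGGTAGTAGGTTGTATAGTT" + "C" * 50
    # Hypothetical pattern: a fixed 8-nt 5' end followed by any 14 nt.
    for cand in scan_candidates(toy_genome, "TGAGGTAG[ACGT]{14}"):
        print(cand.start, cand.end, len(cand.region))
```

In a fuller version, each extracted `region` would be passed to an RNA folding program, and the resulting structural features would be given to the classifier.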