ABSTRACT: Neuroglobin (Ngb) is a hexacoordinated globin expressed mainly in the central and peripheral nervous system of vertebrates. Although several hypotheses have been put forward regarding the role of neuroglobin, its precise function remains uncertain. Ngb appears to have a neuroprotective role, enhancing cell viability under hypoxia and other types of oxidative stress. Ngb is phylogenetically ancient and has a substitution rate nearly four times lower than that of other vertebrate globins, e.g. hemoglobin. Despite its high sequence conservation among vertebrates, Ngb seems to be elusive in invertebrates.
We determined candidate orthologs in invertebrates, identified a globin of the placozoan Trichoplax adhaerens that is most likely orthologous to vertebrate Ngb, and confirmed the orthologous relationship of the polymeric globin of the sea urchin Strongylocentrotus purpuratus to Ngb. The putative orthologous globin genes are located next to genes orthologous to vertebrate POMT2, mirroring the genomic position of vertebrate Ngb. The shared syntenic position of the globins from Trichoplax, the sea urchin and vertebrate Ngb strongly suggests that they are orthologous. A search for conserved transcription factor binding sites (TFBSs) in the promoter regions of the Ngb genes of different vertebrates via phylogenetic footprinting revealed several TFBSs that may contribute to the specific expression of Ngb, whereas a comparative analysis with myoglobin revealed several shared TFBSs, suggestive of regulatory mechanisms common to globin genes.
Identification of the placozoan and echinoderm genes orthologous to vertebrate neuroglobin strongly supports the hypothesis of an early evolutionary origin of this globin, as it shows that neuroglobin was already present in the last common ancestor of placozoans and bilaterians. Computational determination of the repertoire of transcription factor binding sites provides, on the one hand, a set of transcription factors responsible for the specific expression of the Ngb genes and, on the other hand, a set of factors potentially controlling the expression of several different globin genes.
PLoS ONE 01/2012; 7(10):e47972. · 3.73 Impact Factor
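Phylogenetic footprinting rests on the idea that functional binding sites are retained across orthologous promoters. A minimal, alignment-free sketch of that idea in Python: a motif is kept only if it occurs, on either strand, in the promoter of every ortholog. The IUPAC ambiguity codes are standard; the promoter sequences, the motif names and their consensus strings are invented for illustration and are not the TFBSs reported for Ngb. Dedicated footprinting tools typically score position weight matrices on multiple alignments instead of matching exact consensus strings.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus string (e.g. 'TGASTCA') into a regex."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def has_site(seq: str, pattern: re.Pattern) -> bool:
    """True if the motif occurs on either strand of `seq`."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return bool(pattern.search(seq) or pattern.search(reverse_complement))


def conserved_motifs(promoters: dict[str, str], motifs: dict[str, str]) -> set[str]:
    """Names of motifs found in the promoter of every ortholog."""
    return {name for name, consensus in motifs.items()
            if all(has_site(seq, iupac_to_regex(consensus)) for seq in promoters.values())}


# Toy promoters and motifs, invented for illustration only.
promoters = {"human": "GGTGACTCAGGCACGTGAA",
             "mouse": "TTTGAGTCATTCACGTGCC",
             "zebrafish": "AACACGTGTTTTTTTT"}
motifs = {"AP1_like": "TGASTCA", "E_box": "CACGTG"}
print(conserved_motifs(promoters, motifs))  # {'E_box'}
```

Here the AP-1-like site is present in the two mammalian promoters but absent from the zebrafish one, so only the E-box survives the "present in all orthologs" filter.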
ABSTRACT: Most eukaryotic genes are interrupted by introns that must be removed from pre-mRNAs before the transcripts can perform their function. This is done by a complex machinery called the spliceosome. Many eukaryotes possess two separate spliceosomal systems that process separate sets of introns. The major (U2) spliceosome removes the majority of introns, while a minute fraction of the intron repertoire is processed by the minor (U12) spliceosome. These two populations of introns are called U2-type and U12-type, respectively. The latter fall into two subtypes based on their terminal dinucleotides. The minor spliceosomal system has been lost independently in some lineages, while in others only a few U12-type introns persist. We investigated twenty insect genomes in order to better understand the evolutionary dynamics of U12-type introns. Our work confirms a dramatic drop in the number of U12-type introns in Diptera, leaving these genomes with just a handful of cases. This is mostly the result of intron deletion, but in a number of dipteran cases minor-type introns were also converted to the major type. Insect genes that harbor U12-type introns belong to several functional categories, among which proteins binding ions and nucleic acids are enriched; these few categories are also overrepresented among the genes that preserved minor-type introns in Diptera.
International journal of biological sciences 01/2012; 8(3):344-52. · 3.17 Impact Factor
ABSTRACT: Most genomes are populated by thousands of sequences that originated from mobile elements. On the one hand, these sequences present a real challenge in the process of genome analysis and annotation. On the other hand, they are very interesting biological subjects involved in many cellular processes. Here, we present an overview of the biodiversity of transposable elements (TEs) and their impact on genome evolution. Finally, we discuss different approaches to TE detection and analysis.
Methods in molecular biology (Clifton, N.J.) 01/2012; 855:337-59.
ABSTRACT: Many multicellular eukaryotes have two types of spliceosomes for the removal of introns from messenger RNA precursors. The major (U2) spliceosome processes the vast majority of introns, referred to as U2-type introns, while the minor (U12) spliceosome removes a small fraction (less than 0.5%) of introns, referred to as U12-type introns. U12-type introns have distinct sequence elements and usually occur together with U2-type introns in the same genes. The phylogenetic distribution of U12-type introns shows that the minor splicing pathway appeared very early in eukaryotic evolution and has been lost repeatedly.
We have investigated the evolution of U12-type introns among eighteen metazoan genomes by analyzing orthologous U12-type intron clusters. Examination of gain, loss, and type switching shows that intron type is remarkably conserved among vertebrates. Among 180 intron clusters, only eight show intron loss in any vertebrate species and only five show conversion between the U12 and the U2-type. Although there are only nineteen U12-type introns in Drosophila melanogaster, we found one case of U2 to U12-type conversion, apparently mediated by the activation of cryptic U12 splice sites early in the dipteran lineage. Overall, loss of U12-type introns is more common than conversion to U2-type and the U12 to U2 conversion occurs more frequently among introns of the GT-AG subtype than among introns of the AT-AC subtype. We also found support for natural U12-type introns with non-canonical terminal dinucleotides (CT-AC, GG-AG, and GA-AG) that have not been previously reported.
Although complete loss of the U12-type spliceosome has occurred repeatedly, U12-type introns are extremely stable in some taxa, including Eutheria. Loss of U12-type introns, or of the genes containing them, is more common than conversion to the U2 type. The degeneracy of terminal dinucleotides among natural U12-type introns is higher than previously thought.
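The two intron features discussed in these abstracts, spliceosome type (U2 vs U12) and terminal-dinucleotide subtype (GT-AG vs AT-AC), can be illustrated with a deliberately crude Python classifier. It assumes the commonly cited U12-type 5' splice-site approximation /RTATCCYY at intron positions +1 to +8; this regex is an assumption for illustration, not the method used in these studies, which score 5' splice sites and branch points with position weight matrices. It also misses the non-canonical U12-type ends (CT-AC, GG-AG, GA-AG) reported above.

```python
import re

# Simplified U12-type 5' splice-site heuristic (assumption): R T A T C C Y Y
# at the first eight intron positions. PWM scoring is used in practice.
U12_5SS = re.compile(r"^[AG]TATCC[CT][CT]")


def terminal_dinucleotides(intron: str) -> str:
    """Return the intron's first and last two bases as 'XX-YY', e.g. 'GT-AG'."""
    s = intron.upper()
    return f"{s[:2]}-{s[-2:]}"


def classify_intron(intron: str) -> tuple[str, str]:
    """Return (putative spliceosome type, terminal-dinucleotide subtype)."""
    s = intron.upper()
    kind = "U12" if U12_5SS.match(s) else "U2"
    return kind, terminal_dinucleotides(s)


# Toy intron sequences (5' splice site ... 3' splice site), invented for illustration.
for intron in ["ATATCCTTTTTTTTCCTTAACTTTTTTTTTTAC",  # U12-like, AT-AC ends
               "GTATCCTTTTTTTTTCCTTAACTTTTCAG",     # U12-like, GT-AG ends
               "GTAAGTCCTTTTTTTTTTCAG"]:            # ordinary U2-like GT-AG intron
    print(classify_intron(intron))
```

Separating the two labels makes the point of the abstracts explicit: subtype (terminal dinucleotides) and spliceosome type are distinct properties, which is why U12-to-U2 conversions can be tallied separately for GT-AG and AT-AC introns.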
ABSTRACT: Recent studies indicate that the initial classification of transposable elements (TEs) as 'useless', 'selfish' or 'junk' pieces of DNA is not accurate. TEs seem to have complex regulatory functions and contribute to the coding regions of many genes. Because this contribution had been documented only at the transcript level, we searched for evidence that would also support the translation of TE cassettes. Our findings suggest that the proportion of proteins with TE-encoded fragments (approximately 0.1%), although probably underestimated, is much smaller than the transcript-level data suggest (approximately 4%). In all cases, the TE cassettes are derived from old TEs, consistent with the idea that incorporation (exaptation) of TE fragments into functional proteins requires long evolutionary periods. We therefore argue that functional proteins are unlikely to contain TE cassettes derived from young TEs, whose role is probably limited to regulatory functions.
Trends in Genetics 06/2006; 22(5):260-7. · 9.77 Impact Factor
ABSTRACT: Transposable elements (TEs) are major components of eukaryotic genomes, accounting for about 50% of the size of mammalian genomes. TEs serve as recombination hot spots and may acquire specific cellular functions, such as controlling protein translation and gene transcription. The latter is the subject of the analysis presented here. We scanned TE sequences located in the promoter regions of all annotated genes in the human genome for potential transcription-regulating signals. All investigated signals are likely to be over-represented in at least one TE class, which shows that TEs have an important potential to contribute to pre-transcriptional gene regulation, especially by moving transcriptional signals within the genome and thus potentially leading to new gene expression patterns. We also found that some TE classes are more likely than others to carry transcription-regulating signals, which can explain why they have different retention rates in regions neighboring genes.
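Scanning sequences for "potential transcription-regulating signals" is commonly done with position weight matrices (PWMs). The Python sketch below builds a log2-odds PWM from aligned sites and reports windows scoring above a fraction of the best achievable score. It is a generic illustration under stated assumptions (uniform 0.25 background, pseudocount 0.5, forward strand only, invented TATA-like training sites), not the pipeline used in the study.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length, aligned binding sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, fraction=0.8):
    """Return (position, score) for each forward-strand window scoring at least
    `fraction` of the maximum achievable PWM score."""
    width = len(pwm)
    cutoff = fraction * sum(max(col.values()) for col in pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if set(window) <= set(BASES):  # skip windows containing N or other codes
            score = sum(col[b] for col, b in zip(pwm, window))
            if score >= cutoff:
                hits.append((i, round(score, 2)))
    return hits


# Hypothetical TATA-like training sites and a toy query sequence.
pwm = build_pwm(["TATAAA", "TATAAA", "TATATA", "TATAAG"])
print(scan("GCGCTATAAAGCGC", pwm))  # [(4, 8.78)]
```

To ask whether a signal is over-represented in a TE class, one would compare hit counts in TE-derived promoter segments against a background set, e.g. shuffled or non-TE promoter sequence.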
ABSTRACT: The chimpanzee is our closest living relative. The morphological differences between the two species are so large that there is no difficulty in distinguishing them. However, the nucleotide difference between the two species is surprisingly small. Early genome comparisons by DNA hybridization suggested a nucleotide difference of 1-2%, and direct nucleotide sequencing has recently confirmed this estimate. These findings gave rise to the common belief that humans are extremely close to chimpanzees at the genetic level. However, if one looks at proteins, which are mainly responsible for phenotypic differences, the picture is quite different: about 80% of proteins differ between the two species. Still, the number of proteins responsible for the phenotypic differences may be smaller, since not all genes are directly responsible for phenotypic characters.
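Why a tiny per-site difference can still mean most proteins differ: if each of n residues differs independently with probability d, a protein is identical only with probability (1 - d)^n. The Python sketch below evaluates this simple model; the per-residue divergence and protein length are illustrative values chosen for the example, not estimates from the study.

```python
def fraction_differing(aa_divergence: float, length: int) -> float:
    """Probability that a protein of `length` residues carries at least one
    amino-acid difference, if each residue differs independently with
    probability `aa_divergence`."""
    return 1.0 - (1.0 - aa_divergence) ** length


# Illustrative inputs only: 0.4% per-residue divergence, 400-residue protein.
print(round(fraction_differing(0.004, 400), 2))  # 0.8
```

Under this independence assumption, even a divergence well below 1% per residue leaves only about one protein in five completely identical.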
ABSTRACT: Classification of proteins into families is one of the main goals of functional analysis. Proteins are usually assigned to a family on the basis of the presence of family-specific patterns, domains, or structural elements. Whereas proteins belonging to the same family are generally similar to each other, the extent of similarity varies widely across families. Some families are characterized by short, well-defined motifs, whereas others contain longer, less-specific motifs. We present a simple method for visualizing such differences. We applied our method to the Arabidopsis thaliana families listed at The Arabidopsis Information Resource (TAIR) Web site and for 76% of the nontrivial families (families with more than one member), our method identifies simple similarity measures that are necessary and sufficient to cluster members of the family together. Our visualization method can be used as part of an annotation pipeline to identify potentially incorrectly defined families. We also describe how our method can be extended to identify novel families and to assign unclassified proteins into known families.
Genome Research 07/2004; 14(6):1160-9. · 14.40 Impact Factor
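One plausible reading of a similarity measure that is "necessary and sufficient to cluster members of the family together" is a cutoff t such that every pair of members scores at least t while every member/non-member pair scores below t. The Python sketch below tests that criterion on a hypothetical pairwise-similarity table; the data and function name are illustrative assumptions, not the method of the paper.

```python
from itertools import combinations


def separating_threshold(sim, family, all_ids):
    """Return a similarity cutoff t that clusters `family` exactly (every member pair
    scores >= t, every member/non-member pair scores < t), or None if none exists."""
    members = set(family)
    within = [sim[frozenset(pair)] for pair in combinations(sorted(members), 2)]
    if not within:
        return None  # single-member ("trivial") family: nothing to separate
    between = [sim[frozenset((m, o))] for m in members for o in all_ids if o not in members]
    lowest_within = min(within)
    highest_between = max(between, default=float("-inf"))
    return lowest_within if lowest_within > highest_between else None


# Hypothetical pairwise similarities (e.g. normalized alignment scores).
ids = ["a", "b", "c", "d"]
sim = {frozenset(p): s for p, s in [(("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "c"), 0.85),
                                    (("a", "d"), 0.3), (("b", "d"), 0.4), (("c", "d"), 0.2)]}
print(separating_threshold(sim, ["a", "b", "c"], ids))  # 0.8
```

A family for which no cutoff exists, because some non-member is more similar to a member than two members are to each other, is the kind of case an annotation pipeline might flag for review.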
ABSTRACT: It is believed that 3.2 billion bp of the human genome harbor approximately 35000 protein-coding genes. On average, one could therefore expect roughly one gene per 100000 nucleotides (nt), since 3.2 × 10^9 / 35000 ≈ 91000. Although the distribution of the genes in the human genome is not random, it is rather surprising that a large number of genes overlap in mammalian genomes. Thousands of overlapping genes were recently identified in the human and mouse genomes. However, the origin and evolution of overlapping genes are still unknown. We identified 1316 pairs of overlapping genes in humans and mice and studied their evolutionary patterns. It appears that these genes do not show greater-than-usual conservation. Studies of the gene structure and overlap pattern showed that only a small fraction of the analyzed genes preserved exactly the same pattern in both organisms.
Genome Research 03/2004; 14(2):280-6. · 14.40 Impact Factor
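Finding overlapping gene pairs from annotation coordinates is essentially an interval-overlap sweep. The Python sketch below sorts genes by start on each chromosome, keeps the genes still "open" at each start, and labels each overlapping pair as nested, same-strand, convergent (3' ends meet) or divergent (5' ends meet). The `Gene` record, the labels and the coordinates are hypothetical conventions chosen for illustration, not the annotation format or categories used in the study.

```python
from typing import Iterator, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def orientation(a: Gene, b: Gene) -> str:
    """Classify an overlapping pair; assumes a.start <= b.start."""
    if b.end <= a.end:
        return "nested"  # b lies entirely within a
    if a.strand == b.strand:
        return "same-strand"
    # Opposite strands: '+' upstream means the 3' ends meet, '-' upstream the 5' ends.
    return "convergent" if a.strand == "+" else "divergent"


def overlapping_pairs(genes: list[Gene]) -> Iterator[tuple[Gene, Gene, str]]:
    """Sweep each chromosome in start order, yielding every pair sharing a base."""
    by_chrom: dict[str, list[Gene]] = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    for chrom_genes in by_chrom.values():
        active: list[Gene] = []
        for g in sorted(chrom_genes, key=lambda x: x.start):
            active = [a for a in active if a.end >= g.start]  # drop genes ending before g
            for a in active:
                yield a, g, orientation(a, g)
            active.append(g)


# Hypothetical annotation records.
genes = [Gene("G1", "chr1", 100, 500, "+"), Gene("G2", "chr1", 400, 900, "-"),
         Gene("G3", "chr1", 1000, 2000, "-"), Gene("G4", "chr1", 1500, 1800, "+"),
         Gene("G5", "chr2", 100, 300, "+")]
for a, b, kind in overlapping_pairs(genes):
    print(a.name, b.name, kind)
```

Comparing such labels between orthologous human and mouse pairs is one way to check whether an overlap pattern has been preserved.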
ABSTRACT: One of the most common activities in bioinformatics is the search for similar sequences. These searches are usually carried out with programs from the NCBI BLAST family. As the majority of searches are routinely performed with default parameters, a question that should be addressed is how reliable the results obtained with the default parameter values are, i.e. what fraction of potential matches these searches retrieve. Our primary focus is on the initial hit parameter, also known as the seed or word, used by NCBI BLASTn, MegaBLAST and other similar programs in searches for similar nucleotide sequences. We show that using the default values for the initial hit parameter can greatly reduce the proportion of potentially similar sequences that are retrieved. We also show how the hit probability of different seeds varies with the minimum length and similarity of the sequences to be retrieved, and we describe methods that help in determining appropriate seeds. The experimental results described in this paper illustrate situations in which these methods are most applicable and also show the relationship between the various BLAST parameters.
Nucleic Acids Research 01/2004; 31(23):6935-41. · 8.28 Impact Factor
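How seed length affects sensitivity can be estimated under a simple model: an ungapped alignment of L positions, each matching independently with probability p, is found only if it contains a run of at least w consecutive matches (a contiguous word of size w). The Python sketch below computes that probability by dynamic programming over the current run length. The independent-site model is an assumption made for illustration and ignores gaps and spaced seeds; for reference, current BLAST+ releases default to a word size of 11 for the blastn task and 28 for megablast.

```python
def seed_hit_probability(length: int, identity: float, word: int) -> float:
    """P(an ungapped alignment of `length` positions, each an independent match
    with probability `identity`, contains >= `word` consecutive matches)."""
    run = [0.0] * word  # run[k] = P(no hit yet and current run of matches == k)
    run[0] = 1.0
    hit = 0.0
    for _ in range(length):
        new = [0.0] * word
        new[0] = (1.0 - identity) * sum(run)  # a mismatch resets the run
        for k in range(word - 1):
            new[k + 1] = identity * run[k]    # a match extends the run
        hit += identity * run[word - 1]       # a match completing a full word is a hit
        run = new
    return hit


# Compare two word sizes on a hypothetical 100-bp alignment at 80% identity.
for w in (11, 28):
    print(w, round(seed_hit_probability(100, 0.8, w), 3))
```

Because longer words require longer uninterrupted runs of matches, the hit probability drops sharply as the word size grows for moderately similar sequences, which is the effect the abstract attributes to default seed settings.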
ABSTRACT: Interspersed repetitive sequences are major components of eukaryotic genomes. Repetitive elements comprise about 50% of the mammalian genome. They interact with the whole genome and influence its evolution. Repetitive elements may serve as recombination hot spots or acquire specific cellular functions, such as RNA transcription control, or become part of protein-coding regions. The latter is the subject of the presented analysis. We searched all currently available vertebrate protein sequences, including the human proteome, for the presence of transposable elements. It appears that insertion of TE cassettes into open reading frames is a general phenomenon. They can be found in all vertebrate lineages and originate from all types of transposable elements. It seems that genomes use these cassettes as 'ready to use' motifs in their evolutionary experiments. Most TE cassettes are used to create alternative forms of a message, and the other form, without the TE cassette, is usually also expressed in the cell. Tables listing vertebrate messages with TE cassettes are available at http://warta.bio.psu.edu/ScrapYard/.