ABSTRACT: The limited locations of tRNA introns are crucial for recognition by the eukaryal tRNA-splicing endonuclease. However, our analysis of the nuclear genome of an early-diverged red alga, Cyanidioschyzon merolae, provided the first evidence of nuclear-encoded tRNA genes that contain ectopic and/or multiple introns. Some genes exhibited both intronic and permuted structures, in which the 3' half of the tRNA coding sequence lies upstream of the 5' half and an intron is inserted into either half. These highly disrupted tRNA genes, which account for 63% of all nuclear tRNA genes, are expressed via the orderly, sequential processing of bulge-helix-bulge (BHB) motifs at intron-exon junctions and at the termini of permuted tRNA precursors, probably by a C. merolae tRNA-splicing endonuclease with an unidentified subunit architecture. The results reveal considerable diversity in eukaryal tRNA intron properties and endonuclease architectures, which will help to elucidate the mechanism by which BHB-mediated disrupted tRNA genes were acquired.
ABSTRACT: Transcription promoters are fundamental genomic cis-elements controlling gene expression. They can be classified into two types by how dispersed their transcription start sites are: peak promoters, which initiate transcription from a narrow genomic region, and broad promoters, which initiate transcription from a wide region. Eukaryotic transcription initiation has been suggested to be associated with the genomic positions and modifications of nucleosomes. For instance, it has recently been shown that histones with H3K9 acetylation (H3K9ac) are more likely to be distributed around broad promoters than around peak promoters, which suggests an association between H3K9 acetylation and promoter architecture.
Here, we performed a systematic analysis of transcription promoters and gene expression, as well as of epigenetic histone behaviors, including genomic position, stability within the chromatin, and several modifications. We found that, in humans, broad promoters, but not peak promoters, generally had significant associations with nucleosome positioning and modification. Specifically, histones around broad promoters were densely distributed and regularly aligned. This feature was more evident for methylated or acetylated histones; moreover, nucleosome positions around broad promoters were more stable than those around peak promoters. More strikingly, genes with broad promoters carrying modified histones were expressed at significantly higher overall levels than genes with broad promoters carrying unmodified histones; this was not the case for peak promoters.
These results shed light on how epigenetic regulatory networks of histone modifications are associated with promoter architecture.
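The peak/broad distinction rests on how widely transcription start sites (TSSs) are dispersed. A minimal Python sketch, assuming TSS tag counts per genomic position as input and a hypothetical width cutoff (the abstract does not state the criterion actually used): the promoter width is taken as the span holding the central 80% of tags.

```python
from typing import Dict


def tss_spread(tag_counts: Dict[int, int], lower: float = 0.1, upper: float = 0.9) -> int:
    """Width (bp) of the region holding the central fraction of TSS tags.

    tag_counts maps genomic position -> number of TSS tags observed there.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("no TSS tags")
    cumulative = 0
    start = end = None
    for pos in sorted(tag_counts):
        cumulative += tag_counts[pos]
        if start is None and cumulative >= lower * total:
            start = pos  # first position reaching the lower quantile
        if end is None and cumulative >= upper * total:
            end = pos  # first position reaching the upper quantile
    return end - start


def classify_promoter(tag_counts: Dict[int, int], max_peak_width: int = 10) -> str:
    """Label a promoter 'peak' or 'broad'; max_peak_width is a hypothetical cutoff."""
    return "peak" if tss_spread(tag_counts) <= max_peak_width else "broad"
```

Quantile-based width is only one possible dispersion measure; entropy of the tag distribution would serve the same illustrative purpose.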
ABSTRACT: Eukaryotic chromosomal DNA coils around histones to form nucleosomes. Although histone affinity for DNA depends on DNA sequence patterns, how nucleosome positioning is determined by them remains unknown. Here, we show relationships between nucleosome positioning and two structural characteristics of DNA conferred by DNA sequence. Analysis of bendability and hydroxyl radical cleavage intensity of nucleosomal DNA sequences indicated that nucleosomal DNA is bendable and fragile and that nucleosome positional stability was correlated with characteristics of DNA. This result explains how histone positioning is partially determined by nucleosomal DNA structure, illuminating the optimization of chromosomal DNA packaging that controls cellular dynamics.
ABSTRACT: Following recent advances in high-throughput mass spectrometry (MS)-based proteomics, the numbers of identified phosphoproteins and their phosphosites have greatly increased in a wide variety of organisms. Although phosphorylation plays a critical role in controlling protein signaling, our understanding of the phosphoproteome remains limited. Here, we report unexpected, large-scale connections between the phosphoproteome and the protein interactome, revealed by integrative data-mining of yeast multi-omics data. First, new phosphoproteome data on yeast cells were obtained by MS-based proteomics and unified with publicly available yeast phosphoproteome data. This revealed that nearly 60% of ∼6,000 yeast genes encode phosphoproteins. We mapped these unified phosphoproteome data onto a yeast protein-protein interaction (PPI) network together with other yeast multi-omics datasets containing information about proteome abundance, proteome disorder, literature-derived signaling reactomes, and in vitro substratomes of kinases. In the phospho-PPI, phosphoproteins had more interacting partners than nonphosphoproteins, implying that a large fraction of intracellular protein interaction patterns (including those of protein complex formation) is affected by reversible and alternative phosphorylation reactions. Although highly abundant or unstructured proteins have a high chance of both interacting with other proteins and being phosphorylated within cells, the difference in the number of interacting partners between phosphoproteins and nonphosphoproteins was significant independently of protein abundance and disorder level. Moreover, analysis of the phospho-PPI and yeast signaling reactome data suggested that co-phosphorylation of interacting proteins by single kinases is common within cells. 
These multi-omics analyses illuminate how wide-ranging intracellular phosphorylation events and the diversity of physical protein interactions are largely affected by each other.
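The central comparison in the phospho-PPI analysis is the number of interacting partners (node degree) of phosphoproteins versus nonphosphoproteins. A minimal sketch, assuming an undirected PPI edge list and a set of phosphoprotein identifiers; the controls for abundance and disorder and the significance test described above are not reproduced here.

```python
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct interaction partners per protein (self-loops ignored)."""
    partners: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(p) for protein, p in partners.items()}


def mean_degree_by_class(edges: Iterable[Tuple[str, str]],
                         phosphoproteins: Set[str]) -> Tuple[float, float]:
    """Mean degree of phosphoproteins and of nonphosphoproteins in the network."""
    deg = degrees(edges)
    phospho = [d for p, d in deg.items() if p in phosphoproteins]
    non_phospho = [d for p, d in deg.items() if p not in phosphoproteins]
    return mean(phospho), mean(non_phospho)
```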
ABSTRACT: Phosphorylation is a ubiquitous and fundamental regulatory mechanism that controls signal transduction in living cells. The number of identified phosphoproteins and their phosphosites is rapidly increasing as a result of recent mass spectrometry-based approaches.
We analyzed time-course phosphoproteome data obtained previously by liquid chromatography-mass spectrometry with stable isotope labeling by amino acids in cell culture (SILAC). These data provide the relative phosphorylation activities of digested peptides at each of five time points after stimulation of HeLa cells with epidermal growth factor (EGF). We first calculated the correlations between the phosphorylation dynamics patterns of every pair of peptides and connected the strongly correlated pairs to construct a network. We found that peptides extracted from the same intracellular fraction (nucleus vs. cytoplasm) tended to lie close together within this phosphorylation dynamics-based network. The network was then analyzed using graph theory and compared with five known signal-transduction pathways. The dynamics-based network was correlated with known signaling pathways in the NetPath and Phospho.ELM databases, especially the EGF receptor (EGFR) signaling pathway. Although the phosphorylation patterns of many proteins were drastically changed by EGF stimulation, our results suggest that only EGFR signal transduction was both strongly activated and precisely controlled.
Constructing a phosphorylation dynamics-based network from quantitative time-course phosphoproteome data provides a useful overview of intracellular signal transduction under specific experimental conditions. Detailed prediction of signal transduction based on phosphoproteome dynamics remains challenging. However, because the phosphorylation profiles of kinase-substrate pairs on a given pathway were localized in the dynamics-based network, our method could serve as a complementary strategy for exploring new components of protein signaling pathways, in combination with existing methods (including software) that predict direct kinase-substrate relationships.
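The network construction step (correlate every pair of peptide time courses, keep strongly correlated pairs as edges) can be sketched as follows. This assumes Pearson correlation and a hypothetical cutoff of 0.9 on positive correlations only; the abstract specifies neither the correlation measure nor the threshold.

```python
import itertools
import math
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treat as unconnected
    return cov / (sx * sy)


def correlation_network(profiles: Dict[str, List[float]],
                        threshold: float = 0.9) -> Set[Tuple[str, str]]:
    """Edges between peptides whose phosphorylation time courses correlate >= threshold."""
    edges = set()
    for a, b in itertools.combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            edges.add((a, b))
    return edges
```

With five time points per peptide, as in the data described above, each profile is a list of five relative phosphorylation values; the resulting edge set can then be handed to any graph library for topological analysis.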
ABSTRACT: Recent phosphoproteome analyses using mass spectrometry-based technologies have provided new insights into the extensive presence of protein phosphorylation in various species and have raised the interesting question of how this protein modification was gained evolutionarily on such a large scale. We investigated this issue using human and mouse phosphoproteome data. We first found that phosphoproteins followed a power-law distribution with regard to their number of phosphosites: most proteins included only a few phosphosites, but some included dozens. In complex systems science, a power-law distribution, unlike more commonly observed distributions such as the normal and log-normal distributions, is considered to arise from a specific rich-get-richer process called preferential attachment growth. We therefore explored the factors that may have promoted the rich-get-richer process during phosphosite evolution. We conducted a bioinformatics analysis of the relationship between the amino acid sequences of phosphoproteins and the positions of their phosphosites, and found that phosphosites are overconcentrated in specific regions of protein surfaces, with indications that in many phosphoproteins these clusters of phosphosites are activated simultaneously. Multiple phosphosites concentrated in limited spaces on phosphoprotein surfaces may therefore function biologically as cooperative modules that are resistant to selective pressures during phosphoprotein evolution. We therefore proposed a hypothetical model in which the modularization of multiple phosphosites has resisted natural selection and has driven the rich-get-richer process of the evolutionary growth of phosphosite numbers.
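The rich-get-richer process named above can be illustrated with a Simon-type preferential attachment simulation. This is a generic textbook model chosen as an assumption for illustration, not the authors' model: at each step a new protein with one phosphosite appears with probability `p_new` (a hypothetical parameter); otherwise an existing protein gains a site with probability proportional to the number of sites it already has.

```python
import random
from typing import List


def preferential_attachment(steps: int, p_new: float = 0.2, seed: int = 0) -> List[int]:
    """Return phosphosite counts per protein after a Simon-type growth process."""
    rng = random.Random(seed)
    counts = [1]       # phosphosites per protein; start with one protein, one site
    site_owner = [0]   # one entry per site, so a uniform draw is proportional to counts
    for _ in range(steps):
        if rng.random() < p_new:
            # A new protein acquires its first phosphosite.
            counts.append(1)
            site_owner.append(len(counts) - 1)
        else:
            # An existing protein gains a site, chosen in proportion to its current sites.
            owner = rng.choice(site_owner)
            counts[owner] += 1
            site_owner.append(owner)
    return counts
```

Tabulating the result with `collections.Counter(counts)` should show the heavy-tailed shape described in the abstract: many proteins with one or a few sites and a handful with very many.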
ABSTRACT: Data-encoding synthetic DNA inserted into the genome of a living organism is thought to be more robust than current storage media. Because the living genome is duplicated and passed on to new generations, one merit of using DNA is long-term data storage within heritable media. A disadvantage of this approach is that encoded data can be unexpectedly broken by mutation, deletion, and insertion of DNA, which occur naturally during evolution and prolongation, or during laboratory experiments. For this reason, several information theory-based approaches have been developed to detect errors in damaged DNA data and thereby achieve data durability. However, these approaches cannot efficiently recover badly damaged data-encoding DNA. We recently developed a DNA data-storage approach based on multiple sequence alignment that achieves a high level of data durability. In this paper, we review this technology and discuss strategies for its optimal application.
Systems and Synthetic Biology 01/2009; 2(1-2):19-25.
ABSTRACT: The analysis of archaeal tRNA genes is becoming more important for evaluating the origin and evolution of tRNA molecules. Even with the recent accumulation of complete genomes of numerous archaeal species, several tRNA genes are still missing from the full complement needed to decode the codon table. We conducted a comprehensive screening of tRNA genes from 47 archaeal genomes using a combination of different types of tRNA prediction programs and extracted a total of 2,143 reliable tRNA gene candidates, including 437 intron-containing tRNA genes, which covered more than 99.9% of the codon tables in Archaea. Previously, intron-containing tRNA genes in Archaea were estimated to account for approximately 15% of all tRNA genes, and most introns were known to be located at the canonical position (between nucleotides 37 and 38) of precursor tRNA (pre-tRNA). Surprisingly, we observed marked enrichment of tRNA introns in five species of the archaeal order Thermoproteales: about 70% of tRNA gene candidates were intron-containing, half of these contained multiple introns, and the introns were located at various noncanonical positions. Sequence similarity analysis revealed that approximately half of the tRNA introns found at Thermoproteales-specific intron locations were highly conserved among several tRNA genes. Intriguingly, identical tRNA intron sequences were found within different types of tRNA genes that completely lacked exon sequence similarity, suggesting that the tRNA introns in Thermoproteales could have been gained via intron insertion events at a later stage of tRNA evolution. Moreover, although the CCA sequence at the 3' terminus of pre-tRNA is added by a CCA-adding enzyme after gene transcription in Archaea, most of the tRNA genes containing highly conserved introns already encode the CCA sequence at their 3' terminus. 
Based on these results, we propose possible models explaining the rapid increase of tRNA introns as a result of intron insertion events via retrotransposition of pre-tRNAs. The sequences and secondary structures of the tRNA genes and their bulge-helix-bulge motifs were registered in SPLITSdb (http://splits.iab.keio.ac.jp/splitsdb/), a novel and comprehensive database for archaeal tRNA genes.
Molecular Biology and Evolution 11/2008; 25(12):2709-16.
ABSTRACT: A computational analysis of the nuclear genome of a red alga, Cyanidioschyzon merolae, identified 11 transfer RNA (tRNA) genes in which the 3' half of the tRNA lies upstream of the 5' half in the genome. We verified that these genes are expressed and produce mature tRNAs that are aminoacylated. Analysis of tRNA-processing intermediates for these genes indicates an unusual processing pathway in which the termini of the tRNA precursor are ligated, resulting in formation of a characteristic circular RNA intermediate that is then processed at the acceptor stem to generate the correct termini.
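The processing pathway described above (ligate the precursor termini, then cut at the acceptor stem) can be modeled as string operations. A minimal sketch, assuming the precursor is laid out 5'->3' as [3' half][intervening sequence][5' half] and that the half lengths are already known; in practice they would come from structural analysis, and CCA addition and base modifications are ignored.

```python
def process_permuted(pre_trna: str, len_3half: int, len_5half: int) -> str:
    """Rebuild a mature tRNA from a permuted precursor via a circular intermediate."""
    if len_3half <= 0 or len_5half <= 0 or len_3half + len_5half > len(pre_trna):
        raise ValueError("half lengths inconsistent with precursor length")
    # Step 1: ligate the precursor termini. The circle is written out starting at the
    # first nucleotide of the 5' half, so the new 5'-half -> 3'-half junction is internal.
    circular = pre_trna[-len_5half:] + pre_trna[:-len_5half]
    # Step 2: cleave within the acceptor stem, i.e. drop the intervening sequence
    # that now follows the 3' half.
    return circular[:len_5half + len_3half]
```

For example, a hypothetical precursor `"UCCGCA" + "AAAAAA" + "GCGGAU"` with both halves of length 6 yields `"GCGGAUUCCGCA"`, with the 5' half restored to the front.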
ABSTRACT: Recent studies have proposed the interesting perspective that viral gene expression is downregulated by host microRNAs (miRNAs), small non-coding RNAs well known as post-transcriptional gene regulators. We computationally predicted human miRNA target sites within 228 human-infecting and 348 invertebrate-infecting virus genomes and observed that human-infecting viruses were more likely than invertebrate-infecting ones to be targeted by human miRNAs. We listed 62 possible human miRNA-targeted viruses from 6 families, most of which were single-stranded RNA viruses. These results suggest that miRNAs extensively mediate antiviral defenses in humans.
ABSTRACT: RNA decay is thought to exert an important influence on gene expression by maintaining a steady-state level of transcripts and/or by eliminating aberrant transcripts. However, the sequence elements that control such processes have not been determined. Upstream open reading frames (uORFs) in the transcripts of several genes are reported to control translational initiation by stalling ribosomes and thereby to promote RNA decay. We therefore performed a bioinformatic analysis of the tissue-wide expression profiles and mRNA half-lives of uORF-containing transcripts in humans and mice to assess the relationship between RNA decay and the presence of uORFs. The expression levels of transcripts containing uORFs were markedly lower than those of transcripts without uORFs. Moreover, uORF-containing transcripts also had shorter half-lives. These results suggest that uORFs are sequence elements that down-regulate RNA transcripts via RNA decay mechanisms.
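Classifying transcripts as uORF-containing requires scanning the 5' UTR for ATG-initiated reading frames. A simplified sketch, assuming the 5' UTR sequence is given on its own; uORFs that overlap the main ORF (stop codon beyond the UTR) are ignored here, and the abstract does not state which uORF definition was used.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(five_utr: str, min_codons: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs that begin and end inside the 5' UTR.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    RNA input (U) is converted to DNA letters (T).
    """
    seq = five_utr.upper().replace("U", "T")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in frame from the start codon until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    uorfs.append((start, pos + 3))
                break
    return uorfs
```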
ABSTRACT: In archaeal species, several transfer RNA genes have been reported to contain endogenous introns. Although most of these introns are located in the anticodon loop region between nucleotide positions 37 and 38, a number of introns at noncanonical sites and six cases of tRNA genes containing two introns have also been documented. However, these tRNA genes are often missed by tRNAscan-SE, the software most widely used for the annotation of tRNA genes. We previously developed SPLITS, a computational tool that identifies tRNA genes containing one intron at a noncanonical position on the basis of its discriminative splicing motif, but the software was limited in detecting tRNA genes with multiple introns at noncanonical sites. In this study, we first updated the system as SPLITSX so that it correctly predicts known tRNA genes as well as novel ones with multiple introns. By a comprehensive search for tRNA genes in 29 archaeal genomes using SPLITSX, we listed 43 novel candidates that contain introns at noncanonical sites. Of these, 15 contained two introns and three contained three introns within the respective putative tRNA genes. Moreover, novel candidates that were not detectable by tRNAscan-SE alone completely covered all the codons of two archaeal species, the uncultured methanogenic archaeon RC-I and Thermofilum pendens Hrk 5.
ABSTRACT: The practical realization of DNA data storage is a major scientific goal. Here we introduce a simple, flexible, and robust data storage and retrieval method based on sequence alignment of the genomic DNA of living organisms. Duplicated data encoded by different oligonucleotide sequences were inserted redundantly into multiple loci of the Bacillus subtilis genome. Multiple alignment of the bit data sequences decoded from B. subtilis genome sequences enabled the retrieval of stable and compact data without the need for template DNA, parity checks, or error-correcting algorithms. Combined with a computational simulation of data retrieval from mutated message DNA, a practical use of this alignment-based method is discussed.
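The retrieval idea (decode each redundant copy to bits, then reconcile the copies) can be sketched under a strong simplifying assumption: copies differ only by substitutions, so a multiple alignment reduces to a per-position majority vote. The 2-bit code below is hypothetical; the published method used its own encoding and different oligonucleotide sequences per copy, and handling insertions or deletions requires a real alignment step.

```python
from collections import Counter
from typing import List

# Hypothetical 2-bit-per-nucleotide code, for illustration only.
ENCODE = {"00": "A", "01": "C", "10": "G", "11": "T"}
DECODE = {base: bits for bits, base in ENCODE.items()}


def encode(bits: str) -> str:
    """Map an even-length bit string to DNA, two bits per nucleotide."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> str:
    """Map DNA back to bits."""
    return "".join(DECODE[base] for base in dna)


def consensus_bits(copies: List[str]) -> str:
    """Decode every copy, then take a per-bit majority vote across copies.

    Assumes substitutions only (equal-length copies); indels would need alignment first.
    """
    decoded = [decode(c) for c in copies]
    if len({len(d) for d in decoded}) != 1:
        raise ValueError("copies differ in length; align them first")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*decoded))
```

With three copies of `encode("0110110001")` (`"CGTAC"`), each carrying a different point mutation, the vote recovers the original bits.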
ABSTRACT: Analysis and visualization of biological networks, such as protein-protein and protein-DNA interactions, are crucial for obtaining a thorough understanding of living systems. Here, we present an integrative software platform, eXpanda, which enables the analysis of a very broad range of biological networks, with a special focus on extracting characteristic topologies that potentially function as units in the networks. eXpanda is provided as a Perl library that offers fully automatic connections to various biological databases via a Perl programmable interface and can perform topological analysis based on graph theory. The results of these analyses can be visualized as vector graphics. eXpanda is released under the GNU General Public License. The software package, detailed documentation, source code, and sample scripts can be downloaded at http://medcd.iab.keio.ac.jp/expanda/.
ABSTRACT: The majority of intrinsic rho-independent terminator signals, reported to consist of stable hairpin structures followed by T-rich regions, possess the potential to operate bi-directionally and to induce transcription terminations on both strands of the DNA duplex in Escherichia coli. By using RNAMotif software, we investigated the distributions of termination motifs around the 3'-ends of overlapping and non-overlapping genes at the genomic level. We suggest that the positions of compactly encoded E. coli genes and rho-independent terminators are optimized to terminate the adjoining genes on their antisense strands efficiently, and not to mis-terminate overlapping transcripts, due to their bi-directional properties.
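The bi-directional property follows from the motif itself: a hairpin flanked by an A-run upstream and a T-run downstream reads as a hairpin followed by a T-run on both strands. A deliberately crude sketch (perfect stems only, no free-energy evaluation, hypothetical default parameters) rather than the RNAMotif descriptor used in the study:

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_terminators(seq: str, stem: int = 6, loop_min: int = 3, loop_max: int = 8,
                     t_run: int = 4) -> List[int]:
    """Start positions of a perfect inverted-repeat stem with a loop_min..loop_max nt
    loop, immediately followed by at least t_run Ts."""
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:
                break
            if right == revcomp(left) and seq[j + stem:j + stem + t_run] == "T" * t_run:
                hits.append(i)
                break
    return hits


def both_strands(seq: str, **params) -> Tuple[List[int], List[int]]:
    """Hits on the given strand and on its reverse complement (reverse-strand coordinates)."""
    return find_terminators(seq, **params), find_terminators(revcomp(seq), **params)
```

For a sequence such as `"AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTT" + "CC"`, both strands report a hit; replacing the leading A-run with Cs leaves only the forward-strand hit.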
ABSTRACT: We developed a computational method to predict the retention times of peptides in HPLC using artificial neural networks (ANN). We performed stepwise multiple linear regression and selected as ANN inputs the amino acids that significantly affected LC retention time. Unlike conventional linear models, the trained ANN accurately predicted the retention times of peptides containing up to 50 amino acid residues. For 834 peptides, there was a strong correlation (R² = 0.928) between measured and predicted retention times. We demonstrated the utility of our method by predicting the retention times of 121,273 peptides resulting from LysC digestion of the Escherichia coli proteome. Our approach is useful for the proteome-wide characterization of peptides and the identification of unknown peptide peaks obtained in proteome analysis.
Journal of Proteome Research 01/2007; 5(12):3312-7.
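The model inputs described above are amino acid counts for the residues retained by stepwise regression. A minimal sketch of that feature encoding; the `selected` argument stands in for the retained subset, which the abstract does not list, and the network itself is not shown.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(peptide: str, selected: str = AMINO_ACIDS) -> List[int]:
    """Count each selected residue in the peptide; the counts serve as model inputs.

    `selected` is a placeholder for the residue subset kept by stepwise regression.
    """
    peptide = peptide.upper()
    return [peptide.count(aa) for aa in selected]
```

Each peptide thus becomes a fixed-length vector regardless of its length, which is what lets a single regression model or ANN handle peptides of up to 50 residues.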
ABSTRACT: A new mathematical index was developed to identify and characterize non-coding RNA (ncRNA) genes encoded within the Escherichia coli (E. coli) genome. Designated the GMMI (Gapped Markov Model Index), it was used to evaluate, on the basis of a Markov model, sequence patterns at separated positions arising from consensus sequences, codon biases, and/or possible RNA structures. The GMMI was able to separate a set of known mRNA sequences from a mixture of ncRNAs including tRNAs and rRNAs. The GMMI was then employed to predict novel ncRNA candidates. First, possible transcription units were extracted from the E. coli genome using consensus sequences for the σ70 promoter and the rho-independent terminator. These units were then evaluated using the GMMI. This identified 133 candidate ncRNAs, which included 29 previously annotated small RNA genes and 46 possible antisense ncRNAs. Furthermore, 12 transcripts (including five antisense RNAs) were confirmed by expression analysis. These data suggest that the expression of small antisense RNAs might be more common in the E. coli genome than previously thought.
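A gapped Markov model conditions the base at position i + gap on the base at position i. The exact GMMI formula is not given in the abstract, so this sketch is a generic stand-in under stated assumptions: train gapped transition probabilities (with pseudocounts) on two training sets, such as ncRNAs and mRNAs, and score a candidate by its mean log-odds between the two models.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BASES = "ACGT"


def train_gapped(seqs: Iterable[str], gap: int,
                 pseudo: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Estimate P(base at i+gap | base at i) with pseudocounts (pseudo must be > 0)."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for s in seqs:
        for i in range(len(s) - gap):
            a, b = s[i], s[i + gap]
            if a in BASES and b in BASES:  # skip ambiguous bases such as N
                counts[(a, b)] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[(a, b)] for b in BASES) + pseudo * len(BASES)
        for b in BASES:
            probs[(a, b)] = (counts[(a, b)] + pseudo) / total
    return probs


def log_odds(seq: str, model_a: Dict[Tuple[str, str], float],
             model_b: Dict[Tuple[str, str], float], gap: int) -> float:
    """Mean per-pair log-likelihood ratio of model_a versus model_b."""
    pairs = [(seq[i], seq[i + gap]) for i in range(len(seq) - gap)
             if seq[i] in BASES and seq[i + gap] in BASES]
    if not pairs:
        raise ValueError("no scorable base pairs at this gap")
    return sum(math.log(model_a[p] / model_b[p]) for p in pairs) / len(pairs)
```

Positive scores favor model_a. Scores from several gap sizes could be combined, but how the published index weights them is not stated in the abstract.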
ABSTRACT: In the archaea, some tRNA precursors contain intron(s) not only in the anticodon loop region but also at diverse sites of the gene (intron-containing tRNA or cis-spliced tRNA). The parasite Nanoarchaeum equitans, a member of the Nanoarchaeota kingdom, creates functional tRNA from separate genes, one encoding the 5' half and the other the 3' half (split tRNA or trans-spliced tRNA). Although recent genome projects have produced a huge amount of archaeal nucleotide sequence data, a comprehensive methodology for searching for intron-containing and split tRNAs has yet to be established. We therefore developed SPLITS, which is designed to search for any type of tRNA gene and focuses especially on intron-containing and split tRNAs at the genome level. SPLITS first predicts the bulge-helix-bulge splicing motif (a well-known structure required in archaeal pre-tRNA introns) to determine and remove the intronic regions of tRNA genes. The intron-removed DNA sequences are then automatically submitted to tRNAscan-SE. SPLITS can predict known tRNAs with single introns located at unconventional sites (100%), tRNAs with double introns (85.7%), and known split tRNAs (100%). Our program will be very useful for identifying novel tRNA genes after the completion of genome projects. The SPLITS source code is freely downloadable at http://splits.iab.keio.ac.jp/.
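The intron-removal step (cut out the regions delimited by predicted BHB motifs, then pass the result on for tRNA scanning) reduces to splicing out coordinate ranges. A minimal sketch, assuming the intron coordinates are already known; the BHB-motif prediction itself and the hand-off to tRNAscan-SE are not reproduced.

```python
from typing import List, Tuple


def remove_introns(precursor: str, introns: List[Tuple[int, int]]) -> str:
    """Splice out introns given 0-based, end-exclusive (start, end) coordinates.

    In a SPLITS-like pipeline the coordinates would come from a BHB-motif search;
    here they are supplied directly. Multiple introns are supported.
    """
    exons = []
    prev_end = 0
    for start, end in sorted(introns):
        if start > end or start < prev_end or end > len(precursor):
            raise ValueError(f"invalid or overlapping intron ({start}, {end})")
        exons.append(precursor[prev_end:start])  # exon before this intron
        prev_end = end
    exons.append(precursor[prev_end:])  # final exon
    return "".join(exons)
```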
ABSTRACT: Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases.
Here, we present a hybrid method for genome-fingerprint scanning, known as HybGFS. This technique combines genome sequence-based peptide MS/MS ion searching with liquid-chromatography elution-time (LC-ET) prediction to improve the reliability of identification. The hybrid method allows the simultaneous identification and mapping of proteins without a priori information about their coding sequences. The current study used standard LC-MS/MS data to query an in silico six-reading-frame translation of an entire genome and its enzymatic digest. Used in conjunction with precursor/product ion-mass searching, the LC-ETs increased confidence in the peptide-identification process and reduced the number of false-positive matches. The power of this method was demonstrated using recombinant proteins from the Escherichia coli K12 strain.
The novel hybrid method described in this study will be useful for the large-scale experimental confirmation of genome coding sequences, without the need for transcriptome-level expression analysis or costly MS database searching.
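The in silico search space described above (a six-reading-frame translation of the genome followed by an enzymatic digest) can be sketched as follows. The standard genetic code is used, and trypsin-like cleavage (after K or R, not before P) is an assumption: the abstract says only "enzymatic digest". Stop codons are treated as peptide boundaries.

```python
from typing import List

_BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(dna: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(_COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, rev) for f in range(3)]


def tryptic_peptides(protein: str, min_len: int = 1) -> List[str]:
    """Cleave after K or R unless followed by P; a stop ('*') also ends a peptide."""
    peptides: List[str] = []
    current = ""
    for i, aa in enumerate(protein):
        if aa != "*":
            current += aa
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa == "*" or (aa in "KR" and next_aa != "P"):
            if current and len(current) >= min_len:
                peptides.append(current)
            current = ""
    if current and len(current) >= min_len:
        peptides.append(current)
    return peptides
```

Each resulting peptide would then be matched against MS/MS ion masses and, in the hybrid method, filtered by predicted LC elution time.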
ABSTRACT: MicroRNAs (miRNAs) are endogenous non-coding RNAs of approximately 22 nucleotides (nt) that post-transcriptionally regulate the expression of target genes via hybridization to target mRNA. Using known pairs of miRNA and target mRNA in Caenorhabditis elegans, we first performed a computational analysis of specific hybridization patterns between these two RNAs. We counted the numbers of perfectly complementary dinucleotide sequences and calculated the free energy within the complementary base pairs of each dinucleotide, observed by sliding a 2-nt window along all nucleotides of the miRNA-mRNA duplex. We confirmed not only strong base pairing within the 5' region of miRNAs (nts 1-8) in C. elegans but also the required mismatch within the central region (nt 9 or nt 10), and we found weak binding within the 3' region (nts 13-14). We also predicted 687 possible miRNA target transcripts, many of which are thought to be involved in C. elegans development, by combining this hybridization tendency with the following analyses: (1) prediction of the miRNA-mRNA duplex by free-energy minimization; (2) identification of the complementary pattern within the miRNA-mRNA duplex; (3) conservation of target sites between C. elegans and C. briggsae, a related soil nematode; and (4) extraction of mRNA candidates with multiple target sites. Rigorous tests using shuffled miRNA controls supported these predictions. Our results suggest that miRNAs recognize their target mRNAs by their hybridization pattern and that many target mRNAs may be regulated through a combination of several specific miRNA target sites in C. elegans.
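The strong 5'-region pairing noted above is what seed-match searches exploit: a candidate site is a stretch of target mRNA that is the reverse complement of the miRNA seed. A minimal sketch, assuming perfect Watson-Crick pairing (no G:U wobble) and seed bounds as parameters; the default of nts 2-8 is a common convention and an assumption here, while the abstract refers to nts 1-8. Free-energy minimization, conservation, and multiple-site filtering are not shown.

```python
from typing import List

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_sites(mirna: str, target: str, seed_start: int = 2, seed_end: int = 8) -> List[int]:
    """0-based positions in target (5'->3') that pair perfectly with the miRNA seed.

    seed_start and seed_end are 1-based, inclusive miRNA positions.
    DNA letters (T) are converted to RNA (U).
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    # The target site runs antiparallel to the miRNA, so take the reverse complement.
    site = "".join(PAIR[b] for b in reversed(seed))
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]
```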