ABSTRACT: A central undertaking in synthetic biology (SB) is the quest for the 'minimal genome'. However, 'minimal sets' of essential genes are strongly context-dependent and, in all prokaryotic genomes sequenced to date, not a single protein-coding gene is entirely conserved. Furthermore, a lack of consensus in the field as to what attributes make a gene truly essential adds another aspect of variation. Thus, a universal minimal genome remains elusive. Here, as an alternative to defining a minimal genome, we propose that the concept of gene persistence can be used to classify genes needed for robust long-term survival. Persistent genes, although not ubiquitous, are conserved in a majority of genomes, tend to be expressed at high levels, and are frequently located on the leading DNA strand. These criteria impose constraints on genome organization, and these are important considerations for engineering cells and for creating cellular life-like forms in SB.
Trends in Genetics 12/2012; 29(5). DOI:10.1016/j.tig.2012.11.001
ABSTRACT: Geochemistry often reveals unexpected (anti)correlations. Arsenic (As) and selenium (Se) are cases in point. We explore the hypothesis that bacteria living in an As-replete environment recruited a biological process involving Se and sulfur to fulfil their need for As detoxification. In analogy with the formation of arsenolipids and arsenosugars, which are common non-toxic As metabolites derived from microbial and plant metabolism, we attempt to explain the prevalence of novel sulfur-containing As derivatives, in particular monothioarsenate, in the aqueous environment. Thiolated-As species have been overlooked so far mainly because of the difficulty of their identification. Based on comparative genomics, we propose a scenario where SelD and SelU proteins, commonly used to make selenophosphate and modify transfer RNA, have been recruited to make monothioarsenate, a relatively innocuous arsenical. This hypothesis is discussed in terms of the relative geochemical distribution of Se and As.
ABSTRACT: Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms - persistent genes - and those present in very few organisms - rare genes.
We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters.
We show that it is the strong selective pressure acting on the function of persistent genes, combined with a permanent flux of genes that keeps bacterial genome size fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering.
ABSTRACT: Genes consistently present in a clique of genomes, preferring the leading DNA strands, are deemed persistent. The persistent bacterial proteome organises around intermediary and RNA metabolism, and RNA-related information transfer, with a significant contribution to compartmentalisation. Despite inevitable losses during evolution, the extant persistent proteome displays functions present early on. Proteins coded by genes staying clustered in a majority of genomes constitute a network of mutual attraction made up of three concentric circles. The outer one, mostly devoted to metabolism, breaks into small pieces and fades away. The second, more continuous, one organises around class I tRNA synthetases. The well-connected inner circle comprises the ribosome and information transfer. This reflects the progressive construction of cells, starting from the metabolism of coenzymes, nucleotides and fatty acids-related molecules. Subsequently, a core set of aminoacyl-tRNA synthetases scaffolded around RNA, connected to cell division machinery and organised metabolism around translation. This remarkable organisation reflects the evolution of life from small-molecule metabolism to the RNA world, suggesting that extant microorganisms carry the marks of the ancient processes that created life. Further analysis suggests that RNA degradation, associated with the presence of iron, still plays a role in extant metabolism, including the evolution of genome structures.
ABSTRACT: Oligoribonuclease is the only RNase in Escherichia coli that is able to degrade RNA oligonucleotides five residues and shorter in length. Firmicutes including Bacillus subtilis do not have an Oligoribonuclease (Orn) homologous protein and it is not yet understood which proteins accomplish the equivalent function in these organisms. We had previously identified oligoribonucleases Orn from E. coli and its human homolog Sfn in a screen for proteins that are regulated by 3'-phosphoadenosine 5'-phosphate (pAp). Here, we identify YtqI as a potential functional analog of Orn through its interaction with pAp. YtqI degrades RNA oligonucleotides in vitro with preference for 3-mers. In addition, YtqI has pAp-phosphatase activity in vitro. In agreement with these data, YtqI is able to complement both orn and cysQ mutants in E. coli. An ytqI mutant in B. subtilis shows impairment of growth in the absence of cysteine, a phenotype resembling that of a cysQ mutant in E. coli. Phylogenetic distribution of YtqI, Orn and CysQ supports bifunctionality of YtqI.
Nucleic Acids Research 02/2007; 35(13):4552-61. DOI:10.1093/nar/gkm462
ABSTRACT: Gene essentiality in bacteria has been identified in silico, focusing on gene persistence, or experimentally, focusing on the growth of knockouts in rich media. Comparing 55 genomes of Firmicutes and Gamma-proteobacteria to identify the genes which, while persistent among genomes, do not lead to a lethal phenotype when inactivated, we show that the characteristics of persistence, conservation, expression, and location are shared between persistent nonessential (PNE) genes and experimentally essential genes. PNE genes show an overrepresentation of genes related to maintenance and stress response. This outlines the limits of current experimental techniques to define gene essentiality and highlights the essential role of genes implicated in maintenance which, although dispensable for growth, are not dispensable from an evolutionary point of view. Firmicutes and Gamma-proteobacteria differ mostly in the construction of the cell envelope, DNA replication and proofreading, and RNA degradation. In addition to suggesting functions for persistent genes that had until now resisted identification, we show that these genes have many characters in common with experimentally identified essential genes. They should then be regarded as truly essential genes.
Molecular Biology and Evolution 12/2005; 22(11):2147-56. DOI:10.1093/molbev/msi211
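The persistence criterion described in the abstract above can be illustrated with a minimal sketch. It assumes that persistence is scored as the fraction of genomes carrying a gene, that genes are already grouped into ortholog families, and that a cutoff separates persistent from non-persistent genes. The function names, the 0.85 default threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def persistence(genomes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of genomes in which each gene family is present."""
    n_genomes = len(genomes)
    counts: Dict[str, int] = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: c / n_genomes for gene, c in counts.items()}


def classify(
    genomes: Dict[str, Set[str]],
    essential: Set[str],
    threshold: float = 0.85,  # hypothetical cutoff, not the paper's value
) -> Dict[str, Set[str]]:
    """Split persistent genes into experimentally essential and PNE sets."""
    scores = persistence(genomes)
    persistent = {g for g, s in scores.items() if s >= threshold}
    return {
        "persistent_essential": persistent & essential,
        "PNE": persistent - essential,  # persistent but not lethal when inactivated
    }


# Toy data for illustration only.
toy_genomes = {
    "A": {"rpsA", "recA", "lacZ"},
    "B": {"rpsA", "recA"},
    "C": {"rpsA", "recA"},
    "D": {"rpsA"},
}
toy_result = classify(toy_genomes, essential={"rpsA"}, threshold=0.75)
```

With this toy input, a gene found in three of four genomes but absent from the essential set lands in the PNE class, which is the category the abstract argues should still be treated as essential over evolutionary time.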
ABSTRACT: A considerable fraction of life develops in the sea at temperatures lower than 15 °C. Little is known about the adaptive features selected under those conditions. We present the analysis of the genome sequence of the fast-growing Antarctic bacterium Pseudoalteromonas haloplanktis TAC125. We find that it copes with the increased solubility of oxygen at low temperature by multiplying dioxygen scavenging while deleting whole pathways producing reactive oxygen species. Dioxygen-consuming lipid desaturases achieve both protection against oxygen and synthesis of lipids making the membrane fluid. A remarkable strategy for avoidance of reactive oxygen species generation is developed by P. haloplanktis, with elimination of the ubiquitous molybdopterin-dependent metabolism. The P. haloplanktis proteome reveals a concerted amino acid usage bias specific to psychrophiles, consistently appearing apt to accommodate asparagine, a residue prone to make proteins age. Adding to its originality, P. haloplanktis further differs from its marine counterparts with recruitment of a plasmid origin of replication for its second chromosome.
Genome Research 11/2005; 15(10):1325-35. DOI:10.1101/gr.4126905
ABSTRACT: The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects.
The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented.
Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns.
This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis), associated with related organisms for comparison.
ABSTRACT: Two putative methionine aminopeptidase genes, map (essential) and yflG (non-essential), were identified in the genome sequence of Bacillus subtilis. We investigated whether they can function as methionine aminopeptidases and further explored possible reasons for their essentiality or dispensability in B. subtilis.
In silico analysis of MAP evolution uncovered a coordinated pattern of MAP and deformylase that did not correlate with the pattern of 16S RNA evolution. Biochemical assays showed that both MAP (MAP_Bs) and YflG (YflG_Bs) from B. subtilis overproduced in Escherichia coli and obtained as pure proteins exhibited a methionine aminopeptidase activity in vitro. Compared with MAP_Bs, YflG_Bs was approximately two orders of magnitude more efficient when assayed on synthetic peptide substrates. Both map and yflG genes expressed in multi-copy plasmids could complement the function of a defective map gene in the chromosomes of both E. coli and B. subtilis. In contrast, lacZ gene transcriptional fusions showed that the promoter activity of map was 50 to 100-fold higher than that of yflG. Primer extension analysis detected the transcription start site of the yflG promoter. Further work identified that YvoA acted as a possible weak repressor of yflG expression in B. subtilis in vivo.
Both MAP_Bs and YflG_Bs are functional methionine aminopeptidases in vitro and in vivo. The high expression level of map and low expression level of yflG may account for their essentiality and dispensability in B. subtilis, respectively, when cells are grown under laboratory conditions. Their difference in activity on synthetic substrates suggests that they have different protein targets in vivo.
ABSTRACT: A brief introduction to the genome databases GDB, GenoList and Ensembl is given. These databases, mirrored and maintained at the Centre of Bioinformatics, Peking University, provide useful information for genome research.
ABSTRACT: PepPat, a hybrid method that combines pattern matching with similarity scoring, is described. We also report PepPat's application in the identification of a novel tachykinin-like peptide. PepPat takes as input a query peptide and a user-specified regular expression pattern within the peptide. It first performs a database pattern match and then ranks candidates on the basis of their similarity to the query peptide. PepPat calculates similarity over the pattern-spanning region, enhancing PepPat's sensitivity for short query peptides. PepPat can also search for a user-specified number of occurrences of a repeated pattern within the target sequence. We illustrate PepPat's application in short peptide ligand mining. As a validation example, we report the identification of a novel tachykinin-like peptide, C14TKL-1, and show it is an NK1 (neurokinin receptor 1) agonist whose message is widely expressed in human periphery. Availability: PepPat is offered online at: http://peppat.cbi.pku.edu.cn.
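The two-stage idea in the abstract above (pattern match first, then rank by similarity over the pattern-spanning region) can be sketched roughly as follows. This is not PepPat's implementation: the scoring used here is a plain fraction of identical positions, chosen as a stand-in because the abstract does not specify the similarity measure, and the function names, pattern, and sequences are hypothetical toy values.

```python
import re
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter of two strings."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def pattern_then_rank(
    query: str, pattern: str, database: Dict[str, str]
) -> List[Tuple[str, int, float]]:
    """Return (name, start, score) hits, best score first."""
    rx = re.compile(pattern)
    q_match = rx.search(query)
    if q_match is None:
        raise ValueError("pattern must occur within the query peptide")
    q_region = q_match.group(0)  # pattern-spanning region of the query

    hits: List[Tuple[str, int, float]] = []
    for name, seq in database.items():
        # Stage 1: regular-expression match against each database sequence.
        for m in rx.finditer(seq):
            # Stage 2: score only the matched region against the query region.
            hits.append((name, m.start(), identity(q_region, m.group(0))))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Toy sequences for illustration only.
toy_db = {"seq1": "AAAFFGLMKK", "seq2": "TTFVGLMGG", "seq3": "CCCCCC"}
toy_hits = pattern_then_rank("RPKPQQFFGLM", "F.GLM", toy_db)
```

Restricting the score to the matched region keeps flanking residues from diluting the signal, which is one plausible reading of why the abstract reports better sensitivity for short query peptides.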
ABSTRACT: 54 human genes were selected as test targets for parallel cloning, expression, purification and crystallization. Proteins from these genes were selected to have a molecular weight of between 14 and 50 kDa, not to have a high percentage of hydrophobic residues (i.e. more likely to be soluble), to have no known crystal structure and not to be known subunits of heterocomplexes. Four proteins containing transmembrane regions were selected for comparative tests. To date, 44 expression clones have been constructed with the Gateway cloning system (Invitrogen, The Netherlands). Of these, 35 clones were expressed as recombinant proteins in Escherichia coli strain BL21 (DE3)-pLysS, of which 12 were soluble and four have been purified to homogeneity. Crystallization conditions were screened for the purified proteins in 96-well plates under oil. After further refinement with the same device or by the hanging-drop method, crystals were grown, with needle, plate and prism shapes. A 2.12 Å data set was collected for protein NCC27. The results provide insights into the high-throughput target selection, cloning, expression and crystallization of human genomic proteins.
Acta Crystallographica Section D Biological Crystallography 01/2003; 58(Pt 12):2102-8. DOI:10.1107/S0907444902016359