ABSTRACT: Many proteins can be modified by multiple types of post-translational modifications (Mtp-proteins). Although some post-translational modifications (PTMs) have recently been found to be associated with life-threatening diseases such as cancers and neurodegenerative disorders, the underlying mechanisms remain enigmatic. In this study, we examined the relationship between human Mtp-proteins and disease and systematically characterized the features of these proteins. Our results indicated that Mtp-proteins are significantly more inclined to participate in disease than proteins carrying no known PTM sites. Mtp-proteins were found to be significantly enriched in protein complexes, to have more protein partners, and to act preferentially as hubs/super-hubs in protein-protein interaction (PPI) networks. They possess a distinct functional focus, such as chromatin assembly or disassembly, and reside in biased, multiple subcellular localizations. Moreover, most Mtp-proteins harbor more intrinsically disordered regions than other proteins. Mtp-proteins carrying PTM types biased towards ordered regions were mainly related to protein-DNA complex assembly. Examination of the energetic effects of PTMs on the stability of PPIs revealed that only a small fraction of single PTM events change the binding energy by >2 kcal/mol, whereas combinations of multiple PTM types can change the binding energy dramatically. Our work not only expands the understanding of Mtp-proteins but also discloses the potential of Mtp-proteins to act as key elements in disease development.
Journal of Proteome Research 04/2014; · 5.06 Impact Factor
ABSTRACT: Salmonella enterica serotype Typhimurium human blood strains isolated from outside Africa are rarely sequenced. Here, we report the draft genome sequences of two S. Typhimurium clinical strains isolated in the same year, one from blood and another from stool, in order to gain insights into the genetic basis leading to invasive diseases.
ABSTRACT: In bacteria, small regulatory non-coding RNAs (sRNAs) are the most abundant class of post-transcriptional regulators. They are involved in diverse processes including quorum sensing, stress response, virulence and carbon metabolism. Recent developments in high-throughput techniques, such as genomic tiling arrays and RNA-Seq, have allowed efficient detection and characterization of bacterial sRNAs. However, a comprehensive repository to host sRNAs and their annotations is not available. Existing databases cover only a limited number of bacterial species or sRNAs. In addition, these databases do not have tools to integrate or analyse high-throughput sequencing data. Here, we have developed BSRD (http://kwanlab.bio.cuhk.edu.hk/BSRD), a comprehensive bacterial sRNA database, as a repository for published bacterial sRNA sequences with annotations and expression profiles. BSRD contains over nine times more experimentally validated sRNAs than any other available database. BSRD also provides combinatorial regulatory networks of transcription factors and sRNAs with their common targets. We have built and implemented in BSRD a novel RNA-Seq analysis platform, sRNADeep, to characterize sRNAs in large-scale transcriptome sequencing projects. We will update BSRD regularly.
Nucleic Acids Research 11/2012; · 8.81 Impact Factor
ABSTRACT: Salmonella enterica serovar Typhimurium is one of the most prevalent serovars of Salmonella that causes human gastroenteritis. Here, we report the draft genome sequence of the S. Typhimurium multidrug-resistant strain ST1660/06. Comparative genomic analysis unveiled three strain-specific genomic islands that potentially confer the multidrug resistance and virulence of the strain.
Journal of bacteriology 11/2012; 194(22):6319-20. · 3.94 Impact Factor
ABSTRACT: Pyrosequencing techniques allow scientists to sequence prokaryotic genomes and obtain draft genomic sequences within a few days. However, assemblies from shotgun sequencing are usually composed of hundreds of contigs. A further multiplex PCR procedure is needed to fill all the gaps and link the contigs into a complete chromosomal sequence, which is the basis for prokaryotic comparative genomic studies. In this article, we study various pyrosequencing strategies by simulated assembly of 100 prokaryotic genomes.
The simulation study shows that a single end 454 Jr. run combined with a paired end 454 Jr. run (8 kb library) can produce: 1) ~90% of 100 assemblies with < 10 scaffolds and ~95% of 100 assemblies with < 150 contigs; 2) an average contig N50 size of over 331 kb; 3) an average single base accuracy of > 99.99%; 4) an average false gene duplication rate of < 0.7%; 5) an average false gene loss rate of < 0.4%.
A single end 454 Jr. run combined with a paired end 454 Jr. run (8 kb library) is a cost-effective way to perform prokaryotic whole genome sequencing. This strategy provides a solution for producing high quality draft assemblies for most prokaryotic organisms within days. Because the number of assembled scaffolds is small, the subsequent multiplex PCR procedure (for gap filling) would be easy. As a result, large scale prokaryotic whole genome sequencing projects may be finished within weeks.
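The contig N50 figure quoted in this abstract is a standard assembly summary statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. The abstract does not describe how the authors computed it, so the following is only a minimal sketch of the usual definition; the function name `n50` and the example lengths are illustrative assumptions, not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Standard definition (assumed here, not taken from the abstract):
    sort contigs from longest to shortest and accumulate their lengths;
    N50 is the length of the contig at which the running sum first
    reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is reported alongside the contig and scaffold counts.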
ABSTRACT: A large-scale Escherichia coli O104:H4 outbreak occurred in Germany from May to July 2011, causing numerous cases of hemolytic-uremic syndrome (HUS) and deaths. Genomes of ten outbreak isolates and a historical O104:H4 strain isolated in 2001 were sequenced using different new generation sequencing platforms. Phylogenetic analyses were performed using various approaches which either are not genome-wide or may be subject to errors due to poor sequence alignment. Also, detailed pathogenicity analyses on the 2001 strain were not available.
We reconstructed the phylogeny of E. coli using the genome-wide, alignment-free feature frequency profile method. The analysis revealed the 2001 strain to be the closest relative of the 2011 outbreak strain among all currently available E. coli strains, and confirmed findings from previous alignment-based phylogenetic studies that the HUS-causing O104:H4 strains are more closely related to typical enteroaggregative E. coli (EAEC) than to enterohemorrhagic E. coli. Detailed re-examination of pathogenicity-related virulence factors and secreted proteins showed that the 2001 strain possesses virulence factors shared between typical EAEC and the 2011 outbreak strain.
Our study represents the first attempt to elucidate the whole-genome phylogeny of the 2011 German outbreak using an alignment-free method, and suggests a direct line of ancestry leading from a putative EAEC-like ancestor through the 2001 strain to the 2011 outbreak strain.
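The alignment-free idea behind a feature frequency profile comparison can be illustrated in a few lines: each sequence is reduced to the relative frequencies of its short subsequences (k-mers), and two sequences are compared by a divergence between those frequency profiles. The abstract gives neither the feature length nor the distance measure, so this is a simplified sketch under stated assumptions: the function names, the value of `k`, the toy sequences, and the choice of Jensen-Shannon divergence are all illustrative and should not be read as the authors' pipeline.

```python
from collections import Counter
from math import log2


def kmer_profile(seq, k):
    """Relative frequency of every overlapping k-mer in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Ranges from 0 (identical profiles) to 1 (no shared k-mers).
    Assumed distance measure; the abstract does not name one.
    """
    divergence = 0.0
    for kmer in set(p) | set(q):
        pi = p.get(kmer, 0.0)
        qi = q.get(kmer, 0.0)
        mi = (pi + qi) / 2
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence


# Toy sequences, purely illustrative; real genomes need a much larger k.
seq_a = "ACGTACGTACGGTACC"
seq_b = "ACGTACGAACGGTTCC"
print(js_divergence(kmer_profile(seq_a, 3), kmer_profile(seq_b, 3)))
```

Computing this divergence for every pair of genomes yields a distance matrix, which a standard tree-building method can then turn into a phylogeny without any sequence alignment step.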