Article

Cloud computing and the DNA data race.

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
Nature Biotechnology (Impact Factor: 32.44). 07/2010; 28(7):691-3. DOI: 10.1038/nbt0710-691
Source: PubMed
  • ABSTRACT: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive datasets.
    BMC Bioinformatics 01/2014; 15 Suppl 9:S2. · 3.02 Impact Factor
  • ABSTRACT: The collection and utilization of plant genetic resources have had a large impact on balancing the genetic diversity of existing crop plant species, and their application in genome-based studies has also increased widely. Studies were initially based on model species, but the information gained is now increasingly transferable to crops and related species. The rapid spread of new high-throughput technologies such as next-generation sequencing (NGS), together with reductions in their cost, is bringing many more plants within the range of genome- and transcriptome-level analysis. The completion of reference genome sequences for many important crops and the ability to perform high-throughput resequencing are providing opportunities to improve our understanding of crop plant genetic resources and to accelerate crop improvement. The future of crop improvement will be centred on comparisons of individual crop plant genomes, and some of the best opportunities may lie in using combinations of new genetic mapping strategies and evolutionary analyses to direct and optimize the discovery and use of genetic variation. Here I review the importance of crop plant genetic resources and the insights that have emerged in recent years.
    Advances in Bioscience and Biotechnology 01/2012; 3(4):378-385.
  • ABSTRACT: A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E. coli and human assemblies, respectively. Thus, GCE was found to outperform EMR in terms of both cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
    PLoS ONE 01/2014; 9(9):e108490. · 3.53 Impact Factor
