MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit

Argonne National Laboratory, United States of America
PLoS ONE (Impact Factor: 3.23). 10/2012; 7(10):e47656. DOI: 10.1371/journal.pone.0047656
Source: PubMed


MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality-control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and to predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, which are commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs used in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes, resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at



Available from: Shinichi Sunagawa, Oct 01, 2015
    • "Recently, new powerful pipelines to perform the full analysis have been released. Among them we highlight MOCAT (Kultima et al., 2012) and MetAMOS (Treangen et al., 2013). MOCAT is a modular pipeline developed for the processing, assembly, and gene prediction of metagenomics NGS reads."
    ABSTRACT: In recent years a major worldwide problem has arisen with regard to infectious diseases caused by resistant bacteria. Resistant pathogens are associated with high mortality and also with enormous healthcare costs. In this field, research has commonly focused on cultured microorganisms in attempts to isolate antibiotic resistance genes or to identify antimicrobial compounds. Although this strategy has been successful in many cases, most of the microbial diversity and related antimicrobial molecules have been completely lost. As an alternative, metagenomics has been used as a reliable approach to reveal the prospective reservoir of antimicrobial compounds and antibiotic resistance genes in the uncultured microbial community that inhabits a number of environments. In this context, this review will focus on resistance genes as well as on novel antibiotics revealed by a metagenomics approach from the soil environment. Biotechnology prospects are also discussed, opening new frontiers for antibiotic development.
    Frontiers in Microbiology; 09/2014
    • "Taxonomic abundance profiles were estimated for each sample with the MOCAT pipeline [20] incorporating bacterial references from the RefMG.v1 database [21], based on single copy marker genes from 1,753 bacterial reference genomes."
    ABSTRACT: Background: In recent years, studies on the human intestinal microbiota have attracted tremendous attention. Application of next generation sequencing for mapping of bacterial phylogeny and function has opened new doors to this field of research. However, little attention has been given to the effects of choice of methodology on the output resulting from such studies. Results: In this study we conducted a systematic comparison of the DNA extraction methods used by the two major collaborative efforts: The European MetaHIT and the American Human Microbiome Project (HMP). Additionally, effects of homogenizing the samples before extraction were addressed. We observed significant differences in distribution of bacterial taxa depending on the method. While eukaryotic DNA was most efficiently extracted by the MetaHIT protocol, DNA from bacteria within the Bacteroidetes phylum was most efficiently extracted by the HMP protocol. Conclusions: Whereas it is comforting that the inter-individual variation clearly exceeded the variation resulting from choice of extraction method, our data highlight the challenge of comparing data across studies applying different methodologies.
    06/2014; 2(1):19. DOI:10.1186/2049-2618-2-19
    • "Primary among these is communication, with the need to develop a common language to minimize misunderstanding and misinterpretation when discussing project design, implementation and analyses. Currently, there exist a number of different databases for exploring metagenomic, other ‘omic, and environmental datasets in the context of ocean science ([4-7]). However, a common language to facilitate communication must be built on a series of standardization efforts."
    ABSTRACT: The National Science Foundation’s EarthCube End User Workshop was held at USC Wrigley Marine Science Center on Catalina Island, California in August 2013. The workshop was designed to explore and characterize the needs and tools available to the community that is focusing on microbial and physical oceanography research with a particular emphasis on ‘omic research. The assembled researchers outlined the existing concerns regarding the vast data resources that are being generated, and how we will deal with these resources as their volume and diversity increase. Particular attention was focused on the tools for handling and analyzing the existing data, on the need for the construction and curation of diverse federated databases, as well as on the development of shared, interoperable, “big-data capable” analytical tools. The key outputs from this workshop include (i) critical scientific challenges and cyber-infrastructure constraints, (ii) the current and future ocean ‘omics science grand challenges and questions, and (iii) the data management, analytical, and associated cyber-infrastructure capabilities required to meet critical current and future scientific challenges. The main thrust of the meeting and the outcome of this report is a definition of the ‘omics tools, technologies and infrastructures that facilitate continued advance in ocean science biology, marine biogeochemistry, and biological oceanography.
    Standards in Genomic Sciences 03/2014; 9(3). DOI:10.4056/sigs.5749944
