Genome Research (GENOME RES)

Publisher: Cold Spring Harbor Laboratory Press


The journal focuses on genome studies in all species and presents research that provides, or aids in, genome-based analyses of biological processes. It represents a nexus where genomic information, applications, and technology come together with biological information to create a more global understanding of all biological systems.

Impact factor 13.85

  • Website
    Genome Research website
  • Other titles
    Genome research (Online), Genome research
  • Material type
    Online system or service, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Cold Spring Harbor Laboratory Press

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Author's pre-print on preprint server
    • Author's pre-print must be updated with citation, DOI and link to article upon publication
    • Publisher's version/PDF may be used after 6 months
    • Publisher's version/PDF and Author's post-print on author's personal website, institutional repository, funder's designated repository
    • Authors retain copyright
    • Content automatically sent to PubMed Central after 6 months
    • Publisher copyright and source must be acknowledged
    • Publisher last contacted on 15/07/2013
  • Classification
    green

Publications in this journal

  • Genome Research 01/2015;
    ABSTRACT: The eukaryotic genome has vast intergenic regions containing transposons, pseudogenes, and other repetitive sequences. They produce numerous long non-coding RNAs (lncRNAs) and PIWI-interacting RNAs (piRNAs), yet the functions of the vast intergenic regions remain largely unknown. Mammalian piRNAs are abundantly expressed in late spermatocytes and round spermatids, coinciding with the widespread expression of lncRNAs in these cells. Here, we show that piRNAs derived from transposons and pseudogenes mediate the degradation of a large number of mRNAs and lncRNAs in mouse late spermatocytes. In particular, they have a large impact on the lncRNA transcriptome, as a quarter of lncRNAs expressed in late spermatocytes are up-regulated in mice deficient in the piRNA pathway. Furthermore, our genomic and in vivo functional analyses reveal that retrotransposon sequences in the 3' UTR of mRNAs are targeted by piRNAs for degradation. Similarly, the degradation of spermatogenic cell-specific lncRNAs by piRNAs is mediated by retrotransposon sequences. Moreover, we show that pseudogenes regulate mRNA stability via the piRNA pathway. The degradation of mRNAs and lncRNAs by piRNAs requires PIWIL1 (also known as MIWI) and, at least in part, depends on its slicer activity. Together, these findings reveal the presence of a highly complex and global RNA regulatory network mediated by piRNAs with retrotransposons and pseudogenes as regulatory sequences. Published by Cold Spring Harbor Laboratory Press.
  • Genome Research 12/2014;
    ABSTRACT: Formation of heterochromatin serves a critical role in organizing the genome and regulating gene expression. In most organisms, heterochromatin flanks centromeres and telomeres. To identify heterochromatic regions in the heavily studied model C. elegans, which possesses holocentric chromosomes with dispersed centromeres, we analyzed the genome-wide distribution of the heterochromatin protein 1 (HP1) ortholog HPL-2 and compared its distribution to other features commonly associated with heterochromatin. HPL-2 binding highly correlates with histone H3 mono- and dimethylated at lysine 9 (H3K9me1 and H3K9me2) and forms broad domains on autosomal arms. Although HPL-2, like other HP1 orthologs, binds H3K9me peptides in vitro, the distribution of HPL-2 in vivo appears relatively normal in mutant embryos that lack H3K9me, demonstrating that the chromosomal distribution of HPL-2 can be achieved in an H3K9me-independent manner. Consistent with HPL-2 serving roles independent of H3K9me, hpl-2 mutant worms display more severe defects than mutant worms lacking H3K9me. HPL-2 binding is enriched for repetitive sequences, and on chromosome arms is anticorrelated with centromeres. At the genic level, HPL-2 preferentially associates with well-expressed genes, and loss of HPL-2 results in up-regulation of some binding targets and down-regulation of others. Our work defines heterochromatin in an important model organism and uncovers both shared and distinctive properties of heterochromatin relative to other systems. © 2015 Garrigues et al.; Published by Cold Spring Harbor Laboratory Press.
  • Genome Research 12/2014;
    ABSTRACT: 24-nucleotide small interfering RNAs (siRNAs) are central players in RNA-directed DNA methylation (RdDM), a process that establishes and maintains DNA methylation at transposable elements to ensure genome stability in plants. The plant-specific RNA polymerase IV (Pol IV) is required for siRNA biogenesis and is thought to transcribe RdDM loci to produce primary transcripts that are converted to double-stranded RNAs (dsRNAs) by RDR2 to serve as siRNA precursors. Yet no such siRNA precursor transcripts have ever been reported. Here, through genome-wide profiling of RNAs in genotypes that compromise the processing of siRNA precursors, we were able to identify Pol IV/RDR2-dependent transcripts from tens of thousands of loci. We show that Pol IV/RDR2-dependent transcripts correspond to both DNA strands, while the RNA polymerase II (Pol II)-dependent transcripts produced upon de-repression of the loci are derived primarily from one strand. We also show that Pol IV/RDR2-dependent transcripts have a 5' monophosphate, lack a polyA tail at the 3' end, and contain no introns; these features distinguish them from Pol II-dependent transcripts. Like Pol II-transcribed genic regions, Pol IV-transcribed regions are flanked by A/T-rich sequences depleted in nucleosomes, which highlights similarities in Pol II- and Pol IV-mediated transcription. Computational analysis of siRNA abundance from various mutants reveals differences in the regulation of siRNA biogenesis at two types of loci that undergo CHH methylation via two different DNA methyltransferases. These findings begin to reveal features of Pol IV/RDR2-mediated transcription at the heart of genome stability in plants. Published by Cold Spring Harbor Laboratory Press.
  • Genome Research 11/2014;
    ABSTRACT: To understand the evolutionary dynamics between transcription factor (TF) binding and gene expression in mammals, we compared transcriptional output and the binding intensities for three tissue-specific TFs in livers from four closely related mouse species. For each transcription factor, TF-dependent genes and the TF binding sites most likely to influence mRNA expression were identified by comparing mRNA expression levels between wild-type and TF knockout mice. Independent evolution was observed genome-wide between the rate of change in TF binding and the rate of change in mRNA expression across taxa, with the exception of a small number of TF-dependent genes. We also found that binding intensities are preferentially conserved near genes whose expression is dependent on the TF, and the conservation is shared among binding peaks in close proximity to each other near the TSS. Expression of TF-dependent genes typically showed an increased sensitivity to changes in binding levels, as measured by mRNA abundance. Taken together, these results highlight a significant tolerance to evolutionary changes in TF binding intensity in mammalian transcriptional networks, and suggest that some TF-dependent genes may be largely regulated by a single TF across evolution.
  • Genome Research 11/2014;
    ABSTRACT: Despite overwhelming evidence that transcriptional activation by p53 is critical for its tumor-suppressive activity, the mechanisms by which p53 engages the genome in the context of chromatin to activate transcription are not well understood. Using a compendium of novel and existing genome-wide datasets, we examined the relationship between p53 binding and the dynamics of the local chromatin environment. Our analysis revealed three distinct categories of p53 binding events that differ based on the dynamics of the local chromatin environment. The first class of p53 binding events occurs near transcriptional start sites (TSS) and is defined by previously characterized promoter-associated chromatin modifications. The second class comprises a large cohort of pre-established, promoter-distal enhancer elements that demonstrate dynamic histone acetylation and transcription upon p53 binding. The third class of p53 binding sites is devoid of classic chromatin modifications and, remarkably, falls within regions of inaccessible chromatin, suggesting that p53 has intrinsic pioneer factor activity and binds within structurally inaccessible regions of chromatin. Intriguingly, these inaccessible p53 binding sites feature several enhancer-like properties in cell types within the epithelial lineage, indicating that p53 binding events include a group of 'proto-enhancers' that become active enhancers given the appropriate cellular context. These data indicate that p53, along with p63, may act as pioneer factors to specify epithelial enhancers. Further, these findings suggest that, rather than following a global cell-type invariant stress response program, p53 may tune its response based on the lineage-specific epigenomic landscape.
  • Genome Research 11/2014;
    ABSTRACT: We introduce a method for simultaneous prediction of microRNA-target interactions and their mediated competitive endogenous RNA (ceRNA) interactions. Using high-throughput validation assays in breast cancer cell lines, we show that our integrative approach significantly improves on microRNA-target prediction accuracy as assessed by both mRNA and protein level measurements. Our biochemical assays support nearly 500 microRNA-target interactions with evidence for regulation in breast-cancer tumors. Moreover, these assays constitute the most extensive validation platform for computationally inferred networks of microRNA-target interactions in breast-cancer tumors, providing a useful benchmark to ascertain future improvements.
  • Genome Research 11/2014;
    ABSTRACT: Despite considerable genetic heterogeneity underlying neurodevelopmental diseases, there is compelling evidence that many disease genes will map to a much smaller number of biological subnetworks. We developed a computational method, termed MAGI (Merging Affected Genes into Integrated-networks), that simultaneously integrates protein-protein interaction and RNA-seq expression profiles during brain development to discover 'modules' enriched for de novo mutations in probands. We applied this method to recent exome sequencing of 1116 autism and intellectual disability patients, discovering two distinct modules that differ in their properties and associated phenotypes. The first module consists of 80 genes associated with Wnt, Notch, SWI/SNF and NCOR complexes and shows the highest expression early during embryonic development (8-16 post-conception weeks, pcw). The second module consists of 24 genes associated with synaptic function, including long-term potentiation and calcium signaling, with higher levels of postnatal expression. Patients with de novo mutations in these modules are more significantly intellectually impaired and carry more severe missense mutations when compared to probands with de novo mutations outside of these modules. We used our approach to define subsets of the network associated with higher functioning autism as well as greater severity with respect to IQ. Finally, we applied MAGI independently to epilepsy and schizophrenia exome sequencing cohorts and found significant overlap as well as expansion of these modules, suggesting a core set of integrated neurodevelopmental networks common to seemingly diverse human diseases.
  • Genome Research 11/2014;
    ABSTRACT: A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. To overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.
  • Genome Research 11/2014;
    ABSTRACT: RNA editing increases transcriptome diversity through post-transcriptional modifications of RNA. Adenosine deaminases that act on RNA (ADARs) catalyze the adenosine-to-inosine (A-to-I) conversion, the most common type of RNA editing in higher eukaryotes. C. elegans has two ADARs, ADR-1 and ADR-2, but their functions remain unclear. Here, we profiled the RNA editomes of wild-type and ADAR mutant C. elegans at different developmental stages. We developed a new computational pipeline with a 'bisulfite-seq-mapping-like' step and achieved a 3-fold increase in identification sensitivity. Of the 47,660 A-to-I editing sites, 99.5% were found in clusters. Of the 3,080 editing clusters, 65.7% overlapped with DNA transposons in noncoding regions and 73.7% could form hairpin structures. The numbers of editing sites and clusters were highest at the L1 and embryonic stages. The editing frequency of a cluster positively correlated with the number of editing sites within it. Intriguingly, for 80% of the clusters with ten or more editing sites, almost all expressed transcripts were edited. Deletion of adr-1 reduced the editing frequency but not the number of editing clusters, whereas deletion of adr-2 nearly abolished RNA editing, indicating a modulating role of ADR-1 and an essential role of ADR-2 in A-to-I editing. Quantitative proteomics analysis showed that adr-2 mutant worms had altered abundance of proteins involved in aging and lifespan regulation. Consistent with this finding, we observed that worms lacking RNA editing were short-lived. Taken together, our results revealed a sophisticated landscape of RNA editing and distinct modes of action of different ADARs.
  • Genome Research 11/2014;
    ABSTRACT: Large-scale bacterial genome sequencing efforts to date have provided limited information on the most prevalent category of disease: sporadically acquired infections caused by common pathogenic bacteria. Here, we performed whole-genome sequencing and de novo assembly of 312 blood- or urine-derived isolates of extraintestinal pathogenic (ExPEC) Escherichia coli, a common agent of sepsis and community-acquired urinary tract infections, obtained during the course of routine clinical care at a single institution. We find that ExPEC E. coli are highly genomically heterogeneous, consistent with pan-genome analyses encompassing the larger species. Investigation of differential virulence factor content and antibiotic resistance phenotypes reveals markedly different profiles among lineages and among strains infecting different bodily sites. We use high-resolution molecular epidemiology to explore the dynamics of infections at the level of individual patients, including identification of possible person-to-person transmission. Notably, a limited number of discrete lineages caused the majority of bloodstream infections, including one sub-clone (ST131-H30) responsible for 28% of bacteremic E. coli infections over a three-year period. We additionally use a microbial genome-wide association study (GWAS) approach to identify individual genes responsible for antibiotic resistance, successfully recovering known genes but notably not identifying any novel factors. We anticipate that in the near future, whole-genome sequencing of microorganisms associated with clinical disease will become routine. Our study reveals what kinds of information can be obtained from sequencing clinical isolates on a large scale, even for well-characterized organisms such as E. coli, and provides insight into how this information might be utilized in a healthcare setting.
  • Genome Research 11/2014;
    ABSTRACT: V(D)J genomic recombination joins single gene segments to encode an extensive repertoire of antigen receptor specificities in T and B lymphocytes. This process initiates with double-stranded breaks adjacent to conserved recombination signal sequences that contain either 12- or 23-nucleotide spacer regions. Only recombination between signal sequences with unequal spacers results in productive coding genes, a phenomenon known as the '12/23 rule'. Here we present two novel genomic tools that allow the capture and analysis of immune locus rearrangements from whole thymic and splenic tissues using second-generation sequencing. Further, we provide strong evidence that the 12/23 rule of genomic recombination is frequently violated under physiological conditions, resulting in unanticipated hybrid recombinations in ~10% of Tcra excision circles. Hence, we demonstrate that strict adherence to the 12/23 rule is intrinsic neither to recombination signal sequences nor to the catalytic process of recombination, and propose that non-classical excision circles are liberated during the formation of antigen receptor diversity.
  • Genome Research 11/2014;
    ABSTRACT: We used comparative and population genomics to study intron evolutionary dynamics in the fungal model genus Neurospora. For our investigation, we used well-annotated genomes of N. crassa, N. discreta and N. tetrasperma, and 92 resequenced genomes of N. tetrasperma from natural populations. By analyzing the four well-annotated genomes, we identified 9,495 intron sites in 7,619 orthologous genes. Our data support non-homologous end joining (NHEJ) and tandem duplication as mechanisms for intron gains in the genus, and the RT-mRNA process as a mechanism for intron loss. We found a moderate intron gain rate (5.78 to 6.89 × 10⁻¹³ intron gains per nucleotide site per year) and a high intron loss rate (7.53 to 13.76 × 10⁻¹⁰ intron losses per intron site per year) as compared to other eukaryotes. The derived intron gains and losses are skewed to high frequencies, relative to neutral SNPs, in natural populations of N. tetrasperma, suggesting that selection is involved in maintaining a high intron turnover. Furthermore, our analyses of the association between intron population-level frequency and genomic features suggest that selection is involved in shaping a 5' intron position bias and a low intron GC content. However, intron sequence analyses suggest the gained introns were not exposed to recent selective sweeps. Taken together, this work contributes to our understanding of the importance of mutational bias and selection in shaping the intron distribution in eukaryotic genomes.
  • Genome Research 10/2014;
    ABSTRACT: We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries composed of 9,216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to over 1 megabase. These pools are 'sub-haploid', in that the lengths of fragments contained in each pool sum to approximately 5 to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data are mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate 'joins' are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by 8- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing mid-range contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
  • Genome Research 10/2014;