Article

The genome sequence of the Bayer’s emerald-bottle fly, Bellardia bayeri (Jacentkovsky 1937)

F1000
Wellcome Open Research
Authors:

Abstract

We present a genome assembly from an individual male Bellardia bayeri (the Bayer's emerald-bottle fly; Arthropoda; Insecta; Diptera; Calliphoridae). The genome sequence has a total length of 551.70 megabases. Most of the assembly is scaffolded into 7 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 21.02 kilobases in length. Gene annotation of this assembly on Ensembl identified 21,153 protein-coding genes.

References

Preprint
This collection of protocols describes the standard operating procedures for DNA barcoding in the Darwin Tree of Life project. The SOPs cover barcoding of macroalgae and marine protists, fungi and lichens at the Marine Biological Association; protists at the University of Oxford; plants and lichens at the Royal Botanic Garden Edinburgh; and animals at the Natural History Museum, London.
Preprint
This is the collection of Sanger Tree of Life Wet Laboratory protocols used for the generation of reference-level genome assemblies. It currently contains protocols that cover all stages of the wet laboratory processes performed by the Sanger Tree of Life: sample preparation, sample homogenisation, HMW DNA extraction, HMW DNA fragmentation, fragmented DNA clean-up and RNA extraction. The collection will be updated as protocols within the Tree of Life Core Laboratory continue to be developed and refined.
Preprint
This protocol describes the fragmentation of HMW DNA from either the MagAttract v.1, Plant MagAttract v.1, or Plant MagAttract v.2 Sanger Tree of Life HMW DNA extraction protocols, using the Diagenode Megaruptor®3. This process is highly effective for DNA extracts from all of the taxonomic groups covered by the Tree of Life Programme, with DNA sheared into an average fragment size range of 12–20 kb. Challenging samples, however, include those that are highly concentrated or highly viscous, as well as samples that contain contaminants or impurities following DNA extraction. The output of this protocol is sheared DNA, which can be directed towards fragmented DNA clean-up with either the Manual or Automated SPRI protocols. This protocol has since been updated to Sanger Tree of Life HMW DNA Fragmentation: Diagenode Megaruptor® 3 for LI PacBio in order to process samples resulting from the Sanger Tree of Life HMW DNA Extraction: Automated MagAttract v.2, Automated Plant MagAttract v.3 and Automated Plant MagAttract v.4 protocols. Acronyms and abbreviations: HMW, high molecular weight; SPRI, solid-phase reversible immobilisation; HiFi, high fidelity; LI, low input.
Preprint
This protocol describes the manual clean-up of fragmented DNA following the Sanger Tree of Life HMW DNA Fragmentation protocols, using PacBio AMPure PB beads. This process is highly effective for cleaning sheared DNA and removing its shorter fragments, for all of the taxonomic groups covered by the Tree of Life Programme. The output of this protocol is DNA that can be submitted for long-read sequencing, including PacBio sequencing following Low Input (LI) or Ultra-Low Input (ULI) library preparation. Acronyms: HMW, high molecular weight; SPRI, solid-phase reversible immobilisation; LI, low input; ULI, ultra-low input.
Preprint
This protocol describes the process of sample preparation for the extraction of DNA and/or RNA from the wide variety of samples processed by the Tree of Life Core Laboratory as part of the Tree of Life Programme. It also describes the triage steps and recommended weight requirements for the different taxonomic groups covered by the Tree of Life programme. The output of this protocol is a sufficient amount of sample that can be directed towards any of the Sanger Tree of Life Sample Homogenisation protocols.
Preprint
This protocol describes the automated extraction of HMW DNA, intended for long-read sequencing, from tissue samples of a variety of species, using the Qiagen MagAttract HMW DNA extraction kit and the ThermoFisher KingFisher™ Apex. This process is effective for a wide variety of taxonomic groups covered by the Tree of Life Programme, excluding plants and fungi. The output of this protocol is HMW DNA, which, depending upon yield and the genome size of the species, can be directed towards either HMW DNA Pooling or HMW DNA Fragmentation: Diagenode Megaruptor®3 for LI HiFi. This protocol was adapted from Sanger Tree of Life HMW DNA Extraction: Manual MagAttract to include automation for a higher throughput of samples, and has since been updated to Sanger Tree of Life HMW DNA Extraction: Automated MagAttract v.2 to include a pre-shear SPRI of the extracted HMW DNA. Acronyms: HMW, high molecular weight; SPRI, solid-phase reversible immobilisation; HiFi, high fidelity.
Preprint
This protocol describes the homogenisation of tissue samples for DNA and/or RNA extraction intended for long-read sequencing or RNA-Seq, using the Diagnocine PowerMasher II tissue disruptor. This process is highly effective for the disruption of tissue samples weighing less than 25 mg from all taxonomic groups covered by the Tree of Life Programme except plants, fungi, protists, sponges and corals. Larger samples are not processed with this method because they are difficult to homogenise. Crustaceans and molluscs are particularly challenging samples. The output of this protocol is a sample that can be directed towards any of the Sanger Tree of Life DNA and RNA extraction protocols.
Article
Background: PacBio high fidelity (HiFi) sequencing reads are both long (15–20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. Results: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence FASTA file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and excluded from organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. Conclusions: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
Article
We present JBrowse 2, a general-purpose genome annotation browser offering enhanced visualization of complex structural variation and evolutionary relationships. It retains core features of JBrowse while adding new views for synteny, dotplots, breakpoints, gene fusions, and whole-genome overviews. It allows users to share sessions, open multiple genomes, and navigate between views. It can be embedded in a web page, used as a standalone application, or run from Jupyter notebooks or R sessions. These improvements are enabled by a ground-up redesign using modern web technology. We describe application functionality, use cases, performance benchmarks, and implementation notes for web administrators and developers. Supplementary information: The online version contains supplementary material available at 10.1186/s13059-023-02914-z.
Article
The Darwin Tree of Life (DToL) project aims to sequence and assemble high-quality genomes from all eukaryote species in Britain and Ireland, with the first phase of the project concentrating on family-level coverage plus species of particular ecological, biomedical or evolutionary interest. We summarise the processes involved in (1) assessing the UK arthropod fauna and the status of individual species on UK lists; (2) prioritising and collecting species for initial genome sequencing; (3) handling methods to ensure that high-quality genomic DNA is preserved; and (4) compiling standard operating procedures for processing specimens for genome sequencing, identification verification and voucher specimen curation. We briefly explore some lessons learned from the pilot phase of DToL and the impact of the Covid-19 pandemic.
Article
We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file, in formats compatible with similar tools), and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. Availability and implementation: YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Motivation: With the current pace at which reference genomes are being produced, the availability of tools that can reliably and efficiently generate genome assembly summary statistics has become critical. Additionally, with the emergence of new algorithms and data types, tools that can improve the quality of existing assemblies through automated and manual curation are required. Results: We sought to address both these needs by developing gfastats, as part of the Vertebrate Genomes Project (VGP) effort to generate high-quality reference genomes at scale. Gfastats is a standalone tool to compute assembly summary statistics and manipulate assembly sequences in FASTA, FASTQ, or GFA[.gz] format. Gfastats stores assembly sequences internally in a GFA-like format. This feature allows gfastats to seamlessly convert FASTA/FASTQ files to and from GFA[.gz] files. Gfastats can also build an assembly graph that can in turn be used to manipulate the underlying sequences following instructions provided by the user, while simultaneously generating key metrics for the new sequences. Availability: Gfastats is implemented in C++. Precompiled releases (Linux, MacOS, Windows) and commented source code for gfastats are available under the MIT licence at https://github.com/vgl-hub/gfastats. Examples of how to run gfastats are provided on GitHub. Gfastats is also available in Bioconda, in Galaxy (https://assembly.usegalaxy.eu) and as a MultiQC module (Ewels et al., 2016) (https://github.com/ewels/MultiQC). An automated test workflow is available to ensure consistency of software updates. Supplementary information: Supplementary data are available at Bioinformatics online.
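As an illustration of the kind of summary statistic such tools report, the following minimal Python sketch computes the total length, sequence count and N50 from a list of sequence lengths. It is not gfastats code; the function name `assembly_stats` is hypothetical and the lengths in the usage line are made up.

```python
def assembly_stats(lengths):
    """Return total length, sequence count and N50 for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of longest
    sequences whose combined length reaches at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # The first sequence that brings the cumulative sum to >= 50% sets N50.
        if 2 * running >= total:
            return {"total": total, "count": len(lengths), "n50": length}


# Made-up scaffold lengths, for illustration only.
print(assembly_stats([100, 80, 60, 40, 20]))  # {'total': 300, 'count': 5, 'n50': 80}
```

Sorting longest-first and stopping at the halfway point is the standard definition of N50; gfastats reports this alongside many other metrics.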
Article
We present a genome assembly from an individual female Bellardia pandia (the bisetose emerald-bottle; Arthropoda; Insecta; Diptera; Calliphoridae). The genome sequence is 617 megabases in span. The majority of the assembly (97.82%) is scaffolded into six chromosomal pseudomolecules, with the X sex chromosome assembled.
Article
Methods for evaluating the quality of genomic and metagenomic data are essential to aid genome assembly and to correctly interpret the results of subsequent analyses. BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Here we present new functionalities and major improvements of the BUSCO software, as well as the renewal and expansion of the underlying datasets in sync with the OrthoDB v10 release. Among the major novelties, BUSCO now enables phylogenetic placement of the input sequence to automatically select the most appropriate dataset for the assessment, allowing the analysis of metagenome-assembled genomes of unknown origin. A newly introduced genome workflow improves efficiency and reduces runtimes, especially on large eukaryotic genomes. BUSCO is the only tool capable of assessing both eukaryotic and prokaryotic species, and can be applied to various data types, from genome assemblies and metagenomic bins to transcriptomes and gene sets.
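BUSCO results are conventionally summarised as complete (C, split into single-copy S and duplicated D), fragmented (F) and missing (M) percentages of the n BUSCOs searched. The minimal Python sketch below formats such a summary from per-gene statuses. The function name and the exact status labels are assumptions for illustration, not BUSCO's own code, and the counts are made up.

```python
from collections import Counter


def busco_summary(statuses):
    """Format per-BUSCO statuses as a C/S/D/F/M summary string.

    `statuses` maps each BUSCO identifier to one of the (assumed) labels
    "Complete" (single-copy), "Duplicated", "Fragmented" or "Missing".
    Complete (C) is the sum of single-copy (S) and duplicated (D).
    """
    n = len(statuses)
    if n == 0:
        raise ValueError("no BUSCOs to summarise")
    counts = Counter(statuses.values())

    def pct(count):
        return 100.0 * count / n

    single, dup = counts["Complete"], counts["Duplicated"]
    return (
        f"C:{pct(single + dup):.1f}%[S:{pct(single):.1f}%,D:{pct(dup):.1f}%],"
        f"F:{pct(counts['Fragmented']):.1f}%,M:{pct(counts['Missing']):.1f}%,n:{n}"
    )


# Hypothetical results for 100 BUSCOs: 90 single-copy, 5 duplicated, 3 fragmented, 2 missing.
example = {}
for i in range(100):
    if i < 90:
        example[f"busco{i}"] = "Complete"
    elif i < 95:
        example[f"busco{i}"] = "Duplicated"
    elif i < 98:
        example[f"busco{i}"] = "Fragmented"
    else:
        example[f"busco{i}"] = "Missing"
print(busco_summary(example))  # C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:100
```

A high D value relative to S is one signal of retained haplotypic duplication in an assembly.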
Article
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Article
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly. Hifiasm is a haplotype-resolved de novo genome assembler for long-read high-fidelity sequencing data based on phased assembly graphs.
Article
Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
Article
The task of eukaryotic genome annotation remains challenging. Only a few genomes, annotated through a tremendous investment of human curation effort, can serve as annotation standards. Even in the best-annotated genomes, the correctness of all alternative isoforms remains a worthwhile subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, in which self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. Under equal conditions it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate the harmonization of protein-coding gene annotation across genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies as well as in algorithmic development to reach the goal of highly accurate annotation of eukaryotic genomes.
Article
Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.
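The k-mer comparison described above can be sketched in a few lines of Python. Following the Merqury approach, assembly k-mers absent from the reads are treated as errors, the per-base error rate is estimated as E = 1 - (1 - K_asm_only / K_total)^(1/k), and QV = -10 log10(E); completeness is the share of read k-mers found in the assembly. This is a simplified sketch, not Merqury itself: it assumes a single assembly sequence, forward-strand k-mers and that every read k-mer is reliable, and the function names are hypothetical.

```python
import math


def kmer_list(seq, k):
    """All k-mers of `seq` in order (forward strand only, no canonicalisation)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_qv_and_completeness(assembly, reads, k):
    """Merqury-style consensus QV and k-mer completeness (simplified sketch)."""
    asm_kmers = kmer_list(assembly, k)
    if not asm_kmers:
        raise ValueError("assembly shorter than k")
    read_kmers = set()
    for read in reads:
        read_kmers.update(kmer_list(read, k))
    if not read_kmers:
        raise ValueError("no read k-mers")
    # Assembly k-mers (counted with multiplicity) never seen in the reads.
    asm_only = sum(1 for km in asm_kmers if km not in read_kmers)
    if asm_only == 0:
        qv = math.inf
    else:
        # Per-base error rate inferred from the fraction of k-mers supported by reads.
        error = 1 - (1 - asm_only / len(asm_kmers)) ** (1 / k)
        qv = -10 * math.log10(error)
    # Completeness: share of read k-mers that are present in the assembly.
    completeness = len(read_kmers & set(asm_kmers)) / len(read_kmers)
    return qv, completeness


# Toy data: one substitution in the assembly relative to the reads.
qv, comp = kmer_qv_and_completeness("ACGTTCGTAC", ["ACGTACGTAC"], k=3)
print(round(qv, 1), comp)
```

Real data would use large k (Merqury commonly uses k around 21) and a count threshold to separate reliable k-mers from sequencing errors in the reads.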
Article
Australian horticulture relies heavily on the introduced managed honey bee, Apis mellifera Linnaeus 1758 (Hymenoptera: Apidae), to pollinate crops. Given the risks associated with reliance upon a single species, it would be prudent to identify other taxa that could be managed to provide crop pollination services. We reviewed the literature relating to the distribution, efficiency and management potential of a number of flies (Diptera) known to visit pollinator-dependent crops in Australia and worldwide. Applying this information, we identified the taxa most suitable to play a greater role as managed pollinators in Australian crops. Of the taxa reviewed, flower visitation by representatives from the dipteran families Calliphoridae, Rhiniidae and Syrphidae was frequently reported in the literature. While data available are limited, there was clear evidence of pollination by these flies in a range of crops. A review of fly morphology, foraging behaviour and physiology revealed considerable potential for their development as managed pollinators, either alone or to augment honey bee services. Considering existing pollination evidence, along with the distribution, morphology, behaviour and life history traits of introduced and endemic species, 11 calliphorid, two rhiniid and seven syrphid species were identified as candidates with high potential for use in Australian managed pollination services. Research directions for the comprehensive assessment of the pollination abilities of the identified taxa to facilitate their development as a pollination service are described. This triage approach to identifying species with high potential to become significant managed pollinators at local or regional levels is clearly widely applicable to other countries and taxa.
Article
Thanks to the development of high‐throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now allows phylogenetic relationships to be routinely inferred from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture‐based studies. Here, we developed MitoFinder, a user‐friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae), for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA‐UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, allowing species identification to be confirmed by CO1 barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By leveraging the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, allowing testing for potential mito‐nuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data, and whole genome shotgun sequencing in diverse taxa.
Article
Pollenia rudis (Diptera, Calliphoridae) is a major nuisance insect pest in buildings and other structures during the fall and winter months. Adult flies invade to overwinter and cluster around windows and attics. These flies are parasites of earthworms, but the host worms are very poorly documented across North America. We report the first identification of Aporrectodea trapezoides as the host of Pollenia rudis in Colorado. We report the first two earthworm records for Washington County, Colorado (Aporrectodea trapezoides and Ap. turgida). Key Words: Colorado, Oligochaeta, Lumbricidae, earthworms, Pollenia rudis parasitization, distribution.
Article
Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Data Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view . We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.
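BlobToolKit's blob plots place each assembled sequence by its GC proportion and read coverage, typically sized by length and coloured by taxonomic assignment, so that sequences from other organisms tend to form separate clusters. The minimal Python sketch below computes those per-sequence quantities. It does not use BlobToolKit; the function names, input dictionaries and example sequences are hypothetical.

```python
import math


def gc_proportion(seq):
    """Fraction of G+C among unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return math.nan
    return (seq.count("G") + seq.count("C")) / acgt


def blob_points(contigs, coverage):
    """One (name, GC proportion, coverage, length) tuple per sequence.

    `contigs` maps names to sequences; `coverage` maps names to mean read depth.
    Points far from the main cluster in GC or coverage are candidates for
    closer inspection as possible non-target sequence.
    """
    return [
        (name, gc_proportion(seq), coverage.get(name, 0.0), len(seq))
        for name, seq in contigs.items()
    ]


example = {"scaffold_1": "ATGCATATAT", "scaffold_2": "GGCCGCGCAN"}
depth = {"scaffold_1": 42.0, "scaffold_2": 3.5}
for point in blob_points(example, depth):
    print(point)
```

Taxonomic assignment, which BlobToolKit adds from similarity searches, is left out here for brevity.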
Article
Motivation: Rapid development in long read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either only focus on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. Results: Here we present a novel tool "purge_dups" that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines. Availability: The source code is written in C and is available at https://github.com/dfguan/purge_dups. Supplementary information: Supplementary data are available at Bioinformatics online.
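The idea of combining read depth with sequence similarity can be illustrated with a toy filter: a contig from one haplotype that duplicates another contig tends to have roughly half the diploid read depth and to align closely to another contig. The Python sketch below flags contigs meeting both conditions. It only illustrates the principle; purge_dups derives its depth cutoffs from a read-depth histogram and applies more detailed rules, and the function name, inputs and thresholds here are hypothetical.

```python
def flag_haplotig_candidates(contigs, haploid_peak, min_identity=0.9, tolerance=0.25):
    """Flag contigs whose depth looks haploid AND that closely match another contig.

    contigs: dict name -> (mean_read_depth, best_identity_to_another_contig)
    haploid_peak: read depth of the half-coverage (heterozygous) peak.
    The threshold defaults are illustrative assumptions, not purge_dups defaults.
    """
    flagged = []
    for name, (depth, identity) in contigs.items():
        looks_haploid = abs(depth - haploid_peak) <= tolerance * haploid_peak
        if looks_haploid and identity >= min_identity:
            flagged.append(name)
    return sorted(flagged)


example = {
    "ctg1": (60.0, 0.20),  # diploid depth: kept
    "ctg2": (31.0, 0.97),  # haploid depth and near-identical to another contig: flagged
    "ctg3": (29.0, 0.50),  # haploid depth but no close match: kept
}
print(flag_haplotig_candidates(example, haploid_peak=30.0))  # ['ctg2']
```

Requiring both signals avoids discarding genuinely low-coverage but unique sequence, which is the trade-off the abstract describes.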
Article
We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform. Electronic supplementary material: The online version of this article (10.1186/s13059-018-1486-1) contains supplementary material, which is available to authorized users.
Article
Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game-changing development for computational science.
Article
Motivation: BioContainers (biocontainers.pro) is an open-source and community-driven framework that provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers, with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability: The software is freely available at github.com/BioContainers/. Contact: yperez@ebi.ac.uk, European Molecular Biology Laboratory, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, Tel: +44-1223-492686, Fax: +44-1223-494468.
Article
Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time-consuming and error-prone; batch effects and outlier samples can easily be missed in the early stages of analysis. Results: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization. Availability: MultiQC is available under a GNU GPLv3 license on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info. Contact: phil.ewels@scilifelab.se.
Article
We use in situ Hi-C to probe the 3D architecture of genomes, constructing haploid and diploid maps of nine cell types. The densest, in human lymphoblastoid cells, contains 4.9 billion contacts, achieving 1 kb resolution. We find that genomes are partitioned into contact domains (median length, 185 kb), which are associated with distinct patterns of histone marks and segregate into six subcompartments. We identify ∼10,000 loops. These loops frequently link promoters and enhancers, correlate with gene activation, and show conservation across cell types and species. Loop anchors typically occur at domain boundaries and bind CTCF. CTCF sites at loop anchors occur predominantly (>90%) in a convergent orientation, with the asymmetric motifs "facing" one another. The inactive X chromosome splits into two massive domains and contains large loops anchored at CTCF-binding repeats.
Article
An updated checklist of the superfamilies Oestroidea and Hippoboscoidea recorded from Finland is presented. The checklist covers the following families: Calliphoridae, Rhiniidae, Sarcophagidae, Rhinophoridae, Tachinidae, Oestridae and Hippoboscidae.
Article
The growth and development of carrion-feeding calliphorid (Diptera: Calliphoridae) larvae, or maggots, is of great interest to forensic sciences, especially for estimation of a postmortem interval (PMI). The development rate of calliphorid larvae is influenced by the temperature of their immediate environment. Heat generation in larval feeding aggregations (=maggot masses) is a well-known phenomenon, but it has not been quantitatively described. Calculated development rates that do not include internally generated temperatures will result in overestimation of PMI. Over a period of 2.5 yr, 80 pig, Sus scrofa L., carcasses were placed out at study sites in north central Florida and northwestern Indiana. Once larval aggregations started to form, multiple internal and external temperatures, and weather observations, were taken daily or every few days between 1400 and 1800 hours until pupation of the larvae. The volume of each aggregation was determined by measuring surface area and average depth. Live and preserved samples of larvae were taken for species identification. The four most common species collected were Lucilia coeruleiviridis (=Phaenicia) (Macquart) (77%), Cochliomyia macellaria (F.) (8.3%), Chrysomya rufifacies (Macquart) (7.7%), and Phormia regina (Meigen) (5.5%). Statistical analyses showed that 1) volume of a larval mass had a strong influence on its temperature, 2) internal temperatures of masses on the ground were influenced by soil temperature and mass volume, 3) internal temperatures of masses smaller than 20 cm3 were influenced by ambient air temperature and mass volume, and 4) masses larger than 20 cm3 on the carcass had strongly regulated internal temperatures determined only by the volume of the mass, with larger volumes associated with higher temperatures. Nonsignificant factors included presence of rain or clouds, shape of the aggregation, weight of the carcass, species composition of the aggregation, time since death, and season.
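Why ignoring mass heat overestimates PMI can be shown with a simple linear accumulated degree-hour (ADH) model, a common simplification in forensic entomology that is assumed here rather than taken from the study. The Python sketch below counts the hours needed to accumulate a target ADH above a base temperature; the base temperature, target and temperatures are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def hours_to_reach(target_adh, hourly_temps, base_temp=10.0):
    """Hours until accumulated degree-hours (ADH) above `base_temp` reach `target_adh`.

    `hourly_temps` (degrees C) is repeated cyclically if development takes longer.
    Assumes a simple linear degree-hour model.
    """
    if all(t <= base_temp for t in hourly_temps):
        raise ValueError("temperatures never exceed the base threshold")
    accumulated, hours = 0.0, 0
    while accumulated < target_adh:
        temp = hourly_temps[hours % len(hourly_temps)]
        accumulated += max(0.0, temp - base_temp)
        hours += 1
    return hours


# Hypothetical stage requiring 2000 ADH above 10 degrees C.
ambient = hours_to_reach(2000, [20.0])  # estimate using ambient air at 20 degrees C
in_mass = hours_to_reach(2000, [30.0])  # larvae actually developing at 30 degrees C
print(ambient, in_mass)  # 200 100
```

With these numbers, an estimate based on ambient temperature would place the larvae at about 200 hours old, although at the warmer mass temperature they reached that stage in about 100 hours, so the PMI would be overestimated.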
Article
Motivation: Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis.
Results: We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium.
Availability: Cooler is cross-platform, BSD-licensed, and can be installed from the Python package index or the bioconda repository. The source code is maintained on Github at https://github.com/mirnylab/cooler.
Supplementary information: Supplementary data are available at Bioinformatics online.
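The sparse data model described above can be illustrated in miniature: a table of genomic bins, a table holding only the non-zero pixels of the upper triangle sorted by (bin1_id, bin2_id), and a per-bin offset index that gives fast random access to one row without scanning every pixel. The sketch below is a hypothetical, in-memory illustration of that general idea; the class name, fields and methods are assumptions and are not the cooler library's API or its HDF5 layout.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical illustration of a sparse, genomically labelled contact matrix.
# It is NOT the cooler library's API or on-disk format.


@dataclass
class SparseContactMatrix:
    bins: list[tuple[str, int, int]]    # (chrom, start, end), one per bin
    pixels: list[tuple[int, int, int]]  # (bin1_id, bin2_id, count), non-zero only
    bin1_offset: list[int] = field(init=False)

    def __post_init__(self) -> None:
        # Contact maps are symmetric, so only the upper triangle is stored.
        if any(b1 > b2 for b1, b2, _ in self.pixels):
            raise ValueError("pixels must satisfy bin1_id <= bin2_id")
        # Sort so that all pixels sharing a bin1_id are contiguous.
        self.pixels = sorted(self.pixels)
        # bin1_offset[i] = index of the first pixel with bin1_id >= i; the
        # extra trailing entry means row i spans [offset[i], offset[i + 1]).
        bin1_ids = [b1 for b1, _, _ in self.pixels]
        self.bin1_offset = [bisect_left(bin1_ids, i) for i in range(len(self.bins) + 1)]

    def row(self, i: int) -> list[tuple[int, int]]:
        """Stored (bin2_id, count) pairs for bin i, located via the offset index."""
        lo, hi = self.bin1_offset[i], self.bin1_offset[i + 1]
        return [(b2, count) for _, b2, count in self.pixels[lo:hi]]

    def get(self, i: int, j: int) -> int:
        """Symmetric lookup; pixels that are not stored are implicit zeros."""
        if i > j:
            i, j = j, i
        for b2, count in self.row(i):
            if b2 == j:
                return count
        return 0


bins = [("chr1", 0, 10_000), ("chr1", 10_000, 20_000),
        ("chr1", 20_000, 30_000), ("chr2", 0, 10_000)]
pixels = [(2, 3, 2), (0, 0, 5), (1, 1, 7), (0, 2, 1)]
m = SparseContactMatrix(bins, pixels)
print(m.row(0))     # [(0, 5), (2, 1)]
print(m.get(2, 0))  # 1 (looked up as (0, 2))
print(m.get(1, 3))  # 0 (not stored)
```

Storage here grows with the number of non-zero contacts rather than with the square of the number of bins, which is the property that makes high resolutions tractable.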
Article
Genomics has revolutionised biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. The software is implemented in Python, and datasets are available for download from http://busco.ezlab.org. Contact: Evgeny.Zdobnov@unige.ch. © The Author (2015). Published by Oxford University Press.
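The contrast between a contiguity measure such as N50 and a gene-content measure such as BUSCO can be made concrete with a short sketch. The code below is a hypothetical illustration, not BUSCO's implementation: it assumes each ortholog has already been classified (number of complete copies found, or only a fragment), whereas BUSCO itself derives these calls from profile searches with score and length cut-offs that are not reproduced here. The summary string follows the familiar C[S,D],F,M,n notation that BUSCO reports.

```python
# Hypothetical sketch: contiguity (N50) vs. BUSCO-style completeness summary.
# Input classifications are assumed to be precomputed.


def n50(lengths: list[int]) -> int:
    """Length at which the cumulative sum, longest first, first reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def busco_style_summary(complete_copies: dict[str, int],
                        fragmented_ids: set[str],
                        n_total: int) -> str:
    """Summarise ortholog calls as C:x%[S:y%,D:z%],F:a%,M:b%,n:N.

    complete_copies: ortholog id -> number of complete copies found
    fragmented_ids:  ortholog ids with only a partial hit
    n_total:         size of the single-copy ortholog set assessed
    """
    found = {k for k, n in complete_copies.items() if n >= 1}
    single = sum(1 for n in complete_copies.values() if n == 1)
    duplicated = sum(1 for n in complete_copies.values() if n > 1)
    # An ortholog already called complete is not also counted as fragmented.
    fragmented = len(set(fragmented_ids) - found)
    missing = n_total - single - duplicated - fragmented

    def pct(x: int) -> str:
        return f"{100.0 * x / n_total:.1f}%"

    return (f"C:{pct(single + duplicated)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n_total}")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(busco_style_summary({"a": 1, "b": 1, "c": 2}, {"a", "d"}, n_total=5))
# C:60.0%[S:40.0%,D:20.0%],F:20.0%,M:20.0%,n:5
```

An assembly can have a high N50 yet still be missing or duplicating conserved genes; the second function reports exactly those cases, which N50 cannot see.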
Article
Docker promises the ability to package applications and their dependencies into lightweight containers that move easily between different Linux distributions, start up quickly and are isolated from each other.
Article
The beneficial aspects of blowflies are viewed anthropocentrically. Despite the problems that blowflies cause man, and the disgust they engender, they play a valuable role as pollinators of some plants in horticulture and can increase seed yields. Without the early invasion of carrion by blowflies and their preparation of it for the arrival of other arthropods, the processes of putrefaction and decay of corpses would be limited. Because blowflies are the first insects to arrive at a dead body, and different species groups oviposit in a strict sequence, they are valuable in forensic medicine as determinants of the time of death. The use of blowfly maggots for the healing of osteomyelitis wounds has been superseded by antibiotics but, when surgical maggots were in vogue, they helped heal large numbers of otherwise intractable lesions.
The beneficial aspects of Calliphoridae extend even to their use as food by man, although this practice is not widespread and other insect species have more appeal and are eaten more frequently. Blowflies are also of religious value or recreational use to some races of man.
PretextView (Paired REad TEXTure Viewer): a desktop application for viewing pretext contact maps.
  • E Harry
sanger-tol/readmapping: sanger-tol/readmapping v1.1.0 - Hebridean Black (1.1.0).
  • P Surana
British blowflies (Calliphoridae) and woodlouse flies (Rhinophoridae): Draft key to British Calliphoridae and Rhinophoridae.
  • S Falk
sanger-tol/genomenote (v1.0.dev).
  • P Surana
A DNA barcoding framework for taxonomic verification in the Darwin Tree of Life project [version 1; peer review: awaiting peer review].
  • A Twyford