Figure 1

Changes in the cost of sequencing over time. The change in sequencing costs over time at sequencing centers funded by the National Human Genome Research Institute (NHGRI), up until January 2013. The cost displayed is per raw megabase of sequence. Data are provided by the NHGRI Genome Sequencing Program, with full details available at [7]. © NHGRI; reproduced with permission.
Citations
... Given the substantial global burden parasitic diseases impose on humanity (see GBD 2015 Disease and Injury Incidence and Prevalence Collaborators, 2016), it is not surprising that new molecular methods are first used to study these medically relevant parasites, in order to better understand their biology and distribution, and to develop control approaches, treatments and vaccines to reduce human suffering. While during the 1990s the cost per megabase (Mb, i.e. a million bases) exceeded 10,000 USD and initial genome projects ran into the hundreds of millions (the human genome project alone is estimated at 2.7 billion USD), recent years have seen the cost of sequencing fall drastically (2017 estimate: <0.1 USD per Mb) (Hall, 2013; National Human Genome Research Institute, 2018). Moreover, beyond sequencing costs, the increasingly complex datasets of the genomics period require thorough bioinformatic analyses, which have had to become more accessible (Muir et al., 2016). ...
New technological methods, such as rapidly developing molecular approaches, often provide new tools for scientific advances. However, these new tools are often not utilized equally across different research areas, possibly leading to disparities in progress between these areas. Here, we use empirical evidence from the scientific literature to test for potential discrepancies in the use of genetic tools to study parasitic vs non-parasitic organisms across three distinguishable molecular periods, the allozyme, nucleotide and genomics periods. Publications on parasites constitute only a fraction (<5%) of the total research output across all molecular periods and are dominated by medically relevant parasites (especially protists), particularly during the early phase of each period. Our analysis suggests an increasing complexity of topics and research questions being addressed with the development of more sophisticated molecular tools, with the research focus between the periods shifting from predominantly species discovery to broader theory-focused questions. We conclude that both new and older molecular methods offer powerful tools for research on parasites, including their diverse roles in ecosystems and their relevance as human pathogens. While older methods, such as barcoding approaches, will continue to feature in the molecular toolbox of parasitologists for years to come, we encourage parasitologists to be more responsive to new approaches that provide the tools to address broader questions.
... Given the diminishing impact of newly sequenced genomes and that the scientific literature is saturated with articles describing them, one could be forgiven for thinking that the end of the genome paper is in sight. Indeed, various researchers have predicted the death of the genome paper and have pointed out the many flaws of a 'sequence-first-ask-questions-later' approach to genomics [18,27,28]. Some of these sentiments were summarized eloquently by Viney [28] in a Science & Society article for Trends in Parasitology: 'We have to recognise the paucity of knowledge and understanding on which our genomics analyses are based. ...
... The answers are, undoubtedly, yes and greatly.' Perhaps Hall [27] described the genomic era best in his essay After the Gold Rush: ...
Next-generation sequencing technologies have revolutionized genomics and altered the scientific publication landscape. Life-science journals abound with genome papers—peer-reviewed descriptions of newly sequenced chromosomes. Although they once filled the pages of Nature and Science, genome papers are now mostly relegated to journals with low impact factors. Some have forecast the death of the genome paper and argued that they are using up valuable resources and not advancing science. However, the publication rate of genome papers is on the rise. This increase is largely because some journals have created a new category of manuscript called genome reports, which are short, fast-tracked papers describing a chromosome sequence(s), its GenBank accession number and little else. In 2015, for example, more than 2000 genome reports were published, and 2016 is poised to bring even more. Here, I highlight the growing popularity of genome reports and discuss their merits, drawbacks and impact on science and the academic publication infrastructure. Genome reports can be excellent assets for the research community, but they are also being used as quick and easy routes to a publication, and in some instances they are not peer reviewed. One of the best arguments for genome reports is that they are a citable, user-generated genomic resource providing essential methodological and biological information, which may not be present in the sequence database. But they are expensive and time-consuming avenues for achieving such a goal.
... Selective whole-genome amplification methods like those already practical for the small genomes of viruses such as FMDV are also undergoing rapid development for larger genomes and will soon also offer the option of culture-free sequencing of targeted genomes (193,221). This is, however, a rapidly moving field, and while there is no guarantee that the current growth rate will continue indefinitely, the cost of sequencing has fallen precipitously in recent years (194). When newer sequencing technologies (discussed above) enter the commercial marketplace, they will likely facilitate sample sequencing at great depth (195). ...
In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed.
... We may be keen to jump on the problem of cancer heterogeneity at once. With sequencing getting ever cheaper [13] and new technologies waiting for us around every corner [14], it may seem that addressing this challenge has never been easier. But we would be nowhere without the right analytical tools, and the wave of methods for the analysis of cancer evolution and phylogenies published recently [15][16][17] only goes to show that the need for a new toolkit is huge. ...
It was one Saturday morning in April when, after a 24-hour journey across half of the world, and a very early morning start to the 2014 AACR annual meeting marathon, I was sitting in a dark room in the San Diego Convention Center listening to Kornelia Polyak’s talk about the evolution of breast cancer. Polyak, an expert in breast cancer genomics, was reporting on some recent results from her group. Then, she made a statement that for many cancer researchers will perhaps seem obvious, but that was not yet obvious to me. Polyak noted that it is not uncommon that when several different clones of cancer are present in a patient, the clone that metastasizes is not the most aggressive one.
This does of course make a lot of sense: the clone that metastasizes doesn’t need to be aggressive at all. Rather, it needs to be resistant to therapy. And so it is quite logical that even if other, often more dominant and larger clones are destroyed by the treatment, the ones that were until then suppressed suddenly gain breathing space to grow and migrate. The idea resonated with me a lot, as it seemed to cover so many concepts and threads that we were already seeing in the submissions for the special issue: cancer heterogeneity, clonal diversity, therapy response and resistance, and progression of the disease. In one sentence Polyak unknowingly summarized the issue of Genome Biology that we now bring you.
... Reductions in the cost of DNA sequencing [1] have brought the large-scale sequencing of individual human genomes within financial reach, and there are claims that the $1000 human genome is now achievable. Low-cost sequencing has facilitated the sequencing of multiple human genomes, and analysis of these genomes has revealed that most individuals harbour hundreds of deleterious mutations [2]. ...
Background
The domestic pig (Sus scrofa) is both an important livestock species and a model for biomedical research. Exome sequencing has accelerated identification of protein-coding variants underlying phenotypic traits in human and mouse. We aimed to develop and validate a similar resource for the pig.
Results
We developed probe sets to capture pig exonic sequences based upon the current Ensembl pig gene annotation supplemented with mapped expressed sequence tags (ESTs) and demonstrated proof-of-principle capture and sequencing of the pig exome in 96 pigs, encompassing 24 capture experiments. For most of the samples, at least 10x sequence coverage was achieved for more than 90% of the target bases. Bioinformatic analysis of the data revealed over 236,000 high-confidence predicted SNPs and over 28,000 predicted indels.
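For readers who want to reproduce this kind of breadth-of-coverage statistic, a minimal sketch follows. It assumes per-base depths in the three-column format produced by `samtools depth -a` (chromosome, position, depth); the filename and the threshold are illustrative, not taken from the paper.

```python
# Sketch: compute breadth of coverage at a depth threshold from
# `samtools depth -a` output (columns: chrom, pos, depth).
# The filename below is hypothetical, not from the paper.

def breadth_of_coverage(depth_file: str, min_depth: int = 10) -> float:
    """Fraction of target bases covered at >= min_depth."""
    total = covered = 0
    with open(depth_file) as fh:
        for line in fh:
            depth = int(line.rstrip("\n").split("\t")[2])
            total += 1
            covered += depth >= min_depth
    return covered / total if total else 0.0

if __name__ == "__main__":
    frac = breadth_of_coverage("sample01.exome.depth.tsv", min_depth=10)
    print(f"{frac:.1%} of target bases at >=10x")  # paper reports >90%
```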
Conclusions
We have achieved coverage statistics similar to those seen with commercially available human and mouse exome kits. Exome capture in pigs provides a tool to identify coding region variation associated with production traits, including loss of function mutations which may explain embryonic and neonatal losses, and to improve genomic assemblies in the vicinity of protein coding genes in the pig.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-550) contains supplementary material, which is available to authorized users.
... As can be seen in Figure 1, the cost of sequencing a single base halved roughly every 8 months between 2008 and 2013, while the cost of hard disk space halved roughly every 25 months between 2004 and 2013. Even if the most recent NHGRI data suggest some stagnation (see also the comment [4]), this may be a temporary slowdown, as third-generation instruments (PacBio RS and HeliScope) are becoming available. ...
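The two halving times quoted in this excerpt compound very differently over the same period. A minimal sketch of the arithmetic, assuming a 60-month window (January 2008 to January 2013); only the halving periods come from the text:

```python
# Sketch: how 8-month vs 25-month halving times compound over a
# 60-month window. The window length is an assumption; only the
# halving periods are taken from the excerpt above.

def fold_drop(months: float, halving_months: float) -> float:
    """Factor by which a cost falls if it halves every halving_months."""
    return 2 ** (months / halving_months)

window = 60  # months, Jan 2008 - Jan 2013
seq = fold_drop(window, 8)    # sequencing cost per base
disk = fold_drop(window, 25)  # hard disk cost per byte

print(f"sequencing cost: ~{seq:,.0f}-fold drop")   # ~181-fold
print(f"disk cost:       ~{disk:.1f}-fold drop")   # ~5.3-fold
print(f"storage gap grows ~{seq / disk:,.0f}x over the window")
```

Roughly a 180-fold drop in sequencing cost against a roughly 5-fold drop in disk cost is the quantitative core of the storage argument made in the review below.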
Post-Sanger sequencing methods produce tons of data, and there is a general agreement that the challenge to store and process them must be addressed with data compression. In this review we first answer the question “why compression” in a quantitative manner. Then we also answer the questions “what” and “how”, by sketching the fundamental compression ideas, describing the main sequencing data types and formats, and comparing the specialized compression algorithms and tools. Finally, we go back to the question “why compression” and give other, perhaps surprising answers, demonstrating the pervasiveness of data compression techniques in computational biology.
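The quantitative “why compression” argument starts from a simple baseline: an A/C/G/T alphabet needs at most 2 bits per symbol, a fourfold saving over 8-bit ASCII before any statistical modelling, and specialized tools exploit repeats and quality-score structure to do considerably better. A minimal illustrative sketch of that baseline (not code from the review itself):

```python
# Sketch: the baseline "why compression" argument in one function.
# A pure A/C/G/T sequence fits in 2 bits per base, a 4x saving over
# 8-bit ASCII; real compressors go well beyond this naive packing.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (zero-padded)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        group = seq[i:i + 4]
        byte = 0
        for base in group:
            byte = (byte << 2) | CODE[base]
        # left-align a short final group so padding sits in the low bits
        byte <<= 2 * (4 - len(group))
        out.append(byte)
    return bytes(out)

seq = "GATTACA" * 1000
packed = pack_2bit(seq)
print(len(seq), "bytes as ASCII ->", len(packed), "bytes packed")  # 4x
```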
Background:
Altered DNA-methylation affects biological ageing in adults and developmental processes in children. DNA-methylation is altered by environmental factors, trauma and illnesses. We hypothesised that paediatric critical illness, and the nutritional management in the paediatric intensive care unit (PICU), affect the DNA-methylation changes that underlie the developmental processes of childhood ageing.
Results:
We studied the impact of critical illness, and of the early use of parenteral nutrition (early-PN) versus late-PN, on "epigenetic age-deviation" in buccal mucosa of 818 former PICU-patients (406 early-PN, 412 late-PN) who participated in the 2-year follow-up of the multicentre PEPaNIC-RCT (ClinicalTrials.gov-NCT01536275), as compared with 392 matched healthy children, and assessed whether this relates to their impaired growth. The epigenetic age-deviation (difference between PedBE clock-estimated epigenetic age and chronological age) was calculated. Using bootstrapped multivariable linear regression models, we assessed the impact hereon of critical illness, and of early-PN versus late-PN. As compared with healthy children, epigenetic age of patients assessed 2 years after PICU-admission deviated negatively from chronological age (p < 0.05 in 51% of bootstrapped replicates), similarly in early-PN and late-PN groups. Next, we identified vulnerable subgroups for epigenetic age-deviation using interaction analysis. We revealed that DNA-methylation age-deceleration in former PICU-patients was dependent on age at time of illness (p < 0.05 for 83% of bootstrapped replicates), with vulnerability starting from 6 years onwards. Finally, we assessed whether vulnerability to epigenetic age-deviation could be related to impaired growth from PICU-admission to follow-up at 2 and 4 years. Multivariable repeated measures ANOVA showed that former PICU-patients, as compared with healthy children, grew less in height (p = 0.0002) and transiently gained weight (p = 0.0003) over the 4-year time course. Growth in height was more stunted in former PICU-patients aged ≥ 6-years at time of critical illness (p = 0.002) than in the younger patients.
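The “p < 0.05 in X% of bootstrapped replicates” readout reported above can be illustrated with a deliberately simplified sketch: the study fits bootstrapped multivariable linear regressions with covariates, whereas the toy version below bootstraps a univariable group comparison on simulated age-deviation data. All numbers other than the group sizes (818 patients, 392 controls) are invented.

```python
# Sketch: fraction of bootstrap replicates in which a group
# difference is significant. Simulated data; a stand-in for the
# paper's bootstrapped multivariable regression, not its pipeline.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
patients = rng.normal(-0.15, 1.0, 818)  # epigenetic age-deviation, years
controls = rng.normal(0.0, 1.0, 392)    # healthy children

n_boot, hits = 1000, 0
for _ in range(n_boot):
    p = rng.choice(patients, size=patients.size, replace=True)
    c = rng.choice(controls, size=controls.size, replace=True)
    if stats.ttest_ind(p, c, equal_var=False).pvalue < 0.05:
        hits += 1

print(f"p < 0.05 in {hits / n_boot:.0%} of bootstrap replicates")
```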
Conclusions:
As compared with healthy children, former PICU-patients, in particular those aged ≥ 6 years at time of illness, revealed epigenetic age-deceleration, with a physical correlate of stunted growth in height. Whether this vulnerability around the age of 6 years for epigenetic age-deceleration and stunted growth years later relates to altered endocrine pathways activated at the time of adrenarche requires further investigation.
Environmental DNA (eDNA) has seen a massive increase in application in freshwater systems with a concurrent growth in protocol developments and a drive to gain a better understanding of the ‘ecology’ of eDNA. This raises the question of whether we are currently still in an early, developmental phase of eDNA-based assessments or already transitioning into a more applied stage for biomonitoring. I conducted a systematic literature review on 381 eDNA-focused studies in freshwater systems targeting macro-organisms over the last 5 years, assessing study goals, methods, target systems and taxa and study design aspects. The results show an increase of biomonitoring-focused studies throughout the years, while the fraction of studies investigating the ‘ecology’ of eDNA decreased. The application of metabarcoding significantly increased while studies applying qPCRs tentatively declined. A geographic inequality was observed concerning study numbers and study goals biased towards the global North. Descriptive studies increased, but the fraction of in-field studies and studies applying eDNA and conventional methods combined revealed no trend. These results show a shift towards application-focused work for eDNA-based assessments but also reveal this field to still be developing. In this transitional phase, practitioners need to ensure consistency and data comparability for long-term monitoring programmes.
Current methods of high-throughput RNA sequencing of prokaryotes, including transcriptome analysis or ribosomal profiling, need deep sequencing to achieve sufficient numbers of effective reads (e.g., reads mapping to mRNA) so that weakly expressed genetic elements are also detected. The fraction of high-quality reads mapping to coding RNAs (i.e., mRNA) is mainly limited by the large content of rRNA and, to a lesser extent, tRNA in total RNA. Thus, depletion of rRNA increases effective coverage and reduces sequencing costs. RiboZero, a depletion kit based on probe hybridisation and rRNA removal, was found to be most efficient in the past, but it was discontinued in 2018. To facilitate comparability with previous experiments and to help choose adequate replacements, we compare three commercially available rRNA depletion kits also based on hybridisation and magnetic beads, i.e., riboPOOLs, RiboMinus and MICROBExpress, with the former RiboZero. Additionally, we constructed biotinylated probes for magnetic-bead capture and rRNA depletion in this study. Based on E. coli, we found similar rRNA-depletion efficiencies for riboPOOLs and the self-made depletion method, both comparable to the former RiboZero, followed by RiboMinus, succeeded by MICROBExpress. Further, our in-house protocol allows customized species-specific rRNA or even tRNA depletion, or depletion of other RNA targets. Both the self-made biotinylated probes and riboPOOLs were most successful in reducing the rRNA content and thereby increasing sequencing depth with respect to mRNA reads. Additionally, the number of reads matching weakly expressed genes is increased. In conclusion, the self-made specific biotinylated probes and riboPOOLs are an adequate replacement for the former RiboZero. Both are very efficient in depleting rRNAs, increasing mRNA reads and thus sequencing efficiency.