Chase W Nelson
National Cancer Institute (USA), National Institutes of Health | NCI · Division of Cancer Epidemiology and Genetics
Ph.D. Biological Sciences, B.A. Biology
Bioinformatics, genomics, molecular evolution, viruses, de novo mutation, evolutionary simulation, overlapping genes
About
79
Publications
13,794
Reads
1,402
Citations
Introduction
Bioinformatics, cancer, de novo mutation, genomics, HLA, HPV, molecular evolution, mutation, natural selection, neutral theory, population genetics, SARS-CoV-2, viruses
Additional affiliations
Education
August 2011 - August 2016
September 2006 - May 2010
Publications (79)
New applications of next-generation sequencing technologies use pools of DNA from multiple individuals to estimate population genetic parameters. However, no publicly available tools exist to analyze single-nucleotide polymorphism (SNP) calling results directly for evolutionary parameters important in detecting natural selection, including nucleoti...
At least six small alternative-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for t...
Genomics of radiation-induced damage
The potential adverse effects of exposures to radioactivity from nuclear accidents can include acute consequences such as radiation sickness, as well as long-term sequelae such as increased risk of cancer. There have been a few studies examining transgenerational risks of radiation exposure but the results have...
Human papillomavirus (HPV) causes virtually all cervical cancers and many cancers at other anatomical sites in both men and women. However, only 12 of 448 known HPV types are currently classified as carcinogens, and even the most carcinogenic type - HPV16 - only rarely leads to cancer. HPV is therefore necessary but insufficient for cervical cancer...
Introduction: Persistent infection with a high-risk HPV (HR-HPV) type causes cervical cancer and many other cancers in both men and women. Our published data have shown that the distribution of specific HPV16/35 sublineages and genetic variants differ around the world and confer greater risk of cervical precancer/cancer in the populations where the...
High-coverage sequencing allows the study of variants occurring at low frequencies within samples, but is susceptible to false-positives caused by sequencing error. Ion Torrent has a very low single nucleotide variant (SNV) error rate and has been employed for the majority of human papillomavirus (HPV) whole genome sequences. However, benchmarking...
Significance
In the current changing climate, it is essential to improve crop production and resilience under dry and nutrient-poor conditions. Desert plants have naturally evolved to flourish under such conditions. Therefore, understanding the underlying mechanisms for their adaptation can potentially help to ensure food security. The Atacama Dese...
Evolutionary biochemists have suspected that ancestral proteins were more general than their modern forms, capable of binding a wider range of targets and driving a greater diversity of reactions. While Lucas Wheeler and Michael Harms initially confirmed this suspicion, their more recent work, which increases the sample size of targets from four to...
Staphylococcus aureus has evolved into diverse lineages, known as clonal complexes (CCs), which exhibit differences in the coding sequences of core virulence factors. Whether these alterations affect functionality is poorly understood. Here, we studied the highly polymorphic pore-forming toxin LukAB. We discovered that the LukAB toxin variants prod...
Significance
Transcription factor binding sites (TFBSs) are essential for gene regulation, but the majority of TFBSs remain unknown. To discover new TFBSs, we developed a computational pipeline to analyze human and mouse ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. We found that the number of motif occurrences in ChIP-seq p...
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes acute, highly transmissible respiratory infection in humans and a wide range of animal species. Its rapid global spread has resulted in a major public health emergency, necessitating commensurately rapid research to improve control strategies. In particular, the ability to effectiv...
At least six small alternate-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for the...
Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics, but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not...
Understanding and preventing the emergence of novel viruses requires an accurate and comprehensive understanding of their genomes. One under-investigated class of functional genomic elements is overlapping genes (OLGs), which allow a single stretch of nucleotides to encode two distinct proteins in different reading frames. Viral OLGs are common and...
Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous in the...
HPV16 causes half of cervical cancers worldwide; for unknown reasons, most infections resolve within two years. Here, we analyze the viral genomes of 5,328 HPV16-positive case-control samples to investigate mutational signatures and the role of human APOBEC3-induced mutations in viral clearance and cervical carcinogenesis. We identify four de novo...
Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous i...
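The overlap effect described in the abstract above can be illustrated with a minimal sketch. This is not the authors' method or software: the function name `effect`, the toy sequence, and the choice of frames are hypothetical and chosen only for illustration. The sketch classifies one nucleotide substitution separately in two overlapping reading frames using the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect(seq, pos, new_base, frame):
    """Classify a substitution at 0-based `pos` for the reading frame
    that starts at offset `frame` (0, 1, or 2). Hypothetical helper."""
    # Start of the codon that contains `pos` in this frame.
    start = pos - ((pos - frame) % 3)
    if start < 0 or start + 3 > len(seq):
        raise ValueError("position is not inside a complete codon of this frame")
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    old_aa = CODON_TABLE[seq[start:start + 3]]
    new_aa = CODON_TABLE[mutated[start:start + 3]]
    return "synonymous" if old_aa == new_aa else "nonsynonymous"


# Toy sequence (an assumption, not real data): frame 0 reads ATG CTG GCA.
seq = "ATGCTGGCA"
print(effect(seq, 5, "C", frame=0))  # CTG -> CTC, Leu -> Leu
print(effect(seq, 5, "C", frame=1))  # TGG -> TCG, Trp -> Ser
```

In this toy case the same G-to-C change is silent in one frame and changes the amino acid in the other, which is why site counts for dN/dS need to account for both frames in overlapping regions.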
Recent de novo mutation data allow the estimation of non-reversible mutation rates for trinucleotide sequence contexts. However, existing tools for simulating DNA sequence evolution are limited to time-reversible models or do not consider trinucleotide context-dependent rates. As this ability is critical to testing evolutionary scenarios under neut...
Topoisomerase II is a critical enzyme involved in unknotting and detangling DNA during replication, transcription, and cell division. Humans have two isoforms of topoisomerase II, α (Top2A) and β (Top2B), originating from genes on separate chromosomes and displaying distinct functional roles. In addition, these enzymes are the target of several suc...
Arthropod-borne viruses are among the most genetically constrained RNA viruses, yet they have a remarkable propensity to adapt and emerge. We studied wild birds and mosquitoes naturally infected with West Nile virus (WNV) in a 'hot spot' of virus transmission in Chicago, IL, USA. We generated full coding WNV genome sequences from spatiotemporally m...
Of the ~60 human papillomavirus (HPV) genotypes that infect the cervicovaginal epithelium, only 12–13 “high-risk” types are well-established as causing cervical cancer, with HPV16 accounting for over half of all cases worldwide. While HPV16 is the most important carcinogenic type, variants of HPV16 can differ in their carcinogenicity by 10-fold or...
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escheric...
Distribution of RCV for the short annotated genes, novel genes with and without annotated homologs.
(A) RCV distribution at BHI control. (B) RCV distribution at BHI COS.
(PPTX)
Properties of the 250 short annotated genes.
Bioinformatics methods were used to predict the presence of a σ70 promoter, a ρ-independent terminator, and a Shine-Dalgarno sequence, and to estimate selection pressure (kA/kS). The last column gives the classification of the short genes by the machine-learning algorithm.
(DOCX)
Conservation of the novel genes.
Summary of ORF conservation as represented in Fig 5.
(XLSX)
Significant transcriptional and translational regulation in LB compared to BHI control of the novel genes and the short annotated genes.
The mean values of the two biological replicates of transcriptome and translatome counts for the BHI control and the LB condition are shown. The log-fold change was calculated and differential gene expression was de...
Summary of NGS results.
The total number of reads, the number of reads mapping to the E. coli O157:H7 Sakai genome, and the distribution of mapped reads among rRNA, tRNA and mRNA are shown. Only the reads mapping to mRNA were used for further analysis. Every library contains between 1.5 and 9.7 million mRNA reads.
(DOCX)
RNAseq and RIBOseq results of three different growth conditions for the 465 novel genes and the 250 short annotated genes.
The novel genes are numbered consecutively in order of their appearance in the EHEC Sakai genome. The RPKM transcriptome, RPKM translatome, RCV, and coverage values represent mean values of the two biological replicates.
(DOCX)
Properties of the novel genes.
Annotated homologs in other strains/species were searched using blastp. Only the best hit is listed. The fourth column illustrates annotated homologs in other E. coli O157:H7 strains or duplications of annotated genes in EHEC Sakai. With bioinformatics methods the presence of a σ70 promoter, a ρ-independent terminator...
Transcriptional and translational regulation at BHI COS compared to BHI control of the novel genes and the short annotated genes.
The mean values of the two biological replicates of transcriptome and translatome counts for the BHI control and the stress condition COS are shown. The log-fold change was calculated and differential gene expression was d...
Summary of the Predict Protein results for the short annotated genes.
The first columns show the amino acid (AA) composition, followed by predicted cellular localization, number of transmembrane helices, disulfide bonds and binding motifs. Additionally, secondary structures, disordered regions and domains are predicted.
(XLSX)
Classification into 'real' and 'pseudo' proteins by the machine-learning algorithm.
The upper part of the table shows the results for the novel genes and the lower part for the scrambled sequences.
(XLSX)
Conservation of intergenic sequences.
A similar process as used for Fig 5 was repeated on unannotated sequences upstream and downstream of the novel genes, but without removing sequences with stop codons. Many of the sequences had no tblastn hits (too short) and some others were excluded as more than one novel gene was situated between two annotate...
Custom script used for extracting intergenic sequences—for comparative conservation analysis.
(BASH)
Summary of the Predict Protein results for the putative proteins encoded by the novel genes.
The first columns show the amino acid (AA) composition, followed by predicted cellular localization, number of transmembrane helices, disulfide bonds and binding motifs. Additionally, secondary structures, disordered regions and domains are predicted.
(XLSX)
Custom script used for reading frame determination in the sum signal of gene groups.
(TXT)
Custom script used for detecting sequence conservation.
(BASH)
Although most cervical human papillomavirus type 16 (HPV16) infections become undetectable within 1–2 years, persistent HPV16 causes half of all cervical cancers. We used a novel HPV whole-genome sequencing technique to evaluate an exceptionally large collection of 5,570 HPV16-infected case-control samples to determine whether viral genetic variati...
Aim
Classical HLA Class I genes (HLA-A, -B and -C) are thought to have evolved from ancestral gene(s) via duplication events and are considered homologous. Exons 2 and 3 of these genes encode antigen recognition sites (ARS). These exons are highly polymorphic, which has been attributed to positive selection. We have analyzed over 100,000 human samples...
Aim
We have been sequencing whole-gene HLA Class I at large scale at Histogenetics on the PacBio RSII® platform since the beginning of July 2016. To date we have sequenced around 185,000 samples for whole-gene Class I. We have found 1645, 1697 and 1664 unique sequences for the A, B and C genes, respectively. Most of the intronic sequence information com...
Human papillomavirus type 16 (HPV16) is the most carcinogenic HPV, causing >50% of cervical cancers. Its unique oncogenicity is unsolved, e.g., the second most carcinogenic type (HPV18, causing ~16% of cancers) is relatively distantly related to HPV16. Mirabello et al. recently reported on viral genome data from 3,215 HPV positive specimens from wo...
Importance:
Certain RNA viruses can cross species barriers and cause disease in new hosts. Simian arteriviruses are a diverse group of related viruses that infect captive and wild nonhuman primates, with associated disease severity ranging from apparently asymptomatic infections to severe, viral hemorrhagic fevers. We infected nonhuman primate cel...
An evolutionary tradeoff: bipedalism requires a narrow pelvis, but larger brains require wider birth canals. Hence early birth. Chase Nelson analyzes a new theory in the debate over human neonate helplessness. Helplessness itself exerts a selective force; it requires intelligent parents.
Non-human primates (NHPs) are a historically important source of zoonotic viruses and are a gold-standard model for research on many human pathogens. However, with the exception of simian immunodeficiency virus (SIV, family Retroviridae), the blood-borne viruses harbored by these animals in the wild remain incompletely characterized. Here, we repor...
Evolutionary biologist Austin Hughes (1949–2015) died suddenly this past year. Chase Nelson describes his scientific legacy, in particular his work on frequencies of synonymous and non-synonymous mutations in neutral and adaptive evolution.
Avian influenza virus reassortants resembling the 1918 human pandemic virus can become transmissible among mammals by acquiring mutations in hemagglutinin (HA) and polymerase. Using the ferret model, we trace the evolutionary pathway by which an avian-like virus evolves the capacity for mammalian replication and airborne transmission. During initia...
Importance:
Anti-HIV CD8 T cells that are part of therapeutic treatments will need to target epitopes that do not accumulate escape mutations. Defining these epitope sequences is a necessary precursor to designing approaches that enhance the functionality of CD8 T cells with the potential to control virus replication during chronic infection or af...
J. B. S. Haldane’s theory of cost limits the rate at which evolutionary change through natural selection can occur. Chase Nelson shows that biologists must account for the complex changes that shaped humans since their divergence from the chimpanzees.
Key biological properties such as high genetic diversity and high evolutionary rate enhance the potential of certain RNA viruses to adapt and emerge. Identifying viruses with these properties in their natural hosts could dramatically improve disease forecasting and surveillance. Recently, we discovered two novel members of the viral family Arterivi...
Key biological properties such as high genetic diversity and high evolutionary rate enhance the potential of certain RNA viruses to adapt and emerge. Identifying viruses with these properties in their natural hosts could dramatically improve disease forecasting and surveillance. Recently, we discovered two novel members of the viral...
The emergence of human-transmissible H5N1 avian influenza viruses poses a major pandemic threat. H5N1 viruses are thought to be highly genetically diverse both among and within hosts; however, the effects of this diversity on viral replication and transmission are poorly understood. Here we use deep sequencing to investigate the impact of within-ho...
Computational evolution experiments using the population genetics simulation Mendel's Accountant have suggested that deleterious mutation accumulation may pose a threat to the long-term survival of many biological species. By contrast, experiments using the program Avida have suggested that purifying selection is extremely effective and that novel...
Biosemiotic entropy involves the deterioration of biological sign systems. The genome is a coded sign system that is connected to phenotypic outputs through the interpretive functions of the tRNA/ribosome machinery. This symbolic sign system (semiosis) at the core of all biology has been termed “biosemiosis”. Layers of biosemiosis and cellular info...
Natural populations are always changing. Hardy-Weinberg assumptions are almost never realized because populations are seldom in equilibrium, and many random events (e.g., mutations, population size fluctuations, and environmental perturbations) irrevocably alter the genetic makeup of populations. Such genetic change can be either for the better (so...
Avida is a computer program that performs evolution experiments with digital organisms. Previous work has used the program to study the evolutionary origin of complex features, namely logic operations, but has consistently used extremely large mutational fitness effects. The present study uses Avida to better understand the role of low-impact mutat...
Mutation and drift. Dynamics of Avidian mutation and drift across 20 experiments in which no logic operations were rewarded.
Evolution with reduced mutation rate. Dynamics of Avidian evolution across 30 experiments in which a genomic mutation rate of 0.5 per generation was used.
Selection threshold for beneficial mutations. Dynamics of Avidian evolution across 20 replicates (160 experiments) employing alternative mutational fitness effects of ≤ 1.0.
Encyclopedias of regulatory genomic elements provide a foundation for research in areas such as disease diagnosis, disease treatment, and crop enhancement. The construction of complete encyclopedias of organism-specific genomic elements involved in gene regulation remains a significant challenge. To address this problem, the authors present novel b...