Article

Refinement of the Antarctic fur seal (Arctocephalus gazella) reference genome increases continuity and completeness

Authors:

Abstract

The Antarctic fur seal (Arctocephalus gazella) is an important top predator and indicator of the health of the Southern Ocean ecosystem. Although abundant, this species narrowly escaped extinction due to historical sealing and is currently declining as a consequence of climate change. Genomic tools are essential for understanding these anthropogenic impacts and for predicting long-term viability. However, the current reference genome (“arcGaz3”) leaves considerable room for improvement in terms of both completeness and contiguity. We therefore combined PacBio sequencing, haplotype-aware HiRise assembly and scaffolding based on Hi-C information to generate a refined assembly of the Antarctic fur seal reference genome (“arcGaz4_h1”). The new assembly is 2.53 Gb long, has a scaffold N50 of 55.6 Mb and includes 18 chromosome-sized scaffolds, which correspond to the 18 chromosomes expected in otariids. Genome completeness is greatly improved, with 23,408 annotated genes and a Benchmarking Universal Single-Copy Orthologs (BUSCO) score raised from 84.7% to 95.2%. We furthermore included the new genome in a reference-free alignment of the genomes of eleven pinniped species to characterize evolutionary conservation across the Pinnipedia using genome-wide Genomic Evolutionary Rate Profiling (GERP). We then performed Gene Ontology (GO) enrichment analyses to identify biological processes associated with the genes showing the highest levels of either conservation or differentiation between the two major pinniped families, the Otariidae and Phocidae. We show that processes linked to neuronal development, the circulatory system and osmoregulation are overrepresented in both conserved and differentiated regions of the genome.
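Scaffold N50, the contiguity metric quoted above, is the largest length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows. The scaffold lengths are hypothetical toy values, not arcGaz4_h1 data, and assembly-statistics tools may differ in details such as how they handle gaps.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total length
            return length


# Hypothetical scaffold lengths in Mb (toy values for illustration only).
toy_scaffolds = [120, 90, 60, 40, 30, 10]
print(n50(toy_scaffolds))  # 90
```

Sorting in descending order and stopping at the halfway point means a single long scaffold can dominate the metric. This is why N50 is usually reported alongside total length and scaffold count, as the abstract does.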

Article
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues to contribute publications and annotations to UniProt entries of interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
Article
The major histocompatibility complex (MHC) is a group of genes comprising one of the most important components of the vertebrate immune system. Consequently, there has been much interest in characterising MHC variation and its relationship with fitness in a variety of species. Due to the exceptional polymorphism of MHC genes, careful PCR primer design is crucial for capturing all of the allelic variation present in a given species. We therefore developed intronic primers to amplify the full-length 267 bp protein-coding sequence of the MHC class II DQB exon 2 in the Antarctic fur seal. We then characterised patterns of MHC variation among mother–offspring pairs from two breeding colonies and detected 19 alleles among 771 clone sequences from 56 individuals. The distribution of alleles within and among individuals was consistent with a single-copy, classical DQB locus showing Mendelian inheritance. Amino acid similarity at the MHC was significantly associated with genome-wide relatedness, but no relationship was found between MHC heterozygosity and genome-wide heterozygosity. Finally, allelic diversity was several times higher than reported by a previous study based on partial exon sequences. This difference appears to be related to allele-specific amplification bias, implying that primer design can strongly impact the inference of MHC diversity.
Article
We present the fifth edition of the TimeTree of Life resource (TToL5), a product of the timetree of life project that aims to synthesize published molecular timetrees and make evolutionary knowledge easily accessible to all. Using the TToL5 web portal, users can retrieve published studies and divergence times between species, the timeline of a species’ evolution beginning with the origin of life, and the timetree for a given evolutionary group at the desired taxonomic rank. TToL5 contains divergence time information on 137,306 species, 41% more than the previous edition. The TToL5 web interface is now ADA-compliant and mobile-friendly, a result of comprehensive source code refactoring. TToL5 also offers programmatic access to species divergence times and timelines through an application programming interface, which is accessible at timetree.temple.edu/api. TToL5 is publicly available at timetree.org.
Article
The Hawaiian monk seal (HMS) is the single extant species of tropical earless seals of the genus Neomonachus. The species survived a severe bottleneck in the late 19th century and experienced subsequent population declines until becoming the subject of a NOAA-led species recovery effort beginning in 1976, when the population was fewer than 1000 animals. Like other recovering species, the Hawaiian monk seal has been reported to have reduced genetic heterogeneity due to the bottleneck and subsequent inbreeding. Here, we report a chromosomal reference assembly for a male animal produced using a variety of methods. The final assembly consisted of 16 autosomes, an X, and portions of the Y chromosome. We compared variants in this animal to other HMS and to a frequently sequenced human sample, finding about 12% of the variation seen in the human. To confirm that the reference animal was representative of the HMS, we compared his sequence to that of 10 other individuals and noted similarly low variation in all. Variation in the major histocompatibility complex (MHC) genes was nearly absent compared to the orthologous human loci. Demographic analysis predicts that Hawaiian monk seals have had a long history of small populations preceding the bottleneck, and their current low levels of heterozygosity may indicate specialization to a stable environment. When we compared our reference assembly to that of other species, we observed significant conservation of chromosomal architecture with other pinnipeds, especially other phocids. This reference should be a useful tool for future evolutionary studies as well as the long-term management of this species.
Article
Much debate surrounds the importance of top-down and bottom-up effects in the Southern Ocean, where the harvesting of over two million whales in the mid twentieth century is thought to have produced a massive surplus of Antarctic krill. This excess of krill may have allowed populations of other predators, such as seals and penguins, to increase, a top-down hypothesis known as the ‘krill surplus hypothesis’. However, a lack of pre-whaling population baselines has made it challenging to investigate historical changes in the abundance of the major krill predators in relation to whaling. Therefore, we used reduced representation sequencing and a coalescent-based maximum composite likelihood approach to reconstruct the recent demographic history of the Antarctic fur seal, a pinniped that was hunted to the brink of extinction by 18th and 19th century sealers. In line with the known history of this species, we found support for a demographic model that included a substantial reduction in population size around the time period of sealing. Furthermore, maximum likelihood estimates from this model suggest that the recovered, post-sealing population at South Georgia may have been around two times larger than the pre-sealing population. Our findings lend support to the krill surplus hypothesis and illustrate the potential of genomic approaches to shed light on long-standing questions in population biology.
Article
The Weddell seal (Leptonychotes weddellii) thrives in its extreme Antarctic environment. We generated the Weddell seal genome assembly and a high-quality annotation to investigate genome-wide evolutionary pressures that underlie its phenotype and to study genes implicated in hypoxia tolerance and a lipid-based metabolism. Genome-wide analyses included gene family expansion/contraction, positive selection, and diverged sequence (acceleration) compared to other placental mammals, identifying selection in coding and non-coding sequence in five pathways that may shape cardiovascular phenotype. Lipid metabolism as well as hypoxia genes contained more accelerated regions in the Weddell seal compared to genomic background. Top-significant genes were SUMO2 and EP300; both regulate hypoxia inducible factor signaling. Liver expression of four genes with the strongest acceleration signals differs between Weddell seals and a terrestrial mammal, sheep. We also report a high-density lipoprotein-like particle in Weddell seal serum not present in other mammals, including the shallow-diving harbor seal.
Article
The work mainly focused on validating a method for determining the content of salicylic acid and individual unknown impurities in a new pharmaceutical product: tablets containing 75, 100 or 150 mg of acetylsalicylic acid and 40 mg of glycine in each strength. The components were separated by HPLC using a Waters Symmetry C18 column (4.6 × 250 mm, 5 μm) as the stationary phase. The mobile phase consisted of a mixture of 85% orthophosphoric acid, acetonitrile and purified water (2:400:600 V/V/V). Detection was carried out at a wavelength of 237 nm, with a constant flow rate of 1.0 ml min⁻¹. To verify the method, linearity, precision (repeatability and reproducibility), accuracy, specificity, range, robustness, system precision, stability of the test and standard solutions, limit of quantification and forced degradation were determined. Validation tests were performed in accordance with ICH (International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use) guidelines. The method was validated successfully. It was confirmed that, over the tested range of 0.005–0.40% salicylic acid relative to acetylsalicylic acid content, the method is linear, precise and accurate.
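Linearity in a validation like this is typically assessed from a least-squares calibration line of detector response against concentration, reporting slope, intercept and correlation coefficient. The sketch below computes those three quantities in plain Python. The calibration levels and peak areas are hypothetical and are perfectly linear by construction. Any acceptance limit for r would come from the laboratory's own protocol, and the abstract does not state one.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical calibration levels (% salicylic acid relative to acetylsalicylic
# acid) spanning the 0.005-0.40% range, with made-up peak areas.
levels = [0.005, 0.05, 0.10, 0.20, 0.40]
areas = [1000.0 * c + 2.0 for c in levels]
slope, intercept, r = linear_fit(levels, areas)
print(f"slope={slope:.1f} intercept={intercept:.2f} r={r:.6f}")
```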
Article
Antarctic fur seals (AFS) are an ecologically important predator and a focal indicator species for ecosystem-based Antarctic fisheries management. This species suffered intensive anthropogenic exploitation until the early 1900s, but recolonized most of its former distribution, including the southern-most colony at Cape Shirreff, South Shetland Islands (SSI). The IUCN describes a single, global AFS population of least concern; however, extensive genetic analyses clearly identify four distinct breeding stocks, including one in the SSI. To update the population status of SSI AFS, we analyzed 20 years of field-based data including population counts, body size and condition, natality, recruitment, foraging behaviors, return rates, and pup mortality at the largest SSI colony. Our findings show a precipitous decline in AFS abundance (86% decrease since 2007), likely driven by leopard seal predation (increasing since 2001, p << 0.001) and potentially worsening summer foraging conditions. We estimated that leopard seals consumed an average of 69.3% (range: 50.3–80.9%) of all AFS pups born each year since 2010. AFS foraging-trip durations, an index of their foraging habitat quality, were consistent with decreasing krill and fish availability. Significant improvement in the age-specific over-winter body condition of AFS indicates that observed population declines are driven by processes local to the northern Antarctic Peninsula. The loss of SSI AFS would substantially reduce the genetic diversity of the species, and decrease its resilience to climate change. There is an urgent need to reevaluate the conservation status of Antarctic fur seals, particularly for the rapidly declining SSI population.
Article
The ancestors of marine mammals once roamed the land and independently committed to an aquatic lifestyle. These macroevolutionary transitions have intrigued scientists for centuries. Here, we generated high-quality genome assemblies of 17 marine mammals (11 cetaceans and six pinnipeds), including eight assemblies at the chromosome level. Incorporating previously published data, we reconstructed the marine mammal phylogeny and population histories and identified numerous idiosyncratic and convergent genomic variations that possibly contributed to the transition from land to water in marine mammal lineages. Genes associated with the formation of blubber (NFIA), vascular development (SEMA3E), and heat production by brown adipose tissue (UCP1) had unique changes that may contribute to marine mammal thermoregulation. We also observed many lineage-specific changes in the marine mammals, including genes associated with deep diving and navigation. Our study advances understanding of the timing, pattern, and molecular changes associated with the evolution of mammalian lineages adapting to aquatic life.
Article
tRNAscan-SE has been widely used for transfer RNA (tRNA) gene prediction for over twenty years, developed just as the first genomes were decoded. With the massive increase in quantity and phylogenetic diversity of genomes, the accurate detection and functional prediction of tRNAs has become more challenging. Utilizing a vastly larger training set, we created nearly one hundred specialized isotype- and clade-specific models, greatly improving tRNAscan-SE’s ability to identify and classify both typical and atypical tRNAs. We employ a new comparative multi-model strategy where predicted tRNAs are scored against a full set of isotype-specific covariance models, allowing functional prediction based on both the anticodon and the highest-scoring isotype model. Comparative model scoring has also enhanced the program's ability to detect tRNA-derived SINEs and other likely pseudogenes. For the first time, tRNAscan-SE also includes fast and highly accurate detection of mitochondrial tRNAs using newly developed models. Overall, tRNA detection sensitivity and specificity are improved for all isotypes, particularly those utilizing specialized models for selenocysteine and the three subtypes of tRNA genes encoding a CAU anticodon. These enhancements will provide researchers with more accurate and detailed tRNA annotation for a wider variety of tRNAs, and may direct attention to tRNAs with novel traits.
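The anticodon-based part of functional prediction can be illustrated with base pairing: the codon a tRNA reads is the reverse complement of its 5'→3' anticodon. The hypothetical helper below applies that rule with the standard genetic code (NCBI translation table 1). It is not tRNAscan-SE code, and it ignores wobble pairing and base modifications. Those omissions are exactly why CAU anticodons cannot be classified from the anticodon alone: they are shared by initiator methionine tRNAs, elongator methionine tRNAs and, in bacteria and archaea, isoleucine tRNAs whose anticodon is chemically modified.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Complement DNA or RNA bases into DNA bases (U pairs with A).
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def decoded_amino_acid(anticodon):
    """Return the one-letter amino acid for the codon that pairs with a
    5'->3' anticodon under strict Watson-Crick pairing ('*' = stop)."""
    codon = anticodon.upper().translate(COMPLEMENT)[::-1]
    return CODON_TABLE[codon]


for anticodon in ["CAU", "CCA", "UUC"]:
    print(anticodon, decoded_amino_acid(anticodon))
```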
Article
Methods for evaluating the quality of genomic and metagenomic data are essential to aid genome assembly and to correctly interpret the results of subsequent analyses. BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Here we present new functionalities and major improvements of the BUSCO software, as well as the renewal and expansion of the underlying datasets in sync with the OrthoDB v10 release. Among the major novelties, BUSCO now enables phylogenetic placement of the input sequence to automatically select the most appropriate dataset for the assessment, allowing the analysis of metagenome-assembled genomes of unknown origin. A newly introduced genome workflow improves efficiency and reduces runtimes, especially on large eukaryotic genomes. BUSCO is the only tool capable of assessing both eukaryotic and prokaryotic species, and can be applied to various data types, from genome assemblies and metagenomic bins, to transcriptomes and gene sets.
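BUSCO reports completeness as the share of a lineage dataset's n orthologs that are found complete (C, split into single-copy S and duplicated D), fragmented (F) or missing (M), so that C = S + D and C + F + M = 100%. The sketch below turns hypothetical category counts into a summary string modeled on BUSCO's short-summary layout. It is not BUSCO's own code, and the counts are invented for illustration.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format BUSCO-style percentages from category counts.

    Complete (C) is the sum of single-copy (S) and duplicated (D) hits;
    n is the total number of orthologs searched for.
    """
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no orthologs counted")

    def pct(count):
        return 100.0 * count / n

    complete = single + duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )


# Hypothetical counts for a 100-ortholog dataset.
print(busco_summary(single=90, duplicated=5, fragmented=3, missing=2))
```

Keeping D separate from S matters for assembly quality control, because a high duplicated fraction can point to haplotypes that were not collapsed rather than to genuine gene duplications.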
Article
With the advent of chromatin-interaction maps, chromosome-level genome assemblies have become a reality for a wide range of organisms. Scaffolding quality is, however, difficult to judge. To explore this gap, we generated multiple chromosome-scale genome assemblies of an emerging wild animal model for carcinogenesis, the California sea lion (Zalophus californianus). Short-read assemblies were scaffolded with two independent types of chromatin-interaction mapping data (Hi-C and Chicago), and long-read assemblies with three data types (Hi-C, optical maps, and 10X linked reads) following the 'Vertebrate Genomes Project (VGP)' pipeline. In both approaches, 18 major scaffolds recovered the karyotype (2n=36), with scaffold N50s of 138 Mb and 147 Mb, respectively. Synteny relationships at the chromosome-level with other pinniped genomes (2n=32-36), ferret (2n=34), red panda (2n=36) and domestic dog (2n=78) were consistent across approaches and recovered known fissions and fusions. Comparative chromosome painting and multicolor chromosome tiling with a panel of 264 genome-integrated single-locus canine bacterial artificial chromosome (BAC) probes provided independent evaluation of genome organization. Broad-scale discrepancies between the approaches were observed within chromosomes, most commonly in translocations centered around centromeres and telomeres, which were better resolved in the VGP assembly. Genomic and cytological approaches agreed on near-perfect synteny of the X chromosome, and in combination allowed detailed investigation of autosomal rearrangements between dog and sea lion. This study presents high-quality genomes of an emerging cancer model and highlights that even highly fragmented short-read assemblies scaffolded with Hi-C can yield reliable chromosome level scaffolds suitable for comparative genomic analyses.
Article
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species¹⁻⁴. To address this issue, the international Genome 10K (G10K) consortium⁵,⁶ has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Article
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly. Hifiasm is a haplotype-resolved de novo genome assembler for long-read high-fidelity sequencing data based on phased assembly graphs.
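Standard trio binning, which hifiasm's graph trio binning is compared against, partitions a child's reads by parental origin before assembly using k-mers found in only one parent. A toy Python sketch of that read-level idea follows. It is a deliberate simplification, using exact k-mers with no reverse complements, count thresholds or error filtering, and it is not hifiasm's graph-based method.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parental_markers(maternal_reads, paternal_reads, k):
    """k-mers seen in only one parent's reads (haplotype-specific markers)."""
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    return maternal - paternal, paternal - maternal


def bin_read(read, mat_only, pat_only, k):
    """Assign a child read to the parent whose specific k-mers it shares most."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # ties, including reads with no markers at all


# Toy parental reads and k (hypothetical values).
K = 4
mat_only, pat_only = parental_markers(["AAAACCCC"], ["AAAAGGGG"], K)
for child_read in ["AAACCCC", "AAAGGGG", "AAAA"]:
    print(child_read, bin_read(child_read, mat_only, pat_only, K))
```

Reads from regions where the parents share sequence carry no markers and stay unassigned. Resolving those reads without losing contiguity is the gap that graph-based phasing aims to close.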
Article
Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
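Workflow managers such as Snakemake derive the order of analysis steps from their input and output dependencies, which form a directed acyclic graph (DAG). The sketch below shows only that ordering idea, using Python's standard graphlib module. The step names are hypothetical. Real Snakemake builds the graph by matching rule output file patterns to the files requested downstream, not from an explicit dictionary like this one.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical analysis steps, each mapped to the steps whose outputs it needs.
dependencies = {
    "trim_reads": {"download_reads"},
    "align_reads": {"trim_reads", "index_reference"},
    "call_variants": {"align_reads"},
    "qc_report": {"trim_reads", "align_reads"},
    "plot_results": {"call_variants", "qc_report"},
}


def execution_order(deps):
    """Return one valid run order in which every step follows its inputs.

    TopologicalSorter raises graphlib.CycleError if the steps form a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


print(execution_order(dependencies))
```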
Article
Pinnipedia karyotype evolution was studied here using human, domestic dog, and stone marten whole-chromosome painting probes to obtain comparative chromosome maps among species of Odobenidae (Odobenus rosmarus), Phocidae (Phoca vitulina, Phoca largha, Phoca hispida, Pusa sibirica, Erignathus barbatus), and Otariidae (Eumetopias jubatus, Callorhinus ursinus, Phocarctos hookeri, and Arctocephalus forsteri). Structural and functional chromosomal features were assessed with telomere repeat and ribosomal-DNA probes and by CBG (C-bands revealed by barium hydroxide treatment followed by Giemsa staining) and CDAG (Chromomycin A3-DAPI after G-banding) methods. We demonstrated diversity of heterochromatin among pinniped karyotypes in terms of localization, size, and nucleotide composition. For the first time, an intrachromosomal rearrangement common for Otariidae and Odobenidae was revealed. We postulate that the order of evolutionarily conserved segments in the analyzed pinnipeds is the same as the order proposed for the ancestral Carnivora karyotype (2n = 38). The evolution of conserved genomes of pinnipeds has been accompanied by few fusion events (less than one rearrangement per 10 million years) and by novel intrachromosomal changes including the emergence of new centromeres and pericentric inversion/centromere repositioning. The observed interspecific diversity of pinniped karyotypes driven by constitutive heterochromatin variation likely has played an important role in karyotype evolution of pinnipeds, thereby contributing to the differences among pinniped chromosome sets.
Article
New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies¹⁻³. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database⁴ increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies⁵ are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus⁶, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
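Progressive Cactus scales by splitting one large alignment into smaller subproblems along a guide tree, reconstructing an ancestral genome at each internal node from its children (with outgroups), so subproblems must be handled from the leaves upward. A minimal sketch of that bottom-up (post-order) scheduling is below. The tree and node names are hypothetical, and the alignment work done at each node is omitted entirely.

```python
# Hypothetical rooted guide tree: internal node -> children; leaves are genomes.
guide_tree = {
    "Anc0": ["Anc1", "Anc2"],
    "Anc1": ["genome_A", "genome_B"],
    "Anc2": ["genome_C", "genome_D"],
}


def subproblem_order(tree, root):
    """Return internal nodes in post-order, so each ancestor is scheduled
    only after every ancestor beneath it."""
    order = []

    def visit(node):
        for child in tree.get(node, []):  # leaves have no entry in the tree
            visit(child)
        if node in tree:  # only internal nodes are alignment subproblems
            order.append(node)

    visit(root)
    return order


print(subproblem_order(guide_tree, "Anc0"))  # ['Anc1', 'Anc2', 'Anc0']
```

Sibling subproblems such as Anc1 and Anc2 do not depend on each other, which is what allows this kind of decomposition to be run in parallel.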
Article
High density single nucleotide polymorphism (SNP) arrays allow large numbers of individuals to be rapidly and cost-effectively genotyped at large numbers of genetic markers. However, despite being widely used in studies of humans and domesticated plants and animals, SNP arrays are lacking for most wild organisms. We developed a custom 85K Affymetrix Axiom array for an intensively studied pinniped, the Antarctic fur seal (Arctocephalus gazella). SNPs were discovered from a combination of genomic and transcriptomic resources and filtered according to strict criteria. Out of a total of 85,359 SNPs tiled on the array, 75,601 (88.6%) successfully converted and were polymorphic in 270 animals from a breeding colony at Bird Island in South Georgia. Evidence was found for inbreeding, with three genomic inbreeding coefficients being strongly intercorrelated and the proportion of the genome in runs of homozygosity being non-zero in all individuals. Furthermore, analysis of genomic relatedness coefficients identified previously unknown first-degree relatives and multiple second-degree relatives among a sample of ostensibly unrelated individuals. Such "cryptic relatedness" within fur seal breeding colonies may increase the likelihood of consanguineous matings and could therefore have implications for understanding fitness variation and mate choice. Finally, we demonstrate the cross-amplification potential of the array in three related pinniped species. Overall, our SNP array will facilitate future studies of Antarctic fur seals and has the potential to serve as a more general resource for the wider pinniped research community.
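Two figures in this abstract reduce to simple ratios: the conversion rate (polymorphic SNPs over SNPs tiled) and the proportion of the genome in runs of homozygosity (ROH), often written F_ROH. The sketch below reproduces the reported 88.6% from the reported counts and computes F_ROH from hypothetical ROH segments. The segment coordinates and genome length are assumptions, and ROH calling itself (window sizes, allowed heterozygous sites) is not shown.

```python
def conversion_rate(polymorphic, tiled):
    """Percentage of tiled SNPs that converted and were polymorphic."""
    return 100.0 * polymorphic / tiled


def f_roh(roh_segments, autosomal_length):
    """Proportion of the autosomal genome covered by runs of homozygosity.

    roh_segments: (chromosome, start, end) tuples, assumed non-overlapping,
    with half-open coordinates in base pairs.
    """
    covered = sum(end - start for _chrom, start, end in roh_segments)
    return covered / autosomal_length


print(round(conversion_rate(75_601, 85_359), 1))  # 88.6

# Hypothetical ROH calls for one animal and a hypothetical autosome length.
segments = [("chr1", 0, 1_000_000), ("chr2", 500_000, 2_500_000)]
print(f_roh(segments, 30_000_000))
```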
Article
Understanding the effects of human exploitation on the genetic composition of wild populations is important for predicting species persistence and adaptive potential. We therefore investigated the genetic legacy of large-scale commercial harvesting by reconstructing, on a global scale, the recent demographic history of the Antarctic fur seal (Arctocephalus gazella), a species that was hunted to the brink of extinction by 18th and 19th century sealers. Molecular genetic data from over 2,000 individuals sampled from all eight major breeding locations across the species’ circumpolar geographic distribution show that at least four relict populations around Antarctica survived commercial hunting. Coalescent simulations suggest that all of these populations experienced severe bottlenecks down to effective population sizes of around 150–200. Nevertheless, comparably high levels of neutral genetic variability were retained as these declines are unlikely to have been strong enough to deplete allelic richness by more than around 15%. These findings suggest that even dramatic short-term declines need not necessarily result in major losses of diversity, and explain the apparent contradiction between the high genetic diversity of this species and its extreme exploitation history.
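The classic expectation for an ideal Wright-Fisher population shows why a brief bottleneck can leave much diversity intact. In such a population, expected heterozygosity falls each generation by a factor of 1 - 1/(2Ne). The sketch below applies this with effective sizes from the 150–200 range reported above. The numbers of bottleneck generations are hypothetical inputs. This textbook formula tracks heterozygosity, not the allelic richness quoted in the abstract (rare alleles are lost faster), and it is not the coalescent simulation approach the authors used.

```python
def heterozygosity_retained(ne, generations):
    """Expected fraction of heterozygosity remaining after `generations`
    at constant effective size `ne` (ideal Wright-Fisher population)."""
    if ne <= 0 or generations < 0:
        raise ValueError("ne must be positive and generations non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# Hypothetical bottleneck lengths (generations) at Ne = 150 and Ne = 200.
for ne in (150, 200):
    for t in (5, 10, 20):
        print(ne, t, round(heterozygosity_retained(ne, t), 3))
```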
Article
IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.
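IQ-TREE estimates trees and branch lengths by maximum likelihood under models of sequence evolution. The smallest instance of that idea is a single branch between two aligned sequences under the Jukes-Cantor (JC69) model, where the likelihood depends only on how many sites differ. The sketch below is not IQ-TREE code. It maximizes that likelihood numerically with a golden-section search and checks the result against the known closed-form JC69 distance. The site counts are hypothetical.

```python
import math


def jc69_log_likelihood(dist, n_sites, n_diff):
    """Log-likelihood (up to a constant) of distance `dist`, in expected
    substitutions per site, for two sequences differing at n_diff of n_sites."""
    decay = math.exp(-4.0 * dist / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site is identical
    p_diff = 0.75 - 0.75 * decay  # probability a site differs
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_sites, n_diff, lo=1e-8, hi=5.0, tol=1e-10):
    """Maximize the JC69 likelihood by golden-section search on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if jc69_log_likelihood(x1, n_sites, n_diff) > jc69_log_likelihood(x2, n_sites, n_diff):
            b = x2  # maximum lies in [a, x2]
        else:
            a = x1  # maximum lies in [x1, b]
    return (a + b) / 2.0


def jc69_closed_form(n_sites, n_diff):
    """Analytical JC69 distance; valid only when n_diff / n_sites < 0.75."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(ml_distance(1000, 100), jc69_closed_form(1000, 100))
```

With realistic trees the likelihood couples many branches and the tree topology, so programs like IQ-TREE need general numerical optimization and tree search rather than a closed form. The one-branch case is simply the easiest place to check a numerical optimizer against a known answer.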
Article
Numerous studies have reported correlations between the heterozygosity of genetic markers and fitness. These heterozygosity–fitness correlations (HFCs) play a central role in evolutionary and conservation biology, yet their mechanistic basis remains open to debate. For example, fitness associations have been widely reported at both neutral and functional loci, yet few studies have directly compared the two, making it difficult to gauge the relative contributions of genome‐wide inbreeding and specific functional genes to fitness. Here, we compared the effects of neutral and immune gene heterozygosity on death from bacterial infection in Antarctic fur seal (Arctocephalus gazella) pups. We specifically developed a panel of 13 microsatellites from expressed immune genes and genotyped these together with 48 neutral loci in 234 individuals, comprising 39 pups that were classified at necropsy as having most likely died of bacterial infection together with a five times larger matched sample of healthy surviving pups. Identity disequilibrium quantified from the neutral markers was positive and significant, indicative of variance in inbreeding within the study population. However, multilocus heterozygosity did not differ significantly between healthy and infected pups at either class of marker, and little evidence was found for fitness associations at individual loci. These results support a previous study of Antarctic fur seals that found no effects of heterozygosity at nine neutral microsatellites on neonatal survival and thereby help to refine our understanding of how HFCs vary across the life cycle. Given that nonsignificant HFCs are underreported in the literature, we also hope that our study will contribute toward a more balanced understanding of the wider importance of this phenomenon.
Article
High-latitude ecosystems are among the fastest warming on the planet¹. Polar species may be sensitive to warming and ice loss, but data are scarce and evidence is conflicting²⁻⁴. Here, we show that, within their main population centre in the southwest Atlantic sector, the distribution of Euphausia superba (hereafter, ‘krill’) has contracted southward over the past 90 years. Near their northern limit, numerical densities have declined sharply and the population has become more concentrated towards the Antarctic shelves. A concomitant increase in mean body length reflects reduced recruitment of juvenile krill. We found evidence for environmental controls on recruitment, including a reduced density of juveniles following positive anomalies of the Southern Annular Mode. Such anomalies are associated with warm, windy and cloudy weather and reduced sea ice, all of which may hinder egg production and the survival of larval krill⁵. However, the total post-larval density has declined less steeply than the density of recruits, suggesting that survival rates of older krill have increased. The changing distribution is already perturbing the krill-centred food web⁶ and may affect biogeochemical cycling⁷,⁸. Rapid climate change, with associated nonlinear adjustments in the roles of keystone species, poses challenges for the management of valuable polar ecosystems³.
Article
Abstract The pinnipeds, which comprise seals, sea lions, and walruses, are a remarkable group of marine animals with unique adaptations to semi-aquatic life. However, their genomes are poorly characterized. In this study, we sequenced and characterized the genomes of three pinnipeds (Phoca largha, Callorhinus ursinus, and Eumetopias jubatus), focusing on site-wise sequence changes. We detected rapidly evolving genes in pinniped lineages and substitutions unique to pinnipeds associated with amphibious sound perception. Sequence convergence associated with phenotypic convergence is not common in marine mammals. For example, FASN, KCNA5, and IL17RA contain substitutions specific to pinnipeds, yet are potential candidates for phenotypic convergence (blubber, response to hypoxia, and immunity to pathogens) in all marine mammals. The outcomes of this study will provide insight into targets for future studies of convergent evolution or gene function.
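A minimal way to screen an alignment for substitutions "unique to pinnipeds" is to flag columns where every focal species shares one residue that no other species carries. This criterion is a simplifying assumption, not necessarily the one used in the study, which may also account for phylogeny and alignment gaps; the species names and sequences in the example are invented.

```python
from typing import Dict, List, Set


def lineage_specific_sites(alignment: Dict[str, str], foreground: Set[str]) -> List[int]:
    """Return 0-based columns where all foreground species share one non-gap residue
    that is absent from every background species (a simplified, assumed criterion)."""
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        fg = {alignment[sp][i] for sp in foreground}
        bg = {seq[i] for sp, seq in alignment.items() if sp not in foreground}
        if len(fg) == 1 and "-" not in fg and fg.isdisjoint(bg):
            sites.append(i)
    return sites
```

For example, with a toy alignment in which two pinnipeds carry K at the second position and two carnivore outgroups carry R, only that column is returned.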
Article
OrthoDB (https://www.orthodb.org) provides evolutionary and functional annotations of orthologs. This update features a major scaling up of the resource coverage, sampling the genomic diversity of 1271 eukaryotes, 6013 prokaryotes and 6488 viruses. These include putative orthologs among 448 metazoan, 117 plant, 549 fungal, 148 protist, 5609 bacterial, and 404 archaeal genomes, picking up the best sequenced and annotated representatives for each species or operational taxonomic unit. OrthoDB relies on a concept of hierarchy of levels-of-orthology to enable more finely resolved gene orthologies for more closely related species. Since orthologs are the most likely candidates to retain functions of their ancestor gene, OrthoDB is aimed at narrowing down hypotheses about gene functions and enabling comparative evolutionary studies. Optional registered-user sessions allow on-line BUSCO assessments of gene set completeness and mapping of the uploaded data to OrthoDB to enable further interactive exploration of related annotations and generation of comparative charts. The accelerating expansion of genomics data continues to add valuable information, and OrthoDB strives to provide orthologs from the broadest coverage of species, as well as to extensively collate available functional annotations and to compute evolutionary annotations. The data can be browsed online, downloaded or accessed via REST API or SPARQL RDF compatible with both UniProt and Ensembl.
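BUSCO results are conventionally reported as a one-line summary of complete (single-copy plus duplicated), fragmented and missing orthologs out of the n searched. The sketch below formats raw counts in that notation; the counts in the example are hypothetical, and rounding to one decimal place is an assumption about presentation.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style counts as C:..%[S:..%,D:..%],F:..%,M:..%,n:.. (one decimal place assumed)."""
    n = single + duplicated + fragmented + missing

    def pct(count: int) -> float:
        return 100.0 * count / n

    complete = single + duplicated  # "complete" includes both single-copy and duplicated hits
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )
```

The headline completeness figure (for instance the 84.7% and 95.2% values quoted for the fur seal assemblies) corresponds to the C term.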
Article
Recent advances in high throughput sequencing have transformed the study of wild organisms by facilitating the generation of high quality genome assemblies and dense genetic marker datasets. These resources have the potential to significantly advance our understanding of diverse phenomena at the level of species, populations and individuals, ranging from patterns of synteny through rates of linkage disequilibrium (LD) decay and population structure to individual inbreeding. Consequently, we used PacBio sequencing to refine an existing Antarctic fur seal (Arctocephalus gazella) genome assembly and genotyped 83 individuals from six populations using restriction site associated DNA (RAD) sequencing. The resulting hybrid genome comprised 6,169 scaffolds with an N50 of 6.21 Mb and provided clear evidence for the conservation of large chromosomal segments between the fur seal and dog (Canis lupus familiaris). Focusing on the most extensively sampled population of South Georgia, we found that LD decayed rapidly, reaching the background level by around 400 kb, consistent with other vertebrates but at odds with the notion that fur seals experienced a strong historical bottleneck. We also found evidence for population structuring, with four main Antarctic island groups being resolved. Finally, appreciable variance in individual inbreeding could be detected, reflecting the strong polygyny and site fidelity of the species. Overall, our study contributes important resources for future genomic studies of fur seals and other pinnipeds while also providing a clear example of how high throughput sequencing can generate diverse biological insights at multiple levels of organisation.
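Scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled sequence. A minimal sketch computing it from a list of scaffold lengths; the same function gives contig N50 when given contig lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the sequence at which the running total (longest first) reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths supplied")
```

For lengths 2 to 10 (total 54), the running sum of 10, 9 and 8 first reaches 27, so the N50 is 8.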
Article
Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game-changing development for computational science.
Article
The brain of diving mammals tolerates low oxygen conditions better than the brain of most terrestrial mammals. Previously, it has been demonstrated that the neurons in brain slices of the hooded seal (Cystophora cristata) withstand hypoxia longer than those of mouse, and also tolerate reduced glucose supply and high lactate concentrations. This tolerance appears to be accompanied by a shift in the oxidative energy metabolism to the astrocytes in the seal while in terrestrial mammals the aerobic energy production mainly takes place in neurons. Here, we used RNA-Seq to compare the effect of hypoxia and reoxygenation in vitro on brain slices from the visual cortex of hooded seals. We saw no general reduction of gene expression, suggesting that the response to hypoxia and reoxygenation is an actively regulated process. The treatments caused the preferential upregulation of genes related to inflammation, as previously found, e.g., in stroke studies using mammalian models. Gene ontology and KEGG pathway analyses showed a downregulation of genes involved in ion transport and other neuronal processes, indicative of a neuronal shutdown in response to a shortage of O2 supply. These differences may be interpreted in terms of an energy-saving strategy in the seal's brain. We specifically analyzed the regulation of genes involved in energy metabolism. Hypoxia and reoxygenation caused a similar response, with upregulation of genes involved in glucose metabolism and downregulation of the components of the pyruvate dehydrogenase complex. We also observed upregulation of the monocarboxylate transporter Mct4, suggesting increased lactate efflux. Together, these data indicate that the seal brain responds to the hypoxic challenge by a relative increase in the anaerobic energy metabolism.
Article
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly. This paper describes a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on Github at https://github.com/shenwei356/seqkit.
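As a rough illustration of two of the manipulations listed (parsing and sequence-level deduplication), here is a small pure-Python sketch. It is not SeqKit's implementation or interface, although SeqKit offers deduplication through its `rmdup` subcommand; comparing sequences case-insensitively is an assumption made here.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined, blank lines skipped."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines before the first header are ignored
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_sequence(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (case-insensitive comparison)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq
```

Because both functions are generators, records stream through one at a time, so only the set of seen sequences grows with input size.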
Article
Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50: 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, re-analysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modeling.
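The simplest filter suggested above, keeping only SNPs whose probe sequences align uniquely and along their full length, might look like the sketch below. The `ProbeHit` record and its fields are hypothetical stand-ins for whatever an aligner reports, the probe length of 100 in the example is arbitrary, and requiring an exact full-length alignment is a stricter assumption than a real pipeline might use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProbeHit:
    """Hypothetical summary of one probe-to-genome alignment."""
    probe_id: str
    aligned_length: int


def filter_unique_complete(hits: List[ProbeHit], probe_lengths: Dict[str, int]) -> List[str]:
    """Return probes with exactly one genome hit that spans the whole probe."""
    by_probe: Dict[str, List[ProbeHit]] = {}
    for hit in hits:
        by_probe.setdefault(hit.probe_id, []).append(hit)
    keep = []
    for probe_id, length in probe_lengths.items():
        probe_hits = by_probe.get(probe_id, [])
        if len(probe_hits) == 1 and probe_hits[0].aligned_length == length:
            keep.append(probe_id)
    return keep
```

Probes that do not map at all, map more than once, or align only partially (for example across an intron-exon boundary) are all excluded.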
Article
Significance Understanding olfactory communication in natural vertebrate populations requires knowledge of how genes and the environment influence highly complex individual chemical fingerprints. To understand how relevant information is chemically encoded and may feed into mother–offspring recognition, we therefore generated chemical and genetic data for Antarctic fur seal mother–pup pairs. We show that pups are chemically highly similar to their mothers, reflecting a combination of genetic and environmental influences. We also reveal associations between chemical fingerprints and both genetic quality and relatedness, the former correlating positively with substance diversity and the latter encoded mainly by a small subset of substances. Dissecting chemical fingerprints to reveal subsets of potential biological relevance has broad implications for understanding vertebrate chemical communication.
Article
UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists in the database with the majority of sequences coming from genome sequencing projects. We have created a new proteome identifier that uniquely identifies a particular assembly of a species and strain or subspecies to help users track the provenance of sequences. We present a new website that has been designed using a user-experience design process. We have introduced an annotation score for all entries in UniProt to represent the relative amount of knowledge known about each protein. These scores will be helpful in identifying which proteins are the best characterized and most informative for comparative analysis. All UniProt data is provided freely and is available on the web at http://www.uniprot.org/.
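UniProt documents a regular expression for accession numbers that covers both the original 6-character format and the extended 10-character format. The pattern below is reproduced from that documentation from memory, so treat it as an assumption to check against the current UniProt help pages before relying on it.

```python
import re

# UniProtKB accession pattern (assumed to match UniProt's published regular expression).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if the whole string is a syntactically valid 6- or 10-character accession."""
    return UNIPROT_ACCESSION.fullmatch(text) is not None
```

Using `fullmatch` rather than `search` ensures that a valid accession embedded in a longer string is not accepted.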
Article
Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem. These data dramatically increase the scaffold contiguity of assemblies and provide haplotype phasing information. Here, we describe a simpler approach ("Chicago") based on in vitro reconstituted chromatin. We generated two Chicago datasets with human DNA and used a new software pipeline ("HiRise") to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 30 Mb. We also demonstrated the utility of Chicago for improving existing assemblies by re-assembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kb to 10 Mb. Our method uses established molecular biology procedures and can be used to analyze any genome, as it requires only about 5 micrograms of DNA as the starting material.
Article
Marine mammals from different mammalian orders share several phenotypic traits adapted to the aquatic environment and therefore represent a classic example of convergent evolution. To investigate convergent evolution at the genomic level, we sequenced and performed de novo assembly of the genomes of three species of marine mammals (the killer whale, walrus and manatee) from three mammalian orders that share independently evolved phenotypic adaptations to a marine existence. Our comparative genomic analyses found that convergent amino acid substitutions were widespread throughout the genome and that a subset of these substitutions were in genes evolving under positive selection and putatively associated with a marine phenotype. However, we found higher levels of convergent amino acid substitutions in a control set of terrestrial sister taxa to the marine mammals. Our results suggest that, whereas convergent molecular evolution is relatively common, adaptive molecular convergence linked to phenotypic convergence is comparatively rare.
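At its simplest, a convergent amino acid substitution is a site where two independently evolved lineages carry the same residue, and that residue differs from each lineage's own ancestral state. The sketch assumes ancestral sequences are already available (for example from ancestral sequence reconstruction) and, unlike stricter definitions, does not separate parallel from convergent changes; the sequences in the example are invented.

```python
from typing import List


def convergent_sites(seq_a: str, anc_a: str, seq_b: str, anc_b: str) -> List[int]:
    """0-based positions where lineages A and B share a non-gap residue
    that differs from the inferred ancestor of each lineage."""
    sites = []
    for i, (a, pa, b, pb) in enumerate(zip(seq_a, anc_a, seq_b, anc_b)):
        if a == b and a != "-" and a != pa and b != pb:
            sites.append(i)
    return sites
```

Running the same function on terrestrial sister lineages gives the kind of control count that the abstract compares against.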
Article
The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step during de novo genome assembly. Because this process utilizes various mapping techniques that each provides an independent line of evidence, a combination of multiple maps can improve the accuracy of the resulting chromosomal assemblies. We present ALLMAPS, a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps. ALLMAPS is robust against common mapping errors, and generates sequences that are maximally concordant with the input maps. ALLMAPS is a useful tool in building high-quality genome assemblies. ALLMAPS is available at: https://github.com/tanghaibao/jcvi/wiki/ALLMAPS.
Article
Global environmental change is expected to alter selection pressures in many biological systems, but the long-term molecular and life history data required to quantify changes in selection are rare. An unusual opportunity is afforded by three decades of individual-based data collected from a declining population of Antarctic fur seals in the South Atlantic. Here, climate change has reduced prey availability and caused a significant decline in seal birth weight. However, the mean age and size of females recruiting into the breeding population are increasing. We show that such females have significantly higher heterozygosity (a measure of within-individual genetic variation) than their non-recruiting siblings and their own mothers. Thus, breeding female heterozygosity has increased by 8.5% per generation over the last two decades. Nonetheless, as heterozygosity is not inherited from mothers to daughters, substantial heterozygote advantage is not transmitted from one generation to the next and the decreasing viability of homozygous individuals causes the population to decline. Our results provide compelling evidence that selection due to climate change is intensifying, with far-reaching consequences for demography as well as phenotypic and genetic variation.
Article
Sequence alignments are the starting point for most evolutionary and comparative analyses. Full genome sequences can be compared to study patterns of within and between species variation. Genome sequence alignments are complex structures containing information such as coordinates, quality scores and synteny structure, which are stored in Multiple Alignment Format (MAF) files. Processing these alignments therefore involves parsing and manipulating typically large MAF files in an efficient way. MafFilter is a command-line program written in C++ that enables the processing of genome alignments stored in the Multiple Alignment Format in an efficient and extensible manner. It provides an extensive set of tools which can be parametrized and combined by the user via option files. We demonstrate the software's functionality and performance on several biological examples covering primate genomics and fungal population genomics. Example analyses include window-based alignment filtering, feature extractions and various statistics, phylogenetics and population genomics calculations. MafFilter is a highly efficient and flexible tool to analyse multiple genome alignments. By allowing the user to combine a large set of available methods, as well as designing his/her own, it enables the design of custom data filtering and analysis pipelines for genomic studies. MafFilter is an open source software available at http://bioweb.me/maffilter.
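As a small illustration of what block filtering involves, the Python sketch below parses the "a" (block) and "s" (sequence) lines of a MAF file and keeps blocks that meet minimum species-count and alignment-length thresholds. It ignores the other MAF line types, is not MafFilter's interface, and the source names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class MafSeq:
    src: str
    start: int
    size: int
    strand: str
    src_size: int
    text: str  # aligned sequence, including "-" gaps


@dataclass
class MafBlock:
    seqs: List[MafSeq] = field(default_factory=list)

    def alignment_length(self) -> int:
        return len(self.seqs[0].text) if self.seqs else 0


def parse_maf(lines: Iterable[str]) -> Iterator[MafBlock]:
    """Yield alignment blocks; only "a" and "s" lines are interpreted."""
    block: Optional[MafBlock] = None
    for raw in lines:
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            if block is not None:
                yield block
            block = MafBlock()
        elif line.startswith("s ") and block is not None:
            _, src, start, size, strand, src_size, text = line.split()
            block.seqs.append(MafSeq(src, int(start), int(size), strand, int(src_size), text))
        # Header/comment ("#"), blank, and "i"/"e"/"q" lines are skipped in this sketch.
    if block is not None:
        yield block


def filter_blocks(blocks: Iterable[MafBlock], min_species: int, min_length: int) -> Iterator[MafBlock]:
    """Keep blocks with enough sequences and enough alignment columns."""
    for block in blocks:
        if len(block.seqs) >= min_species and block.alignment_length() >= min_length:
            yield block
```

Chaining generators in this way mirrors, in a very reduced form, the idea of combining filters into a pipeline.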
Article
It is now common for population geneticists to estimate FST for a large number of loci across the genome, before testing for selected loci as being outliers to the FST distribution. One surprising result of such FST scans is the often high proportion (>1% and sometimes >10%) of outliers detected, and this is often interpreted as evidence for pervasive local adaptation. In this issue of Molecular Ecology, Fourcade et al. (2013) observe that a particularly high rate of FST outliers has often been found in river organisms, such as fishes or damselflies, despite there being no obvious reason why selection should affect a larger proportion of the genomes of these organisms. Using computer simulations, Fourcade et al. (2013) show that the strong correlation in co-ancestry produced in long one-dimensional landscapes (such as rivers, valleys, peninsulas, oceanic ridges or coastlines) greatly increases the neutral variance in FST, especially when the landscape is further reticulated into fractal networks. As a consequence, outlier tests have a high rate of false positives, unless this correlation can be taken into account. Fourcade et al.'s study highlights an extreme case of the general problem, first noticed by Robertson (1975a,b) and Nei & Maruyama (1975), that correlated co-ancestry inflates the neutral variance in FST when compared to its expectation under an island model of population structure. Similar warnings about the validity of outlier tests have appeared regularly since then but have not been widely cited in the recent genomics literature. We further emphasize that FST outliers can arise in many different ways and that outlier tests are not designed for situations where the genetic architecture of local adaptation involves many loci.
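The commentary does not commit to a particular FST estimator. As one concrete, assumed choice, Hudson's FST for biallelic sites can be written as (p1 − p2)² divided by p1(1 − p2) + p2(1 − p1), and combined across loci as a ratio of sums. The sketch omits sample-size corrections, so it is only a reasonable approximation for large samples.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Ratio-of-sums Hudson FST from allele frequencies in two populations
    (no sample-size correction; an illustrative simplification)."""
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den
```

Calling it with single-element lists gives per-locus values, the kind of distribution that outlier tests compare against a neutral expectation; the point made above is that the neutral spread of that distribution can be much wider than an island model predicts.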
Article
The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO—a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations—evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)—mechanistic models of molecular “pathways” (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.
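GO annotations are the raw material for enrichment analyses such as those described in the fur seal genome study. One common choice is a one-sided hypergeometric test for over-representation of a term among a set of study genes. The sketch below assumes that test, omits the multiple-testing correction any real analysis would need, and uses argument names of our own choosing.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    k: study genes annotated with the term
    n: study genes in total
    K: universe genes annotated with the term
    N: universe genes in total
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For instance, if all 3 study genes carry a term that annotates 3 of 10 universe genes, the probability of that overlap by chance is 1/120.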
Article
With environmental change, understanding how species recover from overharvesting and maintain viable populations is central to ecosystem restoration. Here, we reconstruct 90 years of recovery trajectory of the Antarctic fur seal at South Georgia (S.W. Atlantic), a key indicator species in the krill‐based food webs of the Southern Ocean. After being harvested to commercial extinction by 1907, this population rebounded and now constitutes the most abundant otariid in the world. However, its status remains uncertain due to insufficient and conflicting data, and anthropogenic pressures affecting Antarctic krill, an essential staple for millions of fur seals and other predators. Using integrated population models, we estimated simultaneously the long‐term abundance for Bird Island, northwest South Georgia, epicentre of recovery of the species after sealing, and population adjustments for survey counts with spatiotemporal applicability. Applied to the latest comprehensive survey data, we estimated the population at South Georgia in 2007–2009 as 3,510,283 fur seals [95% CI: 3,140,548–3,919,604] (ca. 98% of global population), after 40 years of maximum growth and range expansion owing to an abundant krill supply. At Bird Island, after 50 years of exponential growth followed by 25 years of slow stable growth, the population collapsed in 2009 and has thereafter declined by −7.2% [−5.2, −9.1] per annum, to levels of the 1970s. For the instrumental record, this trajectory correlates with a time‐varying relationship between coupled climate and sea surface temperature cycles associated with low regional krill availability, although the effects of increasing krill extraction by commercial fishing and natural competitors remain uncertain. Since 2015, fur seal longevity and recruitment have dropped, sexual maturation has been delayed, and population growth is expected to remain mostly negative and highly variable.
Our analysis documents the rise and fall of a key Southern Ocean predator over a century of profound environmental and ecosystem change.
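To put the Bird Island rate in perspective, a constant annual change r compounds as N_t = N_0(1 + r)^t. Holding r fixed at −7.2% is an assumption made purely for illustration, since the authors expect growth to be "mostly negative and highly variable"; under that assumption the population would halve in roughly nine years.

```python
import math


def project(n0: float, annual_change: float, years: float) -> float:
    """Population after `years` of constant proportional change (e.g. -0.072 for -7.2% per year)."""
    return n0 * (1.0 + annual_change) ** years


def years_to_halve(annual_change: float) -> float:
    """Time for a constant negative annual change to halve the population."""
    return math.log(0.5) / math.log(1.0 + annual_change)
```

With these assumptions, 1000 animals would fall to about 474 after ten years, and the halving time is about 9.3 years.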
Article
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
Article
Different parts of the genome can vary widely in their evolutionary histories and sequence divergence from other species. Indeed, some of the most interesting biology (e.g., hybridization, horizontal gene transfer, variable mutation rates across the genome) is revealed by the discordant relationships between taxa across the genome. The goal for much of evolutionary genetics is centred on understanding the evolutionary processes by which such varied signatures arise and are maintained. Many evolutionary genetics studies seek to identify signatures of positive selection between two closely related ecotypes or taxa by delineating regions with particularly high divergence relative to a genome‐wide average, often termed “divergence outliers.” In a From the Cover article in this issue of Molecular Ecology, Booker et al. take a major step forward in showing that recombination rate differences are sufficient to create false positive divergence outliers, even under neutrality. They demonstrate that the variance of genome scan metrics is especially high in regions with low recombination rates, consistent with previous work. Furthermore, they show that both relative and absolute measures of divergence (FST and DXY, respectively) as well as other commonly used statistics in genome scans (e.g., πW, Tajima's D and H12) all have similar covariance between variance and local recombination rate. Finally, Booker et al. show that low recombination regions will tend to produce more outliers if genome‐wide averages are used as cut‐offs to define genomic outliers. Booker et al.’s results suggest that recombination rate variation, even under neutral conditions, can shape genome scans for selection, and this important variable can no longer be ignored.
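The cut-off issue can be made concrete with a toy example. If low-recombination windows show more variable statistics, a single genome-wide quantile flags only those windows, whereas thresholds computed within recombination-rate bins spread the flags across bins. Binning is one simple alternative chosen here for illustration, not a method attributed to Booker et al.; the window values and bin labels are invented.

```python
from typing import Dict, List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def outliers_genome_wide(stats: Sequence[float], q: float) -> List[int]:
    """Flag windows above a single genome-wide quantile."""
    threshold = quantile(stats, q)
    return [i for i, s in enumerate(stats) if s > threshold]


def outliers_by_bin(stats: Sequence[float], bins: Sequence[str], q: float) -> List[int]:
    """Flag windows above the quantile of their own recombination-rate bin."""
    groups: Dict[str, List[int]] = {}
    for i, b in enumerate(bins):
        groups.setdefault(b, []).append(i)
    flagged: List[int] = []
    for idx in groups.values():
        threshold = quantile([stats[i] for i in idx], q)
        flagged.extend(i for i in idx if stats[i] > threshold)
    return sorted(flagged)
```

With four high-recombination and four more variable low-recombination windows, the genome-wide 75th percentile flags two low-recombination windows, while per-bin thresholds flag one window from each bin.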
Article
Sexual reproduction involves a cascade of molecular interactions between the sperm and the egg culminating in cell–cell fusion. Vital steps mediating fertilization include chemoattraction of the sperm to the egg, induction of the sperm acrosome reaction, dissolution of the egg coat, and sperm–egg plasma membrane binding and fusion. Despite decades of research, only a handful of interacting gamete recognition proteins (GRPs) have been identified across taxa mediating each of these steps, most notably in abalone, sea urchins, and mammals. This review outlines and compares notable GRP pairs mediating sperm–egg recognition in these three significant model systems and discusses the molecular basis of species‐specific fertilization driven by GRP function. In addition, we explore the evolutionary theory behind the rapid diversification of GRPs between species. In particular, we focus on how the coevolution between interacting sperm and egg proteins may contribute to the formation of boundaries to hybridization. Finally, we discuss how pairing structural information with evolutionary insights can improve our understanding of mechanisms of fertilization and their origins.
Highlights
• Comparisons of fertilization mechanisms between abalone, sea urchins, and mammals can lead to new insights into sperm‐egg recognition.
• Interacting sperm and egg proteins undergo rapid diversification, potentially driven by their coevolution.
Article
Recombination is a central biological process with implications for many areas in the life sciences. Yet we are only beginning to appreciate variation in the recombination rate along the genome and among individuals, populations and species. Spurred by technological advances, we are now able to bring variation in this key biological parameter to centre stage. Here, we review the conceptual implications of recombination rate variation and guide the reader through the assumptions, strengths and weaknesses of genomic inference methods, including population-based, pedigree-based and gamete-based approaches. Appreciation of the differences and commonalities of these approaches is a prerequisite to formulate a unifying and comparative framework for understanding the molecular and evolutionary mechanisms shaping, and being shaped by, recombination.
Article
Recent genetic studies of natural populations have shown that heterozygosity and other genetic estimates of parental relatedness correlate with a wide variety of fitness traits, from juvenile survival and parasite resistance to male reproductive success. Many of these traits involve health and survival, where the underlying mechanism may involve changes in the effectiveness of the immune system. However, for traits such as reproductive success, the likely mechanisms remain less obvious. In this paper, we examine the relationship between heterozygosity and a range of traits that contribute to male reproductive success, including time spent on territories and competitiveness. Our analysis is based on observational and genetic data from eight consecutive breeding seasons at a colony of the Antarctic fur seal, Arctocephalus gazella. Overall, male reproductive success was found to correlate strongly with internal relatedness (IR, a form of heterozygosity). When different components of success were analyzed, we found that IR correlates independently with reproductive longevity, time spent ashore, and competitive ability per unit mating opportunity on the study beach, with more heterozygous males being more successful. Behavioral observations were sufficiently detailed to allow examination of how daily mean IR values for males present on the beach varied within seasons and from year to year. Again, significant variation was found both among and within seasons, with more homozygous males appearing less able to hold territories in poor seasons when pup production is low and, within a season, at both the start of the season and to some extent around the peak of female estrus. Finally, we tested whether the benefits of high heterozygosity are due mainly to a genome-wide effect (e.g. inbreeding depression) or to single locus heterosis by asking whether the relationship between IR and male success was robust to the removal of any single locus or to any pair of loci.
Since the relationship remained significant in all cases, we favor a multilocus explanation for the effects we report.
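Internal relatedness weights homozygosity for rare alleles more heavily than homozygosity for common ones. The sketch follows the widely used definition IR = (2H − Σfᵢ)/(2N − Σfᵢ), where H is the number of homozygous loci, N the number of loci and fᵢ the population frequency of each allele carried in the individual's genotype (Amos et al. 2001). It assumes complete genotypes and precomputed allele frequencies; the example data are invented.

```python
from typing import Dict, List, Tuple


def internal_relatedness(genotype: List[Tuple[str, str]], freqs: List[Dict[str, float]]) -> float:
    """IR for one individual; freqs[j] maps each allele at locus j to its population frequency."""
    homozygous = 0
    sum_f = 0.0
    for (a, b), locus_freqs in zip(genotype, freqs):
        if a == b:
            homozygous += 1
        sum_f += locus_freqs[a] + locus_freqs[b]  # frequency of each of the two alleles carried
    n_loci = len(genotype)
    return (2 * homozygous - sum_f) / (2 * n_loci - sum_f)
```

Higher IR indicates a more homozygous individual, so the finding that more heterozygous males were more successful corresponds to lower IR among successful males.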
Article
Many vertebrates are challenged by either chronic or acute episodes of low oxygen availability in their natural environments. Brain function is especially vulnerable to the effects of hypoxia and can be irreversibly impaired by even brief periods of low oxygen supply. This review describes recent research on physiological mechanisms that have evolved in certain vertebrate species to cope with brain hypoxia. Four model systems are considered: freshwater turtles that can survive for months trapped in frozen-over lakes, arctic ground squirrels that respire at extremely low rates during winter hibernation, seals and whales that undertake breath-hold dives lasting minutes to hours, and naked mole-rats that live in crowded burrows completely underground for their entire lives. These species exhibit remarkable specializations of brain physiology that adapt them for acute or chronic episodes of hypoxia. These specializations may be reactive in nature, involving modifications to the catastrophic sequelae of oxygen deprivation that occur in non-tolerant species, or preparatory in nature, preventing the activation of those sequelae altogether. Better understanding of the mechanisms used by these hypoxia-tolerant vertebrates will increase appreciation of how nervous systems are adapted for life in specific ecological niches as well as inform advances in therapy for neurological conditions such as stroke and epilepsy.