viruses
Article
A Comparison of Whole Genome Sequencing of
SARS-CoV-2 Using Amplicon-Based Sequencing,
Random Hexamers, and Bait Capture
Jalees A. Nasir 1,2,†, Robert A. Kozak 3,†, Patryk Aftanas 3, Amogelang R. Raphenya 1,2, Kendrick M. Smith 4, Finlay Maguire 5, Hassaan Maan 6, Muhannad Alruwaili 7, Arinjay Banerjee 1,8,9, Hamza Mbareche 3,10, Brian P. Alcock 1,2, Natalie C. Knox 11,12, Karen Mossman 1,8,9, Bo Wang 6,13,14, Julian A. Hiscox 7, Andrew G. McArthur 1,2,* and Samira Mubareka 3,10

1 Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada; nasirja@mcmaster.ca (J.A.N.); raphenar@mcmaster.ca (A.R.R.); banera9@mcmaster.ca (A.B.); alcockbp@mcmaster.ca (B.P.A.); mossk@mcmaster.ca (K.M.)
2 Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
3 Division of Microbiology, Department of Laboratory Medicine and Molecular Diagnostics, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada; rob.kozak@sunnybrook.ca (R.A.K.); patryk.aftanas@sri.utoronto.ca (P.A.); hamza.mbareche@sri.utoronto.ca (H.M.); Samira.Mubareka@sunnybrook.ca (S.M.)
4 Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5, Canada; kmsmith@perimeterinstitute.ca
5 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada; finlaymaguire@gmail.com
6 Peter Munk Cardiac Centre, University Health Network, Toronto, ON M5G 2N2, Canada; hmaan@uoguelph.ca (H.M.); bowang@vectorinstitute.ai (B.W.)
7 Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool L69 3BX, UK; Muhannad.Alruwaili@liverpool.ac.uk (M.A.); Julian.Hiscox@liverpool.ac.uk (J.A.H.)
8 Department of Pathology and Molecular Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
9 McMaster Immunology Research Centre, McMaster University, Hamilton, ON L8S 4K1, Canada
10 Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON M5S 1A1, Canada
11 National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3E 3M4, Canada; natalie.knox@canada.ca
12 Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
13 Department of Medical Biophysics, University of Toronto, Toronto, ON M5S 1A1, Canada
14 Vector Institute for Artificial Intelligence, Toronto, ON M5G 1M1, Canada
* Correspondence: mcarthua@mcmaster.ca
† These authors contributed equally to this work.
Received: 31 July 2020; Accepted: 12 August 2020; Published: 15 August 2020


Viruses 2020, 12, 895; doi:10.3390/v12080895; www.mdpi.com/journal/viruses

Abstract: Genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is increasingly important to monitor the transmission and adaptive evolution of the virus. The accessibility of high-throughput methods and polymerase chain reaction (PCR) has facilitated a growing ecosystem of protocols. Two differing protocols are tiling multiplex PCR and bait capture enrichment. Each method has advantages and disadvantages, but a direct comparison with different viral RNA concentrations has not been performed to assess the performance of these approaches. Here we compare Liverpool amplification, ARTIC amplification, and bait capture using clinical diagnostics samples. All libraries were sequenced using an Illumina MiniSeq with data analyzed using a standardized bioinformatics workflow (SARS-CoV-2 Illumina GeNome Assembly Line; SIGNAL). One sample showed poor SARS-CoV-2 genome coverage and consensus, reflective of low viral RNA concentration. In contrast, the second sample had a higher viral RNA concentration, which yielded good genome coverage and consensus. ARTIC amplification showed the highest depth of coverage results for both samples, suggesting this protocol is effective for low concentrations. Liverpool amplification provided a more even read coverage of the SARS-CoV-2 genome, but at a lower depth of coverage. Bait capture enrichment of SARS-CoV-2 cDNA provided results on par with amplification. While only two clinical samples were examined in this comparative analysis, both the Liverpool and ARTIC amplification methods showed differing efficacy for high and low concentration samples. In addition, amplification-free bait capture enriched sequencing of cDNA is a viable method for generating a SARS-CoV-2 genome sequence and for identification of amplification artifacts.

Keywords: SARS-CoV-2; genome sequencing; bait capture; amplicon sequencing
1. Introduction
The ongoing pandemic of COVID-19 has infected over 20 million people globally, of which over 750,000 have died (as of 13 August 2020) [1]. COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel coronavirus, which emerged in December 2019 [2]. As with any outbreak of a novel pathogen, diagnostics are critical to assess infection in humans and to monitor the extent of the spread of the pathogen. Critical components of outbreak analysis and pathogen identification are second-generation high-throughput short-read sequencing and third-generation long-read sequencing [3,4]. For COVID-19, the rapid development of diagnostic polymerase chain reaction (PCR) was facilitated by the availability of genome sequences of SARS-CoV-2 isolates [4,5]. In addition, sequencing enables continuous monitoring of circulating strains of the virus to determine any adaptive changes that the virus may accumulate, which may affect its detection, transmissibility, and pathogenicity [6]. Sequencing will also serve an important function as antiviral and vaccine trials roll out, identifying antiviral resistance determinants and vaccine escape mutants, and is essential for detecting viral recombination.

For reliable determination of genomic sequences, it is important to have high quality starting genetic material, such as RNA from cultured SARS-CoV-2. Patient samples, such as mid-turbinate swabs, may contain other viruses including seasonal coronaviruses and are also dominated by host genetic material and resident respiratory flora. It is thus imperative to evaluate the performance of genomic amplification and sequencing protocols needed to enhance the derivation of SARS-CoV-2 specific genomic data.

Two methods have been widely adopted to obtain SARS-CoV-2 genome sequences from patient samples: (1) the use of SARS-CoV-2 specific PCR primers (tiling multiplex PCR) [7] and (2) the use of bait capture to enrich the SARS-CoV-2 genomic material [8–10]. These processes have their own advantages and disadvantages. Tiling multiplex PCR allows for the amplification of numerous viral amplicons but can introduce synthetic artifacts with subsequent cycles. Moreover, divergence from PCR primer sequences can result in suboptimal binding, resulting in lost information on genetic diversity or off-target hybridization. Alternatively, bait capture enriches viral RNA by reducing the quantity of non-viral nucleotides, subsequently shrinking the total sequencing volume of the sample. However, the generation of optimal baits requires prior knowledge of the target virus, which is limited in the response against a novel virus.

The primary objective of this analysis is to compare genome sequencing results from direct amplification of the SARS-CoV-2 genome (i.e., the Liverpool or ARTIC PCR protocols) [7] with bait capture enrichment from COVID-19 patient swabs with markedly different viral RNA concentrations. Secondarily, we perform a genomic analysis for (a) genetic relatedness and (b) diagnostic PCR primer mismatch.
2. Methods
2.1. Clinical Isolates
Material from mid-turbinate swabs was collected from patients returning from travel during the last week of January and the last week of February 2020. One patient was hospitalized [11] and the other was managed as an outpatient with a less severe disease; both recovered. Diagnostic testing [12] was performed at Public Health Ontario and the results were confirmed at the National Microbiology Laboratory, Winnipeg, Manitoba. This work was approved by the Sunnybrook Institute Research Ethics Board (amendment to 149–1994, 2 March 2020).
2.2. Genome Sequencing
Total nucleic acid was extracted from each mid-turbinate swab using the QIAamp Viral RNA Mini kit (Qiagen, Hilden, Germany) without the addition of the carrier RNA. dsDNA for sequencing library preparation was synthesized using either the Liverpool SARS-CoV-2 amplification protocol [7], the ARTIC SARS-CoV-2 amplification protocol (as described in https://artic.network/ncov-2019) [7], or random priming using the Maxima H Minus Double Stranded cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA, USA) with 2.5 µM random hexamers following the manufacturer’s protocol. For the latter, in a PCR tube 1 µL of Random Primer Mix (ProtoScript II First Strand cDNA Synthesis Kit, NEB, Ipswich, MA, USA) was added to 7 µL extracted RNA and denatured on a SimpliAmp thermal cycler (Thermo Fisher Scientific, Waltham, MA, USA) at 65 °C for 5 min and then incubated on ice. Ten µL 2X ProtoScript II Reaction Mix and 2 µL 10X ProtoScript II Enzyme Mix were then added to the denatured sample and cDNA synthesis was performed using the following conditions: 25 °C for 5 min, 48 °C for 15 min, and 80 °C for 5 min.
For the Liverpool protocol, primer sequences designed to overlap and amplify the entire SARS-CoV-2 genome in two 15-plex reactions were generously shared by Public Health England. Two 100 µM primer pools were prepared by combining primer pairs in an alternating fashion to prevent amplification of overlapping regions in a single reaction. After cDNA synthesis, in a new PCR tube 2.5 µL cDNA was combined with 12.5 µL Q5 High-Fidelity 2X Master Mix (NEB, Ipswich, MA, USA), 8.9 µL nuclease-free water (Thermo Fisher Scientific, Waltham, MA, USA), and 1.1 µL of 100 µM primer pool #1 or #2. PCR cycling was then performed as follows: 98 °C for 30 sec followed by 40 cycles of 98 °C for 15 sec and 65 °C for 5 min.
For the ARTIC protocol, 1 µL Random Primer Mix (ProtoScript II First Strand cDNA Synthesis Kit, NEB, Ipswich, MA, USA) and 1 µL 10 mM dNTP mix (NEB, Ipswich, MA, USA) were added to 8 µL extracted RNA and denatured on a SimpliAmp thermal cycler (Thermo Fisher Scientific, Waltham, MA, USA) at 65 °C for 5 min and then incubated on ice. 12.5 µL 2X ProtoScript II Reaction Mix and 2.5 µL 10X ProtoScript II Enzyme Mix were then added to the denatured sample and cDNA synthesis was performed using the following conditions: 25 °C for 5 min, 42 °C for 50 min, and 80 °C for 5 min. After cDNA synthesis, in a new PCR tube 2.5 µL cDNA was combined with 12.5 µL Q5 High-Fidelity 2X Master Mix (NEB, Ipswich, MA, USA). For the pool #1 reaction, 5.87 µL nuclease-free water (Thermo Fisher Scientific, Waltham, MA, USA) and 4.13 µL of 10 µM ARTIC version 3 primer pool #1 were added. For the pool #2 reaction, 5.95 µL nuclease-free water (Thermo Fisher Scientific, Waltham, MA, USA) and 4.05 µL of 10 µM ARTIC version 3 primer pool #2 were added. PCR cycling was then performed as follows: 98 °C for 30 sec followed by 35 cycles of 98 °C for 15 sec and 65 °C for 5 min.
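
As a quick check on the PCR set-up volumes given for the Liverpool and ARTIC reactions, the short Python sketch below sums each reaction and applies C1V1 = C2V2 to estimate the final concentration of the pooled primers. The volumes and stock concentrations are taken from the text; the dictionary keys and function names are illustrative only, and the result is the concentration of the whole primer pool, not of any individual primer.

```python
# Reaction components in µL, copied from the protocol text.
# "stock_uM" is the concentration of the primer pool stock (µM).
# Key and function names are illustrative, not part of any published protocol.
REACTIONS = {
    "Liverpool pool #1 or #2": {
        "cdna": 2.5, "master_mix": 12.5, "water": 8.9, "primers": 1.1, "stock_uM": 100.0,
    },
    "ARTIC pool #1": {
        "cdna": 2.5, "master_mix": 12.5, "water": 5.87, "primers": 4.13, "stock_uM": 10.0,
    },
    "ARTIC pool #2": {
        "cdna": 2.5, "master_mix": 12.5, "water": 5.95, "primers": 4.05, "stock_uM": 10.0,
    },
}


def total_volume_uL(reaction):
    """Sum of all liquid components in the PCR tube (µL)."""
    return (reaction["cdna"] + reaction["master_mix"]
            + reaction["water"] + reaction["primers"])


def final_pool_concentration_uM(reaction):
    """C2 = C1 * V1 / V2 for the pooled primers in the final reaction."""
    return reaction["stock_uM"] * reaction["primers"] / total_volume_uL(reaction)


if __name__ == "__main__":
    for name, reaction in REACTIONS.items():
        print(f"{name}: {total_volume_uL(reaction):.2f} µL total, "
              f"{final_pool_concentration_uM(reaction):.2f} µM primer pool")
```

Under these numbers each reaction comes to 25 µL, with a final pooled-primer concentration of about 4.4 µM for the Liverpool reactions and roughly 1.6 to 1.7 µM for the two ARTIC reactions.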
cDNA synthesis (hexamers only) and PCR reactions (Liverpool amplicons) were purified using RNAClean XP (Beckman Coulter, Brea, CA, USA) at 1.8x bead to amplicon ratio and eluted in 30 µL. Combined ARTIC amplicons were purified at 1.0x bead to amplicon ratio and eluted in 30 µL. Two µL of amplified material was quantified using a Qubit 1X dsDNA HS (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer’s instructions. Illumina sequencing libraries were prepared using Nextera DNA Flex Library Prep Kit and Nextera DNA CD Indexes (Illumina, San Diego, CA, USA) according to manufacturer’s instructions. For both Liverpool and random hexamer cDNA libraries (but not ARTIC libraries), half of the prepared libraries were enriched for SARS-CoV-2 using the myBaits Expert Virus SARS-CoV-2 panel (Arbor Biosciences, Ann Arbor, MI, USA) following the manufacturer’s protocol with a 20 h hybridization time at 65 °C and KAPA HiFi HotStart ReadyMix (Roche, Basel, Switzerland) for post-enrichment library amplification, while the other half of each library was sequenced without enrichment. Paired-end 150 bp sequencing was performed for each library on a MiniSeq with the 300-cycle mid-output reagent kit (Illumina, San Diego, CA, USA), multiplexed with targeted generation of ~40,000 clusters per library. A negative control library with no input SARS-CoV-2 RNA extract was included using ARTIC amplification.
2.3. Genome Assembly
We developed a complete standardized workflow for the assembly and subsequent analysis of short-read sequencing data, released as the SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL). For the Liverpool and ARTIC amplification-based libraries, sequencing read pools were combined (as R1 and R2) where needed (i.e., Liverpool amplicons). Illumina adapter sequences were removed and low quality sequences trimmed or removed using Trimmomatic (version 0.36) [13], and amplification primer sequences were then removed where needed (i.e., Liverpool and ARTIC amplicons) using cutadapt (version 1.18) [14]. Final sequence quality and adapter/primer trimming were confirmed by FASTQC (version 0.11.5) [15]. The percentage of reads derived from SARS-CoV-2 RNA for each library was determined using Kraken2 (version 2.0.8-beta; using RefSeq complete viral genomes/proteins) [16]. All non-SARS-CoV-2 reads were removed by parsing HiSAT2 (version 2.1.0) [17] alignments, coverage was normalized (samtools mpileup depth of 100,000), and the genome sequence was predicted by iVar variant detection (version 1.2, consensus minimum depth = 10) [18]. From these results, assembly statistics were generated by QUAST (version 5.0.2) [19] and depths of coverage were assessed by HiSAT2 (version 2.1.0) alignment of the sequencing reads against the predicted genome sequence [17]. Lastly, sequence variation or coverage gaps in the reads were assessed by BreSeq (version 0.35.0) analysis relative to GenBank entry MN908947.3 (the first genome sequence reported from the original Wuhan outbreak, China) [20]. Separately, sequencing reads were assessed against GenBank entry MN908947.3 using HiSAT2 (version 2.1.0) and visualized using the Integrative Genomics Viewer [21].
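
To illustrate what a consensus minimum depth of 10 means in practice, the following Python sketch masks any position covered by fewer than 10 reads and otherwise reports the most frequent base. It is a deliberately simplified stand-in, not iVar's algorithm or the SIGNAL code: the input format (a list of per-position base counts) and the function name are assumptions made for illustration, and real consensus callers also handle indels, base quality, and frequency thresholds.

```python
def call_consensus(pileup, min_depth=10):
    """Return a consensus string from per-position base counts.

    pileup    : list of dicts mapping base -> read count (hypothetical input format)
    min_depth : positions with fewer supporting reads are reported as 'N'
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            # Too few reads to trust a call at this position.
            consensus.append("N")
        else:
            # Most frequent base; ties resolve to the first base listed.
            best_base = max(counts, key=counts.get)
            consensus.append(best_base)
    return "".join(consensus)


if __name__ == "__main__":
    example = [
        {"A": 12},            # depth 12 -> A
        {"C": 3, "T": 2},     # depth 5  -> masked
        {"G": 9, "T": 1},     # depth 10 -> G
        {"T": 40, "C": 2},    # depth 42 -> T
    ]
    print(call_consensus(example))
```

A threshold of this kind is why low-titre libraries can yield a low genome fraction even when some reads align: positions below the depth cut-off are masked rather than called.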
2.4. Assessment of Clinical Diagnostic PCR Primers
Clinical diagnostic amplification PCR primer sequences were designed in house using Geneious v9.0 (https://www.geneious.com) or collated from literature [22–26] and the World Health Organization website [27]. They were mapped to the MN908947.3 SARS-CoV-2 genome sequence and added as an additional reference for BreSeq analysis of the sequencing reads, highlighting any mismatches at PCR priming sites.
2.5. Molecular Epidemiology Analysis
To confirm the epidemiological origin of both isolates, the best genome sequence of each was included in a uniform manifold approximation and projection (UMAP) involving the aligned genomes of 8074 SARS-CoV-2 isolates (obtained from Global Initiative on Sharing All Influenza Data, GISAID, https://www.gisaid.org) labelled by country of origin [28]. For UMAP, the approximate genomic differences were estimated using DNA distance determined by the Kimura-80 model of DNA evolution [29], after the removal of the first 55 and last 260 bp of the alignment. The same alignment was used to generate a phylogenetic tree using a RAxML-HPC BlackBox at the CIPRES Science Gateway with GTRGAMMA+I among site rate variation [30]. Both analyses excluded predicted homoplastic sites within the alignment [31].
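
For readers unfamiliar with the Kimura-80 (two-parameter) distance, a minimal Python sketch is given below. It computes d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. This is not the CovidGenotyper implementation; the function names are hypothetical, sites with gaps or ambiguous bases are simply skipped (an assumption), and the trimming helper only mirrors the 55 bp and 260 bp end removal described in the text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def trim_alignment_ends(seq, head=55, tail=260):
    """Drop the first `head` and last `tail` alignment columns."""
    return seq[head:len(seq) - tail]


def k80_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # One transition (A->G) in ten sites.
    print(round(k80_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

Because SARS-CoV-2 genomes sampled in early 2020 differ at only a handful of sites, distances of this kind are very small and close to the raw proportion of differing sites.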
2.6. Data & Software Availability
The SIGNAL workflow is available at https://github.com/jaleezyy/covid-19-signal. Custom software for uniform manifold approximation and projection (UMAP) is available at https://github.com/hsmaan/CovidGenotyper [32]. FASTQ sequences and assembly FASTA have been deposited in NCBI Bioproject PRJNA636446, with assembly FASTA sequences additionally submitted to GISAID (Wuhan-derived: EPI_ISL_413015 as submitted previously, Iran-derived: EPI_ISL_450747). Only sequencing reads that aligned by HiSAT2 (version 2.1.0) to the SARS-CoV-2 MN908947.3 genome were included in the deposited sequence files to avoid the release of sequences derived from patient DNA.
3. Results
3.1. Clinical Isolates
Two original clinical diagnostic samples from travelers returning to Canada were used for
genome sequencing; one from Wuhan, China (“Wuhan-derived”) and one from Iran (“Iran-derived”).
Sample RNA, for the generation of cDNA libraries, was extracted from mid-turbinate swabs that were
transported in a universal transport medium. The Wuhan-derived sample had a diagnostic qPCR cycle
threshold (Ct) value of 31.05 for the envelope (E) gene targets. The Iran-derived sample had Ct values
of 18.8 and 20.9 for the RNA-dependent RNA polymerase (RdRp) and E gene targets, respectively.
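
To give a rough sense of scale for the difference between the two samples, the Python sketch below converts a Ct difference into an approximate fold difference in template. It assumes perfect doubling per cycle (100% PCR efficiency) and that the two E gene Ct values were obtained under comparable assay conditions; neither assumption is stated in the text, Ct values are only semi-quantitative, and the function name is illustrative.

```python
def fold_difference_from_ct(ct_low_load, ct_high_load, efficiency=1.0):
    """Approximate fold difference in template implied by two Ct values.

    efficiency = 1.0 means the product doubles every cycle (an idealization).
    """
    delta_ct = ct_low_load - ct_high_load
    return (1.0 + efficiency) ** delta_ct


if __name__ == "__main__":
    # E gene Ct values reported in the text: 31.05 (Wuhan-derived), 20.9 (Iran-derived).
    fold = fold_difference_from_ct(31.05, 20.9)
    print(f"Approximate fold difference: {fold:.0f}")
```

Under these idealized assumptions the roughly 10-cycle gap corresponds to about a thousand-fold difference in E gene template between the two swabs.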
3.2. Genome Sequencing and Assembly
The number of paired reads and percentages of those reads that were derived from SARS-CoV-2
genetic material varied widely between library preparation protocols (Figure 1). In the Wuhan-derived
sample, the majority of read data was from the patient genome and therefore resulted in poor
SARS-CoV-2 genome coverage and consensus, potentially due to the higher Ct value of the initial
sample (i.e., less abundant or fragmented SARS-CoV-2 RNA). By contrast, sequencing data from the
Iran-derived isolate consisted predominantly of SARS-CoV-2 molecules and produced a high coverage
genome consensus (Table 1). However, ARTIC amplification led to superior results for both the Wuhan-
and Iran-derived samples (Table 1), strongly suggesting that the ARTIC protocol would be best for
samples with lower viral loads. On examining the sequencing results of the Iran-derived sample more
closely, we observed that the Liverpool amplification produced successful results with or without
subsequent bait capture enrichment, while cDNA synthesis using random hexamers led to lower
relative sampling of SARS-CoV-2 molecules in the sequencing library and poor genome coverage.
However, bait capture enriched SARS-CoV-2 cDNA molecules in the sample, producing genome
consensus and coverage on par with the Liverpool amplification approaches. None of the sequencing protocols resolved the terminal 5′ and 3′ nucleotide sequences of the genomes, which was consistent with other publicly available sequences (Table 2).
Examination of read coverage against the first genome sequence reported from the original Wuhan (China) outbreak (GenBank MN908947.3) revealed that despite the better performance of the ARTIC protocol for the Wuhan-derived sample, coverage was highly variable across the genome (Figure 2), with ~10% of locations having less than 100x coverage, ~35% having 100–1000x coverage, and ~53% having >1000x coverage. Amplification of the Wuhan-derived sample using Liverpool primers was limited to a few regions of the genome with 0–100x coverage. This is in contrast to the Iran-derived sample, which had >1000x coverage across >95% of the genome for all methods except the direct sequencing of cDNA (for which ~99% of the genome still had 101–1000x coverage). On average, bait capture enriched the Liverpool amplifications by 1.2-fold and the direct cDNA samples by 19.6-fold, respectively. Although we did not perform secondary enrichment of ARTIC amplification products, these results illustrate that secondary enrichment is not important for PCR amplicons, but valuable for direct sequencing of cDNA. Notably, while ARTIC amplification led to the best overall results for the Iran-derived sample, read alignment revealed several regions with low read coverage (Figure 2), including a 319 bp coverage gap within the orf1ab gene (Table 2). This region falls within ARTICv3’s amplicon 64 that has been widely reported to generate little to no sequence coverage [33]. In contrast, the Liverpool amplification produced a more even read coverage across the genome (Figure 2).
Figure 1. Plot showing the percent of sequencing reads mapping to the SARS-CoV-2 reference genome against the total number of paired reads acquired from each library preparation. Each data point is additionally labelled with a percent fraction and average read coverage of the SARS-CoV-2 genome.
Figure 2. Mapping and semi-log depth of coverage of trimmed sequencing reads for each library preparation against the first Wuhan SARS-CoV-2 genome sequence (NCBI accession: MN908947.3). Y-axis dimensions vary among samples (maximum indicated beside label) and colored positions reflect frequency of SNPs relative to the MN908947.3 genome among the reads (green = A, blue = C, orange = G, red = T). The plus (+) symbol indicates secondary bait capture enrichment. SARS-CoV-2 genome length and organization is highlighted on top.
Table 1. Sequencing read and genome assembly statistics including the total raw read pairs obtained and fraction captured from SARS-CoV-2 RNA, the fraction of 29,903 bp MN908947.3 genome sequence covered, depth of coverage, and number of variants detected relative to MN908947.3.

Sample | Amplification | Enrichment | Number of Paired Reads | Reads from SARS-CoV-2 (%) | SARS-CoV-2 Genome Fraction (%) | Average Depth of Coverage | 0–100x Coverage (%) | 101–1000x Coverage (%) | >1000x Coverage (%) | # iVar Variants
Negative | ARTIC | No | 938,693 | 0.01 | 0 | 4.1x | 99.2 | 0.8 | 0.1 | n/a
Wuhan | Liverpool | No | 883,212 | 0.52 | 19.587 | 37.9x | 93.88 | 6.08 | 0.04 | 1
Wuhan | Liverpool | Yes | 22,119 | 58.73 | 20.811 | 98.6x | 89.6 | 6.8 | 3.6 | 1
Wuhan | Hexamers | No | 585,396 | 0.01 | 0 | 0.3x | 99.9 | 0.1 | 0.00 | n/a
Wuhan | Hexamers | Yes | 1536 | 1.56 | 0 | n/a | n/a | n/a | n/a | n/a
Wuhan | ARTIC | No | 2,271,152 | 73.86 | 59.104 | 15,604.0x | 10.6 | 35.5 | 53.9 | 5
Iran | Liverpool | No | 813,975 | 90.13 | 98.53 | 6528.3x | 1.2 | 3.1 | 95.6 | 6
Iran | Liverpool | Yes | 901,124 | 89.76 | 98.54 | 8214.4x | 0.7 | 0.2 | 99.1 | 6
Iran | Hexamers | No | 1,091,011 | 2.77 | 99.89 | 215.3x | 0.43 | 99.56 | 0.00 | 7
Iran | Hexamers | Yes | 619,661 | 89.17 | 99.83 | 4383.9x | 0.2 | 0.3 | 99.6 | 7
Iran | ARTIC | No | 1,935,748 | 88.25 | 99.31 | 14,032.7x | 0.2 | 1.7 | 98.1 | 7
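
The three coverage-bin columns in Table 1 can be thought of as a simple tally over per-position read depth. The Python sketch below shows one way such percentages might be computed from a list of depths (for example, parsed from `samtools depth` output). The bin edges follow the table headers; the function name and input format are assumptions for illustration, not part of SIGNAL.

```python
def coverage_bins(depths):
    """Percentage of genome positions in each depth-of-coverage bin.

    depths : per-position read depths, one value per reference position
    Bins follow the Table 1 headers: 0-100x, 101-1000x, >1000x.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    low = sum(1 for d in depths if d <= 100)
    mid = sum(1 for d in depths if 100 < d <= 1000)
    high = n - low - mid
    return {
        "0-100x": 100.0 * low / n,
        "101-1000x": 100.0 * mid / n,
        ">1000x": 100.0 * high / n,
    }


if __name__ == "__main__":
    toy_depths = [0, 50, 100, 101, 1000, 1001, 5000, 20000]
    print(coverage_bins(toy_depths))
```

Reporting the bins alongside the average depth is useful because a very high mean (as for the Wuhan-derived ARTIC library) can coexist with a sizeable fraction of poorly covered positions.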
Viruses 2020,12, 895 8 of 13
Table 2.
Predicted mutations relative to the MN908947.3 SARS-CoV-2 genome for each library for the high titre Iran-derived sample identified by BreSeq analysis of
sequencing reads. Mutations within codons are underlined. All mutations were predicted by 100% of sequencing reads mapping to that position unless otherwise
noted. Mutations in bold existed in the final iVar-called genome sequence, while those in italics exist in the final iVar-called genome sequence but were obscured by
deletion predictions in the minority reads for BreSeq.
Mutation | Liverpool Alone | Liverpool + Enrichment | Hexamers Alone | Hexamers + Enrichment | ARTIC Amplification | Clinical Diagnostic Primer Mismatch
Unresolved 5′ sequence | 259 bp | 258 bp | 40 bp | 0 bp | 49 bp
Unresolved 3′ sequence | 200 bp | 190 bp | 77 bp | 139 bp | 67 bp
pos. 835 (orf1ab polyprotein) | F190F (TTC→TTT) | F190F (TTC→TTT) | F190F (TTC→TTT) | F190F (TTC→TTT) | F190F (TTC→TTT) | NIID_WH-1_R854
pos. 884 (orf1ab polyprotein) | R207C (CGT→TGT) | R207C (CGT→TGT) | R207C (CGT→TGT) | R207C (CGT→TGT) | R207C (CGT→TGT) | NIID_WH-1_R913
pos. 1397 (orf1ab polyprotein) | V378I (GTA→ATA) | V378I (GTA→ATA) | V378I (GTA→ATA) | V378I (GTA→ATA) | V378I (GTA→ATA)
pos. 8653 (orf1ab polyprotein) | M2796I (ATG→ATT) | M2796I (ATG→ATT) | M2796I (ATG→ATT) | M2796I (ATG→ATT) | M2796I (ATG→ATT) | Spike_F1
pos. 9502 (orf1ab polyprotein): 5.0% of reads suggest A3079A (GCC→GCT); Spike_F1
pos. 11,074 (orf1ab polyprotein): 11.8% of reads suggest a deletion between positions 10,809 and 13,203; 11.8% of reads suggest a deletion between positions 10,809 and 13,203; 10.9% of reads suggest a deletion between positions 10,809 and 13,203; Spike_F1
pos. 11,082 (orf1ab polyprotein): 18.1% of reads suggest a deletion between positions 10,817 and 10,819; 22.8% of reads suggest a deletion between positions 10,817 and 10,819; Spike_F1
pos. 11,083 (orf1ab polyprotein) | L3606F (TTG→TTT) | L3606F (TTG→TTT) | L3606F (TTG→TTT) | L3606F (TTG→TTT) | L3606F (TTG→TTT) | Spike_F1
pos. 19,285–19,603 (orf1ab polyprotein): 319 bp coverage gap (no aligned reads), amplicon 64
pos. 27,156 (membrane glycoprotein): 5.3% of reads suggest S212C (AGT→TGT)
pos. 28,688 (nucleocapsid phosphoprotein) | L139L (TTG→CTG) | L139L (TTG→CTG) | L139L (TTG→CTG) | L139L (TTG→CTG) | L139L (TTG→CTG) | 2019-nCoV_N3-F
pos. 29,742 (intergenic) | no coverage | no coverage | G→T | G→T | G→T
Mutation analysis of the well-sequenced Iran-derived sample detected one synonymous
substitution and four non-synonymous substitutions for the orf1ab gene, plus one synonymous
substitution (L139L) for the N gene (Table 2). While positions 8653 and 28,688 overlap ARTIC PCR primers
and could reflect the failed removal of primer sequences by the bioinformatics workflow, both were independently confirmed by the Liverpool amplifications and bait captured cDNA. All five substitutions were consistently supported by 100% of sequencing reads, except for L3606F in the orf1ab gene using the Liverpool amplification, the detection of which by BreSeq [20] was obscured by a deletion predicted by a minority of reads; nonetheless, iVar [18] consensus generation supported L3606F. This location has been flagged for possible homoplastic sequencing artifacts [31]. Sequencing of cDNA (bait enriched or otherwise) and ARTIC amplification predicted an intergenic nucleotide substitution at position 29,742 in 100% of sequencing reads, yet this was not observed in sequences derived from the Liverpool amplification due to missing read coverage (Figure 2). This position is very close to the polyA tail and, while not flagged for exclusion due to poor alignment [31], manual inspection of the read alignments highlighted imperfect mapping of a minority of reads, so this single nucleotide polymorphism (SNP) should be viewed with caution. Finally, both Liverpool and ARTIC amplification methods had minority read support (10.9–22.8% of reads) for a deletion starting at position 11,074 or 11,082, which was not observed for sequencing of unamplified cDNA, but this region has been highlighted for Illumina-specific sequencing artifacts [31].
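The codon-level annotations in Table 2 can be checked mechanically. The following is a minimal sketch and not part of the SIGNAL or BreSeq workflows: it assumes the coding sequence start coordinates from the GenBank annotation of MN908947.3 (orf1ab at 266, M at 26,523, N at 28,274) and the standard genetic code, and the function names are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# 1-based CDS start coordinates (assumption: taken from the MN908947.3
# annotation). For orf1ab this simple arithmetic only holds upstream of the
# ribosomal frameshift, which covers every orf1ab position in Table 2.
CDS_START = {"orf1ab": 266, "M": 26523, "N": 28274}


def codon_number(gene, position):
    """Return the 1-based codon index containing a 1-based genomic position."""
    return (position - CDS_START[gene]) // 3 + 1


def classify(ref_codon, alt_codon):
    """Translate both codons and label the change."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


# Codon changes as listed in Table 2.
substitutions = [
    ("orf1ab", 835, "TTC", "TTT"),
    ("orf1ab", 884, "CGT", "TGT"),
    ("orf1ab", 1397, "GTA", "ATA"),
    ("orf1ab", 8653, "ATG", "ATT"),
    ("orf1ab", 11083, "TTG", "TTT"),
    ("N", 28688, "TTG", "CTG"),
]

for gene, pos, ref, alt in substitutions:
    ref_aa, alt_aa, kind = classify(ref, alt)
    label = f"{ref_aa}{codon_number(gene, pos)}{alt_aa}"
    print(f"{gene} pos. {pos}: {label} ({ref}->{alt}) is {kind}")
```

Run against the rows of Table 2, this reproduces the labels F190F, R207C, V378I, M2796I, L3606F, and L139L from the genomic positions alone.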
3.3. Assessment of Clinical Diagnostic PCR Primers
SARS-CoV-2 diagnostic PCRs rely on the efficient binding of primers to their designated targets. Mutations in these regions can prevent primer annealing and produce false negative results. Thus, given the critical importance of identifying mutations in diagnostic PCR target sites, our pipeline includes mapping of diagnostic primer sequences [22–27] relative to the mutations detected. We found that a number of these primers have single nucleotide mismatches in the Iran-derived sample, each supported by 100% of sequencing reads, as well as minority read support for loss of priming sites for the spike protein (Table 2).
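The idea of mapping detected mutations onto primer binding sites can be sketched as a simple interval lookup. This is an illustration under stated assumptions rather than the pipeline's implementation: the primer names and coordinates below are hypothetical placeholders, not published primer positions, and only the SNP positions are taken from Table 2.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    name: str
    start: int  # 1-based, inclusive genome coordinate
    end: int    # 1-based, inclusive genome coordinate


def primers_hit(primers, snp_positions):
    """Return {primer name: [SNP positions inside its binding site]}."""
    hits = {}
    for primer in primers:
        overlapping = [p for p in snp_positions if primer.start <= p <= primer.end]
        if overlapping:
            hits[primer.name] = overlapping
    return hits


# Hypothetical binding sites, for illustration only.
primers = [
    Primer("primer_A", 830, 854),
    Primer("primer_B", 900, 920),
]

# SNP positions reported in Table 2.
snps = [835, 884, 1397, 8653, 11083, 28688]

print(primers_hit(primers, snps))
```

A real check would load the binding coordinates of each diagnostic primer set from its published protocol and would also account for primer orientation.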
3.4. Molecular Epidemiology Analysis
There is very little variation among available SARS-CoV-2 genome sequences, as summarized at GISAID (www.gisaid.org) and exemplified by our own detection of only 6 SNPs between the original Wuhan genome and our Iran-derived sample. Using a uniform manifold approximation and projection (UMAP) [28] of genome sequence similarity, we were able to place this isolate in a small cluster of genomes from Australia (11), China (4), India (4), Kuwait (3), Norway (1), Pakistan (1), Taiwan (5), Turkey (4), USA (1), United Arab Emirates (3), and United Kingdom (1) (Figure 3). Cross-referencing with GISAID metadata revealed that within this small cluster, isolates from Australia (2 isolates), India (4 isolates), and Pakistan (1 isolate) also had travel history associated with the outbreak in Iran. Unfortunately, GISAID did not contain sequences from Iran, but phylogenetic analysis confirmed these UMAP results, placing our Iran-derived sample and the nearby UMAP samples in a well supported clade (Figure S1). The incomplete genome sequence obtained for our Wuhan-derived isolate precluded its inclusion in UMAP and phylogenetic analyses.
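Any embedding of genome sequence similarity starts from some pairwise comparison of aligned genomes. As a minimal sketch, and assuming a plain SNP count as the measure (the metric actually used is not stated in this section), the following computes pairwise distances while skipping positions where either genome has a gap or an ambiguous base. The toy sequences and names are invented for illustration.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count mismatches at positions where both aligned sequences have A, C, G, or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS and a != b
    )


def distance_matrix(genomes):
    """Return a symmetric nested dict of pairwise SNP distances."""
    names = list(genomes)
    matrix = {name: {other: 0 for other in names} for name in names}
    for first, second in combinations(names, 2):
        d = snp_distance(genomes[first], genomes[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


# Toy aligned sequences (hypothetical, far shorter than a real genome).
genomes = {
    "reference": "ACGTACGTAC",
    "isolate_1": "ACGTACGTAT",
    "isolate_2": "ACCTNCGTAT",
}

for name, row in distance_matrix(genomes).items():
    print(name, row)
```

A matrix of this kind could then be passed to a dimensionality reduction or tree-building tool, so isolates separated by only a handful of SNPs end up close together.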
Figure 2. Mapping and semi-log depth of coverage of trimmed sequencing reads for each library preparation against the first Wuhan SARS-CoV-2 genome sequence (NCBI accession: MN908947.3). Y-axis dimensions vary among samples (maximum indicated beside label) and colored positions reflect frequency of SNPs relative to the MN908947.3 genome among the reads (green = A, blue = C, orange = G, red = T). The plus (+) symbol indicates secondary bait capture enrichment. SARS-CoV-2 genome length and organization is highlighted on top.
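Coverage gaps such as the 319 bp ARTIC gap at positions 19,285–19,603 can be located directly from a per-position depth profile. The sketch below is illustrative only: it assumes the depths are already available as a list with one value per reference position (for example, as could be exported with `samtools depth -a`), and the toy values are invented.

```python
def coverage_gaps(depths, min_depth=1):
    """Return 1-based inclusive (start, end) runs where depth < min_depth."""
    gaps = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if start is None:
                start = pos  # a new low-coverage run begins here
        elif start is not None:
            gaps.append((start, pos - 1))
            start = None
    if start is not None:
        gaps.append((start, len(depths)))  # run extends to the end of the reference
    return gaps


# Toy depth profile: 10 covered positions, 4 uncovered, 6 covered.
toy_depths = [5] * 10 + [0] * 4 + [7] * 6
print(coverage_gaps(toy_depths))

# Length of an inclusive interval, using the gap reported in Table 2.
gap_start, gap_end = 19285, 19603
print(gap_end - gap_start + 1)
```

Raising `min_depth` above 1 would also flag regions that have reads but too few to support a confident consensus call.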
Figure 3. Uniform manifold approximation and projection (UMAP) involving the aligned genomes of 8075 SARS-CoV-2 isolates labelled by country of origin. The Iran-derived sample is indicated by an arrow. The top inset illustrates the analysis of all 8075 isolates, labelled by region, with the zoomed region indicated by the hashed box.
4. Discussion
Our results underscore the importance of presumptive viral load, based on qPCR cycle threshold, for obtaining a complete SARS-CoV-2 genome sequence, reinforcing the findings of others [34]. While the Liverpool amplification primers provided a more even read coverage of the SARS-CoV-2 genome, amplification using the ARTIC primers was superior for obtaining a complete genome sequence, to the point where it was the only successful protocol for one of our samples. Yet ARTIC amplification had regions of low or missing sequence coverage not seen with sequencing of cDNA or the Liverpool amplification (Figure 2). Additionally, low Liverpool and ARTIC coverage at positions ~11,500 to ~13,000 was associated with minority read support for a deletion in the BreSeq analysis, which was not supported by bait enriched cDNA sequencing. This region has been associated with artifacts of Illumina sequencing of amplicons [31]. Yet our standardized iVar-based pipeline (github.com/jaleezyy/covid-19-signal), compatible with and extending the Connor lab ARTIC nextflow pipeline (github.com/connor-lab/ncov2019-artic-nf), was able to overcome these regions of low coverage, favoring the majority reads to generate a final genome sequence. ARTIC amplification and sequencing resulted in a 319 bp gap within the coding region for the orf1ab gene (amplicon 64), so any SNPs in this region would go undetected, while the Liverpool amplification was confirmed to miss a possible intergenic SNP due to missing coverage at the 3′ terminal region of the SARS-CoV-2 genome. Considering the low variation observed to date among SARS-CoV-2 genomes, accurate prediction of every possible SNP using a standardized workflow is of high importance for molecular epidemiological analyses, phylogenetic tree generation, and molecular diagnostic assays. Additionally, it is important for prioritizing virus isolates for subsequent analysis of glycosylation sites and other post-translational modifications, as well as cell-culture experiments to investigate in vitro phenotypes. Notably, the prediction of glycosylation sites using NetOGlyc (http://www.cbs.dtu.dk/services/NetOGlyc/) found no differences between the original Wuhan genome (MN908947.3) and our Iran-derived isolate. However, our work did detect mismatches for currently used diagnostic PCR primers, specifically in primers designed by the CDC and the Japanese NIID. Clinical laboratories should be aware of this, and we suggest that checking for such mismatches should be part of ongoing genomic surveillance efforts. We also note that neither amplification method (Liverpool or ARTIC) was perfect, but the results indicated that amplification-free, bait capture enriched sequencing of cDNA is of high utility for the identification of amplification artifacts and may additionally be useful for direct sequencing of SARS-CoV-2 RNA from cell culture. Overall, the availability of alternate protocols permits confirmation of novel mutations by excluding protocol-specific sequencing and analysis artifacts.
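The principle of favoring the majority reads when generating a consensus can be illustrated with a simple majority rule. This is a hedged sketch, not iVar's implementation: the function name, the depth and frequency thresholds, and the read counts are all assumptions chosen for illustration.

```python
from collections import Counter


def consensus_base(base_counts, min_depth=10, min_fraction=0.5):
    """Call the majority base at one position, or 'N' if support is insufficient.

    base_counts maps an observed base (or '-' for a deletion) to its read count.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return "N"  # too few reads to call anything
    base, count = Counter(base_counts).most_common(1)[0]
    return base if count / depth > min_fraction else "N"


# Hypothetical counts: about 18% of reads support a deletion, the rest support T.
print(consensus_base({"T": 819, "-": 181}))

# Hypothetical low-coverage position.
print(consensus_base({"A": 3}))
```

Under a rule like this, a deletion supported by roughly 11 to 23% of reads does not enter the consensus, which is consistent with L3606F being retained in the final genome sequence despite the minority deletion prediction.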
Understanding the advantages and limitations of different protocols is essential to population-level whole genome sequencing of SARS-CoV-2 directly from clinical samples. Although the heterogeneity of this source of material may be a limitation, particularly for samples with low quantities and/or quality of RNA, it is the most feasible approach given the constraints of virus isolation. This approach also produces sequences most closely reflecting those within the host. However, we also acknowledge that this work is limited to two clinical samples, which give only a preliminary outlook on the efficacy of each protocol. Additionally, our study only investigated one sample type, and evaluation of these protocols with other sample types (e.g., lower-respiratory tract samples) will be informative. Recently, Xiao and colleagues performed comparative studies on sputum, throat swabs, anal swabs, and nasopharyngeal swabs and reported that more viral reads were recovered from nasal swabs than any other sample type [35], although it is not clear if they were using paired samples. This suggests protocol optimization for other sample types is necessary. Overall, standardization and quality controls are necessary for informative broad analyses and to enable DNA sequencing protocol implementation at regional sites of care for enhanced turnaround time to generate actionable data.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/8/895/s1,
Figure S1: Clade within the larger 8,075 isolate phylogenetic tree containing the Toronto isolate derived from Iran,
with isolates associated by UMAP marked by an asterisk (red indicating travel history associated with the Iran
outbreak). Branch lengths represent evolutionary distance while node labels represent bootstrap support.
Author Contributions: R.A.K. and S.M. developed the concept, P.A. performed the construction of sequencing
libraries and MiniSeq sequencing, B.P.A. performed biocuration of reference data, J.A.N., A.R.R. and A.G.M.
performed the bioinformatics analyses, A.B. and N.C.K. assisted in the interpretation of genomic data. K.M.S.,
F.M., A.R.R. and J.A.N. developed the SIGNAL workflow. H.M. (Hamza Mbareche) tested the analytical workflow
and helped with the interpretation of genomic data. H.M. (Hassaan Maan) performed the UMAP and phylogenetic
analyses. M.A. and J.A.H. provided the Liverpool amplification reagents and protocols. K.M., B.W., A.G.M. and
S.M. provided funding and supervised the entire project. All authors prepared the manuscript and approved the
final article. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Canadian Institutes of Health Research grant PJT-156214.
Acknowledgments: Technical discussion from Jared Simpson (Ontario Institute for Cancer Research) was greatly
appreciated. J.A.N. was supported by funds from the Comprehensive Antibiotic Resistance Database. B.P.A.
and A.R.R. were supported by Canadian Institutes of Health Research (CIHR) funding (PJT-156214 to A.G.M.).
Computer resources were supplied by Hewlett Packard Enterprise, Canada. K.M. is funded by CIHR and Natural
Sciences and Engineering Research Council of Canada (NSERC). A.B. is funded by NSERC. F.M. is supported by a
Donald Hill Family Fellowship in Computer Science. H.M. is supported by a postdoctoral fellowship from Fond
de Recherche du Québec Nature et Technologie and is the recipient of the Lab Exchange Visitor Program Award
from the Canadian Society for Virology. S.M. and R.A.K. are supported by the McLaughlin Centre and the Toronto
COVID-19 Action Initiative from the University of Toronto. Methods development of an amplicon system for
SARS-CoV-2 by J.A.H. and M.A. is funded by the US Food and Drug Administration.
Conflicts of Interest: The authors declare no competing interests. The funders had no role in the design of the
study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to
publish the results.
References
1. Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020, 20, 533–534.
2. Zhou, P.; Yang, X.L.; Wang, X.G.; Hu, B.; Zhang, L.; Zhang, W.; Si, H.R.; Zhu, Y.; Li, B.; Huang, C.L.; et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020, 579, 270–273.
3. Zaki, A.M.; Van Boheemen, S.; Bestebroer, T.M.; Osterhaus, A.D.; Fouchier, R.A. Isolation of a Novel Coronavirus from a Man with Pneumonia in Saudi Arabia. N. Engl. J. Med. 2012, 367, 1814–1820.
4. Lu, R.; Zhao, X.; Li, J.; Niu, P.; Yang, B.; Wu, H.; Wang, W.; Song, H.; Huang, B.; Zhu, N.; et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding. Lancet 2020, 395, 565–574.
5. Wang, W.; Xu, Y.; Gao, R.; Lu, R.; Han, K.; Wu, G.; Tan, W. Detection of SARS-CoV-2 in Different Types of Clinical Specimens. JAMA 2020, 323, 1843–1844.
6. Taubenberger, J.K.; Kash, J.C. Influenza virus evolution, host adaptation, and pandemic formation. Cell Host Microbe 2010, 7, 440–451.
7. Quick, J.; Grubaugh, N.D.; Pullan, S.T.; Claro, I.M.; Smith, A.D.; Gangavarapu, K.; Oliveira, G.; Robles-Sikisaka, R.; Rogers, T.F.; Beutler, N.A.; et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat. Protoc. 2017, 12, 1261–1276.
8. Li, B.; Si, H.R.; Zhu, Y.; Yang, X.L.; Anderson, D.E.; Shi, Z.L.; Wang, L.F.; Zhou, P. Discovery of Bat Coronaviruses through Surveillance and Probe Capture-Based Next-Generation Sequencing. mSphere 2020, 5, e00807-19.
9. Metsky, H.C.; Siddle, K.J.; Gladden-Young, A.; Qu, J.; Yang, D.K.; Brehio, P.; Goldfarb, A.; Piantadosi, A.; Wohl, S.; Carter, A.; et al. Capturing sequence diversity in metagenomes with comprehensive and scalable probe design. Nat. Biotechnol. 2019, 37, 160–168.
10. Depledge, D.P.; Palser, A.L.; Watson, S.J.; Lai, I.Y.C.; Gray, E.R.; Grant, P.; Kanda, R.K.; Leproust, E.; Kellam, P.; Breuer, J. Specific Capture and Whole-Genome Sequencing of Viruses from Clinical Samples. PLoS ONE 2011, 6, e27805.
11. Marchand-Senécal, X.; Kozak, R.; Mubareka, S.; Salt, N.; Gubbay, J.B.; Eshaghi, A.; Allen, V.; Li, Y.; Bastien, N.; Gilmour, M.; et al. Diagnosis and Management of First Case of COVID-19 in Canada: Lessons applied from SARS. Clin. Infect. Dis. 2020, ciaa227.
12. LeBlanc, J.J.; Gubbay, J.B.; Li, Y.; Needle, R.; Arneson, S.R.; Marcino, D.; Charest, H.; Desnoyers, G.; Dust, K.; Fattouh, R.; et al. Real-time PCR-based SARS-CoV-2 detection in Canadian laboratories. J. Clin. Virol. 2020, 128, 104433.
13. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
14. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 2011, 17, 10.
15. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. Available online: http://www.bioinformatics.babraham.ac.uk/projects/fastqc (accessed on 3 May 2020).
16. Wood, D.E.; Lu, J.; Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019, 20, 257.
17. Kim, D.; Paggi, J.M.; Park, C.; Bennett, C.; Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019, 37, 907–915.
18. Grubaugh, N.D.; Gangavarapu, K.; Quick, J.; Matteson, N.L.; De Jesus, J.G.; Main, B.J.; Tan, A.L.; Paul, L.M.; Brackney, D.E.; Grewal, S.; et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019, 20, 8.
19. Gurevich, A.; Saveliev, V.; Vyahhi, N.; Tesler, G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics 2013, 29, 1072–1075.
20. Deatherage, D.E.; Traverse, C.C.; Wolf, L.N.; Barrick, J.E. Detecting rare structural variation in evolving microbial populations from new sequence junctions using breseq. Front. Genet. 2015, 5, 468.
21. Robinson, J.T.; Thorvaldsdóttir, H.; Wenger, A.M.; Zehir, A.; Mesirov, J.P. Variant Review with the Integrative Genomics Viewer. Cancer Res. 2017, 77, e31–e34.
22. Zhu, N.; Zhang, D.; Wang, W.; Li, X.; Yang, B.; Song, J.; Zhao, X.; Huang, B.; Shi, W.; Lu, R.; et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N. Engl. J. Med. 2020, 382, 727–733.
23. Huang, C.; Wang, Y.; Li, X.; Ren, L.; Zhao, J.; Hu, Y.; Zhang, L.; Fan, G.; Xu, J.; Gu, X.; et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet (Lond. Engl.) 2020, 395, 497–506.
24. Vermeiren, C.; Marchand-Senécal, X.; Sheldrake, E.; Bulir, D.; Smieja, M.; Chong, S.; Forbes, J.D.; Katz, K. Comparison of Copan Eswab and FLOQswab for COVID-19 PCR diagnosis: Working around a supply shortage. J. Clin. Microbiol. 2020, 58, e00669-20.
25. Nalla, A.K.; Casto, A.M.; Huang, M.L.W.; Perchetti, G.A.; Sampoleo, R.; Shrestha, L.; Wei, Y.; Zhu, H.; Jerome, K.R.; Greninger, A.L. Comparative Performance of SARS-CoV-2 Detection Assays using Seven Different Primer/Probe Sets and One Assay Kit. J. Clin. Microbiol. 2020, 58, e00557-20.
26. Chu, D.K.; Pan, Y.; Cheng, S.M.; Hui, K.P.; Krishnan, P.; Liu, Y.; Ng, D.Y.; Wan, C.K.; Yang, P.; Wang, Q.; et al. Molecular Diagnosis of a Novel Coronavirus (2019-nCoV) Causing an Outbreak of Pneumonia. Clin. Chem. 2020, 66, 549–555.
27. World Health Organization. Molecular Assays to Diagnose COVID-19: Summary Table of Available Protocols. Available online: https://www.who.int/who-documents-detail/molecular-assays-to-diagnose-covid-19-summary-table-of-available-protocols (accessed on 11 May 2020).
28. Uniform Manifold Approximation and Projection for Dimension Reduction. Available online: http://arxiv.org/abs/1802.03426 (accessed on 15 May 2020).
29. Kimura, M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 1980, 16, 111–120.
30. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313.
31. Issues with SARS-CoV-2 Sequencing Data. Virological. 2020. Available online: http://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 (accessed on 11 May 2020).
32. Maan, H.; Mbareche, H.; Raphenya, A.R.; Banerjee, A.; Nasir, J.A.; Kozak, R.A.; Knox, N.; Mubareka, S.; McArthur, A.G.; Wang, B. Genotyping SARS-CoV-2 through an interactive web application. The Lancet Digital Health 2020, 2, E340–E341.
33. Network, A. hCoV-2019 (nCoV-2019/SARS-CoV-2). Available online: https://artic.network/ncov-2019 (accessed on 24 March 2020).
34. Gudbjartsson, D.F.; Helgason, A.; Jonsson, H.; Magnusson, O.T.; Melsted, P.; Norddahl, G.L.; Saemundsdottir, J.; Sigurdsson, A.; Sulem, P.; Agustsdottir, A.B.; et al. Spread of SARS-CoV-2 in the Icelandic Population. N. Engl. J. Med. 2020, 382, 2302–2315.
35. Xiao, M.; Liu, X.; Ji, J.; Li, M.; Li, J.; Yang, L.; Sun, W.; Ren, P.; Yang, G.; Zhao, J.; et al. Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples. Genome Med. 2020, 12, 57.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
... Using molecular biology methods to identify SARS-CoV-2 variants was generally applied in surveillance and clinical diagnosis. Deep sequencing technology is widely utilized to identify SARS-CoV-2 variants, which can identify each mutation in the sample [37,38]. Real-time PCR assays for identifying SARS-CoV-2 variants were also reported [39,40]. ...
Article
Full-text available
Developing variant vaccines or multivalent vaccines is a feasible way to address the epidemic as the SARS-CoV-2 variants of concern (VOCs) posed an increased risk to global public health. The spike protein of the SARS-CoV-2 virus was usually used as the main antigen in many types of vaccines to produce neutralizing antibodies against the virus. However, the spike (S) proteins of different variants were only differentiated by a few amino acids, making it difficult to obtain specific antibodies that can distinguish different VOCs, thereby challenging the accurate distinction and quantification of the variants using immunological methods such as ELISA. Here, we established a method based on LC–MS to quantify the S proteins in inactivated monovalent vaccines or trivalent vaccines (prototype, Delta, and Omicron strains). By analyzing the S protein sequences of the prototype, Delta, and Omicron strains, we identified peptides that were different and specific among the three strains and synthesized them as references. The synthetic peptides were isotopically labeled as internal targets. Quantitative analysis was performed by calculating the ratio between the reference and internal target. The verification results have shown that the method we established had good specificity, accuracy, and precision. This method can not only accurately quantify the inactivated monovalent vaccine but also could be applied to each strain in inactivated trivalent SARS-CoV-2 vaccines. Hence, the LC–MS method established in this study can be applied to the quality control of monovalent and multivalent SARS-CoV-2 variation vaccines. By enabling more accurate quantification, it will help to improve the protection of the vaccine to some extent.
... Whole genome sequencing (WGS) data are used to study pathogen transmission and microevolution, enabling the determination of direct transmission between donor and recipient. Viral RNA was amplified by PCR, sequencing library was constructed, a high-throughput sequencing template was prepared, and WGS was performed (Nasir et al. 2020). When SARS-CoV-2 is spread to another area, local outbreaks occur, requiring urgent testing and tracing for initial containment of infected persons. ...
Article
Full-text available
COVID-19 is a highly infectious disease caused by the SARS-CoV-2 virus, which primarily affects the respiratory system and can lead to severe illness. The virus is extremely contagious, early and accurate diagnosis of SARS-CoV-2 is crucial to contain its spread, to provide prompt treatment, and to prevent complications. Currently, the reverse transcriptase polymerase chain reaction (RT-PCR) is considered to be the gold standard for detecting COVID-19 in its early stages. In addition, loop-mediated isothermal amplification (LMAP), clustering rule interval short palindromic repeats (CRISPR), colloidal gold immunochromatographic assay (GICA), computed tomography (CT), and electrochemical sensors are also common tests. However, these different methods vary greatly in terms of their detection efficiency, specificity, accuracy, sensitivity, cost, and throughput. Besides, most of the current detection methods are conducted in central hospitals and laboratories, which is a great challenge for remote and underdeveloped areas. Therefore, it is essential to review the advantages and disadvantages of different COVID-19 detection methods, as well as the technology that can enhance detection efficiency and improve detection quality in greater details.
... -signal), as well as the NCoV-Tools (v1.8.0) workflow for control and quality assessment (https://github .com/jts/ncov-tools) (28). Within SIGNAL, mutation calling was performed using the FreeBayes (29) (v1.3.2, consensus minimum depth = 10) option. ...
Article
Full-text available
Genomic epidemiology can facilitate an understanding of evolutionary history and transmission dynamics of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreak. We used next-generation sequencing techniques to study SARS-CoV-2 genomes isolated from patients and health care workers (HCWs) across five wards of a Canadian hospital with an ongoing SARS-CoV-2 outbreak. Using traditional contact tracing methods, we show transmission events between patients and HCWs, which were also supported by the SARS-CoV-2 lineage assignments. The outbreak predominantly involved SARS-CoV-2 B.1.564.1 across all five wards, but we also show evidence of community introductions of lineages B.1, B.1.1.32, and B.1.231, falsely assumed to be outbreak related. Altogether, our study exemplifies the value of using contact tracing in combination with genomic epidemiology to understand the transmission dynamics and genetic underpinnings of a SARS-CoV-2 outbreak. IMPORTANCE Our manuscript describes a SARS-CoV-2 outbreak investigation in an Ontario tertiary care hospital. We use traditional contract tracing paired with whole-genome sequencing to facilitate an understanding of the evolutionary history and transmission dynamics of this SARS-CoV-2 outbreak in a clinical setting. These advancements have enabled the incorporation of phylogenetics and genomic epidemiology into the understanding of clinical outbreaks. We show that genomic epidemiology can help to explore the genetic evolution of a pathogen in real time, enabling the identification of the index case and helping understand its transmission dynamics to develop better strategies to prevent future spread of SARS-CoV-2 in congregate, clinical settings such as hospitals.
... This identified 12 patients with longitudinal samples from a total of 472 patients. RNA was isolated from the longitudinal swabs and used as a template for the amplification of SARS-CoV-2 genome and sgmRNAs using both short-(ARTIC-Illumina) and longer-read length (Rapid Sequencing of Long Amplicons-Nanopore (RSLA-Nanopore)) sequencing [14,15]. These longitudinal samples had sufficient read depth to call a consensus for the dominant viral genome sequence and to derive information on the frequency of minor genomic variants, focusing on codons 323 in NSP12 and 614 in the spike protein. ...
Article
Full-text available
Background: The mutational landscape of SARS-CoV-2 varies at the dominant viral genome sequence and minor genomic variant population. During the COVID-19 pandemic, an early substitution in the genome was the D614G change in the spike protein, associated with an increase in transmissibility. Genomes with D614G are accompanied by a P323L substitution in the viral polymerase (NSP12). However, P323L is not thought to be under strong selective pressure. Results: Investigation of P323L/D614G substitutions in the population shows rapid emergence during the containment phase and early surge phase during the first wave. These substitutions emerge from minor genomic variants which become dominant viral genome sequence. This is investigated in vivo and in vitro using SARS-CoV-2 with P323 and D614 in the dominant genome sequence and L323 and G614 in the minor variant population. During infection, there is rapid selection of L323 into the dominant viral genome sequence but not G614. Reverse genetics is used to create two viruses (either P323 or L323) with the same genetic background. L323 shows greater abundance of viral RNA and proteins and a smaller plaque morphology than P323. Conclusions: These data suggest that P323L is an important contribution in the emergence of variants with transmission advantages. Sequence analysis of viral populations suggests it may be possible to predict the emergence of a new variant based on tracking the frequency of minor variant genomes. The ability to predict an emerging variant of SARS-CoV-2 in the global landscape may aid in the evaluation of medical countermeasures and non-pharmaceutical interventions.
... Sequencing data was analyzed using DRAGEN COVID Lineage (version 3.5.9; Illumina Inc., USA) and method as described previously [11,12]. In brief, sequencing reads of human origin were removed using NCBI Human Read Scrubber algorithm. ...
Article
Full-text available
Pediatric population was generally less affected clinically by SARS-CoV-2 infection. Few pediatric cases of COVID-19 have been reported compared to those reported in infected adults. However, a rapid increase in the hospitalization rate of SARS-CoV-2 infected pediatric patients was observed during Omicron variant dominated COVID-19 outbreak. In this study, we analyzed the B.1.1.529 (Omicron) genome sequences collected from pediatric patients by whole viral genome amplicon sequencing using Illumina next generation sequencing platform, followed by phylogenetic analysis. The demographic, epidemiologic and clinical data of these pediatric patients are also reported in this study. Fever, cough, running nose, sore throat and vomiting were the more commonly reported symptoms in children infected by Omicron variant. A novel frameshift mutation was found in the ORF1b region (NSP12) of the genome of Omicron variant. Seven mutations were identified in the target regions of the WHO listed SARS-CoV-2 primers and probes. On protein level, eighty-three amino acid substitutions and fifteen amino acid deletions were identified. Our results indicate that asymptomatic infection and transmission among children infected by Omicron subvariants BA.2.2 and BA.2.10.1 are not common. Omicron may have different pathogenesis in pediatric population.
Article
Background: Accurate genome sequences form the basis for genomic surveillance programs, the added value of which was impressively demonstrated during the COVID-19 pandemic by tracing transmission chains, discovering new viral lineages and mutations, and assessing them for infectiousness and resistance to available treatments. Amplicon strategies employing Illumina sequencing have become widely established for variant detection and reference-based reconstruction of SARS-CoV-2 genomes, and are routine bioinformatics tasks. Yet, specific challenges arise when analyzing amplicon data, for example, when crucial and even lineage-determining mutations occur near primer sites. Methods: We present CoVpipe2, a bioinformatics workflow developed at the Public Health Institute of Germany to reconstruct SARS-CoV-2 genomes based on short-read sequencing data accurately. The decisive factor here is the reliable, accurate, and rapid reconstruction of genomes, considering the specifics of the used sequencing protocol. Besides fundamental tasks like quality control, mapping, variant calling, and consensus generation, we also implemented additional features to ease the detection of mixed samples and recombinants. Results: Here, we highlight common pitfalls in primer clipping, detecting heterozygote variants, and dealing with low-coverage regions and deletions. We introduce CoVpipe2 to address the above challenges and have compared and successfully validated the pipeline against selected publicly available benchmark datasets. CoVpipe2 features high usability, reproducibility, and a modular design that specifically addresses the characteristics of short-read amplicon protocols but can also be used for whole-genome short-read sequencing data. Conclusions: CoVpipe2 has seen multiple improvement cycles and is continuously maintained alongside frequently updated primer schemes and new developments in the scientific community. 
Our pipeline is easy to set up and use and can serve as a blueprint for other pathogens in the future due to its flexibility and modularity, providing a long-term perspective for continuous support. CoVpipe2 is written in Nextflow and is freely accessible from https://github.com/rki-mf1/CoVpipe2 under the GPL3 license.
Preprint
Full-text available
To control the SARS-CoV-2 pandemic, healthcare systems have focused on ramping up their capacity for epidemiological surveillance through viral whole genome sequencing. In this paper, we tested the performance of two protocols of SARS-CoV-2 nucleic acid enrichment, an amplicon enrichment using different versions of the ARTIC primer panel and a hybrid-capture method using KAPA RNA Hypercap. We focused on the challenge of the Omicron variant sequencing, the advantages of automated library preparation and the influence of the bioinformatic analysis in the final consensus sequence. All 94 samples were sequenced using Illumina iSeq 100 and analysed with two bioinformatic pipelines: a custom-made pipeline and an Illumina-owned pipeline. We were unsuccessful in sequencing six samples using the capture enrichment due to low reads. On the other hand, amplicon dropout and mispriming caused the loss of mutation G21987A and the erroneous addition of mutation T15521A respectively using amplicon enrichment. Overall, we found high sequence agreement regardless of method of enrichment, bioinformatic pipeline or the use of automation for library preparation in eight different SARS-CoV-2 variants. Automation and the use of a simple app for bioinformatic analysis can simplify the genotyping process, making it available for more diagnostic facilities and increasing global vigilance.
Article
Background: Great concern has been raised about the impact of SARS-CoV-2 on men's andrological well-being. Many studies have attempted to determine whether SARS-CoV-2 is present in semen, but the data so far are unclear and somewhat ambiguous. However, these studies used quantitative real-time (qRT) PCR, which is not sufficiently sensitive to detect nucleic acids in clinical samples with a low viral load. Methods: The clinical performance of various nucleic acid detection methods (qRT-PCR, OSN-qRT-PCR, cd-PCR, and CBPH) was assessed for SARS-CoV-2 using 236 clinical samples from laboratory-confirmed COVID-19 cases. Then, the presence of SARS-CoV-2 in the semen of 12 recovering patients was investigated using qRT-PCR, OSN-qRT-PCR, cd-PCR, and CBPH in parallel using 24 paired semen, blood, throat swab, and urine samples. Results: The sensitivity, specificity, and AUC of CBPH were markedly higher than those of the other three methods. Although qRT-PCR, OSN-qRT-PCR, and cd-PCR detected no SARS-CoV-2 RNA in throat swab, blood, urine, and semen samples of the 12 patients, CBPH detected the presence of SARS-CoV-2 genome fragments in semen samples, but not in paired urine samples, of 3 of 12 patients. The existing SARS-CoV-2 genome fragments were metabolized over time. Conclusions: Both OSN-qRT-PCR and cd-PCR performed better than qRT-PCR, and CBPH had the highest diagnostic performance in detecting SARS-CoV-2. CBPH contributed the most improvement to the determination of the critical value in gray-area samples with low viral load, which provides a rational screening strategy for studying the clearance of coronavirus in semen over time in patients recovering from COVID-19. Although the presence of SARS-CoV-2 fragments in the semen was demonstrated by CBPH, COVID-19 is unlikely to be sexually transmitted from male partners for at least 3 months after hospital discharge.
Article
Full-text available
The disastrous spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused severe public healthcare issues and significantly weakened the global economy. Although SARS-CoV-2 infection is no longer as fatal as during the initial outbreak, many infected people suffer from long COVID. Therefore, rapid and large-scale testing is critical for managing patients and limiting transmission. Herein, we review the recent advances in techniques to detect SARS-CoV-2. The sensing principles are detailed together with their application domains and analytical performances. In addition, the advantages and limits of each method are discussed and analyzed. Besides molecular diagnostics and antigen and antibody tests, we also review neutralizing antibodies and emerging SARS-CoV-2 variants. Further, the characteristics of the mutational locations in the different variants with epidemiological features are summarized. Finally, we discuss the challenges and possible strategies for developing new assays to meet different diagnostic needs. Thus, this comprehensive and systematic review of SARS-CoV-2 detection technologies may provide insightful guidance and direction for developing tools for the diagnosis and analysis of SARS-CoV-2 to support public healthcare and effective long-term pandemic management and control.
Article
Full-text available
Rapid identification of the rise and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern remains critical for monitoring of the efficacy of diagnostics, therapeutics, vaccines, and control strategies. A wide range of SARS-CoV-2 next-generation sequencing (NGS) methods have been developed over the last years, but cross-sequence technology benchmarking studies have been scarce. In the current study, 26 clinical samples were sequenced using five protocols: AmpliSeq SARS-CoV-2 (Illumina), EasySeq RC-PCR SARS-CoV-2 (Illumina/NimaGen), Ion AmpliSeq SARS-CoV-2 (Thermo Fisher), custom primer sets (Oxford Nanopore Technologies (ONT)), and capture probe-based viral metagenomics (Roche/Illumina). Studied parameters included genome coverage, depth of coverage, amplicon distribution, and variant calling. The median SARS-CoV-2 genome coverage of samples with cycle threshold (Ct) values of 30 and lower ranged from 81.6 to 99.8% for, respectively, the ONT protocol and Illumina AmpliSeq protocol. Correlation of coverage with PCR Ct values varied per protocol. Amplicon distribution signatures differed across the methods, with peak differences of up to 4 log10 at disbalanced positions in samples with high viral loads (Ct values ≤ 23). Phylogenetic analyses of consensus sequences showed clustering independent of the workflow used. The proportion of SARS-CoV-2 reads in relation to background sequences, as a (cost-)efficiency metric, was the highest for the EasySeq protocol. The hands-on time was the lowest when using EasySeq and ONT protocols, with the latter additionally having the shortest sequence runtime. In conclusion, the studied protocols differed on a variety of the studied metrics. This study provides data that assist laboratories when selecting protocols for their specific setting.
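As an illustration of two of the parameters named in this abstract, genome coverage and depth of coverage, the following is a minimal sketch, not the method used in the study. It assumes per-position read depths are already available as a list of integers; the function name `coverage_metrics`, the `min_depth` threshold of 10, and the toy depth values are all hypothetical.

```python
from statistics import mean


def coverage_metrics(depths, min_depth=10):
    """Return (breadth, mean_depth) for a list of per-position read depths.

    breadth is the fraction of positions with depth >= min_depth
    (an assumed threshold; real workflows choose their own cutoff).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), mean(depths)


# Toy data for ten genome positions (hypothetical, not from the study).
toy_depths = [0, 5, 12, 40, 100, 100, 8, 30, 0, 25]
breadth, avg_depth = coverage_metrics(toy_depths, min_depth=10)
print(f"genome coverage: {breadth:.1%}, mean depth: {avg_depth:.1f}")
```

Reporting both values together matters because a high mean depth can hide uncovered regions when reads pile up on a few amplicons.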
Article
Full-text available
Background: COVID-19 (coronavirus disease 2019) has caused a major epidemic worldwide; however, much is yet to be known about the epidemiology and evolution of the virus, partly due to the scarcity of full-length SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) genomes reported. One reason is that the challenges underlying sequencing SARS-CoV-2 directly from clinical samples have not been completely tackled, i.e., sequencing samples with low viral load often results in insufficient viral reads for analyses. Methods: We applied a novel multiplex PCR amplicon (amplicon)-based and hybrid capture (capture)-based sequencing, as well as ultra-high-throughput metatranscriptomic (meta) sequencing in retrieving complete genomes, inter-individual and intra-individual variations of SARS-CoV-2 from serial dilutions of a cultured isolate, and eight clinical samples covering a range of sample types and viral loads. We also examined and compared the sensitivity, accuracy, and other characteristics of these approaches in a comprehensive manner. Results: We demonstrated that both amplicon and capture methods efficiently enriched SARS-CoV-2 content from clinical samples, while the enrichment efficiency of amplicon exceeded that of capture in more challenging samples. We found that capture was not as accurate as meta and amplicon in identifying between-sample variations, whereas the amplicon method was not as accurate as the other two in investigating within-sample variations, suggesting amplicon sequencing was not suitable for studying virus-host interactions and viral transmission that heavily rely on intra-host dynamics. We illustrated that meta uncovered rich genetic information in the clinical samples besides SARS-CoV-2, providing references for clinical diagnostics and therapeutics. Taking all of the above factors and cost-effectiveness into consideration, we proposed guidance for how to choose a sequencing strategy for SARS-CoV-2 under different situations. 
Conclusions: This is, to the best of our knowledge, the first work systematically investigating inter- and intra-individual variations of SARS-CoV-2 using amplicon- and capture-based whole-genome sequencing, as well as the first comparative study among multiple approaches. Our work offers practical solutions for genome sequencing and analyses of SARS-CoV-2 and other emerging viruses.
Article
Full-text available
With the emergence of pandemic COVID-19, rapid and accurate diagnostic testing is essential. This study compared laboratory-developed tests (LDTs) used for the detection of SARS-CoV-2 in Canadian hospital and public health laboratories, and some commercially available real-time RT-PCR assays. Overall, analytical sensitivities were equivalent between LDTs and most commercially available methods.
Article
Full-text available
On March 16th, 2020, the WHO Director-General stated “You cannot fight a fire blindfolded. And we cannot stop this [COVID-19] pandemic if we don't know who is infected. We have a simple message for all countries: test, test, test. Test every suspected case.” (https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---16-march-2020) This strategy hinges on the availability of appropriate, validated collection and transport systems to ensure preservation of nucleic acids and compatibility with downstream molecular testing – an acute challenge in the current pandemic. We present a direct comparison of COVID-19 specimens collected with FLOQswab Nasopharyngeal Swab preserved in universal transport medium (Copan UTM System, Copan, Italy, catalog No. 305C), optimized for viral specimens, and flocked regular nylon tip swab preserved in liquid amies (Eswab Collection system, Copan, Italy, catalog No. 480C), optimized for bacterial specimens.
Article
Full-text available
Background During the current worldwide pandemic, coronavirus disease 2019 (Covid-19) was first diagnosed in Iceland at the end of February. However, data are limited on how SARS-CoV-2, the virus that causes Covid-19, enters and spreads in a population. Methods We targeted testing to persons living in Iceland who were at high risk for infection (mainly those who were symptomatic, had recently traveled to high-risk countries, or had contact with infected persons). We also carried out population screening using two strategies: issuing an open invitation to 10,797 persons and sending random invitations to 2283 persons. We sequenced SARS-CoV-2 from 643 samples. Results As of April 4, a total of 1221 of 9199 persons (13.3%) who were recruited for targeted testing had positive results for infection with SARS-CoV-2. Of those tested in the general population, 87 (0.8%) in the open-invitation screening and 13 (0.6%) in the random-population screening tested positive for the virus. In total, 6% of the population was screened. Most persons in the targeted-testing group who received positive tests early in the study had recently traveled internationally, in contrast to those who tested positive later in the study. Children under 10 years of age were less likely to receive a positive result than were persons 10 years of age or older, with percentages of 6.7% and 13.7%, respectively, for targeted testing; in the population screening, no child under 10 years of age had a positive result, as compared with 0.8% of those 10 years of age or older. Fewer females than males received positive results both in targeted testing (11.0% vs. 16.7%) and in population screening (0.6% vs. 0.9%). The haplotypes of the sequenced SARS-CoV-2 viruses were diverse and changed over time. The percentage of infected participants that was determined through population screening remained stable for the 20-day duration of screening. 
Conclusions In a population-based study in Iceland, children under 10 years of age and females had a lower incidence of SARS-CoV-2 infection than adolescents or adults and males. The proportion of infected persons identified through population screening did not change substantially during the screening period, which was consistent with a beneficial effect of containment efforts. (Funded by deCODE Genetics–Amgen.)
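The positivity percentages quoted in this abstract can be re-derived from the counts it reports. The short sketch below is only a consistency check; it assumes that the denominators for the two population screens are the invitation counts stated in the abstract (10,797 and 2283), which the abstract does not state explicitly. The helper name `positivity` is hypothetical.

```python
# (positives, persons) as reported in the abstract. The screening
# denominators are assumed to be the invitation counts given there.
counts = {
    "targeted testing": (1221, 9199),
    "open invitation": (87, 10797),
    "random invitation": (13, 2283),
}


def positivity(positive, tested):
    """Percentage of positive results, rounded to one decimal place."""
    return round(100 * positive / tested, 1)


rates = {name: positivity(p, n) for name, (p, n) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Under that assumption the computed values agree with the 13.3%, 0.8%, and 0.6% given in the abstract.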
Article
Full-text available
Nearly 400,000 people worldwide are known to have been infected with SARS-CoV-2 beginning in December 2019. The virus has now spread to over 168 countries including the United States, where the first cluster of cases was observed in the Seattle metropolitan area in Washington. Given the rapid increase in the number of cases in many localities, the availability of accurate, high-throughput SARS-CoV-2 testing is vital to efforts to manage the current public health crisis. In the course of optimizing SARS-CoV-2 testing performed by the University of Washington Clinical Virology Lab (UW Virology Lab), we evaluated assays using seven different primer/probe sets and one assay kit. We found that the most sensitive assays were those that used the E-gene primer/probe set described by Corman et al. (Eurosurveillance 25 (3), 2020, https://doi.org/10.2807/1560-7917.ES.2020.25.3.2000045) and the N2 set developed by the CDC (Division of Viral Diseases, Centers for Disease Control and Prevention, 2020, https://www.cdc.gov/coronavirus/2019-ncov/downloads/rt-pcr-panel-primer-probes.pdf). All assays tested were found to be highly specific for SARS-CoV-2, with no cross-reactivity with other respiratory viruses observed in our analyses regardless of the primer/probe set or kit used. These results will provide valuable information to other clinical laboratories that are actively developing SARS-CoV-2 testing protocols at a time when increased testing capacity is urgently needed worldwide.
Article
An epidemic of respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began in China and has spread to other countries.¹ Real-time reverse transcriptase–polymerase chain reaction (rRT-PCR) of nasopharyngeal swabs typically has been used to confirm the clinical diagnosis.² However, whether the virus can be detected in specimens from other sites, and therefore potentially transmitted in other ways than by respiratory droplets, is unknown.
Article
We report diagnosis and management of the first laboratory-confirmed case of coronavirus disease 2019 (COVID-19) hospitalized in Toronto, Canada. No healthcare-associated transmission occurred. In the face of a potential pandemic of COVID-19, we suggest sustainable and scalable control measures developed based on lessons learned from severe acute respiratory syndrome.
Article
Background: A novel coronavirus of zoonotic origin (2019-nCoV) has recently been identified in patients with acute respiratory disease. This virus is genetically similar to SARS coronavirus and bat SARS-like coronaviruses. The outbreak was initially detected in Wuhan, a major city of China, but has subsequently been detected in other provinces of China. Travel-associated cases have also been reported in a few other countries. Outbreaks in health care workers indicate human-to-human transmission. Molecular tests for rapid detection of this virus are urgently needed for early identification of infected patients. Methods: We developed two 1-step quantitative real-time reverse-transcription PCR assays to detect two different regions (ORF1b and N) of the viral genome. The primer and probe sets were designed to react with this novel coronavirus and its closely related viruses, such as SARS coronavirus. These assays were evaluated using a panel of positive and negative controls. In addition, respiratory specimens from two 2019-nCoV-infected patients were tested. Results: Using RNA extracted from cells infected by SARS coronavirus as a positive control, these assays were shown to have a dynamic range of at least seven orders of magnitude (2 × 10⁻⁴ to 2000 TCID50/reaction). Using DNA plasmids as positive standards, the detection limits of these assays were found to be below 10 copies per reaction. All negative control samples were negative in the assays. Samples from two 2019-nCoV-infected patients were positive in the tests. Conclusions: The established assays can achieve a rapid detection of 2019-nCoV in human samples, thereby allowing early identification of patients.