The Abnormal Nature of the Fecal Swab Sample used for NGS Analysis of RaTG13 Genome Sequence Imposes a Question on the Correctness of the RaTG13 Sequence

Short Note
Monali C. Rahalkar1* and Rahul A. Bahulikar2
1C2, Bioenergy group, MACS Agharkar Research Institute, G.G. Agarkar Road, Pune 411004, Maharashtra, India
2BAIF Development Research Foundation, Central Research Station, Urulikanchan, Pune 412202
*Corresponding author: monalirahalkar@aripune.org
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 8 August 2020 doi:10.20944/preprints202008.0205.v1
© 2020 by the author(s). Distributed under a Creative Commons CC BY license.
Abstract:
RaTG13, derived from bat feces, is the closest known relative of SARS-CoV-2. The Illumina-based NGS sequence of RaTG13 (MN996532.1) was deposited on 27 January 2020 and the raw data a little later, on 13 February 2020 (https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]). According to the NCBI site, the fecal swab sample shows an abnormally high proportion of reads from eukaryotes, including not only bats but also other animals. Comparison with other bat fecal swabs deposited by the same group on the same date also indicates that the fecal swab from which the RaTG13 sequence was derived looks abnormal. The proportion of bacteria in this RNA-Seq project was only 0.7%, in contrast to 70-90% abundance in other fecal swabs from bats. In addition, the amplicon sequencing done on the same sample showed a large number of gaps and inconsistencies. This also raises a question about the authenticity of the RaTG13 sequence.
Keywords: RaTG13; SARS-CoV-2; Illumina sequencing; amplicon sequencing; NGS; fecal swab
Covid-19 has been a devastating pandemic, affecting more than nineteen million people in more than 200 countries and killing about three quarters of a million people so far. SARS-CoV-2, the virus responsible for the disease, is most similar to RaTG13 (a bat-derived virus) at the genomic level. RaTG13 is known as the sister virus of SARS-CoV-2, as it shows 96.2% overall genomic similarity to the SARS-CoV-2 genome (Zhou et al., 2020). RaTG13 has been widely used in comparative experiments with SARS-CoV-2, including studies of the capacity of its spike to bind human ACE-2 and of its infective capacity. The RaTG13 genome is also used in calculations of the common ancestor and of how long ago RaTG13 and SARS-CoV-2 diverged.
RaTG13 is described as a virus (available only as a sequence, not as an actual virus) from the RNA of a bat fecal swab collected in July 2013 from the Tongguan mines in Yunnan. The old name of the RaTG13 virus is CoV4991 (Ge et al., 2016). However, according to a recent news investigation (The Sunday Times, 2020), the sample appears to be used up or otherwise not available to the scientific community. One main condition for using RaTG13 in future experiments is that the sequence of this virus should be accurate and based on good raw data.
RaTG13 does not seem to have existed in the record before SARS-CoV-2 was described, as its genome sequence was not available on NCBI before then (Zhou et al., 2020). The Illumina-based NGS sequence of RaTG13 (MN996532.1) was deposited on 27 January 2020 and the raw data a little later, on 13 February 2020 (https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]).
The earlier name of RaTG13 is CoV/4991. A 370-base RdRp fragment (KP876546.1) of CoV/4991 showed the highest similarity to the SARS-CoV-2 RdRp fragment, with only 3-5 bases different (NCBI BLAST analysis). 4991, or RaTG13, is also of great significance because it was recovered from the same site where a COVID-19-like disease occurred (The Sunday Times, 2020; Rahalkar and Bahulikar, 2020). CoV 4991 is also the first and only SARS-like CoV associated with human pneumonia cases before SARS-CoV-2 (Rahalkar and Bahulikar, 2020).
Problems seen in the raw data of RaTG13: Illumina sequence SRX7724752
The basic discrepancies encountered in the analysis of the Illumina raw data (https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]) are as follows:
1. The genome of RaTG13 is derived from a fecal or anal swab (MN996532.1). However, in the description of the Illumina sequencing, SRX7724752, the sample is described as BAL fluid (bronchoalveolar lavage).
2. The total raw data is 3.3 Gb. Krona analysis shows that ~30% of the reads are unidentified (no matches) and only ~70% are identified. Out of this 70%, the vast majority, i.e. 68%, was contributed by eukaryotes (Fig. 1). This is highly unusual for a fecal swab: the other bat fecal or anal swabs analysed do not show such a high proportion of eukaryotic RNA.
3. Within the 68% eukaryotic sequences, bat sequences account for about 36-40% (Fig. 1a), and the remaining ~30% are contributed by squirrels, flying foxes, foxes, and other types of animals (Fig. 1b). First of all, why would such a high proportion of eukaryotic sequences appear in the RNA when it is a fecal swab? Where do these animal sequences come from when it is supposed to be a Rhinolophus affinis swab? Also, even if the Rhinolophus affinis sequence is not present in the database, why are the reads similar to so many different bat sequences? Some of these bats are found only in Mexico or the USA (Zhang, 2020).
4. The RNA-Seq data show an extremely low abundance of bacteria, only 0.65%. This is far too low in comparison with other fecal or anal swabs of bats, which show a very high proportion of bacterial sequences, ~76-90% (Figs. 2 and 3). SRA data of six other fecal swabs submitted by the same group were used for comparison (data not shown). Bacteria are the largest constituent of a fecal sample.
5. The coronavirus sequence (RaTG13) contributes only ~0.003% of the total sequence reads. These raw reads were used to build an almost complete assembly, though the overall coverage is very low, ~8X. Although there were few overlaps in some regions, there are only 2-3 gaps. The Wuhan Institute of Virology has recently described methods such as probe capture for obtaining the whole genome of viruses from samples like bat feces (Li et al., 2019). In this case, without the use of any such method, and starting from such an old fecal swab, or fecal swab RNA with almost no bacteria in it, how did they recover such good-quality viral reads?
6. The assembly method and the actual assembly accession for RaTG13 are not described or linked to MN996532, and no assembly method is specified in the raw data SRX7724752 or the Illumina run. Therefore, no assembly data are available for the RaTG13 genome.
7. Blasting the RaTG13 genome against the SRA retrieves ~1700 reads, which cover only 252 kb of the total 3.3 Gb. The genome size of RaTG13 is known to be ~30 kb. This therefore corresponds to ~8x coverage, which is quite low and insufficient to arrive at a definitive assembly. How, then, has the sequence MN996532 been used so confidently by various researchers without any doubt?
8. We also compared a fecal/anal swab from the same species, i.e. Rhinolophus affinis (Fig. 2), and a fecal swab from another bat (Fig. 3). Both of these swabs showed normal findings, with 70-90% bacterial reads and very few reads associated with the host. These swabs also do not show sequences coming from other animals.
9. Similar findings have been documented in a recent preprint by Zhang, D. (Zhang, 2020) https://zenodo.org/record/3969272#.Xypwfn5S-Un.
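The coverage figure quoted in item 7 can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the rounded values given in the text (~252 kb of matching sequence, ~30 kb genome, ~1700 reads); the function and variable names are illustrative and are not taken from any tool used in the analysis.

```python
def mean_coverage(aligned_bases: float, genome_length: float) -> float:
    """Average depth = total aligned bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return aligned_bases / genome_length


# Rounded figures quoted in item 7 (assumed here to be exact for illustration).
ALIGNED_BASES = 252_000   # ~252 kb retrieved by BLAST against the SRA run
GENOME_LENGTH = 30_000    # ~30 kb RaTG13 genome
N_READS = 1_700           # ~1700 matching reads

coverage = mean_coverage(ALIGNED_BASES, GENOME_LENGTH)
mean_read_length = ALIGNED_BASES / N_READS  # derived value, not stated in the text

print(f"mean coverage: {coverage:.1f}x")
print(f"mean matched length per read: {mean_read_length:.0f} bases")
```

With these inputs the mean depth comes out at about 8x, in line with the value stated above. A mean depth says nothing about how evenly the reads are spread, so regions with few overlapping reads can still exist at this average.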
Problems in the amplicon sequencing data:
We found that some amplicon sequencing data for RaTG13 (SRX8357956) was submitted in May 2020.
1. Zhou et al. (2020) give no indication that amplicon sequencing of RaTG13 was performed. There are in total 33 spots with forward and reverse sequences.
2. The sequencing data carry dates of 2017 and 2018, yet the submission was made in 2020. This sequencing has never been mentioned in any publication. Also, it does not cover the entire genome, and major gaps are seen in various regions.
3. There are two contrasting sequences for a single region (spots 23 and 24): one spot shows 94-96% similarity to MN996532.1, whereas another spot with the same sequence region shows 99% similarity to the described RaTG13 consensus MN996532.1.
4. In general, the amplicons show 97-99% similarity to MN996532.1. However, they do not cover the entire genome, and major gaps are seen in various regions.
5. Also, the RdRp derived from the amplicon sequencing is incomplete and does not match the RdRp of 4991 (KP876546.1). Around 170 bases of the 370-base sequence are missing, and it shows 2 base mismatches.
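The figures in item 5 can be turned into a coverage fraction and an identity value for the RdRp fragment. This is a rough sketch only: it assumes that the 2 mismatches fall within the ~200 bases that are present, and the helper name `fragment_stats` is hypothetical.

```python
def fragment_stats(fragment_length: int, missing: int, mismatches: int):
    """Return (fraction of the fragment covered, identity over the covered part).

    Assumption: mismatches are counted only within the covered bases.
    """
    covered = fragment_length - missing
    if covered <= 0:
        raise ValueError("no covered bases")
    coverage_fraction = covered / fragment_length
    identity = (covered - mismatches) / covered
    return coverage_fraction, identity


# Approximate values from item 5: 370-base fragment, ~170 bases missing, 2 mismatches.
cov_fraction, identity = fragment_stats(370, 170, 2)

print(f"fragment covered: {cov_fraction:.1%}")
print(f"identity over covered bases: {identity:.1%}")
```

Under these assumptions roughly half of the 370-base fragment is represented in the amplicon data, and the represented part differs from the 4991 RdRp at about 1% of positions.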
Conclusions:
a. Our main objection is that the fecal swab from which the RaTG13 sequence is derived does not look like a normal fecal sample, for the reasons listed above.
b. The RaTG13 sequence has been used extensively for genomic comparisons, as it is believed to be the closest relative of SARS-CoV-2.
c. However, the nature of the fecal swab appears very suspicious, with ~70% eukaryotic sequences, including some from sources that should not have been detected in bat feces (Mexican bats, squirrels, flying foxes, red foxes, etc.).
d. Most importantly, the abundance of bacteria is negligible. Bacteria constitute a major part of any feces, irrespective of whether it comes from an animal, a bird or any other eukaryote.
e. The reads from which the viral sequence of RaTG13 was derived appear not to be affected. An almost complete assembly is assumed to have been built from this raw data (Illumina reads). How did such good data come from an otherwise abnormal-looking, old and degraded fecal swab sample preserved for 7-8 years?
f. The amplicon data are incomplete, were submitted much later and are not described anywhere.
g. The question is: why are there these anomalies? And given that they exist, should the scientific community really rely on the RaTG13 genome sequence MN996532.1? Should these data be used for further important experiments?
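The contrast drawn in conclusions c and d can be made concrete with the percentages listed in the caption of Fig. 1c and the 70-90% bacterial range reported for the other swabs. The sketch below is illustrative only: the dictionary simply restates those rounded caption values, and the variable names are assumptions rather than output of any analysis tool.

```python
# Rounded percentages as listed in the Fig. 1c caption (Krona chart, RaTG13 run).
fig1c_pct = {
    "unidentified": 29.0,
    "Chiroptera": 43.0,
    "Glires": 13.0,
    "Primates": 3.0,
    "Bacteria": 0.7,
}

# Bacterial share reported in the text for the other bat fecal/anal swabs.
REFERENCE_BACTERIA_RANGE = (70.0, 90.0)

# Sum of the eukaryotic groups named in the caption (not the full eukaryote total).
listed_eukaryotes = sum(fig1c_pct[k] for k in ("Chiroptera", "Glires", "Primates"))

# How many times lower the bacterial share is than the bottom of the reference range.
fold_below = REFERENCE_BACTERIA_RANGE[0] / fig1c_pct["Bacteria"]
is_below_range = fig1c_pct["Bacteria"] < REFERENCE_BACTERIA_RANGE[0]

print(f"eukaryotic groups listed in caption: {listed_eukaryotes:.0f}%")
print(f"bacterial share below reference range: {is_below_range}")
print(f"fold difference versus lower bound: {fold_below:.0f}x")
```

On these rounded values the bacterial share is about two orders of magnitude below the lower end of the range seen in the comparison swabs.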
Figures:
Fig. 1. RNA-Seq of Rhinolophus affinis: Fecal swab Taxonomy Analysis (RaTG13)
Fig. 1a. RNA-Seq of Rhinolophus affinis: Fecal swab (RaTG13)
Fig. 1b. Distribution of the reads in the raw data. The individual distribution is given and, in the second part, the reads which contribute to a higher extent are given.
Fig. 1c. Krona chart of RaTG13 raw data: 29% unidentified reads, 43% Chiroptera, 13% Glires, 3% Primates, 0.7% bacteria and 0.024% RaTG13 reads
Fig. 2. RNA-Seq of Rhinolophus affinis: Fecal swab Taxonomy Analysis https://www.ncbi.nlm.nih.gov/sra/SRX7724693[accn]
Fig. 2a. RNA-Seq of Rhinolophus affinis: Anal swab (SRR11085736)
Fig. 2b. Distribution of the reads in the raw data. The individual distribution is given and, in the second part, the reads which contribute to a higher extent are given.
Fig. 2c. Krona chart of the anal swab of Rhinolophus affinis: Fecal swab Taxonomy
Fig. 3. RNA-Seq of Miniopterus schreibersii: Fecal swab Taxonomy Analysis
Fig. 3a. RNA-Seq of fecal swab Miniopterus schreibersii
Fig. 3c. Krona chart of Miniopterus schreibersii: Fecal swab Taxonomy
Fig. 4
References:
The Sunday Times 2020. https://www.thetimes.co.uk/article/seven-year-covid-trail-revealed-l5vxt7jqp.
Ge, X. Y., Wang, N., Zhang, W., Hu, B., Li, B., Zhang, Y. Z., Zhou, J. H., Luo, C. M., Yang, X. L., Wu, L. J., Wang, B., Zhang, Y., Li, Z. X. & Shi, Z. L. 2016. Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft. Virol. Sin., 31, 31-40.
Rahalkar, Monali C. & Bahulikar, Rahul A. 2020. Understanding the origin of 'BatCoVRaTG13', a virus closest to SARS-CoV-2.
Zhang, Daoyu 2020. Anomalies in BatCoV/RaTG13 sequencing and provenance. https://zenodo.org/record/3969272#.Xy0m5jVS_IX.
Zhou, P., Yang, X. L., Wang, X. G., Hu, B., Zhang, L., Zhang, W., Si, H. R., Zhu, Y., Li, B., Huang, C. L., Chen, H. D., Chen, J., Luo, Y., Guo, H., Jiang, R. D., Liu, M. Q., Chen, Y., Shen, X. R., Wang, X., Zheng, X. S., Zhao, K., Chen, Q. J., Deng, F., Liu, L. L., Yan, B., Zhan, F. X., Wang, Y. Y., Xiao, G. F. & Shi, Z. L. 2020. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 579, 270-273.