Short Note
The Abnormal Nature of the Fecal Swab Sample used for NGS Analysis of RaTG13 Genome Sequence Imposes a Question on the Correctness of the RaTG13 Sequence
Monali C. Rahalkar1* and Rahul A. Bahulikar2
1C2, Bioenergy group, MACS Agharkar Research Institute, G.G. Agarkar Road, Pune 411004, Maharashtra, India
2BAIF Development Research Foundation, Central Research Station, Urulikanchan, Pune 412202
*Corresponding author: monalirahalkar@aripune.org
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 8 August 2020 doi:10.20944/preprints202008.0205.v1
© 2020 by the author(s). Distributed under a Creative Commons CC BY license.
Abstract:
RaTG13, a virus sequence derived from bat feces, is the closest known relative of SARS-CoV-2.
The Illumina-based NGS sequence of RaTG13 (MN996532.1) was deposited on 27 Jan 2020, and the
raw data a little later, on 13 Feb 2020 (https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]).
According to the NCBI site, the fecal swab sample shows an abnormally high proportion of reads
from eukaryotes, including not only bats but also other animals. Comparison with other bat fecal
swabs deposited by the same group on the same date also indicates that the fecal swab from which
the RaTG13 sequence was derived looks abnormal. The proportion of bacteria in this RNA-Seq
project was only 0.7%, in contrast to 70-90% abundance in other fecal swabs from bats. In addition,
the amplicon sequencing done on the same sample showed a large number of gaps and
inconsistencies. This also raises a question about the authenticity of the RaTG13 sequence.
Keywords: RaTG13; SARS-CoV-2; Illumina sequencing; amplicon sequencing; NGS; fecal swab
Covid-19 has been a devastating pandemic, affecting more than nineteen million people in
more than 200 countries and killing about three quarters of a million people so far. SARS-CoV-2,
the virus responsible for the disease, is most similar at the genomic level to RaTG13 (a
bat-derived virus). RaTG13 has been known as the sister virus of SARS-CoV-2, as it shows
96.2% overall genomic similarity to the SARS-CoV-2 genome (Zhou et al., 2020). RaTG13 has been
widely used in comparative experiments with SARS-CoV-2, including studies of the capacity of
its spike to bind human ACE-2, its infective capacity, etc. The RaTG13 genome is also used to
infer the common ancestor and to estimate how long ago RaTG13 and SARS-CoV-2 diverged.
RaTG13 is described as a virus (not a real virus, but available only as a sequence) from the RNA
of a bat fecal swab collected in July 2013 from the Tongguan mines in Yunnan. The old name of
the RaTG13 virus is CoV4991 (Ge et al., 2016). However, according to a recent news investigation
(The Sunday Times, 2020), the sample appears to be exhausted or otherwise not available to the
scientific community. One main condition for using RaTG13 in future experiments is that the
sequence of this virus should be accurate and based on good raw data.
RaTG13 does not appear to have been reported before SARS-CoV-2 was described, as its genome
sequence was not previously available on NCBI (Zhou et al., 2020). The Illumina-based NGS
sequence of RaTG13 (MN996532.1) was deposited on 27 Jan 2020, and the raw data a little
later, on 13 Feb 2020 (https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]).
The earlier name of RaTG13 is CoV/4991. A 370-base RdRp fragment of CoV/4991 (KP876546.1)
showed the highest similarity to the SARS-CoV-2 RdRp fragment, with only 3-5 bases different
(NCBI BLAST analysis). CoV/4991, or RaTG13, is also of great significance because it was
recovered from the same site where a COVID-19-like disease occurred (The Sunday Times, 2020;
Rahalkar and Bahulikar, 2020). CoV/4991 is also the first and only SARS-like CoV associated
with human pneumonia cases before SARS-CoV-2 (Rahalkar and Bahulikar, 2020).
Problems seen in the RAW DATA of RaTG13: Illumina sequence SRX7724752
Here are the basic discrepancies encountered in the analysis of the Illumina raw data
(https://www.ncbi.nlm.nih.gov/sra/SRX7724752[accn]):
1. The genome of RaTG13 is stated to be derived from a fecal or anal swab (MN996532.1). However,
in the description of the Illumina sequencing experiment, SRX7724752, the sample is described
as BAL fluid (bronchoalveolar lavage).
2. The total raw data is 3.3 Gb. The Krona analysis shows that ~30% of the reads are
unidentified (no matches) and only ~70% are identified. Of this 70%, the vast majority,
i.e. 68%, was contributed by eukaryotes (Fig. 1). This is highly unusual for a fecal swab;
other bat fecal or anal swabs do not show such a high proportion of eukaryotic RNA.
3. Within the 68% eukaryotic sequences, bat sequences account for about 36-40% (Fig. 1a), and
the remaining ~30% are contributed by squirrels, flying foxes, foxes, and other types of
animals (Fig. 1b). First of all, why would such a high proportion of eukaryotic sequences
appear in the RNA of a fecal swab? Where do these animal sequences come from when the sample
is supposed to be a Rhinolophus affinis swab? Also, even if Rhinolophus affinis sequences are
not present in the database, why are the reads similar to so many different bat sequences?
Some of these bats are found only in Mexico or the USA (Zhang, 2020).
4. The RNA-Seq data show an extremely low abundance of bacteria, only 0.65%. This is far lower
than in other bat fecal or anal swabs, which show a very high proportion of bacterial
sequences, ~76-90% (Fig. 2 and Fig. 3). SRA data of six other fecal swabs submitted by the same
group were used for comparison (data not shown). Bacteria are normally the most abundant
constituents of a fecal sample.
5. The coronavirus sequence (RaTG13) contributes only ~0.003% of the total sequence reads.
These raw reads were used to build an almost complete assembly, although the overall coverage
is very low, ~8X. Although overlaps are sparse in some regions, there are only 2-3 gaps. The
Wuhan Institute of Virology has recently described methods such as probe capture for obtaining
whole virus genomes from samples like bat feces (Li et al., 2019). In this case, without the
use of any such method, and starting from such an old fecal swab, or fecal swab RNA with almost
no bacteria in it, how were such good-quality viral reads recovered?
6. The assembly method and the actual assembly accession for RaTG13 are not described or
linked to MN996532, and no assembly method is specified for the raw data SRX7724752 or the
Illumina run. Therefore, no assembly data are available for the RaTG13 genome.
7. BLASTing the RaTG13 genome against the SRA data retrieves ~1700 reads, which cover only
252 kb of the total 3.3 Gb. The genome size of RaTG13 is known to be ~30 kb. This therefore
corresponds to ~8x coverage, which is quite low and insufficient to arrive at a definitive
assembly. How, then, has the sequence MN996532 been used so confidently by various researchers
without any doubt?
8. We also compared a fecal/anal swab from the same species, i.e. Rhinolophus affinis
(Fig. 2), and a fecal swab from another bat (Fig. 3). These two swabs clearly showed normal
findings, with 70-90% bacterial reads and very few reads associated with the host. These swabs
also do not show sequences from other animals.
9. Similar findings have been documented in a recent preprint by Zhang, D. (Zhang, 2020),
https://zenodo.org/record/3969272#.Xypwfn5S-Un.
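The ~8x figure in point 7 is simple arithmetic: total bases in the retrieved reads divided by the genome length. A minimal sketch in Python is given here, using only the approximate values quoted in the text (~1700 reads, 252 kb, ~30 kb genome); the function name mean_coverage is illustrative and is not part of any analysis pipeline used for this note.

```python
def mean_coverage(aligned_bases: int, genome_length: int) -> float:
    """Mean depth of coverage = total aligned bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return aligned_bases / genome_length


# Approximate values quoted in point 7 (rounded figures from the text).
retrieved_reads = 1700       # reads matching the RaTG13 genome
aligned_bases = 252_000      # 252 kb covered by those reads
genome_length = 30_000       # ~30 kb RaTG13 genome

depth = mean_coverage(aligned_bases, genome_length)
mean_read_length = aligned_bases / retrieved_reads

print(f"mean depth of coverage: ~{depth:.1f}x")
print(f"implied mean read length: ~{mean_read_length:.0f} bases")
```

With these rounded inputs the mean depth comes out at about 8.4x, in line with the ~8x stated above. A mean depth says nothing about how evenly the reads are spread, so individual regions can still be thinly covered or missing.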
Problems in the Amplicon sequencing data:
We found that some amplicon sequencing data for RaTG13 (SRX8357956) were submitted in
May 2020.
1. Zhou et al. (2020) give no indication that amplicon sequencing of RaTG13 was performed.
There are in total 33 spots with forward and reverse sequences.
2. The sequencing records show dates in 2017 and 2018. However, the submission was made in
2020. This sequencing has never been mentioned in any publication. Also, it does not cover the
entire genome, and major gaps are seen in various regions.
3. There are two contrasting sequences for a single region (spots 23 and 24): one spot shows
94-96% similarity to MN996532.1, whereas another spot covering the same region shows 99%
similarity to the described RaTG13 consensus MN996532.1.
4. In general, the amplicons show 97-99% similarity to MN996532.1. However, they do not cover
the entire genome, and major gaps are seen in various regions.
5. Also, the RdRp sequence derived from the amplicon sequencing is incomplete and does not
match the RdRp of 4991 (KP876546.1). Around 170 of the 370 bases are missing, and it shows
2 base mismatches.
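Similarity figures such as those in points 3-5 depend on two separate quantities: how much of the reference is covered, and how many of the covered positions match. The sketch below illustrates this in Python for a pair of already aligned sequences. It is a hypothetical helper, not the BLAST procedure used for this note, and the sequences are synthetic placeholders constructed only to reproduce the numbers in point 5 (370-base reference, 170 bases missing, 2 mismatches, assuming the mismatches lie in the covered part).

```python
from typing import Tuple


def identity_and_coverage(reference: str, query: str, gap: str = "-") -> Tuple[float, float]:
    """Return (percent identity over covered positions, percent of reference covered).

    Both inputs must be aligned strings of equal length; `gap` marks
    positions that are missing from either sequence.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for r in reference if r != gap)
    covered = [(r, q) for r, q in zip(reference, query) if r != gap and q != gap]
    if reference_length == 0 or not covered:
        return 0.0, 0.0
    matches = sum(1 for r, q in covered if r.upper() == q.upper())
    return 100 * matches / len(covered), 100 * len(covered) / reference_length


# Synthetic 370-base reference (NOT the real RdRp fragment).
reference = ("ACGT" * 93)[:370]

# Query: first 200 bases present with 2 substitutions, last 170 bases missing.
covered_part = list(reference[:200])
covered_part[10] = "T"   # reference has G at this position
covered_part[50] = "T"   # reference has G at this position
query = "".join(covered_part) + "-" * 170

identity, coverage = identity_and_coverage(reference, query)
print(f"identity over covered bases: {identity:.1f}%")
print(f"reference covered: {coverage:.1f}%")
```

Under these assumptions the fragment is 99% identical over the 200 covered bases but covers only about 54% of the 370-base reference, which is why identity alone does not show that the two sequences agree.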
Conclusions:
a. Our main objection is that the fecal swab from which the RaTG13 sequence
is derived does not look like a normal fecal sample, for the reasons
listed above.
b. The RaTG13 sequence has been used extensively for genomic comparisons,
as it is believed to be the closest relative of SARS-CoV-2.
c. However, the nature of the fecal swab appears very suspicious, with ~70%
eukaryotic sequences, some of them from sources that should not have been
detected in bat feces (Mexican bats, squirrels, flying foxes, red foxes,
etc.).
d. Most importantly, there is a negligible abundance of bacteria.
Bacteria constitute a major part of any feces, irrespective of whether it
comes from an animal, a bird or any other eukaryote.
e. The reads from which the viral sequence of RaTG13 was derived
appear not to be affected. An almost complete assembly is assumed to
have been built from this raw data (Illumina reads). How did such good data
come from an otherwise abnormal-looking, old and degraded fecal swab
sample preserved for 7-8 years?
f. The amplicon data are incomplete, were submitted much later, and are
not described anywhere.
g. The question is: why do these anomalies exist? And given these anomalies,
should the scientific community really rely on the RaTG13 genome sequence
MN996532.1? Should these data be used for further important experiments?
Fig. 2. RNA-Seq of Rhinolophus affinis: Fecal swab Taxonomy Analysis
https://www.ncbi.nlm.nih.gov/sra/SRX7724693[accn]
Fig. 2a. RNA-Seq of Rhinolophus affinis: Anal swab (SRR11085736)
Fig. 4
References:
The Sunday Times 2020. https://www.thetimes.co.uk/article/seven-year-covid-trail-revealed-l5vxt7jqp.
Ge, X. Y., Wang, N., Zhang, W., Hu, B., Li, B., Zhang, Y. Z., Zhou, J. H., Luo, C. M., Yang, X. L., Wu, L. J., Wang, B., Zhang, Y., Li, Z. X. & Shi, Z. L. 2016. Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft. Virol. Sin., 31, 31-40.
Rahalkar, M. C. & Bahulikar, R. A. 2020. Understanding the origin of 'BatCoVRaTG13', a virus closest to SARS-CoV-2.
Zhang, D. 2020. Anomalies in BatCoV/RaTG13 sequencing and provenance. https://zenodo.org/record/3969272#.Xy0m5jVS_IX.
Zhou, P., Yang, X. L., Wang, X. G., Hu, B., Zhang, L., Zhang, W., Si, H. R., Zhu, Y., Li, B., Huang, C. L., Chen, H. D., Chen, J., Luo, Y., Guo, H., Jiang, R. D., Liu, M. Q., Chen, Y., Shen, X. R., Wang, X., Zheng, X. S., Zhao, K., Chen, Q. J., Deng, F., Liu, L. L., Yan, B., Zhan, F. X., Wang, Y. Y., Xiao, G. F. & Shi, Z. L. 2020. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 579, 270-273.