Verifying Covid-19 evolved through natural selection from SARS by searching for structural modifications

Robert ten Hoora, April 2020
artenhoor@emolta.nl
Abstract: When the new coronavirus outbreak started in Wuhan, China, the genome of the virus
was quickly isolated and the sequence publicly released. It was immediately clear that the new virus
was a strain of coronavirus and had probably evolved from SARS. There have been understandable
concerns that the virus was created by and escaped from the Wuhan Institute of Virology (WIV). To
test for manual intervention in the Covid-19 genome, we compare the older SARS sequence with
the newer Covid-19 sequence and reach the conclusion that there was no manual intervention and
that the Covid-19 sequence evolved through natural selection from SARS. The question of whether this
evolution occurred naturally or under controlled circumstances in a lab remains difficult to prove and
is left open.
Keywords: SARS, COVID-19
Conflicts of Interest: The authors declare no conflict of interest.
1. Introduction
At the beginning of 2020 a new human virus caused a global pandemic. The virus was isolated from
patients and called Covid-19. When the sequence of the RNA virus became publicly available (1) it
was immediately clear that the new genome was genetically similar to that of another virus known as SARS
(3), a coronavirus.
A number of studies have subsequently been conducted comparing the two genomes (5, 10). The
genomes are so similar that it is reasonable to initially assume that the Covid-19 genome evolved
from the SARS genome.
Some studies have concluded that the new genome was modified in a bio-lab. Others have reported
evidence of parts of foreign viruses in the new virus genome, for example parts originating from
HIV (6). They assert that the ease of cell entry of the virus (and hence its infectiousness)
was artificially improved.
If that is the case, these additions did not evolve through natural selection but were
inserted manually (9). Even with the latest technology, it is very difficult, if not impossible, to
engineer an effective RNA sequence from nothing. Most likely, therefore, these effective insertions
would have to be taken from existing, natural viruses.
In that case, pieces of other viruses should be present in the Covid-19 sequence that are not present
in the older SARS sequence. A number of studies claim to have found such inserts (6), and the fact
that controversy remains shows how difficult it is to align the two sequences.
In this article we present an alignment approach that closely matches the two sequences.
We experiment with an algorithm to align the sequences as well as possible and are able
to show an almost complete match between the two sequences.
1.1. RNA editing with endogenous ADARs and other methods of modification
Using enzymes, parts of foreign RNA can be introduced into a natural virus (9). Equally, parts of the
natural RNA can be removed. If this has actually happened in the case of Covid-19, these changes
would show up as larger unmatched areas existing in the sequence of Covid-19. As these changes
would not have evolved from SARS, they would not show up in the SARS sequence.
1.2. Evolutionary modifications
Another way of changing the RNA of a virus is by growing several generations of populations of the
virus in a controlled environment. From the resulting populations, the virus specimens leaning most
towards the desired qualities would be selected manually. This process can be repeated until the
desired qualities appear in the population.
Changes thus effected are in essence naturally evolved. It will be near impossible to prove human
intervention in their appearance in the new genome.
1.3. Alignment is difficult for these reasons
The genome sequences of Covid-19 and SARS differ in many places. There are many modifications
in the evolved Covid-19 sequence, and many of these modifications appear close to each other. As a
result, large pieces of the sequences align easily, but other regions do not easily match up.
In parts there are so many modified regions that the alignment becomes very fuzzy.
However, just outside of the modified regions, that is, before and after them, the
sequences usually align closely again. Using these alignments as anchor points, the modified region
between them can usually be aligned as well. This approach appears to work in most cases for the
SARS/Covid-19 alignment. Without care, it is easy to lose alignment in a larger modified region,
causing a shift in the alignment between the two sequences which is then hard to detect. The
undetected misalignment and the resulting large shift will appear to be a large structural insertion or
deletion (“indel”). This may lead some researchers to suspect man-made inserts.
2. Methods
2.1. Acquisition of Sequence Data
The SARS genome was downloaded from (4) on 2/2/2020 and the Covid-19 genome was downloaded
from (2) on 28/1/2020.
2.2. Data and Code Availability
All sequence and immunological data is available online. The script used to align the sequences
(written in C#) is available in appendix 1.
2.3. Working of the algorithm used for alignment
The algorithm assumes that one of the two sequences has evolved from the other and that they can
be aligned.
First, the algorithm searches for longer fragments that are an exact match between the two
sequences (the anchor points). These exact matches need not be at the same position on both
sequences, because insertions or deletions of base pairs may have shifted the alignment.
These identical fragments are then sorted by length, on the assumption that the longer the
fragment, the higher the chance that the Covid-19 genome inherited it from SARS.
For the purpose of this study, all fragments longer than 100 base pairs (“Bp.”) were used as
anchors with the highest likelihood of matching between the two sequences.
Next, the areas between the anchors are matched. These do not match perfectly, so some allowance
was made in the matching criteria, as described in section 2.3.2.
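The anchor search described above can be sketched as follows. This is a minimal illustration in Python and not the C# program of appendix 1: it uses strict character equality, and the function name find_exact_anchors, its parameters and the toy sequences are assumptions made for the example.

```python
def find_exact_anchors(seq_a, seq_b, min_len):
    """Return (pos_a, pos_b, length) for maximal exact matches of at least
    min_len characters, sorted with the longest match first."""
    # Index every substring of length min_len in seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - min_len + 1):
        index.setdefault(seq_b[j:j + min_len], []).append(j)

    anchors = []
    covered_until = {}  # diagonal (i - j) -> end in seq_a of the last reported match
    for i in range(len(seq_a) - min_len + 1):
        for j in index.get(seq_a[i:i + min_len], []):
            diagonal = i - j
            # Skip seeds that lie inside a match already reported on this diagonal.
            if covered_until.get(diagonal, -1) >= i + min_len:
                continue
            length = min_len
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            anchors.append((i, j, length))
            covered_until[diagonal] = i + length
    return sorted(anchors, key=lambda a: a[2], reverse=True)


# Toy sequences (hypothetical): a shared 10-character core at different positions.
toy_a = "AAAA" + "CGTACGGTCA" + "TTTT"
toy_b = "GG" + "CGTACGGTCA" + "CCCC"
print(find_exact_anchors(toy_a, toy_b, 5))  # [(4, 2, 10)]
```

The difference between the two reported positions (here 4 and 2) is what shifts when base pairs have been inserted or deleted before the anchor.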
2.3.1. Identifying largest equal segments
In a sequence as long as the SARS/Covid-19 sequences, there will be shorter sub-sequences that
match by chance. For example, a 2 Bp. sequence will have many matches on either virus sequence. We
experimented with various minimum lengths, sorting the exactly matching areas by length. We decided
that a minimum length of 100 Bp. would give enough anchors (it gave 51) to further align the entire
genomes while preventing coincidental, false matches.
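A rough calculation illustrates why short fragments match by chance while long ones do not. The sketch below assumes that the four bases occur independently and with equal frequency, which is a simplification introduced here and not a claim of this study; the function name expected_chance_matches is likewise hypothetical.

```python
def expected_chance_matches(len_a, len_b, k, alphabet_size=4):
    """Expected number of position pairs at which two random sequences
    share an identical fragment of k characters (uniform, independent bases)."""
    pairs = (len_a - k + 1) * (len_b - k + 1)
    return pairs / alphabet_size ** k


# Genome lengths as reported in appendix 2 (Covid-19: 29903 Bp, SARS: 29751 Bp).
for k in (2, 12, 100):
    print(k, expected_chance_matches(29903, 29751, k))
```

Under this assumption a 2 Bp. fragment is expected to match at tens of millions of position pairs, whereas a coincidental 100 Bp. match is practically impossible.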
Table 1. Longer base pair regions (>100 Bp.) that exactly match between the two complete genomes.
Sorted by length of the exact match in Bp. The table shows the position of the matching region on
the respective genomes.
Match  Position  Position  Length of
       Covid-19  SARS      matching area
1      14116     14046     4449
2      23630     23505     2121
3      10991     10921     1794
4      7103      7033      1383
5      28292     28141     1224
6      18817     18747     1143
7      20764     20694     798
8      773       772       702
9      1631      1630      627
10     9107      9037      582
11     26533     26405     552
12     20241     20171     519
13     5879      5809      471
14     8672      8602      429
15     22484     22374     408
16     142       141       372
17     23012     22899     366
18     5099      5029      363
19     29530     29387     363
20     26118     25990     351
21     9359      9289      330
22     4553      4480      327
23     6353      6283      327
24     9362      9292      327
25     9365      9295      324
26     9368      9298      321
27     9371      9301      318
28     9374      9304      315
29     3542      3475      285
30     5639      5569      237
31     27373     27252     231
32     21779     21699     222
33     2720      2722      219
34     4880      4810      216
35     27670     27552     213
36     23381     23268     210
37     3335      3268      204
38     27087     26959     204
39     25933     25808     180
40     6733      6663      171
41     25753     25628     162
42     3023      3028      141
43     3099      3104      129
44     3954      3881      129
45     8488      8418      129
46     22344     22234     129
47     2942      2944      126
48     4143      4070      123
49     2587      2586      117
50     1480      1479      111
51     22185     22093     105
The difference in position (Position Covid-19 minus Position SARS) is called the offset. Sorting table 1
by position on the Covid-19 sequence shows an offset that broadly grows as the position increases.
This indicates that base pairs were sometimes added during the evolutionary process. The offset
is slightly negative only around Covid-19 positions 2720 to 3099 and is positive from position 3335
onwards, reaching 143 at the last anchor, as the Covid-19 sequence is longer than the SARS
sequence.
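The offset calculation can be illustrated with a few rows from table 1. This is a small sketch; the function name offsets_by_position and the selection of rows are choices made for the example.

```python
# (match number, position Covid-19, position SARS), taken from table 1
anchors = [
    (19, 29530, 29387),
    (37, 3335, 3268),
    (16, 142, 141),
    (43, 3099, 3104),
]


def offsets_by_position(rows):
    """Sort anchors by their Covid-19 position and append the offset
    (position Covid-19 minus position SARS) to each row."""
    ordered = sorted(rows, key=lambda row: row[1])
    return [(match, covid, sars, covid - sars) for match, covid, sars in ordered]


for row in offsets_by_position(anchors):
    print(row)
# (16, 142, 141, 1)
# (43, 3099, 3104, -5)
# (37, 3335, 3268, 67)
# (19, 29530, 29387, 143)
```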
Table 2. Longer base pair areas (>100 Bp.) that exactly match between the two complete genomes.
The table shows the position of the matching region on the respective genomes and the offset,
sorted by position on the Covid-19 sequence.
Match  Position  Position  Offset
       Covid-19  SARS
16     142       141       1
8      773       772       1
50     1480      1479      1
9      1631      1630      1
49     2587      2586      1
33     2720      2722      -2
47     2942      2944      -2
42     3023      3028      -5
43     3099      3104      -5
37     3335      3268      67
29     3542      3475      67
44     3954      3881      73
48     4143      4070      73
22     4553      4480      73
34     4880      4810      70
18     5099      5029      70
30     5639      5569      70
13     5879      5809      70
23     6353      6283      70
40     6733      6663      70
4      7103      7033      70
45     8488      8418      70
14     8672      8602      70
10     9107      9037      70
21     9359      9289      70
24     9362      9292      70
25     9365      9295      70
26     9368      9298      70
27     9371      9301      70
28     9374      9304      70
3      10991     10921     70
1      14116     14046     70
6      18817     18747     70
12     20241     20171     70
7      20764     20694     70
32     21779     21699     80
51     22185     22093     92
46     22344     22234     110
15     22484     22374     110
17     23012     22899     113
36     23381     23268     113
2      23630     23505     125
41     25753     25628     125
39     25933     25808     125
20     26118     25990     128
11     26533     26405     128
38     27087     26959     128
31     27373     27252     121
35     27670     27552     118
5      28292     28141     151
19     29530     29387     143
2.3.2. Matching similar segments
In between the largest identical fragments are large regions where the sequences do not exactly
match. By taking blocks of 18 Bp. on each sequence and considering them a match if at least half the
Bp. within the block are identical, most of the remaining sequence could indeed be matched. On
occasion one of the two sequences had to be offset against the other. This is indicative of a small
insertion or deletion at that position.
A manual inspection confirmed that the matches seemed reasonable and that both sequences aligned
well.
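The block criterion can be sketched as follows. This is a simplified Python illustration of the idea (18 Bp. blocks, at least 9 identical positions, one sequence shifted against the other when needed), not a translation of the C# routine in appendix 1; the function names and the toy sequences are assumptions.

```python
def count_same(seq_a, pos_a, seq_b, pos_b, block):
    """Number of identical positions between two blocks; 0 if a block is incomplete."""
    a = seq_a[pos_a:pos_a + block]
    b = seq_b[pos_b:pos_b + block]
    if len(a) < block or len(b) < block:
        return 0
    return sum(1 for x, y in zip(a, b) if x == y)


def smallest_shift(seq_a, pos_a, seq_b, pos_b, max_shift, block=18, min_same=9):
    """Return (skip_in_a, skip_in_b) for the smallest shift at which the blocks
    match on at least min_same positions, or None if no shift qualifies."""
    for shift in range(max_shift + 1):
        same_if_b_skips = count_same(seq_a, pos_a, seq_b, pos_b + shift, block)
        same_if_a_skips = count_same(seq_a, pos_a + shift, seq_b, pos_b, block)
        if max(same_if_a_skips, same_if_b_skips) >= min_same:
            # If both directions qualify, prefer the one with more identical positions.
            if same_if_a_skips >= same_if_b_skips:
                return (shift, 0)
            return (0, shift)
    return None


# Toy example (hypothetical): seq_a carries two extra characters in front of the core.
core = "ACGTTGCAAGCTTAGGCT"
print(smallest_shift("GG" + core + "AA", 0, core + "AAAA", 0, max_shift=5))  # (2, 0)
```

A result of (2, 0) means that two characters of the first sequence have no counterpart in the second, which would be written as a gap of two positions in the aligned output.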
2.3.3. Manual checks and improvements
Some parts were still not matched automatically and were therefore inspected manually. This is
indicated by the spaces and x-es in the matched file (8). The manual corrections indeed improved the
alignment between the sequences further.
2.4. Other manual checks
2.4.1. Manually checked inserted sequences
A jump of 72 Bp. in the offset occurs between Covid-19 position 3099 and 3335 (table 2). This is
indicative of a large insert in the Covid-19 sequence.
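The jump can be located by comparing the offsets of consecutive anchors. The sketch below uses five neighbouring rows from table 2; the function name largest_offset_jump is hypothetical.

```python
def largest_offset_jump(anchors):
    """anchors: (position Covid-19, position SARS) pairs.
    Returns (previous Covid-19 position, next Covid-19 position, jump) for the
    largest change in offset between consecutive anchors."""
    ordered = sorted(anchors)
    best = None
    for (c0, s0), (c1, s1) in zip(ordered, ordered[1:]):
        jump = (c1 - s1) - (c0 - s0)
        if best is None or abs(jump) > abs(best[2]):
            best = (c0, c1, jump)
    return best


rows = [(2942, 2944), (3023, 3028), (3099, 3104), (3335, 3268), (3542, 3475)]
print(largest_offset_jump(rows))  # (3099, 3335, 72)
```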
The inserted sequence appears to be:
TTGGTCAACAAGACGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGGTTCAACCTCAAT
A search for matching sequences to the one above was done in a database with known virus
sequences (7).
It is not completely clear where the boundaries of the inserted sequence are, as there appears to be
a cluster of modifications near the edges of the inserted sequence. For this reason, 10 Bp. on the left
and right side of the inserted sequence were omitted in the BLAST search. This should, if anything, have
given more potential matches.
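As a small illustration of the trimming step, the query can be derived from the inserted sequence like this. The variable names are hypothetical, and the exact query string submitted to the database is not given in this document.

```python
inserted = ("TTGGTCAACAAGACGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGG"
            "TTCAACCTCAAT")
trim = 10  # Bp. omitted on each side because the boundaries are uncertain

query = inserted[trim:-trim]
print(len(inserted), len(query))  # 72 52
```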
The search in the virus database (7) on the inserted sequence from Covid-19 yielded 500 matches. All
of these matches were in SARS-related coronavirus sequences (table 3). This is to be expected, as the
query sequence itself came from these viruses.
Table 3. First 3 matches of BLAST search. Not shown: 497 similar matches.
Accession  Release Date  Species                                                Length  Geo
MT198652   17/03/2020    Severe acute respiratory syndrome-related coronavirus  29782   Spain: Valencia
MT198653   17/03/2020    Severe acute respiratory syndrome-related coronavirus  29611   Spain: Valencia
MT192759   16/03/2020    Severe acute respiratory syndrome-related coronavirus  29862   Taiwan
3. Results
The results of the matching of the two genomes are shown in (8).
4. Discussion
The two sequences match up very well. There are many mutations of <5 Bp. This is to be expected
after an evolutionary process in which the Covid-19 virus evolved from the SARS virus.
Very few large inserts/deletes were found that could indicate man-made changes. The largest of
these (still a relatively short 72 Bp.) was tracked and found not to be of known foreign virus origin.
The author concludes that the Covid-19 sequence very likely arose from the SARS sequence
through evolution. No evidence of man-made changes or of pieces of other viruses was found.
This evolution could have happened in a natural environment, in a laboratory environment or both.
It seems very difficult, if not impossible, to differentiate between these scenarios. If the same
mutations were found in bats, this could indicate, but not prove, natural evolution.
The research described in this document was therefore unable to make this distinction and did not
attempt to do so.
5. References
1. Wu, Zhao, Yu, Chen, Wang, Song, Hu, et al.
A new coronavirus associated with human respiratory disease in China: Nature.
2020 Mar;579(7798):265-269. doi: 10.1038/s41586-020-2008-3. Epub 2020 Feb 3
https://www.ncbi.nlm.nih.gov/pubmed/32015508
2. Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
(2020)
https://www.ncbi.nlm.nih.gov/nuccore/MN908947
3. He, Dobie, Ballantine, Leeson, Li, Bastien
Analysis of multimerization of the SARS coronavirus nucleocapsid protein: Biochem Biophys
Res Commun. 2004 Apr 2;316(2):476-83.
https://www.ncbi.nlm.nih.gov/pubmed/15020242
4. SARS coronavirus, complete genome - NCBI Reference Sequence: NC_004718.3 (2018):
https://www.ncbi.nlm.nih.gov/nuccore/NC_004718.
5. Leila Mousavizadeh, Sorayya Ghasemi
Genotype and phenotype of COVID-19: Their roles in pathogenesis
https://www.sciencedirect.com/science/article/pii/S1684118220300827
6. Prashant Pradhan, Ashutosh Kumar Pandey, Akhilesh Mishra, Parul Gupta, Praveen Kumar
Tripathi et al.
Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag
https://www.biorxiv.org/content/10.1101/2020.01.30.927871v1
7. National Center for Biotechnology Information (NCBI) Virus database:
https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/
8. File showing the two matched sequences in detail, an example is given in appendix 2. The full
file is available on request from the author.
9. Merkle T, Merz S, Reautschnig P, Blaha A, Li Q, Vogel P et al.
Precise RNA editing by recruiting endogenous ADARs with antisense oligonucleotides
https://www.ncbi.nlm.nih.gov/pubmed/30692694
10. Kristian G. Andersen, Andrew Rambaut, W. Ian Lipkin, Edward C. Holmes & Robert F. Garry
The proximal origin of SARS-CoV-2
https://www.nature.com/articles/s41591-020-0820-9#Fig1
6. Appendix 1: Algorithm used for initial matching
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace compareGenomes
{
    class match
    {
        public int chrom1Start, chrom2Start, len;
    }

    class Program
    {
        static List<match> matchList;
        static string infileName1 = "", infileName2 = "";
        static int file1Len = 0, file2Len = 0;

        // Arguments: two input sequence files and the name of the output file.
        static void Main(string[] args)
        {
            if (args.Length < 3)
            {
                Console.WriteLine("Usage: compareGenomes <genome1> <genome2> <outputFile>");
                return;
            }
            string chrom1, chrom2, alignedChrom1 = "", alignedChrom2 = "";
            int pos1 = 0, pos2 = 0;
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();
            Console.WriteLine("Stopwatch: start");
            infileName1 = args[0];
            infileName2 = args[1];
            file1Len = readChromosomeFile(infileName1, out chrom1);
            file2Len = readChromosomeFile(infileName2, out chrom2);
            findLongMatches(chrom1, chrom2, out matchList);
            // TotalSeconds is the full elapsed time; Seconds is only the 0-59 component.
            Console.WriteLine("Stopwatch: long matches search finished - elapsed {0:F0} sec", stopWatch.Elapsed.TotalSeconds);
            int maxPos1 = chrom1.Length, maxPos2 = chrom2.Length;
            while ((pos1 < maxPos1) && (pos2 < maxPos2))
            {
                // indels and matches should alternate
                findNextMatch(chrom1, chrom2, ref alignedChrom1, ref alignedChrom2, ref pos1, ref pos2);
                findNextIndel(chrom1, chrom2, ref alignedChrom1, ref alignedChrom2, ref pos1, ref pos2);
            }
            Console.WriteLine("Stopwatch: ready. elapsed {0:F0} sec", stopWatch.Elapsed.TotalSeconds);
            writeAlignedChromosomes(args[2], alignedChrom1, alignedChrom2);
            // Suspend the screen.
            Console.ReadLine();
        }

        // Writes both aligned sequences in lines of 50 characters, each line
        // prefixed with the position in the original (ungapped) sequence.
        private static void writeAlignedChromosomes(string fileName, string alignedChrom1, string alignedChrom2)
        {
            FileInfo writeFile = new FileInfo(fileName);
            StreamWriter mainFile = writeFile.CreateText();
            mainFile.WriteLine("2 genomes aligned. {0}", DateTime.Now);
            mainFile.WriteLine("File1 (top line): {0} (length {1} Bp)", infileName1, file1Len);
            mainFile.WriteLine("File2 (bottom line): {0} (length {1} Bp)", infileName2, file2Len);
            mainFile.WriteLine();
            int pos1 = 0, pos2 = 0, maxPos1 = alignedChrom1.Length, maxPos2 = alignedChrom2.Length, linePos, origPos1 = 0, origPos2 = 0;
            while ((pos1 < maxPos1) || (pos2 < maxPos2))
            {
                linePos = 0;
                mainFile.Write("{0, 10}: ", origPos1);
                while ((pos1 < maxPos1) && (linePos < 50))
                {
                    char c = alignedChrom1[pos1];
                    mainFile.Write(c);
                    if (c != ' ') origPos1++; // spaces are gaps and do not advance the position
                    linePos++;
                    pos1++;
                }
                mainFile.WriteLine();
                linePos = 0;
                mainFile.Write("{0, 10}: ", origPos2);
                while ((pos2 < maxPos2) && (linePos < 50))
                {
                    char c = alignedChrom2[pos2];
                    mainFile.Write(c);
                    if (c != ' ') origPos2++;
                    linePos++;
                    pos2++;
                }
                mainFile.WriteLine();
                mainFile.WriteLine();
            }
            mainFile.WriteLine("-------------------------------------");
            mainFile.WriteLine("Long matching areas (roughly):");
            foreach (match m in matchList)
            {
                mainFile.WriteLine("pos1 {0}, pos2 {1}, len {2}", m.chrom1Start, m.chrom2Start, m.len);
            }
            mainFile.Close();
        }

        // Copies both sequences in step for as long as every window of 3 positions
        // contains at least one identical base.
        private static void findNextMatch(string chrom1, string chrom2, ref string alignedChrom1, ref string alignedChrom2, ref int chrom1Pos, ref int chrom2Pos)
        {
            const int matchLen = 3, minMatches = 1;
            int offset = 0,
                maxoffset = Math.Min(chrom1.Length - chrom1Pos - matchLen, chrom2.Length - chrom2Pos - matchLen);
            int matches = minMatches;
            while ((offset < maxoffset) && (matches >= minMatches))
            {
                matches = 0;
                for (int i = 0; i < matchLen; i++)
                {
                    if (chrom1[chrom1Pos + i + offset] == chrom2[chrom2Pos + i + offset]) matches++;
                }
                offset++;
            }
            offset--;
            if (offset > 0)
            {
                // we have a match
                alignedChrom1 = string.Concat(alignedChrom1, chrom1.Substring(chrom1Pos, offset));
                alignedChrom2 = string.Concat(alignedChrom2, chrom2.Substring(chrom2Pos, offset));
                chrom1Pos += offset;
                chrom2Pos += offset;
            }
        }

        // Shifts one sequence against the other until a block of 18 positions has at
        // least 9 identical bases, or until the next long match (anchor) is reached.
        private static void findNextIndel(string chrom1, string chrom2, ref string alignedChrom1, ref string alignedChrom2, ref int chrom1Pos, ref int chrom2Pos)
        {
            int offset = 0, matches1 = 0, matches2 = 0;
            int maxoffset, max1Pos = 0, max2Pos = 0;
            int matchLen = 18, minMatches = 9;
            // ref parameters cannot be used inside a lambda, so copy them first
            int start1Pos = chrom1Pos;
            int start2Pos = chrom2Pos;
            IEnumerable<match> remainingList = matchList.Where(m => ((m.chrom1Start >= start1Pos) && (m.chrom2Start >= start2Pos)));
            if (remainingList.Count() > 0)
            {
                max1Pos = remainingList.Min(m => m.chrom1Start);
                max2Pos = remainingList.Min(m => m.chrom2Start);
                maxoffset = Math.Min(max1Pos - chrom1Pos, max2Pos - chrom2Pos);
            }
            else
            {
                maxoffset = Math.Min(chrom1.Length - chrom1Pos - matchLen, chrom2.Length - chrom2Pos - matchLen);
            }
            while ((matches1 < minMatches) && (matches2 < minMatches) && (offset < maxoffset))
            {
                matches1 = 0; matches2 = 0;
                for (int i = 0; i < matchLen; i++)
                {
                    if (chrom1[chrom1Pos + i] == chrom2[chrom2Pos + i + offset]) matches1++;
                    if (chrom1[chrom1Pos + i + offset] == chrom2[chrom2Pos + i]) matches2++;
                }
                offset++;
            }
            offset--;
            if (offset == maxoffset - 1)
            {
                // we have reached a known match or the end of the chromosome
                //Console.WriteLine("{0} - {1} - {2} - {3}", chrom1Pos, chrom2Pos, max1Pos, max2Pos);
                // the offset should be changed to the known one
                offset = (max2Pos - max1Pos) - (chrom2Pos - chrom1Pos);
                if (offset < 0)
                {
                    matches2 = minMatches; // this will add space to chrom2
                    offset = -offset;
                }
                else
                {
                    matches1 = minMatches; // this will add space to chrom1
                    matchLen = max1Pos - chrom1Pos;
                }
            }
            // this will ensure that if both qualify, the next code will take the one with most matches
            if (matches1 > matches2) matches2 = 0; else matches1 = 0;
            if (offset == 0)
            {
                // according to this method's definition, we have a match...
                alignedChrom1 = string.Concat(alignedChrom1, chrom1.Substring(chrom1Pos, matchLen));
                alignedChrom2 = string.Concat(alignedChrom2, chrom2.Substring(chrom2Pos, matchLen));
                chrom1Pos += matchLen;
                chrom2Pos += matchLen;
            }
            else if (matches1 >= minMatches)
            {
                alignedChrom1 = string.Concat(alignedChrom1, new string(' ', offset));
                alignedChrom2 = string.Concat(alignedChrom2, chrom2.Substring(chrom2Pos, offset));
                chrom2Pos += offset;
            }
            else if (matches2 >= minMatches)
            {
                alignedChrom2 = string.Concat(alignedChrom2, new string(' ', offset));
                alignedChrom1 = string.Concat(alignedChrom1, chrom1.Substring(chrom1Pos, offset));
                chrom1Pos += offset;
            }
            else
            {
                if ((chrom1.Length <= (chrom1Pos + matchLen)) || (chrom2.Length <= (chrom2Pos + matchLen)))
                {
                    // we just add the last bits without checking...
                    alignedChrom2 = alignedChrom2 + chrom2.Substring(chrom2Pos);
                    alignedChrom1 = alignedChrom1 + chrom1.Substring(chrom1Pos);
                    chrom2Pos = chrom2.Length;
                    chrom1Pos = chrom1.Length;
                }
            }
        }

        // For every position in the first sequence, finds the longest matching area
        // in the second sequence and records it if it is longer than minMatchLen.
        private static void findLongMatches(string chrom1, string chrom2, out List<match> matchList)
        {
            matchList = new List<match>();
            int chrom1Pos = 0, chrom2Pos = 0, matchLength = 0, recordLen = 0, recordPos1 = 0, recordPos2 = 0;
            int chrom1Len = chrom1.Length - 3, chrom2Len = chrom2.Length - 3;
            match newMatch;
            const int minMatchLen = 100, maxMatchLen = 300000000;
            match lastMatch = null, preLast = null;
            while ((chrom1Pos < chrom1Len) && (matchLength < maxMatchLen))
            {// search the first chromosome
                recordLen = 0;
                while ((chrom2Pos < chrom2Len) && (matchLength < maxMatchLen))
                {// search the second chromosome
                    // find the maximum match length of this position
                    matchLength = 0;
                    while (
                        (matchLength < maxMatchLen) && // we stop searching at this length, to save run time
                        ((chrom1Pos + matchLength) < chrom1Len) && ((chrom2Pos + matchLength) < chrom2Len) && // make sure we stay within boundaries
                        // the match is extended 3 positions at a time; a group of 3 is
                        // accepted when at least one of its positions is identical
                        ((chrom1[chrom1Pos + matchLength] == chrom2[chrom2Pos + matchLength]) ||
                         (chrom1[chrom1Pos + 1 + matchLength] == chrom2[chrom2Pos + 1 + matchLength]) ||
                         (chrom1[chrom1Pos + 2 + matchLength] == chrom2[chrom2Pos + 2 + matchLength]))
                    )
                    { matchLength = matchLength + 3; }
                    // see if we have a new record
                    if (matchLength > recordLen)
                    {
                        recordLen = matchLength;
                        recordPos1 = chrom1Pos;
                        recordPos2 = chrom2Pos;
                    }
                    chrom2Pos++;
                }
                if (recordLen > minMatchLen)
                {
                    // we have found a long matching area, remember it
                    //if ((preLast == null) || ((preLast != null) && (preLast.chrom2Start < recordPos2)))
                    if ((lastMatch != null) &&
                        ((lastMatch.chrom2Start + lastMatch.len > recordPos2)))
                    {
                        // the start lies within the last area, take the longest
                        if (recordLen >= lastMatch.len)
                        {
                            lastMatch.chrom1Start = recordPos1;
                            lastMatch.chrom2Start = recordPos2;
                            lastMatch.len = recordLen;
                        }
                        // else, we do not register the record
                    }
                    else
                    {
                        // a completely new area, so new record
                        newMatch = new match { chrom1Start = recordPos1, chrom2Start = recordPos2, len = recordLen };
                        matchList.Add(newMatch);
                        preLast = lastMatch;
                        lastMatch = newMatch;
                    }
                }
                chrom1Pos++; //next pos
                //chrom2Pos = (preLast != null) ? preLast.chrom2Start + preLast.len : 0;
                chrom2Pos = 0;
                // we search from the beginning of the second chrom
            }
        }

        // Reads a sequence file: skips everything up to and including the first line
        // that starts with '>' and concatenates all following lines.
        private static int readChromosomeFile(string fileName, out string chrom)
        {
            const int maxLines = int.MaxValue; // sets the maximum number of lines read from the source file
            const char chromRecogniser = '>';
            chrom = "";
            FileInfo readFile = new FileInfo(fileName);
            if (readFile.Exists)
            {
                StreamReader mainFile = readFile.OpenText();
                string line;
                int readCounter = 0;
                // read away some intro lines
                while (((line = mainFile.ReadLine()) != null) && (readCounter < maxLines) && ((line.Length == 0) || (line[0] != chromRecogniser))) { readCounter++; }
                while (((line = mainFile.ReadLine()) != null) && (readCounter < maxLines))
                {
                    readCounter++;
                    chrom = string.Concat(chrom, line);
                }
                mainFile.Close();
                Console.WriteLine("read file {0}, length {1}", fileName, chrom.Length);
                return chrom.Length;
            }
            else
            {
                Console.WriteLine("{0} file does not exist.", fileName);
                return 0;
            }
        }
    }
}
7. Appendix 2: example final result - aligned sequence SARS-Covid-19
         0: ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTC
         0: ATATTAGGTTTTTACCTACCCAGGAAA  AGCCAACCAACCT CGATCTC

        50: TTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTC
        47: TTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTAGCTGTCGCTC

       100: GGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATT  ACT
        97: GGCTGCATGCCTAGTGCACCTACGCAGTATAAACAATAATAAATTTTACT

       148: GTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACG
       147: GTCGTTGACAAGAAACGAGTAACTCGTCCCTCTTCTGCAGACTGCTTACG

       198: GTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGG
       197: GTTTCGTCCGTGTTGCAGTCGATCATCAGCATACCTAGGTTTCGTCCGGG

       248: TGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAA
       247: TGTGACCGAAAGGTAAGATGGAGAGCCTTGTTCTTGGTGTCAACGAGAAA
..
continues SARS length: 29751 Bp, Covid-19 length: 29903 Bp
..