How Genomes are Sequenced and Why it Matters:
Implications for Studies in Comparative Genomics of
Humans and Chimpanzees
Answers Research Journal 4 (2011):81–88.
www.answersingenesis.org/contents/379/arj/v4/genomes_chimpanzees_humans.pdf
Abstract
Claims about high genomic DNA sequence similarity between humans and chimpanzees are
typically made to audiences that do not understand the various layers of technology and ideological
bias imposed upon the origination of the data in question. The recent human-chimp Y-chromosome
project introduced a number of important genomic tools to achieve a considerably less-biased
analysis. The results indicated a much higher level of dissimilarity, in both gene content and overall
sequence, than the previously reported similarity levels of up to 99%. As of yet, no similar study
utilizing a less-biased genomic framework for autosomal regions has been reported. When evaluating
comparisons between genomes using DNA sequence, it is important to understand the nature of how
that sequence was obtained and bioinformatically manipulated before drawing any conclusions. It is
not uncommon to arrange the sequence of a genome for which little is known by using the genome
of a hypothetical closely related organism that has better developed genomic resources. It is also
not uncommon to first screen the framework model genome to find regions of high similarity prior to
any comparative analyses and to even omit gaps in the final DNA alignments before determining
sequence identity. As a result, evolutionary bias literally colors every aspect of the DNA analysis and
annotation. Understanding the technology used to produce a comparative genomic product for
inter-genome studies is required prior to making any definitive conclusions about the data presented.
At present, a considerably more unbiased approach to comparative genomics needs to be applied
to the analysis and annotation of genomes.
Keywords: comparative genomics, human-chimp similarity, human genome, chimpanzee genome,
DNA sequencing, genome sequencing, cloning DNA
ISSN: 1937-9056 Copyright © 2011 Answers in Genesis. All rights reserved. Consent is given to unlimited copying, downloading, quoting from, and distribution of this article for
noncommercial, nonsale purposes only, provided the following conditions are met: the author of the article is clearly identified; Answers in Genesis is acknowledged as the copyright
owner; Answers Research Journal and its website, www.answersresearchjournal.org, are acknowledged as the publication source; and the integrity of the work is not compromised
in any way. For more information write to Answers in Genesis, Hebron, KY, Attn: Editor, Answers Research Journal.
The views expressed are those of the writer(s) and not necessarily those of the Answers Research Journal Editor or of Answers in Genesis.
Jeffrey P. Tomkins, Institute for Creation Research, 1806 Royal Lane, Dallas, TX 75229
Introduction
The ability to sequence the DNA of an organism’s
genome was an important scientific advance that
radically changed many aspects of molecular
biology and genetics in both the academic and
private sectors. Unfortunately, many discussions
and interpretations surrounding genomic sequence,
particularly those of a comparative nature, are
errant or misleading because of the type of DNA
sequence in question. Depending on the type of
research approach and technologies used to produce
the overall DNA sequence assembly for a particular
organism, certain limitations to its application and
usage must be taken into account when applying it
for any comparative purpose.
Not surprisingly, the role of available research
funds weighed against the cost per base of DNA
sequence is, in most cases, the deciding factor on the
overall amount and quality of sequence produced.
Getting more “bang for the buck” is generally the
way grant funds are used when it comes to DNA
sequencing. This general ideology is true of many
post-human genome research projects, which incorporate
a DNA sequencing strategy called “whole genome
shotgun sequencing.” This type of technology takes
on particular significance when taking into account
the massive amounts of data now being produced
using next generation “massively parallel” sequencing
technologies.
In 2004, the human genome was formally completed
LQ UHJDUG WR VHTXHQFLQJ WKH PDMRU HXFKURPDWLF
sections (International Human Genome Sequencing
Consortium 2004). In 2005, a rough draft of the
chimpanzee genome was reported (The Chimpanzee
Sequencing and Analysis Consortium 2005) with
the hope that its availability would vindicate the
claims of biologists who had been promoting very
high levels of similarity (Britten 2002) associated
with an ape to human evolutionary transition. Years
before the DNA revolution began, chimpanzees were
often positioned in the evolutionary tree closest to
humans out of all the extant apes. Some biologists
even went so far as to say that humans and chimps
should be placed in the same genus and considered
separate species (Wildman et al. 2003). However,
most scientists recognized the vast behavioral and
anatomical differences that exist between humans
and chimps and do not agree that they should be
placed in the same genus (Taylor 2009). In addition,
recent research has shown that some sections of the
human genome are more similar to orangutan than to
chimpanzee, producing evolutionarily aberrant
DNA patterns called “incomplete lineage sorting”
(Hobolth et al. 2011).
Brief History of DNA Sequencing Technology
To fully understand the ramifications of the
incredibly large amount of DNA sequence data
currently available in the world’s public
repositories, it is important to first take a brief look
at the history of DNA sequencing technologies. This
will help explain why certain approaches were taken
to sequence certain organisms and will also clarify
the resulting overall quality and usability of each
sequence set. For a timeline of selected major events
in the history of DNA sequencing research, see
Fig. 1.
Fig. 1. Timeline showing significant milestones related to
the history of DNA sequencing.
The whole modern phenomenon of DNA
sequencing was introduced by the work of biologist
and chemist Fred Sanger (Sanger, Nicklen, and
Coulson 1977), research that earned him a share of the
1980 Nobel Prize in Chemistry. Surprisingly, the basic chemistry invented by
Fred Sanger, referred to as Sanger-style sequencing,
has remained essentially the same from its earliest
years until the present time. Drastic improvements in
Sanger-style DNA sequencing since 1977 were largely
achieved through four areas:
1. the introduction of the polymerase chain reaction
(PCR) and improvements in the basic chemical
components (various enzymes, reagents and DNA
fragment labeling),
2. the automation of sample preparation via large-
scale microtiter plate (primarily 96 and 384-well
formats) systems using robotically automated
pipetting and thermo-cycler platforms,
3. automated laser-based fragment detection
systems which evolved from 96-lane slab gel
systems to extremely high-throughput/automated
robotic platforms using large arrays of individual
capillaries that could resolve DNA fragments in 96
or more sequencing reactions in a matter of just
a couple of hours and then automatically reload
themselves, and
4. bioinformatic and computational advances in
hardware and software to edit, process, and submit
massive amounts of DNA sequence data to both
local and off-site database repositories. Advances
in laboratory information management systems
(LIMS) contributed to the automation and
integration of the overall process.
One important feature of modern Sanger-style
sequencing is the long, high-quality read length that
can be achieved. Under relatively optimal conditions,
high-quality DNA sequence with a rate of only 1 error
in 10,000 bases can be routinely obtained with average
individual read lengths up to ~1,200 bases. The public
human genome project was largely completed using
Sanger-style technology on DNA libraries constructed
from mapped large-insert DNA clones (International
Human Genome Sequencing Consortium 2001, 2004).
Slab-gel DNA sequencers were used at the beginning
of the project and were eventually replaced with first
generation capillary technology.
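The error rate quoted above can be restated on the Phred quality scale that base-calling software conventionally reports, where Q = −10·log10(p) for a per-base error probability p. The sketch below is illustrative only: the 1-in-10,000 rate and the ~1,200-base read come from the text, the function names are hypothetical, and errors are assumed to be independent along the read.

```python
import math

def phred_quality(error_prob: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10.0 * math.log10(error_prob)

def expected_errors(read_length: int, error_prob: float) -> float:
    """Expected number of miscalled bases in one read, assuming independent errors."""
    return read_length * error_prob

# Figures from the text: about 1 error in 10,000 bases, reads up to ~1,200 bases.
p = 1 / 10_000
print(f"Phred quality: Q{phred_quality(p):.0f}")
print(f"Expected errors in a 1,200-base read: {expected_errors(1200, p):.2f}")
```

On these assumptions a 1-in-10,000 error rate corresponds to Q40, and a full-length read would be expected to carry roughly one miscalled base in every eight reads.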
Currently, next generation DNA sequencing
technologies based on an overall strategy called
massively parallel sequencing (Mardis 2008; Rogers
and Venter 2005) have greatly increased total DNA
sequence output. However, one inherent drawback
to massively parallel sequencing as a whole is the
dramatic reduction in the amount of high quality
sequence per individual read. Based on the next
generation technology variant, individual read
lengths vary from about 25 bases to 100 bases (Mardis
2008), with some recent claims by machine suppliers
as high as 400 bases. The overall trend is that the more
bulk sequence produced by a particular technology
within a certain span of time, the shorter the average
read length of the individual sequences. Massively
parallel sequencing has important ramifications for
comparative genomics that will be discussed after
some background information on genome sequencing
strategies is presented.
Approaches To Genome Sequencing
The first genomes sequenced were small and
microbial in nature and included several species of
bacteria (Fraser et al. 1995; Mushegian and Koonin
1996). This is because the DNA in bacterial genomes
is relatively void of non-protein coding DNA sequence,
which is often repetitive and difficult to sequence and
computationally assemble. With highly repetitive
genome sequence in higher eukaryotes, certain blocks
of DNA sequence are repeated for very long stretches.
The problem in such cases is not that the chemistry is
unable to sequence the DNA, but that the computational
assembly of the repetitive sequence reads to form
a single long error-free contiguous DNA sequence
(contig) is confounded. In addition to the computational
limitations of assembling highly repetitive sequences,
the incorporation of a single errant sequence into a
contig can also pull in a large number of other related
errant sequences, producing sequencing chimeras. To
solve this problem, techniques to jump over these areas
of the genome using various types of frameworks and
bridging scaffolds were implemented. Nevertheless,
genome sequencing first tested the waters with small,
non-repetitive genomes that were easily assembled
and then moved on to some of the more challenging
eukaryotic genomes such as fruit fly, nematode, and
human.
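The assembly problem described above can be shown with a toy example: when a repeat is longer than the read length, reads drawn from inside either copy are identical, so their position cannot be determined from sequence alone. In this sketch the genome string, the read lengths, and the helper name are hypothetical; it assumes error-free reads and exact matching, and it only measures placement ambiguity rather than performing an assembly.

```python
from collections import defaultdict

def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read sequence also occurs elsewhere."""
    positions = defaultdict(list)
    for start in range(len(genome) - read_len + 1):
        positions[genome[start:start + read_len]].append(start)
    total = len(genome) - read_len + 1
    ambiguous = sum(len(p) for p in positions.values() if len(p) > 1)
    return ambiguous / total

# Hypothetical genome: unique flanks separated by two copies of a 12-base repeat.
repeat = "ACGTTGCAACGT"
genome = "GATTACAGG" + repeat + "CCTTAAGGT" + repeat + "TGCATGCAA"
for read_len in (8, 16):
    print(read_len, round(ambiguous_fraction(genome, read_len), 2))
```

Reads shorter than the repeat include some that map equally well to both copies, while reads long enough to span a whole copy plus unique flanking sequence can be placed unambiguously; the same logic is why shorter reads make repetitive genomes harder to assemble.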
Genetic Maps
For the public human genome project, as well as
several other initial eukaryotic genomes such as
nematode and fruit fly, a framework-based approach
was developed to methodically sequence the genomes.
In a framework approach, a variety of genomic tools
are integrated to first form a genomic scaffold that
can be used to identify targeted regions to sequence
in addition to arranging and orienting sequencing
reads (Meyers, Scalabrin, and Morgante 2004;
Warren et al. 2006). The first part of the scaffold
is called a molecular genetic map, which involves
the placement of DNA landmarks throughout the
genome by observing how DNA markers segregate
in the offspring of controlled matings or in the case
of humans, utilizing the extant pedigrees of large
families (Kong et al. 2002).
Genetic mapping projects produce hundreds to
thousands of DNA markers positioned in the proper
order along chromosomes and separated by relative
frequency-based distances called centimorgans.
Without going into any more detail than this, it is
sufficient to note that the process of genetic mapping
can produce a rather detailed map of a genome
that shows specific landmarks along chromosomes,
much like a roadmap shows cities positioned along
a highway (see Fig. 2 for an example of a genetic
map). While genetic maps can be rather detailed, the
distance between landmarks is not a physical distance
that can be measured in actual base pairs of DNA,
but rather represents a centimorgan unit, which is a
relative distance based on the frequency of recombination
between linked chromosomal sites.
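Because centimorgans are derived from recombination frequencies rather than base pairs, converting between the two requires a mapping function. The sketch below uses Haldane's map function, d = −50·ln(1 − 2r) cM, which assumes no crossover interference; that choice is ours for illustration and is not stated in the text (other map functions, such as Kosambi's, are also in common use).

```python
import math

def haldane_cm(recomb_fraction: float) -> float:
    """Map distance in centimorgans from a recombination fraction r (0 <= r < 0.5),
    using Haldane's map function d = -50 * ln(1 - 2r), which assumes no interference."""
    if not 0 <= recomb_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * recomb_fraction)

def haldane_r(distance_cm: float) -> float:
    """Inverse of Haldane's map function: recombination fraction from distance in cM."""
    return 0.5 * (1.0 - math.exp(-distance_cm / 50.0))

for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {haldane_cm(r):.2f} cM")
```

For small recombination fractions the distance is close to 100·r cM (about 1 cM per 1% recombination), but the two diverge as r grows, which is one reason map distances cannot be read directly as physical lengths.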
Physical (Contig-Based) Clone Maps
The second key component of a genomic framework
is a physical map, often referred to as a contig-based
clone map, which provides literal physical distances
between points in the genome (Meyers, Scalabrin, and
Morgante 2004; Warren et al. 2006). Cloning DNA
fragments was a technology first developed in the
early 1970s shortly after the discovery of restriction
enzymes (proteins that cut DNA at specific sequence
sites). In cloning DNA, the restriction fragments of the
target organism’s DNA are placed in a small piece of
engineered circular DNA called a plasmid.
These plasmids are then transferred into lab
strains of E. coli where they are maintained,
replicated, and frozen for storage. The cloned
DNA can be placed in arrayed sets of clones in
microtiter plates called libraries.
Fig. 2. Hypothetical genetic map showing sequence tagged
sites (STS) or genetic markers with recombination-based
distances between them demarcated in centimorgans
(cM), also referred to as map units. Genetic marker
nomenclature is diverse; the STS usage in this figure is
for illustration purposes.
These libraries are often frozen at extremely
low temperatures (–60° to –80° C) and can be
stored for years or discarded following their
use as sequencing reagents. Early bacterial
cloning systems only allowed for the cloning of
small DNA fragments of no more than 10,000
bases (10 kb). Later attempts at cloning large
DNA fragments that would facilitate the
representation of entire genomes at redundant
levels in single libraries were initially made
using yeast as a cloning vector, but the yeast
system was technically challenging, difficult
to automate, and produced libraries with high
levels of chimeric clones.
The revolution in large fragment DNA cloning
was first reported in 1992 and described a new
type of single-copy plasmid vector called a Bacterial
Artificial Chromosome (BAC) (Shizuya et al. 1992).
The BAC system allowed for the cloning of very large
pieces of DNA (100 kb or more) using established
E. coli protocols with only moderate modification. In
BAC cloning, the target substrate represents size-selected
large fragment portions of partially digested
DNA. The large partially digested fragments provide
the ability to contiguously assemble overlapping
clones into a genomic physical map. Given this level
of cloning capacity, BAC libraries that represented a
10-fold redundant coverage (or more) of a large
genome like that of humans could be developed. The
first reported use of BAC libraries was for human
DNA, but the technology was subsequently utilized
for many animal and plant taxa.
While BAC libraries could be applied to a variety
of genomic applications, their primary utility was in
the development of contig-based clone maps that could
be integrated with genetic maps to form an elaborate
physical-genetic framework for genome sequencing
(Meyers, Scalabrin, and Morgante 2004; Warren et
al. 2006). In developing a contig-based clone map,
the clones in a BAC library are first fingerprinted,
meaning that the DNA of each clone fragment is
systematically cut with one or more restriction
enzymes. The fragments are then separated based
on size through a process called electrophoresis. The
patterns of fragmentation are then digitized and
placed in a database of clone fingerprints. Clones with
shared fragmentation patterns (fingerprints) are
computationally assembled into sets of overlapping
clones to form large reconstructed sections of
chromosomes (Fig. 3).
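The fingerprinting step can be sketched as follows: digest each clone in silico at a restriction site, record the fragment sizes, and score a pair of clones by how many fragment sizes they share within a sizing tolerance. This is a simplified shared-band count written for illustration, not the scoring used by production fingerprint-assembly software; the "chromosome" sequence, the clone boundaries, the EcoRI site (GAATTC), the tolerance, and the function names are all assumptions.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return sorted fragment lengths after cutting seq at every occurrence of site.
    cut_offset is where the enzyme cuts within the site (EcoRI cuts G^AATTC)."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)

def shared_bands(frags_a: list[int], frags_b: list[int], tol: int = 0) -> int:
    """Count fragments of clone A that match an unused fragment of clone B within tol."""
    remaining = list(frags_b)
    shared = 0
    for size in frags_a:
        for i, other in enumerate(remaining):
            if abs(size - other) <= tol:
                shared += 1
                del remaining[i]
                break
    return shared

# Hypothetical chromosome: poly-T segments separated by EcoRI sites.
chromosome = "GAATTC".join("T" * n for n in (50, 80, 120, 60, 90, 70))
clone_a = chromosome[0:430]    # overlaps clone_b extensively
clone_b = chromosome[130:500]
clone_c = chromosome[0:140]    # overlaps clone_b only slightly
print("A vs B shared bands:", shared_bands(digest(clone_a), digest(clone_b)))
print("B vs C shared bands:", shared_bands(digest(clone_b), digest(clone_c)))
```

Clones that overlap over several internal fragments share several bands, while clones whose overlap contains no complete fragment share none; this is why fingerprint-based contigs need substantial overlap before two clones can be joined with confidence.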
Through a process of tagging the BAC clones in
a physical map with corresponding markers from
a genetic map, based on sequence similarity, the
physical map could be integrated with the genetic
map (Fig. 3). Knowledge of BAC clone and fragment
size in a physical-genetic map allows for the
calculation of actual physical distance (or base pairs
of DNA) between genetic markers. This is analogous
to determining the actual mileage between cities on a
map. Conversely, the clone-based contigs themselves
can now be positionally oriented in the genome based
on the linkage groups (corresponding to chromosomes)
in the genetic map. By assembling the clone contigs
into their respective linkage groups based on their
association to corresponding genetic markers, entire
chromosomes can be reconstructed. The end result
is a highly accurate map of the entire genome of an
organism that can serve as a framework tool for a
variety of applications including the identification of
genes of interest, targeted genome sequencing, and
complete genome sequencing.
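Once markers are anchored to both maps, the ratio of physical to genetic distance between adjacent markers can be computed directly; this is the "actual mileage" calculation described above. The marker names, positions, and the resulting kb/cM values below are invented for illustration, and real ratios vary along and between chromosomes.

```python
from dataclasses import dataclass

@dataclass
class Marker:
    name: str
    cm: float   # position on the genetic map (centimorgans)
    bp: int     # position on the integrated physical map (base pairs)

def kb_per_cm(markers: list[Marker]) -> list[tuple[str, str, float]]:
    """Physical distance (kb) per centimorgan between each pair of adjacent markers."""
    ordered = sorted(markers, key=lambda m: m.cm)
    ratios = []
    for left, right in zip(ordered, ordered[1:]):
        d_cm = right.cm - left.cm
        if d_cm <= 0:
            continue  # co-segregating markers: no resolvable ratio
        d_kb = (right.bp - left.bp) / 1000
        ratios.append((left.name, right.name, d_kb / d_cm))
    return ratios

# Hypothetical markers anchored on both the genetic and the physical map.
markers = [
    Marker("STS1", 0.0, 150_000),
    Marker("STS2", 2.5, 2_650_000),
    Marker("STS3", 4.0, 3_400_000),
]
for left, right, ratio in kb_per_cm(markers):
    print(f"{left}-{right}: {ratio:.0f} kb/cM")
```

The differing ratios between the two intervals illustrate why genetic distance alone cannot stand in for physical distance.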
Sequencing Strategies Developed
in the Human Genome Project
The public sector of the human genome project was
a consortium of laboratories around the world located
largely in the USA, England, France, and Japan. Using
the physical-genetic map, the various labs were each
assigned a specific set of overlapping BAC clones to
sequence in a methodical clone-by-clone highly ordered
strategy. Multiple locations on chromosomes were
being sequenced at the same time, each initiated by a
single BAC called a seed clone. Despite this technology,
there are still regions of the human genome which
remain unsequenced due to their highly repetitive and
variable nature. These regions are so large that they
cannot be bridged by a BAC clone.
Fig. 3. Development of a physical framework for an isolated
section of a hypothetical genome. The illustration shows how
overlapping large fragment clones form a contig. The addition
of genetic markers to the contig is also illustrated to form the
physical-genetic genomic framework. Entire chromosomes and
genomes can be assembled via the development of these contigs,
which are oriented and positioned with the genetic markers.
Each BAC clone selected for genome sequencing
became the chief substrate for DNA sequencing. This
was accomplished by the physical shearing of the
BAC clone (100 kb or larger), followed by end-repair of the
fragments, and cloning into a small-insert plasmid
sequencing vector. The BAC sub-clones are then
production sequenced en masse until about an 8- to
10-fold redundant coverage of the original BAC
clone has been achieved. Following assembly of the
production sequence reads, in most cases there
remain gaps in the sequence that need to be closed
in a process called “finishing” or “gap closure.” Gap
closure often requires the use of a variety of techniques
and chemistries and typically costs as much as or more
than the original production sequencing operation.
In cases where a gap could not be closed with actual
DNA sequence, it was often bridged using paired end
reads from a large DNA clone of known size that spans
the gap.
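Why gaps remain even at 8- to 10-fold redundancy can be estimated with the classic Lander–Waterman model, which assumes reads are placed uniformly at random. Under that assumption the expected fraction of bases left uncovered at coverage c is e^(−c), and the expected number of gaps is roughly N·e^(−c) for N reads. This is an idealized sketch: it ignores required overlap, cloning bias, and repeats, all of which make real assemblies worse, and the 150 kb insert and ~700-base read length are assumptions rather than figures from the text.

```python
import math

def lander_waterman(target_len: int, read_len: int, coverage: float) -> dict:
    """Idealized Lander-Waterman estimates for random shotgun sequencing.
    Assumes uniformly random read placement and no minimum overlap for joining reads."""
    n_reads = coverage * target_len / read_len
    return {
        "reads": round(n_reads),
        "fraction_uncovered": math.exp(-coverage),
        "expected_gaps": n_reads * math.exp(-coverage),
    }

# Hypothetical 150 kb BAC insert sequenced with ~700-base Sanger reads.
for c in (5, 8, 10):
    est = lander_waterman(150_000, 700, c)
    print(f"{c:>2}x: {est['reads']} reads, "
          f"{est['fraction_uncovered']:.2e} uncovered, "
          f"~{est['expected_gaps']:.2f} gaps")
```

Even this best-case model predicts residual gaps at moderate coverage; in practice, regions that clone or sequence poorly leave more gaps than the model suggests, which is why a separate finishing stage is needed.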
This whole process of methodical genome
sequencing is quite involved, time consuming, and
expensive. As a result, government DNA sequencing
funding strategies were changed after the human
genome and several model genomes were completed.
Whole Genome Shotgun Sequencing (WGSS)
In contrast to the effort by the public sector, which
did not produce a workable draft of the genome until
2001 and a near-complete final version in 2004,
research scientist Craig Venter in the private sector
(Celera Genomics) proposed a more rapid approach
(Istrail et al. 2004; Venter et al. 2001; Weber and
Myers 1997). Venter’s method employed a technique
called “whole genome shotgun sequencing” (WGSS),
in which construction of an initial genetic-physical
framework may be bypassed. In such a project, the
entire genome is fragmented en masse and cloned
as large batches of random fragments. To improve
the process, multiple types of plasmid vectors and
fragment sizes are cloned, providing multiple libraries
for sequencing. The clones in each of the libraries are
then production sequenced en masse to certain levels
of genomic redundancy based on research funds.
The caveat to the propaganda surrounding Venter’s
“whole genome shotgun sequencing” effort was the
fact that his laboratory still relied on the use of the
physical-genetic framework developed by the public
sector of the human project to sort out the huge mass
of random DNA sequences and sequencing contigs.
This caveat, even though clearly outlined in the official
journal publication (Venter et al. 2001), was never
widely discussed in the popular media. Nevertheless,
the concept of “whole genome shotgun sequencing”
became quite popular and was subsequently used as
a cost-effective strategy for genome sequencing for a
wide variety of other plant and animal genomes.
Chimpanzee Shotgun Sequence
and the Human Framework
While one would think that the basic technical
process of producing a genomic sequence would be free
of any philosophical constraints, this is not always
the case. Perhaps the most dramatic example of this
is the chimpanzee genome project, which consisted
of an initial 5-fold redundant shotgun coverage
(The Chimpanzee Sequencing and Analysis Consortium 2005). In
contrast to the human genome project, funding was
limited, and the project initially employed a “whole
genome shotgun sequencing” strategy that produced
this 5-fold redundant coverage. However, to organize
the millions of sequencing reads, the human genome
physical framework was initially used as a scaffold. In
other words, the chimp genomic sequence was sorted
out and organized according to the human genomic
framework under the assumption that chimpanzee
and human are genetically similar, which evolutionists
assume is due to a shared common ancestor about one
to six million years ago.
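The practical effect of using one genome as the scaffold for another can be sketched in a few lines: contigs are ordered by where they best align on the reference, and anything without a confident alignment is left out of the ordered assembly. The contig names, coordinates, identity threshold, and the `Hit` structure below are hypothetical, and real reference-guided pipelines are far more involved; the point is only that placement depends entirely on the reference.

```python
from typing import NamedTuple, Optional

class Hit(NamedTuple):
    contig: str
    ref_chrom: Optional[str]   # None means no alignment to the reference
    ref_start: int
    identity: float            # fraction of aligned bases that match

def reference_guided_order(hits: list[Hit], min_identity: float = 0.95):
    """Order contigs by their best-hit position on the reference genome.
    Contigs with no hit, or a hit below min_identity, are returned as unplaced."""
    placed = [h for h in hits if h.ref_chrom is not None and h.identity >= min_identity]
    unplaced = [h.contig for h in hits if h not in placed]
    # Plain string sort of chromosome names is a simplification for this sketch.
    placed.sort(key=lambda h: (h.ref_chrom, h.ref_start))
    return [h.contig for h in placed], unplaced

hits = [
    Hit("ctg3", "chr1", 40_000, 0.99),
    Hit("ctg1", "chr1", 5_000, 0.98),
    Hit("ctg4", None, 0, 0.0),
    Hit("ctg2", "chr1", 22_000, 0.90),
]
order, unplaced = reference_guided_order(hits)
print("ordered:", order)
print("unplaced:", unplaced)
```

In this toy case the divergent contig and the contig with no reference hit both fall outside the ordered assembly, so any later comparison built on the ordered set sees only the reference-like portion.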
One concern regarding the use of the human
genome as a framework for chimpanzee is the
possibility that there may be a major size discrepancy.
Using flow cytometry to estimate nuclear DNA
content, the human genome is widely used as a
calibration standard at 7.0 picograms for a 2C diploid
cell (Dolezel and Greilhuber 2010), and listed at
3.5 pg for a 1C equivalent at www.genomesize.com. At
the same website, there are five referenced estimates
for chimpanzee, which range from 3.46 to 3.85 pg for
1C (roughly a 1% decrease to a 10% increase in genome
size compared to human). The reported average estimated genome
size increase of chimpanzee over human is about 5%.
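The range quoted above follows directly from the picogram values. The short sketch below computes the percent difference of the lowest and highest chimpanzee 1C estimates relative to the 3.5 pg human value; all numbers are taken from the text, and the function name is ours.

```python
HUMAN_1C_PG = 3.5                  # human 1C value from the text
CHIMP_1C_RANGE_PG = (3.46, 3.85)   # lowest and highest chimpanzee 1C estimates quoted

def percent_difference(value: float, reference: float) -> float:
    """Percent change of value relative to reference (negative means smaller)."""
    return 100.0 * (value - reference) / reference

for pg in CHIMP_1C_RANGE_PG:
    diff = percent_difference(pg, HUMAN_1C_PG)
    print(f"chimpanzee {pg:.2f} pg vs human {HUMAN_1C_PG} pg: {diff:+.1f}%")
```

The lowest estimate is slightly smaller than the human value (about −1.1%), while the highest is about 10% larger.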
Interestingly, in 2009, statistics for the chimpanzee
genome sequencing effort posted on the Washington
University Genome Center web site indicated that the
total amount of contiguously assembled chimpanzee
sequence was close to 20% more than the same
parameter for the human genome. However, the
sequencing statistics for chimpanzee were removed
from the web in 2010 even though a new build version
was announced. At the time of this writing (2011),
no current chimpanzee genome assembly statistics
are listed online although DNA sequence and BAC
clone fingerprint data are freely available for public
download.
Perhaps the most startling human-chimpanzee
genome data of recent times are the results
from comparing DNA sequence from human
and chimpanzee Y-chromosomes (Hughes et al.
2010). Specifically, this recent study involved the
comparison of the male-specific region of the
Y-chromosome (MSY). While much of the human Y
chromosome has been sequenced, only the MSY
region of the chimpanzee Y chromosome was
sequenced to a high level of completion and then
compared to the corresponding region in the human
Y-chromosome.
What made this study unique was that the MSY
region in chimpanzee was largely assembled and
constructed based on a clone-based physical map for
chimpanzee, not the human physical framework.
This allowed for a relatively reasonable comparison
of the MSY sequence between human and chimp, the
first time such an apparently unbiased large-scale
comparison had actually been done. The results were
completely unexpected and radically contradicted
the standard evolutionary dogma which pervades
the scientific community. The research paper title
was well chosen and a very accurate one-sentence
summary of the project: “Chimpanzee and human Y
chromosomes are remarkably divergent in structure
and gene content.” Perhaps the most interesting
highlight of the study was the difference in gene
content. While the non-genic areas between human
and chimp in the MSY region were also dramatically
different, the human MSY contained 78 genes while
the chimpanzee only contained 37, a 53% reduction
(41 fewer genes) in total gene content alone. In addition, the human
MSY contained 27 different classes of genes (gene
families/categories) while chimpanzee contained
only 18, meaning that nine entire classes (or gene
categories) were not even present in the chimpanzee
MSY region. Perhaps the best way to summarize the
unprecedented project is to quote some lines from the
original research report.
Here we finished sequencing of the male-specific
region of the Y chromosome (MSY) in our closest
living relative, the chimpanzee, achieving levels of
accuracy and completion previously reached for the
human MSY. By comparing the MSYs of the two
species we show that they differ radically in sequence
structure and gene content . . . The chimpanzee MSY
contains twice as many massive palindromes as the
human MSY, yet it has lost large fractions of the
MSY protein-coding genes and gene families present
in the last common ancestor (excerpt from abstract,
Hughes et al. 2010, p. 536).
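The gene-count comparison reported for the MSY reduces to simple arithmetic, and the result depends on which quantity is used as the baseline. The sketch below makes the calculation explicit using the counts given in the text (78 versus 37 genes, 27 versus 18 gene families); the helper name is ours.

```python
def percent_reduction(larger: int, smaller: int) -> float:
    """Percent by which smaller falls short of larger, using larger as the baseline."""
    return 100.0 * (larger - smaller) / larger

genes_human, genes_chimp = 78, 37
families_human, families_chimp = 27, 18

print(f"Genes: {genes_human - genes_chimp} fewer in chimpanzee MSY "
      f"({percent_reduction(genes_human, genes_chimp):.1f}% fewer)")
print(f"Chimpanzee genes as a share of human: {100 * genes_chimp / genes_human:.1f}%")
print(f"Gene families: {families_human - families_chimp} absent "
      f"({percent_reduction(families_human, families_chimp):.1f}% fewer)")
```

Taking the human MSY as the baseline, the chimpanzee MSY has about 52.6% fewer genes (about 47.4% of the human count) and one third fewer gene families.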
A number of autosomal comparative studies
have been done using both coding and non-coding
sequences. Two of the most prominent studies are
worth mentioning briefly. The first is a comparative
study between human chromosome 21 and chimpanzee
chromosome 22, so-called homologs (Watanabe et
al. 2004). The chimpanzee sequence was somewhat
limited at the time, but in contrast to the recent
Y-chromosome project, a physical map for chimpanzee
was not utilized. Large insert clones were selected
by screening libraries with human probes, and only
the most highly alignable, human-like clones were
selected. These hand-selected and sequenced clones
were oriented on the human physical framework,
with the non-alignable sections and gaps ignored.
As a result, the data regarding genomic similarity
was biased or constricted to those areas which were
previously determined to be strong candidates for
similarity.
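The selection effect described above can be illustrated with a small, entirely hypothetical set of per-clone identity values: if only clones that already align well are kept, the average identity of the retained set is higher than that of the full set, whatever the true genome-wide value is. In real data, clones that fail to align at all would not even receive an identity value, which strengthens the effect. The numbers and the 0.95 cutoff are invented for illustration.

```python
from statistics import mean

# Hypothetical fractional identities for clones aligned between two genomes.
clone_identity = [0.99, 0.98, 0.97, 0.96, 0.90, 0.85, 0.70, 0.60]

def mean_after_screen(values: list[float], cutoff: float) -> float:
    """Mean identity of only those clones that pass an identity screen."""
    kept = [v for v in values if v >= cutoff]
    return mean(kept)

print(f"all clones:     {mean(clone_identity):.3f}")
print(f"clones >= 0.95: {mean_after_screen(clone_identity, 0.95):.3f}")
```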
Although the authors provide interesting data for
the selected regions they analyzed, they do not commit
to any definitive level of overall sequence similarity,
other than to say that 83% of the translated protein
coding regions would produce differences in protein
sequence between human and chimp. Considering
that only similar DNA clones were selected, the fact
that 83% of the actual coding sequence would produce
different proteins is indicative of more dissimilarity
than similarity. We also now know that protein
translation is a complicated mix of non-protein
coding DNA regulation features where a single
gene under differential control can produce a wide
variety of transcripts (Barash et al. 2010; Wang and
Burge 2008). Nevertheless, evolutionists will cite the
Watanabe et al. (2004) study as a conclusive genomic
effort for high sequence similarity.
The second study of interest is a whole genome
type of comparison using chimpanzee genomic
sequences derived from the ends of large insert clones,
called BAC-end sequences (BES) (Britten 2002).
The chimpanzee sequences are first screened for
anything that’s human-like and highly alignable, and
then the best candidates are passed along for more
detailed analyses. It should also be noted that such a
procedure eliminates large portions of important
non-coding regulatory sequences. Sequences of selected
interest are then, once again, positioned using the
human physical framework and then evaluated for
similarity.
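How gaps are treated changes the reported identity of the very same alignment. In this sketch two hypothetical aligned sequences (with '-' marking insertion/deletion columns) are scored two ways: identity over gap-free columns only, and identity over all alignment columns with gaps counted as differences. These are two common conventions chosen for illustration; the text does not specify the exact formula used by any particular study, although Britten (2002) made a related point about counting indels.

```python
def identity(aln_a: str, aln_b: str, count_gaps: bool) -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).
    If count_gaps is False, columns containing a gap are dropped before scoring;
    if True, every alignment column counts and gap columns are mismatches."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aln_a, aln_b))
    if not count_gaps:
        columns = [(x, y) for x, y in columns if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

# Hypothetical alignment with one substitution and a 4-base indel.
a = "ACGTACGTACGTACGTACGT"
b = "ACGTACGAACGT----ACGT"
print(f"gaps ignored: {identity(a, b, count_gaps=False):.1f}%")
print(f"gaps counted: {identity(a, b, count_gaps=True):.1f}%")
```

With a single indel the two conventions already differ by almost 19 percentage points, so the choice should always be stated when a similarity figure is reported.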
Although the Y-chromosome project evaluated only a single
isolated portion of the Y-chromosome (the only
part that was readily alignable), it was novel in that
it utilized an actual physical framework derived
for the chimpanzee genome to isolate and target
sequence for comparison. The section that was
chosen for the Y-chromosome effort also appears
to be the most readily amenable to comparative
study. A physical map assembly has recently been
reported for chimpanzee (Warren et al. 2006).
However, the only published genomic sequence
comparison between human and chimpanzee using
species-specific physical frameworks has been the
Y-chromosome project. It would be quite valuable
to evolutionists and creationists alike if unbiased
large-scale autosomal comparisons between human
and chimpanzee could be completed now that the
resources are available. In fact, the results of the
Y-chromosome study demand that similar approaches
be taken for the rest of the genome.
Implications for Next Generation Sequencing
Technologies
Massively parallel DNA sequencing representing
next generation technologies refers to literally
thousands of individual reactions conducted
simultaneously by a single machine (see Mardis 2008
for a technological review). The different proprietary
DNA sequencing systems being utilized are based on a
single general concept: the amplification of individual
DNA strands in a massively parallel (simultaneous)
fashion. The strand being copied from the template
fragment in each individual reaction is systematically
interrogated by high precision optics such that the
consecutive addition of nucleotide bases up to a
threshold level is determined. In general, for each
technology, the more bulk DNA sequence obtained
in a single machine run (~6 to 8 hours), the shorter
the individual read lengths. As mentioned previously,
current systems typically produce 25 to 100 bases
of high quality sequence, with some companies now
claiming routine reads up to 400 bases. Despite
the marked reduction in read length compared to
Sanger-style methodologies (still commonly used),
the two primary advantages are that no DNA cloning
or bacterial manipulation is required and that megabase
quantities of DNA sequence can be produced in a single
run.
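The optical read-out described above can be reduced to a toy base-calling loop, loosely modeled on four-color sequencing-by-synthesis: at each cycle the reaction reports a signal for each of the four nucleotides, the strongest signal is called, and a crude confidence is taken as that signal's share of the total. Calling stops at the first cycle whose confidence falls below a threshold, mirroring the "up to a threshold level" behavior in the text. The intensity values, the four-channel layout, and the confidence formula are simplifying assumptions; commercial platforms use their own proprietary chemistry and calibrated quality models.

```python
BASES = "ACGT"

def call_bases(cycles: list[list[float]], min_confidence: float = 0.6) -> str:
    """Call one base per cycle from four-channel intensities (order A, C, G, T).
    Confidence is the brightest channel's share of total signal; calling stops at
    the first cycle whose confidence falls below min_confidence."""
    read = []
    for intensities in cycles:
        total = sum(intensities)
        if total <= 0:
            break
        best = max(range(4), key=lambda i: intensities[i])
        confidence = intensities[best] / total
        if confidence < min_confidence:
            break  # signal has degraded: the read ends here
        read.append(BASES[best])
    return "".join(read)

# Hypothetical intensities: clean signal for four cycles, then degraded.
cycles = [
    [0.90, 0.05, 0.03, 0.02],   # A
    [0.04, 0.88, 0.05, 0.03],   # C
    [0.02, 0.03, 0.05, 0.90],   # T
    [0.05, 0.04, 0.85, 0.06],   # G
    [0.30, 0.28, 0.22, 0.20],   # ambiguous
    [0.80, 0.10, 0.05, 0.05],
]
print(call_bases(cycles))
```

Lowering the threshold lengthens the read but admits lower-confidence calls, which is the basic trade-off between read length and per-base quality.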
The new massively parallel sequencing
technology has proven ideal for the sequencing of
microbial genomes, whole microbial communities
(metagenomics), diverse types of transcriptomes, and
eukaryotic genome resequencing for polymorphism
detection (genetic variation). The DNA substrate for
these technologies is often randomly sheared whole
genome shotgun fragments, similar to the first
step of DNA preparation used in WGSS discussed
previously. Because of this, the same problems apply to
the resulting genomic sequences. In fact, the problem
of sorting out and aligning sequences in the genome is
even worse because of the short read lengths. In other
words, you will need an existing physical framework to
sort out the data, particularly in eukaryotic genomes
like human. While the new sequencing technologies
are extremely innovative, there are caveats that must
be understood to properly utilize them.
Conclusion
In the early days of biotechnology, it became
apparent that humans, apes, and other mammals
shared protein sequences that were very similar. In
fact, many human proteins exhibit high amino acid
similarity in both ape and non-primate mammalian
taxa (Clamp et al. 2007). One of the primary issues
of concern in various evolutionary studies is that
most scientists only take into account similarities
between biological sequences present in both human
and apes that are pre-selected and already considered
similar at some level. Also, DNA sequences that do
not align well are often discarded or gaps may not
be accounted for in alignment analyses. Another
important consideration is whether an expressed
genomic product is doing the same thing in humans
as it does in apes and whether it is expressed in the same way.
These factors are often not given proper recognition.
A majority of the public and scientific community are
not aware of these caveats and still hold to the
dogma that the human genome is 98 to 99% similar
to chimpanzee, which is most likely not the case. The
fact is that major differences between the structure of
the human and chimpanzee genomes are now being
documented as the genomic resources improve.
When evaluating comparisons between genomes
using DNA sequence, it is important to understand
the nature of how that sequence was obtained and
bioinformatically manipulated. It is not uncommon
to arrange the DNA sequence of a genome for which
OLWWOHLVNQRZQE\XVLQJWKHJHQRPHRIDK\SRWKHWLFDO
HYROXWLRQDU\ FRPPRQ DQFHVWRU RU µFORVH UHODWLYHµ
that has better-developed genomic resources. This
obviously introduces an evolutionary bias at several
levels. Furthermore, sequence comparisons that
have yielded high similarities are typically based on DNA
clones and regions that were screened and selected beforehand
for some level of similarity. While many DNA sequences
in eukaryotic genomes are difficult to work with due
to their repetitive nature, they also contain critical
regulatory features that now appear to be
just as important as the genes themselves for proper
function. Understanding the technology used to
produce a genomic DNA sequence product is critical
prior to making any definitive conclusions about the
data in question.
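The effect of pre-screening can be seen with simple arithmetic. In the sketch below, `REGIONS` is a hypothetical list of per-region percent identities. These are illustrative numbers chosen to give round results, not measurements. If only regions that already pass a similarity threshold (an assumed 95% here) are kept for comparison, the reported average rises even though the underlying regions are unchanged.

```python
from statistics import mean


def screened_mean(identities: list[float], threshold: float) -> tuple[float, int]:
    """Mean identity of regions at or above `threshold`, and how many were kept."""
    kept = [x for x in identities if x >= threshold]
    if not kept:
        raise ValueError("no region passes the screening threshold")
    return mean(kept), len(kept)


# Hypothetical per-region percent identities (illustrative, not real data).
REGIONS = [99.0, 98.0, 97.0, 90.0, 80.0, 64.0]

all_mean = mean(REGIONS)                            # every region counted
kept_mean, kept_count = screened_mean(REGIONS, 95.0)  # pre-screened subset

print(all_mean)                 # 88.0
print(kept_mean, kept_count)    # 98.0 3
```

In this toy case, screening discards half of the regions and raises the average from 88% to 98%. The number that gets reported depends on which regions entered the comparison in the first place.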
Most biologists, creationists and evolutionists alike,
would expect DNA sequence similarities between
humans and apes due to shared anatomical and
physiological features. However, it is very likely that
earlier comparative genomic studies, constrained
by limited resources and propelled primarily by
evolutionary dogma, need to be repeated using better
tools and less bias.
References
Barash, Y. et al. 2010. Deciphering the splicing code. Nature
465:53–59.
Britten, R. J. 2002. Divergence between samples of chimpanzee
and human DNA sequences is 5%, counting indels.
Proceedings of the National Academy of Sciences of the
United States of America 99, no. 21:13633–13635.
Clamp, M. et al. 2007. Distinguishing protein-coding and
noncoding genes in the human genome. Proceedings of
the National Academy of Sciences of the United States of
America 104, no. 40:19428–19433.
Dolezel, J. and J. Greilhuber. 2010. Nuclear genome size: Are
we getting closer? Cytometry Part A 77, no. 7:635–642.
Fraser, C. M. et al. 1995. The minimal gene complement of
Mycoplasma genitalium. Science 270, no. 5235:397–403.
Hobolth, A. et al. 2011. Incomplete lineage sorting patterns
among human, chimpanzee, and orangutan suggest recent
orangutan speciation and widespread selection. Genome
Research 21, no. 5:349–356.
Hughes, J. F. et al. 2010. Chimpanzee and human Y
chromosomes are remarkably divergent in structure and
gene content. Nature 463:536–539.
International Human Genome Sequencing Consortium. 2001.
Initial sequencing and analysis of the human genome.
Nature 409:861–920.
International Human Genome Sequencing Consortium. 2004.
Finishing the euchromatic sequence of the human genome.
Nature 431:931–945.
Istrail, S. et al. 2004. Whole-genome shotgun assembly and
comparison of human genome assemblies. Proceedings of
the National Academy of Sciences of the United States of
America 101, no. 7:1916–1921.
Kong, A. et al. 2002. A high-resolution recombination map of
the human genome. Nature Genetics 31:241–247.
Mardis, E. R. 2008. Next-generation DNA sequencing methods.
Annual Review of Genomics and Human Genetics 9:
387–402.
Meyers, B. C., S. Scalabrin, and M. Morgante. 2004. Mapping
and sequencing genomes: Let’s get physical. Nature Reviews
Genetics 5, no. 8:578–588.
Mushegian, A. R. and E. V. Koonin. 1996. A minimal gene set
for cellular life derived by comparison of complete bacterial
genomes. Proceedings of the National Academy of Sciences
of the United States of America 93, no. 19:10268–10273.
Rogers, Y. H. and J. C. Venter. 2005. Massively parallel
sequencing. Nature 437:326–327.
Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA
sequencing with chain-terminating inhibitors. Proceedings
of the National Academy of Sciences of the United States of
America 74, no. 12:5463–5467.
Shizuya, H. et al. 1992. Cloning and stable maintenance of 300-
kilobase-pair fragments of human DNA in Escherichia coli
using an F-factor-based vector. Proceedings of the National
Academy of Sciences of the United States of America 89, no.
18:8794–8797.
Taylor, J. 2009. Not a chimp: The hunt to find the genes that
make us human. Oxford University Press, New York, New
York.
The Chimpanzee Sequencing and Analysis Consortium. 2005.
Initial sequence of the chimpanzee genome and comparison
with the human genome. Nature 437:69–87.
Venter, J. C. et al. 2001. The sequence of the human genome.
Science 291, no. 5507:1304–1351.
Wang, Z. and C. B. Burge. 2008. Splicing regulation: From a
parts list of regulatory elements to an integrated splicing
code. RNA 14:802–813.
Warren, R. L. et al. 2006. Physical map-assisted whole-genome
shotgun assemblies. Genome Research 16:768–775.
Watanabe, H. et al. 2004. DNA sequence and comparative analysis
of chimpanzee chromosome 22. Nature 429:382–388.
Weber, J. L. and E. W. Myers. 1997. Human whole-genome
shotgun sequencing. Genome Research 7:401–409.
Wildman, D. E. et al. 2003. Implications of natural selection
in shaping 99.4% nonsynonymous DNA identity between
humans and chimpanzees: Enlarging genus Homo.
Proceedings of the National Academy of Sciences of the
United States of America 100, no. 12:7181–7188.