Author Manuscript, published in final form as:
Greytak EM, Moore C, & Armentrout SL (2019). Genetic genealogy for cold case and active
investigations. Forensic Science International, 299, 103113. doi: 10.1016/j.forsciint.2019.03.039
Genetic genealogy for cold case and active investigations
Ellen M. Greytak, CeCe Moore, Steven L. Armentrout
Parabon NanoLabs, Inc., 11260 Roger Bacon Dr. Suite 406, Reston, VA, 20190, USA
Highlights
Genetic genealogy is helping to close both cold and active investigations
Forensic DNA is uploaded to public genetic genealogy databases to find relatives
Extensive genealogy and descendancy research generate a list of possible identities
Many complicating factors can impede the research
Identity is narrowed down using a wide range of information & confirmed using STRs
Abstract
Investigative genetic genealogy has rapidly emerged as a highly effective tool for using DNA to
determine the identity of unknown individuals (unidentified remains or perpetrators), generating
identifications in dozens of law enforcement cases, both cold and active. The amount of press
coverage of these cases may have given the impression that the analysis is straightforward and
the outcome guaranteed once a sample is uploaded to a database. However, the database
query results serve only as clues from which in-depth genealogy and descendancy research
must proceed to determine the possible identities of an unknown individual. While there
certainly will be more announcements of cases solved using this new technique, there are many
more cases where identification has not yet been possible due to the wide variety of
complications present in these investigations. This paper lays out the fundamentals of genetic
genealogy, along with the challenges that are encountered in many of these investigations, and
concludes with a set of case studies that demonstrate the variety of cases encountered thus far.
Keywords: Genetic genealogy; Forensic genetics; DNA; SNPs; Cold cases; Human
identification
Introduction
Traditional genealogy has been practiced for centuries, using documentary records and oral
histories to trace families backwards in time. Until recently, these were the only ways to
connect extended family members, but with the advent of direct-to-consumer (DTC) genetic
testing, it is now possible to find relatives through shared DNA. This has enabled thousands of
individuals who have lost their biological identity through adoption, abandonment, anonymous
gamete donation, misattributed parentage, etc., to regain their genetic heritage. More recently,
these same tools have been used to identify DNA from suspected perpetrators in more than
thirty law enforcement cases, only some of which have been publicly announced (Table 1).
Table 1: Cases for which law enforcement agencies have announced identification of DNA from a
suspected perpetrator with the aid of genetic genealogy (through 1/31/19). * Deceased; ** Pled guilty
# | Location | Case | Year(s) | Identified As | Date Announced | Genetic Genealogist
1 | California | Multiple Homicides and Sexual Assaults - “Golden State Killer” | 1974 - 1986 | Joseph James DeAngelo | April 24, 2018 | Barbara Rae-Venter
2 | Snohomish County, WA | Double Homicide of Jay Cook (20) and Tanya Van Cuylenborg (18) | 1987 | William Earl Talbott II | May 21, 2018 | Parabon
3 | Tacoma, WA | Homicide of Michella Welch (12) | 1986 | Gary Charles Hartman | June 20, 2018 | Parabon
4 | Lancaster, PA | Homicide of Christy Mirack (25) | 1992 | Raymond Charles Rowe** | June 25, 2018 | Parabon
5 | Brazos County, TX | Homicide of Virginia Freeman (40) | 1980 | James Otto Earhart* | June 25, 2018 | Parabon
6 | Fort Wayne, IN | Homicide of April Tinsley (8) | 1988 | John Dale Miller** | July 15, 2018 | Parabon
7 | Woonsocket, RI | Homicide of Constance Gauthier (81) | 2016 | Matthew Norman Dessault | July 18, 2018 | Parabon
8 | St. George, UT | Sexual Assault of Carla Brooks (79) | 2018 | Spencer Glen Monnett** | July 28, 2018 | Parabon
9 | Fayetteville, NC | Multiple Sexual Assaults - “Ramsey Street Rapist” | 2006 - 2008 | Darold Wayne Bowden | August 22, 2018 | Parabon
10 | Champaign County, IL | Homicide of Holly Cassano (22) | 2009 | Michael F. A. Henslick | August 29, 2018 | Parabon
11 | Montgomery County, MD | Multiple Sexual Assaults | 2007 - 2011 | Marlon Michael Alexander | September 14, 2018 | Parabon
12 | Sarasota, FL | Homicide of Deborah Dalzell (47) | 1999 | Luke Edward Fleming | September 19, 2018 | Parabon and Barbara Rae-Venter
13 | California | Multiple Sexual Assaults - “NorCal Rapist” | 1991 - 2006 | Roy Waller | September 21, 2018 | Law Enforcement
14 | Greenville, SC; Memphis, TN; Portageville, MO | Multiple Homicides and Sexual Assaults | 1990 - 1998 | Robert Eugene Brashers* | October 5, 2018 | Parabon
15 | Starkville, MS | Double Homicide of Betty Jones (65) and Kathryn Crigler (81) | 1990 | Michael W. DeVaughn | October 8, 2018 | Parabon
16 | Greenbrier, AR | Homicide of Pam Felkins (32) | 1990 | Edward Keith Renegar* | October 29, 2018 | Parabon
17 | Fulton County, GA | Homicide of Lorrie Ann Smith (28) | 1997 | Jerry Lee | November 1, 2018 | Parabon
18 | Anne Arundel County, MD | Homicide of Michael Temple (29) | 2010 | Fred Lee Frampton, Jr. | November 2, 2018 | Parabon
19 | Orlando, FL | Homicide of Christine Franke (25) | 2001 | Benjamin L. Holmes | November 5, 2018 | Parabon & Florida Dept. of Law Enforcement
20 | Carlsbad, CA | Homicide of Jodine Serrin (39) | 2007 | David Mabrito* | November 13, 2018 | Parabon and Barbara Rae-Venter
21 | Santa Clara, CA | Homicide of Leslie Marie Perlov (21) | 1973 | John Arthur Getreu | November 21, 2018 | Parabon
22 | College Station, TX | Multiple Sexual Assaults | 2018 | Christopher Quinn Williams | December 12, 2018 | Parabon
23 | Cedar Rapids, IA | Homicide of Michelle Martinko (18) | 1979 | Jerry Lynn Burns | December 19, 2018 | Parabon
24 | Hernando County, FL | Sexual Assault of Unnamed Victim (12) | 1983 | William L. Nichols* | January 10, 2019 | Parabon
25 | Orange County, CA | Sexual Assaults of Two Unnamed Victims (9 and 31) | 1995 & 1998 | Kevin Konther | January 11, 2019 | Law Enforcement
26 | La Mesa, CA | Homicide of Scott Martinez (47) | 2006 | Zachary Aaron Bunney | January 24, 2019 | Parabon
27 | Fremont, CA | Homicide of Jack Upton (30) | 1990 | Russell Guerrero | January 24, 2019 | Parabon
28 | Portland, OR | Homicide of Anna Marie Hlavka (20) | 1979 | Jerry Walter McFadden* | January 31, 2019 | Parabon
Generating Data
Unlike traditional forensic DNA analysis, which uses autosomal short tandem repeats (STRs) to
generate an identity profile from ~20 loci, genetic genealogy uses hundreds of thousands of
single nucleotide polymorphisms (SNPs) spread across the autosomes. Participants in genetic
genealogy have had their DNA tested by a direct-to-consumer (DTC) genetic testing company,
such as 23andMe or AncestryDNA, which use microarrays to genotype up to ~1 million SNPs.
DTC companies obtain DNA from spit kits or cheek swabs and thus always have a large amount
of high-quality single-source DNA to work with. Forensic DNA samples, on the other hand,
often only have a small amount of degraded DNA, which may be mixed with DNA from one or
more other individuals. Microarray genotyping has previously been shown to be effective and
accurate with forensic samples (Keating et al., 2013), and Parabon has used it for casework
since 2015, generating high genotyping call rates from forensic samples down to 1 ng of DNA
(Table 2). Parabon has also found it is possible to accurately deconvolute microarray data from
two-person mixtures, as long as the person-of-interest is at least 40% of the mixture and a
single-source reference sample from the second contributor is available.
Table 2: Summary of Parabon’s >250 forensic DNA samples used in genetic genealogy casework and
the resulting microarray genotyping call rates.
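Type | % of samples | Type | % of samples | Quantity | % of samples | Call Rate | % of samples
Semen | 48.0% | Single Source | 79.4% | 2.5 ng | 22.7% | > 95% | 47.5%
Blood | 24.6% | Low Mixture | 16.4% | 2.5-5 ng | 12.6% | 90-95% | 12.2%
Tissue | 10.1% | High Mixture (Deconvoluted) | 4.2% | 5-10 ng | 13.0% | 80-90% | 17.5%
Saliva | 7.7% | | | 10-20 ng | 17.8% | 70-80% | 6.1%
Bone | 4.8% | | | 20-40 ng | 27.1% | 60-70% | 12.2%
Touch | 4.8% | | | 40-80 ng | 3.2% | <60% | 4.6%
 | | | | >80 ng | 3.6% | |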
Parabon’s casework currently uses the Illumina CytoSNP-850K array, an off-the-shelf chip that
contains >98% of the SNPs on the OmniExpress chip used by Ancestry.com, FamilyTreeDNA,
and MyHeritage. 23andMe previously also based their chip on the OmniExpress but has since
moved to smaller custom chips that overlap less with the other DTC companies. For law
enforcement cases, extracted DNA samples are processed at a CLIA-certified lab, and the data
is uploaded securely to Parabon.
Determining Relatedness from DNA
Given enough SNPs, it is possible to determine the degree of relatedness between two people,
which is defined by the expected amount of shared DNA, not the number of meioses (Figure 1).
Figure 1: Pedigree showing the degrees of relatedness, as defined by the expected amount of shared
DNA. Each relationship is defined with respect to the red “self / twin” box.
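As a rough illustration of how a degree of relatedness maps onto an expected amount of shared DNA, the sketch below assumes that expected sharing halves with each additional degree, starting from the ~3,600 cM parent-child value reported in Table 3. The function name and the baseline constant are illustrative assumptions, not part of any published tool, and actual sharing varies widely around these expectations.

```python
# Baseline taken from the parent-child row of Table 3 (assumed here as the
# expected value for a 1st-degree relative).
PARENT_CHILD_CM = 3600.0


def expected_shared_cm(degree: int) -> float:
    """Expected shared DNA (in cM) for a relative of the given degree.

    Assumption: expected sharing halves with each additional degree.
    """
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return PARENT_CHILD_CM / 2 ** (degree - 1)


for d in range(1, 8):
    print(f"degree {d}: ~{expected_shared_cm(d):.0f} cM")
```

Under these assumptions a 3rd-degree relative (e.g., a first cousin) is expected to share about 900 cM and a 5th-degree relative about 225 cM.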
While several relationship inference methods had previously been proposed (Huff et al., 2011;
Manichaikul et al., 2010), 23andMe was the first DTC company to introduce an accurate,
scalable approach to inferring approximately how closely related two DNA samples are from
autosomal SNPs (Henn et al., 2012). Each person has two copies of each of the 22 autosomal
chromosomes (“autosomes”), one inherited from their mother and one inherited from their
father. Autosomes are not inherited intact from each parent; rather, each parent’s own pair of
chromosomes is randomly recombined into a new chromosome that is passed onto the child.
While recombination occurs randomly, nucleotides that are closer to one another on a
chromosome are more likely to be inherited together, while nucleotides that are far apart are
more likely to be separated by recombination. The probability of recombination between two
nucleotides is quantified as their genetic distance, which is measured in centimorgans (cM),
such that 1 cM equates to a 1% probability of recombination in a single generation.
Rather than simply looking at the total number of shared SNPs, genetic genealogy takes
advantage of the fact that recombination will break up long stretches of shared DNA over the
generations, such that more closely related people will share longer stretches of DNA
(“segments”) that are identical-by-descent (IBD) (Figure 2). The more recombination events
that have occurred, the shorter the shared IBD segments will be, so the number and length of
IBD segments in cM can be used to approximate the degree of relatedness.
Figure 2: Inheritance of DNA segments on a single chromosome. The lengths of the shared segments
(shaded boxes) are summed across all 22 autosomes to give the total amount of shared DNA.
To detect IBD segments, genetic genealogy algorithms search for regions of the genome where
two individuals share at least one allele at every SNP. To be counted, these segments must
contain a minimum number of SNPs (typically ~500) and be over a certain length (typically 5-7
cM), which screens out most segments that are shared by chance rather than due to common
descent. When summed across all autosomes, the amount of DNA shared IBD strongly
correlates with the degree of relatedness between two individuals, such that more distant
relatives tend to share less DNA (Table 3). However, due to the random nature of
recombination, the amount of shared DNA can vary greatly for relatives of the same degree,
and this variation increases with more recombination events, such that ~10% of third cousins
and ~50% of fourth cousins share no detectable IBD segments.
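The segment search described above can be sketched as a simple scan along one chromosome. This is a minimal illustration only, not the algorithm used by GEDmatch or any DTC company: it assumes genotypes are coded as allele counts (0, 1, or 2), treats two genotypes as sharing at least one allele unless they are opposite homozygotes (0 vs. 2), and ignores missing calls and genotyping error. The function name and the coding scheme are assumptions; the default thresholds mirror the typical values quoted in the text.

```python
from typing import List, Tuple


def find_half_ibd_segments(
    geno_a: List[int],
    geno_b: List[int],
    cm_pos: List[float],
    min_snps: int = 500,
    min_cm: float = 7.0,
) -> List[Tuple[int, int, float]]:
    """Return (start_index, end_index, length_cM) for candidate shared segments.

    geno_a / geno_b: allele counts (0, 1, 2) at SNPs ordered along one chromosome.
    cm_pos: genetic map position (cM) of each SNP.
    """
    segments = []
    start = None
    n = len(geno_a)
    for i in range(n + 1):
        # Opposite homozygotes (0 vs. 2) cannot share an allele.
        compatible = i < n and abs(geno_a[i] - geno_b[i]) != 2
        if compatible:
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            length = cm_pos[end] - cm_pos[start]
            # Keep only runs that pass both the SNP-count and length filters.
            if end - start + 1 >= min_snps and length >= min_cm:
                segments.append((start, end, length))
            start = None
    return segments


# Toy example with deliberately tiny thresholds (real data needs the defaults).
toy_a = [0, 1, 2, 2, 0, 1, 1, 0]
toy_b = [0, 0, 1, 2, 2, 1, 2, 1]
toy_cm = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(find_half_ibd_segments(toy_a, toy_b, toy_cm, min_snps=4, min_cm=3.0))
# [(0, 3, 3.0)]
```

In the toy data the opposite-homozygous SNP at index 4 breaks the run, and the second run (indices 5-7) is discarded because it is too short, which is how the length filter screens out chance sharing.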
Table 3: The range of DNA shared by pairs of people with each relationship. While most pairs from a
given relationship fall within a narrower range, these values represent the full ranges that have been
observed (Ball et al., 2016).
cM Range | Degree | Relationship
3,600 | 1 | Parent-Child
2,000-3,600 | 1 | Full Sibling
1,060-2,500 | 2 | Half-Sibling, Avuncular, Double First Cousin, Grandparent / Grandchild
425-1,500 | 3 | First Cousin (1C), Half-Avuncular, Great-Grandparent / Great-Grandchild, Great-Avuncular
160-950 | 4 | First Cousin Once-Removed (1C1R), Half-First Cousin (½ 1C), Half-Great-Aunt/Uncle / Half-Great-Niece/Nephew
65-650 | 5 | Second Cousin (2C), First Cousin Twice-Removed (1C2R), Half-First Cousin Once-Removed (½ 1C1R)
0-375 | 6 | Second Cousin Once-Removed (2C1R), Half-Second Cousin (½ 2C), First Cousin Thrice-Removed (1C3R), Half-First Cousin Twice-Removed (½ 1C2R)
0-245 | 7 | Third Cousin (3C), Second Cousin Twice-Removed (2C2R)
0-185 | >7 | Third Cousin Once-Removed (3C1R), Distant Cousins
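Because the ranges in Table 3 overlap heavily, a single shared-cM value is usually consistent with several degrees. The sketch below simply looks up which rows of Table 3 contain a given value. The data structure and function name are illustrative assumptions, and the parent-child row is treated as a single point value exactly as it appears in the table, which is a simplification.

```python
# (low_cM, high_cM, degree label), transcribed from Table 3.
TABLE_3_RANGES = [
    (3600, 3600, "1 (Parent-Child)"),
    (2000, 3600, "1 (Full Sibling)"),
    (1060, 2500, "2"),
    (425, 1500, "3"),
    (160, 950, "4"),
    (65, 650, "5"),
    (0, 375, "6"),
    (0, 245, "7"),
    (0, 185, ">7"),
]


def possible_degrees(shared_cm: float) -> list:
    """Return every Table 3 degree whose observed range contains shared_cm."""
    return [label for low, high, label in TABLE_3_RANGES if low <= shared_cm <= high]


print(possible_degrees(100))   # ['5', '6', '7', '>7']
print(possible_degrees(1200))  # ['2', '3']
```

A lookup like this only lists what is possible; it says nothing about which degree is most probable, which requires reference distributions of shared DNA for each relationship.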
Genetic Genealogy Databases and Genetic Privacy
DTC genetic testing companies’ private databases have exploded in size, with AncestryDNA
currently containing nearly 15 million individuals, 23andMe containing nearly 10 million, and
MyHeritage and FamilyTreeDNA (FTDNA) together containing roughly 3.5 million (Regalado,
2019). AncestryDNA and 23andMe maintain their databases separately, and these databases are not
accessible to law enforcement, as the only way to submit a sample is via a cheek swab or spit kit.
MyHeritage and FTDNA both allow uploads of data generated from other sources, but law
enforcement usage of either requires written permission from the company, as well as a court
order for MyHeritage or “the required legal documentation” for FTDNA.
GEDmatch, on the other hand, is not a DTC company. It was created by Curtis Rogers and
John Olson in 2010 as a public database where individuals from different testing companies
could compare their DNA by downloading their raw data from a DTC company’s site and
uploading it to a common database. After the Golden State Killer suspect was identified through
surreptitious use of GEDmatch, the site’s administrators decided to explicitly allow law
enforcement usage. They posted a notice on the front page of the site (Figure 3) and also
updated their Terms of Service to state that law enforcement can use, and is using, GEDmatch to
identify remains and perpetrators of violent crimes, defined as homicides or sexual assaults
(GEDmatch.com). Both new and existing users were required to view these new Terms and
decide whether to accept them before using the site. Critics of genetic genealogy argue that
many people who joined the site prior to this update may not have considered the possibility that
their desire to locate relatives could lead to the discovery that they are related to someone
whose DNA is associated with a crime and to the apprehension of that relative. Indeed, it is
possible some of them still may be unaware of the new warning, and individuals who had their
data uploaded by another individual or have been inactive on the site may not have reviewed
the new Terms to decide whether to consent. However, even prior to implementing these new
Terms, GEDmatch’s Terms clearly stated that any data set to “public” would be searchable by
anyone. The law has generally allowed information made available to the public to be used in
criminal investigations. Users can easily have their data set to “private,” hiding it from all search
queries, or removed entirely. Thus, the DNA data files in a public database like GEDmatch
come from individuals who have proactively downloaded their data from a private DNA testing
company’s website, uploaded the information to a public website, reviewed the Terms of
Service that permits law enforcement usage, and opted in to public comparisons against their
data.
Figure 3: Notice posted on GEDmatch’s homepage after the site’s use in the Golden State Killer
investigation was made public.
Additionally, no sensitive genetic information is disclosed to law enforcement during a genetic
genealogy search, as the raw genetic data from GEDmatch users is not accessible. Raw
genetic data can contain sensitive health-related information, and this type of private genetic
information should be protected. In keeping with this precept, no raw genotypes are displayed
or made available for download by GEDmatch. GEDmatch simply performs comparisons
among samples, returning the lengths and chromosomal locations of shared DNA segments,
which are used to determine the approximate relationship between individuals. Similarly, data
obtained from abandoned DNA at a crime scene and used for genetic genealogy are not
exposed to other users and can be prevented from appearing in search results (an option
available to all users). At Parabon, genetic data is kept on an encrypted server only accessible
to authorized employees, and the company’s GEDmatch accounts can only be accessed by the
bioinformatics team and the lead genetic genealogist, CeCe Moore. These facts mitigate many
of the privacy concerns surrounding genetic genealogy, as individuals have control over
whether their data is used as part of law enforcement investigations, and sensitive raw data is
not accessed (Greytak et al., 2018).
Unlike with familial searching of law enforcement databases, no one is legally required to
contribute to a genetic genealogy database, and the samples are not in the possession of
government agencies. The persons contributing to GEDmatch are warned explicitly that
criminal investigators as well as fellow genealogy enthusiasts are able to perform comparisons
against their data. If they choose to participate anyway, there is no reason why law
enforcement should not be able to use this information. These significant differences from
familial searching argue against automatically applying familial search policies, such as
restricting analysis to the end of an investigation, to genetic genealogy. The two techniques are
entirely independent; familial searching has previously been used in some genetic genealogy
cases and not in others. The public is strongly in favor of the use of genetic genealogy to
investigate violent crimes: GEDmatch saw a significant increase in the number of participants
after the Golden State Killer arrest (Milian, 2018), and a recent survey showed overwhelming
public support (Guerrini et al., 2018).
Database Searching
A GEDmatch one-to-many query compares the DNA of interest to all public data in the
database, returning a list of individuals who share the most autosomal DNA. Each “match”
includes the individual’s name or alias, the email address associated with their GEDmatch
account, and any haplogroup or family tree information they have chosen to share (Figure 4).
Figure 4: Top five results from a GEDmatch one-to-many comparison, with potentially identifying
information (kit numbers, names, and email addresses) removed.
A one-to-one comparison can then be run on each match using a more precise algorithm to see
the lengths and chromosomal locations of the shared segments. Comparing the amount of
shared DNA to reference data (e.g., Bettinger & Perl, 2018) gives the probability that the
relationship between the unknown individual and the match falls into each degree of
relatedness. For example, a match sharing 100 cM could be anywhere from 5th degree to >8th
degree, with 6th degree being most likely.
However, there are additional complications. First, in addition to multiple possible degrees of
relatedness, each degree contains many relationship types that must be considered (e.g., 5th
degree relatives around the same age could be second cousins, first cousins twice-removed, or
half-first cousins once-removed). Second, the amount of DNA shared by each relationship
varies among populations. Populations founded by a small number of individuals can have low
genetic diversity and high background relatedness, or endogamy. In such populations,
individuals with a given relationship will share significantly more DNA than in other populations,
such that even very distant cousins can share significant amounts of DNA. Endogamy
manifests as a large number of matches, each sharing many small segments, indicating that the
segments were actually inherited from distant ancestors (ISOGG, 2019). Another challenge is
pedigree collapse, in which the same families intermarry multiple times throughout history,
which can inflate the amount of shared DNA between their descendants.
Casework Match Results
More than 80% of samples from Parabon’s >250 law enforcement cases have resulted in a
match at the third cousin level or closer (>60 cM), with subjects of European descent having a
higher probability of success due to their overrepresentation in genetic genealogy databases
(Greytak & Moore, 2018) (Figure 5A). European descent was assessed by Snapshot DNA
Phenotyping, which infers an individual’s genetic admixture from seven continental populations
(African, Middle Eastern, European, Central/South Asian, East Asian, Oceanian, and Native
American). In this analysis, samples were considered “European” if they had at least 80%
European ancestry. Note that the law enforcement cases submitted to Parabon are primarily
from North American agencies, and samples from other regions will likely have lower match
probabilities due to lower participation in DTC genetic testing and use of GEDmatch.
The closeness of the top match is not the sole variable in determining viability for genetic
genealogy. A comprehensive assessment must include consideration not only of the closest
match, but of the quality of the supporting matches and the amount of information available
about each match. For example, progress may be difficult if the top match has unknown
parentage and/or is from a country where records are not available. Parabon assesses each
sample on a subjective scale: 1) very high probability of identification (e.g., parent-child match),
2) high probability of identification, 3) medium probability of identification, 4) low probability of
identification but likely to generate actionable information, and 5) unlikely to generate actionable
information. An assessment does not guarantee a particular outcome but is intended to help
agencies to decide how to proceed. Thus far, ~80% of European samples and ~60% of non-
European samples have been assessed as workable (assessments 1-4) (Figure 5B).
Figure 5: For Parabon’s >250 law enforcement samples, the frequency of A) the top GEDmatch one-to-
many match being in each degree of relatedness and B) samples receiving each assessment level.
Results are reported for European, non-European, and all samples, as well as for those cases that have
been solved (i.e., resulted in an identification) thus far. Degree of relatedness is based solely on the
amount of shared DNA, not the true relationship determined through genealogy: Parent-Child (>3300 cM),
Full Siblings (2200-3300), 2nd Degree (1300-2200), 3rd Degree (650-1300), 4th Degree (340-650), 5th
Degree (200-340), 6th Degree (90-200), 7th Degree (60-90), 8th Degree (30-60), >8th Degree (<30).
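The binning used in the Figure 5 caption can be expressed as a simple threshold lookup. The sketch below is an illustration under one assumption that the caption does not settle: a value falling exactly on a boundary (e.g., 2200 cM) is assigned to the closer-relationship bin. The names are hypothetical.

```python
from bisect import bisect_right

# Lower cM bound of each bin from the Figure 5 caption, in ascending order.
BIN_LOWER_BOUNDS = [30, 60, 90, 200, 340, 650, 1300, 2200, 3300]
BIN_LABELS = [
    ">8th Degree", "8th Degree", "7th Degree", "6th Degree", "5th Degree",
    "4th Degree", "3rd Degree", "2nd Degree", "Full Siblings", "Parent-Child",
]


def degree_bin(shared_cm: float) -> str:
    """Assign a total shared-cM value to a Figure 5 degree bin.

    Assumption: boundary values go to the closer-relationship bin.
    """
    return BIN_LABELS[bisect_right(BIN_LOWER_BOUNDS, shared_cm)]


print(degree_bin(100))   # 6th Degree
print(degree_bin(3400))  # Parent-Child
```

As the caption notes, a bin assigned this way reflects only the amount of shared DNA, not the true relationship established through genealogy.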
Importantly, just because a sample does not have sufficient promising match data today does
not mean it never will. Hundreds of new individuals upload their data to GEDmatch every day
(Milian, 2018), and as the database grows, the proportion of samples with close matches will
increase. Thus, Parabon monitors all unsolved cases for new matches on a weekly basis.
Genealogy Research
While most of the discussion surrounding genetic genealogy focuses on the database matches,
the vast majority of genetic genealogy work happens after the match list is generated. Many US
records are available to the public and have been compiled into searchable databases
accessible via subscription. For example, Ancestry.com provides a mechanism for accessing a
large collection of records, such as the census through 1940, vital records (birth, marriage,
death) from many states, the Social Security Death Index, and Newspapers.com. Some
Ancestry.com users also create and share public family trees, although these can contain
errors, so they must be examined critically. People-search databases and public social media
can also be used to help determine family structures. In some cases, law enforcement may be
asked to assist with this research using their greater access to records.
A previous analysis of the MyHeritage DTC database showed that ~60% of individuals of
Northern European descent will have a match at 100 cM or closer (Erlich, Shor, Pe’er, & Carmi,
2018). Using simulation, the authors showed that it is often possible to identify an unknown
individual from a single third cousin level match given knowledge of his or her sex, location
within 100 miles, and age within 5 years. However, in addition to the fact that such detailed
demographic information is often not available in law enforcement cases, this assumes that,
given a third cousin match, it is straightforward to obtain a complete list of the match’s relatives
at that distance (the authors determined this number to be ~850, not including half relatives). In
reality, a massive amount of work is required to expand a match into a list of relatives (Greytak
et al., 2018).
The first task is to definitively identify each match, which itself can be quite difficult. Although
GEDmatch displays the name and email address associated with each matching kit, users can
choose to use an alias or an anonymous email address, and kits are sometimes managed by
someone other than the match themselves. Moreover, even if a user associates their actual
name, it may be common (e.g., John Smith), which can complicate identification. Consequently,
the initial identification of matches is both critical and challenging, and often requires
considerable genetic genealogical skill and creative problem solving, e.g., deciphering initials,
inferring identities from other identifiable matches, and figuring out who DNA is from when the
kit is managed by someone else. Even though contacting matches via the given email address
might enable identification and even produce family tree information, Parabon seldom contacts
matches directly so as to minimize the number of people involved in an investigation and reduce
the risk of tipping off a suspect. Matches closer than third cousins are only contacted with the
permission of the investigating agency, and the agency can choose to make the contact instead.
Any contact includes the fact that the questions are in regard to a law enforcement investigation
(no specifics of the case are given), and the individual is informed they are free to participate or
not. If the individual asks not to be involved, they are not contacted again.
Once the matches are identified, their family trees must be constructed back to the set of
possible common ancestors with the unknown individual. The number of generations back in
time to the common ancestors of interest is determined by the distance of the matches’
relationships, although since the estimates are not usually specific to a single relationship, often
the family trees must be built even further back than these levels would imply. Building family
trees back in time requires traditional genealogy research: combing through public records to
determine the identities of each generation’s parents.
However, records are not always available - not all US states maintain an accurate and public
birth index, many families trace back to immigrants from other countries where records are not
readily available, etc. In addition, biological family trees often do not match documented family
trees due to misattributed paternity, unrecorded adoption, unknown parentage, etc., and
individuals in these situations are overrepresented in genetic genealogy databases. Surnames
and spellings also often change through the generations, further complicating the analysis.
Descendancy Research
Once possible common ancestors have been identified, the family trees must then be built
forward in time (“descendancy research” or “reverse genealogy”) to elucidate the possible
identities of the unknown individual (Figure 6).
Figure 6: A hypothetical family tree resulting from genetic genealogy research. Given a match in
GEDmatch (orange star), the family tree is built backward in time to the possible common ancestors
(orange) and then forward in time (blue) to determine the possible identities of the unknown individual (in
this case, from among the “second cousins”).
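The backward-then-forward search shown in Figure 6 can be sketched on a toy pedigree. This is a simplified illustration, not Parabon's workflow: the tree, the names, and the functions are all hypothetical, every parent-child link is assumed to be known and correct, and half relationships and removed cousins are ignored.

```python
from collections import defaultdict


def ancestors_at(person, generations, parents):
    """People exactly `generations` steps above `person` in the tree."""
    level = {person}
    for _ in range(generations):
        level = {p for child in level for p in parents.get(child, ())}
    return level


def descendants_at(ancestors, generations, children):
    """People exactly `generations` steps below any of `ancestors`."""
    level = set(ancestors)
    for _ in range(generations):
        level = {c for parent in level for c in children.get(parent, ())}
    return level


def cousins_of_degree(match, n, parents):
    """Candidate nth cousins of `match` (n=2 for second cousins)."""
    children = defaultdict(set)
    for child, pars in parents.items():
        for p in pars:
            children[p].add(child)
    g = n + 1  # nth cousins share ancestors n + 1 generations back
    wide = descendants_at(ancestors_at(match, g, parents), g, children)
    # Remove the match and anyone who shares a more recent ancestor.
    near = descendants_at(ancestors_at(match, g - 1, parents), g - 1, children)
    return wide - near


# Hypothetical pedigree: child -> known parents.
toy_parents = {
    "Ga": ("GG1", "GG2"), "Gb": ("GG1", "GG2"),
    "Pa": ("Ga",), "Pb": ("Gb",),
    "match": ("Pa",), "sib": ("Pa",),
    "cand1": ("Pb",), "cand2": ("Pb",),
}
print(sorted(cousins_of_degree("match", 2, toy_parents)))  # ['cand1', 'cand2']
```

In real casework the hard part is populating the parent-child links from records, and the estimated relationship rarely pins down a single value of n, so several generations typically have to be explored.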
The possible ancestors from which the unknown individual descends can sometimes be
narrowed using genomic ancestry (e.g., if the family tree is Northern European, but the unknown
individual has 25% ancestry from another population, the genetic genealogist can search
among the possible grandparents for one who married someone from that ancestral group).
Shared DNA on the X-chromosome can also narrow down the possible paths between matches,
as males only inherit X-DNA from their mothers. Thus, if an unknown male shares X-DNA with
a match, they must be related through his mother, and the path between them cannot pass
through two males in a row. When available, Y-chromosome and mitochondrial (mtDNA)
haplogroups can also narrow down the possibilities, as these are passed directly from father to
son and from mother to child, respectively. Thus, individuals share a mtDNA haplogroup with
their maternal lineage, and males share a Y haplogroup with their paternal lineage.
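The X-chromosome rule above can be checked mechanically for any proposed line of descent. The sketch below is illustrative only (the function name and the "M"/"F" encoding are assumptions): it takes the sexes of the people along one line, ordered from ancestor down to descendant, and rejects any line that contains a father-to-son link, since a father passes no X-DNA to his son.

```python
def can_carry_x(sexes_along_line):
    """True if X-DNA could be transmitted down this line of descent.

    sexes_along_line: sexes ordered from ancestor to descendant, e.g.
    ["F", "M", "F", "M"]. Two consecutive males mean a father-to-son link.
    """
    pairs = zip(sexes_along_line, sexes_along_line[1:])
    return all(not (a == "M" and b == "M") for a, b in pairs)


print(can_carry_x(["M", "F", "M"]))       # True: maternal grandfather to grandson
print(can_carry_x(["F", "F", "M", "M"]))  # False: ends in a father-to-son link
```

For a shared X segment between the unknown individual and a match, both lines of descent from the common ancestor (one to each person) would need to pass this check.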
DNA sharing among matches can also be used to narrow down where the unknown individual
falls in the tree. If matches do not share any DNA with one another, they are likely related to the
individual on different branches of his or her family tree, and the genetic genealogist can then
search for an intersection (“triangulation”) between the two matches’ families in the form of a
marriage that produced children or an out-of-wedlock birth (Figure 7). While there could be
hundreds or thousands of individuals who are second or third cousins to a single match, there
are typically only a few individuals who are cousins at the right distance to multiple matches.
Figure 7: Triangulation between two hypothetical family trees. Given two matches in GEDmatch who are
unrelated to one another (orange stars), family trees are built for each and then searched for an
intersection (green) in the form of a marriage or out-of-wedlock birth. Children of this intersection are
related to both matches, while all other individuals in the tree are only related to one match.
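Triangulation as depicted in Figure 7 amounts to finding people who descend from both matches' families. The sketch below illustrates this on a hypothetical pedigree: it assumes the two matches are unrelated to one another and that the parent-child links are already known. All names and functions are invented for illustration.

```python
def all_ancestors(person, parents):
    """Every known ancestor of `person` in the child -> parents mapping."""
    seen = set()
    stack = list(parents.get(person, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def triangulate(match_a, match_b, parents):
    """People who share ancestors with both matches (the green intersection)."""
    anc_a = all_ancestors(match_a, parents)
    anc_b = all_ancestors(match_b, parents)
    candidates = set()
    for person in parents:
        if person in (match_a, match_b):
            continue
        anc = all_ancestors(person, parents)
        if anc & anc_a and anc & anc_b:
            candidates.add(person)
    return candidates


# Hypothetical pedigree: A_uncle and B_aunt had two children together.
toy_parents = {
    "A_parent": ("A_gp1", "A_gp2"), "A_uncle": ("A_gp1", "A_gp2"),
    "match_A": ("A_parent",),
    "B_parent": ("B_gp1", "B_gp2"), "B_aunt": ("B_gp1", "B_gp2"),
    "match_B": ("B_parent",),
    "child1": ("A_uncle", "B_aunt"), "child2": ("A_uncle", "B_aunt"),
    "A_cousin": ("A_uncle", "X"),
}
print(sorted(triangulate("match_A", "match_B", toy_parents)))  # ['child1', 'child2']
```

Note that "A_cousin" is excluded because that person is related only to match A, which is why intersecting two matches narrows the candidate list so sharply.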
Narrowing Down the Possible Identities
Once candidate individuals have been identified, the genetic genealogist can use a variety of
factors to include or exclude them, in addition to traditional investigative information, such as a
connection to the crime scene or the victim. Sex is known from the DNA, and some age
information may be available: for unidentified remains, age can be estimated; for perpetrators,
at minimum, they had to be alive and physically capable of committing the crime. The individual
also had to be in a given location at a given time, which may mean he or she lived nearby.
While the GEDmatch matches may be spread across the US or even the world, it is sometimes
possible to focus on a particular branch of the family that moved close to the location of interest.
Parabon’s genetic genealogists also use Snapshot DNA Phenotyping (Greytak & Armentrout,
2015) to prioritize among individuals and confirm or exclude hypotheses. An individual’s eye
color, hair color, and skin color can often be determined from mugshots, yearbook photos, or
social media and compared to the predictions. Full siblings cannot be distinguished using
genetic genealogy, as they share all the same genealogical relationships with the matches.
However, if they differ in phenotype, this can be used to prioritize among them. Similarly, if
genealogy research leads to an individual whose phenotypes are at odds with the predictions,
this can spur continued research, while a close similarity can help corroborate an identification.
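The include/exclude logic in this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `Candidate` fields, the `min_age` threshold, the trait labels, and all example people are hypothetical, and phenotype is used here only to order candidates, not to exclude them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    name: str
    sex: str                          # "M" or "F"
    birth_year: int
    death_year: Optional[int] = None  # None if still living
    traits: dict = field(default_factory=dict)


def is_consistent(c, subject_sex, crime_year, min_age):
    """Apply the hard constraints: sex, alive, and old enough at the time."""
    if c.sex != subject_sex:
        return False
    if crime_year - c.birth_year < min_age:
        return False
    if c.death_year is not None and c.death_year < crime_year:
        return False
    return True


def trait_mismatches(c, predicted):
    """Count observed traits that disagree with the predicted phenotype."""
    return sum(1 for trait, value in predicted.items()
               if trait in c.traits and c.traits[trait] != value)


def shortlist(candidates, subject_sex, crime_year, predicted, min_age=15):
    """Drop inconsistent candidates, then put the best phenotype fits first."""
    kept = [c for c in candidates
            if is_consistent(c, subject_sex, crime_year, min_age)]
    return sorted(kept, key=lambda c: trait_mismatches(c, predicted))


# Hypothetical family members and a hypothetical phenotype prediction.
candidates = [
    Candidate("Brother B", "M", 1962, traits={"eye": "brown", "hair": "brown"}),
    Candidate("Brother A", "M", 1960, traits={"eye": "blue", "hair": "brown"}),
    Candidate("Sister C", "F", 1964),
    Candidate("Uncle D", "M", 1920, death_year=1980),
    Candidate("Nephew E", "M", 1984),
]
predicted = {"eye": "blue", "hair": "brown"}
ranked = [c.name for c in shortlist(candidates, "M", 1987, predicted)]
```

Here the two brothers survive the hard constraints, and the one whose observed traits agree with the prediction is listed first. Because full siblings are genealogically indistinguishable, this ordering is a prioritization aid rather than an identification.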
The degree to which the identity of the unknown individual can be narrowed down varies from
case to case. In the best-case scenario, a single individual or a set of siblings can confidently
be identified through matches to multiple branches of their family tree. More often, there are
multiple cousins (descendants of a particular set of common ancestors) who are consistent with
the available information. These leads can then be followed up through additional research,
traditional investigation, and/or targeted kinship testing of family members to more precisely
place the unknown individual in the family tree. Parabon’s Snapshot Kinship Inference tool uses
genome-wide SNP data to predict the precise degree of relatedness between individuals, out to
6th-degree relatives (Greytak et al., 2017). Using a machine learning model built on thousands
of reference subjects with known relationships, Snapshot predicts the probability that a pair
belongs to each degree of relatedness. Confidence is calculated using the probability of the
most likely degree and the precision calculated for that degree in cross-validation.
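The text states which two quantities enter the confidence value but not how they are combined. The Python sketch below assumes, purely for illustration, that they are multiplied; it is not Snapshot's actual formula, and the probabilities and precision values are hypothetical.

```python
def predict_degree(degree_probs, cv_precision):
    """Pick the most probable degree and attach a confidence value.

    ASSUMPTION: confidence = P(most likely degree) * cross-validation
    precision for that degree. The source does not give the formula.
    """
    best = max(degree_probs, key=degree_probs.get)
    confidence = degree_probs[best] * cv_precision[best]
    return best, confidence


# Hypothetical model output for one pair of individuals.
degree_probs = {2: 0.01, 3: 0.96, 4: 0.03}   # probability per degree
cv_precision = {2: 0.99, 3: 0.98, 4: 0.95}   # precision per degree
best_degree, confidence = predict_degree(degree_probs, cv_precision)
```

Under this assumption, a degree that the model predicts confidently but that was often mispredicted in cross-validation would receive a lower confidence than the raw probability alone suggests.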
Law Enforcement Leads
During decades-long cold case investigations, hundreds or thousands of individuals may be
investigated before the perpetrator is found. Genetic genealogy offers an efficient means of
narrowing an investigation, often to only a few individuals. The number of possible relatives
included in a genetic genealogy analysis varies depending on the number and distance of the
matches. Even when the only matches are distant and large family trees must be constructed
because common ancestors are many generations in the past, experienced genetic
genealogists can triangulate among the matches to determine the most promising branches of
the family tree and limit the amount of unnecessary tree building. Given sufficient triangulation
and time, the number of leads can be reduced to the offspring of a single couple.
No matter how confident the identification, however, genetic genealogy alone cannot prove
identity with 100% certainty. There is always a remote possibility that the unknown individual
could have been adopted or abandoned, and his or her existence could be unknown to family
and not revealed through official records. Therefore, genetic genealogy leads must be verified
through a direct DNA comparison between the person-of-interest’s STR profile and that of the
crime scene sample. It is this traditional forensic DNA match that is used for prosecution.
Case Studies
The following case studies demonstrate how genetic genealogy has been used to assist
investigators with identifying a suspect in cold case investigations. Only information approved
for public release by the investigating agencies is included, so some case details (e.g., DNA
sample source, exact GEDmatch match information) have been obfuscated.
Case Study #1: Snohomish County, WA; 31-year-old cold case (double homicide)
This case study demonstrates the ideal genetic genealogy case, where there are close matches
and clear familial connections that point to only a single conclusion. However, even seemingly
straightforward cases require a large amount of research and the expertise to recognize and
cope with confounding factors such as unknown and misattributed parentage.
The Crime: In 1987, a young Canadian couple, Jay Cook (20) and Tanya Van Cuylenborg (18),
traveled from British Columbia to Washington State in a van. After purchasing a ferry ticket to
Seattle, they were never heard from again. Days later, Tanya’s body was found in a ditch in the
woods, and a few days after that, Jay’s body and the van were found in two separate locations.
DNA evidence was obtained for an unknown suspect (“Subject”).
GEDmatch: There were two matches at approximately the 5th degree relative level, plus
additional more distant matches. The top two matches had no shared DNA between them,
meaning they were most likely related to the Subject on different branches of his family tree.
Family Trees: Family trees were constructed for both key matches back to their great-
grandparents and beyond using census records, vital records, newspaper archives, public
“people search” databases, public social media data, and public family trees. Next,
descendancy research was performed to trace the descendants of each set of ancestors to
determine if an intersection between them could be found.
A triangulating marriage was found between a granddaughter of Match #2’s great-grandparents
and a son of Match #1’s great-grandmother. Extensive research revealed that this son had
taken his stepfather’s surname, initially obscuring his true relationship to Match #1. Thus, the
children of this marriage were half first cousins once-removed to Match #1, as well as second
cousins to Match #2. While both of these relationships are 5th degree, it is critical to consider
all possible relationship types, as half relationships are quite common. No other marriages were
found between the descendants of these ancestors. There was only one son from this
marriage, William Earl Talbott II, and he was therefore the only known male who could be
carrying this mix of DNA from both matches’ families (Figure 8).
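The statement that both relationships are 5th degree follows from counting the generational steps separating two relatives. The Python sketch below uses the standard expectation that a d-th degree relative shares about (1/2)^d of their autosomal DNA. The function names are hypothetical, and actual sharing varies around the expected value.

```python
def cousin_degree(n, removed=0, half=False):
    """Degree of relatedness for an n-th cousin, `removed` times removed.

    Full n-th cousins share two common ancestors and are degree 2n + 1.
    Each removal adds one degree. A half relationship (one shared
    ancestor instead of two) halves the expected sharing, adding one more.
    """
    degree = 2 * n + 1 + removed
    if half:
        degree += 1
    return degree


def expected_shared_fraction(degree):
    """Expected fraction of autosomal DNA shared at a given degree."""
    return 0.5 ** degree


second_cousin = cousin_degree(2)                          # Match #2
half_first_once_removed = cousin_degree(1, 1, half=True)  # Match #1
```

Both calls return 5, with an expected sharing of about 3.1%. This is why the amount of shared DNA alone cannot distinguish the two relationships, and why half relationships have to be considered when placing the unknown individual in the tree.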
Mr. Talbott had never been arrested for a crime that would require submitting DNA to a
database. He had no known connection to the victims and no reason to have been on the
investigators’ radar. His phenotypes matched those predicted by Snapshot, but without other
information to tie him to the crime, this had not been enough to identify him as a suspect.
Figure 8: Anonymized family tree released by the Snohomish County Sheriff’s Department as part of
their announcement of the arrest of William Earl Talbott II. The tree shows the position of Mr. Talbott
(Suspect) and two GEDmatch matches (Cousins) used to determine his identity.
Resolution: Based on the lead provided by genetic genealogy, the detectives were able to
collect DNA from a cup discarded by Mr. Talbott, which, using traditional STR analysis, was
shown to match the DNA from the crime scene. He was arrested and is currently awaiting trial.
Case Study #2: Tacoma, WA; 32-year-old cold case (homicide)
Triangulation between matches using documentary sources is sometimes not possible. In
addition to being able to tenaciously research records and meticulously build family trees, this
case study shows how genetic genealogists must be able to think creatively about possible
hypotheses to explain the available data.
The Crime: 12-year-old Michella Welch went missing on 26 March 1986. She had taken her
two younger sisters to Puget Park in Tacoma, Washington and then ridden her bicycle home to
make lunch while her sisters played nearby. When the sisters returned to the park, they found a
brown paper bag with their lunches but no Michella. By 3:10 p.m., officers arrived at the park
and started searching for the missing girl. A tracking dog found her body around 11:30 p.m.
She had been beaten and sexually assaulted and died from a cut to the neck.
The DNA: Another young Tacoma girl, Jennifer Bastian, was also killed around the same time,
and investigators had long believed one person committed both crimes. More than 10,000
investigative hours went into the cases in 1986 alone. Recent DNA testing showed that the
crimes were committed by different men, but neither DNA profile resulted in a CODIS match.
Genetic Ancestry: The Subject was predicted to be predominantly Northern European with a
small but notable amount of Northern Native American admixture (~10%).
GEDmatch: The two top matches did not share DNA, suggesting they were most likely related
to the Subject on different branches of his family tree.
Family Trees: Trees were built for the two top matches back to their great-great-grandparents
and beyond, and extensive descendancy research was performed, but no documented
intersection was found between the two families. The analyst identified a pair of brothers who
were cousins of Match #1, lived within a few miles of the crime scene in 1986, and had two
Native American great-great-grandparents on different branches of their family trees, which was
consistent with the predicted ancestry of the Subject. However, the Subject only shared about
half as much DNA with Match #1 as would be expected for a cousin, and there should have
been an intersection between the families that would connect these cousins to both matches.
When families are connected through DNA but do not intersect on paper (e.g., through a
marriage license or a birth certificate), the explanation may be misattributed paternity: a pair of
individuals from each family had a child together, but the true biological father was not recorded.
Through census record research, it was discovered that relatives of the two matches had lived
in the same small town when one of the cousins’ ancestors was conceived. This was the only
discovered geographical intersection between these families. Based on the amount of shared
DNA, it was postulated that Match #2’s relative was the unrecorded biological father of the
cousins’ ancestor (Figure 9). Under this hypothesis, the cousins would actually be half cousins
to Match #1, which matched the amount of shared DNA. They would also be related to Match
#2 at the appropriate genetic distance.
Figure 9: Pedigree for two cousins of Match #1 who were identified as persons-of-interest in the Tacoma
case, showing the apparent misattributed paternity between Match #1’s relative and Match #2’s relative.
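The reasoning behind the half-cousin hypothesis can be sketched as a comparison between observed and expected sharing. The Python example below is a simplified screening heuristic, not the analysis performed in this case: the cousin level, the observed fraction, and the function name are all hypothetical, and real sharing varies widely around the expectation.

```python
import math


def closest_hypothesis(observed_fraction, hypotheses):
    """Return the relationship whose expected sharing is closest to the
    observed sharing, compared on a log2 scale (one unit = one degree)."""
    return min(
        hypotheses,
        key=lambda name: abs(math.log2(observed_fraction)
                             - math.log2(hypotheses[name])),
    )


# Expected shared fraction is (1/2)^degree; a half relationship adds a degree.
hypotheses = {
    "full second cousin": 0.5 ** 5,
    "half second cousin": 0.5 ** 6,
}
observed = 0.017  # hypothetical: roughly half the full-cousin expectation
best = closest_hypothesis(observed, hypotheses)
```

With these illustrative numbers the half relationship is the better fit, mirroring the situation where documented cousins share only about half the expected DNA and an unrecorded biological parent becomes the working hypothesis.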
Resolution: The genetic genealogy analysis identified a pair of brothers who could be the
Subject, neither of whom had ever been arrested for a crime that would have required
submission of DNA to a database. Officers were eventually able to follow one of the brothers,
Gary Charles Hartman, into a restaurant, where they obtained a napkin he had used and
discarded. Traditional STR analysis showed that the DNA on the napkin matched the DNA
found at the crime scene. More than thirty years after Michella Welch was found murdered in a
Washington park, investigators announced that they had arrested a suspect in her murder.
Hartman is currently awaiting trial.
Case Study #3: Nearly 40-year-old cold case (homicide)
When there are not enough strong matches in GEDmatch to fully narrow down the possible
branches of a large family tree, cases cannot always be resolved efficiently through genetic
genealogy alone. If an intersection between the matches’ families cannot be found, the number
of possible identities for the Subject can be very large. However, as this case study shows, if
family members of the matches are willing to cooperate, targeted kinship testing can quickly
include or exclude various branches of the family tree and thus arrive at a small number of
included individuals. Because close relatives of the suspect were eventually found in this
investigation, the details of this case are withheld to protect their privacy.
GEDmatch: The Subject’s top two matches were both in the 6th-8th degree relative range and
had no shared DNA between them, meaning they were most likely related to the Subject on
different branches of his family tree. There were also additional, more distant matches.
Family Trees: Trees were built for the two top matches back to their great-great-grandparents,
but no intersection was found between the two families. The Subject was most likely a great-
grandson or great-great-grandson of one of Match #1’s great-great-grandparent couples, but
without triangulation, it was not possible to narrow his identity down further. Parabon
recommended more research to identify branches of the family that might have moved to the
area of the crime, as well as targeted kinship testing of members of the top match’s family.
Kinship Testing: The investigating agency obtained a voluntary buccal swab from a cousin on
Match #1’s paternal side, from which DNA was extracted, genotyped, and compared to the
Subject. Snapshot Kinship Inference predicted this individual was unrelated to the Subject, and
Match #1’s paternal family could therefore likely be excluded (assuming the familial
relationships on paper were correct). The agency then obtained a voluntary buccal swab from a
cousin on Match #1’s maternal side, who was predicted with 94.2% confidence to be a 3rd
degree relative (first cousin or genetic equivalent) to the Subject.
Targeted Family Trees: The analyst built family trees for the spouses of each of the kinship
tester’s maternal aunts and uncles back to their great-great-great-grandparents. One uncle’s
wife was determined to be a distant cousin to many of the Subject’s more distant matches. This
triangulation meant that one of the male children of this couple was most likely the Subject, as
he would be related to the GEDmatch matches on both sides of his family tree: second cousins
once-removed (6th degree relatives) to Match #1 and distant cousins (ranging from third
cousins once-removed to fifth cousins once-removed) to Distant Matches #1-7 (Figure 10).
Importantly, barring additional independent intersections between these family trees, the
identified Persons of Interest were the only individuals who were related to both of these
families. These children were also the right age at the time of the crime, lived nearby, and all
appeared to have phenotypes consistent with the Snapshot predictions.
Figure 10: Pedigree built for Match #1’s family after the possible branches leading to the Subject were
narrowed down through targeted kinship testing and subsequent triangulation with distant matches.
Resolution: The genetic genealogy analysis identified a set of brothers who could be the
Subject, none of whom had ever been arrested for a crime that would have required submission
of DNA to a database. Officers were eventually able to narrow the investigation down to a
single brother and match his DNA to the crime scene DNA using traditional STR analysis. He
has been arrested and is awaiting trial.
Conclusions
Genetic genealogy has been called “2018’s biggest contribution to crime science” (Augenstein,
2018) and is rapidly changing the face of cold case investigations. Even for perpetrators who
are completely under the radar or long dead, given DNA from a crime scene, it may be possible
to identify them with genetic genealogy. Importantly, genetic genealogy has just as much power
to generate leads in active cases as in cold cases. In fact, it was recently used to identify a
perpetrator in a sexual assault case that had occurred only three months earlier (Havens, 2019),
and he has since pled guilty. Rather than wait until years have passed and all other leads have
been exhausted, investigators now have access to innovative forensic DNA technologies that
can generate significant new leads and prevent cases from going cold. Looking to the future,
genetic genealogy has the potential to significantly reduce the number of unsolved cold cases in
North America while also reducing the rate at which cases go cold.
References
Augenstein, S. (2018). Working Backward From Genealogy: Tracking a Dead Killer’s Trail.
Forensic Magazine.
Ball, C., Barber, M., Byrnes, J., Carbonetto, P., Chahine, K., Curtis, R., . . . Willmore, L. (2016).
Ancestry DNA Matching White Paper. Retrieved from
https://www.ancestry.com/corporate/sites/default/files/AncestryDNA-Matching-White-
Paper.pdf
Bettinger, B. T., & Perl, J. (2018). The Shared cM Project 3.0 tool v4. Retrieved from
https://dnapainter.com/tools/sharedcmv4
Erlich, Y., Shor, T., Pe'er, I., & Carmi, S. (2018). Identity inference of genomic data using long-
range familial searches. Science, 362(6415), 690-694. doi:10.1126/science.aau4832
GEDmatch.com. Terms of Service and Privacy Policy.
Greytak, E., & Moore, C. (2018). Closing Cases with a Single SNP Array: Integrated Genetic
Genealogy, DNA Phenotyping, and Kinship Analyses. Proceedings of the 29th
International Symposium on Human Identification.
Greytak, E. M., & Armentrout, S. (2015). DNA Phenotyping: Predicting Ancestry and Physical
Appearance from Forensic DNA. Proceedings of the 26th International Symposium on
Human Identification.
Greytak, E. M., Gorden, E. M., Marshall, C. K., Sturk-Andreaggi, K., McMahon, T. P., &
Armentrout, S. L. (2017). SNP Recovery from Degraded Samples for Kinship
Assessment.
Greytak, E. M., Kaye, D. H., Budowle, B., Moore, C., & Armentrout, S. L. (2018). Privacy and
genetic genealogy data. Science, 361(6405), 857. doi:10.1126/science.aav0330
Greytak, E. M., Moore, C., & Armentrout, S. L. (2018). RE: Identity inference of genomic data
using long-range familial searches, Erlich et al. Science, 362(6415) (2018), 690-694
(eLetter, 10-29-18).
Guerrini, C. J., Robinson, J. O., Petersen, D., & McGuire, A. L. (2018). Should police have
access to genetic genealogy databases? Capturing the Golden State Killer and other
criminals using a controversial new forensic technique. PLOS Biology, 16(10),
e2006906. doi:10.1371/journal.pbio.2006906
Havens, E. (2019). Elderly woman in home invasion rape case: I forgive my attacker. St.
George Spectrum & Daily News. Retrieved from
https://www.thespectrum.com/story/news/2019/02/26/elderly-woman-home-invasion-
rape-case-forgive-my-attacker/2995143002/
Henn, B. M., Hon, L., Macpherson, J. M., Eriksson, N., Saxonov, S., Pe'er, I., & Mountain, J. L.
(2012). Cryptic distant relatives are common in both isolated and cosmopolitan genetic
samples. PLoS One, 7. doi:10.1371/journal.pone.0034267
Huff, C. D., Witherspoon, D. J., Simonson, T. S., Xing, J., Watkins, W. S., Zhang, Y., . . . Jorde,
L. B. (2011). Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome
Research, 21, 768-774. doi:10.1101/gr.115972.110
International Society of Genetic Genealogy (2019). Endogamy. Accessed January 30, 2019.
Retrieved from https://isogg.org/wiki/Endogamy
Keating, B., Bansal, A. T., Walsh, S., Millman, J., Newman, J., Kidd, K., . . . Kayser, M. (2013).
First all-in-one diagnostic tool for DNA intelligence: genome-wide inference of
biogeographic ancestry, appearance, relatedness, and sex with the Identitas v1 Forensic
Chip. International Journal of Legal Medicine, 127, 559-572. doi:10.1007/s00414-012-
0788-1
Manichaikul, A., Mychaleckyj, J. C., Rich, S. S., Daly, K., Sale, M., & Chen, W.-M. (2010).
Robust relationship inference in genome-wide association studies. Bioinformatics, 26,
2867-2873. doi:10.1093/bioinformatics/btq559
Milian, J. (2018). Cold-case murders, rapes cracked by Lake Worth genealogy website. The
Palm Beach Post. Retrieved from https://www.palmbeachpost.com/news/20181129/cold-
case-murders-rapes-cracked-by-lake-worth-genealogy-website
Regalado, A. (2019). More than 26 million people have taken an at-home ancestry test. MIT
Technology Review. Retrieved from https://www.technologyreview.com/s/612880/more-
than-26-million-people-have-taken-an-at-home-ancestry-test/
... Searching these platforms using profiles from biological samples recovered in criminal investigations can help identify relatives of potential offenders, leading to additional genealogical research and ultimately the identification of a suspect whose DNA can be compared to crime samples [106]. Nevertheless, this approach has raised concerns about data privacy and ethics [107]. ...
Article
Full-text available
Forensic genetics, leveraging molecular tools and scientific applications, has witnessed significant advancements in DNA analysis over the last three decades. These progressions have enhanced the discrimination power, speed, and sensitivity of DNA profiling methods, enabling the analysis of challenging samples. This article explores the significance of forensic genetics in criminal investigations, traces the historical evolution of DNA analysis techniques, and presents recent developments in the field. The article aims to provide a comprehensive understanding of the crucial role of forensic genetics in criminal investigations and sheds light on the latest trends and breakthroughs in this area. The evolution of DNA typing from ABO blood typing to the current standard of short tandem repeat (STR) analysis is discussed, along with alternative DNA analysis methods, such as Y-chromosome analysis and single nucleotide polymorphism (SNP) typing. Massively parallel sequencing (MPS) represents a groundbreaking advancement, enabling whole genome sequencing and addressing complex cases. The article also covers recent innovations, including DNA methylation analysis, body fluid identification, forensic DNA phenotyping, and genetic genealogy, highlighting their potential benefits in forensic investigations. Despite these advancements, standard STR profiling remains the gold standard due to its established protocols and databases. Ethical considerations regarding data privacy and cost implications are crucial as these technologies continue to progress in their pursuit of justice.
... Searching these platforms using profiles from biological samples recovered in criminal investigations can help identify relatives of potential offenders, leading to additional genealogical research and ultimately the identification of a suspect whose DNA can be compared to crime samples [106]. Nevertheless, this approach has raised concerns about data privacy and ethics [107]. ...
Article
Full-text available
Forensic genetics, leveraging molecular tools and scientific applications, has witnessed significant advancements in DNA analysis over the last three decades. These progressions have enhanced the discrimination power, speed, and sensitivity of DNA profiling methods, enabling the analysis of challenging samples. This article explores the significance of forensic genetics in criminal investigations, traces the historical evolution of DNA analysis techniques, and presents recent developments in the field. The article aims to provide a comprehensive understanding of the crucial role of forensic genetics in criminal investigations and sheds light on the latest trends and breakthroughs in this area. The evolution of DNA typing from ABO blood typing to the current standard of short tandem repeat (STR) analysis is discussed, along with alternative DNA analysis methods, such as Y-chromosome analysis and single nucleotide polymorphism (SNP) typing. Massively parallel sequencing (MPS) represents a groundbreaking advancement, enabling whole genome sequencing and addressing complex cases. The article also covers recent innovations, including DNA methylation analysis, body fluid identification, forensic DNA phenotyping, and genetic genealogy, highlighting their potential benefits in forensic investigations. Despite these advancements, standard STR profiling remains the gold standard due to its established protocols and databases. Ethical considerations regarding data privacy and cost implications are crucial as these technologies continue to progress in their pursuit of justice.
... The ascendancy search often results in pointing to more than one person, thus several lineages need to be subsequently checked and revised. The process of finding descendants typically ends once a currently living generation is reached to narrow the number of potential leads (44). At this point the results are given to the legal authority that needs to undertake classical investigation -if a link to specific individual is confirmed, a reference sample needs to be legally collected and analyzed with standard short tandem repeat set to assure high confidence of a genetic match. ...
Article
Full-text available
Forensic genetic genealogy (FGG) benefits largely from popularity of genealogical research within (mostly) American society and the advent of new sequencing techniques that allow typing of challenging forensic samples. It is considered a true breakthrough for both active and especially cold cases where all other resources and methods have failed during investigation. Despite media coverage generally highlighting its powers, the method itself is considered very laborious and the investigation may easily got suspended at every stage due to many factors including no hits in the database or breaks in traceable lineages within the family tree. This review summarizes the scope of FGG use, mentions most concerns and misconceptions associated with the technique and points to the plausible solutions already suggested. It also brings together current guidelines and regulations intended to be followed by law enforcement authorities wishing to utilize genetic genealogy research.
Article
Objective To compare the differences in the haplogroup classification and population discrimination abilities of five Y-SNP panels based on next-generation sequencing and to provide references for the forensic application of these panels. Methods The haplogroup classification and genetic structure analysis of 1,600 samples from the high-depth sequencing project of 1,000 Human Genomes were analyzed. The Y-SNP loci, Y-InDel loci, and haplogroup information provided by the five published panels were used to perform statistical analysis. and Results There were obvious differences in the proportion of haplogroups and haplogroup resolution among different panels. The 639 Y-SNP panel has the highest average resolution for the haplogroups C, D and O, and the CSYseq panel has the highest average resolution for the haplogroups N and Q. The three panels developed for Chinese populations showed higher population specificity, among which the 639 Y-SNP panel showed a highest resolution level in the three Chinese populations and was closest to the fineness level of the ISOGG 2019 phylogenetic tree. Except the 859 Y-SNP panel, other four panels exhibited good discrimination abilities for Asian populations. Conclusion The three Y-SNP panels developed by Chinese research groups showed outstanding
Article
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center around human evolution and past history, and forensic genetics is often more concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between fields toward mutually beneficial methodological advancements. However, most have been centered around wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and considerations to consider when working with degraded DNA. We hope this review provides a framework to researchers new to computational workflows for how to think about analyzing highly degraded DNA and prompts an increase of collaboration between the forensic genetics and aDNA fields.
Article
This article investigates the innovative fusion of cellular physiology and forensic anthropometry, revealing a promising synergy for increasing investigative capacities, particularly in resource-constrained environments. To gain essential information about identifying people, morphological analysis has traditionally been utilized in forensic anthropometry, a well-established field that measures human skeletal remains. To better understand post-mortem cellular alterations, new forensic investigational paths have been made possible by recent developments in cellular physiology. This interdisciplinary approach has great promise in low- and middle-income countries (LMICs), where forensic resources are frequently scarce. Investigators can gather crucial information about the post-mortem period, the cause of death, and probable signs of trauma or pathology thanks to the integration of cellular physiology with forensic anthropometry. This strategy utilises methods like immunohistochemistry, gene expression analysis, and cellular bioinformatics to enable a more nuanced and precise reconstruction of events leading up to death. The actual use of this interdisciplinary paradigm in LMIC environments is elucidated in this paper. It tackles issues with accessibility to technology, training, and infrastructure while suggesting flexible ways to close existing gaps. This strategy is positioned to strengthen investigative capacities in areas where they are most urgently required by stressing cost-effective methodologies and the use of open-source resources.We show the concrete effect of this integration on actual investigations through a number of case studies and comparative analyses. When compared to using only standard anthropometric methods, the results reveal a significant improvement in accuracy, precision, and efficiency. 
Additionally, this interdisciplinary strategy may make it easier to create extensive forensic databases, enabling more effective identification attempts and aiding in the solving of unsolved cold cases.
Article
In cases where human remains are unidentified because there is no initial identification hypothesis, limited contextual information, and/or poor preservation, radiocarbon (14 C) dating may be a useful tool to further assist with identification. Through measuring the amount of 14 C remaining in organic material, such as bone, teeth, nail, or hair, radiocarbon dating may provide an estimated year of birth and year of death for a deceased person. This information, may assist in, establishing whether a case of unidentified human remains (UHR) is actually of medicolegal significance and therefore, requires forensic investigation and identification. This case series highlights the application of 14 C dating to seven of the 132 UHR cases in Victoria, Australia. Cortical bone was sampled from each case and the level of 14 C was measured to provide an estimated year of death. Four of the seven cases analyzed contained the levels of 14 C consistent with an archeological timeframe, one contained a level of 14 C consistent with a modern (i.e., of medicolegal significance) timeframe, and the results for the remaining two samples were inconclusive. Applying this technique not only reduced the number of UHR cases in Victoria but also has investigative, cultural, and practical implications for medicolegal casework in general.
Article
Since the arrest of the Golden State Killer in the US in April 2018, forensic geneticists have been increasingly interested in the investigative genetic genealogy (IGG) method. While this method has already been in practical use as a powerful tool for criminal investigation, we have yet to know well the limitations and potential risks. In this current study, we performed an evaluation study focusing on degraded DNA using the Affymetrix Genome-Wide Human SNP Array 6.0 platform (Thermo Fisher Scientific). We revealed one of the potential problems that occur during SNP genotype determination using a microarray-based platform. Our analysis results indicated that the SNP profiles derived from degraded DNA contained many false heterozygous SNPs. In addition, it was confirmed that the total amount of probe signal intensity on microarray chips derived from degraded DNA decreased significantly. Because the conventional analysis algorithm performs normalization during genotype determination, we concluded that noise signals could be genotype-called. To address this issue, we proposed a novel microarray data analysis method without normalization (nMAP). Although the nMAP algorithm resulted in a low call rate, it substantially improved genotyping accuracy. Finally, we confirmed the usefulness of the nMAP algorithm for kinship inferences. These findings and the nMAP algorithm will make a contribution to the advance of the IGG method.
On April 24, 2018, a suspect in California’s notorious Golden State Killer cases was arrested after decades of eluding the police. Using a novel forensic approach, investigators identified the suspect by first identifying his relatives using a free, online genetic database populated by individuals researching their family trees. In the wake of the case, media outlets reported privacy concerns with police access to personal genetic data generated by or shared with genealogy services. Recent data from 1,587 survey respondents, however, provide preliminary reason to question whether such concerns have been overstated. Still, limitations on police access to genetic genealogy databases in particular may be desirable for reasons other than current public demand for them.
When a forensic DNA sample cannot be associated directly with a previously genotyped reference sample by standard short tandem repeat profiling, the investigation required for identifying perpetrators, victims, or missing persons can be both costly and time consuming. Here, we describe the outcome of a collaborative study using the Identitas Version 1 (v1) Forensic Chip, the first commercially available all-in-one tool dedicated to the concept of developing intelligence leads based on DNA. The chip allows parallel interrogation of 201,173 genome-wide autosomal, X-chromosomal, Y-chromosomal, and mitochondrial single nucleotide polymorphisms for inference of biogeographic ancestry, appearance, relatedness, and sex. The first assessment of the chip’s performance was carried out on 3,196 blinded DNA samples of varying quantities and qualities, covering a wide range of biogeographic origin and eye/hair coloration as well as variation in relatedness and sex. Overall, 95 % of the samples (N = 3,034) passed quality checks with an overall genotype call rate >90 % on variable numbers of available recorded trait information. Predictions of sex, direct match, and first to third degree relatedness were highly accurate. Chip-based predictions of biparental continental ancestry were on average ~94 % correct (further support provided by separately inferred patrilineal and matrilineal ancestry). Predictions of eye color were 85 % correct for brown and 70 % correct for blue eyes, and predictions of hair color were 72 % for brown, 63 % for blond, 58 % for black, and 48 % for red hair. From the 5 % of samples (N = 162) with <90 % call rate, 56 % yielded correct continental ancestry predictions while 7 % yielded sufficient genotypes to allow hair and eye color prediction. Our results demonstrate that the Identitas v1 Forensic Chip holds great promise for a wide range of applications including criminal investigations, missing person investigations, and for national security purposes. 
Electronic supplementary material: The online version of this article (doi:10.1007/s00414-012-0788-1) contains supplementary material, which is available to authorized users.
Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high-density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100-300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and 'unrelated' population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large 'unrelated' population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.
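To give a sense of why distant cousinships are hard to detect, the following minimal sketch computes the expected fraction of autosomal DNA shared IBD by full nth cousins under the standard pedigree expectation (two shared ancestors, each connected by 2(n + 1) meioses). This formula is textbook genetics rather than something taken from the abstract above, the function name `expected_shared_fraction` is hypothetical, and the values are averages for an outbred pedigree: real pairs vary widely around them, and endogamous populations share considerably more.

```python
from fractions import Fraction


def expected_shared_fraction(cousin_degree: int) -> Fraction:
    """Expected fraction of autosomal DNA shared IBD by full nth cousins.

    Assumes an outbred pedigree with exactly two shared ancestors
    (a couple) and no other relationship between the two people.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Each shared ancestor is separated from the pair by 2 * (n + 1) meioses.
    meioses = 2 * (cousin_degree + 1)
    # Two shared ancestors, each contributing (1/2) ** meioses.
    return 2 * Fraction(1, 2 ** meioses)


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 9):
        frac = expected_shared_fraction(n)
        print(f"cousin degree {n}: {frac} (about {float(frac):.5%})")
```

The expectation halves twice with every additional cousin degree, which is consistent with the abstract's point that dense genome-wide data are needed to reach 2nd to 9th cousins.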
Accurate estimation of recent shared ancestry is important for genetics, evolution, medicine, conservation biology, and forensics. Established methods estimate kinship accurately for first-degree through third-degree relatives. We demonstrate that chromosomal segments shared by two individuals due to identity by descent (IBD) provide much additional information about shared ancestry. We developed a maximum-likelihood method for the estimation of recent shared ancestry (ERSA) from the number and lengths of IBD segments derived from high-density SNP or whole-genome sequence data. We used ERSA to estimate relationships from SNP genotypes in 169 individuals from three large, well-defined human pedigrees. ERSA is accurate to within one degree of relationship for 97% of first-degree through fifth-degree relatives and 80% of sixth-degree and seventh-degree relatives. We demonstrate that ERSA's statistical power approaches the maximum theoretical limit imposed by the fact that distant relatives frequently share no DNA through a common ancestor. ERSA greatly expands the range of relationships that can be estimated from genetic data and is implemented in a freely available software package.
Genome-wide association studies (GWASs) have been widely used to map loci contributing to variation in complex traits and risk of diseases in humans. Accurate specification of familial relationships is crucial for family-based GWAS, as well as in population-based GWAS with unknown (or unrecognized) family structure. The family structure in a GWAS should be routinely investigated using the SNP data prior to the analysis of population structure or phenotype. Existing algorithms for relationship inference have a major weakness of estimating allele frequencies at each SNP from the entire sample, under a strong assumption of homogeneous population structure. This assumption is often untenable. Here, we present a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows the presence of unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). We present simulation experiments to demonstrate that the algorithm has sufficient power to provide reliable inference on millions of unrelated pairs and thousands of relative pairs (up to 3rd-degree relationships). Application of our robust algorithm to HapMap and GWAS datasets demonstrates that it performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. Our extremely efficient implementation performs relationship inference on millions of pairs of individuals in a matter of minutes, dozens of times faster than the most efficient existing algorithm known to us. Our robust relationship inference algorithm is implemented in a freely available software package, KING, available for download at http://people.virginia.edu/~wc9c/KING.
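As an illustration of how an estimated kinship coefficient maps to a degree of relationship, the sketch below bins a coefficient using cutoffs of the form 2^-(d + 1.5), which correspond to the commonly cited KING ranges (above 0.354 for duplicates or monozygotic twins, 0.177 to 0.354 for 1st-degree, 0.0884 to 0.177 for 2nd-degree, 0.0442 to 0.0884 for 3rd-degree). These cutoffs are not stated in the abstract above and are assumed here from the KING documentation. The function `classify_kinship` is hypothetical and is not part of the KING software, and it does not estimate the coefficient itself.

```python
# Lower bounds on the kinship coefficient, highest first.
# Assumed cutoffs of the form 2 ** -(d + 1.5); not taken from the abstract.
KINSHIP_LOWER_BOUNDS = [
    (2 ** -1.5, "duplicate or monozygotic twin"),  # about 0.354
    (2 ** -2.5, "1st-degree"),                     # about 0.177
    (2 ** -3.5, "2nd-degree"),                     # about 0.0884
    (2 ** -4.5, "3rd-degree"),                     # about 0.0442
]


def classify_kinship(kinship_coefficient: float) -> str:
    """Return a relationship-degree label for an estimated kinship coefficient."""
    for lower_bound, label in KINSHIP_LOWER_BOUNDS:
        if kinship_coefficient > lower_bound:
            return label
    return "more distant or unrelated"


if __name__ == "__main__":
    # Expected coefficients: 0.25 parent-child or full siblings,
    # 0.125 half siblings, 0.0625 first cousins.
    for phi in (0.5, 0.25, 0.125, 0.0625, 0.01):
        print(f"{phi}: {classify_kinship(phi)}")
```

The bins stop at 3rd-degree relatives, in line with the power the abstract reports; more distant relationships of the kind used in genetic genealogy generally require segment-based IBD methods instead.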
Detecting familial matches: Recent advances in DNA technology and companies that provide array-based testing have led to services that collect, share, and analyze volunteered genomic information. Privacy concerns have been raised, especially in light of the use of these services by law enforcement to identify suspects in criminal cases. Testing models of relatedness, Erlich et al. show that many individuals of European ancestry in the United States, even those who have not undergone genetic testing, can be identified on the basis of available genetic information. These results indicate a need for procedures to help maintain genetic privacy for individuals. Science, this issue p. 690
Greytak, E. M., & Armentrout, S. (2015). DNA Phenotyping: Predicting Ancestry and Physical Appearance from Forensic DNA. Proceedings of the 26th International Symposium on Human Identification.
Greytak, E. M., Moore, C., & Armentrout, S. L. (2018). RE: Identity inference of genomic data using long-range familial searches, Erlich et al. Science, 362(6415) (2018), 690-694 (eLetter, 10-29-18).
Havens, E. (2019). Elderly woman in home invasion rape case: I forgive my attacker. St. George Spectrum & Daily News. Retrieved from https://www.thespectrum.com/story/news/2019/02/26/elderly-woman-home-invasionrape-case-forgive-my-attacker/2995143002/