Author Manuscript, published in final form as:
Greytak EM, Moore C, & Armentrout SL (2019). Genetic genealogy for cold case and active
investigations. Forensic Science International, 299, 103113. doi: 10.1016/j.forsciint.2019.03.039
Genetic genealogy for cold case and active investigations
Ellen M. Greytak, CeCe Moore, Steven L. Armentrout
Parabon NanoLabs, Inc., 11260 Roger Bacon Dr. Suite 406, Reston, VA, 20190, USA
Highlights
Genetic genealogy is helping to close both cold and active investigations
Forensic DNA is uploaded to public genetic genealogy databases to find relatives
Extensive genealogy and descendancy research generate a list of possible identities
Many complicating factors can impede the research
Identity is narrowed down using a wide range of information & confirmed using STRs
Abstract
Investigative genetic genealogy has rapidly emerged as a highly effective tool for using DNA to
determine the identity of unknown individuals (unidentified remains or perpetrators), generating
identifications in dozens of law enforcement cases, both cold and active. The amount of press
coverage of these cases may have given the impression that the analysis is straightforward and
the outcome guaranteed once a sample is uploaded to a database. However, the database
query results serve only as clues from which in-depth genealogy and descendancy research
must proceed to determine the possible identities of an unknown individual. While there
certainly will be more announcements of cases solved using this new technique, there are many
more cases where identification has not yet been possible due to the wide variety of
complications present in these investigations. This paper lays out the fundamentals of genetic
genealogy, along with the challenges that are encountered in many of these investigations, and
concludes with a set of case studies that demonstrate the variety of cases encountered thus far.
Keywords: Genetic genealogy; Forensic genetics; DNA; SNPs; Cold cases; Human
identification
Introduction
Traditional genealogy has been practiced for centuries, using documentary records and oral
histories to trace families backwards in time. Until recently, these were the only ways to
connect extended family members, but with the advent of direct-to-consumer (DTC) genetic
testing, it is now possible to find relatives through shared DNA. This has enabled thousands of
individuals who have lost their biological identity through adoption, abandonment, anonymous
gamete donation, misattributed parentage, etc., to regain their genetic heritage. More recently,
these same tools have been used to identify DNA from suspected perpetrators in more than
thirty law enforcement cases, only some of which have been publicly announced (Table 1).
Table 1: Cases for which law enforcement agencies have announced identification of DNA from a
suspected perpetrator with the aid of genetic genealogy (through 1/31/19). * Deceased; ** Pled guilty
# | Location | Case | Year(s) | Identified As | Date Announced | Genetic Genealogist
1 | California | Multiple Homicides and Sexual Assaults - “Golden State Killer” | 1974-1986 | Joseph James DeAngelo | April 24, 2018 | Barbara Rae-Venter
2 | Snohomish County, WA | Double Homicide of Jay Cook (20) and Tanya Van Cuylenborg (18) | 1987 | William Earl Talbott II | May 21, 2018 | Parabon
3 | Tacoma, WA | Homicide of Michella Welch (12) | 1986 | Gary Charles Hartman | June 20, 2018 | Parabon
4 | Lancaster, PA | Homicide of Christy Mirack (25) | 1992 | Raymond Charles Rowe** | June 25, 2018 | Parabon
5 | Brazos County, TX | Homicide of Virginia Freeman (40) | 1980 | James Otto Earhart* | June 25, 2018 | Parabon
6 | Fort Wayne, IN | Homicide of April Tinsley (8) | 1988 | John Dale Miller** | July 15, 2018 | Parabon
7 | Woonsocket, RI | Homicide of Constance Gauthier (81) | 2016 | Matthew Norman Dessault | July 18, 2018 | Parabon
8 | St. George, UT | Sexual Assault of Carla Brooks (79) | 2018 | Spencer Glen Monnett** | July 28, 2018 | Parabon
9 | Fayetteville, NC | Multiple Sexual Assaults - “Ramsey Street Rapist” | 2006-2008 | Darold Wayne Bowden | August 22, 2018 | Parabon
10 | Champaign County, IL | Homicide of Holly Cassano (22) | 2009 | Michael F. A. Henslick | August 29, 2018 | Parabon
11 | Montgomery County, MD | Multiple Sexual Assaults | 2007-2011 | Marlon Michael Alexander | September 14, 2018 | Parabon
12 | Sarasota, FL | Homicide of Deborah Dalzell (47) | 1999 | Luke Edward Fleming | September 19, 2018 | Parabon and Barbara Rae-Venter
13 | California | Multiple Sexual Assaults - “NorCal Rapist” | 1991-2006 | Roy Waller | September 21, 2018 | Law Enforcement
14 | Greenville, SC; Memphis, TN; Portageville, MO | Multiple Homicides and Sexual Assaults | 1990-1998 | Robert Eugene Brashers* | October 5, 2018 | Parabon
15 | Starkville, MS | Double Homicide of Betty Jones (65) and Kathryn Crigler (81) | 1990 | Michael W. DeVaughn | October 8, 2018 | Parabon
16 | Greenbrier, AR | Homicide of Pam Felkins (32) | 1990 | Edward Keith Renegar* | October 29, 2018 | Parabon
17 | Fulton County, GA | Homicide of Lorrie Ann Smith (28) | 1997 | Jerry Lee | November 1, 2018 | Parabon
18 | Anne Arundel County, MD | Homicide of Michael Temple (29) | 2010 | Fred Lee Frampton, Jr. | November 2, 2018 | Parabon
19 | Orlando, FL | Homicide of Christine Franke (25) | 2001 | Benjamin L. Holmes | November 5, 2018 | Parabon & Florida Dept. of Law Enforcement
20 | Carlsbad, CA | Homicide of Jodine Serrin (39) | 2007 | David Mabrito* | November 13, 2018 | Parabon and Barbara Rae-Venter
21 | Santa Clara, CA | Homicide of Leslie Marie Perlov (21) | 1973 | John Arthur Getreu | November 21, 2018 | Parabon
22 | College Station, TX | Multiple Sexual Assaults | 2018 | Christopher Quinn Williams | December 12, 2018 | Parabon
23 | Cedar Rapids, IA | Homicide of Michelle Martinko (18) | 1979 | Jerry Lynn Burns | December 19, 2018 | Parabon
24 | Hernando County, FL | Sexual Assault of Unnamed Victim (12) | 1983 | William L. Nichols* | January 10, 2019 | Parabon
25 | Orange County, CA | Sexual Assaults of Two Unnamed Victims (9 and 31) | 1995 & 1998 | Kevin Konther | January 11, 2019 | Law Enforcement
26 | La Mesa, CA | Homicide of Scott Martinez (47) | 2006 | Zachary Aaron Bunney | January 24, 2019 | Parabon
27 | Fremont, CA | Homicide of Jack Upton (30) | 1990 | Russell Guerrero | January 24, 2019 | Parabon
28 | Portland, OR | Homicide of Anna Marie Hlavka (20) | 1979 | Jerry Walter McFadden* | January 31, 2019 | Parabon
Generating Data
Unlike traditional forensic DNA analysis, which uses autosomal short tandem repeats (STRs) to
generate an identity profile from ~20 loci, genetic genealogy uses hundreds of thousands of
single nucleotide polymorphisms (SNPs) spread across the autosomes. Participants in genetic
genealogy have had their DNA tested by a direct-to-consumer (DTC) genetic testing company,
such as 23andMe or AncestryDNA, which use microarrays to genotype up to ~1 million SNPs.
DTC companies obtain DNA from spit kits or cheek swabs and thus always have a large amount
of high-quality single-source DNA to work with. Forensic DNA samples, on the other hand,
often only have a small amount of degraded DNA, which may be mixed with DNA from one or
more other individuals. Microarray genotyping has previously been shown to be effective and
accurate with forensic samples (Keating et al., 2013), and Parabon has used it for casework
since 2015, generating high genotyping call rates from forensic samples down to 1 ng of DNA
(Table 2). Parabon has also found it is possible to accurately deconvolute microarray data from
two-person mixtures, as long as the person-of-interest is at least 40% of the mixture and a
single-source reference sample from the second contributor is available.
Table 2: Summary of Parabon’s >250 forensic DNA samples used in genetic genealogy casework and
the resulting microarray genotyping call rates.
Type | Share of samples
Semen | 48.0%
Blood | 24.6%
Tissue | 10.1%
Saliva | 7.7%
Bone | 4.8%
Touch | 4.8%

Single Source | 79.4%
Low Mixture | 16.4%
High Mixture (Deconvoluted) | 4.2%

Quantity | Share of samples
<2.5 ng | 22.7%
2.5-5 ng | 12.6%
5-10 ng | 13.0%
10-20 ng | 17.8%
20-40 ng | 27.1%
40-80 ng | 3.2%
>80 ng | 3.6%

Call Rate | Share of samples
>95% | 47.5%
90-95% | 12.2%
80-90% | 17.5%
70-80% | 6.1%
60-70% | 12.2%
<60% | 4.6%
Parabon’s casework currently uses the Illumina CytoSNP-850K array, an off-the-shelf chip that
contains >98% of the SNPs on the OmniExpress chip used by Ancestry.com, FamilyTreeDNA,
and MyHeritage. 23andMe previously also based their chip on the OmniExpress but has since
moved to smaller custom chips that overlap less with the other DTC companies. For law
enforcement cases, extracted DNA samples are processed at a CLIA-certified lab, and the data
is uploaded securely to Parabon.
Determining Relatedness from DNA
Given enough SNPs, it is possible to determine the degree of relatedness between two people,
which is defined by the expected amount of shared DNA, not the number of meioses (Figure 1).
Figure 1: Pedigree showing the degrees of relatedness, as defined by the expected amount of shared
DNA. Each relationship is defined with respect to the red “self / twin” box.
While several relationship inference methods had previously been proposed (Huff et al., 2011;
Manichaikul et al., 2010), 23andMe was the first DTC company to introduce an accurate,
scalable approach to inferring approximately how closely related two DNA samples are from
autosomal SNPs (Henn et al., 2012). Each person has two copies of each of the 22 autosomal
chromosomes (“autosomes”), one inherited from their mother and one inherited from their
father. Autosomes are not inherited intact from each parent; rather, each parent’s own pair of
chromosomes is randomly recombined into a new chromosome that is passed on to the child.
While recombination occurs randomly, nucleotides that are closer to one another on a
chromosome are more likely to be inherited together, while nucleotides that are far apart are
more likely to be separated by recombination. The probability of recombination between two
nucleotides is quantified as their genetic distance, which is measured in centimorgans (cM),
such that 1 cM equates to a 1% probability of recombination.
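The "1 cM equals a 1% probability" rule is accurate only over short distances; across longer stretches, several crossovers can occur and the per-meiosis recombination probability levels off at 50%. The sketch below illustrates this with Haldane's map function, a standard conversion that assumes crossovers occur independently (no interference). The paper does not specify a map function, so this choice and the function name are assumptions made for illustration.

```python
import math


def haldane_recombination_probability(distance_cm: float) -> float:
    """Probability that two loci are separated by recombination in one meiosis.

    Assumes Haldane's map function (independent crossovers, no interference):
    r = (1 - exp(-2d)) / 2, where d is the map distance in Morgans.
    """
    d_morgans = distance_cm / 100.0  # 100 cM = 1 Morgan
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


for cm in (1, 10, 50, 100):
    print(cm, round(haldane_recombination_probability(cm), 4))
```

At 1 cM the function returns about 0.0099, in line with the 1% rule of thumb. At 100 cM it returns roughly 0.43 rather than 1.0, because the probability can never exceed 0.5.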
Rather than simply looking at the total number of shared SNPs, genetic genealogy takes
advantage of the fact that recombination will break up long stretches of shared DNA over the
generations, such that more closely related people will share longer stretches of DNA
(“segments”) that are identical-by-descent (IBD) (Figure 2). The more recombination events
that have occurred, the shorter the shared IBD segments will be, so the number and length of
IBD segments in cM can be used to approximate the degree of relatedness.
Figure 2: Inheritance of DNA segments on a single chromosome. The lengths of the shared segments
(shaded boxes) are summed across all 22 autosomes to give the total amount of shared DNA.
To detect IBD segments, genetic genealogy algorithms search for regions of the genome where
two individuals share at least one allele at every SNP. To be counted, these segments must
contain a minimum number of SNPs (typically ~500) and be over a certain length (typically 5-7
cM), which screens out most segments that are shared by chance rather than due to common
descent. When summed across all autosomes, the amount of DNA shared IBD strongly
correlates with the degree of relatedness between two individuals, such that more distant
relatives tend to share less DNA (Table 3). However, due to the random nature of
recombination, the amount of shared DNA can vary greatly for relatives of the same degree,
and this variation increases with more recombination events, such that ~10% of third cousins
and ~50% of fourth cousins share no detectable IBD segments.
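The segment-finding rule described above can be sketched as follows. This is a simplified illustration and not the algorithm used by GEDmatch or any testing company. It assumes the following: genotypes are coded as 0, 1, or 2 copies of one allele; a run is broken only where the two people are opposite homozygotes (0 versus 2, so they share no allele); and genotyping errors and no-calls, which real tools must tolerate, are ignored. The function names and default thresholds (500 SNPs and 7 cM, taken from the typical ranges quoted above) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Segment:
    start: int   # index of the first SNP in the run
    end: int     # index of the last SNP in the run (inclusive)
    cm: float    # genetic length of the run in centimorgans


def half_identical_segments(g1: Sequence[int], g2: Sequence[int],
                            positions_cm: Sequence[float],
                            min_snps: int = 500, min_cm: float = 7.0) -> List[Segment]:
    """Find runs of SNPs where two people share at least one allele at every SNP.

    g1, g2: genotypes on one chromosome, coded as 0, 1 or 2 copies of one allele.
    positions_cm: sorted genetic-map position (cM) of each SNP.
    """
    segments = []
    run_start = 0
    n = len(g1)
    for i in range(n + 1):
        # Opposite homozygotes (0 vs 2) share no allele, so they end the run.
        if i == n or abs(g1[i] - g2[i]) == 2:
            run_end = i - 1
            if run_end >= run_start:
                n_snps = run_end - run_start + 1
                length_cm = positions_cm[run_end] - positions_cm[run_start]
                # Short runs are likely shared by chance rather than by descent.
                if n_snps >= min_snps and length_cm >= min_cm:
                    segments.append(Segment(run_start, run_end, length_cm))
            run_start = i + 1
    return segments


def total_shared_cm(segments_by_chromosome: Dict[str, List[Segment]]) -> float:
    """Sum segment lengths across chromosomes, as described for Figure 2."""
    return sum(seg.cm for segs in segments_by_chromosome.values() for seg in segs)


# Toy example with tiny thresholds so an 8-SNP "chromosome" can yield segments.
g1 = [0, 1, 2, 1, 0, 2, 1, 1]
g2 = [1, 1, 2, 0, 2, 2, 1, 0]
positions = [float(i) for i in range(8)]
segs = half_identical_segments(g1, g2, positions, min_snps=3, min_cm=2.0)
print(segs, total_shared_cm({"chr1": segs}))
```

With the default thresholds, the same toy data returns no segments at all. This mirrors how the minimum-length filter screens out short runs that are shared by chance.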
Table 3: The range of DNA shared by pairs of people with each relationship. While most pairs from a
given relationship fall within a narrower range, these values represent the full ranges that have been
observed (Ball et al., 2016).
cM Range | Degree | Relationship
3,600 | 1 | Parent-Child
2,000-3,600 | 1 | Full Sibling
1,060-2,500 | 2 | Half-Sibling, Avuncular, Double First Cousin, Grandparent / Grandchild
425-1,500 | 3 | First Cousin (1C), Half-Avuncular, Great-Grandparent / Great-Grandchild, Great-Avuncular
160-950 | 4 | First Cousin Once-Removed (1C1R), Half-First Cousin (½ 1C), Half-Great-Aunt/Uncle / Half-Great-Niece/Nephew
65-650 | 5 | Second Cousin (2C), First Cousin Twice-Removed (1C2R), Half-First Cousin Once-Removed (½ 1C1R)
0-375 | 6 | Second Cousin Once-Removed (2C1R), Half-Second Cousin (½ 2C), First Cousin Thrice-Removed (1C3R), Half-First Cousin Twice-Removed (½ 1C2R)
0-245 | 7 | Third Cousin (3C), Second Cousin Twice-Removed (2C2R)
0-185 | >7 | Third Cousin Once-Removed (3C1R), Distant Cousins
Genetic Genealogy Databases and Genetic Privacy
DTC genetic testing companies’ private databases have exploded in size, with AncestryDNA
currently containing nearly 15 million individuals, 23andMe containing nearly 10 million, and
MyHeritage and FamilyTreeDNA (FTDNA) together containing roughly 3.5 million (Regalado,
2019). AncestryDNA and 23andMe maintain their databases separately and are not accessible
to law enforcement, as the only way to submit a sample is via a cheek swab or spit kit.
MyHeritage and FTDNA both allow uploads of data generated from other sources, but law
enforcement usage of either requires written permission from the company, as well as a court
order for MyHeritage or “the required legal documentation” for FTDNA.
GEDmatch, on the other hand, is not a DTC company. It was created by Curtis Rogers and
John Olson in 2010 as a public database where individuals from different testing companies
could compare their DNA by downloading their raw data from a DTC company’s site and
uploading it to a common database. After the Golden State Killer suspect was identified through
surreptitious use of GEDmatch, the site’s administrators decided to explicitly allow law
enforcement usage. They posted a notice on the front page of the site (Figure 3) and also
updated their Terms of Service to state that law enforcement can use, and is using, GEDmatch to
identify remains and perpetrators of violent crimes, defined as homicides or sexual assaults
(GEDmatch.com). Both new and existing users were required to view these new Terms and
decide whether to accept them before using the site. Critics of genetic genealogy argue that
many people who joined the site prior to this update may not have considered the possibility that
their desire to locate relatives could lead to the discovery that they are related to someone
whose DNA is associated with a crime and to the apprehension of that relative. Indeed, it is
possible some of them still may be unaware of the new warning, and individuals who had their
data uploaded by another individual or have been inactive on the site may not have reviewed
the new Terms to decide whether to consent. However, even prior to implementing these new
Terms, GEDmatch’s Terms clearly stated that any data set to “public” would be searchable by
anyone. The law has generally allowed information made available to the public to be used in
criminal investigations. Users can easily have their data set to “private,” hiding it from all search
queries, or removed entirely. Thus, the DNA data files in a public database like GEDmatch
come from individuals who have proactively downloaded their data from a private DNA testing
company’s website, uploaded the information to a public website, reviewed the Terms of
Service that permits law enforcement usage, and opted in to public comparisons against their
data.
Figure 3: Notice posted on GEDmatch’s homepage after the site’s use in the Golden State Killer
investigation was made public.
Additionally, no sensitive genetic information is disclosed to law enforcement during a genetic
genealogy search, as the raw genetic data from GEDmatch users is not accessible. Raw
genetic data can contain sensitive health-related information, and this type of private genetic
information should be protected. In keeping with this precept, no raw genotypes are displayed
or made available for download by GEDmatch. GEDmatch simply performs comparisons
among samples, returning the lengths and chromosomal locations of shared DNA segments,
which are used to determine the approximate relationship between individuals. Similarly, data
obtained from abandoned DNA at a crime scene and used for genetic genealogy are not
exposed to other users and can be prevented from appearing in search results (an option
available to all users). At Parabon, genetic data is kept on an encrypted server only accessible
to authorized employees, and the company’s GEDmatch accounts can only be accessed by the
bioinformatics team and the lead genetic genealogist, CeCe Moore. These facts mitigate many
of the privacy concerns surrounding genetic genealogy, as individuals have control over
whether their data is used as part of law enforcement investigations, and sensitive raw data is
not accessed (Greytak et al., 2018).
Unlike with familial searching of law enforcement databases, no one is legally required to
contribute to a genetic genealogy database, and the samples are not in the possession of
government agencies. The persons contributing to GEDmatch are warned explicitly that
criminal investigators as well as fellow genealogy enthusiasts are able to perform comparisons
against their data. If they choose to participate anyway, there is no reason why law
enforcement should not be able to use this information. These significant differences from
familial searching argue against automatically applying familial search policies, such as
restricting analysis to the end of an investigation, to genetic genealogy. The two techniques are
entirely independent; familial searching has previously been used in some genetic genealogy
cases and not in others. The public is strongly in favor of the use of genetic genealogy to
investigate violent crimes: GEDmatch saw a significant increase in the number of participants
after the Golden State Killer arrest (Milian, 2018), and a recent survey showed overwhelming
public support (Guerrini et al., 2018).
Database Searching
A GEDmatch one-to-many query compares the DNA of interest to all public data in the
database, returning a list of individuals who share the most autosomal DNA. Each “match”
includes the individual’s name or alias, the email address associated with their GEDmatch
account, and any haplogroup or family tree information they have chosen to share (Figure 4).
Figure 4: Top five results from a GEDmatch one-to-many comparison, with potentially identifying
information (kit numbers, names, and email addresses) removed.
A one-to-one comparison can then be run on each match using a more precise algorithm to see
the lengths and chromosomal locations of the shared segments. Comparing the amount of
shared DNA to reference data (e.g., Bettinger & Perl, 2018) gives the probability that the
relationship between the unknown individual and the match falls into each degree of
relatedness. For example, a match sharing 100 cM could be anywhere from 5th degree to >8th
degree, with 6th degree being most likely.
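The reference distributions themselves are not reproduced here, but the observed ranges in Table 3 are enough to list which degrees are consistent with a given amount of shared DNA. The sketch below does only that. It returns possible degrees, not probabilities, and the table and function names are illustrative. Table 3 merges Parent-Child and Full Sibling into degree 1 and groups everything beyond the seventh degree into a single ">7" row, so ">7" is the most distant degree it can report.

```python
from typing import List

# (lowest cM, highest cM, degree) as observed in Table 3 (Ball et al., 2016).
# Parent-Child (3,600) and Full Sibling (2,000-3,600) are merged into degree 1.
TABLE3_RANGES = [
    (2000, 3600, "1"),
    (1060, 2500, "2"),
    (425, 1500, "3"),
    (160, 950, "4"),
    (65, 650, "5"),
    (0, 375, "6"),
    (0, 245, "7"),
    (0, 185, ">7"),
]


def possible_degrees(shared_cm: float) -> List[str]:
    """Return every degree whose observed range in Table 3 includes shared_cm."""
    return [degree for low, high, degree in TABLE3_RANGES if low <= shared_cm <= high]


print(possible_degrees(100))
print(possible_degrees(1200))
```

For the 100 cM example above, the ranges allow 5th, 6th, 7th, and more distant degrees, consistent with the text. Choosing among these degrees requires the probability distributions, not just the ranges.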
However, there are additional complications. First, in addition to multiple possible degrees of
relatedness, each degree contains many relationship types that must be considered (e.g., 5th
degree relatives around the same age could be second cousins, first cousins twice-removed, or
half-first cousins once-removed). Second, the amount of DNA shared by each relationship
varies among populations. Populations founded by a small number of individuals can have low
genetic diversity and high background relatedness, or endogamy. In such populations,
individuals with a given relationship will share significantly more DNA than in other populations,
such that even very distant cousins can share significant amounts of DNA. Endogamy
manifests as a large number of matches, each sharing many small segments, indicating that the
segments were actually inherited from distant ancestors (ISOGG, 2019). Another challenge is
pedigree collapse, in which the same families intermarry multiple times over the generations,
inflating the amount of shared DNA between their descendants.
Casework Match Results
More than 80% of samples from Parabon’s >250 law enforcement cases have resulted in a
match at the third cousin level or closer (>60 cM), with subjects of European descent having a
higher probability of success due to their overrepresentation in genetic genealogy databases
(Greytak & Moore, 2018) (Figure 5A). European descent was assessed by Snapshot DNA
Phenotyping, which infers an individual’s genetic admixture from seven continental populations
(African, Middle Eastern, European, Central/South Asian, East Asian, Oceanian, and Native
American). In this analysis, samples were considered “European” if they had at least 80%
European ancestry. Note that the law enforcement cases submitted to Parabon are primarily
from North American agencies, and samples from other regions will likely have lower match
probabilities due to lower participation in DTC genetic testing and use of GEDmatch.
The closeness of the top match is not the sole variable in determining viability for genetic
genealogy. A comprehensive assessment must include consideration not only of the closest
match, but of the quality of the supporting matches and the amount of information available
about each match. For example, progress may be difficult if the top match has unknown
parentage and/or is from a country where records are not available. Parabon assesses each
sample on a subjective scale: 1) very high probability of identification (e.g., parent-child match),
2) high probability of identification, 3) medium probability of identification, 4) low probability of
identification but likely to generate actionable information, and 5) unlikely to generate actionable
information. An assessment does not guarantee a particular outcome but is intended to help
agencies to decide how to proceed. Thus far, ~80% of European samples and ~60% of non-
European samples have been assessed as workable (assessments 1-4) (Figure 5B).
Figure 5: For Parabon’s >250 law enforcement samples, the frequency of A) the top GEDmatch one-to-
many match being in each degree of relatedness and B) samples receiving each assessment level.
Results are reported for European, non-European, and all samples, as well as for those cases that have
been solved (i.e., resulted in an identification) thus far. Degree of relatedness is based solely on the
amount of shared DNA, not the true relationship determined through genealogy: Parent-Child (>3300 cM),
Full Siblings (2200-3300), 2nd Degree (1300-2200), 3rd Degree (650-1300), 4th Degree (340-650), 5th
Degree (200-340), 6th Degree (90-200), 7th Degree (60-90), 8th Degree (30-60), >8th Degree (<30).
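The cM thresholds in the Figure 5 caption can be applied mechanically to assign a match to a bin. The sketch below is a hypothetical helper, not Parabon's code. Because the caption does not say which bin a value lying exactly on a boundary belongs to, this version assumes that each lower bound is inclusive.

```python
# Lower cM bound of each bin from the Figure 5 caption, closest relationship first.
FIGURE5_BINS = [
    (3300, "Parent-Child"),
    (2200, "Full Siblings"),
    (1300, "2nd Degree"),
    (650, "3rd Degree"),
    (340, "4th Degree"),
    (200, "5th Degree"),
    (90, "6th Degree"),
    (60, "7th Degree"),
    (30, "8th Degree"),
]


def figure5_bin(shared_cm: float) -> str:
    """Return the first (closest) bin whose lower bound the shared DNA reaches."""
    for lower_cm, label in FIGURE5_BINS:
        if shared_cm >= lower_cm:  # assumption: lower bounds are inclusive
            return label
    return ">8th Degree"


for cm in (3500, 100, 45, 10):
    print(cm, figure5_bin(cm))
```

Because the bins are based solely on shared DNA, a match placed in the "6th Degree" bin may still turn out, after genealogy research, to have a different true relationship.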
Importantly, just because a sample does not have sufficient promising match data today does
not mean it never will. Hundreds of new individuals upload their data to GEDmatch every day
(Milian, 2018), and as the database grows, the proportion of samples with close matches will
increase. Thus, Parabon monitors all unsolved cases for new matches on a weekly basis.
Genealogy Research
While most of the discussion surrounding genetic genealogy focuses on the database matches,
the vast majority of genetic genealogy work happens after the match list is generated. Many US
records are available to the public and have been compiled into searchable databases
accessible via subscription. For example, Ancestry.com provides a mechanism for accessing a
large collection of records, such as the census through 1940, vital records (birth, marriage,
death) from many states, the Social Security Death Index, and Newspapers.com. Some
Ancestry.com users also create and share public family trees, although these can contain
errors, so they must be examined critically. People-search databases and public social media
can also be used to help determine family structures. In some cases, law enforcement may be
asked to assist with this research using their greater access to records.
A previous analysis of the MyHeritage DTC database showed that ~60% of individuals of
Northern European descent will have a match at 100 cM or closer (Erlich, Shor, Pe, & Carmi,
2018). Using simulation, the authors showed that it is often possible to identify an unknown
individual from a single third cousin level match given knowledge of his or her sex, location
within 100 miles, and age within 5 years. However, in addition to the fact that such detailed
demographic information is often not available in law enforcement cases, this assumes that,
given a third cousin match, it is straightforward to obtain a complete list of the match’s relatives
at that distance (the authors determined this number to be ~850, not including half relatives). In
reality, a massive amount of work is required to expand a match into a list of relatives (Greytak
et al., 2018).
The first task is to definitively identify each match, which itself can be quite difficult. Although
GEDmatch displays the name and email address associated with each matching kit, users can
choose to use an alias or an anonymous email address, and kits are sometimes managed by
someone other than the match themselves. Moreover, even if a user associates their actual
name, it may be common (e.g., John Smith), which can complicate identification. Consequently,
the initial identification of matches is both critical and challenging, and often requires
considerable genetic genealogical skill and creative problem solving, e.g., deciphering initials,
inferring identities from other identifiable matches, and determining whose DNA a kit represents when the
kit is managed by someone else. Even though contacting matches via the given email address
might enable identification and even produce family tree information, Parabon seldom contacts
matches directly so as to minimize the number of people involved in an investigation and reduce
the risk of tipping off a suspect. Matches closer than third cousins are only contacted with the
permission of the investigating agency, and the agency can choose to make the contact instead.
Any contact includes the fact that the questions are in regard to a law enforcement investigation
(no specifics of the case are given), and the individual is informed they are free to participate or
not. If the individual asks not to be involved, they are not contacted again.
Once the matches are identified, their family trees must be constructed back to the set of
possible common ancestors with the unknown individual. The number of generations back in
time to the common ancestors of interest is determined by the distance of the matches’
relationships, although since the estimates are not usually specific to a single relationship, often
the family trees must be built even further back than these levels would imply. Building family
trees back in time requires traditional genealogy research: combing through public records to
determine the identities of each generation’s parents.
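How far back a tree must be built follows from the relationship itself. Nth cousins share ancestors n + 1 generations back, and each "removed" step adds one generation on one side. The hypothetical helper below names the full relationship implied by the number of generations separating each person from their common ancestors. It covers only full relationships through a shared ancestral couple. Half relationships, which the text notes must also be considered, are not modeled.

```python
def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


REMOVED = {1: "once", 2: "twice", 3: "thrice"}


def relationship(g1: int, g2: int) -> str:
    """Name the full relationship between two people.

    g1, g2: generations from each person up to their common ancestor(s)
    (0 = the person is the ancestor, 1 = parents, 2 = grandparents, ...).
    """
    near, far = sorted((g1, g2))
    gap = far - near
    if near == 0:  # direct line of descent
        if gap == 0:
            return "self"
        if gap == 1:
            return "parent/child"
        prefix = "great-" * (gap - 2) + "grand"
        return f"{prefix}parent/{prefix}child"
    if near == 1:  # siblings or avuncular
        if gap == 0:
            return "full siblings"
        prefix = "great-" * (gap - 1)
        return f"{prefix}aunt or uncle / {prefix}niece or nephew"
    name = f"{ordinal(near - 1)} cousin"
    if gap:
        times = REMOVED.get(gap, f"{gap} times")
        name += f" {times} removed"
    return name


print(relationship(4, 4))  # third cousins share great-great-grandparents
print(relationship(2, 3))
```

Because a match's shared DNA is usually consistent with several of these relationships, a genealogist would enumerate all generation pairs within the plausible degrees. This is why trees often need to extend further back than the single most likely relationship implies.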
However, records are not always available - not all US states maintain an accurate and public
birth index, many families trace back to immigrants from other countries where records are not
readily available, etc. In addition, biological family trees often do not match documented family
trees due to misattributed paternity, unrecorded adoption, unknown parentage, etc., and
individuals in these situations are overrepresented in genetic genealogy databases. Surnames
and spellings also often change through the generations, further complicating the analysis.
Descendancy Research
Once possible common ancestors have been identified, the family trees must then be built
forward in time (“descendancy research” or “reverse genealogy”) to elucidate the possible
identities of the unknown individual (Figure 6).
Figure 6: A hypothetical family tree resulting from genetic genealogy research. Given a match in
GEDmatch (orange star), the family tree is built backward in time to the possible common ancestors
(orange) and then forward in time (blue) to determine the possible identities of the unknown individual (in
this case, from among the “second cousins”).
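Descendancy research amounts to walking a family tree downward from the candidate common ancestors. In the minimal sketch below, the tree is a Python dictionary that maps each person (or ancestral couple) to their children. The names and the tree are hypothetical, and in practice the tree itself must first be reconstructed from records.

```python
from typing import Dict, List


def descendants_at_generation(children: Dict[str, List[str]],
                              ancestor: str, generation: int) -> List[str]:
    """List everyone exactly `generation` parent-child steps below `ancestor`.

    Duplicates (which pedigree collapse can create) are removed, keeping order.
    """
    frontier = [ancestor]
    for _ in range(generation):
        next_gen = [kid for person in frontier for kid in children.get(person, [])]
        frontier = list(dict.fromkeys(next_gen))
    return frontier


# Hypothetical tree descending from one ancestral couple.
tree = {
    "ancestral couple": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": ["A1a"],
    "A2": ["A2a", "A2b"],
    "B1": ["B1a"],
}
print(descendants_at_generation(tree, "ancestral couple", 3))
```

If the match were A1a, the second cousins in this list would be the descendants of the couple's other child (B1a). A2a and A2b, who descend through the same child as the match, would instead be first cousins.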
The possible ancestors from which the unknown individual descends can sometimes be
narrowed using genomic ancestry (e.g., if the family tree is Northern European, but the unknown
individual has 25% ancestry from another population, the genetic genealogist can search
among the possible grandparents for one who married someone from that ancestral group).
Shared DNA on the X-chromosome can also narrow down the possible paths between matches,
as males only inherit X-DNA from their mothers. Thus, if an unknown male shares X-DNA with
a match, they must be related through his mother, and the path between them cannot pass
through two males in a row. When available, Y-chromosome and mitochondrial (mtDNA)
haplogroups can also narrow down the possibilities, as these are passed directly from father to
son and from mother to child, respectively. Thus, individuals share a mtDNA haplogroup with
their maternal lineage, and males share a Y haplogroup with their paternal lineage.
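The "no two males in a row" rule can be checked directly on a proposed path between an unknown individual and a match. The hypothetical helper below takes the sexes of the people along a path, from the unknown individual up to the common ancestor and back down to the match, so that each adjacent pair is a parent-child link. It tests only whether X-DNA could travel along the path, not how much would be shared.

```python
from typing import Sequence


def x_path_possible(sexes: Sequence[str]) -> bool:
    """True if X-DNA could be inherited along this chain of parent-child links.

    sexes: "M" or "F" for each person on the path, in order.
    A father passes no X chromosome to his son, so any two consecutive
    males (a father-son link in either direction) block X inheritance.
    """
    return all(not (a == "M" and b == "M") for a, b in zip(sexes, sexes[1:]))


# Unknown male -> his mother -> her father (common ancestor) -> his daughter -> her daughter.
print(x_path_possible(["M", "F", "M", "F", "F"]))
# Unknown male -> his father: excluded, since males inherit X-DNA only from their mothers.
print(x_path_possible(["M", "M", "F"]))
```

Candidate paths that fail this check can be set aside when an unknown male shares X-DNA with a match.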
DNA sharing among matches can also be used to narrow down where the unknown individual
falls in the tree. If matches do not share any DNA with one another, they are likely related to the
individual on different branches of his or her family tree, and the genetic genealogist can then
search for an intersection (“triangulation”) between the two matches’ families in the form of a
marriage that produced children or an out-of-wedlock birth (Figure 7). While there could be
hundreds or thousands of individuals who are second or third cousins to a single match, there
are typically only a few individuals who are cousins at the right distance to multiple matches.
Figure 7: Triangulation between two hypothetical family trees. Given two matches in GEDmatch who are
unrelated to one another (orange stars), family trees are built for each and then searched for an
intersection (green) in the form of a marriage or out-of-wedlock birth. Children of this intersection are
related to both matches, while all other individuals in the tree are only related to one match.
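Triangulation is, at its core, a set intersection: the candidates are the people who descend from both matches' ancestral lines. The sketch below assumes the merged family trees are stored as a dictionary that maps each person (or ancestral couple) to their children. The function names and the families are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """All descendants of `root` (not including `root` itself)."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        person = stack.pop()
        for child in children.get(person, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def triangulate(children: Dict[str, List[str]],
                ancestors_1: Iterable[str], ancestors_2: Iterable[str]) -> Set[str]:
    """People descended from match 1's ancestors AND from match 2's ancestors."""
    related_1 = set().union(*(descendants(children, a) for a in ancestors_1))
    related_2 = set().union(*(descendants(children, a) for a in ancestors_2))
    return related_1 & related_2


# Hypothetical merged tree: John (Smith line) and Mary (Jones line) had two children.
tree = {
    "Smith ancestors": ["John", "Ann"],
    "Jones ancestors": ["Mary", "Bob"],
    "John": ["Kid1", "Kid2"],
    "Mary": ["Kid1", "Kid2"],
    "Ann": ["AnnKid"],
    "Bob": ["BobKid"],
}
print(sorted(triangulate(tree, ["Smith ancestors"], ["Jones ancestors"])))
```

Everyone else in the merged tree is related to only one of the two families. This is why an intersection usually shrinks hundreds of candidates to a handful.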
Narrowing Down the Possible Identities
Once candidate individuals have been identified, the genetic genealogist can use a variety of
factors to include or exclude them, in addition to traditional investigative information, such as a
connection to the crime scene or the victim. Sex is known from the DNA, and some age
information may be available: for unidentified remains, age can be estimated; for perpetrators,
at minimum, they had to be alive and physically capable of committing the crime. The individual
also had to be in a given location at a given time, which may mean he or she lived nearby.
While the GEDmatch matches may be spread across the US or even the world, it is sometimes
possible to focus on a particular branch of the family that moved close to the location of interest.
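The screening described above can be expressed as a filter followed by a ranking. In the sketch below, sex and lifespan are used to exclude candidates. Residence near the location of interest is used only to prioritize, since living elsewhere does not rule someone out. The Candidate fields and the minimum-age cutoff of 14 are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    sex: str                   # "M" or "F"
    birth_year: Optional[int]  # None if unknown
    death_year: Optional[int]  # None if living or unknown
    lived_nearby: bool         # e.g., records place them near the location of interest


def rank_candidates(candidates: List[Candidate], crime_year: int, dna_sex: str,
                    min_age: int = 14) -> List[Candidate]:
    """Drop candidates excluded by sex or lifespan; list those who lived nearby first."""
    kept = []
    for c in candidates:
        if c.sex != dna_sex:
            continue  # sex is known from the DNA
        if c.birth_year is not None and crime_year - c.birth_year < min_age:
            continue  # too young at the time (hypothetical cutoff)
        if c.death_year is not None and c.death_year < crime_year:
            continue  # must have been alive at the time of the crime
        kept.append(c)
    # sorted() is stable; key False (lived nearby) sorts before True.
    return sorted(kept, key=lambda c: not c.lived_nearby)


candidates = [
    Candidate("A", "M", 1950, None, False),
    Candidate("B", "F", 1950, None, True),
    Candidate("C", "M", 1980, None, True),
    Candidate("D", "M", 1940, 1980, True),
    Candidate("E", "M", 1955, None, True),
]
print([c.name for c in rank_candidates(candidates, crime_year=1985, dna_sex="M")])
```

Unknown birth or death years leave a candidate in the list rather than excluding them, since missing records are common in this research.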
Parabon’s genetic genealogists also use Snapshot DNA Phenotyping (Greytak & Armentrout,
2015) to prioritize among individuals and confirm or exclude hypotheses. An individual’s eye
color, hair color, and skin color can often be determined from mugshots, yearbook photos, or
social media and compared to the predictions. Full siblings cannot be distinguished using
genetic genealogy, as they share all the same genealogical relationships with the matches.
However, if they differ in phenotype, this can be used to prioritize among them. Similarly, if
genealogy research leads to an individual whose phenotypes are at odds with the predictions,
this can spur continued research, while a close similarity can help corroborate an identification.
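One way to use phenotype predictions for prioritization is to score each candidate by how probable his or her observed traits are under the predictions. The sketch below is an assumption-laden illustration, not a description of how Snapshot computes anything. The probability table is invented, traits are treated as independent, and the score is a simple sum of log-probabilities.

```python
import math

# Hypothetical predicted probabilities per trait (illustrative numbers only).
prediction = {
    "eye": {"blue": 0.70, "green": 0.20, "brown": 0.10},
    "hair": {"blond": 0.55, "brown": 0.40, "black": 0.05},
}


def phenotype_score(observed: dict[str, str],
                    prediction: dict[str, dict[str, float]]) -> float:
    """Log-likelihood of a candidate's observed traits under the predictions.

    Traits are treated as independent, which is a simplifying assumption.
    Unpredicted categories get a small floor probability instead of zero.
    """
    return sum(math.log(prediction[trait].get(value, 1e-6))
               for trait, value in observed.items() if trait in prediction)


# Full siblings are genealogically indistinguishable, so rank them by phenotype.
siblings = {
    "Brother A": {"eye": "brown", "hair": "black"},
    "Brother B": {"eye": "blue", "hair": "blond"},
}
ranked = sorted(siblings, key=lambda s: phenotype_score(siblings[s], prediction),
                reverse=True)
print(ranked)  # ['Brother B', 'Brother A']
```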
The degree to which the identity of the unknown individual can be narrowed down varies from
case to case. In the best-case scenario, a single individual or a set of siblings can confidently
be identified through matches to multiple branches of their family tree. More often, there are
multiple cousins (descendants of a particular set of common ancestors) who are consistent with
the available information. These leads can then be followed up through additional research,
traditional investigation, and/or targeted kinship testing of family members to more precisely
place the unknown individual in the family tree. Parabon’s Snapshot Kinship Inference tool uses
genome-wide SNP data to predict the precise degree of relatedness between individuals, out to
6th-degree relatives (Greytak et al., 2017). Using a machine learning model built on thousands
of reference subjects with known relationships, Snapshot predicts the probability that a pair
belongs to each degree of relatedness. Confidence is calculated using the probability of the
most likely degree and the precision calculated for that degree in cross-validation.
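The selection step of the kinship prediction can be sketched as an argmax over relationship classes. The paper states that confidence is derived from the probability of the most likely degree and that degree's cross-validation precision, but it does not give the formula. This hypothetical Python sketch therefore returns both numbers side by side rather than combining them. The class labels, probabilities, and precision table are invented for illustration.

```python
def most_likely_degree(probs: dict[str, float],
                       cv_precision: dict[str, float]) -> tuple[str, float, float]:
    """Return the most probable relationship class, its probability, and the
    cross-validation precision recorded for that class."""
    best = max(probs, key=probs.get)
    return best, probs[best], cv_precision[best]


# Hypothetical classifier output for one pair, and a hypothetical precision table.
probs = {"2nd": 0.03, "3rd": 0.90, "4th": 0.05, "unrelated": 0.02}
cv_precision = {"2nd": 0.97, "3rd": 0.95, "4th": 0.90, "unrelated": 0.99}
print(most_likely_degree(probs, cv_precision))  # ('3rd', 0.9, 0.95)
```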
Law Enforcement Leads
During decades-long cold case investigations, hundreds or thousands of individuals may be
investigated before the perpetrator is found. Genetic genealogy offers an efficient means of
narrowing an investigation, often to only a few individuals. The number of possible relatives
included in a genetic genealogy analysis varies depending on the number and distance of the
matches. Even when the only matches are distant and large family trees must be constructed
because common ancestors are many generations in the past, experienced genetic
genealogists can triangulate among the matches to determine the most promising branches of
the family tree and limit the amount of unnecessary tree building. Given sufficient triangulation
and time, the number of leads can be reduced to the offspring of a single couple.
No matter how confident the identification, however, genetic genealogy alone cannot prove
identity with 100% certainty. There is always a remote possibility that the unknown individual
could have been adopted or abandoned, and his or her existence could be unknown to family
and not revealed through official records. Therefore, genetic genealogy leads must be verified
through a direct DNA comparison between the person-of-interest’s STR profile and that of the
crime scene sample. It is this traditional forensic DNA match that is used for prosecution.
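The final verification is a direct STR comparison. A minimal sketch of the genotype-concordance part of that comparison is shown below, assuming each profile is a hypothetical mapping from locus name to an unordered pair of alleles. It only reports mismatching loci. An actual forensic comparison must also deal with mixtures and allele dropout and must attach a statistical weight to the match; none of that is modeled here.

```python
Genotype = tuple[float, float]   # two allele repeat numbers, e.g. (9.3, 10)


def compare_str(reference: dict[str, Genotype],
                evidence: dict[str, Genotype]) -> tuple[int, list[str]]:
    """Compare two STR profiles at the loci typed in both.

    Returns the number of loci compared and the loci whose genotypes differ.
    Allele order is ignored, since a genotype is an unordered pair.
    """
    shared = sorted(set(reference) & set(evidence))
    mismatches = [locus for locus in shared
                  if sorted(reference[locus]) != sorted(evidence[locus])]
    return len(shared), mismatches


# Made-up genotypes at real STR locus names, for illustration only.
crime_scene = {"D8S1179": (13, 14), "D21S11": (29, 30.2), "TH01": (6, 9.3), "FGA": (21, 24)}
discarded_cup = {"D8S1179": (14, 13), "D21S11": (29, 30.2), "TH01": (6, 9.3),
                 "FGA": (21, 24), "vWA": (16, 17)}
print(compare_str(discarded_cup, crime_scene))  # (4, [])
```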
Case Studies
The following case studies demonstrate how genetic genealogy has been used to assist
investigators with identifying a suspect in cold case investigations. Only information approved
for public release by the investigating agencies is included, so some case details (e.g., DNA
sample source, exact GEDmatch match information) have been obfuscated.
Case Study #1: Snohomish County, WA; 31-year-old cold case (double homicide)
This case study demonstrates the ideal genetic genealogy case, where there are close matches
and clear familial connections that point to only a single conclusion. However, even seemingly
straightforward cases require a large amount of research and the expertise to recognize and
cope with confounding factors such as unknown and misattributed parentage.
The Crime: In 1987, a young Canadian couple, Jay Cook (20) and Tanya Van Cuylenborg (18),
traveled from British Columbia to Washington State in a van. After purchasing a ferry ticket to
Seattle, they were never heard from again. Days later, Tanya’s body was found in a ditch in the
woods, and a few days after that, Jay’s body and the van were found in two separate locations.
DNA evidence was obtained for an unknown suspect (“Subject”).
GEDmatch: There were two matches at approximately the 5th degree relative level, plus
additional more distant matches. The top two matches had no shared DNA between them,
meaning they were most likely related to the Subject on different branches of his family tree.
Family Trees: Family trees were constructed for both key matches back to their
great-grandparents and beyond using census records, vital records, newspaper archives, public
“people search” databases, public social media data, and public family trees. Next,
descendancy research was performed to trace the descendants of each set of ancestors to
determine if an intersection between them could be found.
A triangulating marriage was found between a granddaughter of Match #2’s great-grandparents
and a son of Match #1’s great-grandmother. Extensive research revealed that this son had
taken his stepfather’s surname, initially obscuring his true relationship to Match #1. Thus, the
children of this marriage were half first cousins once-removed to Match #1, as well as second
cousins to Match #2. While both of these relationships are 5th degree, it is critical to consider
all possible relationship types, as half relationships are quite common. No other marriages were
found between the descendants of these ancestors. There was only one son from this
marriage, William Earl Talbott II, and he was therefore the only known male who could be
carrying this mix of DNA from both matches’ families (Figure 8).
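The degree arithmetic in this case follows a simple rule. Under the usual convention in which full first cousins are 3rd degree relatives, nth cousins m times removed are degree 2n + 1 + m, and a half relationship adds one degree. The Python helper below is a sketch of that convention (the function name is hypothetical). It reproduces the statement that second cousins and half first cousins once-removed are both 5th degree.

```python
def cousin_degree(n: int, removed: int = 0, half: bool = False) -> int:
    """Degree of relationship for nth cousins `removed` times removed.

    Full cousins descend from a shared ancestral couple; half cousins share only
    one ancestor, which places them one degree further apart.
    """
    if n < 1 or removed < 0:
        raise ValueError("n must be >= 1 and removed must be >= 0")
    return 2 * n + 1 + removed + (1 if half else 0)


print(cousin_degree(2))                          # second cousins -> 5
print(cousin_degree(1, removed=1, half=True))    # half first cousins once-removed -> 5
```

Likewise, `cousin_degree(2, removed=1)` returns 6, the degree the paper assigns to second cousins once-removed.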
Mr. Talbott had never been arrested for a crime that would require submitting DNA to a
database. He had no known connection to the victims and no reason to have been on the
investigators’ radar. His phenotypes matched those predicted by Snapshot, but without other
information to tie him to the crime, this had not been enough to identify him as a suspect.
Figure 8: Anonymized family tree released by the Snohomish County Sheriff’s Department as part of
their announcement of the arrest of William Earl Talbott II. The tree shows the position of Mr. Talbott
(Suspect) and two GEDmatch matches (Cousins) used to determine his identity.
Resolution: Based on the lead provided by genetic genealogy, the detectives were able to
collect DNA from a cup discarded by Mr. Talbott, which, using traditional STR analysis, was
shown to match the DNA from the crime scene. He was arrested and is currently awaiting trial.
Case Study #2: Tacoma, WA; 32-year-old cold case (homicide)
Triangulation between matches using documentary sources is sometimes not possible. This case
study shows that, in addition to tenaciously researching records and meticulously building family
trees, genetic genealogists must be able to think creatively about possible hypotheses to explain
the available data.
The Crime: 12-year-old Michella Welch went missing on 26 March 1986. She had taken her
two younger sisters to Puget Park in Tacoma, Washington, and then ridden her bicycle home to
make lunch while her sisters played nearby. When the sisters returned to the park, they found a
brown paper bag with their lunches but no Michella. By 3:10 p.m., officers arrived at the park
and started searching for the missing girl. A tracking dog found her body around 11:30 p.m.
She had been beaten and sexually assaulted and died from a cut to the neck.
The DNA: Another young Tacoma girl, Jennifer Bastian, was also killed around the same time,
and investigators had long believed one person committed both crimes. More than 10,000
investigative hours went into the cases in 1986 alone. Recent DNA testing showed that the
crimes were committed by different men, but neither DNA profile resulted in a CODIS match.
Genetic Ancestry: The Subject was predicted to be predominantly Northern European with a
small but notable amount of Northern Native American admixture (~10%).
GEDmatch: The two top matches did not share DNA, suggesting they were most likely related
to the Subject on different branches of his family tree.
Family Trees: Trees were built for the two top matches back to their great-great-grandparents
and beyond, and extensive descendancy research was performed, but no documented
intersection was found between the two families. The analyst identified a pair of brothers who
were cousins of Match #1, lived within a few miles of the crime scene in 1986, and had two
Native American great-great-grandparents on different branches of their family trees, which was
consistent with the predicted ancestry of the Subject. However, the Subject only shared about
half as much DNA with Match #1 as would be expected for a cousin, and there should have
been an intersection between the families that would connect these cousins to both matches.
When families are connected through DNA but do not intersect on paper (e.g., through a
marriage license or a birth certificate), the explanation may be misattributed paternity: one
individual from each family had a child together, but the true biological father was not recorded.
Through census record research, it was discovered that relatives of the two matches had lived
in the same small town when one of the cousins’ ancestors was conceived. This was the only
discovered geographical intersection between these families. Based on the amount of shared
DNA, it was postulated that Match #2’s relative was the unrecorded biological father of the
cousins’ ancestor (Figure 9). Under this hypothesis, the cousins would actually be half cousins
to Match #1, which matched the amount of shared DNA. They would also be related to Match
#2 at the appropriate genetic distance.
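The reasoning that a half relationship explains the shortfall in shared DNA can be checked with expected values. Assuming the standard coefficient of relationship of 2^-d for a degree-d relative (and no endogamy or double relationships), a half relationship sits one degree further out than its full counterpart and is therefore expected to share half as much DNA. The Python sketch below is illustrative only. Observed sharing varies widely around these expectations, which is why empirical ranges such as those in the Shared cM Project are consulted in practice.

```python
def coefficient_of_relationship(degree: int) -> float:
    """Expected fraction of the genome shared identical by descent for a
    relative of the given degree, assuming no endogamy or double relationships."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return 0.5 ** degree


def half_to_full_ratio(full_degree: int) -> float:
    """Expected sharing of a half relationship relative to its full counterpart.
    The half version sits exactly one degree further out."""
    return (coefficient_of_relationship(full_degree + 1)
            / coefficient_of_relationship(full_degree))


print(coefficient_of_relationship(3))   # full first cousins: 0.125
print(coefficient_of_relationship(4))   # half first cousins: 0.0625
print(half_to_full_ratio(3))            # 0.5
```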
Figure 9: Pedigree for two cousins of Match #1 who were identified as persons-of-interest in the Tacoma
case, showing the apparent misattributed paternity between Match #1’s relative and Match #2’s relative.
Resolution: The genetic genealogy analysis identified a pair of brothers who could be the
Subject, neither of whom had ever been arrested for a crime that would have required
submission of DNA to a database. Officers were eventually able to follow one of the brothers,
Gary Charles Hartman, into a restaurant, where they obtained a napkin he had used and
discarded. Traditional STR analysis showed that the DNA on the napkin matched the DNA
found at the crime scene. More than thirty years after Michella Welch was found murdered in a
Washington park, investigators announced that they had arrested a suspect in her murder.
Hartman is currently awaiting trial.
Case Study #3: Nearly 40-year-old cold case (homicide)
When there are not enough strong matches in GEDmatch to fully narrow down the possible
branches of a large family tree, cases cannot always be resolved efficiently through genetic
genealogy alone. If an intersection between the matches’ families cannot be found, the number
of possible identities for the Subject can be very large. However, as this case study shows, if
family members of the matches are willing to cooperate, targeted kinship testing can quickly
include or exclude various branches of the family tree and thus arrive at a small number of
included individuals. Because close relatives of the suspect were eventually identified in this
investigation, case details are withheld to protect their privacy.
GEDmatch: The Subject’s top two matches were both in the 6th-8th degree relative range and
had no shared DNA between them, meaning they were most likely related to the Subject on
different branches of his family tree. There were also additional, more distant matches.
Family Trees: Trees were built for the two top matches back to their great-great-grandparents,
but no intersection was found between the two families. The Subject was most likely a great-
grandson or great-great-grandson of one of Match #1’s great-great-grandparent couples, but
without triangulation, it was not possible to narrow his identity down further. Parabon
recommended more research to identify branches of the family that might have moved to the
area of the crime, as well as targeted kinship testing of members of the top match’s family.
Kinship Testing: The investigating agency obtained a voluntary buccal swab from a cousin on
Match #1’s paternal side, from which DNA was extracted, genotyped, and compared to the
Subject. Snapshot Kinship Inference predicted this individual was unrelated to the Subject, and
Match #1’s paternal family could therefore likely be excluded (assuming the familial
relationships on paper were correct). The agency then obtained a voluntary buccal swab from a
cousin on Match #1’s maternal side, who was predicted with 94.2% confidence to be a 3rd
degree relative (first cousin or genetic equivalent) to the Subject.
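The exclusion logic of targeted kinship testing can be sketched as pruning branches. In this hypothetical Python example, a branch whose tested relative is predicted unrelated to the Subject is dropped, while branches that tested as related, or were not tested at all, remain. Like the analysis above, it assumes the documented relationships between each tester and the branch are correct. Branch names and candidate lists are invented.

```python
def remaining_candidates(candidates_by_branch: dict[str, list[str]],
                         kinship_results: dict[str, str]) -> dict[str, list[str]]:
    """Drop branches whose tested relative was predicted unrelated to the Subject.

    Untested branches stay in play. This assumes the documented ("on paper")
    relationships between each tester and the branch are correct.
    """
    return {branch: people for branch, people in candidates_by_branch.items()
            if kinship_results.get(branch) != "unrelated"}


# Hypothetical branches of Match #1's family and hypothetical test results.
candidates_by_branch = {
    "paternal": ["Candidate A", "Candidate B"],
    "maternal": ["Candidate C", "Candidate D", "Candidate E"],
    "untested branch": ["Candidate F"],
}
kinship_results = {"paternal": "unrelated", "maternal": "3rd degree"}
print(remaining_candidates(candidates_by_branch, kinship_results))
# {'maternal': ['Candidate C', 'Candidate D', 'Candidate E'], 'untested branch': ['Candidate F']}
```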
Targeted Family Trees: The analyst built family trees for the spouses of each of the kinship
tester’s maternal aunts and uncles back to their great-great-great-grandparents. One uncle’s
wife was determined to be a distant cousin to many of the Subject’s more distant matches. This
triangulation meant that one of the male children of this couple was most likely the Subject, as
each would be related to the GEDmatch matches on both sides of his family tree: a second cousin
once-removed (6th degree relative) to Match #1 and a distant cousin (ranging from third cousin
once-removed to fifth cousin once-removed) to Distant Matches #1-7 (Figure 10).
Importantly, barring additional independent intersections between these family trees, the
identified Persons of Interest were the only individuals who were related to both of these
families. These children were also the right age at the time of the crime, lived nearby, and all
appeared to have phenotypes consistent with the Snapshot predictions.
Figure 10: Pedigree built for Match #1’s family after the possible branches leading to the Subject were
narrowed down through targeted kinship testing and subsequent triangulation with distant matches.
Resolution: The genetic genealogy analysis identified a set of brothers who could be the
Subject, none of whom had ever been arrested for a crime that would have required submission
of DNA to a database. Officers were eventually able to narrow the investigation down to a
single brother and match his DNA to the crime scene DNA using traditional STR analysis. He
has been arrested and is awaiting trial.
Conclusions
Genetic genealogy has been called “2018’s biggest contribution to crime science” (Augenstein,
2018) and is rapidly changing the face of cold case investigations. Even for perpetrators who
are completely under the radar or long dead, given DNA from a crime scene, it may be possible
to identify them with genetic genealogy. Importantly, genetic genealogy has just as much power
to generate leads in active cases as in cold cases. In fact, it was recently used to identify a
perpetrator in a sexual assault case that had occurred only three months earlier (Havens, 2019),
and he has since pled guilty. Rather than wait until years have passed and all other leads have
been exhausted, investigators now have access to innovative forensic DNA technologies that
can generate significant new leads and prevent cases from going cold. Looking to the future,
genetic genealogy has the potential to significantly reduce the number of unsolved cold cases in
North America while also reducing the rate at which cases go cold.
References
Augenstein, S. (2018). Working Backward From Genealogy: Tracking a Dead Killer’s Trail.
Forensic Magazine.
Ball, C., Barber, M., Byrnes, J., Carbonetto, P., Chahine, K., Curtis, R., . . . Willmore, L. (2016).
Ancestry DNA Matching White Paper. Retrieved from
https://www.ancestry.com/corporate/sites/default/files/AncestryDNA-Matching-White-Paper.pdf
Bettinger, B. T., & Perl, J. (2018). The Shared cM Project 3.0 tool v4. Retrieved from
https://dnapainter.com/tools/sharedcmv4
Erlich, Y., Shor, T., Pe'er, I., & Carmi, S. (2018). Identity inference of genomic data using
long-range familial searches. Science, 362(6415), 690-694. doi:10.1126/science.aau4832
GEDmatch.com. Terms of Service and Privacy Policy.
Greytak, E., & Moore, C. (2018). Closing Cases with a Single SNP Array: Integrated Genetic
Genealogy, DNA Phenotyping, and Kinship Analyses. Proceedings of the 29th
International Symposium on Human Identification.
Greytak, E. M., & Armentrout, S. (2015). DNA Phenotyping: Predicting Ancestry and Physical
Appearance from Forensic DNA. Proceedings of the 26th International Symposium on
Human Identification.
Greytak, E. M., Gorden, E. M., Marshall, C. K., Sturk-Andreaggi, K., McMahon, T. P., &
Armentrout, S. L. (2017). SNP Recovery from Degraded Samples for Kinship
Assessment.
Greytak, E. M., Kaye, D. H., Budowle, B., Moore, C., & Armentrout, S. L. (2018). Privacy and
genetic genealogy data. Science, 361(6405), 857. doi:10.1126/science.aav0330
Greytak, E. M., Moore, C., & Armentrout, S. L. (2018). RE: Identity inference of genomic data
using long-range familial searches, Erlich et al. Science, 362(6415) (2018), 690-694
(eLetter, 10-29-18).
Guerrini, C. J., Robinson, J. O., Petersen, D., & McGuire, A. L. (2018). Should police have
access to genetic genealogy databases? Capturing the Golden State Killer and other
criminals using a controversial new forensic technique. PLOS Biology, 16(10),
e2006906-e2006906. doi:10.1371/journal.pbio.2006906
Havens, E. (2019). Elderly woman in home invasion rape case: I forgive my attacker. St.
George Spectrum & Daily News. Retrieved from
https://www.thespectrum.com/story/news/2019/02/26/elderly-woman-home-invasion-rape-case-forgive-my-attacker/2995143002/
Henn, B. M., Hon, L., Macpherson, J. M., Eriksson, N., Saxonov, S., Pe'er, I., & Mountain, J. L.
(2012). Cryptic distant relatives are common in both isolated and cosmopolitan genetic
samples. PLoS One, 7. doi:10.1371/journal.pone.0034267
Huff, C. D., Witherspoon, D. J., Simonson, T. S., Xing, J., Watkins, W. S., Zhang, Y., . . . Jorde,
L. B. (2011). Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome
Research, 21, 768-774. doi:10.1101/gr.115972.110
International Society of Genetic Genealogy. (2019). Endogamy. Accessed January 30, 2019.
Retrieved from https://isogg.org/wiki/Endogamy
Keating, B., Bansal, A. T., Walsh, S., Millman, J., Newman, J., Kidd, K., . . . Kayser, M. (2013).
First all-in-one diagnostic tool for DNA intelligence: genome-wide inference of
biogeographic ancestry, appearance, relatedness, and sex with the Identitas v1 Forensic
Chip. International Journal of Legal Medicine, 127, 559-572. doi:10.1007/s00414-012-0788-1
Manichaikul, A., Mychaleckyj, J. C., Rich, S. S., Daly, K., Sale, M., & Chen, W.-M. (2010).
Robust relationship inference in genome-wide association studies. Bioinformatics, 26,
2867-2873. doi:10.1093/bioinformatics/btq559
Milian, J. (2018). Cold-case murders, rapes cracked by Lake Worth genealogy website. The
Palm Beach Post. Retrieved from https://www.palmbeachpost.com/news/20181129/cold-case-murders-rapes-cracked-by-lake-worth-genealogy-website
Regalado, A. (2019). More than 26 million people have taken an at-home ancestry test. MIT
Technology Review. Retrieved from https://www.technologyreview.com/s/612880/more-than-26-million-people-have-taken-an-at-home-ancestry-test/
... Uma parte significativa teve o envolvimento da Parabon NanoLabs, uma empresa que oferece serviços forenses, como genealogia genética, inferência de parentesco e fenotipagem forense de DNA (a este respeito, ver também Wienroth, 2018a). Numa publicação mais recente, os membros da empresa referem-se a mais de 30 casos criminais resolvidos por eles e seus colaboradores (Greytak, Moore & Armentrout, 2019). ...
... Utilizando este método, a pesquisa familiar pode, na melhor das hipóteses, identificar parentes biológicos próximos (irmãos, pais ou filhos). Em contraste, pesquisas familiares de longo alcance em bancos de dados genéticos recreativos utilizam polimorfismos de nucleotídeos únicos (SNPs, do inglês Single nucleotide polymorphism), que são caracterizados por sua riqueza informacional (Greytak et al., 2019;Kennett, 2019;Murphy, 2018) 27 . Como resultado, este tipo de uso disponibiliza dados mais informativos para a aplicação da lei, ao mesmo tempo que expande significativamente a rede de pessoas que podem ser impactadas por tais procedimentos (Murphy, 2018). ...
... Isto implica, portanto, que a vigilância genética já não é restrita à "gestão daqueles já considerados criminosos" 29 (Williams & Johnson, 2004, p. 11): atualmente 27 Além disso, como as empresas de DTC recolhem DNA de kits de saliva ou esfregaços de mucosa jugal, os perfis de DNA são sempre baseados em uma grande quantidade de DNA de alta qualidade proveniente de uma única fonte. Em contraste, as amostras de DNA forense podem enfrentar vários obstáculos de análise por terem apenas uma pequena quantidade de DNA degradado e/ou misturada com DNA de outros indivíduos (Greytak et al., 2019). 28 Tradução livre. ...
Book
Full-text available
Este livro mobiliza uma perspectiva sociológica crítica para explorar modos contemporâneos da governança da criminalidade por via da genética forense. Helena Machado e Rafaela Granja abordam um conjunto de temas útil à compreensão do lugar e do papel da genética nos sistemas de justiça criminal, bem como os desafios sociais, éticos e políticos subjacentes. Em particular, as autoras exploram os usos da genética para identificar suspeitos criminais ou para prever o comportamento humano e os riscos para a privacidade e direitos humanos associados, a expansão da vigilância transnacional e o uso do big data. O livro integra também a análise de tecnologias controversas que têm o potencial de consolidar a criminalização e estigmatização de determinados grupos sociais, indivíduos e famílias, bem como fazer recrudescer manifestações racistas baseadas na biologia. Redigido numa linguagem acessível, este livro destina-se a estudantes, pesquisadores e profissionais de diversas áreas – da Sociologia, Criminologia e outras ciências sociais ao Direito e à Genética Forense.
... Such use of genetic information to trace the identity of skeletal remains is not new; it had been previously applied to investigate the identities of skeletal remains within purported Romanov graves and had been a resource for thousands of hobbyists and amateur genealogists (36,53,68). However, the expansion of direct-to-consumer personal genome services moved the use of genetic genealogy from amateur and historical endeavors into criminal investigations (41). ...
... Since the 2007 emergence of personal genome services, broad swaths of people have gained access to their genetic code (73), which in turn has led to the growth of private and public databases that contain genetic data from millions of people. Since a transformative criminal investigation in 2018, the use of genetic genealogy has expanded from a primarily recreational and civic tool to one now used broadly in police and forensic investigations (41,97). ...
... Two recent reviews provide background on investigative genetic genealogy (IGG): Kennett (63) provided a comprehensive review of IGG in criminal investigations and missing persons, including details on how public data are stored and accessed, and Greytak et al. (41) provided a description of IGG technical approaches. This review examines how IGG came to be a favored investigative approach for crime solving in the United States, how the approach is expected to expand in the coming years, and the ethical and policy challenges raised by the ever-increasing amount of data accessible to law enforcement. ...
Article
In the past few years, cases with DNA evidence that could not be solved with direct matches in DNA databases have benefited from comparing single-nucleotide polymorphism data with private and public genomic databases. Using a combination of genome comparisons and traditional genealogical research, investigators can triangulate distant relatives to the contributor of DNA data from a crime scene, ultimately identifying perpetrators of violent crimes. This approach has also been successful in identifying unknown deceased persons and perpetrators of lesser crimes. Such advances are bringing into focus ethical questions on how much access to DNA databases should be granted to law enforcement and how best to empower public genome contributors with control over their data. The necessary policies will take time to develop but can be informed by reflection on the familial searching policies developed for searches of the federal DNA database and considerations of the anonymity and privacy interests of civilians. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 21 is August 31, 2020. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
... In addition to direct-to-consumer (DTC) genetic genealogy companies, law enforcement agencies have also partnered with companies, like Parabon Nanolabs, which among others specializes in DNA phenotyping, to help them investigate unsolved cold cases. By using a DNA analysis methodology, as was described in a paper by Greytak et al., Parabon has successfully assisted law enforcement in making positive identifications and solving over 50 other high-profile criminal cases, since May 2018 [9][10][11]. In addition to the numerous identifications, guilty pleas and confessions, this new forensic technique has now led to a guilty verdict, making this the first case where a man was convicted after being arrested through genetic genealogy search [12,13]. ...
... future science group10.2217/pme-2019-0100 ...
Article
The rapidly evolving popularity of direct-to-consumer genetic genealogy companies has made it possible to retrieve genomic information for unintended reasons by third parties, including the emerging use for law enforcement purposes. The question remains whether users of direct-to-consumer genetic genealogy companies and genealogical databases are aware that their genetic and/or genealogical data could be used as means to solving forensic cases. Our review of 22 companies’ and databases’ policies showed that only four companies have provided additional information on how law enforcement agencies should request permission to use their services for law enforcement purposes. Moreover, two databases have adopted a different approach by providing a special service for law enforcement. Although all companies and databases included in the study provide at least some provisions about police access, there is an ongoing debate over the ethics of these practices, and how to balance users’ privacy with law enforcement requests.
... In April 2018, the use of investigative genetic genealogy drew international attention with the identification and arrest of alleged "Golden State Killer" Joseph DeAngelo [27,28]. Genetic data from distant relatives in public genetic genealogy databases have aided dozens of other cold case investigations since then [29] and increased discussions around genetic privacy (e.g., Refs. ...
... Forensic Science International organized a virtual special issue on cold cases (guest editors: Rob Davis and James Adcock) with articles spread across the May, June, July, and August 2019 issues of the journal [234] covering topics such as considering the benefits versus the costs of DNA testing in sexual assault cases [235], genetic genealogy for cold case and active investigations [29], and assisting missing persons cases with genetic genealogy database searches [236]. ...
Article
Full-text available
This review paper covers the forensic-relevant literature in biological sciences from 2016 to 2019 as a part of the 19th Interpol International Forensic Science Managers Symposium. The review papers are also available at the Interpol website at: https://www.interpol.int/content/download/14458/file/Interpol%20Review%20Papers%202019.pdf.
... We also observe that the open personal genomics database and genealogy website, GEDmatch [38], is one of the topics with the greatest weights (0.054); see Table 3. GEDmatch allows users to upload their genetic data obtained from DTC genetic testing companies to identify potential relatives who have also uploaded their data. Interestingly, in December 2018, US police forces declared that GEDmatch helped them find suspects in 28 cold murder and rape cases [40]. Overall, as shown in Figure 2, the subreddits about genetics and ancestry attract far less toxic comments than the random Reddit sample, and are the least toxic categories among the rest in our dataset. ...
Conference Paper
Progress in genomics has enabled the emergence of a booming market for “direct-to-consumer” genetic testing. Nowadays, companies like 23andMe and AncestryDNA provide affordable health, genealogy, and ancestry reports, and have already tested tens of millions of customers. At the same time, alt- and far-right groups have also taken an interest in genetic testing, using them to attack minorities and prove their genetic “purity.” In this paper, we present a measurement study shedding light on how genetic testing is being discussed on Web communities in Reddit and 4chan. We collect 1.3M comments posted over 27 months on the two platforms, using a set of 280 keywords related to genetic testing. We then use NLP and computer vision tools to identify trends, themes, and topics of discussion. Our analysis shows that genetic testing attracts a lot of attention on Reddit and 4chan, with discussions often including highly toxic language expressed through hateful, racist, and misogynistic comments. In particular, on 4chan's politically incorrect board (/pol/), content from genetic testing conversations involves several alt-right personalities and openly antisemitic rhetoric, often conveyed through memes. Finally, we find that discussions build around user groups, from technology enthusiasts to communities promoting fringe political views.
Article
Full-text available
Currently, there are over a thousand unsolved homicide cases in Poland. Up to this point, numerous, mostly popular science, research papers have been focusing on the individual units in charge of these difficult cases. This paper, however, is an attempt to represent the current state of investigations that were discontinued due to the fact that the perpetrators could not be found, hereinafter referred to as Cold Case Homicides. This paper depicts both the researcher's perspective and the statistical side of such conduct. Furthermore, it presents the first results of a pilot study conducted among the prosecutors, concerning the problem of Cold Case Homicides from their perspective, the possibility of cooperation with the academics, and their opinion on the idea of complex research, concerning the reconstruction of events in this specific area of crime.
Preprint
Full-text available
DNA-assisted identification of historical remains requires the genetic analysis of highly degraded DNA, along with a comparison to DNA from known relatives. This can be achieved by targeting single nucleotide polymorphisms (SNPs) using a hybridization capture and next-generation sequencing approach suitable for degraded skeletal samples. In the present study, two SNP capture panels were designed to target ∼25,000 (25K) and ∼95,000 (95K) autosomal SNPs, respectively, to enable distant kinship estimation (up to 4th-degree relatives). Low-coverage SNP data were successfully recovered from 14 skeletal elements 75 years postmortem, with captured DNA having mean insert sizes ranging from 32 to 170 bp across the 14 samples. SNP comparison with DNA from known family references was performed in the Parabon Fχ Forensic Analysis Platform, which uses a likelihood approach for kinship prediction optimized for low-coverage sequencing data with DNA damage. The 25K and 95K panels produced 15,000 and 42,000 SNPs on average, respectively, allowing accurate kinship prediction in 17 and 19 of the 21 pairwise comparisons. Whole-genome sequencing could not produce sufficient SNP data for accurate kinship prediction, demonstrating that hybridization capture is necessary for historical samples. This study provides the groundwork for expanding research on compromised samples to include SNP hybridization capture.
Author Summary
Our study evaluates ancient DNA techniques involving SNP capture and next-generation sequencing for use in forensic identification. We used bone samples from 14 sets of previously identified historical remains aged 70 years postmortem for low-coverage SNP genotyping and extended kinship analysis. We performed whole-genome sequencing and hybridization capture with two SNP panels, one targeting ∼25,000 SNPs and the other targeting ∼95,000 SNPs, to assess SNP recovery and accuracy in kinship estimation.
A genotype likelihood approach was used for SNP profiling of degraded DNA characterized by the cytosine deamination typical of ancient and historical specimens. Family reference samples from known relatives up to 4th degree were genotyped using a SNP microarray. We then used the Parabon Fχ Forensic Analysis Platform to perform pairwise comparisons of all bone and reference samples for kinship prediction. The results showed that both capture panels enabled accurate kinship prediction in more than 80% of the tested relationships without producing false positive matches (adventitious hits), which were commonly observed in the whole-genome sequencing comparisons. We demonstrate that SNP capture can be an effective method for genotyping historical remains for distant kinship analysis with known relatives, which will support humanitarian efforts and forensic identification.
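The abstract mentions a genotype-likelihood approach for low-coverage DNA with cytosine deamination but does not give the model. Purely as an illustration, and not as the Fχ platform's method, the sketch below assumes a simple per-read model at one biallelic SNP. Each read samples one of the two chromosomes at random, a fixed sequencing error rate `err` applies, and an extra, user-supplied `damage` rate turns true reference bases into apparent alternate bases, as C→T deamination would at a C/T site. The function name and both rates are hypothetical.

```python
import math
from typing import List, Sequence


def genotype_log_likelihoods(reads: Sequence[bool], err: float = 0.01,
                             damage: float = 0.0) -> List[float]:
    """Log-likelihoods of genotypes 0, 1 and 2 (copies of the alt allele) at one SNP.

    reads:  one entry per read covering the SNP; True if the read shows the alt allele
    err:    assumed per-base sequencing error rate
    damage: assumed extra ref->alt rate, e.g. C->T deamination at a C/T SNP
    """
    p_ref_read_as_alt = err + damage  # a true ref base observed as alt
    p_alt_read_as_alt = 1.0 - err     # a true alt base observed correctly
    log_liks = []
    for g in (0, 1, 2):
        # Each read is drawn from one of the two chromosomes with probability 1/2,
        # so a fraction g/2 of reads come from alt-carrying chromosomes.
        p_alt = (g / 2) * p_alt_read_as_alt + (1 - g / 2) * p_ref_read_as_alt
        log_liks.append(sum(math.log(p_alt) if is_alt else math.log(1.0 - p_alt)
                            for is_alt in reads))
    return log_liks


# Two alt reads and one ref read favour a heterozygote. With a high assumed
# damage rate, a lone alt read among three ref reads is better explained by damage.
print(genotype_log_likelihoods([True, True, False]))
print(genotype_log_likelihoods([True, False, False, False], damage=0.3))
```

Keeping the likelihoods for all three genotypes, instead of calling a single genotype, lets a downstream kinship comparison weigh uncertain low-coverage sites appropriately. The real pipeline presumably models damage in a more detailed, read-position-aware way.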
Article
Abstract: Despite advances in forensic genetic investigative techniques, a large number of criminal cases remain unsolved, and many are intractable because of their age. We have recently witnessed the birth of a new forensic discipline, forensic genealogy or investigative genetic genealogy, which allows many of these cases to be solved satisfactorily by combining DNA analysis technology with genealogical search tools. The potential of this new investigative strategy is evident, as are its risks. We must maintain a balance between personal privacy and the interests of the people affected on the one hand, and public safety, the good of the community, and the resolution and clarification of criminal acts on the other. A social, legal, and scientific debate is needed to clarify all these aspects and to enable adequate legal regulation of these practices.
Article
The use of genetic genealogy techniques to identify Joseph James DeAngelo as the prime suspect in the Golden State Killer case in 2018 has opened up a new approach to the investigation of cold cases. Since that breakthrough, genetic genealogy methods have reportedly been applied in around 100 cases. To date, all of these reports relate to investigations in the US, where the high uptake of "direct-to-consumer" (DTC) genetic testing by individuals conducting private ancestral research has provided the publicly available data needed for successful forensic investigations. We have conducted a study to assess the likely effectiveness of genetic genealogy techniques if applied to investigations in the UK. Ten volunteers provided their own SNP array data, downloaded from a DTC provider of their choice. These data sets were anonymised and uploaded to the GEDmatch Genesis genealogy website, mimicking data sets from unsourced crime samples or unidentified human remains. A team of experienced genealogists then attempted to identify the donors of the anonymised data sets by working with matches on the database and identifying points where the matches' trees intersect to determine their shared family lineages, which were then investigated further using traditional resources (such as birth, marriage, death and census records, social media and online family trees). Through these methods, four of the ten donors were identified, at least to the level of one of a set of siblings. This confirms that, despite the over-representation of US citizens on publicly accessible genealogy databases, there is still potential for effective use in investigations outside the US where legislation permits. One of the four identified individuals was of Indian heritage (via St Vincent and the Grenadines), highlighting that in the right circumstances individuals of non-European origin can be identified.
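The step of "identifying points where the matches' trees intersect" can be pictured as a set computation over pedigrees. The toy sketch below is not the study's procedure, because the genealogists worked manually with records. It assumes a hypothetical pedigree format that maps each person to a list of known parents. The sketch collects every match's ancestors, intersects those sets, and keeps only the most recent shared ancestors, meaning those who are not themselves ancestors of another shared ancestor. All names and data are invented, and at least one match is required.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical pedigree format: each person maps to a list of known parents.
Pedigree = Dict[str, List[str]]


def ancestors(person: str, parents: Pedigree) -> Set[str]:
    """All recorded ancestors of `person` (breadth-first walk up the tree)."""
    seen: Set[str] = set()
    queue = deque(parents.get(person, []))
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents.get(p, []))
    return seen


def most_recent_shared_ancestors(matches: Iterable[str], parents: Pedigree) -> Set[str]:
    """Ancestors common to every match, minus those who are ancestors of another shared one."""
    common = set.intersection(*(ancestors(m, parents) for m in matches))
    older = set().union(*(ancestors(a, parents) for a in common))
    return common - older


tree = {
    "match_A": ["A_father", "A_mother"],
    "A_father": ["John", "Mary"],
    "match_B": ["B_father", "B_mother"],
    "B_mother": ["John", "Mary"],
    "John": ["John_sr", "Ann"],
}
print(most_recent_shared_ancestors(["match_A", "match_B"], tree))
```

In practice the couple found this way becomes the starting point for descendancy research: the unknown person is sought among that couple's descendants.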
Chapter
Despite their established role in providing evidence for criminal justice, DNA technologies continue to attract investment, which has given rise to new DNA technologies. This chapter focuses on such innovations, explaining how forensic genetics is expanding its role in the criminal justice system. Recent technologies such as familial searching and forensic DNA phenotyping might help generate intelligence for criminal investigations. Familial searching attempts to identify criminal suspects through their genetic connection with relatives. Forensic DNA phenotyping makes it possible to focus on a particular suspect group that shares genetic ancestry and/or externally visible characteristics. The chapter critically reviews the existing debate in the social sciences about emerging DNA technologies. Its core argument is that applying DNA phenotyping and familial searching to the governance of crime could increase the risk of stigmatization and reinforce the criminalization of certain populations that are more vulnerable to the actions of the criminal justice system.
Article
On April 24, 2018, a suspect in California’s notorious Golden State Killer cases was arrested after decades of eluding the police. Using a novel forensic approach, investigators identified the suspect by first identifying his relatives through a free, online genetic database populated by individuals researching their family trees. In the wake of the case, media outlets reported privacy concerns about police access to personal genetic data generated by or shared with genealogy services. Recent data from 1,587 survey respondents, however, provide preliminary reason to suspect that such concerns have been overstated. Still, limiting police access to genetic genealogy databases in particular may be desirable for reasons other than current public demand for such limits.
Article
When a forensic DNA sample cannot be associated directly with a previously genotyped reference sample by standard short tandem repeat profiling, the investigation required to identify perpetrators, victims, or missing persons can be both costly and time-consuming. Here, we describe the outcome of a collaborative study using the Identitas Version 1 (v1) Forensic Chip, the first commercially available all-in-one tool dedicated to developing intelligence leads from DNA. The chip allows parallel interrogation of 201,173 genome-wide autosomal, X-chromosomal, Y-chromosomal, and mitochondrial single nucleotide polymorphisms for inference of biogeographic ancestry, appearance, relatedness, and sex. The first assessment of the chip's performance was carried out on 3,196 blinded DNA samples of varying quantity and quality, covering a wide range of biogeographic origins and eye and hair colors as well as variation in relatedness and sex. Overall, 95% of the samples (N = 3,034) passed quality checks with an overall genotype call rate >90%; the amount of recorded trait information available for evaluation varied between samples. Predictions of sex, direct match, and first- to third-degree relatedness were highly accurate. Chip-based predictions of biparental continental ancestry were on average ~94% correct, with further support from separately inferred patrilineal and matrilineal ancestry. Predictions of eye color were 85% correct for brown and 70% correct for blue eyes, and predictions of hair color were 72% correct for brown, 63% for blond, 58% for black, and 48% for red hair. Of the 5% of samples (N = 162) with a call rate <90%, 56% yielded correct continental ancestry predictions, while 7% yielded sufficient genotypes to allow hair and eye color prediction. Our results demonstrate that the Identitas v1 Forensic Chip holds great promise for a wide range of applications, including criminal investigations, missing person investigations, and national security purposes.
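The quality check above is a genotype call-rate threshold: the fraction of interrogated SNPs that received a genotype call must exceed 90%. A minimal sketch is shown below. It assumes a hypothetical encoding in which each SNP genotype is an integer alt-allele count and `None` marks a no-call. The function names are illustrative and do not come from the Identitas software.

```python
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of SNPs with a genotype call; None marks a no-call (assumed encoding)."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_qc(genotypes: Sequence[Optional[int]], threshold: float = 0.90) -> bool:
    """True if the call rate is strictly above the threshold (the >90% criterion)."""
    return call_rate(genotypes) > threshold


# The reported counts are internally consistent: 3,034 of 3,196 samples is about
# 95%, leaving 162 samples below the cutoff.
print(round(3034 / 3196, 3), 3196 - 3034)
```

A strict `>` comparison mirrors the ">90%" wording, so a sample at exactly 90% fails.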
Article
Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high-density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. To characterize the relationship between genetic similarity and degree of kinship over a timeframe of 100-300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from that in continentally defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and 'unrelated' population samples. Using these bounds as a guide, we detected tens of thousands of 2nd- to 9th-degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and large 'unrelated' population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.
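The link between IBD sharing and degree of relationship can be illustrated with its simplest expected-value form. In an outbred pedigree, d-th-degree relatives are expected to share about 2^-d of the genome IBD, which is twice the kinship coefficient: 50% for parent and child, 12.5% for first cousins, and 3.125% for second cousins, who are fifth-degree relatives. This degree scale differs from "nth cousin" numbering. The sketch below inverts that expectation to give a point estimate. It is only an assumption-laden simplification. The study derived its bounds from pedigree simulations, which capture the wide spread in sharing around these expectations.

```python
import math


def expected_ibd_fraction(degree: int) -> float:
    """Expected fraction of the genome shared IBD by d-th-degree relatives (outbred pedigree)."""
    return 2.0 ** -degree


def estimated_degree(shared_fraction: float) -> int:
    """Degree whose expected sharing is closest, on a log2 scale, to the observed fraction."""
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    return max(1, round(-math.log2(shared_fraction)))


print(estimated_degree(0.125), estimated_degree(0.03))
```

Because background IBD sharing is much higher in endogamous populations, this naive estimate would place relatives in such populations too close. That is why the study determined bounds separately for endogamous and 'unrelated' samples.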
Article
Accurate estimation of recent shared ancestry is important for genetics, evolution, medicine, conservation biology, and forensics. Established methods estimate kinship accurately for first-degree through third-degree relatives. We demonstrate that chromosomal segments shared by two individuals due to identity by descent (IBD) provide much additional information about shared ancestry. We developed a maximum-likelihood method for the estimation of recent shared ancestry (ERSA) from the number and lengths of IBD segments derived from high-density SNP or whole-genome sequence data. We used ERSA to estimate relationships from SNP genotypes in 169 individuals from three large, well-defined human pedigrees. ERSA is accurate to within one degree of relationship for 97% of first-degree through fifth-degree relatives and 80% of sixth-degree and seventh-degree relatives. We demonstrate that ERSA's statistical power approaches the maximum theoretical limit imposed by the fact that distant relatives frequently share no DNA through a common ancestor. ERSA greatly expands the range of relationships that can be estimated from genetic data and is implemented in a freely available software package.
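The idea behind ERSA, a maximum-likelihood estimate from the number and lengths of shared IBD segments, can be sketched with a two-part likelihood. This sketch follows our understanding of the published model and is not the ERSA software. For relatives separated by d meioses with a shared ancestors, it assumes that the count of segments longer than a detection threshold t cM is Poisson with mean a(rd + c)/2^(d-1) · e^(-dt/100), and that each segment's excess length over t is exponential with mean 100/d cM. The constants r = 35.2 recombinations per meiosis and c = 22 autosomes are stated assumptions. The sketch omits ERSA's separate model for background population sharing.

```python
import math
from typing import Sequence

R_PER_MEIOSIS = 35.2  # assumed mean autosomal recombinations per meiosis
N_AUTOSOMES = 22


def log_likelihood(segments_cm: Sequence[float], d: int, t: float = 5.0, a: int = 2) -> float:
    """Log-likelihood of IBD segments (all longer than t cM) for relatives d meioses apart.

    a: number of shared ancestors (assumed 2, as for full cousins).
    """
    if any(x <= t for x in segments_cm):
        raise ValueError("all segments must be longer than the threshold t")
    expected_all = a * (R_PER_MEIOSIS * d + N_AUTOSOMES) / 2 ** (d - 1)
    lam = expected_all * math.exp(-d * t / 100)  # expected count of segments longer than t
    n = len(segments_cm)
    ll = n * math.log(lam) - lam - math.lgamma(n + 1)  # Poisson term for the count
    rate = d / 100  # excess length over t is exponential with mean 100/d cM
    ll += sum(math.log(rate) - rate * (x - t) for x in segments_cm)
    return ll


def estimate_meioses(segments_cm: Sequence[float], t: float = 5.0, max_d: int = 14) -> int:
    """Maximum-likelihood number of meioses separating the pair, searched over 2..max_d."""
    return max(range(2, max_d + 1), key=lambda d: log_likelihood(segments_cm, d, t))


print(estimate_meioses([40.0] * 35), estimate_meioses([10.0] * 2))
```

Many long segments push the estimate toward close relationships, while a few short ones push it toward distant ones. When no segments are shared, the likelihood keeps rising with d, which reflects the point in the abstract that distant relatives often share no detectable DNA.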
Article
Genome-wide association studies (GWASs) have been widely used to map loci contributing to variation in complex traits and risk of diseases in humans. Accurate specification of familial relationships is crucial for family-based GWAS, as well as for population-based GWAS with unknown (or unrecognized) family structure. The family structure in a GWAS should be routinely investigated using the SNP data before the analysis of population structure or phenotype. Existing algorithms for relationship inference share a major weakness: they estimate allele frequencies at each SNP from the entire sample, under a strong assumption of homogeneous population structure. This assumption is often untenable. Here, we present a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows for the presence of unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). We present simulation experiments demonstrating that the algorithm has sufficient power to provide reliable inference on millions of unrelated pairs and thousands of relative pairs (up to 3rd-degree relationships). Application of our robust algorithm to HapMap and GWAS datasets demonstrates that it performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. Our highly efficient implementation performs relationship inference on millions of pairs of individuals in a matter of minutes, dozens of times faster than the most efficient existing algorithm known to us. Our robust relationship inference algorithm is implemented in a freely available software package, KING, available for download at http://people.virginia.edu/~wc9c/KING.
Article
Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data from 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of searches for individuals of European descent will result in a third-cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. Based on these results, we propose a potential mitigation strategy and discuss policy implications for human subject research.
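Why even modest database coverage yields frequent matches comes down to compounding over many relatives. Under a deliberately simple model, which is an assumption and not the authors' method, each of a person's n third-cousin-or-closer relatives is in the database independently with probability equal to the database's coverage of the population. A relative who is present is detected with some probability, because distant cousins may share no detectable DNA. The chance of at least one usable match is then 1 - (1 - coverage · detection)^n. The parameter values in the example below are invented for illustration.

```python
def prob_at_least_one_match(n_relatives: int, db_fraction: float,
                            detect_prob: float = 1.0) -> float:
    """Probability that at least one of n relatives is in the database and detected.

    Assumes each relative is present independently with probability db_fraction
    and, if present, is detected as a relative with probability detect_prob.
    """
    p_hit = db_fraction * detect_prob
    return 1.0 - (1.0 - p_hit) ** n_relatives


# Illustrative numbers only, not from the study: 190 relatives, 0.5% coverage.
print(prob_at_least_one_match(190, 0.005))
```

Real relatives are not independent draws. Families cluster in who tests, and the number of cousins varies widely between people, so this sketch conveys the qualitative trend rather than the study's projections.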
Greytak, E. M., & Armentrout, S. (2015). DNA Phenotyping: Predicting Ancestry and Physical Appearance from Forensic DNA. Proceedings of the 26th International Symposium on Human Identification.
Greytak, E. M., Moore, C., & Armentrout, S. L. (2018). RE: Identity inference of genomic data using long-range familial searches, Erlich et al. Science, 362(6415) (2018), 690-694 (eLetter, 10-29-18).
Havens, E. (2019). Elderly woman in home invasion rape case: I forgive my attacker. St. George Spectrum & Daily News. Retrieved from https://www.thespectrum.com/story/news/2019/02/26/elderly-woman-home-invasionrape-case-forgive-my-attacker/2995143002/