Article

A guide to results and diagnostics within a STRmix™ report

Abstract

Until recently, forensic DNA profile interpretation was predominantly a manual, time-consuming process in which analysts used heuristics to determine which genotype combinations could reasonably explain a recovered profile. Probabilistic genotyping (PG) has now become commonplace in the interpretation of DNA profiling evidence. Because the complexity of PG necessitates the use of algorithms and modern computing power, some critics have dubbed it a “black box” approach. Here we discuss the wealth of information provided in the output of STRmix™, one example of a continuous PG system. We discuss how analysts can evaluate this information, either to give confidence in the results or to indicate that further interpretation may be warranted. Specifically, we discuss the “primary” and “secondary” diagnostics output by STRmix™ and give some context to the values that may be observed.
This article is categorized under:
• Forensic Biology > Interpretation of Biological Evidence
• Forensic Biology > Forensic DNA Technologies

... STRmix interpretations were undertaken using the recommended MCMC parameters (shown in Table 1) [46]. In follow up analyses two interpretations were repeated with an increase in the number of accepts (1,000,000 burn-in and 500,000 post burn-in accepts per chain) to allow each of the chains to explore more possibilities in the probability space [59]. The reported sub-source LRs within the STRmix reports were considered for the analysis in this study. ...
... Comparing performance and likelihood ratios for different PG models is discussed in detail in Russell et al. [59]. In actual casework, every analysis should be subjected to diagnostic checks. ...
... In such cases, and if sufficient sample is available, either replicate analysis or sample reamplification is used. Otherwise, the options are either to ignore that locus during deconvolution or to repeat the deconvolution in STRmix with: a random MCMC starting seed different from the one that gave LR = 0, an increased number of MCMC accepts, or a larger Random Walk Standard Deviation (RWSD) [2,7,36,59,71]. Here, we repeated the runs in STRmix with more MCMC accepts (as discussed in Section 2.7); the repeated interpretations generated non-zero LRs for the affected loci and profile log10(LR)s of 24.8 and 19.6 (S8 Table). Note that these two 2P H1-true test interpretations, to which STRmix assigned profile LRs of 0, were plotted: (i) at −125 on the log10 scale in Fig 2 and S1 and S2 Files; (ii) at −Infinity (−Inf) in Figs 4A and 6A; (iii) at Infinity (Inf) in Figs 7A and 8A; and were binned into the exclusionary verbal category (Table A in Fig 8). ...
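The plotting convention described above, placing a profile LR of 0 at −125 on the log10 scale, can be sketched as follows. This is a minimal illustration using a hypothetical helper name, `plot_log10_lr`; only the −125 floor comes from the text, and the code is not taken from the study.

```python
import math


def plot_log10_lr(lr: float, floor: float = -125.0) -> float:
    """Return a finite log10(LR) suitable for plotting (hypothetical helper).

    log10(0) is undefined (it tends to -infinity), so an LR of exactly 0
    is mapped to a fixed floor value instead of being dropped.
    """
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    if lr == 0:
        return floor
    return math.log10(lr)


# A zero LR lands on the floor; non-zero LRs keep their log10 value.
print(plot_log10_lr(0.0))  # -125.0
print(plot_log10_lr(10**24.8))
```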
Article
A likelihood ratio (LR) system is defined as the entire pipeline of the measurement and interpretation processes, of which probabilistic genotyping software (PGS) is one part. To understand how two LR systems perform, a total of 154 two-person, 147 three-person, and 127 four-person mixture profiles of varying DNA quality, DNA quantity, and mixture ratio were obtained from the filtered (.CSV) files of the GlobalFiler 29 cycles 15s PROVEDIt dataset and deconvolved in two independently developed fully continuous programs, STRmix v2.6 and EuroForMix v2.1.0. Various parameters were set in each software, and the LR computations obtained from the two programs were based on the same fixed EPG features, the same pair of propositions, and the same number of contributors, theta, and population allele frequencies. The ability of each LR system to discriminate between contributor (H1-true) and non-contributor (H2-true) scenarios was evaluated qualitatively and quantitatively. Differences in the numeric LR values and their corresponding verbal classifications between the two LR systems were compared. The magnitude of the differences in the assigned LRs was described, together with potential explanations for differences greater than or equal to 3 on the log10 scale. Cases of LR < 1 for H1-true tests and LR > 1 for H2-true tests were also discussed. Our intent is to demonstrate the value of using a publicly available, ground-truth-known mixture dataset to assess the discrimination performance of any LR system, and to show the steps used to understand similarities and differences between LR systems.
We share our observations with the forensic community and describe how examining more than one PGS with similar discrimination power can be beneficial: it can help analysts compare interpretations, especially for low-template profiles or minor contributors, and can serve as an additional diagnostic check even when the software in use already provides diagnostic statistics as part of its output.
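The comparison of assigned LRs between two systems, flagging differences of 3 or more on the log10 scale, can be sketched as below. This is an illustrative sketch assuming both LRs in each pair are strictly positive; the function names are hypothetical, and LR = 0 results would need separate handling.

```python
import math


def log10_difference(lr_a: float, lr_b: float) -> float:
    """Absolute difference between two positive LRs on the log10 scale."""
    if lr_a <= 0 or lr_b <= 0:
        raise ValueError("both LRs must be positive to compare on a log10 scale")
    return abs(math.log10(lr_a) - math.log10(lr_b))


def flag_divergent(pairs, threshold: float = 3.0):
    """Return indices of (LR_system_A, LR_system_B) pairs whose log10
    difference is greater than or equal to `threshold`."""
    return [
        i for i, (lr_a, lr_b) in enumerate(pairs)
        if log10_difference(lr_a, lr_b) >= threshold
    ]


# Hypothetical LR pairs (system A, system B), not data from the study.
pairs = [(1e10, 1e6), (2.1e25, 8.0e24), (5.0, 2e4)]
print(flag_divergent(pairs))  # [0, 2]
```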
... In the interpretation of the mixtures in this study, there were six observations of exclusions of known donors to the mixture using STRmix™. Following normal casework protocol, we carefully scrutinized the results by first assessing the primary diagnostics [21]. We would also have scrutinized the secondary diagnostics had it been required [21]. ...
... Examining the per-locus LRs for these six observations, we noted that all were the result of single-locus exclusions. ...
Article
Likelihood ratio (LR) differences between the probabilistic genotyping software EuroForMix and STRmix™ are examined. After accounting for differences in the allele probabilities, the LRs from both programs for an unambiguous single-source profile were identical (to four significant figures). LRs from both programs for an unambiguous single-source profile with alleles previously unseen in the allele frequency database (rare alleles) were the same (to three significant figures) for θ = 0.01. Due to differences in the minimum allele frequencies, the LRs differed by three orders of magnitude when θ = 0. For both programs, the LRs for a single-source dilution series decreased as the input amount decreased. The LRs from both programs were within an order of magnitude for known contributors. The largest difference was where the target input amount was 0.0156 ng: LR_EuroForMix was 2.1 × 10²⁵ and LR_STRmix was 8.0 × 10²⁴. Both programs show similar LR behavior with respect to mixture ratio. For two-person mixtures, the LR increases for both the major and the minor contributor as the ratio moves away from 1:1. The LR for the major stabilizes at about 3:1, whereas the LR for the minor reaches its maximum at about 3:1 and then declines. Greater differences in LR were observed between EuroForMix and STRmix™ for mixtures. One hundred and twenty-nine mixtures from the PROVEDIt dataset were compared. LRs for 84% of the comparisons for known contributors without rare alleles were within two orders of magnitude. Five divergent results were investigated, and a manual intervention approach was applied where appropriate.
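As a worked check of the largest dilution-series difference quoted above, the ratio of the two LRs can be expressed on the log10 scale; the result is below 1, consistent with the statement that the LRs were within an order of magnitude:

```latex
\log_{10}\left(\frac{\mathrm{LR}_{\mathrm{EuroForMix}}}{\mathrm{LR}_{\mathrm{STRmix}}}\right)
  = \log_{10}\left(\frac{2.1 \times 10^{25}}{8.0 \times 10^{24}}\right)
  = \log_{10}(2.625) \approx 0.42 < 1
```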
... All 564 log10 LR_B ≥ 3 results corresponded to a component of the mixture where the STRmix™-assigned posterior mean template was less than 588 rfu. The STRmix™ template value is the mean of the posterior burn-in MCMC iterations used for template in the biological model implemented by STRmix™ [17]. The STRmix™ template value aligns with the approximate peak heights of alleles at the left-hand side of the electropherogram (before any degradation is applied). ...
Article
Relatives tend to have more DNA in common than unrelated people. The closer the biological relationship, the higher the chance of alleles being identical by descent between the individuals. Therefore, when considering a mixed DNA profile, close relatives of the true contributor may not always be excluded as possible contributors to a mixture due to allele sharing. In these situations, it might be more appropriate under the alternative proposition to consider that the DNA could have originated from a relative of the person of interest (POI) rather than from an unrelated individual. The probabilistic genotyping software STRmix™ automatically provides LRs considering close biological relatives as alternative sources of the DNA. In this paper, we investigate the support for siblings of the true contributor to a mixture (who are not present in the mixture themselves). We interpret the mixtures and assign LRs using STRmix™ and investigate whether the resulting LRs could be used to indicate whether the true contributor could be a sibling of the POI. Most siblings will have one or more alleles that are not observed in the mixture profile. Support for a sibling having contributed can only arise when allelic dropout is possible at the loci where the sibling has alleles that are not observed in the profile. In these data, such support was only observed in components with an assigned template of 588 rfu or less.
... STRmix™ models, with the software version and references:
• Allele and stutter peak height variability as separate constants within the MCMC: V2.0 [15]
• Peak height variability as random variables within the MCMC: V2.3 [196]
• Model for calibrating laboratory peak height variability: V2.0 [196]
• Application of a Gaussian random walk to the MCMC process: V2.3 [205]
• Modelling of back stutter by regressing stutter ratio against allelic designation: V2.0 [156,197,206,207]
• Modelling of back stutter by regressing stutter ratio against LUS: V2.3 [156,162,206,207]
• Modelling of forward stutter: V2.4 [157]
• Modelling of allelic drop-in using a simple exponential or uniform distribution: V2.0 [15]
• Modelling of allelic drop-in using a γ distribution: V2.
• Modelling expected stutter peak heights in saturated data: V2.3 [157]
• Taking into account the 'factor of two' in LR calculations: V2.3 [104]
• Model for incorporating prior beliefs in mixture proportions: V2.3 [210]
• Combining DNA profiles produced under different conditions into a single analysis: V2.5 [155]
• Assigning a range for the number of contributors to a DNA profile: V2.6 [164]
• Mixture-to-mixture comparison to identify common DNA donors: V2.7 [20]
• A top-down DNA search approach: V2.8 [74]
• The diagnostic outputs of STRmix™: V2.3 [211]
The STRmix™ models (and some initial validation work) were originally all described in the Taylor et al. publication "The interpretation of single source and mixed DNA profiles" [15]. Since that publication, the models have been updated over time, and these updates have by necessity appeared in numerous publications. ...
Article
Probabilistic genotyping has become widespread. EuroForMix and DNAStatistX are both based upon maximum likelihood estimation using a γ model, whereas STRmix™ is a Bayesian approach that specifies prior distributions on the unknown model parameters. A general overview is provided of the historical development of probabilistic genotyping. Some general principles of interpretation are described, including: the application to investigative vs. evaluative reporting; detection of contamination events; inter- and intra-laboratory studies; numbers of contributors; proposition setting; and validation of software and its performance. This is followed by details of the evolution, utility, practice and adoption of the software discussed.
... The disadvantage is that carrying out the MCMC in this manner does not allow easy subsequent LR calculation; for example, if we were interested in the potential DNA contribution of five different people to a complex sample, the MCMC would need to be run five times, once for each person. It is also difficult to facilitate human review of the MCMC, as the weights are one of the main intuitive diagnostic outputs [15]. ...
Preprint
Two methods for applying a lower bound to the variation induced by the Monte Carlo effect are trialled. One of these is implemented in the widely used probabilistic genotyping system STRmix™. Neither approach gives the desired 99% coverage, and in some cases the coverage is much lower than 99%. The discrepancy (i.e., the distance between the LR corresponding to the desired 99% coverage and the LR corresponding to the observed coverage) is not large. For example, the discrepancy of 0.23 for approach 1 suggests that the lower bounds should be moved downwards by a factor of 1.7 to achieve the desired 99% coverage. Although less effective than desired, these methods provide a layer of conservatism in addition to the other layers. These other layers come from factors such as the conservatism within the sub-population model, the choice of conservative measures of co-ancestry, the consideration of relatives within the population, and the resampling method used for allele probabilities, all of which tend to understate the strength of the findings.
Highlights:
• Two methods for quantifying Monte Carlo variability are tested.
• Both give less than the desired 99% coverage.
• The magnitude of possible discrepancy is small.
• For example, an LR of 4.3 × 10¹¹ could be reported as 1.8 × 10¹².
• An LR of 18 could be reported as 22.
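The factor of 1.7 is consistent with reading the discrepancy of 0.23 as a distance on the log10 scale, which is our assumption here; converting it back to a multiplicative factor gives:

```latex
10^{0.23} \approx 1.70,
\qquad
\text{adjusted lower bound} \approx \frac{\text{current lower bound}}{1.7}
```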
Article
Forensic genetic investigations typically rely on analysis of DNA for attribution purposes. There are times, however, when the amount and/or the quality of the DNA is limited, and thus little or no information can be obtained regarding the source of the sample. An alternative biochemical target that also contains genetic signatures is protein. One class of genetic signatures is protein polymorphisms that are a direct consequence of simple/single/short nucleotide polymorphisms (SNPs) in DNA. However, to interpret protein polymorphisms in a forensic context, certain complexities must be understood and addressed. These complexities include: 1) SNPs can generate 0, 1, or arbitrarily many polymorphisms in a polypeptide; and 2) as an object of expression that is modulated by alleles, genes and interactions with the environment, proteins may be present or absent in a given sample. To address these issues, a novel approach was taken to generate the expected protein alleles in a reference sample based on whole genome (or exome) sequence data and to assess the significance of the evidence using a haplotype-based semi-continuous likelihood algorithm that leverages whole proteome data. Converting the genomic information into proteomic information allows the zero-to-many relationship between SNPs and genetically variant peptides (GVPs) to be abstracted away. When viewed as a haplotype, many GVPs that correspond to the same SNP are equivalent to many SNPs in perfect linkage disequilibrium (LD). As long as the likelihood formulation correctly accounts for LD, the correspondence between the SNP and the proteome can be safely neglected. Tests were performed on simulated samples, including single-source and two-person mixtures, and the power of a classical semi-continuous likelihood was compared with that of one adapted to neglect drop-out. Additionally, summary statistics and a rudimentary set of decision guidelines were introduced to help identify mixtures from protein data.
Article
To overcome the multifactorial complexity associated with the analysis and interpretation of capillary electrophoresis results from forensic mixture samples, probabilistic genotyping methods have been developed and implemented as software, based on either qualitative or quantitative models. The former considers the qualitative information in the electropherograms (detected alleles), whilst the latter also takes into account the associated quantitative information (allele peak heights). Both models then quantify the genetic evidence through the computation of a likelihood ratio (LR), comparing the probabilities of the observations given two alternative and mutually exclusive hypotheses. In this study, the results obtained with the qualitative software LRmix Studio (v.2.1.3) and the quantitative software STRmix™ (v.2.7) and EuroForMix (v.3.4.0) were compared using real casework samples. A set of 156 irreversibly anonymized sample pairs (GeneMapper files), obtained from former cases of the Portuguese Scientific Police Laboratory, Judiciary Police (LPC-PJ), were independently analyzed using each software. Sample pairs were composed of (i) a mixture profile with either two or three estimated contributors and (ii) an associated single-contributor profile. In most cases, information on 21 autosomal short tandem repeat (STR) markers was considered, and the majority of the single-source samples could not be excluded a priori as belonging to a contributor to the paired mixture sample. This inter-software analysis shows the differences between the probative values obtained through different qualitative and quantitative tools for the same input samples. LR values computed in this work by the quantitative tools were generally higher than those obtained with the qualitative tool.
Although the differences between the LR values computed by the two quantitative programs were much smaller, STRmix™-generated LRs were generally higher than those from EuroForMix. As expected, mixtures with three estimated contributors generally showed lower LR values than those with two estimated contributors. Different software products are based on different approaches and mathematical or statistical models, which necessarily result in the computation of different LR values. Forensic experts' understanding of the models and of the differences among the available software is therefore crucial. The better the expert understands the methodology, the better he/she will be able to support and/or explain the results in court or in any other area of scrutiny.
Article
Probabilistic genotyping software based on continuous models is effective for interpreting DNA profiles derived from DNA mixtures and small DNA samples. In this study, we updated our previously developed Kongoh software (to ver. 3.0.1) to interpret DNA profiles typed using the GlobalFilerTM PCR Amplification Kit. Recently, highly sensitive typing systems such as the GlobalFiler system have facilitated the detection of forward, double-back, and minus 2-nt stutters; therefore, we implemented statistical models for these stutters in Kongoh. In addition, we validated the new version of Kongoh using 2–4-person mixtures and DNA profiles with degradation in the GlobalFiler system. The likelihood ratios (LRs) for true contributors and non-contributors were well separated as the information increased (i.e., larger peak height and fewer contributors), and these LRs tended toward neutrality as the information decreased. These trends were observed even in profiles with DNA degradation. The LR values were highly reproducible, and the accuracy of the calculation was also confirmed. Therefore, Kongoh ver. 3.0.1 is useful for interpreting DNA mixtures and degraded DNA samples in the GlobalFiler system.
Article
Slooten described a method of targeting major contributors in mixed DNA profiles and comparing them to individuals on a DNA database. The method worked by taking incrementally more peak information from the profile (based on the peak contribution) and, using a semi-continuous model, calculating likelihood ratios for the comparison to database individuals. We describe the performance of this “top down approach” to profile interpretation within probabilistic genotyping software employing a fully continuous model. We interpret both complex constructed profiles where ground truth is known and casework profiles from non-suspect crimes. The interpretation of constructed four- and five-person mixtures demonstrated good discrimination power between contributors and non-contributors to the mixtures. Not all known contributors linked, and this is expected, particularly for minor contributors of DNA to the profile, or when the DNA from contributors was in relatively equal contributions. This finding was also reported by Slooten for the semi-continuous application of the approach. The maximum observed LR did not exceed the LR obtained from a standard interpretation by more than would be expected from Monte Carlo variation. The interpretation of 91 complex profiles from no-suspect casework demonstrated that approximately 75% of profiles returned a link to someone on a database of known individuals. With an average of 110 no-suspect cases per year falling into this too-complex category, the top-down analysis, if applied to all such profiles, would represent an increase of 83 links per year of investigative information that could be provided to investigators. We primarily treat the results as investigative information and have not yet decided on an LR threshold, or other criteria, for using them evaluatively.
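The projected 83 links per year follows from applying the approximate 75% link rate to the average of 110 too-complex no-suspect cases:

```latex
0.75 \times 110 = 82.5 \approx 83 \ \text{links per year}
```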
Article
Probabilistic methods of DNA profile interpretation are being adopted by forensic laboratories worldwide. One commonality to all probabilistic genotyping software is an assignment of the strength of evidence using the likelihood ratio (LR). The probabilistic genotyping software STRmix™ reports a number of LRs that differ based on the propositions considered and the level within the hierarchy of propositions considered. Within this paper, we describe the different LRs assigned in a STRmix™ software report.
This article is categorized under:
• Forensic Biology > Interpretation of Biological Evidence
• Forensic Biology > Forensic DNA Technologies
Article
Selected profiles typed at the Promega PowerPlex 21 (PP21) loci were examined to determine whether a linear or exponential model best described the relationship between peak height and molecular weight. There were fewer large departures between observed and expected peak heights using the exponential model. The larger differences that were observed were exclusively at the high molecular weight loci. We conclude that the data support the use of an exponential curve to model peak heights versus molecular weight in PP21 profiles. We believe this observation will improve our ability to model expected peak heights for use in DNA interpretation software.
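The comparison described above, a linear versus an exponential model of peak height against molecular weight, can be sketched with ordinary least squares. This is a minimal sketch under stated assumptions: the exponential model is fitted as a straight line on log(peak height), which may differ from how the study fitted its curves, and the data below are synthetic, not PP21 profiles.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse_linear(mw, heights):
    """Sum of squared errors for the linear model height = a + b*mw."""
    a, b = fit_line(mw, heights)
    return sum((h - (a + b * x)) ** 2 for x, h in zip(mw, heights))


def sse_exponential(mw, heights):
    """Sum of squared errors for the exponential model height = exp(a + b*mw).

    The fit is done on log(height) (an assumption), then errors are measured
    back on the original peak-height (rfu) scale so both models compare alike.
    """
    a, b = fit_line(mw, [math.log(h) for h in heights])
    return sum((h - math.exp(a + b * x)) ** 2 for x, h in zip(mw, heights))


# Synthetic peak heights (rfu) that decay exponentially with molecular weight (bp).
mw = [100, 150, 200, 250, 300, 350, 400]
heights = [3000 * math.exp(-0.005 * x) for x in mw]
print(sse_linear(mw, heights) > sse_exponential(mw, heights))  # True
```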
Article
Repetitive sequences in the human genome called Short Tandem Repeats (STRs) are used in human identification for forensic purposes. Interpretation of DNA profiles generated using STRs is often problematic because of uncertainty in the number of contributors to the sample. Existing methods to identify the number of contributors rely on the number of peaks observed and/or allele frequencies. We have developed a computational method called NOCIt that calculates the a posteriori probability (APP) of the number of contributors. NOCIt works on single-source calibration data consisting of known genotypes to compute the APP for an unknown sample. The method takes into account signal peak heights, population allele frequencies, allele dropout and stutter, a commonly occurring PCR artifact. We tested the performance of NOCIt using 278 experimental and 40 simulated DNA mixtures consisting of one to five contributors with total DNA mass from 0.016 to 0.25 ng. NOCIt correctly identified the number of contributors in 86% of the experimental samples and in 73% of the simulated mixtures, while the accuracy of the best pre-existing method to determine the number of contributors was 72% for the experimental samples and 73% for the simulated mixtures. Moreover, NOCIt calculated the APP for the true number of contributors to be at least 1% in 96% of the experimental samples and in all the simulated mixtures.
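The idea of a posterior probability over the number of contributors can be illustrated with Bayes' theorem: each candidate number n is weighted by the likelihood of the evidence given n and by a prior, and the results are normalized. This is a generic sketch, not NOCIt's implementation; NOCIt derives its likelihoods from calibration data and models of peak heights, dropout and stutter, whereas the likelihood values below are hypothetical placeholders and `posterior_noc` is a hypothetical name.

```python
def posterior_noc(likelihoods, priors=None):
    """Posterior probability for each number of contributors (NoC).

    likelihoods: dict mapping n -> P(evidence | n contributors)
    priors: dict mapping n -> P(n); a uniform prior is used if omitted.
    Returns a dict mapping n -> P(n | evidence).
    """
    candidates = sorted(likelihoods)
    if priors is None:
        priors = {n: 1.0 / len(candidates) for n in candidates}
    joint = {n: likelihoods[n] * priors[n] for n in candidates}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every candidate NoC")
    return {n: joint[n] / total for n in candidates}


# Hypothetical likelihoods for one to three contributors.
post = posterior_noc({1: 0.0, 2: 3e-10, 3: 1e-10})
print(max(post, key=post.get))  # 2
```

With a uniform prior the posterior is simply the normalized likelihoods; supplying `priors` lets prior beliefs about typical contributor numbers shift the result.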
Article
The interpretation of mixed profiles from DNA evidentiary material is one of the more challenging duties of the forensic scientist. Traditionally, analysts have used a “binary” approach to interpretation where inferred genotypes are either included or excluded from the mixture using a stochastic threshold and other biological parameters such as heterozygote balance, mixture ratio, and stutter ratios. As the sensitivity of STR multiplexes and capillary electrophoresis instrumentation improved over the past 25 years, coupled with the change in the type of evidence being submitted for analysis (from high quality and quantity (often single-source) stains to low quality and quantity (often mixed) “touch” samples), the complexity of DNA profile interpretation has equally increased. This review provides a historical perspective on the movement from binary methods of interpretation to probabilistic methods of interpretation. We describe the two approaches to probabilistic genotyping (semi-continuous and fully continuous) and address issues such as validation and court acceptance. Areas of future needs for probabilistic software are discussed.
Article
We report a large compilation of the internal validations of the probabilistic genotyping software STRmix™. Thirty-one laboratories contributed data resulting in 2825 mixtures comprising three to six donors and a wide range of multiplexes, equipment, mixture proportions and templates. Previously reported trends in the LR were confirmed, including less discriminatory LRs occurring both for donors and non-donors at low template (for the donor in question) and at high contributor number. We were unable to isolate an effect of allelic sharing. Any apparent effect appears to be largely confounded with increased contributor number.
Article
DNA-based human identity testing is conducted by comparison of PCR-amplified polymorphic Short Tandem Repeat (STR) motifs from a known source with the STR profiles obtained from uncertain sources. Samples such as those found at crime scenes often result in signal that is a composite of incomplete STR profiles from an unknown number of unknown contributors, making interpretation an arduous task. To facilitate advancement in STR interpretation challenges we provide over 25,000 multiplex STR profiles produced from one to five known individuals at target levels ranging from one to 160 copies of DNA. The data, generated under 144 laboratory conditions, are classified by total copy number and contributor proportions. For the 70% of samples that were synthetically compromised, we report the level of DNA damage using quantitative and end-point PCR. In addition, we characterize the complexity of the signal by exploring the number of detected alleles in each profile.
Book
Intended as a companion to the Fundamentals of Forensic DNA Typing volume published in 2009, Advanced Topics in Forensic DNA Typing: Methodology contains 18 chapters with 4 appendices providing up-to-date coverage of essential topics in this important field and citation to more than 2800 articles and internet resources. The book builds upon the previous two editions of John Butler's internationally acclaimed Forensic DNA Typing textbook with forensic DNA analysts as its primary audience. This book provides the most detailed information written to date on DNA databases, low-level DNA, validation, and numerous other topics, including a new chapter on legal aspects of DNA testing to prepare scientists for expert witness testimony. Over half of the content is new compared to previous editions. A forthcoming companion volume will cover interpretation issues.
• Contains the latest information: hot topics and new technologies
• Well edited, attractively laid out, and makes productive use of its four-color format
Article
The interpretation of DNA evidence can entail analysis of challenging STR typing results. Genotypes inferred from low quality or quantity specimens, or mixed DNA samples originating from multiple contributors, can result in weak or inconclusive match probabilities when a binary interpretation method and necessary thresholds (such as a stochastic threshold) are employed. Probabilistic genotyping approaches, such as fully continuous methods that incorporate empirically determined biological parameter models, enable usage of more of the profile information and reduce subjectivity in interpretation. As a result, software-based probabilistic analyses tend to produce more consistent and more informative results regarding potential contributors to DNA evidence. Studies to assess and internally validate the probabilistic genotyping software STRmix™ for casework usage at the Federal Bureau of Investigation Laboratory were conducted using lab-specific parameters and more than 300 single-source and mixed contributor profiles. Simulated forensic specimens, including constructed mixtures that included DNA from two to five donors across a broad range of template amounts and contributor proportions, were used to examine the sensitivity and specificity of the system via more than 60,000 tests comparing hundreds of known contributors and non-contributors to the specimens. Conditioned analyses, concurrent interpretation of amplification replicates, and application of an incorrect contributor number were also performed to further investigate software performance and probe the limitations of the system. In addition, the results from manual and probabilistic interpretation of both prepared and evidentiary mixtures were compared. 
The findings support that STRmix™ is sufficiently robust for implementation in forensic laboratories, offering numerous advantages over historical methods of DNA profile analysis and greater statistical power for the estimation of evidentiary weight, and can be used reliably in human identification testing. With few exceptions, likelihood ratio results reflected intuitively correct estimates of the weight of the genotype possibilities and known contributor genotypes. This comprehensive evaluation provides a model in accordance with SWGDAM recommendations for internal validation of a probabilistic genotyping system for DNA evidence interpretation.
Article
In 2015 the Scientific Working Group on DNA Analysis Methods published the SWGDAM Guidelines for the Validation of Probabilistic Genotyping Systems [1]. STRmix™ is probabilistic genotyping software that employs a continuous model of DNA profile interpretation. This paper describes the developmental validation activities of STRmix™ following the SWGDAM guidelines. It addresses the underlying scientific principles, and the performance of the models with respect to sensitivity, specificity and precision and results of interpretation of casework type samples. This work demonstrates that STRmix™ is suitable for its intended use for the interpretation of single source and mixed DNA profiles.
Article
The sensitivity and resolution of modern DNA profiling hardware is such that forensic laboratories generate more data than they have resources to analyse. One coping mechanism is to set a threshold, above the minimum required by instrument noise, so that weak peaks are screened out. In binary interpretations of forensic profiles, the impact of this threshold (sometimes called an analytical threshold – AT) was minimal as interpretations were often limited to a clear major component. With the introduction of continuous typing systems, the interpretation of weak minor components of mixed DNA profiles is possible and consequently the consideration of peaks just above or just below the analytical threshold becomes relevant. We investigate here the occurrence of low-level DNA profile information, specifically that which falls below the analytical threshold. We investigate how it can be dealt with and the consequences of each choice in the framework of continuous DNA profile interpretation systems. Where appropriate we illustrate how these can be implemented using the probabilistic interpretation software STRmix. We demonstrate a feature of STRmix that allows the analyst to guide the software, using human observation that there is a low-level contributor present, through user-designated prior distributions for contributor mixture proportions.
Article
In forensic DNA analysis a DNA extract is amplified using polymerase chain reaction (PCR), separated using capillary electrophoresis and the resulting DNA products are detected using fluorescence. Sampling variation occurs when the DNA molecules are aliquotted during the PCR setup stage and this translates to variability in peak heights in the resultant electropherogram or between electropherograms generated from a DNA extract. Beyond the variability caused by sampling variation it has been observed that there are factors in generating the DNA profile that can contribute to the magnitude of variability observed, most notably the number of PCR cycles. In this study we investigate a number of factors in the generation of a DNA profile to determine which contribute to levels of peak height variability.
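The effect of sampling variation at the aliquoting step can be illustrated with a simple binomial model. This model is an assumption made for illustration, not the study's analysis: if an extract holds `n_molecules` copies of a locus and a fraction `fraction` of the extract is taken into the PCR, the number of copies transferred is Binomial(n, f). Its coefficient of variation rises as the template falls. The function name is hypothetical.

```python
import math


def aliquot_cv(n_molecules: int, fraction: float) -> float:
    """Coefficient of variation of the number of template copies pipetted
    into a PCR, assuming each molecule is transferred independently with
    probability `fraction` (binomial sampling)."""
    if n_molecules <= 0 or not 0.0 < fraction <= 1.0:
        raise ValueError("need n_molecules > 0 and 0 < fraction <= 1")
    mean = n_molecules * fraction
    variance = n_molecules * fraction * (1.0 - fraction)
    return math.sqrt(variance) / mean


# Relative variability grows as the number of molecules shrinks.
for n in (10_000, 1_000, 100, 20):
    print(n, round(aliquot_cv(n, 0.5), 3))
```

Under this model, differences in peak heights that arise before amplification propagate into the electropherogram. The model does not cover factors introduced later, such as the number of PCR cycles.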
Article
STRs (short tandem repeats) have become popular DNA repeat markers because they are easily amplified by the polymerase chain reaction (PCR) without the problems of differential amplification. This is because both alleles from a heterozygous individual are similar in size, since the repeat size is small. The number of repeats in STR markers can be highly variable among individuals, which makes STRs effective for human identification purposes. STR repeat sequences are named by the length of the repeat unit (for example dinucleotide, trinucleotide or tetranucleotide repeats). STR markers are usually identified in one of two ways: by searching DNA sequence databases such as GenBank for regions with more than six or so contiguous repeat units, or by molecular biology isolation methods. STRs are often divided into several categories based on the repeat pattern. Simple repeats contain units of identical length and sequence; compound repeats comprise two or more adjacent simple repeats; and complex repeats may contain several repeat blocks of variable unit length as well as variable intervening sequences. An STR typing kit consists of five components: a PCR primer mixture containing oligonucleotides designed to amplify a set of STR loci; a PCR buffer containing deoxynucleotide triphosphates, MgCl2 and other reagents necessary to perform PCR; a DNA polymerase, which is sometimes premixed with the PCR buffer; an allelic ladder with common alleles for the STR loci being amplified, to enable calibration of allele repeat size; and a positive control DNA sample to verify that the kit reagents are working properly.
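Repeat units are also the basis of allele nomenclature. Whole repeats give the integer part of a designation, and the bases of any partial repeat follow the decimal point, as in the TH01 9.3 allele. The sketch below assumes the length of the repeat region is already known. In practice, sizes are calibrated against the kit's allelic ladder, as noted above. The function name is hypothetical.

```python
def allele_designation(repeat_region_bp: int, unit_bp: int) -> str:
    """Name an STR allele from the length of its repeat region.

    Whole repeat units give the integer part; leftover bases from a
    partial (incomplete) repeat are written after the decimal point.
    """
    if unit_bp <= 0 or repeat_region_bp < 0:
        raise ValueError("unit length must be positive, region length >= 0")
    full, partial = divmod(repeat_region_bp, unit_bp)
    return f"{full}.{partial}" if partial else str(full)


print(allele_designation(39, 4))  # → 9.3 (9 full repeats + 3 bp)
print(allele_designation(36, 4))  # → 9
```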
Article
In response to requests from the forensic community, commercial companies are generating larger, more sensitive and more discriminating STR multiplexes. These multiplexes are now applied to a wider range of samples, including complex multi-person mixtures. In parallel, there is an overdue reappraisal of profile interpretation methodology. Aspects of this reappraisal include [...]. In this work we present a full scheme for the validation of a new multiplex that is suitable for informing modern interpretation practice. We predominantly use GlobalFiler™ as an example multiplex, but we suggest that the aspects investigated here are fundamental to introducing any multiplex in the modern interpretation environment.
Article
Forward stutter, or over stutter, one repeat unit larger than the parent allele (N+1 stutter), is a relatively rare product of the PCR amplification of short tandem repeats used in forensic DNA analysis. We have investigated possible explanatory variables for the occurrence and size of forward stutter for four different autosomal multiplexes. In addition, we have investigated models used to predict the expected heights of forward stutter. For all tetra- and pentanucleotide repeats we find no correlation between forward stutter and allelic peak height, marker or the longest uninterrupted sequence in the allele. The data fit a gamma distribution with no explanatory variables. For the single trinucleotide repeat present in two of the four multiplexes (D22S1045), forward stutter is much more common, and the best explanatory variable appears to be back stutter height. This suggests some fundamental co-causation of high back and forward stutter at this locus.
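A gamma distribution with no explanatory variables can be fitted in several ways. The sketch below uses the method of moments, which is an illustrative choice and not necessarily the estimator used in the study: since mean = kθ and variance = kθ², the shape is k = mean²/variance and the scale is θ = variance/mean. The function name and the input ratios are hypothetical.

```python
import statistics


def gamma_moments_fit(values: list[float]) -> tuple[float, float]:
    """Method-of-moments estimates (shape k, scale theta) of a gamma
    distribution, using mean = k * theta and variance = k * theta**2."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive observations")
    mean = statistics.mean(values)
    var = statistics.variance(values)  # sample variance (n - 1 denominator)
    return mean * mean / var, var / mean


# Hypothetical forward-stutter ratios (forward stutter height / parent height).
shape, scale = gamma_moments_fit([0.010, 0.004, 0.007, 0.012, 0.006])
```

Maximum likelihood would be the more usual choice for a production model. Method of moments is shown here because it has a closed form.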
Article
DNA profile interpretation has benefitted from recent improvements that use semi-continuous or fully continuous methods to interpret information within an electropherogram. These methods are likelihood ratio based and currently require that a number of contributors be assigned prior to analysis. Often there is ambiguity in the choice of number of contributors, and an analyst is left with the task of determining what they believe to be the most probable number. The choice can be particularly important when the difference between two possible contributor numbers means the difference between excluding a person of interest as a possible contributor and producing a statistic that favours their inclusion. Presenting both options in a court of law places the decision with the court. We demonstrate here an MCMC method of correctly weighting analyses of DNA profile data spanning a range of contributors. We explore the theoretical behaviour of such a weight and demonstrate these theories using practical examples. We also highlight the issues with omitting this weight term from the LR calculation when different numbers of contributors are considered in a single calculation.
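To show why a weight term matters, the sketch below uses a generic law-of-total-probability combination. This is an illustrative assumption and not the specific weighting derived in the paper. The likelihood under each proposition is averaged over candidate numbers of contributors N, with each N weighted by Pr(N). The same weights are used under both propositions, which assumes Pr(N) does not depend on the proposition. The function name and all numbers are hypothetical.

```python
def weighted_lr(lik_hp: dict[int, float], lik_hd: dict[int, float],
                weights: dict[int, float]) -> float:
    """Combine likelihoods computed under several numbers of contributors.

    lik_hp[n] and lik_hd[n] are Pr(E | Hp, N=n) and Pr(E | Hd, N=n);
    weights[n] is Pr(N=n), assumed the same under both propositions.
    Each proposition's likelihood is averaged over N before the ratio is taken.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    numerator = sum(w * lik_hp[n] for n, w in weights.items())
    denominator = sum(w * lik_hd[n] for n, w in weights.items())
    return numerator / denominator


# Hypothetical likelihoods: the per-N LRs are 1e5 (N=2) and 2e3 (N=3).
lr = weighted_lr({2: 1e-10, 3: 4e-11}, {2: 1e-15, 3: 2e-14}, {2: 0.7, 3: 0.3})
```

The combined value falls between the per-N ratios. Simply reporting whichever per-N LR is chosen would hide that dependence.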
Article
A method for interpreting autosomal mixed DNA profiles based on continuous modelling of peak heights is described. MCMC is applied with a model for allelic and stutter heights to produce a probability for the data given a specified genotype combination. The theory extends to handle any number of contributors and replicates, although practical implementation limits analyses to four contributors. The probability of the peak data given a genotype combination has proven to be a highly intuitive probability that may be assessed subjectively by experienced caseworkers. Whilst caseworkers will not assess the probabilities per se, they can broadly judge genotypes that fit the observed data well, and those that fit relatively less well. These probabilities are used when calculating a subsequent likelihood ratio. The method has been trialled on a number of mixed DNA profiles constructed from known contributors. The results have been assessed against a binary approach and also compared with the subjective judgement of an analyst.
Article
Determining the number of contributors to a forensic DNA mixture using maximum allele count is a common practice in many forensic laboratories. In this paper, we compare this method to a maximum likelihood estimator, previously proposed by Egeland et al., that we extend to the cases of multiallelic loci and population subdivision. We compared both methods’ efficiency for identifying mixtures of two to five individuals in the case of uncertainty about the population allele frequencies and partial profiles. The proportion of correctly resolved mixtures was >90% for both estimators for two- and three-person mixtures, while likelihood maximization yielded success rates 2- to 15-fold higher for four- and five-person mixtures. Comparable results were obtained in the cases of uncertain allele frequencies and partial profiles. Our results support the use of the maximum likelihood estimator to report the number of contributors when dealing with complex DNA mixtures.
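The two estimators can be sketched as follows. Maximum allele count (MAC) takes the smallest N with 2N at least the number of distinct alleles. The likelihood approach picks the N that maximises the probability of observing exactly the allele set at each locus. Unlike the paper, which also handles population subdivision, the sketch assumes Hardy-Weinberg proportions, unrelated donors and no dropout or drop-in. It computes that probability by inclusion-exclusion. Function names and allele frequencies are hypothetical.

```python
from itertools import combinations
from math import ceil


def mac_minimum(n_distinct_alleles: int) -> int:
    """Maximum allele count: each contributor carries at most two alleles."""
    return max(1, ceil(n_distinct_alleles / 2))


def prob_exact_allele_set(freqs: list[float], n_contributors: int) -> float:
    """Pr(exactly these alleles are observed | N >= 1 contributors), one locus.

    The 2N alleles are treated as independent draws from the population;
    inclusion-exclusion removes outcomes that miss some observed allele.
    """
    k = len(freqs)
    total = 0.0
    for size in range(1, k + 1):  # the empty subset contributes 0 for N >= 1
        sign = (-1) ** (k - size)
        for subset in combinations(freqs, size):
            total += sign * sum(subset) ** (2 * n_contributors)
    return total


def mle_contributors(profile: list[list[float]], max_n: int = 6) -> int:
    """N maximising the product of single-locus likelihoods, searched from
    the MAC minimum (smaller N cannot explain the allele counts) to max_n."""
    start = max(mac_minimum(len(locus)) for locus in profile)
    if start > max_n:
        raise ValueError("max_n is below the maximum-allele-count minimum")

    def likelihood(n: int) -> float:
        result = 1.0
        for locus in profile:
            result *= prob_exact_allele_set(locus, n)
        return result

    return max(range(start, max_n + 1), key=likelihood)
```

For two alleles and one contributor, the inclusion-exclusion sum reduces to 2pq, the heterozygote frequency. This is a quick check that the formula is wired correctly.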
Article
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single, fixed number of contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoid such complications. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic, using Bayes' theorem, and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued, since quantitative information such as peak area and height data is not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N.
Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.
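The probabilistic procedure amounts to applying Bayes' theorem to N. Write A for the set of observed alleles and assume a prior Pr(N) over a finite range of candidate values. This notation is ours and not necessarily the paper's:

```latex
\Pr(N = n \mid A) = \frac{\Pr(A \mid N = n)\,\Pr(N = n)}{\sum_{m} \Pr(A \mid N = m)\,\Pr(N = m)}
```

Here Pr(A | N = n) is the probability of observing exactly the allele set A from n contributors, and it depends on the allele frequencies. It is zero for any n with 2n smaller than the number of distinct alleles. Reporting the whole distribution, rather than a single categorical value, is what separates this procedure from the deterministic minimum.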
Article
A general method, suitable for fast computing machines, for investigating such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two-dimensional rigid-sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four-term virial coefficient expansion.
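The modified Monte Carlo integration described here is the Metropolis algorithm, the basis of the MCMC used in continuous genotyping. Below is a minimal random-walk Metropolis sampler for a one-dimensional target. It assumes a symmetric Gaussian proposal and an unnormalised log-density supplied by the caller; a toy standard normal is used here. The function name and settings are illustrative.

```python
import math
import random


def metropolis(log_target, x0: float, n_steps: int, step_sd: float,
               seed: int = 1) -> tuple[list[float], float]:
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    Returns the chain and the acceptance rate. Because the proposal is
    symmetric, a move is accepted with probability min(1, pi(x') / pi(x)).
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    chain, accepted = [], 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_sd)
        log_pp = log_target(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_pp - log_px:
            x, log_px = proposal, log_pp
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps


# Toy target: standard normal, unnormalised log-density -x^2 / 2.
chain, rate = metropolis(lambda x: -0.5 * x * x, 0.0, 20_000, 1.0)
post_burn_in = chain[2_000:]  # discard burn-in before summarising
```

Working on the log scale avoids underflow when the target density is very small, which is typical for the probability of multi-locus peak data.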
Article
Stutter is an artefact seen when amplifying short tandem repeats and typically occurs one repeat unit shorter in length than the parent allele. In forensic analysis, stutter complicates the analysis of DNA profiles from multiple contributors, known as mixed profiles, a common profile type. Consequently, it is important both to understand and to predict stutter behaviour in order to improve the resolution and interpretation of these profiles. Whilst stutter is well recognised and documented, little information is available that identifies and quantifies what influences the formation of stutter. In this work we use a novel approach to examine this. We have used synthetic oligonucleotides comprising multiple repeat units to test the influence of repeat number, the influence of repeat sequence and the impact of interruptions to the repeat sequence. Using multiple replicates allows detailed statistical analysis. We have confirmed a linear relationship between stutter ratio and repeat number. We have shown that increased A-T content increases stutter ratio and that interruptions in repeating sequences decrease stutter ratios to levels similar to those of the longest uninterrupted repeat stretch. We also found that there was no relationship between stutter ratio and repeat number for a repeat unit with an A-T content of 1/4, and that half of the interrupted repeat sequences stuttered significantly less than their longest uninterrupted repeat stretches. We have applied the knowledge gained to examine specific features of the loci present in the AmpFlSTR(®) SGM Plus(®) multiplex kit used in our laboratory.
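A linear relationship between stutter ratio and repeat number is typically summarised by an ordinary least-squares line. The sketch below fits one. The input points are synthetic values placed exactly on a line so that the fit can be checked; they are not data from the study. The function name is hypothetical.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic points: stutter ratio = 0.01 * repeats - 0.02 (illustrative only).
repeats = [8, 10, 12, 14, 16]
ratios = [0.01 * r - 0.02 for r in repeats]
slope, intercept = fit_line(repeats, ratios)
```

Because the study found no such relationship for one repeat unit (A-T content of 1/4), a separate fit per repeat motif would be needed, rather than one line for all loci.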
Article
The interpretation of mixed DNA profiles presents additional challenges for the forensic scientist. There has been a broad based call for transparency in the process of interpretation of all evidence including mixed DNA profiles. This interpretation is greatly facilitated by a sound understanding of the variability in peak heights for the two peaks of a heterozygote, in the sizes of stutter peaks and in the variability in peak heights across loci. This study examines single source and mixed DNA profiles to assess this variability. The relative variability in peak height between the two peaks of a heterozygote and in the peak heights across loci becomes greater as the peaks themselves become smaller. This is consistent with findings from other multiplexes. This variability appears larger in the MiniFiler™ system at 30 cycles than, for example, in the Identifiler™ system at 28 cycles and this difference is largely explained by the two extra cycles of amplification. Stutter peaks appear no larger in the MiniFiler™ system at 30 cycles than in the Identifiler™ system at 28 cycles.
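Variability summaries of this kind start from per-locus quantities such as heterozygote balance (Hb). Definitions differ between laboratories: some use smaller over larger peak height, others use the higher-molecular-weight allele over the lower. The sketch below assumes the smaller-over-larger convention. The function name and peak heights (RFU) are hypothetical.

```python
def heterozygote_balance(height_a: float, height_b: float) -> float:
    """Hb as smaller / larger peak height, so 0 < Hb <= 1."""
    if height_a <= 0 or height_b <= 0:
        raise ValueError("peak heights must be positive")
    return min(height_a, height_b) / max(height_a, height_b)


print(heterozygote_balance(2000, 1900))  # → 0.95
print(heterozygote_balance(200, 100))    # → 0.5
```

Hb values are usually plotted against average peak height for the locus, which makes the greater spread at small peak heights visible.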
Article
The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.
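Running several independent chains leads to the potential scale reduction factor, often written R-hat. It compares between-chain and within-chain variance, and values close to 1 suggest the chains have mixed. The sketch implements the basic, non-split form for one scalar estimand. The function name is ours, and the degrees-of-freedom correction and later refinements (split chains, rank normalisation) are omitted.

```python
import statistics


def gelman_rubin(chains: list[list[float]]) -> float:
    """Potential scale reduction factor (basic, non-split form).

    chains: m >= 2 chains of one scalar estimand, all of length n >= 2.
    """
    if len(chains) < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must share a length of at least 2")
    means = [statistics.mean(c) for c in chains]
    between = n * statistics.variance(means)                            # B
    within = statistics.mean([statistics.variance(c) for c in chains])  # W
    if within == 0:
        raise ValueError("chains have zero within-chain variance")
    pooled = (n - 1) / n * within + between / n  # pooled posterior variance
    return (pooled / within) ** 0.5
```

Chains that started from overdispersed points but still sit in different regions give a large between-chain term B, so R-hat stays well above 1 until they overlap.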
Article
A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed.
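The generalization permits asymmetric proposals, provided the acceptance ratio includes the proposal densities: min(1, π(x′)q(x | x′) / (π(x)q(x′ | x))). A common asymmetric choice for a positive parameter, such as a template amount, is a multiplicative log-normal step, x′ = x·exp(σZ), for which the correction reduces to x′/x. The sketch assumes that proposal and a toy Gamma(shape 3, rate 1) target. The function name and settings are illustrative.

```python
import math
import random


def mh_positive(log_target, x0: float, n_steps: int, step_sd: float,
                seed: int = 7) -> list[float]:
    """Metropolis-Hastings for a strictly positive parameter, using the
    asymmetric multiplicative proposal x' = x * exp(step_sd * Z)."""
    if x0 <= 0:
        raise ValueError("x0 must be positive")
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        proposal = x * math.exp(rng.gauss(0.0, step_sd))
        # Hastings correction for this proposal: q(x | x') / q(x' | x) = x' / x.
        log_ratio = (log_target(proposal) - log_target(x)
                     + math.log(proposal) - math.log(x))
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        chain.append(x)
    return chain


# Toy target: Gamma(shape=3, rate=1), unnormalised log-density 2*ln(x) - x.
samples = mh_positive(lambda x: 2.0 * math.log(x) - x, 1.0, 40_000, 0.5)
kept_samples = samples[4_000:]  # discard burn-in
```

Without the x′/x term the sampler would be biased towards small values, because the multiplicative step proposes moves downwards and upwards with unequal densities.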
Article
PCR amplification of tetrameric short tandem repeats (STRs) can lead to Taq enzyme slippage and artefact products typically one repeat unit less in size than the parent STR. These back stutter or n-4 amplification products are low-level relative to the amplification of the parent STR but are widely seen in the forensic community where tetrameric STRs are employed in the generation of DNA profiles. To aid the interpretation of DNA mixtures where minor contributor(s) might be present in comparable amounts to the back stutter products, the typical amounts of back stutter generated have been well characterised and guidelines for interpretation are in place. However, further artefacts thought to be Taq enzyme slippage leading to products with one repeat unit greater than the parent sequence (n+4 or forward stutter) or two repeats less (n-8 or double back stutter) also occur, but these have not been well characterised despite their potential influence in mixture interpretations. Here we present findings with respect to these additional artefacts from a study of 10,000 alleles and include guidelines for interpretation.
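The artefacts described here are defined by their position relative to the parent allele, and they are usually quantified as a stutter ratio: stutter peak height divided by parent peak height. The sketch below labels a peak by its offset in repeat units (n-1 corresponds to n-4 bp for a tetranucleotide) and computes the ratio. The labels and function names are hypothetical.

```python
from typing import Optional


def stutter_type(parent_allele: float, peak_allele: float) -> Optional[str]:
    """Label a peak by its offset from the parent allele, in repeat units.

    Rounding guards against floating-point error with microvariant
    designations such as 9.3 -> 8.3.
    """
    offset = round(peak_allele - parent_allele, 1)
    labels = {-1.0: "back stutter (n-1)",
              -2.0: "double back stutter (n-2)",
              1.0: "forward stutter (n+1)"}
    return labels.get(offset)


def stutter_ratio(stutter_height: float, parent_height: float) -> float:
    """Stutter peak height divided by parent allele peak height."""
    if parent_height <= 0:
        raise ValueError("parent peak height must be positive")
    return stutter_height / parent_height
```

A peak in a stutter position may also be a true allele from a minor contributor, so a label like this only flags where stutter could contribute. It does not decide the peak's origin.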
Article
Tandemly reiterated sequences represent a rich source of highly polymorphic markers for genetic linkage, mapping, and personal identification. Human trimeric and tetrameric short tandem repeats (STRs) were studied for informativeness, frequency, distribution, and suitability for DNA typing and genetic mapping. The STRs were highly polymorphic and inherited stably. An STR-based multiplex PCR for personal identification is described. It features fluorescent detection of amplified products on sequencing gels, specific allele identification, simultaneous detection of independent loci, and internal size standards. Variation in allele frequencies was explored for four U.S. populations. The three STR loci (chromosomes 4, 11, and X) used in the fluorescent multiplex PCR have a combined average individualization potential of 1/500 individuals. STR loci appear common, being found every 300-500 kb on the X chromosome. The combined frequency of polymorphic trimeric and tetrameric STRs could be as high as 1 locus/20 kb. The markers should be useful for genetic mapping, as they are sequence based and can be multiplexed with the PCR. A method enabling rapid localization of STRs and determination of their flanking DNA sequences was developed, thus simplifying the identification of polymorphic STR loci. The ease with which STRs may be identified, as well as their genetic and physical mapping utility, gives them the properties of useful sequence tagged sites (STSs) for the human genome initiative.
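One standard way to obtain a combined figure for a multiplex is to multiply per-locus match probabilities across independent loci. This sketch is not necessarily how the paper derived its 1/500 value. Assuming Hardy-Weinberg proportions, a locus's match probability is the sum of squared genotype frequencies, using p² for homozygotes and 2pq for heterozygotes. Function names and frequencies are hypothetical.

```python
from itertools import combinations
from math import prod


def locus_match_probability(freqs: list[float]) -> float:
    """Sum of squared genotype frequencies at one locus under Hardy-Weinberg
    proportions (p^2 for homozygotes, 2pq for heterozygotes)."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygotes = sum((p * p) ** 2 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def combined_match_probability(loci: list[list[float]]) -> float:
    """Product over loci, assuming the loci are independent."""
    return prod(locus_match_probability(f) for f in loci)
```

Each added independent locus multiplies in another factor below 1. This is why larger multiplexes are far more discriminating than a three-locus system.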
Article
Dinucleotide repeat polymorphisms (‘microsatellites’) are usually typed by resolving the products of PCR amplification on denaturing acrylamide gels. With this methodology, an allele consists not of a single fragment, but rather of a ladder of fragments, typically separated by intervals of 2nt. Mechanisms that have been invoked to explain the generation of these ‘shadow bands’ include slipped strand mispairing occurring during the PCR and artefactual ‘recombination’ caused by out-of-register annealing of truncated PCR products. The D11S527 locus contains the microsatellite sequence (GT)n(CTGT)m. By performing direct sequencing of PCR products derived from individuals homozygous at D11S527, we show that these products vary in length due solely to variations in the length of the dinucleotide repeat tract. These results rule out PCR recombination and support slipped strand mispairing as the major mechanism for generation of shadow bands.
Article
The PCR amplification of tetranucleotide short tandem repeat (STR) loci typically produces a minor product band 4 bp shorter than the corresponding main allele band; this is referred to as the stutter band. Sequence analysis of the main and stutter bands for two sample alleles of the STR locus vWA reveals that the stutter band lacks one repeat unit relative to the main allele. Sequencing results also indicate that the number and location of the different 4 bp repeat units vary between samples containing a typical versus a low proportion of stutter product. The results also suggest that the proportion of stutter product relative to the main allele increases as the number of uninterrupted core repeat units increases. The sequence analysis and results obtained using various DNA polymerases appear to support the slipped strand displacement model as a potential explanation for how these stutter products are generated.
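The number of uninterrupted core repeat units, often called the longest uninterrupted stretch (LUS), can be read directly from an allele's sequence. The sketch assumes the repeat motif is known, for example a tetranucleotide such as TCTA, and that the sequence is given on the strand that carries it. It checks every reading frame so the repeat region need not start at position 0. The function name is hypothetical.

```python
def longest_uninterrupted_stretch(sequence: str, motif: str) -> int:
    """Length, in repeat units, of the longest run of back-to-back copies
    of `motif` anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, unit = sequence.upper(), motif.upper()
    size, best = len(unit), 0
    for start in range(size):  # try every reading frame
        run = 0
        for i in range(start, len(seq) - size + 1, size):
            if seq[i:i + size] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption (e.g. TCTG) resets the run
    return best


allele = "TCTA" * 3 + "TCTG" + "TCTA" * 5
print(longest_uninterrupted_stretch(allele, "TCTA"))  # → 5
```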
Article
Several years ago, a theory to interpret mixed DNA profiles was proposed that included a consideration of peak area using the method of least squares. This method of mixture interpretation has not been widely adopted because of the complexity of the associated calculations. Most reporting officers (RO) employ an experience and judgement based approach to the interpretation of mixed DNA profiles. Here we present an approach that formalises the thinking behind this experience and judgement. This has been written into a computer program package called PENDULUM. The program uses a least squares method to estimate the pre-amplification mixture proportion for two potential contributors. It then calculates the heterozygous balance for all of the potential sets of genotypes. A list of "possible" genotypes is generated using a set of heuristic rules. Outside the program, the candidate genotypes may then be used to formulate likelihood ratios (LR) that are based on alternative casework propositions. The system does not represent a black box approach; rather, it has been integrated into the method currently used by the reporting officers at the Forensic Science Service (FSS). The time saved in automating routine calculations associated with mixture analysis is significant. In addition, the computer program assists in unifying reporting processes, thereby improving the consistency of reporting.
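The least-squares step can be sketched for one locus and one candidate pair of genotypes. The sketch simplifies rather than reproduces PENDULUM. It assumes peak height is proportional to template and ignores stutter. Each allele copy from contributor 1 is taken to contribute Mx/2 of the locus total, and each copy from contributor 2 contributes (1 - Mx)/2. Minimising the squared difference between observed and expected height fractions is then linear in Mx, which gives a closed-form estimate. Names and heights are hypothetical.

```python
def estimate_mx(heights: dict[str, float], g1: tuple[str, str],
                g2: tuple[str, str]) -> float:
    """Least-squares mixture proportion of contributor 1 at one locus.

    heights maps each observed allele to its peak height; g1 and g2 are the
    candidate genotypes of contributors 1 and 2. The expected height fraction
    of allele a is Mx * n1(a) / 2 + (1 - Mx) * n2(a) / 2, where n1 and n2
    count copies of a in g1 and g2.
    """
    if not set(g1) | set(g2) <= set(heights):
        raise ValueError("every genotype allele must have an observed peak")
    total = sum(heights.values())
    num = den = 0.0
    for allele, height in heights.items():
        u = (g1.count(allele) - g2.count(allele)) / 2  # coefficient of Mx
        v = g2.count(allele) / 2                       # expected fraction at Mx = 0
        num += u * (height / total - v)
        den += u * u
    if den == 0:
        raise ValueError("identical genotypes: Mx is not identifiable")
    return num / den


# Hypothetical four-allele locus (RFU) and one candidate genotype pair.
mx = estimate_mx({"12": 800, "14": 700, "15": 300, "17": 200},
                 ("12", "14"), ("15", "17"))
```

Repeating the fit for every candidate genotype pair and comparing residuals or heterozygote balances is one way to rank combinations. The heuristic rules used by the program itself are not reproduced here.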
Article
Y chromosome-specific short tandem repeat (Y-STR) analysis has become another widely accepted tool for human identification. The PowerPlex Y System is a fluorescent multiplex that includes the 12 loci DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439. This panel of markers incorporates the 9-locus European minimal haplotype (EMH) loci recommended by the International Y-STR User Group and the 11-locus set recommended by the Scientific Working Group on DNA Analysis Methods (SWGDAM). Described here are inter-laboratory results from 17 developmental validation studies of the PowerPlex Y System, including the following: (a) samples distributed between laboratories and commercial standards produced the expected and reproducible haplotypes; (b) the use of common amplification and detection instruments was successfully demonstrated; (c) full profiles were obtained with the standard 30- and 32-cycle amplification protocols, and the cycle number (24-28 cycles) could be modified to match different substrates (such as direct amplification of FTA paper); (d) complete profiles were observed with reaction volumes from 6.25 to 50 µL; (e) minimal impact was observed with variation of enzyme concentration; (f) full haplotypes were observed with 0.5-2× primer concentrations, although relative yield between loci varied with concentration; (g) reducing magnesium to 1 mM (1.5 mM standard) resulted in minimal amplification, while only partial loss of yield was observed with 1.25 mM magnesium; (h) decreasing the annealing temperature by 2-4 °C did not generate artifacts or locus dropout, and most laboratories observed full amplification with the annealing temperature increased by 2 °C and significant locus dropout with a 4 °C increase; (i) amplification of individual loci with the primers used in the multiplex produced the same alleles as observed with the multiplex amplification; (j) all laboratories observed full amplification with ≥125 pg of male template, with partial and/or complete profiles observed using 30-62.5 pg of DNA; (k) analysis of ≤500 ng of female DNA did not yield amplification products; (l) the minor male component of a male/female mixture was observed with ≤1200-fold excess female DNA, with the majority of alleles still observed with 10,000-fold excess female DNA; (m) male/male mixtures produced full profiles from the minor contributor with 10-20-fold excess of the major contributor; (n) average stutter for each locus and (o) sizing precision were determined; (p) human-specificity studies showed amplification products only with some primate samples; and (q) reanalysis of 102 non-probative casework samples from 65 cases produced results consistent with the original findings, and in some instances a minor male contributor to a male/female mixture was additionally identified. In general, the PowerPlex Y System was shown to have the sensitivity, specificity and reliability required for forensic DNA analysis.
  • Buckleton J. S.