Article

Assessment of the stochastic threshold, back- and forward stutter filters and low template techniques for NGM

Abstract

The AmpFlSTR® NGM™ kit shows an increased sensitivity compared to previous AmpFlSTR® kits, and the addition of a 29th PCR cycle was found to be the major cause for this. During in-house validation, we evaluated whether the increased sensitivity requires elevation of the stochastic threshold (below which alleles are prone to drop out due to low template amplification effects). To determine the stochastic threshold, over 500 false homozygotes were examined and the threshold was set at the rfu value where 99% of the alleles had a peak height below this value. Using 2085 Dutch reference samples, locus-specific stutter ratios were empirically determined and compared with the ones provided by Applied Biosystems. Application of sharp stutter filters is especially important for the analysis of unequal mixtures. To prevent allele calling of 99% of the -1 repeat unit stutters, thirteen stutter ratio filters could be lowered by up to 1.79% and for two loci the stutter ratio filters had to be elevated slightly with a maximum of 0.06%. At all loci +1 repeat stutters were visible for the higher DNA inputs and for lower inputs at the tri-nucleotide repeat locus D22S1045 as well. The overall +1 stutter ratio filter was set to 2.50% and for D22S1045 it was determined to be 7.27%. To find the optimal strategy to sensitise genotyping for low template DNA samples, a comparison was made between enhancing the capillary electrophoresis settings (9 kV for 10 s) and increasing the number of PCR cycles (29+5 cycles).
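
The 99% rule described in the abstract can be illustrated with a small percentile calculation. The following is only a sketch: the nearest-rank percentile definition, the function name and all numbers are assumptions made for illustration, not the authors' procedure or data.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the smallest observed value with at least pct% of observations at or below it.

    Nearest-rank definition; the abstract does not say which percentile
    definition was used, so this choice is an assumption.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical peak heights (rfu) of the surviving allele at false homozygous loci.
false_homozygote_heights = [52, 60, 75, 81, 90, 104, 118, 133, 150, 171]
stochastic_threshold = percentile_nearest_rank(false_homozygote_heights, 99)

# Hypothetical -1 stutter ratios (%) observed at one locus; the same rule gives
# a filter that keeps 99% of the observed stutters from being called as alleles.
stutter_ratios = [4.1, 5.3, 5.9, 6.4, 6.8, 7.2, 7.9, 8.5]
stutter_filter = percentile_nearest_rank(stutter_ratios, 99)
```

With only a handful of observations the 99th percentile is simply the largest value; the rule becomes informative with sample sizes like those reported in the abstract (over 500 false homozygotes, 2085 reference samples).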

... Profile analysis was performed using Genemapper ID-X version 1.1.1 (Life Technologies), a detection threshold of 50 relative fluorescence units (rfu) and locus-specific stutter ratio thresholds as described in Ref. [40]. Samples resulting in incomplete profiles with less than eight detected alleles were subjected to +5-cycle NGM PCR amplification [40]. ...
... (Life Technologies), a detection threshold of 50 relative fluorescence units (rfu) and locus-specific stutter ratio thresholds as described in Ref. [40]. Samples resulting in incomplete profiles with less than eight detected alleles were subjected to +5-cycle NGM PCR amplification [40]. Amplification products were separated and analysed as described above using 1.5× stutter filter thresholds [40]. ...
... Samples resulting in incomplete profiles with less than eight detected alleles were subjected to +5-cycle NGM PCR amplification [40]. Amplification products were separated and analysed as described above using 1.5× stutter filter thresholds [40]. Samples resulting in profiles with less than eight detected alleles in the 29 + 5-cycle STR profiles were subjected to mtDNA analysis using a SNaPshot assay [37]. ...
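
The tiered workflow in the excerpts above (standard profile, then +5 cycles, then mtDNA) can be sketched as a small decision function. This is a hedged illustration: the function name, the returned labels and the "keep profile" outcomes are assumptions; only the eight-allele cut-off, the +5-cycle step, the 1.5× stutter filters and the mtDNA fallback come from the quoted text.

```python
MIN_DETECTED_ALLELES = 8  # profiles with fewer detected alleles count as incomplete


def next_analysis_step(alleles_29_cycles, alleles_29_plus_5_cycles=None):
    """Suggest the next step given the number of alleles detected so far.

    alleles_29_plus_5_cycles is None when the +5-cycle run has not been done yet.
    """
    if alleles_29_cycles >= MIN_DETECTED_ALLELES:
        return "keep 29-cycle STR profile"
    if alleles_29_plus_5_cycles is None:
        return "run +5-cycle NGM PCR and analyse with 1.5x stutter filter thresholds"
    if alleles_29_plus_5_cycles >= MIN_DETECTED_ALLELES:
        return "keep 29+5-cycle STR profile"
    return "mtDNA analysis (SNaPshot assay)"
```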
Article
Full-text available
When postmortem intervals (PMIs) increase, such as with longer burial times, human remains suffer increasingly from the taphonomic effects of decomposition processes such as autolysis and putrefaction. In this study, various DNA analysis techniques and a messenger RNA (mRNA) profiling method were applied to examine trends in nucleic acid degradation and the postmortem interval. The DNA analysis techniques include highly sensitive DNA quantitation (with and without degradation index), standard and low template STR profiling, insertion and null alleles (INNUL) of retrotransposable elements typing and mitochondrial DNA profiling. The mRNA profiling system used targets genes with tissue-specific expression for seven human organs as reported by Lindenbergh et al. (Int J Legal Med 127:891-900, 27) and has been applied to forensic evidentiary traces but not to excavated tissues. The techniques were applied to a total of 81 brain, lung, liver, skeletal muscle, heart, kidney and skin samples obtained from 19 excavated graves with burial times ranging from 4 to 42 years. Results show that brain and heart are the organs in which both DNA and RNA remain remarkably stable, notwithstanding long PMIs. The other organ tissues either show poor overall profiling results or vary for DNA and RNA profiling success, with sometimes DNA and other times RNA profiling being more successful. No straightforward relations were observed between nucleic acid profiling results and the PMI. This study shows that not only DNA but also RNA molecules can be remarkably stable and used for profiling of long-buried human remains, which corroborates forensic applications. The insight that the brain and heart tissues tend to provide the best profiling results may change sampling policies in identification cases of degrading cadavers.
... Usually allelic peaks are included in the analysis based on a peak height range [10,13] in order to exclude saturated signals (which overestimates the stutter ratio). There are several ways to handle stutters that share position 10 : (1) exclude the peak to avoid additive effects, (2) include the peak [14], or (3) assign the peak to the backward stutter [13]. ...
... The results (plotted in Supplemental Figures 11 and 12) were in agreement with the internal validation study using pristine samples. Similar to previous reports [10,13,15,16] the general trend was to observe more stutter with increased number of repeats. ...
... To adopt stutters into MPS probabilistic models, it is necessary to characterize their behavior. Extensive analyses have been reported for CE forensic data, and variability has been explained mostly by (1) number of nucleotides of the core repetition unit (or motif), (2) AT content and (3) the length of the uninterrupted stretch (LUS) [13,22-24]. ...
... For this stutter type, proportions could be modelled as two separate regressions: (1) PTUS with lengths of 7-9 repeats and (2) PTUS with lengths of 10-13 repeats of the proportions (Supplementary Fig. S4 plot A). This is consistent with the study in [24]. (1), have been observed. ...
Article
Full-text available
Interpretation of DNA evidence involving mixtures is challenging when alleles from minor contributors coincide with stutters from major contributors. To accommodate this, it is important to have a good understanding of stutter sequence formation trends. Here, multiple stutter types were characterized based on MPS data from 387 single source samples, using the Verogen ForenSeq™ DNA Signature Prep kit. A beta regression model was used to investigate the relationship between the stutter proportion and candidate explanatory variables. In the final model, stutter proportions were explained by the length of the parental uninterrupted stretch (PTUS), which is comparable to block length of the missing motif (BLMM). Also, different stutter types (n+1, n-1, n+2, n-2, n0) were analyzed separately per locus. The fitted stutter models were then integrated into an extended probabilistic genotyping model based on EuroForMix (MPSproto). An illustrative minor/major mock mixture example is discussed. Evaluation of multiple types of stutters on a per locus basis improved the probabilistic genotyping result compared to the conventional EuroForMix model, using the LUS+ nomenclature.
... Participating laboratories provided a table with genotyping results and the raw data sample files (FSA files) with the printouts of the obtained electropherograms. Heterozygous peak height ratio (PHR) was calculated for a given locus by dividing the peak height of an allele with a lower relative fluorescence units (RFU) value by the peak height of an allele with a higher RFU value in a heterozygous pair, and then multiplying this value by 100 to express the PHR as a percentage [24]. For the PHR, the mean, standard deviation (SD), median, minimum and maximum were calculated. ...
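
The peak height ratio calculation described above is short enough to write out. The sketch below follows the quoted definition (lower peak divided by higher peak, times 100); the function name and the example peak heights are hypothetical.

```python
import statistics


def peak_height_ratio(height_a, height_b):
    """Heterozygous peak height ratio in percent: lower peak / higher peak * 100."""
    if height_a <= 0 or height_b <= 0:
        raise ValueError("peak heights must be positive")
    return min(height_a, height_b) / max(height_a, height_b) * 100


# Hypothetical heterozygous allele pairs (peak heights in RFU).
allele_pairs = [(1200, 1000), (800, 950), (640, 400)]
phr_values = [peak_height_ratio(a, b) for a, b in allele_pairs]

# Summary statistics of the kind listed in the excerpt.
summary = {
    "mean": statistics.mean(phr_values),
    "sd": statistics.stdev(phr_values),
    "median": statistics.median(phr_values),
    "min": min(phr_values),
    "max": max(phr_values),
}
```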
... To identify if a peak was a true allele or a stutter, we applied stutter ratio filters for different STR loci. Peaks below those filters were considered stutters [24]. Results from positive control, sample 1 and sample 2, each in triplicate, were utilized to determine the −1 and +1 stutter ratio threshold percentages. ...
Article
Full-text available
Cannabis sativa is the most used controlled substance in Europe. With the advent of new and less restrictive European laws on cannabis sale for recreational use (including in Italy), an increase in indoor cannabis crops was observed. This increase was possible due to the availability of cannabis seeds through the internet market. Genetic identification of cannabis can link seizures and, in possession cases, might aid an investigation. A 13-locus multiplex STR method was previously developed and validated by Houston et al. A collaborative exercise was organized by the Italian Forensic Geneticists – International Society of Forensic Genetics (Ge.F.I. – ISFG) Working Group with the aim to test the reproducibility, reliability and robustness of this multiplex cannabis STR kit. Twenty-one laboratories from three European countries participated in the collaborative exercise and were asked to perform STR typing of two cannabis samples. Cannabis DNA samples and the multiplex STR kit were provided by the University of Barcelona and Sam Houston State University. Different platforms for PCR amplification, capillary electrophoresis (CE) and genotyping software were selected at the discretion of the participating laboratories. Although the participating laboratories used different PCR equipment, CE platforms and genotyping software, concordant results were obtained from the majority of the samples. The overall genotyping success ratio was 96%. Only minor artifacts were observed. The mean peak height ratio was estimated to be 76.3% and 78.1% for sample 1 and sample 2, respectively. The lowest -1/+1 stutter percentage, produced when the height of the parent allele was higher than 8000 RFU, was less than 10% of the parent allele height. A few common issues were observed, such as a minor peak imbalance in some heterozygous loci, some artifact peaks and a few instances of allelic drop-out.
The results of this collaborative exercise demonstrated the robustness and applicability of the 13-locus system for cannabis DNA profiling for forensic purposes.
... DNA recovered at a crime scene is often found in low quantity and may be highly degraded. This affects the efficiency of the PCR and may lead to partial profiles with frequent locus and allele drop-outs, and in some situations also allele drop-ins, due to stochastic effects [1-13]. We will only consider drop-outs. ...
... S.B. Vilsen et al., Forensic Science International: Genetics 37 (2018) 6-12 ...
Article
We used a Poisson-gamma model to analyse the allele coverage of autosomal short tandem repeat (STR) systems obtained by massively parallel sequencing (MPS). The Poisson-gamma coverage model was created using the peak height models from capillary electrophoresis (CE) based detection of PCR products as a starting point. The CE models were modified to account for the differences between CE and MPS signals by accounting for the large marker imbalances seen for MPS data and by using the Poisson-gamma distribution instead of the normal, log-normal, or gamma distributions that were applied for CE data. We took two approaches to estimate the marker imbalance parameters by (1) using a work-flow data base, and (2) using the results of replicate investigations of the samples. The Poisson-gamma model was used to estimate the rate of drop-outs of (1) single contributor dilution series experiments and (2) the minor contributor in two-person mixture samples. We examined the predictive capabilities of the model by comparing the observed and expected Brier scores of each sample. We derived the expected Brier scores and their variances to create asymptotic confidence intervals of the Brier scores. We found that the Poisson-gamma model performed well when using the work-flow data base, but that the replicate approach is not necessarily a viable option.
... Although the high degree of variation at STR-loci [1] provides useful discriminatory power for forensic and paternity cases, STRs are not the ideal marker type when degraded or mixed samples are involved. The interpretation of samples that have multiple contributors (and especially those with unequal contributions) can be complicated by the effects of slippage of DNA polymerases at the repeat stretches, resulting in stutter peaks that reside foremost at the n-1 position (representing products of one repeat unit less than the original allele length) [2]. Also, STR fragments with higher repeat numbers can be too long to allow amplification in severely degraded DNA samples [3]. ...
... Although many of the initial candidate loci were rejected, a final set of 16 MHs remained with expected inheritance of the haplotypes in the tested families and a high degree of variation in the population samples. With a varying number of haplotypes for each MH (2-19) and corresponding haplotype frequencies, the discriminating power is not as strong as STRs but the set of 16 loci still reaches strong random match probabilities (RMP) of: 1.0 × 10^−9 in the Asian population, 4.4 × 10^−11 in the Dutch population and 9.2 × 10^−13 in the African population. For identification purposes, our set of loci prove to be more informative than other alternative non-STR loci as can be observed from Table 3. ...
Article
Full-text available
For two decades, short tandem repeats (STRs) have been the preferred markers for human identification, routinely analysed by fragment length analysis. Here we present a novel set of short hypervariable autosomal microhaplotypes (MH) that have four or more SNPs in a span of less than 70 nucleotides (nt). These MHs display a discriminating power approaching that of STRs and provide a powerful alternative for the analysis of forensic samples that are problematic when the STR fragment size range exceeds the integrity range of severely degraded DNA or when multiple donors contribute to an evidentiary stain and STR stutter artefacts complicate profile interpretation. MH typing was developed using the power of massively parallel sequencing (MPS) enabling new powerful, fast and efficient SNP-based approaches. MH candidates were obtained from queries in data of the 1000 Genomes, and Genome of the Netherlands (GoNL) projects. Wet-lab analysis of 276 globally dispersed samples and 97 samples of nine large CEPH families assisted locus selection and corroboration of informative value. We infer that MHs represent an alternative marker type with good discriminating power per locus (allowing the use of a limited number of loci), small amplicon sizes and absence of stutter artefacts that can be especially helpful when unbalanced mixed samples are submitted for human identification.
... In any case, it is always advisable to review the literature for any updates on the matter (e.g. Westen et al., 2012). One example is the more or less elaborate method proposed by Gill and colleagues (2009) for calculating the stochastic threshold (or, as they call it, the "low-template-DNA threshold"), which consists of assessing the probability of drop-out by plotting it against the height of the surviving allele. ...
... The stutter ratio calculation, as widely described in the literature (Moretti et al., 2001b; Leclair, 2004; Butler, 2005; Coble, 2010a; Westen et al., 2012), should consist of dividing the RFU value of the peak at position -4 or +4 (for marker D22S1045, positions -3 and +3) by that of the allelic peak it accompanies, and expressing the result as a percentage. Stutter values should be derived from homozygotes and from heterozygotes whose alleles are separated by more than two repeat units (Moretti et al., 2001b), thereby avoiding additive effects. ...
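
The stutter ratio rule in the excerpt above (stutter peak height divided by parent allele height, as a percentage, with stutter positions at ±4 nt or ±3 nt for D22S1045) can be sketched as follows. All function and constant names are hypothetical, and treating every locus other than D22S1045 as a 4-nt repeat is a simplifying assumption taken from the excerpt's wording.

```python
# Repeat unit length in nucleotides; only D22S1045 is named in the excerpt.
REPEAT_UNIT_NT = {"D22S1045": 3}
DEFAULT_REPEAT_UNIT_NT = 4  # assumption: all other loci treated as tetranucleotide


def stutter_offsets_nt(locus):
    """Return the (backward, forward) stutter positions relative to the parent allele, in nt."""
    unit = REPEAT_UNIT_NT.get(locus, DEFAULT_REPEAT_UNIT_NT)
    return (-unit, unit)


def stutter_ratio_percent(stutter_height_rfu, parent_height_rfu):
    """Stutter peak height divided by the parent allele peak height, in percent."""
    if parent_height_rfu <= 0:
        raise ValueError("parent peak height must be positive")
    return stutter_height_rfu / parent_height_rfu * 100


def usable_for_stutter_estimation(allele_a, allele_b):
    """True for homozygotes, or heterozygotes more than two repeat units apart.

    Alleles are given in repeat units; closer heterozygotes are excluded
    to avoid additive effects on the stutter peak.
    """
    return allele_a == allele_b or abs(allele_a - allele_b) > 2
```

For example, a 75 RFU peak one repeat unit below a 1000 RFU allele gives a stutter ratio of about 7.5%.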
Technical Report
Full-text available
Recommendations of the GHEPMIX Commission for the acceptance and evaluation of mixtures profiles.
... The LUS has also been employed as part of an NGS data analysis tool to help filter stutter products and other noise from sequence-based STR data [60]. As has been noted [26,58,61], sequence variants within the repeat region may result in different lengths of the LUS for alleles of the same size/represented by the same repeat unit designation. Thus, it is clear that the LUS length itself could serve to distinguish many same-length alleles that differ by sequence. ...
... While the −1 AGAC stutter data was limited due to its lower incidence, the plot indicates that at the same LUS length, the AGAT portion of D12S391 produces more stutter than the AGAC portion of the repeat. Given its application to the AGAT and AGAC portions of the repeat separately, the LUS stutter model fit to the data (assessed by the R² value) is similar to what has been previously reported for some other STR loci [59,61]. ...
Article
Full-text available
Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate − in the near-term − probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. 
The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically-based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories.
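
To make the LUS idea concrete, the sketch below counts the longest run of consecutive copies of a motif in a repeat-region sequence. It is an illustration only: the function name, the example sequence and the "repeats_LUS" string format are assumptions, not the nomenclature or tooling of the study.

```python
def longest_uninterrupted_stretch(sequence, motif):
    """Return the largest number of consecutive copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    # Try every start position so that runs in any reading frame are found.
    for start in range(len(sequence)):
        count = 0
        pos = start
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical repeat region: two AGAT units, one AGAC interruption, four AGAT units.
repeat_region = "AGAT" * 2 + "AGAC" + "AGAT" * 4
total_repeats = len(repeat_region) // 4
lus = longest_uninterrupted_stretch(repeat_region, "AGAT")

# Hypothetical designation combining repeat number and LUS length.
designation = f"{total_repeats}_{lus}"
```

Two alleles with the same total repeat number but a different placement of the interruption would receive different LUS lengths, which is what lets the designation separate same-length alleles.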
... Another potential difficulty associated with the CE detection of STRs is the background signal arising from stutter peaks [19], caused by slippage of the polymerase in the PCR. In DNA samples from a single person, genuine alleles and stutter alleles can be easily distinguished. ...
... D2S1338 allele 21). For complex STRs, the longest uninterrupted repeat stretch determines the stutter ratio [19] which is confirmed by our data as illustrated in Fig. 6. Here, detailed stutter graphs for D18S51 are shown for both methods; the dots of the alleles carrying an interrupted repeat motif (marked in red) tend to have lower stutter ratios than the uninterrupted alleles of the same length. ...
Article
Full-text available
Current forensic DNA analysis predominantly involves identification of human donors by analysis of short tandem repeats (STRs) using Capillary Electrophoresis (CE). Recent developments in Massively Parallel Sequencing (MPS) technologies offer new possibilities in analysis of STRs since they might overcome some of the limitations of CE analysis. In this study 17 STRs and Amelogenin were sequenced in high coverage using a prototype version of the Promega PowerSeq™ system for 297 population samples from the Netherlands, Nepal, Bhutan and Central African Pygmies. In addition, 45 two-person mixtures with different minor contributions down to 1% were analysed to investigate the performance of this system for mixed samples. Regarding fragment length, complete concordance between the MPS and CE-based data was found, marking the reliability of MPS PowerSeq™ system. As expected, MPS presented a broader allele range and higher power of discrimination and exclusion rate. The high coverage sequencing data were used to determine stutter characteristics for all loci and stutter ratios were compared to CE data. The separation of alleles with the same length but exhibiting different stutter ratios lowers the overall variation in stutter ratio and helps in differentiation of stutters from genuine alleles in mixed samples. All alleles of the minor contributors were detected in the sequence reads even for the 1% contributions, but analysis of mixtures below 5% without prior information of the mixture ratio is complicated by PCR and sequencing artefacts.
... A very basic example would correspond to rounding or binning of a measured continuous variable. Another example would consist of the application of detection thresholds, stutter filter, artefact pruning etc. for a DNA profile [31,23]. In other words, we assume here that (d_x, d_y) are the raw data, and that (e_x, e_y) are the data after standard data cleaning ...
Preprint
We show that the incorporation of any new piece of information allows for improved decision making in the sense that the expected costs of an optimal decision decrease (or, in boundary cases where no or not enough new information is incorporated, stay the same) whenever this is done by the appropriate update of the probabilities of the hypotheses. Versions of this result have been stated before. However, previous proofs rely on auxiliary constructions with proper scoring rules. We, instead, offer a direct and completely general proof by considering elementary properties of likelihood ratios only. We do point out the relation to proper scoring rules. We apply our results to make a contribution to the debates about the use of score based/feature based and common/specific source likelihood ratios. In the literature these are often presented as different "LR-systems". We argue that deciding which LR to compute is simply a matter of the available information. There is no such thing as different "LR-systems", there are only differences in the available information. In particular, despite claims to the contrary, scores can very well be used in forensic practice and we illustrate this with an extensive example in DNA kinship context.
... Kapa-N and TruSeq-N methods performed better than the non-normalized ones. stochastic threshold [87,91-95]. One of those methods is the use of the logistic regression curves to estimate the probability of drop-out as a function of either peak height [54,91] or average peak height of a profile [65,96]. ...
Article
The sequencing of STR markers provides additional information present in the underlying sequence variation that is typically masked by traditional fragment-based genotyping. However, the interpretation of STR profiles generated by targeted sequencing methods are susceptible to the same factors encountered in profiles processed through capillary gel electrophoresis. These factors include stochastic variation, noise, stutter artifacts, heterozygote imbalance, and allelic drop-out/in. Our goal is to characterize and understand how these behave in targeted sequence datasets. Here, we developed a framework using statistical tools to systematically interpret the characteristics of single-source DNA profiles generated by targeted sequencing. Sensitivity studies were performed using known single-source samples amplified with the PowerSeq 46GY System Prototype with varying DNA target masses ranging from 15 pg to 500 pg. The STR loci were subjected to DNA library preparation using two commercially available library kits and sequenced on the Illumina MiSeq platform. Raw FASTQ data files were analyzed in STRait Razor v2.0 without applying any thresholds (at a coverage ≥ 1). We investigated the effect of library normalization on average locus coverage and studied methods for setting analytical and zygosity thresholds. All the data were analyzed per DNA quantity as well as investigated per method. Analyses presented can be applied to sequence data generated by similar targeted sequencing panels and/or NGS platforms.
... The number of markers can be amplified simultaneously in the same aliquot with an optimized mix of primer pairs, but the reaction can be inefficient due to the presence of different inhibitor molecules. When amplifying a low level of sample DNA (with higher probability of mixed samples) an unequal, stochastic fluctuation can manifest, which may lead to a preferential, imbalanced presence of the allelic component, and the number of PCR cycles cannot be increased unlimitedly [17,82-84]. ...
Article
Full-text available
Next Generation Sequencing (NGS) is transforming the landscape of the Short Tandem Repeat (STR) genotyping. NGS determines the length as well as the sequence of each allele, identifies polymorphisms in the repeat or adjacent DNA regions, allows for a greater degree of multiplexing, and generates Gigabases of reads in a single run. An additional step introduced into the DNA typing process with NGS methods is library preparation. The PCR amplicons of the targeted loci are further purified and modified with adapters and sample specific indices prior to sequencing. In this study, two PCR purification methods were used to cleanup amplification reactions prior to library construction: column-based and bead-based technologies. The influence of different PCR purification protocols on the dropout events was examined.
... Using highly deionized, purified formamide will significantly reduce ions and by-products in the formamide that may decompose the STR fragments (4,5). There are also size-exclusion matrices that can be used to remove very short DNA fragments, such as the amplification primers, and their use has been reported in articles for STR analysis (6,7). These columns can be used on selected samples, where increased sensitivity or removing unwanted small DNA fragments is desirable. ...
Article
Electrokinetic injection (EI) is the primary method used in forensic laboratories to load amplified PCR product in capillary electrophoresis for short tandem repeat (STR) fragment separation. Because all samples subjected to capillary electrophoresis use internal lane standard (ILS), this study investigated the consequence of varying the volume of ILS and its effects on allele peak heights and number of alleles detected. Results demonstrated that when the volume of ILS is reduced, the average peak height and number of alleles increased, thereby increasing the sensitivity of the detection method. Sizing anomalies were observed; however, they did not adversely affect accuracy and precision. The method developed in this study offers a simple and universal procedure to increase the alleles detected in forensic STR analysis. Reducing the volume of ILS to achieve greater sensitivity is applicable to all STR amplification kits and capillary electrophoresis instruments currently used in forensic DNA analysis.
... Further, a second threshold, the stochastic threshold, may be used as a tool to detect the presence of allelic peaks [7]. The traditional way to counter the effect of stutter is to apply a stutter ratio threshold, where the ratio is calculated by dividing the height of the peak in stutter position by the height of the allelic peak [8-11]. Other effects are generally not treated specifically [12]. ...
Conference Paper
Full-text available
For forensic purposes, short tandem repeat allele signals are used as DNA fingerprints. The interpretation of signals measured from samples has traditionally been conducted by applying thresholding. More quantitative approaches have recently been developed, but not for the purposes of identifying an appropriate signal model. By analyzing data from 643 single person samples, we develop such a signal model. Three standard classes of two-parameter distributions, one symmetric (normal) and two right-skewed (gamma and log-normal), were investigated for their ability to adequately describe the data. Our analysis suggests that additive noise is well modeled via the log-normal distribution class and that variability in peak heights is well described by the gamma distribution class. This is a crucial step towards the development of principled techniques for mixed sample signal deconvolution.
... Different analytical thresholds were calculated for each dye at a 99.9% confidence interval. Stochastic threshold was also determined with a confidence interval of 99.9% as previously described [19]. ...
... For determination of the stochastic threshold, a larger number of low template DNA samples are required. For practical use stochastic threshold is chosen at a relative fluorescent unit value for which 99% of the single alleles on heterozygous loci were below it [10]. ...
Article
Full-text available
Rapid advancements in forensic DNA technology have resulted in its increasing use to resolve crime cases, particularly in the detection of low-level DNA traces. This has been made possible by the increasing sensitivity of STR typing kits. Low-template DNA analysis requires careful consideration of the derived stochastic variations that lead to heterozygote imbalance, allele drop-out and increased detection of background contamination. The relevance of the evidence and the probative value of the DNA profile are important issues in the evaluation of forensic evidence.
... The maximum observed stutter ratio (SR) is routinely estimated by individual laboratories as part of an internal validation of a new multiplex, amplification protocol or analysis platform. Previous work has investigated the longest uninterrupted sequence (LUS) as a predictor of stutter for both autosomal [9,14,15] and Y STR profiles [16]. It has been shown that alleles with large LUS values stutter more than alleles with small LUS values and plausibly also amplify less. ...
Article
Highly polymorphic markers, such as microsatellites, are invaluable for the study of natural populations. However, contemporary methods for genotyping highly polymorphic variants have serious drawbacks that impede their efficiency. We created Polly, an R package with C++ source code that uses Illumina short‐read data to genotype microsatellites, detect highly polymorphic variants and identify clusters of highly polymorphic SNPs, indels and microsatellites. We tested Polly on short‐read data from Xiphophorus birchmanni (Teleostei: Poeciliidae) and Arabidopsis thaliana, finding it to be efficient and accurate both for microsatellite genotyping and polymorphic marker detection. This program can be applied to any diploid population for which there exists short‐read data and at least one scaffolded reference genome.
Article
Interpretation of crime stain profiles is one of the challenging tasks of forensic scientists. Before using the probabilistic genotyping system, forensic scientists should interpret crime stain profiles using analytical and stochastic thresholds. According to the guidelines presented by the Scientific Working Group on DNA Analysis Methods, these thresholds shall be based on and supported by applicable internal validation studies. In this study, we performed an internal validation study of our DNA typing system to determine these thresholds. A total of 350 DNA samples and 11 negative controls were amplified using the GlobalFiler™ PCR Amplification Kit with 30 cycles, and the PCR products were then analyzed on a SeqStudio Genetic Analyzer. As a result, the analytical threshold was set to 110 RFU, which is sensitive enough to yield almost full profiles from 0.063 ng of DNA and specific enough to exclude most pull-up and drop-in peaks. The stochastic threshold was set to 890 RFU, allowing a 1% probability of allelic drop-out based on logistic regression. These thresholds are useful for interpreting crime stain profiles in our GlobalFiler system.
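Deriving a stochastic threshold from a fitted logistic regression amounts to inverting the fitted curve at the accepted drop-out probability. The sketch below assumes a model of the form logit(p) = intercept + slope × ln(peak height); the abstract does not state the model form, and the coefficients used here are hypothetical, not the study's values.

```python
import math


def dropout_probability(height_rfu, intercept, slope):
    """Drop-out probability of the partner allele under the assumed model
    logit(p) = intercept + slope * ln(height_rfu)."""
    z = intercept + slope * math.log(height_rfu)
    return 1.0 / (1.0 + math.exp(-z))


def threshold_for_dropout(target_p, intercept, slope):
    """Peak height (rfu) at which the assumed model predicts `target_p`."""
    if not 0.0 < target_p < 1.0:
        raise ValueError("target_p must be strictly between 0 and 1")
    if slope == 0:
        raise ValueError("slope must be non-zero")
    logit = math.log(target_p / (1.0 - target_p))
    return math.exp((logit - intercept) / slope)


# Hypothetical coefficients; a negative slope means taller peaks imply less drop-out
INTERCEPT, SLOPE = 10.0, -2.0
threshold = threshold_for_dropout(0.01, INTERCEPT, SLOPE)
print(round(threshold), dropout_probability(threshold, INTERCEPT, SLOPE))
```

In practice the coefficients would come from fitting the regression to validation data on surviving-allele peak heights and observed drop-out events.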
Article
As DNA typing systems have become increasingly sensitive in recent years, probability distribution models for back, forward, double-back, and minus 2-nt stutter ratios are increasingly needed for DNA evidence interpretation using specific software programs. However, experimental investigations have been insufficient, especially for forward, double-back, and minus 2-nt stutters. In this study, we experimentally reevaluated the probability distribution models for each stutter ratio in the typing systems of GlobalFiler™ PCR Amplification Kit and 3500xL Genetic Analyzer from Thermo Fisher Scientific. In addition, to enhance the reliability of longest uninterrupted stretch (LUS) values and corrected allele numbers used in previously developed models for stutter ratios using sequence information (i.e., LUS model and multi-seq model), we propose the weighted average of LUS values and corrected allele numbers based on the number of observations in sequence-based population data. Back stutter ratios demonstrated a positive correlation with allele numbers (allele model) in eight loci, LUS values (LUS model) in eight loci, and corrected allele numbers (multi-seq model) in five loci. The forward stutter ratios (FSRs) of D22S1045 followed the LUS model. FSRs other than D22S1045 and double-back stutter ratios followed the LUS model by considering multiple loci together. Minus 2-nt stutter ratios observed in SE33 and D1S1656 did not increase with the increase in the allele numbers. The adopted models for each stutter ratio can be implemented in software programs for DNA evidence interpretation and enable a reliable interpretation of crime stain profiles in forensic casework.
Article
In this study, we propose a stutter ratio for a minus two base pair stutter (-2bpSR) model of the D1S1656 locus in capillary electrophoresis (CE)-based short tandem repeat (STR) typing. DNA from a total of 108 Japanese individuals was analyzed via massively parallel sequencing to investigate the length of the longest uninterrupted stretch of two base repeat motif (2bpLUS value) within repetitive structures involving the flanking region. Additionally, -2bpSR data was collected using the GlobalFiler Kit on a 3500xL Genetic Analyzer. As a result of sequencing analysis, all alleles were classified into two types by their 2bpLUS values. The -2bpSR differed significantly between the types. Then, we modeled the -2bpSR with a mixture log-normal distribution using the classification of alleles based on the 2bpLUS values. Furthermore, probabilities of the sequence type within each repeat number in the mixture log-normal distribution model were estimated using logistic regression for each of the five major detected populations. This study is expected to enable interpretation of STR typing while considering minus two base pair stutter at the D1S1656 locus.
Article
It is widely recognized that microhaps are powerful markers for different forensic purposes, mainly because they combine the advantages of both short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs), including multiple alleles, low mutation rate and absence of stutter peaks. In the present study, a panel of 60 microhap loci was developed and utilized in forensic kinship analysis as a preliminary study. Genotyping of microhaps was performed by massively parallel sequencing (MPS) and haplotypes were directly obtained from sequence reads of 73 samples from the Chinese Han population. We observed that 49 out of 60 loci have an effective number of alleles (Ae) greater than 3.0 and 10 out of 60 have values above 4.0, with an average value of 3.5598. The heterozygosity values ranged from 0.5840 to 0.8546 with an average of 0.7268, and the cumulative power of exclusion of the 60 loci is equal to 1 − 4.78 × 10⁻¹⁸. Moreover, we demonstrated the applicability of this method to different relationship inference problems, including identification of single parent‐offspring, full‐sibling, and second‐degree relatives. The results indicated that the assembled microhap panel provided more power for relationship inference than commonly used STR or SNP systems.
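The two per-locus statistics reported above follow directly from the haplotype frequencies at a locus. The sketch below uses the standard definitions, Ae = 1 / Σ pᵢ² and expected heterozygosity H = 1 − Σ pᵢ². Whether the abstract reports expected or observed heterozygosity is not stated, so the second function is an assumption, and the frequencies are invented for illustration.

```python
def _check_frequencies(freqs):
    """Validate that the frequencies are non-negative and sum to 1."""
    if not freqs or any(p < 0 for p in freqs):
        raise ValueError("frequencies must be non-empty and non-negative")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")


def effective_number_of_alleles(freqs):
    """Ae = 1 / sum(p_i^2): the number of equally frequent alleles that
    would give the same homozygosity as the observed frequencies."""
    _check_frequencies(freqs)
    return 1.0 / sum(p * p for p in freqs)


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) under Hardy-Weinberg proportions."""
    _check_frequencies(freqs)
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical microhap locus with four equally frequent haplotypes
freqs = [0.25, 0.25, 0.25, 0.25]
print(effective_number_of_alleles(freqs))  # 4.0
print(expected_heterozygosity(freqs))      # 0.75
```

Unequal frequencies lower both values: a locus with the same four haplotypes at 0.7, 0.1, 0.1 and 0.1 has an Ae below 2.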
Thesis
DNA profiling has revolutionised forensic science as a powerful tool in criminal investigations. Court decisions can have an enormous impact on a person’s life. It is therefore essential to characterise the molecular biology processes used in forensic DNA profiling assays to define their limitations and ensure confident interpretation. Knowledge is acquired through validation of the DNA analysis kit. Traditionally, validation has been both time-consuming and resource-intensive. Published data often use different kinds of statistics, preventing direct comparisons. The aims of the thesis were to approach this problem by developing software packages to 1) automate characterisation of forensic STR DNA profiling assays and 2) simulate the molecular biology processes used. We developed the free and open source computer software STR-validator to contribute to standardisation and improved quality of validation while allowing laboratories to greatly reduce the time spent analysing data. We developed PCR-sim to simulate the entire DNA analysis process. The benefits of using STR-validator were exemplified by exploratory analysis of validation data and comparison to published results. Amplification of low-template and degraded samples was simulated using PCR-sim to explore the heterozygote balance of diploid and haploid cells and the results were compared to empirical data. By exploiting information from today’s standard quantification kits, it is possible to predict the outcome of the DNA profile. Negative controls collected from the routine DNA analysis process were analysed to identify possible artefacts and contamination. Single molecule amplification was performed to support the underlying theory. STR-validator is widely used, promoting faster implementation of new and better DNA profiling kits worldwide. Simulations can be used to optimise analysis of essential casework samples, and to explore properties of samples that are difficult to create in the laboratory. ISBN 978-82-8377-319-4
Article
Pentameric-repeat short tandem repeats (STRs), consisting of loci with repeat units of five base-pairs, have the advantage of reduced stutter products compared to their tetrameric-repeat STR counterparts. This characteristic potentially helps the interpretation of mixed DNA profiles when minor component alleles may coincide with stutter peaks from the major components. To develop a simple but informative forensic multiplex with the capability to aid mixture interpretation, we designed an 11-plex assay of nine pentameric STRs new to forensic analysis plus two male-specific markers: DYS391 and the Y-Indel rs2032678 used in GlobalFiler™ (Life Technologies). East Asian-specific variation in the recently adopted Y-Indel rs2032678 is reported in this study for the first time in its forensic use as a sex marker. We estimated the levels of variation observed in the nine pentameric STRs in three of the major population groups sampled in the HGDP-CEPH human genome diversity panel: African, European and East Asian (combining individual populations as their sample sizes were too small for STR allele frequency estimations); and we include genotype data from a population sample of Northwest Spain. From this data, forensic informativeness metrics were estimated when applying the nine novel STRs in identification or kinship analyses. The assay was assessed for forensic sensitivity and ability to successfully genotype highly degraded DNA. In the profiles from the 11-plex assay we observed an average 2.15% stutter ratio in all the pentameric loci compared to 7.32% across equivalently-sized tetrameric STRs in the Promega Powerplex® ESX-17 kit.
Article
The interpretation of complex DNA profiles may differ between laboratories and reporting officers, which can lead to discrepancies in the final reports. In this study, we assessed the intra and inter laboratory variation in DNA mixture interpretation for three European ISO17025-accredited laboratories. To this aim, 26 reporting officers analyzed five sets of DNA profiles. Three main aspects were considered: 1) whether the mixed DNA profiles met the criteria for comparison to a reference profile, 2) the actual result of the comparison between references and DNA profiling data and 3) whether the weight of the DNA evidence could be assessed. Similarity in answers depended mostly on the complexity of the tasks. This study showed less variation within laboratories than between laboratories which could be the result of differences between internal laboratory guidelines and methods and tools available. Results show the profile types for which the three laboratories report differently, which informs indirectly on the complexity threshold the laboratories employ. Largest differences between laboratories were caused by the methods available to assess the weight of the DNA evidence. This exercise aids in training forensic scientists, refining laboratory guidelines and explaining differences between laboratories in court. Undertaking more collaborative exercises in future may stimulate dialog and consensus regarding interpretation. For training purposes, DNA profiles of the mixed stains and questioned references are made available.
Article
The investigation of the performance of models to interpret complex DNA profiles is best undertaken using real DNA profiles. Here we used a data set to reflect the variety typically encountered in real casework. The "crime-stains" were constructed from known individuals and comprised a total of 59 diverse samples: pristine DNA/DNA extracted from blood, 2-3 person mixtures, degradation/no-degradation, differences in allele sharing, dropout/no dropout, etc. Two siblings were also included in the test-set in order to challenge the systems. Two kinds of analyses were performed, namely tests on whether a person of interest is a contributor based on weight-of-evidence (likelihood ratio) calculations, and deconvolution test to estimate the profile of unknown constituent parts. The weight-of-evidence analyses compared LRmix Studio with EuroForMix including exploration of the effect of applying an ad hoc stutter-filter. For the deconvolution analysis we compared EuroForMix with LoCIM-tool. When we classified persons of interests into being true contributors or non-contributors, we found that EuroForMix, overall, returned a higher true positive rate for the same false positive levels compared to LRmix. In particular, in cases with an unknown major component, EuroForMix was more discriminating for mixtures where the person of interest was a minor contributor. Comparing deconvolution of major contributors we found that EuroForMix overall performed better than LoCIM-tool.
Chapter
Complex mixtures, which are defined here as biological samples containing DNA with three or more contributors, exhibit several significant challenges. First, allele sharing will occur at many of the loci tested, making it challenging to unambiguously discern the full genotypes of the mixture contributors. Second, complex mixtures are likely to contain low-template DNA (LTDNA) for one or more of the contributors since PCR reactions are usually run with 1 ng or less of total DNA. Each additional contributor to a mixture means a dilution of one or more of the contributors into the stochastic danger zone where allele drop-out is more likely. Concepts developed for two-person mixtures, such as stochastic thresholds, will not always be applicable to mixtures containing three or more contributors, largely because of the possibility of allele sharing.
Article
We assessed various approaches for DNA profiling using the same total amount of DNA. The choice of profiling approach affects genotyping success and may in addition affect the likelihood ratio (LR).
Article
Interpretation of DNA mixtures with three or more contributors, defined here as high order mixtures, is difficult because of the inevitability of allele sharing. Allele sharing complicates the estimation of the number of contributors, which is an important parameter to assess the probative value. Consequently, these mixtures may not be deemed suitable for interpretation and reporting. In this study, we generated three-, four- and five-person mixtures with little or no drop-out and with varying levels of allele sharing. For these DNA mixtures we computed likelihood ratios (LRs) using the LRmix model, and always using persons of interest that are true contributors. We assessed the influence of different scenarios on the LR, and used (1) the true or an incorrect number of contributors, (2) zero, one or two anchored individuals and (3) an equal number of contributors under Hp and Hd or an extra contributor under Hd. It was shown that the LR varied considerably when the hypotheses used an incorrect number of contributors, especially when individuals were anchored under the hypotheses. Overall, when analysing high order mixtures, there may occur a transition from LR greater than one to less than one if an incorrect number of contributors is conditioned. This is a result of allele sharing among the multiple contributors rather than allele drop-out, since this study only utilised samples with little or no drop-out. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Article
Minute amounts of DNA representing only a few diploid cells may be interrogated using enhanced DNA profiling, which will be accompanied by stochastic amplification effects. Notwithstanding, a weight of evidence statistic may be calculated using current interpretation software. In this study, we profiled single donor, two- and three-person samples having only 3 pg to 12 pg of DNA per contributor using both standard and enhanced capillary electrophoresis (CE) injection settings. Likelihood ratios (LRs) were computed using LRmix Studio, compared for both types of profiles and examined in relation to the amount of DNA, drop-out level, number of detected alleles, peak heights and reproducibility of alleles. Especially for DNA profiles that were generated using enhanced CE, the obtained LRs could indicate strong evidence in favour of the prosecution (log10(LR) > 6), even when the amount of DNA represented about half of a diploid cell equivalent in the amplification. These results illustrate that an assessment of the criminalistic relevance of a sample carrying minute amounts of DNA is essential prior to applying enhanced interrogation techniques and/or calculating a weight of evidence statistic.
Article
The introduction of Short Tandem Repeat (STR) DNA was a revolution within a revolution that transformed forensic DNA profiling into a tool that could be used, for the first time, to create National DNA databases. This transformation would not have been possible without the concurrent development of fluorescent automated sequencers, combined with the ability to multiplex several loci together. Use of the polymerase chain reaction (PCR) increased the sensitivity of the method to enable the analysis of a handful of cells. The first multiplexes were simple: 'the quad', introduced by the defunct UK Forensic Science Service (FSS) in 1994, rapidly followed by a more discriminating 'six-plex' (Second Generation Multiplex) in 1995 that was used to create the world's first national DNA database. The success of the database rapidly outgrew the functionality of the original system - by the year 2000 a new multiplex of ten-loci was introduced to reduce the chance of adventitious matches. The technology was adopted world-wide, albeit with different loci. The political requirement to introduce pan-European databases encouraged standardisation - the development of European Standard Set (ESS) of markers comprising twelve-loci is the latest iteration. Although development has been impressive, the methods used to interpret evidence have lagged behind. For example, the theory to interpret complex DNA profiles (low-level mixtures), had been developed fifteen years ago, but only in the past year or so, are the concepts starting to be widely adopted. A plethora of different models (some commercial and others non-commercial) have appeared. This has led to a confusing 'debate' about the 'best' to use. The different models available are described along with their advantages and disadvantages. A section discusses the development of national DNA databases, along with details of an associated controversy to estimate the strength of evidence of matches. 
Current methodology is limited to searches of complete profiles - another example where the interpretation of matches has not kept pace with development of theory. STRs have also transformed the area of Disaster Victim Identification (DVI) which frequently requires kinship analysis. However, genotyping efficiency is complicated by complex, degraded DNA profiles. Finally, there is now a detailed understanding of the causes of stochastic effects that cause DNA profiles to exhibit the phenomena of drop-out and drop-in, along with artefacts such as stutters. The phenomena discussed include: heterozygote balance; stutter; degradation; the effect of decreasing quantities of DNA; the dilution effect. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Article
When dealing with mixed DNA profiles where contributors have donated DNA in unequal amounts, it is often useful to deduce the genotype of the major contributor. Inference of a major contributor's genotype empowers storage of the DNA profile in a DNA database (DDB), which is especially of interest in cases without a suspect. When a major contributor's genotype cannot be inferred straightforwardly, for instance because low level components are present, replicate analyses can be prepared and combined into a consensus profile [1]. Here we describe an automated and freely available tool to deduce the major component's alleles in mixed consensus DNA profiles. In these consensus profiles, theoretical peak heights (PHs) are assigned to the alleles using the sum of the PHs in the individual amplifications. The LoCIM-tool (Locus Classification & Inference of the Major-tool) uses these PHs plus parameters on the stochastic threshold, heterozygote balance (HB) and major to minor(s) ratio to classify every locus as a type 1, type 2 or type 3 locus, which represent classes of increasing complexity. Based on the type of locus, the LoCIM-tool applies an inclusion percentage to deduce the alleles for the major contributor. Using the LoCIM-tool, 99.9% of all type 1 loci and 96.7% of all type 2 loci were inferred correctly from a large set of consensus DNA profiles that were generated from mixtures varying for the mixture ratio, amount of DNA per contributor, number of contributors, quality of DNA, and allele sharing among the contributors. For type 3 loci, we aimed at inferring the major contributor's alleles and possibly extra alleles, which occurred for 87.2% of all type 3 loci analysed using the LoCIM-tool. When compared to the overall results of manual inference by a group of forensic scientists, the LoCIM-tool obtains a higher percentage of correctly inferred loci. 
From our results, we conclude that the LoCIM-tool presents an objective, uniform and fast method to reliably deduce alleles of a major component.
Article
DNA mixtures are challenging not only at low template DNA levels but also at highly balanced quantitative ratios. In the latter case, interpretation may be complicated by the joint action of combinatorial uncertainty and stochastic effects of the PCR. We explore this particular and so far little noticed aspect of mixture interpretation by first providing a complete quantitative combinatorial analysis of the two-person mixture model (2PM) at a highly balanced ratio of contributors, and then by carrying out a calibration study of the 2PM model on good quality experimental mixtures. The calibration tests provided evidence for the existence of an irregular distribution of peak heights that can mislead genotype assignment at high template ratios too. Repeating the experiment, performing Bayesian analysis on the whole evidence and developing a careful joint prediction of all plausible genotype datasets are essential in these cases, prior to setting evidentiary LRs and using them in court.
Article
Heterozygote imbalances leading to allele drop-outs and disproportionally large stutters leading to allele drop-ins are known stochastic phenomena related to STR typing of low-template DNA (LtDNA). The large stutters and the many drop-ins in typical STR stutter positions are artifacts from the PCR amplification of tandem repeats. These artifacts may be avoided by typing bi-allelic markers instead of STRs. In this work, the SNPforID multiplex assay was used to type LtDNA. A sensitized SNP typing protocol was introduced, that increased signal strengths without increasing noise and without affecting the heterozygote balance. Allele drop-ins were only observed in experiments with 25 pg of DNA and not in experiments with 50 and 100 pg of DNA. The allele drop-in rate in the 25 pg experiments was 0.06% or 100 times lower than what was previously reported for STR typing of LtDNA. A composite model and two different consensus models were used to interpret the SNP data. Correct profiles with 42–49 SNPs were generated from the 50 and 100 pg experiments, whereas a few incorrect genotypes were included in the generated profiles from the 25 pg experiments. With the strict consensus model, between 35 and 48 SNPs were correctly typed in the 25 pg experiments and only one allele drop-out (error rate: 0.07%) was observed in the consensus profiles.
Article
DNA profiles from degraded samples often suffer from information loss at the longer short tandem repeat (STR) loci. Sensitising the reactions, either by performing additional PCR cycles or increasing the capillary electrophoresis injection settings, carries the risk of over-amplifying or overloading the shorter fragments. We explored whether profiling of degraded DNA can be improved by preferential capturing of the longer amplified fragments. To this aim, a post-PCR purification protocol was developed that is based on AMPure XP beads that have size-selective properties. A comparison was made with an unselective post-PCR purification system (DTR gel filtration) and no purification of the PCR products. Besides a set of differently and serially degraded single source samples, unequal mixtures of degraded DNAs were analysed, in order to extract more genotyping information for the minor contributor without overloading the major component at the shorter amplicons. Purification by the AMPure protocol resulted in higher peak heights especially for the longer amplicons, while DTR gel filtration gave higher peaks for all amplicon sizes. Both purification methods presented more detected alleles, with the AMPure protocol performing slightly better, on average. In conclusion, the in-house developed AMPure protocol can be employed to improve STR profile analysis of degraded single source and (unequally) mixed DNA samples.
Article
In a recent contribution to this journal Grisedale and Van Daal concluded that a single STR analysis of all available template DNA is to be preferred over replicate analyses and a consensus approach when analyzing low template DNA samples. A single STR analysis approach does not allow for an assessment of the validity of the resulting DNA profile. We argue that the use of replicate amplifications is the best way to objectively quantify the extent of the stochastic variation in the data. By applying consensus methodology and/or a probabilistic model, the interpretation of the data will therefore be more objective and reliable.
Article
The autosomal short tandem repeat (STR) kits that are currently used in forensic science have a high discrimination power. However, this discrimination power is sometimes not sufficient for complex kinship analyses or decreases when alleles are missing due to degradation of the DNA. The Investigator HDplex kit contains nine STRs that are additional to the commonly used forensic markers, and we validated this kit to assist human identification. With the increasing number of markers it becomes inevitable that forensic and kinship analyses include two or more STRs present on the same chromosome. To examine whether such markers can be regarded as independent, we evaluated the 30 STRs present in NGM, Identifiler and HDplex. Among these 30 markers, 17 syntenic STR pairs can be formed. Allelic association between these pairs was examined using 335 Dutch reference samples and no linkage disequilibrium was detected, which makes it possible to use the product rule for profile probability calculations in unrelated individuals. Linkage between syntenic STRs was studied by determining the recombination fraction between them in five three-generation CEPH families. The recombination fractions were compared to the physical and genetic distances between the markers. For most types of pedigrees, the kinship analyses can be performed using the product rule, and for those cases that require an alternative calculation method (Gill et al., Forensic Sci Int Genet 6:477-486, 2011), the recombination fractions as determined in this study can be used. Finally, we calculated the (combined) match probabilities, for the supplementary genotyping results of HDplex, NGM and Identifiler.
Article
DNA analysis is frequently used to acquire information from biological material to aid enquiries associated with criminal offences, disaster victim identification and missing persons investigations. As the relevance and value of DNA profiling to forensic investigations have increased, so too has the desire to generate this information from smaller amounts of DNA. Trace DNA samples may be defined as any sample which falls below recommended thresholds at any stage of the analysis, from sample detection through to profile interpretation, and cannot be defined by a precise picogram amount. Here we review aspects associated with the collection, DNA extraction, amplification, profiling and interpretation of trace DNA samples. Contamination and transfer issues are also briefly discussed within the context of trace DNA analysis. Whilst several methodological changes have facilitated profiling from trace samples in recent years, it is also clear that many opportunities exist for further improvements.
Article
PCR amplification of tetrameric short tandem repeats (STRs) can lead to Taq enzyme slippage and artefact products typically one repeat unit less in size than the parent STR. These back stutter or n-4 amplification products are low-level relative to the amplification of the parent STR but are widely seen in the forensic community where tetrameric STRs are employed in the generation of DNA profiles. To aid the interpretation of DNA mixtures where minor contributor(s) might be present in comparable amounts to the back stutter products, the typical amounts of back stutter generated have been well characterised and guidelines for interpretation are in place. However, further artefacts thought to be Taq enzyme slippage leading to products with one repeat unit greater than the parent sequence (n+4 or forward stutter) or two repeats less (n-8 or double back stutter) also occur, but these have not been well characterised despite their potential influence in mixture interpretations. Here we present findings with respect to these additional artefacts from a study of 10,000 alleles and include guidelines for interpretation.
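How a stutter ratio filter acts on a peak in a stutter position can be illustrated with a short sketch. The stutter ratio is taken here as the artefact peak height divided by the parent allele peak height. The function names, the example peak heights and the 2.5% filter value are hypothetical and are not the guidelines proposed in the study above.

```python
def stutter_ratio(stutter_height_rfu, parent_height_rfu):
    """Height of the peak in the stutter position relative to its parent allele."""
    if parent_height_rfu <= 0:
        raise ValueError("parent peak height must be positive")
    return stutter_height_rfu / parent_height_rfu


def classify_peak(peak_height_rfu, parent_height_rfu, filter_ratio):
    """Return 'filtered' when the peak is within the stutter filter,
    otherwise 'called' (the peak is kept as a possible true allele)."""
    ratio = stutter_ratio(peak_height_rfu, parent_height_rfu)
    return "filtered" if ratio <= filter_ratio else "called"


# Hypothetical forward (n+4) stutter filter of 2.5% applied to a 2000 rfu parent peak
FORWARD_FILTER = 0.025
print(classify_peak(40, 2000, FORWARD_FILTER))   # filtered (ratio 0.02)
print(classify_peak(120, 2000, FORWARD_FILTER))  # called (ratio 0.06)
```

The second case shows why such filters matter for unequal mixtures: a peak above the filter in a stutter position may be an elevated artefact or an allele of a minor contributor.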
Article
An allelic ladder containing amplified sequences of seven alleles of the polymorphic human tyrosine hydroxylase locus, HUMTH01, was constructed and employed as a standard marker. Sequence analysis of each ladder component indicates that fragments differ by integral multiples of the AATG core repeat sequence characteristic of this locus. Individual alleles are designated "5" through "11," according to the number of complete reiterations of the core repeat contained within them. Comparison of the HUMTH01 allelic ladder with DNA samples amplified at this locus revealed core repeat length heterogeneity (i.e., deletions or insertions shorter than one core repeat) within the human population. In particular, a common allele was identified which migrates more quickly than allele 10, but more slowly than allele 9, on electrophoresis through a denaturing polyacrylamide gel. Sequence analysis of this allele, designated "10-1," reveals lack of a single adenine normally present in the seventh copy of the AATG. The allelic ladder was used to reevaluate previously published population data. Results of testing for Hardy-Weinberg equilibrium and population substructure were not altered significantly by these modifications.
Article
In this validation study, we have evaluated the efficacy and the validity of the SGM Plus test using an amplification regime of 34 cycles. We obtained valid DNA typing results from pristine extracts with an extremely low DNA content. In this context, the aspects of single cell PCR typing were also evaluated. In these experiments, the allele dropout phenomenon was clearly demonstrated. From actual casework samples, we obtained conclusive DNA profiles from highly purified extracts of bone and teeth that failed to demonstrate typing results using the standard PCR protocol of 28 cycles. Moreover, low copy number (LCN) DNA typing offered us the possibility to reanalyse crime samples that failed to produce a conclusive profile after 28 cycles. Unfortunately, several complications accompany ultrasensitive PCR amplification. During our validation studies, we have observed increased risk of contamination, allelic dropout, locus dropout and heightened stutters. Analyses of heterozygote balance, between-loci balance and stutter heights show that the 34-cycle PCR has its own characteristic features. We finally show that reamplification of SGM Plus PCR products by an extra 6 PCR cycles offers a promising new alternative if too little of the original sample extract is left for a complete reanalysis. © 2003 Elsevier Science B.V. All rights reserved.
Article
Forensic laboratories employ various approaches to obtain short tandem repeat (STR) profiles from minimal traces (<100 pg DNA input). Most approaches aim to sensitize DNA profiling by increasing the amplification level by a higher cycle number or enlarging the amount of PCR products analyzed during capillary electrophoresis. These methods have limitations when unequal mixtures are genotyped, since the major component will be over-amplified or over-loaded. This study explores an alternative strategy for improved detection of the minor components in low template (LT) DNA typing that may be better suited for the detection of the minor component in mixtures. The strategy increases the PCR amplification efficiency by extending the primer annealing time severalfold. When the AmpFℓSTR(®) Identifiler(®) amplification parameters are changed to an annealing time of 20 min during all 28 cycles, the drop-out frequency is reduced for both pristine DNA and single or multiple donor mock casework samples. In addition, increased peak heights and slightly more drop-ins are observed while the heterozygous peak balance remains similar to that of the conventional Identifiler protocol. With this extended protocol, full DNA profiles were obtained from only 12 sperm heads (which corresponds to 36 pg of DNA) that were collected by laser micro dissection. Notwithstanding the improved detection, allele drop-outs do persist, albeit at lower frequencies. Thus a LT interpretation strategy such as deducing consensus profiles from multiple independent amplifications is appropriate. The use of extended PCR conditions represents a general approach to improve detection of unequal mixtures as shown using four commercially available kits (AmpFℓSTR(®) Identifiler, SEfiler Plus, NGM and Yfiler). The extended PCR protocol seems to amplify more of the molecules in LT samples during PCR, which results in a lower drop-out frequency.
Article
Stutter is an artefact seen when amplifying short tandem repeats and typically occurs at one repeat unit shorter in length than the parent allele. In forensic analysis, stutter complicates the analysis of DNA profiles from multiple contributors, known as mixed profiles, a common profile type. Consequently it is important to both understand and predict stutter behaviour in order to improve our understanding of the resolution and interpretation of these profiles. Whilst stutter is well recognised and documented, little information is available that identifies and quantifies what influences the formation of stutter. In this work we use a novel approach to examine this. We have used synthetic oligonucleotides comprising multiple repeat units to test the influence of repeat number, the influence of repeat sequence and the impact of interruptions to the repeat sequence length. Using multiple replicates allows detailed statistical analysis. We have confirmed a linear relationship between stutter ratio and repeat number. We have shown that increased A-T content increases stutter ratio and that interruptions in repeating sequences decreased stutter ratios to levels similar to the longest uninterrupted repeat stretch. We also found that there was no relationship between stutter ratio and repeat number for a repeat unit with an A-T content of 1/4 and that half of the interrupted repeat sequences stuttered significantly less than their longest uninterrupted repeat stretches. We have applied the knowledge gained to examine specific features of the loci present in the AmpFlSTR(®) SGM Plus(®) multiplex kit used in our laboratory.
Article
To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight in the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). 
This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be needed to retrieve the minor component in database searches. From the database searches follows the recommendation to specifically mark LT DNA profiles when entering them into the DNA database.
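The four consensus rules compared in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `consensus_alleles`, the per-locus representation of each replicate as a set of allele labels, and the reading of n-1 as "at least n-1 replicates", n/2 as "at least half of the replicates, rounded up" and 2× as "at least two replicates" are all assumptions, and the allele values are invented for the example.

```python
from collections import Counter
from math import ceil


def consensus_alleles(replicates, method):
    """Return the alleles kept at one locus by a given consensus rule.

    replicates: list of sets, one set of detected allele labels per replicate.
    method: "composite", "n-1", "n/2" or "2x" (hypothetical labels).
    """
    n = len(replicates)
    # Count in how many replicates each allele was detected.
    counts = Counter(allele for rep in replicates for allele in set(rep))

    if method == "composite":
        minimum = 1                 # keep every allele seen at least once
    elif method == "n-1":
        minimum = max(n - 1, 1)     # assumed: seen in all but (at most) one replicate
    elif method == "n/2":
        minimum = ceil(n / 2)       # assumed: seen in at least half of the replicates
    elif method == "2x":
        minimum = 2                 # assumed: seen at least twice
    else:
        raise ValueError(f"unknown consensus method: {method}")

    return {allele for allele, seen in counts.items() if seen >= minimum}


# Invented example: five replicate amplifications of one locus.
replicates = [
    {"14", "16", "17"},
    {"14", "16"},
    {"14", "16", "18"},
    {"14", "17"},
    {"14"},
]

print(sorted(consensus_alleles(replicates, "composite")))  # ['14', '16', '17', '18']
print(sorted(consensus_alleles(replicates, "n-1")))        # ['14']
print(sorted(consensus_alleles(replicates, "n/2")))        # ['14', '16']
print(sorted(consensus_alleles(replicates, "2x")))         # ['14', '16', '17']
```

With five replicates the rules give four different allele sets, which shows how the requested level of reproducibility trades drop-in tolerance against retained information.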
Article
The CEPH human genome diversity cell line panel (CEPH-HGDP) of 51 globally distributed populations was used to analyze patterns of variability in 20 core human identification STRs. The markers typed comprised the 15 STRs of Identifiler, one of the most widely used forensic STR multiplexes, plus five recently introduced European Standard Set (ESS) STRs: D1S1656, D2S441, D10S1248, D12S391 and D22S1045. From the genotypes obtained for the ESS STRs we identified rare, intermediate or off-ladder alleles that had not been previously reported for these loci. Examples of novel ESS STR alleles found were characterized by sequence analysis. This revealed extensive repeat structure variation in three ESS STRs, with D12S391 showing particularly high variability for tandem runs of AGAT and AGAC repeat units. The global geographic distribution of the CEPH panel samples gave an opportunity to study in detail the extent of substructure shown by the 20 STRs amongst populations and between their parent population groups. An assessment was made of the forensic informativeness of the new ESS STRs compared to the loci they will replace: CSF1PO, D5S818, D7S820, D13S317 and TPOX, with results showing a clear enhancement of discrimination power using multiplexes that genotype the new ESS loci. We also measured the ability of Identifiler and ESS STRs to infer the ancestry of the CEPH-HGDP samples and demonstrate that forensic STRs in large multiplexes have the potential to differentiate the major population groups but only with sufficient reliability when used with other ancestry-informative markers such as single nucleotide polymorphisms. Finally, we checked for possible association by linkage between the two ESS multiplex STRs closely positioned on chromosome 12, vWA and D12S391, by examining paired genotypes from the complete CEPH data set.
Article
Evidentiary traces may contain low quantities of DNA, and regularly incomplete short tandem repeat (STR) profiles are obtained. In this study, higher capillary electrophoresis injection settings were used to efficiently improve incomplete STR profiles generated from low-level DNA samples under standard polymerase chain reaction (PCR) conditions. The method involves capillary electrophoresis with higher injection voltage and extended injection time. STR peak heights increased six-fold. Inherent to the analysis of low-level DNA samples, we observed stochastic amplification artifacts, mainly in the form of allele dropout and heterozygous peak imbalance. Increased stutter ratios and allele drop-in were rarely seen. Upon STR typing of 10:1 admixed samples, the profile of the major component did not become overloaded when using higher injection settings as was observed upon elevated cycling. Thereby an improved profile of the minor component was obtained. For low-level DNA casework samples, we adhere to independent replication of the PCR amplification and boosted capillary electrophoresis.
Article
Although the low-template or stochastic threshold is in widespread use and is typically set to 150-200 rfu peak height, there has been no consideration on its determination and meaning. In this paper we propose a definition that is based upon the specific risk of wrongful designation of a heterozygous genotype as a homozygote which could lead to a false exclusion. Conversely, it is possible that a homozygote {a,a} could be designated as {a,F} where 'F' is a 'wild card', and this could lead to increased risk of false inclusion. To determine these risk levels, we analysed an experimental dataset that exhibited extreme drop-out using logistic regression. The derived probabilities are employed in a graphical model to determine the relative risks of wrongful designations that may cause false inclusions and exclusions. The methods described in this paper provide a preliminary solution of risk evaluation for any DNA process that employs a stochastic threshold.
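As a rough illustration of how a logistic model links peak height to drop-out risk, the sketch below evaluates a logistic curve and inverts it to find the peak height at which the modelled drop-out probability equals a chosen risk level. It assumes a single predictor (peak height in rfu); the function names and the coefficient values are hypothetical placeholders and are not taken from the paper, whose fitted model is not given in the abstract.

```python
import math


def dropout_probability(height_rfu, intercept, slope):
    """Logistic model of drop-out probability as a function of peak height (rfu)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * height_rfu)))


def height_for_risk(target_probability, intercept, slope):
    """Peak height at which the modelled drop-out probability equals the target."""
    logit = math.log(target_probability / (1.0 - target_probability))
    return (logit - intercept) / slope


# Hypothetical coefficients for illustration only (not fitted to any real data).
INTERCEPT = 2.0
SLOPE = -0.03  # negative: higher peaks imply a lower drop-out probability

threshold = height_for_risk(0.01, INTERCEPT, SLOPE)
print(f"height at 1% modelled drop-out risk: {threshold:.1f} rfu")
print(f"check: {dropout_probability(threshold, INTERCEPT, SLOPE):.4f}")
```

In practice the coefficients would come from a regression on a laboratory's own drop-out data, so the resulting threshold is specific to that process.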
Article
The PCR amplification of tetranucleotide short tandem repeat (STR) loci typically produces a minor product band 4 bp shorter than the corresponding main allele band; this is referred to as the stutter band. Sequence analysis of the main and stutter bands for two sample alleles of the STR locus vWA reveals that the stutter band lacks one repeat unit relative to the main allele. Sequencing results also indicate that the number and location of the different 4 bp repeat units vary between samples containing a typical versus low proportion of stutter product. The results also suggest that the proportion of stutter product relative to the main allele increases as the number of uninterrupted core repeat units increases. The sequence analysis and results obtained using various DNA polymerases appear to support the slipped strand displacement model as a potential explanation for how these stutter products are generated.
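Because the stutter product sits one repeat unit below its parent allele, a stutter filter can be expressed as a ratio test between a peak and the peak one repeat longer. The following is a minimal sketch under stated assumptions: alleles are whole repeat numbers, the stutter ratio is taken as stutter peak height divided by parent peak height, and the function name, the peak heights and the 10% filter value are hypothetical examples rather than values from the paper.

```python
def flag_back_stutter(peaks, stutter_filter):
    """Flag peaks that may be -1 repeat stutter of a neighbouring allele.

    peaks: dict mapping allele (repeat number, int) to peak height in rfu.
    stutter_filter: maximum stutter ratio (stutter height / parent height).
    Returns the set of alleles whose height falls within the filter.
    """
    flagged = set()
    for allele, height in peaks.items():
        # The candidate parent allele is one repeat unit longer.
        parent_height = peaks.get(allele + 1)
        if parent_height and height / parent_height <= stutter_filter:
            flagged.add(allele)
    return flagged


# Invented peak heights at one tetranucleotide locus.
peaks = {15: 120, 16: 1500, 18: 1400}
print(flag_back_stutter(peaks, 0.10))  # {15}: 120 / 1500 = 0.08
```

A peak in stutter position that exceeds the filter is not flagged, which is why tight, locus-specific filters matter when a minor contributor's allele coincides with the stutter position of a major allele.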
Article
The aim of the study was to test the hypothesis that polymerase slippage correlates to the length of repeat stretches consisting of uniform repeats against the alternative hypothesis that the total number of repeats is most relevant. Two short tetrameric short tandem repeats (STRs) with different repeat structures were investigated: D3S1545 containing only homogeneous (GATA)n repeat stretches and D7S1517 with compound repeat arrays of GAAA and CAAA repeats. Additionally two different polymerases (Herculase and AmpliTaq Gold) were used which gave comparable results. No correlation was found for the hypothesis "total repeat number against percent of stutter"; in contrast, the other hypothesis that the number of uniform repeats is relevant for the degree of stutter gave a strong positive correlation (0.82 for selected D7S1517 alleles) which confirmed the hypothesis that polymerase slippage correlates to the length of repeat stretches consisting of uniform repeats.
Article
Following a recent meeting by the ENFSI and EDNAP groups on the 4-5 April, 2005, in Glasgow, UK, it was unanimously agreed that the process of standardization within Europe should take account of recent work that unequivocally demonstrated that chance of obtaining a result from a degraded sample was increased when small amplicons (mini-STRs) were analysed. Consequently, it was recommended that existing multiplexes are re-engineered to enable small amplicon detection, and that three new mini-STR loci with alleles <130 bp (D10S1248, D14S1434 and D22S1045) are adopted as universal. This will increase the number of European standard Interpol loci from 7 to 10.
Article
The DNA commission of the International Society of Forensic Genetics (ISFG) was convened at the 21st congress of the International Society for Forensic Genetics held between 13 and 17 September in the Azores, Portugal. The purpose of the group was to agree on guidelines to encourage best practice that can be universally applied to assist with mixture interpretation. In addition the commission was tasked to provide guidance on low copy number (LCN) reporting. Our discussions have highlighted a significant need for continuing education and research into this area. We have attempted to present a consensus from experts but to be practical we do not claim to have conveyed a clear vision in every respect in this difficult subject. For this reason, we propose to allow a period of time for feedback and reflection by the scientific community. Then the DNA commission will meet again to consider further recommendations.
Article
In this study, we have evaluated the efficacy and the validity of the AmpFlSTR SGM plus multiplex PCR typing system when Low Copy Number (LCN) amounts of DNA are processed. The characteristics of SGM plus profiles produced under LCN conditions were studied on the basis of heterozygote balance, between loci balance and stutter proportion based on profiles that were obtained from a variety of mock casework samples. These experiments clearly showed that LCN DNA profiles carry their own characteristic features, which must be taken into account during interpretation. Herewith, we confirmed the data of recent other studies that a comprehensive interpretation strategy is dependent upon multiple replication of the PCR using the same extract together with the proper use of extraction and amplification controls. The limitations of LCN DNA analysis were further studied in a series of single cell PCR experiments using an amplification regime of 34 PCR cycles. The allele dropout phenomenon was demonstrated to its full extent when single cells were analysed. However, the "consensus profile" which was obtained from separate single cell PCR experiments matched the actual profile of the cell donor. Single cell PCR experiments also showed that a further increase of the number of PCR cycles did not result in enhanced sensitivity and had a highly negative effect on the balance of this multiplex PCR system which hampered correct interpretation of the profile. Also, the potential of LCN typing in analysing mixtures of DNA was investigated. It was clearly shown that LCN typing had no advantages over 28 cycles amplification in the detection of the minor component of DNA-mixtures. In addition to the 34 cycles PCR amplification regime, the utility of a new approach that involved reamplification of the 28 cycle SGM plus PCR products with an extra 6 PCR cycles after the addition of fresh AmpliTaq Gold DNA Polymerase was investigated.
This approach provides the scientist with an extra typing result that enhances the reliability of the consensus profile, which is commonly retrieved from two separate 34 cycle PCR results. Furthermore, the 28 + 6 cycles approach may be used to screen LCN samples for their potential to produce a 34 PCR cycle profile. Finally and as a last resort the 28 + 6 cycles approach can be used in those cases where no further extract from the crime sample is available. Finally, the potential of LCN typing was demonstrated in typing samples from non-probative and actual casework examples. From a high proportion of samples that failed to demonstrate SGM plus typing results using the standard protocol of 28 cycles, at least partial profiles could be obtained after LCN methods were used. For example, LCN typing was applied in a case where 10-year old samples from bones and teeth that were retrieved from a mass grave had to be identified. This study resulted in the positive identification of a number of victims by comparing the LCN DNA profiles with the profiles from putative relatives. The value of LCN DNA typing was further demonstrated in a strangulation case. The throat of the victim was sampled and only after 34 PCR cycles were we able to reveal that the evidential sample contained a distinct mixture of the victim's own DNA and the DNA of the defendant.
A.S. Matai, J. Harteveld, T. Sijen, The Next Generation Multiplex (NGM™ Kit) in a Forensic Setting, Forensic News, 2010, http://marketing.appliedbiosystems.com/images/All_Newsletters/Forensic_0710/pdfs/Customer-Corner/nextGeneration.pdf?bcsi_scan_B0A38A178AE5B708=0&bcsi_scan_filename=nextGeneration.pdf.
N. Oldroyd, R. Green, J. Mulero, L. Hennessy, J. Tabak, Development of the AmpFlSTR® NGM SElect™ Kit: New Sequence Discoveries and Implications for Genotype Concordance, Forensic News, 2011, http://www3.appliedbiosystems.com/cms/groups/applied_markets_marketing/documents/generaldocuments/cms_090694.pdf.
R. Green, J. Mulero, R. Lagace, W. Norona, C. Chang, N. Oldroyd, L. Hennessy, Developmental Validation of the AmpFlSTR® NGM™ Kit, a Robust and Highly Discriminatory STR Multiplex, 2009, https://www3.appliedbiosystems.com/cms/groups/applied_markets_marketing/documents/generaldocuments/cms_073984.pdf.