Automating HIV drug resistance genotyping with RECall, a freely accessible sequence analysis tool.
ABSTRACT Genotypic HIV drug resistance testing is routinely used to guide clinical decisions. While genotyping methods can be standardized, a slow, labor-intensive, and subjective manual sequence interpretation step is required. We therefore performed external validation of our custom software RECall, a fully automated sequence analysis pipeline. HIV-1 drug resistance genotyping was performed on 981 clinical samples at the Stanford Diagnostic Virology Laboratory. Sequencing trace files were first interpreted manually by a laboratory technician and subsequently reanalyzed by RECall, without intervention. The relative performances of the two methods were assessed by determination of the concordance of nucleotide base calls, identification of key resistance-associated substitutions, and HIV drug resistance susceptibility scoring by the Stanford Sierra algorithm. RECall is freely available at http://pssm.cfenet.ubc.ca. In total, 875 of 981 sequences were analyzed by both human and RECall interpretation. RECall analysis required minimal hands-on time and resulted in a 25-fold improvement in processing speed (∼150 technician-hours versus ∼6 computation-hours). Excellent concordance was obtained between human and automated RECall interpretation (99.7% agreement for >1,000,000 bases compared). Nearly all discordances (99.4%) were due to nucleotide mixtures being called by one method but not the other. Similarly, 98.6% of key antiretroviral resistance-associated mutations observed were identified by both methods, resulting in 98.5% concordance of resistance susceptibility interpretations. This automated sequence analysis tool provides both standardization of analysis and a significant improvement in data workflow. The time-consuming, error-prone, and dreadfully boring manual sequence analysis step is replaced with a fully automated system without compromising the accuracy of reported HIV drug resistance data.
Automating HIV Drug Resistance Genotyping with RECall, a Freely Accessible Sequence Analysis Tool
Conan K. Woods (a), Chanson J. Brumme (a), Tommy F. Liu (b), Celia K. S. Chui (a), Anna L. Chu (a), Brian Wynhoven (a), Tom A. Hall (c), Christina Trevino (d), Robert W. Shafer (b), and P. Richard Harrigan (a, e)
(a) BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada; (b) Department of Medicine, Stanford University School of Medicine, Stanford, California, USA; (c) IBIS Biosciences, Carlsbad, California, USA; (d) Diagnostic Virology Laboratory, Stanford University School of Medicine, Stanford, California, USA; and (e) Division of AIDS, University of British Columbia, Vancouver, British Columbia, Canada
Human immunodeficiency virus (HIV) drug resistance genotyping has been used in clinical practice for over 10 years to help guide and tailor highly active antiretroviral therapy
areas of the viral genome targeted by antiretroviral drugs, geno-
successful virological suppression (2, 6) and subsequently reducing the overall cost of treatment by minimizing the use of ineffective drugs and avoiding treatment failure-related inpatient
The predominant methodology used for HIV genotypic drug resistance testing involves reverse transcriptase PCR (RT-PCR) amplification of extracted viral RNA from plasma followed by population-based (bulk) sequencing (4). Since multiple sequencing primers are required to provide bidirectional coverage over
are then assembled into a contiguous consensus sequence by use of analysis software. Commercially available HIV drug resistance genotyping kits, such as the Trugene (Siemens, Deerfield, IL) (10, 13) and ViroSeq (Abbott Laboratories, Des Plaines, IL) (5) tests, are distributed with custom analysis software; however, simple
in-house. The available “generic” sequence analysis programs require considerable hands-on time; highly trained technicians must first inspect each trace file and trim out regions of problematic or low-quality sequence before manually specifying the sequence reads to assemble. The sequence assembly is subsequently
final consensus sequence is then exported and processed with a drug resistance interpretation algorithm.
Drug resistance mutation reporting often varies between lab-
oratories, even when identical samples are tested (9, 17). While
many interlaboratory discrepancies can result from differences in
sample preparation (e.g., primer choice or stochastic variation),
variation may be introduced by technicians as they subjectively
review the assembled sequences (11). Since drug-resistant HIV
variants may be present at low frequencies in clinical isolates, ac-
curate identification of nucleotide “mixtures” (positions where
two or more nucleotides are observed) is required. Differences in
individual technicians’ propensities to identify low-level nucleo-
tide mixtures could result in clinically relevant drug resistance
mutations being missed (10, 11). In order to minimize erroneous
HIV drug resistance reporting and optimize genotyping proto-
cols, clinical and research laboratories often participate in quality
assurance programs (QAP), where identical samples are sent to
multiple laboratories for independent analysis (9, 17). Unfortunately, due to the complications of subjective sequence interpre-
Received 13 December 2011. Returned for modification 27 January 2012. Accepted 1 March 2012. Published ahead of print 7 March 2012.
Address correspondence to P. Richard Harrigan, email@example.com.
C.K.W. and C.J.B. contributed equally to this article.
Copyright © 2012, American Society for Microbiology. All Rights Reserved.
Journal of Clinical Microbiology, June 2012, Volume 50, Number 6, p. 1936–1942 (jcm.asm.org)
The implementation of an automated sequence analysis tool
would enable objective and consistent interpretation of HIV ge-
notype data and provide considerable practical advantages, most
notably improvements in processing speed and significantly de-
creased labor and software costs. At the British Columbia Centre
for Excellence in HIV/AIDS (BCCfE), we have developed a bioin-
formatics tool, RECall, to address these challenges. RECall is a
pipeline for assembling, aligning, analyzing, and finishing se-
quence chromatogram files and has been tailored specifically for
HIV genotyping. While these steps are performed by most se-
quence assembly programs, RECall has been designed specifically
to reproducibly call nucleotide mixtures. RECall is available for
free as a Web application (http://pssm.cfenet.ubc.ca/).
Here we present the results of an external validation of auto-
mated RECall analysis of sequence data generated by an indepen-
MATERIALS AND METHODS
Laboratory methods. HIV genotyping was performed at the Stanford
University Hospital Diagnostic Virology Laboratory (Stanford, CA).
Clinical genotypic resistance testing was performed on 981 sequentially
collected plasma samples by use of a previously described approach (17).
Briefly, plasma virus extraction and purification were performed on
Qiagen BioRobot M48 or QIAsymphony SP automated nucleic acid ex-
traction instruments (Qiagen Inc., Valencia, CA), followed by one-step
RT-PCR and a nested 2nd-round PCR. Direct bidirectional sequencing
encompassing HIV-1 protease (PR) and the first 296 codons of reverse
transcriptase (RT) was performed on an ABI 3730 sequencer (Life Tech-
nologies, Carlsbad, CA). Chromatograms were created using Sequencing
Analysis v 5.2 (ABI). Nucleotide mixtures (positions containing two or more nucleotides, with the minor peak height being ≥20% of the major peak height) were marked with model 3730 Data Collection software v
A laboratory technologist assembled the sequence trace files for each
8.0 (DNAStar, Madison, WI). SeqMan uses user-defined parameters to
identify positions in the assembly with potential “conflicts.” Conflicts are
overlapping sequences do not have the same base call, and any “N” calls
that the sequencer could not distinguish. The analyzing technologist vi-
sually inspected each sequence, stopping at each conflict and making
alyzed using RECall. The software requires a consistent file naming con-
vention to automatically group multiple sequence reads (primers) be-
longing to the same sample into a single consensus sequence (contig).
The sequencing trace files (.ab1) are first processed with the software
package phred (7, 8), which calls bases and assigns quality scores to each
nucleotide. In a trace file, a “mixed” or “ambiguous” base is represented
by overlapping peaks. When calling bases, phred determines the location
and area of the primary peak (“called base”) and the largest secondary
peak (“uncalled base”) in the trace file. Since the primary and secondary
peaks are often offset, RECall attempts to align the peak positions to their
corresponding locations in the .ab1 files. The quality scores that phred
assigns are a measure of the accuracy of the base call. Regions of poor
trimmed automatically by RECall. phred quality scores are also used to
identify low-quality regions (regions with phred scores of <20) within a fragment, which are also flagged and excluded from the final contig assembly. Grouped fragments are assembled and aligned to a user-supplied
reference sequence (e.g., HIV-1 HXB2 [GenBank accession no. K03455])
by use of a modified Smith-Waterman algorithm (18). For this study, all
chromatograms (.ab1) were submitted to RECall in a single batch and
were processed without any human intervention, using a standard desk-
top PC (Intel Core-i5 660 3.33-GHz CPU, 3 GB RAM, Windows XP).
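The quality-based censoring step described above can be illustrated with a short sketch. phred quality scores are defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong, so a score of 20 corresponds to an estimated error rate of 1%. The function names below are hypothetical, the low-quality criterion is assumed to be a score below 20, and the code is not taken from RECall itself.

```python
def error_probability(q):
    """Convert a phred quality score into the estimated base-call error probability."""
    return 10 ** (-q / 10)


def low_quality_regions(quals, cutoff=20):
    """Return (start, end) index pairs, end exclusive, for runs of scores below the cutoff.

    The cutoff of 20 and the 'strictly below' comparison are assumptions.
    """
    regions = []
    start = None
    for i, q in enumerate(quals):
        if q < cutoff and start is None:
            start = i                      # a low-quality run begins here
        elif q >= cutoff and start is not None:
            regions.append((start, i))     # the run ended at the previous base
            start = None
    if start is not None:                  # run extends to the end of the read
        regions.append((start, len(quals)))
    return regions


quals = [35, 40, 12, 8, 15, 42, 50, 19, 45]
print(error_probability(20))
print(low_quality_regions(quals))
```

In this toy read, positions 2 to 4 and position 7 would be flagged and left out of the contig assembly.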
RECall nucleotide mixture calling and “marking” of potentially
problematic bases. The most important feature of RECall is the process
by which the software calls ambiguous nucleotides (mixtures). Following
the assembly and alignment step, RECall identifies mixtures based on the
mined by phred. The RECall configuration variables for mixture calling
for clinical drug resistance testing at the BCCfE are listed in Table 1. Each
position in the sequence alignment is examined sequentially. For each
position, a list is first generated by counting the frequency of each nucle-
otide that appears as either a called base or an uncalled base meeting the
mixture area criterion. This list is then reduced to include only nucleo-
common (secondary) bases are retained. If the secondary base is called
with more than half the frequency of the majority base, then a mixture is
called. If two bases tie as the most common bases but no majority is
achieved, then a mixture of those two bases is called. Finally, if none of
Since phred is limited to calling a maximum of two nucleotides per chro-
lematic sequences according to the parameters listed in Table 1. Inser-
optional confirmation by a human user. In addition, positions meeting
the mark area criterion are flagged for review in a manner similar to the
mixture calling procedure (Table 1). In this study, mixtures and marks
were not subjected to human interpretation.
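As a rough illustration of the per-position logic described above, the following sketch counts each nucleotide seen as a called base, or as an uncalled base meeting the mixture area criterion, keeps the two most common bases, and calls a mixture when the secondary base reaches more than half the frequency of the majority base (ties included). The input format, the function name, and the handling of unrecognized bases are assumptions made for illustration; this is not RECall's implementation.

```python
from collections import Counter

# IUPAC codes for the six possible two-base mixtures.
TWO_BASE_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("CG"): "S", frozenset("AT"): "W",
}


def call_position(reads, mixture_area=0.20):
    """Call one alignment position.

    reads: list of (called_base, uncalled_base_or_None, area_ratio) tuples, one per
    read covering the position, where area_ratio is uncalled peak area / called peak
    area. This tuple layout is a hypothetical input format.
    """
    counts = Counter()
    for called, uncalled, ratio in reads:
        counts[called] += 1
        if uncalled is not None and ratio >= mixture_area:
            counts[uncalled] += 1          # secondary peak is large enough to count
    if not counts:
        return "N"                         # no coverage at this position
    ranked = counts.most_common(2)         # keep only the two most common bases
    if len(ranked) == 1:
        return ranked[0][0]
    (major, n_major), (minor, n_minor) = ranked
    if 2 * n_minor > n_major:              # more than half the majority frequency
        return TWO_BASE_CODES.get(frozenset((major, minor)), "N")
    return major


print(call_position([("C", "T", 0.35), ("C", "T", 0.30), ("C", None, 0.0)]))
print(call_position([("A", "G", 0.05), ("A", None, 0.0), ("A", "G", 0.25)]))
```

Because the thresholds are fixed parameters rather than judgments made trace by trace, the same input always yields the same call.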
RECall pass-fail criteria. Sequences were passed or failed based on
criteria established in the BCCfE laboratory, which form the default pa-
rameters in RECall. Multiple quality checks were performed on every
TABLE 1 Configuration variables for nucleotide mixture calling and base “marking” for clinical drug resistance genotyping
Quality censoring cutoff: 10. Bases with phred quality scores below the cutoff are excluded from the assembly.
Mixture area (%): 20. The area of the uncalled peak must be at least 20% of the called peak area. If ≥50% of the reads pass this threshold, then a mixture is called.
Mark area (%): 17.5. The area of the uncalled peak must have at least 17.5% of the called peak area. If ≥50% of the reads pass this threshold, then a mark is made.
Mark average quality cutoff: If the average quality of the base across all reads is below the cutoff, then a mark is made.
Insertions, deletions, and single primer coverage are also marked.
sample to ensure that the sequence was acceptable. Problems leading to
eters can be modified by the user. Sequences that pass internal quality
ted files. Because RECall by default requires double primer coverage over
the entire sequence length, some samples that the Stanford laboratory
deemed acceptable by human interpretation were rejected by RECall. In
the following analyses, we included only those sequences that passed
RECall’s default quality control criteria.
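A few of the rejection criteria (stop codons, frameshifting insertions or deletions, and excessive mixtures, Ns, or marks) can be expressed as a simple checklist. The sketch below is only an approximation under stated assumptions: the sequence is taken to be in frame from its first base, the comparisons are assumed to be strict (“more than”), all failure labels other than “too many mixtures” are hypothetical, and the coverage- and quality-based criteria are omitted.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}
MIXTURE_CODES = set("RYKMSWBDHV")


def failure_reasons(seq, n_marks=0, insertion_lengths=(), deletion_lengths=(),
                    max_mixture_fraction=0.035, max_n=5, max_marks=100):
    """Return a list of reasons for rejecting a finished sequence (empty list = pass)."""
    seq = seq.upper()
    reasons = []
    # Split into codons, assuming the sequence starts at codon position 1.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        reasons.append("stop codon")
    # Indels whose length is not a multiple of three shift the reading frame.
    if any(length % 3 for length in insertion_lengths):
        reasons.append("frameshift insertion")
    if any(length % 3 for length in deletion_lengths):
        reasons.append("frameshift deletion")
    n_mixtures = sum(1 for base in seq if base in MIXTURE_CODES)
    if seq and n_mixtures / len(seq) > max_mixture_fraction:
        reasons.append("too many mixtures")
    if seq.count("N") > max_n:
        reasons.append("too many Ns")
    if n_marks > max_marks:
        reasons.append("too many marks")
    return reasons


print(failure_reasons("ATGGCC"))
print(failure_reasons("ATGTAAGCC"))
```

Returning every reason, rather than stopping at the first, makes it easier to see why a sample failed when several criteria are violated at once.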
RECall Web application. The RECall Web application includes per-
sonal password-protected user accounts that allow sequencing jobs to be
saved and reanalyzed in the future without the need to upload files again.
access are not given access to the program parameters but may process
data using only the parameters provided by the local “SuperUser.” Pro-
cessed sequences are retained on the RECall server for a user-chosen pe-
riod, after which they are automatically deleted. No submitted data are
reprocessed, collected, analyzed, used for any purpose, or shared with
Data analyses. The finished sequences generated by RECall were re-
turned to the Stanford laboratory for comparison of these results with the
results of conventional HIV genotypic drug resistance testing methods
tide discordance was considered to be present when one methodology re-
ported a nucleotide mixture and the other reported one of the mixture’s
components (e.g., human-reported Y and RECall-reported C). A complete
a different unambiguous nucleotide at the same position for a sample (e.g.,
tance mutation positions for mutations defined as key resistance muta-
tions by the International AIDS Society (USA table) (12). A drug resis-
part of an amino acid mixture. The Stanford HIV drug resistance geno-
typing Web service Sierra (algorithm version 6.0.1 [http://hivdb.stanford
.edu/pages/algs/sierra_sequence.html]; Stanford University, Stanford,
CA) (14) was used to infer antiretroviral drug susceptibilities from both
human- and RECall-analyzed PR-RT nucleotide sequences.
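The distinction between partial and complete nucleotide discordance can be made concrete by expanding each IUPAC code into the set of bases it represents. In this sketch, two calls are treated as partially discordant when their base sets differ but overlap, and completely discordant when they share no base. Treating two different mixtures that share a base as “partial” is an assumption that generalizes the definition given above; the function name is hypothetical.

```python
# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def classify_calls(human, recall):
    """Classify a pair of base calls as 'concordant', 'partial', or 'complete'."""
    if human == recall:
        return "concordant"
    shared = set(IUPAC_BASES[human]) & set(IUPAC_BASES[recall])
    return "partial" if shared else "complete"


print(classify_calls("Y", "C"))   # mixture versus one of its components
print(classify_calls("A", "G"))   # two different unambiguous bases
```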
it was tested on in-house sequences (data not shown). We therefore
wished to perform an external validation of the applicability of
HIV protease-RT sequences and raw .ab1 sequence trace files
(with manual technician review) and reanalyzed by RECall. Of
these, 875 (89.2%) met the default RECall acceptability criteria
after automated processing. The primary reason for failure was a
lack of double primer coverage over the entire sequence length.
GB RAM, Windows XP), RECall completed base calling, assem-
bly, and alignment in less than 6 h, with no hands-on analysis. In
contrast, manual analysis required an estimated 150 h of techni-
Nucleic acid sequence concordance between human and
RECall interpretations. There was 99.7% overall agreement in
base calling between human and RECall over 1,036,875 analyzed
bases. The rates of complete sequence concordance were 99.6%
for 259,875 protease (PR) nucleotide positions and 99.7% for
777,000 reverse transcriptase (RT) nucleotide positions (Fig. 1).
discordant” (mixtures called by one method but not the other),
RT nucleotides, 2,517 (99.3%) were partially discordant, and 18
(0.7%) were completely discordant. Most of the partially discordant bases (2,530 of 3,457 bases [73.2%]) comprised nucleotide pairs resulting from transitions (R = A/G, Y = C/T) rather than transversions (K = G/T, M = A/C, S = C/G, W = A/T). The completely discordant positions were relatively equally distrib-
(n = 11, 6, and 5, respectively) (Fig. 1). Nucleotide mixtures were
mixtures per 1,185-bp PR-RT fragment. Overall, the human operator called a marginally larger number of mixtures (10,996 human-called mixtures [1.06%] and 10,921 RECall-called mixtures [1.05%]; P = 0.8). Positions with three-nucleotide mixtures (i.e., B, D, H, or V) (Fig. 1) were automatically discordant because phred (and therefore RECall) is not programmed to recognize
dant calls by RECall and the human operator is shown in Fig. 2.
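The reported totals and percentages can be checked with simple arithmetic. The sketch below assumes a full-length protease of 99 codons (297 nucleotides), which together with the first 296 RT codons (888 nucleotides) gives the 1,185-bp fragment; the discordant position counts (944 in PR and 2,535 in RT) are those reported in the amino acid concordance section.

```python
n_sequences = 875
pr_nt = 99 * 3                     # protease: 99 codons (assumed full length)
rt_nt = 296 * 3                    # first 296 codons of reverse transcriptase

pr_total = n_sequences * pr_nt     # PR nucleotide positions compared
rt_total = n_sequences * rt_nt     # RT nucleotide positions compared
total_bases = pr_total + rt_total

pr_discordant, rt_discordant = 944, 2535


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pr_nt + rt_nt, pr_total, rt_total, total_bases)
print(pct(pr_total - pr_discordant, pr_total))                      # PR concordance
print(pct(rt_total - rt_discordant, rt_total))                      # RT concordance
print(pct(total_bases - pr_discordant - rt_discordant, total_bases))  # overall
print(round(100 * 10996 / total_bases, 2), round(100 * 10921 / total_bases, 2))
```

The fragment length, the per-region totals, the concordance rates, and the mixture frequencies computed this way agree with the figures given in the text.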
Amino acid sequence concordance between human and
RECall interpretations. The 944 discordant PR nucleotide posi-
RECall interpretations when the sequences were translated to
amino acids: 378 (99.5%) were partial amino acid discordances
(where at least one amino acid was shared between the two inter-
ences. In RT, the 2,535 discordant nucleotide positions occurred
in 2,469 unique codons. When the sequences were translated to
amino acids, 729 (29.5%) discordant substitutions were observed
TABLE 2 Criteria used by RECall for rejecting a sequence
Failure category      Description
                      Any unambiguous stop codon (TGA, TAA, or TAG)
                      An insertion relative to the reference sequence that is not a multiple of three bases, resulting in a frameshift
                      A deletion relative to the reference sequence that is not a multiple of three bases, resulting in a frameshift
Too many mixtures     >3.5% of nucleotides sequenced are called as mixtures
                      >5 Ns (any base) in the sequence
                      >100 positions marked as being potentially problematic
                      >3 consecutive bases of single-read coverage with phred scores of <40
                      Any section where the quality of all coverage is too low to make a call
partial differences, and 5 (0.7%) were completely discordant.
Overall, human and RECall sequence review identified 1,096
(266 in PR and 830 in RT) and 1,098 (269 in PR and 829 in RT)
“key” antiretroviral drug resistance mutations (12), respectively,
For PR, the two methods were in agreement for 265 cases (264
[99.6%] were in complete agreement). The human method iden-
tified 1 PR resistance mutation that RECall did not, while RECall
two methods both identified resistance mutations in 824 cases
(809 [98.2%] were in complete agreement). The human method
identified 6 RT resistance mutations that RECall did not, while
RECall identified 5 that the human method did not. In general, it
was not obvious which method was “correct.”
interpreted by both methods were submitted to Sierra, the Stanford
HIV Drug Resistance Database genotyping tool (algorithm version
6.0.1), and were scored for susceptibility to all currently available
protease inhibitors (PI), nucleoside/nucleotide reverse transcriptase
inhibitors (NRTI), and nonnucleoside reverse transcriptase inhibi-
resistance score for 19 PI, NRTI, and NNRTI (14). A higher score
score difference of ?10, with the maximum difference being 72.
However, small differences in susceptibility scores may not translate
ing raw susceptibility scores, Sierra categorizes each sequence inter-
FIG 1 Concordant and discordant nucleotide base calls in protease and reverse transcriptase sequences analyzed manually and by RECall. Matrices depict the frequencies of nucleotides in protease (A) and reverse transcriptase (B) called by human operators (vertical axis) and by RECall (horizontal axis). Concordant
as the software does not call three-base mixtures. Overall, 99.7% concordance was observed for more than 1 million bases compared.
and “R” interpretations were grouped together into a single “resistant” category. Only 13 samples (1.5%) had a discordant drug resistance interpretation for ≥1 drugs (median of 2 drugs). Among
16,625 drug resistance scores, only 35 (0.2%) had discordant resis-
tance interpretations between human- and RECall-interpreted se-
quences (Table 3). Of these discordances, 25 (71.4%) were cases
where human calls resulted in a “susceptible” interpretation while
This study evaluated the performance of RECall, an automated
sequence analysis tool developed by the BC Centre for Excellence
in HIV/AIDS, in quickly and accurately interpreting HIV geno-
typic data for drug resistance testing. RECall is available free of
pared the results generated by RECall to human-verified se-
quences from the Stanford University Hospital Diagnostic Virol-
ogy Laboratory, a well-recognized institution that has conducted
routine HIV genotypic drug resistance testing for over 10 years.
Using a set of 875 HIV-1 protease and reverse transcriptase
sequences, we analyzed the concordance of detection of ambigu-
ous nucleotides, amino acid changes, and drug resistance muta-
tions between sequences interpreted manually by lab technicians
or automatically by RECall. RECall showed excellent agreement
with subjective human interpretation of HIV sequence data, with
99.7% concordance over more than 1 million bases compared.
Similar degrees of agreement (?99.5%) were noted in previous
analyses of smaller data sets from other independent laboratories
(3, 19). Of the limited number of differences in base calling, the
vast majority were due to partial nucleotide discordance, where
one method detected a mixture and the other detected one com-
ponent of the mixture. As a result, the majority of amino acid
differences detected by human versus RECall interpretation were
also due to partial discordances. Human and RECall reviews
FIG 2 Chromatograms illustrating discordant base calls between human and RECall sequence interpretations. The majority of differences between the
two analysis methods were due to partial discordances where one method called a nucleotide mixture and the other method called only one nucleotide
component of a mixture. Depicted here are representative chromatogram traces for discordant mixture base calls. Panels A to C show positions called
mixtures by human visual inspection but not by RECall. Panels D to F show positions called mixtures by RECall but not by human interpretation. In each
panel, the top line of text contains the consensus human base calls, while the bottom line of text shows the consensus RECall base calls. The discordant
mixtures are circled in orange.
TABLE 3 Sierra drug resistance interpretation concordance between human- and RECall-analyzed sequences
Sierra resistance interpretation by human analysis/RECall analysis (no. of samples)(b)
Drug class    Drug (abbrev)(a)          S/S       R/R      S/R    R/S
NRTI          Lamivudine (3TC)
PI            Atazanavir/r (ATV/r)
Total                                   14,402    2,188    25     10
(a) “/r” indicates a combination with ritonavir.
(b) S/S, susceptible by both methods; R/R, resistant by both methods; S/R, susceptible by human interpretation but resistant by RECall; R/S, resistant by human interpretation but susceptible by RECall. “Resistant” interpretations included both intermediate (I) and resistant (R) Sierra calls.
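The grouping used in Table 3 can be sketched as follows: each Sierra call is collapsed to susceptible (S) or resistant (R, which here includes intermediate, I), and the paired calls are then tabulated. The single-letter input format and the function names are assumptions for illustration and do not reflect the Sierra output format.

```python
from collections import Counter


def collapse(call):
    """Collapse a Sierra call to S or R; intermediate (I) counts as resistant."""
    return "S" if call == "S" else "R"


def tabulate(human_calls, recall_calls):
    """Count human/RECall interpretation pairs (S/S, R/R, S/R, R/S)."""
    return Counter(
        f"{collapse(h)}/{collapse(r)}" for h, r in zip(human_calls, recall_calls)
    )


# Toy example with four drug scores for one sample.
print(tabulate(["S", "I", "R", "S"], ["S", "R", "S", "I"]))

# Consistency check on the totals row: 875 sequences scored against 19 drugs.
totals = {"S/S": 14402, "R/R": 2188, "S/R": 25, "R/S": 10}
print(sum(totals.values()) == 875 * 19)
print(round(100 * (totals["S/R"] + totals["R/S"]) / sum(totals.values()), 1))
```

The totals row sums to 16,625 scores, of which the 35 discordant interpretations (25 S/R and 10 R/S) amount to about 0.2%.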
tations identified by either method. For comparison, when iden-
tical sequence trace files are inspected and edited by multiple hu-
man operators, the rate of identification of resistance mutations
can be <90% (11), depending on the samples tested. Although a
very small number of key resistance mutations were identified by
a single method only, these were all a result of partial mismatches
due to differential detection of nucleotide mixtures.
Despite the extremely high concordance between methods,
there may be several reasons for the observed discrepancies. First,
RECall relies on phred peak areas to call mixtures and is therefore
unable to call mixtures of three nucleotides. The impact of this
shortcoming, however, is negligible, as 3-base mixtures were
called exceedingly rarely by human interpretation (0.007% of
bases called) and could simply represent technical artifacts (11,
17). Second, technicians, especially less-experienced ones, can ar-
of sequence may not be considered “true” mixtures, and fre-
a person’s decision to call mixtures with a borderline secondary
peak area. In contrast, RECall is not programmed to weigh se-
quence reads based on peak height; determination of mixtures is
scores. While the inflexibility of a fully automated system for se-
quence analysis and interpretation may appear to be a drawback,
the results of this study show otherwise. RECall is configured to
mark unusual sequence positions, including mixtures, which a
technician could visually check. In this study, RECall was run
without human intervention and still rapidly produced unbiased,
consistent results for a data set generated by different methods in
an external laboratory.
RECall did call marginally fewer mixtures overall than the hu-
man operator, but this difference was not statistically significant.
Subjectively, these discordant bases could be considered “hard to
call” by human operators, and the mixture calling frequency
interpretation methods produced discordant results, with one
method calling a mixture and the other not. In general, the ma-
jority of operators preferentially called a mixture; 75% of those
surveyed chose a mixed-base call over half the time. However,
mixture calling frequencies varied widely among technicians,
ranging from 25 to 75% (data not shown), illustrating the ex-
human versus RECall analysis fall well within the range of inter-
operator mixture calling variability (11); the small difference in
mixed-base frequencies may be as likely due to overcalling by the
technicians as to undercalling by RECall. If this discrepancy is of
threshold (% uncalled base area) to more closely mimic a favored
ing parameters chosen, RECall provides standardization of base
calling frequencies—an extremely important feature of a clinical
reporting tool, and one that is clearly not achievable with solely
human interpretation (11). Furthermore, good laboratory prac-
its justification should be robustly documented (1). In HIV drug
resistance testing, the large number of manual changes required
during the assembly and editing of a consensus sequence pre-
cludes this. RECall, however, provides a system for minimizing
and tracking these manual edits.
Most importantly, RECall significantly improves the process-
ing efficiency of HIV drug resistance genotyping sequence data.
Specifically, RECall removes the need to perform several time-
cluding identifying and grouping chromatograms from a single
sample, trimming regions of low-quality data, aligning primer
sequences to a reference standard, manually reviewing mixed
bases, and exporting a finished FASTA file. Once the RECall pro-
gram is initiated (a process requiring only a few mouse clicks),
automated analysis requires no subsequent human intervention.
Furthermore, additional efficiency gains are achieved by fully in-
tegrating RECall into the data processing pipeline; ideally, RECall
is set to run immediately as soon as chromatogram data are re-
leased from the sequencing instrument.
While the results presented here are limited to HIV drug resis-
tance genotyping of protease and RT, RECall can easily be ex-
tended to analyze other regions of HIV or any protein coding
regions that can be sequenced by population-based methods. At
resistance genotyping of protease-RT, integrase, and gp41, as well
as for genotypic tropism testing of the V3 loop. RECall was used
(without human review) to process sequences from several ran-
to be predictive of virological outcomes (15, 16).
The results of our interlaboratory comparisons show that
RECall can provide an objective, standardized protocol for HIV
sequence interpretation in clinical and research laboratories. The
speed and cost-effectiveness of using an automated tool for se-
quence analysis are the primary advantages. Standardizing se-
be evaluated independent of the sequence interpretation steps.
Furthermore, RECall enables unbiased sequence interpretation,
and its internal parameters provide additional quality control
mechanisms, both of which ensure that only consistent, high-
quality data are reported.
ACKNOWLEDGMENTS
We thank the staff of the Stanford University School of Medicine Diag-
nostic Virology Laboratory for their assistance with HIV drug resistance
genotyping. We appreciate the help of all current and former staff and
students of the BCCfE who assisted with software testing and provided
input during the development of RECall.
C.J.B. is supported by a Vanier Graduate Scholarship from the Cana-
dian Institutes of Health Research (CIHR). P.R.H. holds a GlaxoSmith-
Kline/CIHR Chair in Clinical Virology.
The funding sources played no role in the study design or in the col-
lection, analysis, and interpretation of data.
REFERENCES
1. Anonymous. 2011. Ground-truth data cannot do it alone. Nat. Methods
2. Baxter JD, et al. 2000. A randomized study of antiretroviral management
ing therapy. AIDS 14:F83–F93.
3. Brooks JI, et al. 2009. Evaluation of an automated sequence analysis tool
June 2012 Volume 50 Number 6 jcm.asm.org 1941
HIV/AIDS Res., Vancouver, Canada.
4. Cockerill FR, 3rd. 1999. Genetic methods for assessing antimicrobial
resistance. Antimicrob. Agents Chemother. 43:199–212.
5. Cunningham S, et al. 2001. Performance of the Applied Biosystems
ViroSeq human immunodeficiency virus type 1 (HIV-1) genotyping sys-
tem for sequence-based analysis of HIV-1 in pediatric plasma samples. J.
Clin. Microbiol. 39:1254–1257.
6. DeGruttola V, et al. 2000. The relation between baseline HIV drug resis-
tance and response to antiretroviral therapy: re-analysis of retrospective
and prospective studies using a standardized data analysis plan. Antivir.
phred. II. Error probabilities. Genome Res. 8:186–194.
8. Ewing B, Hillier L, Wendl MC, Green P. 1998. Base-calling of automated
sequencer traces using phred. I. Accuracy assessment. Genome Res.
9. Galli RA, Sattha B, Wynhoven B, O’Shaughnessy MV, Harrigan PR.
based genotypic assay for human immunodeficiency virus type 1 drug
resistance. J. Clin. Microbiol. 41:2900–2907.
10. Grant RM, et al. 2003. Accuracy of the TRUGENE HIV-1 genotyping kit.
J. Clin. Microbiol. 41:1586–1593.
11. Huang DD, Eshleman SH, Brambilla DJ, Palumbo PE, Bremer JW.
type 1 genotyping. J. Clin. Microbiol. 41:3265–3272.
12. Johnson VA, et al. 2010. Update of the drug resistance mutations in
HIV-1: December 2010. Top. HIV Med. 18:156–163.
13. Kuritzkes DR, et al. 2003. Performance characteristics of the TRUGENE
14. Liu TF, Shafer RW. 2006. Web resources for HIV type 1 genotypic-
resistance test interpretation. Clin. Infect. Dis. 42:1608–1618.
15. McGovern RA, et al. 2010. Population-based sequencing of the V3-loop
is comparable to the enhanced sensitivity Trofile assay (ESTA) in predict-
the MERIT Trial, abstr 92. 17th Conf. Retrovir. Opportun. Infect., San
a retrospective analysis using screening samples from the A4001029 and
MOTIVATE studies. AIDS 24:2517–2525.
17. Shafer RW, et al. 2001. High degree of interlaboratory reproducibility of
human immunodeficiency virus type 1 protease and reverse transcriptase
sequencing of plasma samples from heavily treated patients. J. Clin. Mi-
18. Smith TF, Waterman MS. 1981. Identification of common molecular
subsequences. J. Mol. Biol. 147:195–197.
19. Tilston P, et al. 2011. Evaluation of RECall automated basecalling soft-
Meet., London, United Kingdom.
20. Weinstein MC, et al. 2001. Use of genotypic resistance testing to guide
HIV therapy: clinical impact and cost-effectiveness. Ann. Intern. Med.
Woods et al.
jcm.asm.org Journal of Clinical Microbiology