JOURNAL OF VIROLOGY, Feb. 2005, p. 1734–1742
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Vol. 79, No. 3
Semen-Specific Genetic Characteristics of Human
Immunodeficiency Virus Type 1 env†
Satish K. Pillai,1* Benjamin Good,2 Sergei Kosakovsky Pond,1 Joseph K. Wong,1,2
Matt C. Strain,1 Douglas D. Richman,1,2 and Davey M. Smith1
University of California, San Diego, La Jolla,1 and Veterans Administration,
San Diego Healthcare System, San Diego,2 California
Received 2 July 2004/Accepted 10 September 2004
Human immunodeficiency virus type 1 (HIV-1) in the male genital tract may comprise virus produced locally
in addition to virus transported from the circulation. Virus produced in the male genital tract may be genet-
ically distinct, due to tissue-specific cellular characteristics and immunological pressures. HIV-1 env sequences
derived from paired blood and semen samples from the Los Alamos HIV Sequence Database were analyzed to
ascertain a male genital tract-specific viral signature. Machine learning algorithms could predict seminal
tropism based on env sequences with accuracies exceeding 90%, suggesting that a strong genetic signature does
exist for virus replicating in the male genital tract. Additionally, semen-derived viral populations exhibited
constrained diversity (P < 0.05), decreased levels of positive selection (P < 0.025), decreased CXCR4 core-
ceptor utilization, and altered glycosylation patterns. Our analysis suggests that the male genital tract rep-
resents a distinct selective environment that contributes to the apparent genetic bottlenecks associated with the
sexual transmission of HIV-1.
Most human immunodeficiency virus (HIV) transmission
events globally occur via mucosal exposure to male genital
secretions carrying the virus (34, 46). Although the risk of
sexual HIV transmission correlates with the amount of virus
present in the blood of the source partner (36), the correlation
between the viral load in the blood and genital compartment is
inconsistent (3, 23, 24). The biological determinants that influ-
ence the transmissibility of different viral variants from within
the genital tract of the HIV-infected source are still incom-
pletely understood. Since transmitted virus represents the ini-
tial virus that the immune system encounters, the understand-
ing of its composition will be critical in our attempts to develop
a successful HIV vaccine (1, 7, 54).
HIV in each chronically infected person exists as a diverse
population of related genetic variants (5, 12, 20). Anatomic
compartmentalization of these variants has been described in
blood, lung, central nervous system, and genital tract (10, 16,
17, 20, 21, 32, 41, 50, 53). Male genital tract tissues (e.g., the
prostate, seminal vesicles, and epididymis) serve as sites of
viral replication and are likely to differ from peripheral tissues
in immunological surveillance, target cell characteristics, and
efficiencies of drug penetration (10, 17, 43). Virus replicating
within the male genital tract could therefore develop distinct,
compartment-specific characteristics in response to these local
selective pressures (10, 16, 17, 20, 21, 32, 41, 50, 53). Although
genetic differences between blood- and semen-derived HIV in
an individual have been documented, a seminal signature se-
quence remains elusive (6, 10). This failure to identify a sig-
nature sequence could be attributable to the fact that previous
efforts mainly focused on proviral DNA sequences, which of-
ten represent archival viral genotypes rather than contempo-
rary, actively replicating variants (4, 44).
We investigated viral genetics and compartmentalization
within the male genital tract by applying a battery of compu-
tational techniques to paired semen- and blood-derived HIV-1
RNA env sequences. Our results suggest that the male genital
tract can represent a legitimate viral compartment, although
this compartmentalization is not absolute. Furthermore, when
viral migration between blood plasma and the male genital
tract is minimal and infrequent, there are several distinct genetic
features associated with semen-derived HIV variants. Under-
standing these tissue-specific properties of HIV type 1 (HIV-1)
will likely be crucial for the development of an effective vaccine.
MATERIALS AND METHODS
Sequence data. All of the semen-derived HIV-1 env sequences from the Los
Alamos National Lab HIV Sequence Database with accompanying subject iden-
tification were downloaded. Blood-derived sequences from the same individuals
were downloaded; semen sequences without matching blood data were removed
from the set. GenBank database accession numbers included in our analysis are
AF098718 to AF098734, AF256230 to AF256465, AF373037 to AF373043,
AF535219 to AF535859, AY005164 to AY005179, U00821 to U00843, U13381 to
U13388, and U96502 to U96608. Duplicates, sequences derived by direct PCR
sequencing, proviral DNA sequences, and nonfunctional open reading frames
(containing frameshifts, premature stop codons, etc.) were deleted. The final set
consisted of 659 env C2-V3 RNA sequences (spanning HXB2 coordinates 799 to
1410) from a total of 12 patients (376 plasma and 283 semen samples).
Phylogenetic reconstruction. Initial multiple sequence alignments were gen-
erated by using Multalin (8), with default gap parameters and the DNA 5-0
substitution matrix. Subsequent manual aligning was performed by using the
Se-Al sequence alignment editor (37). Phylogenies describing sequences from
each individual host were built by using FastDNAml (30), estimating base fre-
quencies from the data and a transition/transversion ratio of 2.0. All diversity and
divergence measurements were calculated by using dnadist (14). The absolute
rate of molecular evolution (molecular clock) was estimated by running TipDate
(38) on maximum likelihood phylogenies with dated tips. A master tree describ-
ing the entire data set was built by implementing dnadist and neighbor within the
PHYLIP version 3.5c software package (14) by using the F84 model, gamma
distributed rates across sites, and a transition/transversion ratio of 2.0. Trees
were viewed with TreeView X (31).
* Corresponding author. Mailing address: University of California,
San Diego, Division of Biological Sciences, 9500 Gilman Dr., MC
0679, La Jolla, CA 92093. Phone: (858) 552-8585, ext. 7169. Fax: (858)
552-7445. E-mail: firstname.lastname@example.org.
† Supplemental material for this article may be found at http://jvi
Evaluation of compartmentalization. The degree of segregation between com-
partments was assessed by testing for panmixis by using gene phylogenies (18, 42)
as implemented in the MacClade program (Sinauer, Sunderland, Mass.). In
brief, the minimum possible number of intercompartment migration events was
tallied, based on the maximum likelihood trees for each individual subject’s
C2-V3 sequences and their characterization according to compartment of origin.
This result was compared to the distribution of migration events for 1,000
randomly generated trees. Evidence of restricted gene flow (compartmentalization)
was documented when <1% of the random trees required the same number of
migration events as the sample data or fewer (29).
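The counting step described above can be illustrated with a minimal Python sketch. It is not the MacClade procedure used in the study: the tree is a hypothetical nested-tuple structure, migration events are counted with Fitch parsimony, and the null distribution is approximated by shuffling compartment labels across the tips of a fixed tree rather than by generating 1,000 random trees. The function names and the toy tree are assumptions made for illustration only.

```python
import random


def fitch_migrations(tree, labels):
    """Minimum number of compartment changes on a rooted binary tree.

    tree   -- nested 2-tuples; leaves are tip names (strings)
    labels -- dict mapping tip name -> compartment ("plasma" or "semen")
    """
    def walk(node):
        if isinstance(node, str):              # leaf: state set is its compartment
            return {labels[node]}, 0
        left_set, left_n = walk(node[0])
        right_set, right_n = walk(node[1])
        common = left_set & right_set
        if common:                             # children agree: no extra event
            return common, left_n + right_n
        return left_set | right_set, left_n + right_n + 1

    return walk(tree)[1]


def tip_names(tree):
    """List the tip names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def panmixis_p_value(tree, labels, n_random=1000, seed=0):
    """Fraction of label shufflings needing no more events than observed.

    Assumption: tip-label shuffling stands in for the random trees of the
    original test.
    """
    rng = random.Random(seed)
    observed = fitch_migrations(tree, labels)
    names = tip_names(tree)
    states = [labels[name] for name in names]
    hits = 0
    for _ in range(n_random):
        rng.shuffle(states)
        if fitch_migrations(tree, dict(zip(names, states))) <= observed:
            hits += 1
    return observed, hits / n_random


if __name__ == "__main__":
    # Hypothetical toy tree with fully segregated plasma (p) and semen (s) tips.
    toy = ((("p1", "p2"), ("p3", "p4")), (("s1", "s2"), ("s3", "s4")))
    toy_labels = {n: "plasma" if n.startswith("p") else "semen"
                  for n in tip_names(toy)}
    print(panmixis_p_value(toy, toy_labels))
```

Under the criterion stated in the text, a returned fraction below 0.01 would be read as evidence of compartmentalization; a tree this small cannot reach that threshold and serves only to show the mechanics.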
Machine learning classification. A machine learning approach was employed
to look for a tissue-specific genetic signature. All classification experiments in
this analysis were conducted by using WEKA (Waikato environment for knowledge
analysis), an open source collection of data processing and machine learning
algorithms (49). The J48 decision tree inducer, based on the C4.5 algorithm
(35), was implemented with the parameter “MinNumObj” set at a value of 7 to
limit the complexity of theories and minimize the risk of overfitting. Classifiers
were evaluated by using 100 iterations of stratified 10-fold cross-validation, a
procedure designed to reflect the performance of classification models on novel
data sets. For each of 100 trials, the data set was randomly divided into 10 groups
of approximately equal size and class distribution. For each “fold,” the classifier
was trained by using all but 1 of the 10 groups and then tested on the unseen
group. This procedure was repeated for each of the 10 groups. The cross-
validation score for one trial was the average performance across each of the 10
training runs. The reported score is the average across the 100 trials (49). In
addition, we have reported the true positive rate (TPR) and precision for these
classification experiments: TPR = [number of true positives/(number of true
positives + number of false negatives)]; precision = [number of true positives/
(number of true positives + number of false positives)].
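The fold construction and the two reported metrics can be sketched in Python as follows. This is an illustrative sketch, not the WEKA implementation: the function names are hypothetical, no classifier is trained, and the round-robin dealing scheme is one simple way to obtain folds of approximately equal size and class distribution. Repeating the split with 100 different seeds would correspond to the 100 trials described above.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds with similar size and class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:                    # deal indices round-robin
            folds[position % k].append(idx)
            position += 1
    return folds


def tpr_and_precision(actual, predicted, positive="semen"):
    """TPR = TP / (TP + FN); precision = TP / (TP + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return tpr, precision


if __name__ == "__main__":
    # Class sizes of the pruned data set reported in the text.
    labels = ["plasma"] * 143 + ["semen"] * 122
    folds = stratified_folds(labels)
    print([len(fold) for fold in folds])
```

In each fold, a classifier would be trained on the other nine groups and `tpr_and_precision` applied to its predictions on the held-out group.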
Analysis of selection. A maximum likelihood method was used to detect and
quantify positive and negative selection. All data sets were first evaluated by
using a model selection procedure (22) to identify and correct for strong nucleo-
tide substitution biases which are ubiquitous in HIV. The fixed-effects likelihood
(FEL) approach (22) was employed to test for selective pressure at a given site.
Maximum likelihood estimates of branch lengths and nucleotide substitution rate
parameters were derived from the entire alignment. A full codon model, using a
modified MG94 (28) rate matrix with site-specific instantaneous synonymous
(α_s) and nonsynonymous (β_s) rates was then fitted independently to every
codon position in the data, under two hypotheses: H_0, neutral evolution (α_s
equals β_s); H_A, nonneutral evolution (α_s and β_s are free to vary).
When the hypothesis of neutrality was rejected at site s, it was called positively
selected if β_s was estimated to be greater than α_s. The FEL method was
implemented on a cluster of computers by using the HyPhy package (22).
Coreceptor usage prediction. A support vector machine-based method was
employed to predict the coreceptor usage of viruses based on the V3 loop amino
acid sequence (33). This method is highly reliable and is reported to predict
CXCR4 usage with a specificity of 93% (19). The coreceptor classifier is available
for public use at: http://genomiac2.ucsd.edu:8080/wetcat/tropism.html.
Glycosylation. GlycoTracker.pl (S. Pillai, unpublished data) was used to iden-
tify N-linked glycosylation sites within each sequence. The Perl script provides a
tally of all sequons, along with their respective locations (numbered according to
HXB2 gp160). We compared the extent and distribution of N-linked glycosyla-
tion across the C2-V3 region in both compartments by identifying NXS and NXT
(where X is some other residue) motifs in plasma- and semen-derived sequences
(25). All statistical comparisons were performed by using a Wilcoxon Mann-
Whitney test (11).
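The sequon tally can be illustrated with a short Python sketch. GlycoTracker.pl itself is unpublished, so this is not that script: the function name is hypothetical, positions are 1-based offsets within the supplied ungapped amino acid string rather than HXB2 gp160 coordinates, and the optional exclusion of proline at the X position is a common convention that the text does not state.

```python
import re


def find_sequons(protein, exclude_proline=False):
    """Return 1-based positions of the asparagine in each N-X-S/N-X-T motif.

    Assumption: the input is an ungapped amino acid sequence. A lookahead
    pattern is used so that overlapping motifs (e.g., NNTS) are all reported.
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile("(?=N" + middle + "[ST])")
    return [match.start() + 1 for match in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical fragment; the count per sequence is what would be
    # compared between plasma- and semen-derived populations.
    sites = find_sequons("CTRPNNNTRKSINGTANSS")
    print(len(sites), sites)
```

Per-sequence counts obtained this way for each compartment would then feed the rank-based comparison described above.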
Codon usage analysis. The general codon usage analysis (GCUA) package was
implemented to look for compartment-specific codon usage biases (26).
RESULTS
FIG. 1. Examples of compartmentalized and noncompartmentalized viral populations. Maximum likelihood phylogenies of C2-V3 env
sequences. (a) Individual A, compartmentalized virus. (b) Individual J, noncompartmentalized virus. Open circles represent semen sequences, and
closed circles indicate plasma-derived sequences. Black squares represent the HXB2 outgroup. Scale bar equals 10% genetic distance.
Compartmentalization of semen-derived virus. To determine
if the male genital tract represents a viral compartment,
we used systematic phylogenetic comparison of matched blood-
and semen-derived HIV-1 RNA env sequences from 12 individuals.
We hypothesized that if the male genital tract is indeed
a viral compartment, semen-derived sequences within each
individual should cluster independently, while exhibiting simi-
lar levels of diversity and divergence as matching plasma se-
quences given comparable effective population sizes (29). Max-
imum likelihood trees describing contemporaneous variants
from both tissues revealed that the male genital tract repre-
sented a distinct virologic compartment in six individuals
(identified as A to F) (Fig. 1a; see Fig. S1 in the supplemental
material), based on phylogenetic segregation between blood
and semen virus. In five of the individuals, sequences did not
cluster with respect to compartment (Fig. 1b; see Fig. S3 in the
supplemental material). In one individual, G, there were lon-
gitudinal data that showed compartmentalization at the earlier
time points but then apparent panmixis at later time points
(see Fig. S2 in the supplemental material). In accordance with
previous reports, a neighbor-joining tree comprising pooled
data from all compartmentalized patients revealed that host,
rather than compartment of origin, was the strongest phyloge-
netic determinant (see Fig. S4 in the supplemental material).
Genetic diversity in plasma- and semen-derived viral pop-
ulations. Genetic diversity was characterized by calculating the
average pairwise distance within a population, based on dis-
tance measurements obtained by using the F84 matrix. Data
across multiple time points were pooled when available. Indi-
viduals with phylogenetically distinct virus in blood and semen
consistently exhibited lower genetic diversity in semen-derived
viral populations (P < 0.01 by a paired Wilcoxon test). Conversely,
individuals with noncompartmentalized virus failed to
demonstrate any significant differences in viral diversity be-
tween tissues (Fig. 2).
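The diversity measure, an average pairwise distance within a population, can be sketched in Python as follows. The study used F84 distances from dnadist; this sketch substitutes the uncorrected proportion of differing sites (p-distance) purely to keep the example self-contained, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped. This is a
    stand-in for the F84 distance used in the study.
    """
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else float("nan")


def mean_pairwise_distance(sequences, distance=p_distance):
    """Average distance over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments from one compartment of one individual.
    semen = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
    print(mean_pairwise_distance(semen))
```

Computing this value separately for the plasma and semen sequences of each individual yields the paired measurements that the Wilcoxon test compares.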
Analysis of longitudinal sequence data. Longitudinal se-
quence data spanning multiple years were available for five
individuals (identified as F, G, I, J, and K). We first evaluated
tissue-specific longitudinal genetic diversity in these individuals
by computing average pairwise genetic distances for each time
point where blood and semen sequences were available. The
longitudinal data reinforced our aforementioned results; indi-
vidual F, characterized by compartmentalized virus at all avail-
able time points, exhibited constrained viral diversity in semen
throughout the 2-year monitored period (Fig. 3a). Individual
G, who transitioned from compartmentalized to noncompart-
mentalized virus, showed considerable variation in tissue-spe-
cific diversity; semen diversity bounced between being greater
and less than contemporaneous plasma diversity, in accor-
dance with inconsistent trafficking between these tissues. In-
dividuals I, J, and K were consistently characterized by non-
compartmentalized virus and exhibited similar levels of viral
diversity in blood and semen at nearly all sample points (see
Fig. S5 in the supplemental material).
We next looked at longitudinal divergence in these five in-
dividuals, by calculating the average genetic distance from se-
quences at each time point to an artificial, tissue-specific base-
line consensus sequence. On average, the observed level of
divergence was comparable across tissues in individuals with
both compartmentalized and noncompartmentalized virus,
consistent with actively replicating viral populations in both
blood and male genital tract (see Fig. S5 in the supplemental
material). We also calculated the divergence between blood-
and semen-derived virus by computing the average genetic
distance between these populations at each time point. Individual F,
as expected, demonstrated continually increasing divergence
between tissue-specific populations, most probably
due to a combination of genetic drift and compartment-specific
viral adaptation. Intercompartment genetic distance exceeded
5% at the last available sample point (Fig. 3b). Individual G
showed declining intercompartment divergence at each time
point, mirroring the increased contribution of systemic virus to
the seminal viral population. Divergence steadily diminished
from approximately 8% at the onset to 2% at the final sampling
time. Finally, hosts I, J, and K, characterized by noncompartmentalized
virus, maintained low levels of intercompartment
divergence throughout the monitored period; distances
stayed below 2% at nearly all time points (see Fig. S5 in the
supplemental material).
FIG. 2. Genetic diversity in semen-derived and blood-derived viral populations. Genetic diversity was significantly lower in semen-derived viral
populations within individuals characterized by compartmentalized virus (individuals A to Gc; P < 0.01 by a paired Wilcoxon test). No significant
difference in viral diversity between blood and semen viral populations was observed in individuals with noncompartmentalized virus (individuals
Gn to L). Vertical bars represent standard error.
Estimation of molecular clock. We used dated maximum
likelihood phylogenies of sequences from host F, the only
individual with compartmentalized virus and with available
longitudinal data, to compare the viral molecular clock be-
tween plasma and semen. The estimated absolute rates of mo-
lecular evolution based on these phylogenies were 0.01004877
and 0.00637917 substitutions/site/year for plasma- and semen-
derived sequences, respectively.
FIG. 3. Longitudinal viral diversity and divergence in an individual with compartmentalized virus, individual F. (a) Genetic diversity measured
over a 2-year period. (b) Divergence and intercompartment genetic distance measured over a 2-year period. Vertical bars represent standard error.
Semen-specific env genetic signature. Although phylogenetic
evidence suggests that semen- and blood-derived viruses from
a given host are more closely related to each other than to virus
from corresponding tissues in other individuals, semen-derived
viruses may still share genetic characteristics across individuals
due to tissue-specific selective pressures that are common
across hosts. We employed a machine learning approach (27,
33, 39) to identify a genetic signature associated with seminal
tropism. The J48 decision tree inducer (based on the C4.5
algorithm) used in our analysis has been relied on extensively
as an alternative to traditional discriminant analysis, due
largely to its capacity to detect and exploit interactions be-
tween feature variables in training data sets (27). We first
applied this algorithm to classify env sequences from all indi-
viduals based on tissue of origin. The training data for this
experiment drew samples from the entire available sequence
set, consisting of 376 plasma sequences and 283 from semen.
Our results (Table 1) indicate that in this first classification
only 65% of sequences were classified correctly, and seminal
tropism was predicted with a true positive rate of 0.48.
It is likely that a lack of apparent viral compartmentalization
is due to persistent trafficking between blood and semen. To
determine if these low scores were due to the presence of viral
sequence data classified as semen-derived that actually repre-
sented a recent introgression of plasma virus into the male
genital tract, we purged the training set of all data associated
with noncompartmentalized hosts. We retained the sequence
data from individual G at compartmentalized time points. This
pruned set consisted of 143 plasma sequences and 122 from
semen. Our results for this second trial (Table 1) demonstrate
a strong genetic signature associated with semen-derived se-
quences; 82% of sequences were classified accurately based on
tissue of origin, and seminal tropism was predicted with a
precision of 0.842 and a TPR of 0.818 (well over 90% of
sequences were classified accurately when the entire training
set was used for testing). It is important to point out that the
cross-validation procedure used to evaluate this model is quite
conservative; the classifier is always tested on a subset of the
sequence data that it did not encounter during the training
process. The signature underlying seminal tropism com-
prises a total of four positions within the C2-V3 region
(numbered from the start of HXB2 gp160): 270, 291, 387, and
464 (Fig. 4; see Fig. S6 in the supplemental material). The bulk
of the signature focuses on either the amino acid character at
position 464 or its immediate linkage with a single other env
Identification of positively selected sites. We used a maxi-
mum likelihood approach to identify sites within env that were
under positive selection in both compartments, focusing on
individuals with compartmentalized virus. We sought to deter-
mine if the overall extent of selection and the array of sites
under selection varied between compartments, consistent with
our finding of a male genital tract-specific genetic signature.
Sequence data from hosts A to G (including only data from the
initial compartmentalized points associated with subject G)
were first individually evaluated on a per compartment basis by
using a model selection procedure to account for any existing
mutational biases. Next the FEL approach (22) was employed
to test for selective pressure at a given site. All sites in both
compartments that appeared to be under positive selection
were cataloged and compared. The number of positively selected
sites was universally lower in semen-derived viral populations
(P < 0.01 by a paired Wilcoxon test) (Table 2). Four
out of seven individuals failed to exhibit positive selection at
any sites within the C2-V3 region in their seminal virus. Ad-
ditionally, in most cases the sites determined to be under
positive selection varied between compartments. Only 3 out of
10 sites identified in seminal populations were also positively
selected in corresponding plasma populations (Table 2).
N-linked glycosylation in plasma- and semen-derived viral
populations. To investigate variation in selection pressure
from the neutralizing antibody response, we examined glycosylation
patterns across the viral envelope (48). If the antibody
response is attenuated in the male genital tract, we might
expect fewer glycosylation sites within semen-derived viral se-
quences. If the response is equivalent, but targeting different
epitopes, we might expect a reassortment of sites though the
overall number may remain constant. Our results demonstrate
that the extent of glycosylation differs significantly in six out of
seven patients characterized by compartmentalized virus, but
the direction of the discrepancy is inconsistent (P < 0.05 for six
intrapatient comparisons; Mann-Whitney test). Individuals A,
E, and G have higher average numbers of sequons in semen-derived
sequences, while the opposite condition holds true for
individuals C, D, and F (Fig. 5).
FIG. 4. Genetic signature associated with seminal sequences from compartmentalized individuals. Decision tree classifying C2-V3 env
sequences based on tissue of origin with 82% accuracy. p, plasma classification; s, semen. The values in parentheses are the number of instances/
number of incorrect classifications. Residue numbers are based on HXB2 gp160 positions.
TABLE 1. Classification of env C2V3 sequences based on
tissue of origin (cross-validation statistics)
Correctly classified instances
Incorrectly classified instances
Total number of instances
True positive rate, blood
True positive rate, semen
The distribution of glycosylation sites over time was tracked
in the two individuals with compartmentalized virus and with
associated longitudinal sequence data. Semen-derived sequences
from individual F gradually acquired a single additional sequon
at a site (position 411) that was never glycosylated in plasma
populations. Plasma sequences demonstrated a continual re-
assortment of sites with negligible fluctuation in overall number,
in accordance with the notion of an evolving “glycan shield”
(48). Individual G exhibited a gradual increase in net number
of glycosylation sites in both seminal and plasma-derived env
sequences, with little reassortment in either compartment.
Prediction of coreceptor usage. We predicted the chemokine
receptor preference for all sequences derived from patients
with compartmentalized virus to determine if seminal tropism
was correlated with altered coreceptor usage. Our results sug-
gest that a trend towards reduced CXCR4 usage in the male
genital tract exists, although it is not statistically significant due
to the rarity of the CXCR4 phenotype across individuals and
compartments; only three out of seven hosts harbored variants
predicted to use the CXCR4 receptor (Fig. 6).
Evaluation of codon usage bias. It has previously been re-
ported that the differential availability of nucleotide precursor
pools in target cells may influence HIV-1 codon usage patterns.
Additionally, the cytidine deaminase APOBEC3G, found in
lymphocytes, induces G to A mutations that skew codon usage
towards A-rich triplets (51). If viral target cells within the male
genital tract differ from peripheral tissues in precursor fre-
quencies and APOBEC3G expression levels, an altered codon
usage bias may evolve in seminal virus. Our analysis revealed
no significant differences in codon usage between blood and
semen virus (data not shown).
DISCUSSION
In these investigations we applied a battery of computational
techniques to paired semen- and blood-derived HIV-1 env
sequences, which confirmed previous reports that HIV within
the genital tract is different from that within the bloodstream
(10, 20). This study extends those observations with findings
important to the understanding of how HIV adapts to the male
genital tract. First, the male genital tract can function as a viral
compartment, but the extent of compartmentalization differs
between individuals and within individuals over time. Second,
there are discordant selective pressures operating in the male
genital tract and blood. Third, semen-derived viruses share a
genetic signature across individuals due to tissue-specific
selective pressures that are common across hosts.
FIG. 5. Extent of viral glycosylation in plasma and semen of individuals with compartmentalized virus. N-linked glycosylation sites were
predicted based on an NXS or NXT sequence motif. Asterisks indicate significant comparisons (P < 0.05 by a Mann-Whitney test). Vertical bars
represent standard error.
TABLE 2. Sites under positive selection in individuals characterized by compartmentalized virus^a
Sites by individual
461 354, 438402 335, 336, 337, 340, 343, 354, 446
283, 335, 336, 346, 354, 263, 364, 405, 466, 467
354, 401, 402, 455, 460, 463, 471
^a Fewer sites were under selective pressure in seminal populations based on the FEL method (P < 0.01 by a paired Wilcoxon test).
Viral compartments are characterized by a restriction of
gene flow between cells or tissues, usually identified by phylo-
genetic analysis (29). In this study, viral compartmentalization
between blood and the male genital tract was identified in 6 out
of 12 individuals, and another individual demonstrated com-
partmentalization of virus only at the earliest sampling times.
Viral migration between blood plasma and the male genital
tract was minimal and infrequent in these individuals, which
reinforces the concept that a significant fraction of virus shed
in semen is produced locally in the male genital tract. Further-
more, there was a lower genetic diversity and rate of molecular
evolution in seminal sequences, probably reflecting a lower
effective population size within the male genital tract. This
lower effective population size may contribute to the genetic
bottleneck associated with HIV-1 transmission. We cannot ex-
clude the possibility, however, that sampling issues contributed
to this phenomenon; the efficiency of RNA extraction and
reverse transcription-PCR may be lower in semen than plasma,
increasing the potential for resampling.
The degree of compartmentalization varied among individ-
uals and also within individuals over time. This may explain the
observations of intermittent viral shedding in the semen of
HIV-infected men (15, 47) and the increased viral shedding
when the urethra is inflamed by concomitant bacterial or viral
infection (40). Local inflammation is a likely explanation for
increased trafficking of HIV from the circulation to the genital
compartment. Future studies examining the relationship be-
tween sexually transmitted infections and seminal viral loads
may provide valuable insight into viral adaptation and dynam-
ics within the male genital tract. This understanding could be
crucial in the development of methods to interrupt HIV trans-
mission such as vaccines, microbicides, and antiretroviral sup-
Seeding of genital tissues occurs very early in infection be-
fore the development of any anti-HIV immune response (13).
Once the host mounts an anti-HIV immune response, it most
likely varies in strength and nature between compartments
(29). We investigated the degree of selection on the virus
within the two compartments and found that there was greater
positive selection on virus in the blood than virus in the male
genital tract. In six out of the seven individuals with compart-
mentalized virus, there were highly significant differences in
env glycosylation but not in a consistent direction. While this
reinforces the theory that virus is produced locally in the male
genital tract and responds to local humoral immunity, it does
not explain the recent reports that HIV transmission through
heterosexual exposure involves viruses with fewer envelope
Since cellular tropism may also play a role in viral compart-
mentalization and adaptation to the male genital tract, we
investigated the coreceptor usage of viruses in blood and se-
men. It is provocative that in all individuals who harbored
CXCR4-using viruses, these viruses were underrepresented in
the genital tract. Selection favoring R5 variants in the male
genital tract may explain the observation that newly infected
individuals are disproportionately infected with CCR5-using
viruses (54, 55).
Although HIV within the male genital tract is often different
from that within the bloodstream (10, 17, 32), the initially
infecting virus (founding virus) and the individual’s immune
responses determine viral genetics more than tissue of origin
(29). Therefore, it has been difficult to determine if semen-
derived virus shares common genetic characteristics among
individuals (10). Using machine learning techniques, we have
found that semen-derived HIV-1 has a strong genetic signature
among individuals with compartmentalized virus. The signa-
ture comprises several positions across C2-V3; however, the
residue at position 464 appears to be the most critical in de-
termining viral tropism to the male genital tract. This particu-
lar position, to the best of our knowledge, has not previously
been reported within the context of tissue tropism or viral
compartmentalization. Nevertheless, this classification trial
presents convincing evidence that the male genital tract envi-
ronment selects for similar, predictable genetic changes in env
FIG. 6. Coreceptor phenotype in plasma and semen of individuals with compartmentalized virus showing evidence of CXCR4 usage in either
tissue. Phenotypes were predicted based on V3 genotype by using a machine learning approach.
The male genital tract has been characterized as a reservoir
(43, 52), a compartment (10), and a drug sanctuary (45). All
have significant implications for preventing the transmission of
HIV by using various theoretical methods such as microbi-
cides, vaccines, or antiretroviral therapy (2, 9, 10). Our inves-
tigations uniquely detail the viral compartmentalization dy-
namics and differing selection pressures between the blood and
male genital tract and document a specific genetic signature of
virus compartmentalized in the male genital tract. Taken to-
gether, these data offer important insights into the adaptation
of HIV to the male genital tract, which may be valuable in the
rational design of an effective vaccine.
ACKNOWLEDGMENTS
We are grateful to Susan Little and Simon Frost for their insightful
comments. We also thank Brian Gaschen for assistance with assimi-
lating the sequence data, John Day for his technical expertise, and
Darica Smith and Sharon Wilcox for helping with the preparation of
This work was supported by grants 5K23AI055276, AI27670,
AI38858, AI43638, AI43752, AI36214 (UCSD Center for AIDS Re-
search), AI29164, and AI047745 from the National Institutes of
Health. Additional support was provided by the Research Center for
AIDS and HIV Infection of the San Diego Veterans Affairs Health-
1. Altfeld, M., E. S. Rosenberg, R. Shankarappa, J. S. Mukherjee, F. M. Hecht,
R. L. Eldridge, M. M. Addo, S. H. Poon, M. N. Phillips, G. K. Robbins, P. E.
Sax, S. Boswell, J. O. Kahn, C. Brander, P. J. Goulder, J. A. Levy, J. I.
Mullins, and B. D. Walker. 2001. Cellular immune responses and viral
diversity in individuals treated during acute and early HIV-1 infection. J. Exp.
2. Auvert, B., S. Males, A. Puren, A. Taljaard, M. Carael, and B. Williams.
2004. Can highly active antiretroviral therapy reduce the spread of HIV?: a
study in a township of South Africa. J. Acquir. Immune Defic. Syndr. 36:
3. Chakraborty, H., P. K. Sen, R. W. Helms, P. L. Vernazza, S. A. Fiscus, J. J.
Eron, B. K. Patterson, R. W. Coombs, J. N. Krieger, and M. S. Cohen. 2001.
Viral burden in genital secretions determines male-to-female sexual trans-
mission of HIV-1: a probabilistic empiric model. AIDS 15:621–627.
4. Chun, T.-W., L. Carruth, D. Finzi, X. Shen, J. A. DiGiuseppe, H. Taylor, M.
Hermankova, K. Chadwick, J. Margolick, T. C. Quinn, Y.-H. Kuo, R. Brook-
meyer, M. A. Zeiger, P. Barditch-Crovo, and R. F. Siliciano. 1997. Quanti-
fication of latent tissue reservoirs and total body viral load in HIV-1 infec-
tion. Nature 387:183–188.
5. Coffin, J. M. 1995. HIV population dynamics in vivo: implications for genetic
variation, pathogenesis and therapy. Science 267:483–489.
6. Coombs, R. W., P. S. Reichelderfer, and A. L. Landay. 2003. Recent obser-
vations on HIV type-1 infection in the genital tract of men and women.
7. Coombs, R. W., C. E. Speck, J. P. Hughes, W. Lee, R. Sampoleo, S. O. Ross,
J. Dragavon, G. Peterson, T. M. Hooton, A. C. Collier, L. Corey, L. Koutsky,
and J. N. Krieger. 1998. Association between culturable human immunode-
ficiency virus type 1 (HIV-1) in semen and HIV-1 RNA levels in semen and
blood: evidence for compartmentalization of HIV-1 between semen and
blood. J. Infect. Dis. 177:320–330.
8. Corpet, F. 1988. Multiple sequence alignment with hierarchical clustering.
Nucleic Acids Res. 16:10881–10890.
9. Davis, C. W., and R. W. Doms. 2004. HIV transmission: closing all the doors.
J. Exp. Med. 199:1037–1040.
10. Delwart, E. L., J. I. Mullins, P. Gupta, G. H. Learn, Jr., M. Holodniy, D.
Katzenstein, B. D. Walker, and M. K. Singh. 1998. Human immunodefi-
ciency virus type 1 populations in blood and semen. J. Virol. 72:617–623.
11. Derdeyn, C. A., J. M. Decker, F. Bibollet-Ruche, J. L. Mokili, M. Muldoon,
S. A. Denham, M. L. Heil, F. Kasolo, R. Musonda, B. H. Hahn, G. M. Shaw,
B. T. Korber, S. Allen, and E. Hunter. 2004. Envelope-constrained neutral-
ization-sensitive HIV-1 after heterosexual transmission. Science 303:2019–
12. Drew, W. L., R. C. Miner, D. F. Busch, S. E. Follansbee, J. Gullett, S. G.
Mehalko, S. M. Gordon, W. F. Owen, Jr., T. R. Matthews, W. C. Buhles, and
B. DeArmond. 1991. Prevalence of resistance in patients receiving ganciclovir
for serious cytomegalovirus infection. J. Infect. Dis. 163:716–719.
13. Dyer, J. R., B. L. Gilliam, J. J. Eron, Jr., M. S. Cohen, S. A. Fiscus, and P. L.
Vernazza. 1997. Shedding of HIV-1 in semen during primary infection.
14. Felsenstein, J. 1993. PHYLIP–phylogeny inference package, version 3.5c.
University of Washington, Seattle, Washington.
15. Fiscus, S. A., P. L. Vernazza, B. Gilliam, J. Dyer, J. J. Eron, and M. S. Cohen.
1998. Factors associated with changes in HIV shedding in semen. AIDS Res.
Hum. Retrovir. 14(Suppl. 1):S27–S31.
16. Gunthard, H. F., D. V. Havlir, S. Fiscus, Z. Q. Zhang, J. Eron, J. Mellors, R.
Gulick, S. D. Frost, A. J. Brown, W. Schleif, F. Valentine, L. Jonas, A.
Meibohm, C. C. Ignacio, R. Isaacs, R. Gamagami, E. Emini, A. Haase, D. D.
Richman, and J. K. Wong. 2001. Residual human immunodeficiency virus
(HIV) type 1 RNA and DNA in lymph nodes and HIV RNA in genital
secretions and in cerebrospinal fluid after suppression of viremia for 2 years.
J. Infect. Dis. 183:1318–1327.
17. Gupta, P., C. Leroux, B. K. Patterson, L. Kingsley, C. Rinaldo, M. Ding, Y.
Chen, K. Kulka, W. Buchanan, B. McKeon, and R. Montelaro. 2000. Human
immunodeficiency virus type 1 shedding pattern in semen correlates with the
compartmentalization of viral quasispecies between blood and semen. J. In-
fect. Dis. 182:79–87.
18. Hudson, R. R., M. Slatkin, and W. P. Maddison. 1992. Estimation of levels
of gene flow from DNA sequence data. Genetics 132:583–589.
19. Jensen, M. A., and A. B. van’t Wout. 2003. Predicting HIV-1 coreceptor
usage with sequence analysis. AIDS Rev. 5:104–112.
20. Kemal, K. S., B. Foley, H. Burger, K. Anastos, H. Minkoff, C. Kitchen, S. M.
Philpott, W. Gao, E. Robison, S. Holman, C. Dehner, S. Beck, W. A. Meyer,
III, A. Landay, A. Kovacs, J. Bremer, and B. Weiser. 2003. HIV-1 in genital
tract and plasma of women: compartmentalization of viral sequences, core-
ceptor usage, and glycosylation. Proc. Natl. Acad. Sci. USA 100:12972–
21. Kiessling, A. K., G. Zheng, and R. C. Eyre. 1992. Semen producing organs
are an isolated reservoir of HIV which may play a significant role in the
development of drug resistant strains. J. Hum. Virol. 2:193.
22. Kosakovsky-Pond, S., and S. D. W. Frost. Not so different after all: a
comparison of methods for detecting amino acid sites under selection. Mol.
Biol. Evol., in press.
23. Krieger, J. N., R. W. Coombs, A. C. Collier, D. D. Ho, S. O. Ross, J. E. Zeh,
and L. Corey. 1995. Intermittent shedding of human immunodeficiency virus
in semen: implications for sexual transmission. J. Urol. 154:1035–1040.
24. Krieger, J. N., A. Nirapathpongporn, M. Chaiyaporn, G. Peterson, I. Niko-
laeva, R. Akridge, S. O. Ross, and R. W. Coombs. 1998. Vasectomy and
human immunodeficiency virus type 1 in semen. J. Urol. 159:820–825.
25. Marshall, R. D. 1974. The nature and metabolism of the carbohydrate-
peptide linkages of glycoproteins. Biochem. Soc. Symp. 40:17–26.
26. McInerney, J. O. 1998. GCUA: general codon usage analysis. Bioinformatics
27. Mjolsness, E., and D. DeCoste. 2001. Machine learning for science: state of
the art and future prospects. Science 293:2051–2055.
28. Muse, S. V., and B. S. Gaut. 1994. A likelihood approach for comparing
synonymous and nonsynonymous nucleotide substitution rates, with appli-
cation to the chloroplast genome. Mol. Biol. Evol. 11:715–724.
29. Nickle, D. C., D. Shriner, J. E. Mittler, L. M. Frenkel, and J. I. Mullins.
2003. Importance and detection of virus reservoirs and compartments of
HIV infection. Curr. Opin. Microbiol. 6:410–416.
30. Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. fastDNAml:
a tool for construction of phylogenetic trees of DNA sequences using max-
imum likelihood. Comput. Appl. Biosci. 10:41–48.
31. Page, R. D. M. 1996. TREEVIEW: an application to display phylogenetic
trees on personal computers. Comput. Appl. Biosci. 12:357–358.
32. Paranjpe, S., J. Craigo, B. Patterson, M. Ding, P. Barroso, L. Harrison, R.
Montelaro, and P. Gupta. 2002. Subcompartmentalization of HIV-1 quasi-
species between seminal cells and seminal plasma indicates their origin in
distinct genital tissues. AIDS Res. Hum. Retrovir. 18:1271–1280.
33. Pillai, S., B. Good, D. Richman, and J. Corbeil. 2003. A new perspective on
V3 phenotype prediction. AIDS Res. Hum. Retrovir. 19:145–149.
34. Piot, P., M. Bartos, P. D. Ghys, N. Walker, and B. Schwartlander. 2001. The
global impact of HIV/AIDS. Nature 410:968–973.
35. Quinlan, J. R. 1993. C4.5: programs for machine learning. Morgan Kauf-
mann, San Francisco, Calif.
36. Quinn, T. C., M. J. Wawer, N. Sewankambo, D. Serwadda, C. Li, F. Wabwire-
Mangen, M. O. Meehan, T. Lutalo, R. H. Gray, et al. 2000. Viral load and
heterosexual transmission of human immunodeficiency virus type 1. N. Engl.
J. Med. 342:921–929.
37. Rambaut, A. 2002. Se-Al sequence alignment editor version 2.0. Department
of Zoology, University of Oxford, Oxford, United Kingdom.
38. Rambaut, A. 2000. Estimating the rate of molecular evolution: incorporating
non-contemporaneous sequences into maximum likelihood phylogenies.
39. Resch, W., N. Hoffman, and R. Swanstrom. 2001. Improved success of phenotype
prediction of the human immunodeficiency virus type 1 from envelope
variable loop 3 sequence using neural networks. Virology 288:51–62.
40. Sadiq, S. T., S. Taylor, S. Kaye, J. Bennett, R. Johnstone, P. Byrne, A. J.
Copas, S. M. Drake, D. Pillay, and I. Weller. 2002. The effects of antiretro-
viral therapy on HIV-1 RNA loads in seminal plasma in HIV-positive pa-
tients with and without urethritis. AIDS 16:219–225.
41. Singh, A., G. Besson, A. Mobasher, and R. G. Collman. 1999. Patterns of
chemokine receptor fusion cofactor utilization by human immunodeficiency
virus type 1 variants from the lungs and blood. J. Virol. 73:6680–6690.
42. Slatkin, M., and W. P. Maddison. 1989. A cladistic measure of gene flow
inferred from the phylogenies of alleles. Genetics 123:603–613.
43. Smith, D. M., J. D. Kingery, J. K. Wong, C. C. Ignacio, D. D. Richman, and
S. J. Little. 2004. The prostate as a reservoir for HIV-1. AIDS 18:6–8.
44. Strain, M. C., H. F. Günthard, D. V. Havlir, C. C. Ignacio, D. M. Smith, A. J.
Leigh Brown, T. R. Macaranas, R. Y. Lam, O. A. Daly, M. Fischer, M.
Opravil, H. Levine, L. Bacheler, C. A. Spina, D. D. Richman, and J. K. Wong.
2003. Heterogeneous clearance rates of long-lived lymphocytes infected with
HIV: intrinsic stability predicts lifelong persistence. Proc. Natl. Acad. Sci.
45. Taylor, S., R. P. van Heeswijk, R. M. Hoetelmans, J. Workman, S. M. Drake,
D. J. White, and D. Pillay. 2000. Concentrations of nevirapine, lamivudine
and stavudine in semen of HIV-1-infected men. AIDS 14:1979–1984.
46. UNAIDS/WHO. 2004. AIDS epidemic update: December 2003. UNAIDS/
World Health Organization, Geneva, Switzerland.
47. Vernazza, P. L., B. L. Gilliam, J. Dyer, S. A. Fiscus, J. J. Eron, A. C. Frank,
and M. S. Cohen. 1997. Quantification of HIV in semen: correlation with
antiviral treatment and immune status. AIDS 11:987–993.
48. Wei, X., J. M. Decker, S. Wang, H. Hui, J. C. Kappes, X. Wu, J. F. Salazar-
Gonzalez, M. G. Salazar, J. M. Kilby, M. S. Saag, N. L. Komarova, M. A.
Nowak, B. H. Hahn, P. D. Kwong, and G. M. Shaw. 2003. Antibody neutral-
ization and escape by HIV-1. Nature 422:307–312.
49. Witten, I. H., and E. Frank. 2000. Data mining: practical machine learning
tools and techniques with Java implementations. Morgan Kaufmann, San
50. Wong, J. K., C. C. Ignacio, F. Torriani, D. Havlir, N. J. S. Fitch, and D. D.
Richman. 1997. In vivo compartmentalization of HIV: evidence from the
examination of pol sequences from autopsy tissues. J. Virol. 71:2059–2071.
51. Yu, Q., R. Konig, S. Pillai, K. Chiles, M. Kearney, S. Palmer, D. Richman,
J. M. Coffin, and N. R. Landau. 2004. Single-strand specificity of
APOBEC3G accounts for minus-strand deamination of the HIV genome.
Nat. Struct. Mol. Biol. 11:435–442.
52. Zhang, H., G. Dornadula, M. Beumont, L. Livornese, B. Van Uitert, K.
Henning, and R. J. Pomerantz. 1998. Human immunodeficiency virus type 1
in the semen of men receiving highly active antiretroviral therapy. N. Engl.
J. Med. 339:1803–1809.
53. Zhang, L., L. Rowe, T. He, C. Chung, J. Yu, W. Yu, A. Talal, M. Markowitz,
and D. D. Ho. 2002. Compartmentalization of surface envelope glycoprotein
of human immunodeficiency virus type 1 during acute and chronic infection.
J. Virol. 76:9465–9473.
54. Zhang, L. Q., P. MacKenzie, A. Cleland, E. C. Holmes, A. J. Leigh-Brown,
and P. Simmonds. 1993. Selection for specific sequences in the external
envelope protein of human immunodeficiency virus type 1 upon primary
infection. J. Virol. 67:3345–3356.
55. Zhu, T., H. Mo, N. Wang, D. S. Nam, Y. Cao, R. A. Koup, and D. D. Ho. 1993.
Genotypic and phenotypic characterization of HIV-1 in patients with pri-
mary infection. Science 261:1179–1184.