-
Melissa-Rose Abrahams,
Florette K Treurnicht,
Nobubelo K Ngandu,
Sarah A Goodier,
Jinny C Marais,
Helba Bredell,
Ruwayhida Thebus,
Debra de Assis Rosa,
Koleka Mlisana,
Cathal Seoighe,
Salim Abdool Karim,
Clive M Gray,
Carolyn Williamson
ABSTRACT: OBJECTIVE(S): There is limited information on full-length genome sequences and the early evolution of transmitted HIV-1 subtype C viruses, which constitute the majority of viruses spread in Africa. The purpose of this study was to characterize the earliest changes across the genome of subtype C viruses following transmission, to better understand early control of viremia. DESIGN: We derived the near full-length genome sequence responsible for clinical infection from five HIV subtype C-infected individuals with different disease progression profiles and tracked adaptation to immune responses in the first 6 months of infection. METHODS: Near full-length genomes were generated by single genome amplification and direct sequencing. Sequences were analyzed for amino acid mutations associated with cytotoxic T lymphocyte (CTL) or antibody-mediated immune pressure, and for reversion. RESULTS: Fifty-five sequence changes associated with adaptation to the new host were identified, with 38% attributed to CTL pressure, 35% to antibody pressure, 16% to reversions and the remainder were unclassified. Mutations in CTL epitopes were most frequent in the first 5 weeks of infection, with the frequency declining over time with the decline in viral load. CTL escape predominantly occurred in nef, followed by pol and env. Shuffling/toggling of mutations was identified in 81% of CTL epitopes, with only 7% reaching fixation within the 6-month period. CONCLUSION: There was rapid virus adaptation following transmission, predominantly driven by CTL pressure, with most changes occurring during high viremia. Rapid escape and complex escape pathways provide further challenges for vaccine protection.
AIDS (London, England) 02/2013; 27(4):507-518. · 4.91 Impact Factor
-
ABSTRACT: Heterogeneity in sample composition is an inherent issue in many gene expression studies and, in many cases, should be taken into account in the downstream analysis to enable correct interpretation of the underlying biological processes. Typical examples are infectious diseases or immunology-related studies using blood samples, where, for example, the proportions of lymphocyte sub-populations are expected to vary between cases and controls. Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, notably in bioinformatics where its ability to extract meaningful information from high-dimensional data such as gene expression microarrays has been demonstrated. Very recently, it has been applied to biomarker discovery and gene expression deconvolution in heterogeneous tissue samples. Being essentially unsupervised, standard NMF methods are not guaranteed to find components corresponding to the cell types of interest in the sample, which may jeopardize the correct estimation of cell proportions. We have investigated the use of prior knowledge, in the form of a set of marker genes, to improve gene expression deconvolution with NMF algorithms. We found that this improves the consistency with which both cell type proportions and cell type gene expression signatures are estimated. The proposed method was tested on a microarray dataset consisting of pure cell types mixed in known proportions. Pearson correlation coefficients between true and estimated cell type proportions improved substantially (typically from about 0.5 to approximately 0.8) with the semi-supervised (marker-guided) versions of commonly used NMF algorithms. Furthermore known marker genes associated with each cell type were assigned to the correct cell type more frequently for the guided versions. 
We conclude that the use of marker genes improves the accuracy of gene expression deconvolution using NMF and suggest modifications to how the marker gene information is used that may lead to further improvements.
Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases 09/2011; 12(5):913-21. · 3.22 Impact Factor
-
Norman L Letvin,
Srinivas S Rao,
David C Montefiori,
Michael S Seaman,
Yue Sun,
So-Yon Lim,
Wendy W Yeh,
Mohammed Asmal,
Rebecca S Gelman,
Ling Shen, [......],
Adam P Buzby,
Linh V Mach,
Jinrong Zhang,
Harikrishnan Balachandran,
George M Shaw,
Stephen D Schmidt,
John-Paul Todd,
Alan Dodson,
John R Mascola,
Gary J Nabel
ABSTRACT: The RV144 vaccine trial in Thailand demonstrated that an HIV vaccine could prevent infection in humans and highlights the importance of understanding protective immunity against HIV. We used a nonhuman primate model to define immune and genetic mechanisms of protection against mucosal infection by the simian immunodeficiency virus (SIV). A plasmid DNA prime/recombinant adenovirus serotype 5 (rAd5) boost vaccine regimen was evaluated for its ability to protect monkeys from infection by SIVmac251 or SIVsmE660 isolates after repeat intrarectal challenges. Although this prime-boost vaccine regimen failed to protect against SIVmac251 infection, 50% of vaccinated monkeys were protected from infection with SIVsmE660. Among SIVsmE660-infected animals, there was about a one-log reduction in peak plasma virus RNA in monkeys expressing the major histocompatibility complex class I allele Mamu-A*01, implicating cytotoxic T lymphocytes in the control of SIV replication once infection is established. Among Mamu-A*01-negative monkeys challenged with SIVsmE660, no CD8(+) T cell response or innate immune response was associated with protection against virus acquisition. However, low levels of neutralizing antibodies and an envelope-specific CD4(+) T cell response were associated with vaccine protection in these monkeys. Moreover, monkeys that expressed two TRIM5 alleles that restrict SIV replication were more likely to be protected from infection than monkeys that expressed at least one permissive TRIM5 allele. This study begins to elucidate the mechanisms of vaccine protection against immunodeficiency viruses and highlights the need to analyze these immune and genetic correlates of protection in future trials of HIV vaccine strategies.
Science translational medicine 05/2011; 3(81):81ra36. · 7.80 Impact Factor
-
ABSTRACT: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research.
We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs.
The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.
BMC Bioinformatics 01/2011; 12:29. · 2.75 Impact Factor
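The paralog-reduction step described in the Indygene abstract above can be illustrated with a minimal sketch. The abstract does not describe the tool's actual algorithm, so the greedy strategy, the function name `reduce_paralogs`, the similarity table and the gene names below are all assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, List


def reduce_paralogs(
    genes: Iterable[str],
    similarity: Dict[FrozenSet[str], float],
    threshold: float,
) -> List[str]:
    """Greedily keep genes so that no retained pair has similarity above threshold.

    This is an assumed greedy strategy, not the published Indygene method.
    """
    kept: List[str] = []
    for gene in genes:
        if gene in kept:
            continue  # skip exact duplicates in the input list
        # Pairs missing from the table are treated as unrelated (similarity 0.0).
        if all(similarity.get(frozenset((gene, k)), 0.0) <= threshold for k in kept):
            kept.append(gene)
    return kept


# Hypothetical gene names and similarity scores, for illustration only.
example_similarity = {
    frozenset(("GENE_A1", "GENE_A2")): 0.92,
    frozenset(("GENE_A1", "GENE_B")): 0.30,
}
reduced = reduce_paralogs(["GENE_A1", "GENE_A2", "GENE_B"], example_similarity, 0.8)
print(reduced)
```

A greedy pass depends on input order: the first member of each paralogous group is the one retained. A real tool might instead choose which paralog to keep by some other criterion.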
-
ABSTRACT: Admixed populations present unique opportunities to discover the genetic factors underlying many multifactorial diseases. The geographical position and complex history of South Africa has led to the establishment of the unique admixed population known as the South African Coloured. Not much is known about the genetic make-up of this population, and the historical record is patchy. We genotyped 959 individuals from the Western Cape area, self-identified as belonging to this population, using the Affymetrix 500k genotyping platform. This resulted in nearly 75,000 autosomal SNPs that could be compared with populations represented in the International HapMap Project and the Human Genome Diversity Project. Analysis by means of both the admixture and linkage models in STRUCTURE revealed that the major ancestral components of this population are predominantly Khoesan (32-43%), Bantu-speaking Africans (20-36%), European (21-28%) and a smaller Asian contribution (9-11%), depending on the model used. This is consistent with historical data. While of great historical and genealogical interest, this information is also essential for future admixture mapping of disease genes in this population.
Human Genetics 08/2010; 128(2):145-53. · 5.07 Impact Factor
-
ABSTRACT: Existing methods for the prediction of immunologically active T-cell epitopes are based on the amino acid sequence or structure of pathogen proteins. Additional information regarding the locations of epitopes may be acquired by considering the evolution of viruses in hosts with different immune backgrounds. In particular, immune-dependent evolutionary patterns at sites within or near T-cell epitopes can be used to enhance epitope identification. We have developed a mutation-selection model of T-cell epitope evolution that allows the human leukocyte antigen (HLA) genotype of the host to influence the evolutionary process. This is one of the first examples of the incorporation of environmental parameters into a phylogenetic model and has many other potential applications where the selection pressures exerted on an organism can be related directly to environmental factors. We combine this novel evolutionary model with a hidden Markov model to identify contiguous amino acid positions that appear to evolve under immune pressure in the presence of specific host immune alleles and that therefore represent potential epitopes. This phylogenetic hidden Markov model provides a rigorous probabilistic framework that can be combined with sequence or structural information to improve epitope prediction. As a demonstration, we apply the model to a data set of HIV-1 protein-coding sequences and host HLA genotypes.
Molecular Biology and Evolution 05/2010; 27(5):1212-20. · 5.55 Impact Factor
-
ABSTRACT: Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community.
Our objective is to provide the bioinformatics community with an open-source, easy-to-use and unified interface to standard NMF algorithms, as well as with a simple framework to help implement and test new NMF methods. For that purpose, we have developed a package for the R/BioConductor platform. The package ports public code to R, and is structured to enable users to easily modify and/or add algorithms. It includes a number of published NMF algorithms and initialization methods and facilitates the combination of these to produce new NMF strategies. Commonly used benchmark data and visualization methods are provided to help in the comparison and interpretation of the results.
The NMF package helps realize the potential of Nonnegative Matrix Factorization, especially in bioinformatics, providing easy access to methods that have already yielded new insights in many applications. Documentation, source code and sample data are available from CRAN.
BMC Bioinformatics 01/2010; 11:367. · 2.75 Impact Factor
-
ABSTRACT: In eukaryotes mRNA transcripts of protein-coding genes in which an intron has been retained in the coding region normally result in premature stop codons and are therefore degraded through the nonsense-mediated mRNA decay (NMD) pathway. There is evidence in the form of selective pressure for in-frame stop codons in introns and a depletion of length three introns that this is an important and conserved quality-control mechanism. Yet recent reports have revealed that the efficiency of NMD varies across tissues and between individuals, with important clinical consequences.
Using previously published Affymetrix exon microarray data from cell lines genotyped as part of the International HapMap project, we investigated whether there are heritable, inter-individual differences in the abundance of intron-containing transcripts, potentially reflecting differences in the efficiency of NMD. We identified intronic probesets using EST data and report evidence of heritability in the extent of intron expression in 56 HapMap trios. We also used a genome-wide association approach to identify genetic markers associated with intron expression. Among the top candidates was a SNP in the DCP1A gene, which forms part of the decapping complex, involved in NMD.
While we caution that some of the apparent inter-individual difference in intron expression may be attributable to different handling or treatments of cell lines, we hypothesize that there is significant polymorphism in the process of NMD, resulting in heritable differences in the abundance of intronic mRNA. Part of this phenotype is likely to be due to a polymorphism in a decapping enzyme on human chromosome 3.
PLoS ONE 01/2010; 5(7):e11657. · 4.09 Impact Factor
-
ABSTRACT: The cytotoxic T-lymphocyte immune response is important in controlling HIV-1 replication in infected humans. In this immune pathway, viral peptides within infected cells are presented to T-lymphocytes by the polymorphic human leukocyte antigens (HLA). HLA alleles exert selective pressure on the peptide regions, and immune escape mutations that occur at some of the targeted sites can enable the virus to adapt to the infected host. The pattern of ongoing immune escape and reversion associated with several human HLA alleles has been studied extensively. Such mutations revert upon transmission to a host without the HLA allele because the escape mutation incurs a fitness cost. However, to date there has been little attempt to study permanent loss of CTL epitopes due to escape mutations without an effect on fitness.
Here, we set out to determine the extent of adaptation of HIV-1 to three well-characterized HLA alleles during the initial exposure of the virus to the human cytotoxic immune responses following transmission from chimpanzee. We generated a chimpanzee consensus sequence to approximate the virus sequence that was initially transmitted to the human host and used a method based on peptide binding affinity to HLA crystal structures to predict peptides that were potentially targeted by the HLA alleles on this sequence. Next, we used codon-based phylogenetic models to quantify the average selective pressure that acted on these regions during the period immediately following the zoonosis event, corresponding to the branch of the phylogenetic tree leading to the common ancestor of all of the HIV-1 sequences. Evidence for adaptive evolution during this period was observed at regions recognised by HLA A*6801 and A*0201, both of which are common in African populations. No evidence of adaptive evolution was observed at sites targeted by HLA-B*2705, which is a rare allele in African populations.
Our results suggest that the ancestral HIV-1 virus experienced a period of positive selective pressure due to immune responses associated with HLA alleles that were common in the infected human population. We propose that this resulted in permanent escape from immune responses targeting unconstrained regions of the virus.
Virology Journal 10/2009; 6:164. · 2.34 Impact Factor
-
ABSTRACT: The Mesembryanthemoideae and Ruschioideae subfamilies are a major component of the Greater Cape Floristic Region in southern Africa. The Ruschioideae show an astonishing diversity of leaf shape and growth forms. Although 1,585 species are recognised within the morphologically diverse Ruschioideae, these species show minimal variation in plastid DNA sequence. We have investigated whether changes in selected leaf development transcription factors underpin the recent, rapid diversification of this large group of succulent plants. Degenerate primers designed to conserved regions of Asymmetric Leaves1/Rough Sheath 2/Phantastica (ARP) and the Class III HD-ZIP family of genes, were used to amplify sequences corresponding to these genes from several species within the Mesembryanthemoideae and Ruschioideae subfamilies. Two members of the Class III HD-ZIP family were identified in both the Mesembryanthemoideae and Ruschioideae, and were derived from an ancient gene duplication event that preceded the divergence of gymnosperms and angiosperms. While a single ARP orthologue was identified in the Mesembryanthemoideae, two paralogues, ARPa and ARPb, were identified in the Ruschioideae subfamily. ARPa was present in all species of Ruschioideae analysed in this study. ARPb has been lost from the Apatesieae and Dorotheantheae tribes, which form an early evolutionary branch from the Ruschieae tribe, as well as from selected species within the Ruschieae. The recent duplication and subsequent selected gene loss of the ARP transcription factor correlates with the rapid diversification of plant forms in the Ruschioideae.
Archiv für Entwickelungsmechanik der Organismen 07/2009; 219(6):331-8. · 1.77 Impact Factor
-
Natasha Wood,
Tanmoy Bhattacharya,
Brandon F Keele,
Elena Giorgi,
Michael Liu,
Brian Gaschen,
Marcus Daniels,
Guido Ferrari,
Barton F Haynes,
Andrew McMichael,
George M Shaw,
Beatrice H Hahn,
Bette Korber,
Cathal Seoighe
ABSTRACT: The pattern of viral diversification in newly infected individuals provides information about the host environment and immune responses typically experienced by the newly transmitted virus. For example, sites that tend to evolve rapidly across multiple early-infection patients could be involved in enabling escape from common early immune responses, could represent adaptation for rapid growth in a newly infected host, or could represent reversion from less fit forms of the virus that were selected for immune escape in previous hosts. Here we investigated the diversification of HIV-1 env coding sequences in 81 very early B subtype infections previously shown to have resulted from transmission or expansion of single viruses (n = 78) or two closely related viruses (n = 3). In these cases, the sequence of the infecting virus can be estimated accurately, enabling inference of both the direction of substitutions as well as distinction between insertion and deletion events. By integrating information across multiple acutely infected hosts, we find evidence of adaptive evolution of HIV-1 env and identify a subset of codon sites that diversified more rapidly than can be explained by a model of neutral evolution. Of 24 such rapidly diversifying sites, 14 were either i) clustered and embedded in CTL epitopes that were verified experimentally or predicted based on the individual's HLA or ii) in a nucleotide context indicative of APOBEC-mediated G-to-A substitutions, despite having excluded heavily hypermutated sequences prior to the analysis. In several cases, a rapidly evolving site was embedded both in an APOBEC motif and in a CTL epitope, suggesting that APOBEC may facilitate early immune escape. Ten rapidly diversifying sites could not be explained by CTL escape or APOBEC hypermutation, including the most frequently mutated site, in the fusion peptide of gp41. 
We also examined the distribution, extent, and sequence context of insertions and deletions, and we provide evidence that the length variation seen in hypervariable loop regions of the envelope glycoprotein is a consequence of selection and not of mutational hotspots. Our results provide a detailed view of the process of diversification of HIV-1 following transmission, highlighting the role of CTL escape and hypermutation in shaping viral evolution during the establishment of new infections.
PLoS Pathogens 06/2009; 5(5):e1000414. · 9.13 Impact Factor
-
ABSTRACT: Host immune responses against infectious pathogens exert strong selective pressures favouring the emergence of escape mutations that prevent immune recognition. Escape mutations within or flanking functionally conserved epitopes can occur at a significant cost to the pathogen in terms of its ability to replicate effectively. Such mutations come under selective pressure to revert to the wild type in hosts that do not mount an immune response against the epitope. Amino acid positions exhibiting this pattern of escape and reversion are of interest because they tend to coincide with immune responses that control pathogen replication effectively. We have used a probabilistic model of protein coding sequence evolution to detect sites in HIV-1 exhibiting a pattern of rapid escape and reversion. Our model is designed to detect sites that toggle between a wild type amino acid, which is susceptible to a specific immune response, and amino acids with lower replicative fitness that evade immune recognition. Through simulation, we show that this model has significantly greater power to detect selection involving immune escape and reversion than standard models of diversifying selection, which are sensitive to an overall increased rate of non-synonymous substitution. Applied to alignments of HIV-1 protein coding sequences, the model of immune escape and reversion detects a significantly greater number of adaptively evolving sites in env and nef. In all genes tested, the model provides a significantly better description of adaptively evolving sites than standard models of diversifying selection. Several of the sites detected are corroborated by association between Human Leukocyte Antigen (HLA) and viral sequence polymorphisms. Overall, there is evidence for a large number of sites in HIV-1 evolving under strong selective pressure, but exhibiting low sequence diversity. 
A phylogenetic model designed to detect rapid toggling between wild type and escape amino acids identifies a larger number of adaptively evolving sites in HIV-1, and can in some cases correctly identify the amino acid that is susceptible to the immune response.
PLoS Pathogens 01/2009; 4(12):e1000242. · 9.13 Impact Factor
-
ABSTRACT: Positive selection pressure acting on protein-coding sequences is usually inferred when the rate of nonsynonymous substitution is greater than the synonymous rate. However, purifying selection acting directly on the nucleotide sequence can lower the synonymous substitution rate. This could result in false inference of positive selection because when synonymous changes at some sites are under purifying selection, the average synonymous rate is an underestimate of the neutral rate of evolution. Even though HIV-1 coding sequences contain a number of regions that function at the nucleotide level, and are thus likely to be affected by purifying selection, studies of positive selection assume that synonymous substitutions can be used to estimate the neutral rate of evolution.
We modelled site-to-site variation in the synonymous substitution rate across coding regions of the HIV-1 genome. Synonymous substitution rates were found to vary significantly within and between genes. Surprisingly, regions of the genome that encode proteins in more than one frame had significantly higher synonymous substitution rates than regions coding in a single frame. We found evidence of strong purifying selection pressure affecting synonymous mutations in fourteen regions with known functions. These included an exonic splicing enhancer, the rev-responsive element, the poly-purine tract and a transcription factor binding site. A further five highly conserved regions were located within known functional domains. We also found four conserved regions located in env and vpu which have not been characterized previously.
We provide the coordinates of genomic regions with markedly lower synonymous substitution rates, which are putatively under the influence of strong purifying selection pressure at the nucleotide level as well as regions encoding proteins in more than one frame. These regions should be excluded from studies of positive selection acting on HIV-1 coding regions.
Virology Journal 01/2009; 5:160. · 2.34 Impact Factor
-
ABSTRACT: Probabilistic models of sequence evolution are in widespread use in phylogenetics and molecular sequence evolution. These models have become increasingly sophisticated and combined with statistical model comparison techniques have helped to shed light on how genes and proteins evolve. Models of codon evolution have been particularly useful, because, in addition to providing a significant improvement in model realism for protein-coding sequences, codon models can also be designed to test hypotheses about the selective pressures that shape the evolution of the sequences. Such models typically assume a phylogeny and can be used to identify sites or lineages that have evolved adaptively. Recently some of the key assumptions that underlie phylogenetic tests of selection have been questioned, such as the assumption that the rate of synonymous changes is constant across sites or that a single phylogenetic tree can be assumed at all sites for recombining sequences. While some of these issues have been addressed through the development of novel methods, others remain as caveats that need to be considered on a case-by-case basis. Here, we outline the theory of codon models and their application to the detection of positive selection. We review some of the more recent developments that have improved their power and utility, laying a foundation for further advances in the modeling of coding sequence evolution.
Briefings in Bioinformatics 11/2008; 10(1):97-109. · 5.20 Impact Factor
-
Clive M Gray,
Mandla Mlotshwa,
Catherine Riou,
Tiyani Mathebula,
Debra de Assis Rosa,
Tumelo Mashishi,
Cathal Seoighe,
Nobubelo Ngandu,
Francois van Loggerenberg,
Lynn Morris,
Koleka Mlisana,
Carolyn Williamson,
Salim Abdool Karim
ABSTRACT: It is unknown whether patterns of human immunodeficiency virus (HIV)-specific T-cell responses during acute infection may influence the viral set point and the course of disease. We wished to establish whether the magnitude and breadth of HIV type 1 (HIV-1)-specific T-cell responses at 3 months postinfection were correlated with the viral-load set point at 12 months and hypothesized that the magnitude and breadth of HIV-specific T-cell responses during primary infection would predict the set point. Gamma interferon (IFN-gamma) enzyme-linked immunospot (ELISPOT) assay responses across the complete proteome were measured in 47 subtype C HIV-1-infected participants at a median of 12 weeks postinfection. When corrected for amino acid length and individuals responding to each region, the order of recognition was as follows: Nef > Gag > Pol > Rev > Vpr > Env > Vpu > Vif > Tat. Nef responses were significantly (P < 0.05) dominant, targeted six epitopic regions, and were unrelated to the course of viremia. There was no significant difference in the magnitude and breadth of responses for each protein region with disease progression, although there was a trend of increased breadth (mean, four to seven pools) in rapid progressors. Correlation of the magnitude and breadth of IFN-gamma responses with the viral set point at 12 months revealed almost zero association for each protein region. Taken together, these data demonstrate that the magnitude and breadth of IFN-gamma ELISPOT assay responses at 3 months postinfection are unrelated to the course of disease in the first year of infection and are not associated with, and have low predictive power for, the viral set point at 12 months.
Journal of Virology 10/2008; 83(1):470-8. · 5.40 Impact Factor
-
Brandon F Keele,
Elena E Giorgi,
Jesus F Salazar-Gonzalez,
Julie M Decker,
Kimmy T Pham,
Maria G Salazar,
Chuanxi Sun,
Truman Grayson,
Shuyi Wang,
Hui Li, [......],
Brian Gaschen,
Gayathri S Athreya,
Ha Y Lee,
Natasha Wood,
Cathal Seoighe,
Alan S Perelson,
Tanmoy Bhattacharya,
Bette T Korber,
Beatrice H Hahn,
George M Shaw
ABSTRACT: The precise identification of the HIV-1 envelope glycoprotein (Env) responsible for productive clinical infection could be instrumental in elucidating the molecular basis of HIV-1 transmission and in designing effective vaccines. Here, we developed a mathematical model of random viral evolution and, together with phylogenetic tree construction, used it to analyze 3,449 complete env sequences derived by single genome amplification from 102 subjects with acute HIV-1 (clade B) infection. Viral env genes evolving from individual transmitted or founder viruses generally exhibited a Poisson distribution of mutations and star-like phylogeny, which coalesced to an inferred consensus sequence at or near the estimated time of virus transmission. Overall, 78 of 102 subjects had evidence of productive clinical infection by a single virus, and 24 others had evidence of productive clinical infection by a minimum of two to five viruses. Phenotypic analysis of transmitted or early founder Envs revealed a consistent pattern of CCR5 dependence, masking of coreceptor binding regions, and equivalent or modestly enhanced resistance to the fusion inhibitor T1249 and broadly neutralizing antibodies compared with Envs from chronically infected subjects. Low multiplicity infection and limited viral evolution preceding peak viremia suggest a finite window of potential vulnerability of HIV-1 to vaccine-elicited immune responses, although phenotypic properties of transmitted Envs pose a formidable defense.
Proceedings of the National Academy of Sciences 06/2008; 105(21):7552-7. · 9.68 Impact Factor
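The idea in the abstract above, that sequences descending from a single founder coalesce to a consensus and carry roughly Poisson-distributed numbers of mutations, can be sketched in a few lines. This is not the authors' mathematical model: the majority-rule consensus, the variance-to-mean check and the toy alignment are simplifying assumptions chosen only to show the shape of the calculation.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import List


def consensus(seqs: List[str]) -> str:
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming_to_consensus(seqs: List[str]) -> List[int]:
    """Number of positions at which each sequence differs from the consensus."""
    ref = consensus(seqs)
    return [sum(a != b for a, b in zip(s, ref)) for s in seqs]


def dispersion_index(counts: List[int]) -> float:
    """Variance-to-mean ratio; values near 1 are what a Poisson model predicts."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")


# Toy alignment (hypothetical sequences, far shorter than a real env gene).
toy = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGC", "ACGTACGT"]
distances = hamming_to_consensus(toy)
print(consensus(toy), distances, dispersion_index(distances))
```

With only five short sequences the dispersion index is not informative; a real analysis would need many sequences per subject and a formal goodness-of-fit test.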
-
Denis R Chopera,
Zenda Woodman,
Koleka Mlisana,
Mandla Mlotshwa,
Darren P Martin,
Cathal Seoighe,
Florette Treurnicht,
Debra Assis de Rosa,
Winston Hide,
Salim Abdool Karim,
Clive M Gray,
Carolyn Williamson
ABSTRACT: One of the most important genetic factors known to affect the rate of disease progression in HIV-infected individuals is the genotype at the Class I Human Leukocyte Antigen (HLA) locus, which determines the HIV peptides targeted by cytotoxic T-lymphocytes (CTLs). Individuals with HLA-B*57 or B*5801 alleles, for example, target functionally important parts of the Gag protein. Mutants that escape these CTL responses may have lower fitness than the wild-type and can be associated with slower disease progression. Transmission of the escape variant to individuals without these HLA alleles is associated with rapid reversion to wild-type. However, the question of whether infection with an escape mutant offers an advantage to newly infected hosts has not been addressed. Here we investigate the relationship between the genotypes of transmitted viruses and prognostic markers of disease progression and show that infection with HLA-B*57/B*5801 escape mutants is associated with lower viral load and higher CD4+ counts.
PLoS Pathogens 04/2008; 4(3):e1000033. · 9.13 Impact Factor
-
ABSTRACT: Keyword searching through PubMed and other systems is the standard means of retrieving information from Medline. However, ad-hoc retrieval systems do not meet all of the needs of databases that curate information from literature, or of text miners developing a corpus on a topic that has many terms indicative of relevance. Several databases have developed supervised learning methods that operate on a filtered subset of Medline, to classify Medline records so that fewer articles have to be manually reviewed for relevance. A few studies have considered generalisation of Medline classification to operate on the entire Medline database in a non-domain-specific manner, but existing applications lack speed, available implementations, or a means to measure performance in new domains.
MScanner is an implementation of a Bayesian classifier that provides a simple web interface for submitting a corpus of relevant training examples in the form of PubMed IDs and returning results ranked by decreasing probability of relevance. For maximum speed it uses the Medical Subject Headings (MeSH) and journal of publication as a concise document representation, and takes roughly 90 seconds to return results against the 16 million records in Medline. The web interface provides interactive exploration of the results, and cross validated performance evaluation on the relevant input against a random subset of Medline. We describe the classifier implementation, cross validate it on three domain-specific topics, and compare its performance to that of an expert PubMed query for a complex topic. In cross validation on the three sample topics against 100,000 random articles, the classifier achieved excellent separation of relevant and irrelevant article score distributions, ROC areas between 0.97 and 0.99, and averaged precision between 0.69 and 0.92.
MScanner is an effective non-domain-specific classifier that operates on the entire Medline database, and is suited to retrieving topics for which many features may indicate relevance. Its web interface simplifies the task of classifying Medline citations, compared to building a pre-filter and classifier specific to the topic. The data sets and open source code used to obtain the results in this paper are available on-line and as supplementary material, and the web interface may be accessed at http://mscanner.stanford.edu.
BMC Bioinformatics 02/2008; 9:108. · 2.75 Impact Factor
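The MScanner abstract above describes ranking Medline records by probability of relevance from MeSH terms and journal of publication. The sketch below shows one simplified way to do that kind of scoring and is not MScanner's actual implementation: it assumes each record is a set of feature strings, gives each feature a smoothed log-odds weight, and sums the weights of the features present (absent features are ignored, which a full naive Bayes classifier would not do). The function names, the `alpha` smoothing parameter and the `J:` journal prefix are all hypothetical.

```python
import math
from collections import Counter


def train(relevant, background, alpha=1.0):
    """Compute a log-odds weight per feature with add-alpha smoothing.

    relevant and background are lists of feature sets, one set per record.
    A positive weight means the feature is more common in relevant records.
    """
    rel_counts = Counter(f for doc in relevant for f in doc)
    bg_counts = Counter(f for doc in background for f in doc)
    weights = {}
    for f in set(rel_counts) | set(bg_counts):
        p_rel = (rel_counts[f] + alpha) / (len(relevant) + 2 * alpha)
        p_bg = (bg_counts[f] + alpha) / (len(background) + 2 * alpha)
        weights[f] = math.log(p_rel / p_bg)
    return weights


def score(doc, weights):
    """Sum the weights of the features present; unseen features score 0."""
    return sum(weights.get(f, 0.0) for f in doc)


def rank(docs, weights):
    """Return record IDs ordered by decreasing relevance score.

    docs maps a record ID to its feature set.
    """
    return sorted(docs, key=lambda rid: score(docs[rid], weights), reverse=True)
```

With weights trained on a relevant corpus and a random background sample, `rank` would place records sharing many features with the relevant corpus first. The record IDs and feature strings passed in are whatever the caller supplies; none are taken from Medline here.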
-
ABSTRACT: Accurate mRNA splicing depends on multiple regulatory signals encoded in the transcribed RNA sequence. Many examples of mutations within human splice regulatory regions that alter splicing qualitatively or quantitatively have been reported and allelic differences in mRNA splicing are likely to be a common and important source of phenotypic diversity at the molecular level, in addition to their contribution to genetic disease susceptibility. However, because the effect of a mutation on the efficiency of mRNA splicing is often difficult to predict, many mutations that cause disease through an effect on splicing are likely to remain undiscovered.
We have combined a genome-wide scan for sequence polymorphisms likely to affect mRNA splicing with analysis of publicly available Expressed Sequence Tag (EST) and exon array data. The genome-wide scan uses published tools and identified 30,977 SNPs located within donor and acceptor splice sites, branch points and exonic splicing enhancer elements. For 1,185 candidate splicing polymorphisms the difference in splicing between alternative alleles was corroborated by publicly available exon array data from 166 lymphoblastoid cell lines. We developed a novel probabilistic method to infer allele-specific splicing from EST data. The method uses SNPs and alternative mRNA isoforms mapped to EST sequences and models both regulated alternative splicing as well as allele-specific splicing. We have also estimated heritability of splicing and report that a greater proportion of genes show evidence of splicing heritability than show heritability of overall gene expression level. Our results provide an extensive resource that can be used to assess the possible effect on splicing of human polymorphisms in putative splice-regulatory sites.
We report a set of genes showing evidence of allele-specific splicing from an integrated analysis of genomic polymorphisms, EST data and exon array data, including several examples for which there is experimental evidence of polymorphisms affecting splicing in the literature. We also present a set of novel allele-specific splicing candidates and discuss the strengths and weaknesses of alternative technologies for inferring the effect of sequence variants on mRNA splicing.
BMC Genomics 01/2008; 9:265. · 4.07 Impact Factor