iHMMune-align: hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences

School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia.
Bioinformatics (Impact Factor: 4.62). 08/2007; 23(13):1580-7. DOI: 10.1093/bioinformatics/btm147
Source: PubMed

ABSTRACT Immunoglobulin heavy chain (IGH) genes in mature B lymphocytes are the result of recombination of IGHV, IGHD and IGHJ germline genes, followed by somatic mutation. The correct identification of the germline genes that make up a variable VH domain is essential to our understanding of the process of antibody diversity generation as well as to clinical investigations of some leukaemias and lymphomas.
We have developed iHMMune-align, an alignment program that uses a hidden Markov model (HMM) to model the processes involved in human IGH gene rearrangement and maturation. The performance of iHMMune-align was compared to that of other immunoglobulin gene alignment utilities using both clonally related and randomly selected IGH sequences. This evaluation suggests that iHMMune-align provides a more accurate identification of component germline genes than other currently available IGH gene characterization programs.
The iHMMune-align cross-platform Java executable and web interface are freely available to academic users and can be accessed at http://www.emi.unsw.edu.au/~ihmmune/.
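As a rough illustration of the HMM idea described in the abstract, the following Python sketch scores a read against candidate germline segments with a toy left-to-right HMM and the Viterbi algorithm. It is not the iHMMune-align model: the state layout (one match state per germline base, followed by a single uniform "N" state), the parameters `mu` and `p_exit`, the function names, and the example sequences are all assumptions made for illustration only.

```python
import math

def best_alignment(read, germline, mu=0.05, p_exit=0.02):
    """Viterbi over a toy HMM: match states M_0..M_{L-1}, then an N state.

    Assumed (hypothetical) model, not the published one:
      - the path starts in M_0 and the read is non-empty;
      - M_i emits germline[i] with prob 1 - mu, any other base with mu / 3;
      - M_i moves to M_{i+1} with prob 1 - p_exit or to N with p_exit
        (the last match state moves to N with prob 1);
      - N emits each base with prob 0.25 and loops on itself.
    Returns (best log-probability, number of read bases in match states).
    """
    L = len(germline)
    NEG = float("-inf")
    log_uniform = math.log(0.25)

    def log_emit(i, base):
        return math.log(1 - mu) if base == germline[i] else math.log(mu / 3)

    match = [NEG] * L            # best log-prob of paths ending in each M_i
    match[0] = log_emit(0, read[0])
    n_score, n_len = NEG, 0      # best path ending in N, and its match length

    for base in read[1:]:
        # Best way to be in N now: stay in N, or leave some match state.
        best, best_len = n_score, n_len
        for i, s in enumerate(match):
            if s == NEG:
                continue
            cand = s + (0.0 if i == L - 1 else math.log(p_exit))
            if cand > best:
                best, best_len = cand, i + 1
        # Advance one step along the germline segment.
        new_match = [NEG] * L
        for i in range(1, L):
            if match[i - 1] != NEG:
                new_match[i] = (match[i - 1] + math.log(1 - p_exit)
                                + log_emit(i, base))
        match = new_match
        n_score, n_len = best + log_uniform, best_len

    final_match = max(match)
    if final_match >= n_score:
        return final_match, len(read)
    return n_score, n_len

def assign_germline(read, germlines):
    """Pick the candidate germline segment with the highest Viterbi score."""
    scored = {name: best_alignment(read, seq) for name, seq in germlines.items()}
    best = max(scored, key=lambda name: scored[name][0])
    return best, scored[best]

# Made-up sequences, for illustration only.
germlines = {"geneA": "ACGTACGTAC", "geneB": "ACGTTTGGAC"}
name, (score, matched) = assign_germline("ACGTACGTGGGG", germlines)
print(name, matched)  # geneA 8
```

In this toy setting the read shares its first eight bases with the hypothetical `geneA`, so the best path leaves the match states after eight bases and assigns the remainder to the N state. A realistic model would also need D and J segments, trimming at both ends, and position-dependent mutation probabilities.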

  • ABSTRACT: Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
    Proceedings of the National Academy of Sciences 02/2015; 112(8). DOI:10.1073/pnas.1417683112 · 9.81 Impact Factor
  • ABSTRACT: Adaptive immune responses in humans rely on somatic genetic rearrangements of Ig and T-cell receptor loci to generate diverse antigen receptors. It is unclear to what extent an individual's genetic background affects the characteristics of the antibody repertoire used in responding to vaccination or infection. We studied the B-cell repertoires and clonal expansions in response to attenuated varicella-zoster vaccination in four pairs of adult identical twins and found that the global antibody repertoires of twin pair members showed high similarity in antibody heavy chain V, D, and J gene segment use, and in the length and features of the complementarity-determining region 3, a major determinant of antigen binding. These twin similarities were most pronounced in the IgM-expressing B-cell pools, but were seen to a lesser extent in IgG-expressing B cells. In addition, the degree of antibody somatic mutation accumulated in the B-cell repertoire was highly correlated within twin pair members. Twin pair members had greater numbers of shared convergent antibody sequences, including mutated sequences, suggesting similarity among memory B-cell clonal lineages. Despite these similarities in the memory repertoire, the B-cell clones used in acute responses to ZOSTAVAX vaccination were largely unique to each individual. Taken together, these results suggest that the overall B-cell repertoire is significantly shaped by the underlying germ-line genome, but that stochastic or individual-specific effects dominate the selection of clones in response to an acute antigenic stimulus.
    Proceedings of the National Academy of Sciences 12/2014; 112(2). DOI:10.1073/pnas.1415875112 · 9.81 Impact Factor
  • ABSTRACT: Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. Since somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared to existing general amino acid models, we show that the AB model provides a significantly better description of the somatic evolution of mouse and human antibody sequences, as demonstrated on large next-generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with the most frequent amino acid exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models.
    We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in large NGS data sets. The AB model is implemented in the open-source software CodonPhyML (http://sourceforge.net/projects/codonphyml) and can be downloaded and supplied by the user to ProGraphMSA (http://sourceforge.net/projects/prographmsa) or other alignment and phylogeny reconstruction programs that allow for user-defined substitution models. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Molecular Biology and Evolution 12/2014; 32(3). DOI:10.1093/molbev/msu340 · 14.31 Impact Factor