Gaeta, B.A. et al. iHMMune-align: hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences. Bioinformatics 23, 1580-1587 (2007).

School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia.
Bioinformatics (Impact Factor: 4.98). 08/2007; 23(13):1580-7. DOI: 10.1093/bioinformatics/btm147
Source: PubMed


Immunoglobulin heavy chain (IGH) genes in mature B lymphocytes are the result of recombination of IGHV, IGHD and IGHJ germline genes, followed by somatic mutation. The correct identification of the germline genes that make up a variable VH domain is essential to our understanding of the process of antibody diversity generation as well as to clinical investigations of some leukaemias and lymphomas.
We have developed iHMMune-align, an alignment program that uses a hidden Markov model (HMM) to model the processes involved in human IGH gene rearrangement and maturation. The performance of iHMMune-align was compared to that of other immunoglobulin gene alignment utilities using both clonally related and randomly selected IGH sequences. This evaluation suggests that iHMMune-align provides a more accurate identification of component germline genes than other currently available IGH gene characterization programs.
iHMMune-align cross-platform Java executable and web interface are freely available to academic users and can be accessed at



Available from: Andrew Michael Collins, Sep 03, 2015
    • "Many software focuses on V(D)J segmentation, identifying the V, D, and J regions in a sequence. The available V(D)J segmenters perform sequence alignments against full germline databases (JoinSolver [13], V-QUEST [9], HighV-QUEST [11]), possibly with some alignment heuristic ([14], IgBlast [15]), models such as hidden Markov models (HMMs) (iHMMune-align [16], SoDA2 [17]), or maximum-likelihood-based techniques (VDJSolver [18]). A short benchmark of some of these tools has been published [19], but there is the need for more complete and independent evaluation. "
    ABSTRACT: Background: V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. Results: We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. Conclusions: The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-409) contains supplementary material, which is available to authorized users.
    BMC Genomics 05/2014; 15(1):409. DOI:10.1186/1471-2164-15-409 · 3.99 Impact Factor
    • "Usually, this sequence is not available, so it is reconstructed by identifying the original gene segments used, based on highest homology to the mutated sequence; the germline junction region is then deduced from a consensus of all clonally related sequences. Several programs may be used for identifying the germline segments and the junction regions, such as IMGT/V-QUEST [18], SoDA [19] or iHMMune-align [20]. We currently use SoDA for our analyses as it is most convenient to use. "
    ABSTRACT: Immunoglobulin (that is, antibody) and T cell receptor genes are created through somatic gene rearrangement from gene segment libraries. Immunoglobulin genes are further diversified by somatic hypermutation and selection during the immune response. Studying the repertoires of these genes yields valuable insights into immune system function in infections, aging, autoimmune diseases and cancers. The introduction of high throughput sequencing has generated unprecedented amounts of repertoire and mutation data from immunoglobulin genes. However, common analysis programs are not appropriate for pre-processing and analyzing these data due to the lack of a template or reference for the whole gene. We present here the automated analysis pipeline we created for this purpose, which integrates various software packages of our own development and others', and demonstrate its performance. Our analysis pipeline presented here is highly modular, and makes it possible to analyze the data resulting from high-throughput sequencing of Ig genes, in spite of the lack of a template gene. An executable version of the Automation program (and its source code) is freely available for downloading from our website:
    Journal of Clinical Bioinformatics 08/2013; 3(1):15. DOI:10.1186/2043-9113-3-15
    • "All VH libraries were sequenced twice by independent runs on a GS FLX System (454 Life Sciences, Roche) (Margulies et al., 2005). Sequences were segregated based on DNA barcodes, and VH rearrangements were aligned to germline gene repertoires using the iHMMune-align algorithm (Gaëta et al., 2007; Jackson et al., 2010). Resulting alignments were parsed to obtain V, D, and J matches."
    ABSTRACT: Dengue is the most prevalent mosquito-borne viral disease in humans, and the lack of early prognostics, vaccines, and therapeutics contributes to immense disease burden. To identify patterns that could be used for sequence-based monitoring of the antibody response to dengue, we examined antibody heavy-chain gene rearrangements in longitudinal peripheral blood samples from 60 dengue patients. Comparing signatures between acute dengue, postrecovery, and healthy samples, we found increased expansion of B cell clones in acute dengue patients, with higher overall clonality in secondary infection. Additionally, we observed consistent antibody sequence features in acute dengue in the highly variable major antigen-binding determinant, complementarity-determining region 3 (CDR3), with specific CDR3 sequences highly enriched in acute samples compared to postrecovery, healthy, or non-dengue samples. Dengue thus provides a striking example of a human viral infection where convergent immune signatures can be identified in multiple individuals. Such signatures could facilitate surveillance of immunological memory in communities.
    Cell Host & Microbe 06/2013; 13(6):691-700. DOI:10.1016/j.chom.2013.05.008 · 12.33 Impact Factor