Gaeta, B.A. et al. iHMMune-align: hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences. Bioinformatics 23, 1580-1587 (2007).

School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia.
Bioinformatics (Impact Factor: 4.98). 08/2007; 23(13):1580-7. DOI: 10.1093/bioinformatics/btm147
Source: PubMed

ABSTRACT Immunoglobulin heavy chain (IGH) genes in mature B lymphocytes are the result of recombination of IGHV, IGHD and IGHJ germline genes, followed by somatic mutation. The correct identification of the germline genes that make up a variable VH domain is essential to our understanding of the process of antibody diversity generation as well as to clinical investigations of some leukaemias and lymphomas.
We have developed iHMMune-align, an alignment program that uses a hidden Markov model (HMM) to model the processes involved in human IGH gene rearrangement and maturation. The performance of iHMMune-align was compared to that of other immunoglobulin gene alignment utilities using both clonally related and randomly selected IGH sequences. This evaluation suggests that iHMMune-align provides a more accurate identification of component germline genes than other currently available IGH gene characterization programs.
iHMMune-align cross-platform Java executable and web interface are freely available to academic users and can be accessed at

Available from: Andrew Michael Collins, Sep 03, 2015
  • Source
    • "Many software focuses on V(D)J segmentation, identifying the V, D, and J regions in a sequence. The available V(D)J segmenters perform sequence alignments against full germline databases (JoinSolver [13], V-QUEST [9], HighV-QUEST [11]), possibly with some alignment heuristic ([14], IgBlast [15]), models such as hidden Markov models (HMMs) (iHMMune-align [16], SoDA2 [17]), or maximum-likelihood-based techniques (VDJSolver [18]). A short benchmark of some of these tools has been published [19], but there is the need for more complete and independent evaluation. "
    ABSTRACT: Background: V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. Results: We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. Conclusions: The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.
    BMC Genomics 05/2014; 15(1):409. DOI:10.1186/1471-2164-15-409 · 3.99 Impact Factor
  • Source
    • "Usually, this sequence is not available, so it is reconstructed by identifying the original gene segments used, based on highest homology to the mutated sequence; the germline junction region is then deduced from a consensus of all clonally related sequences. Several programs may be used for identifying the germline segments and the junction regions, such as IMGT/V-QUEST [18], SoDA [19] or iHMMune-align [20]. We currently use SoDA for our analyses as it is most convenient to use. "
    ABSTRACT: Immunoglobulin (that is, antibody) and T cell receptor genes are created through somatic gene rearrangement from gene segment libraries. Immunoglobulin genes are further diversified by somatic hypermutation and selection during the immune response. Studying the repertoires of these genes yields valuable insights into immune system function in infections, aging, autoimmune diseases and cancers. The introduction of high throughput sequencing has generated unprecedented amounts of repertoire and mutation data from immunoglobulin genes. However, common analysis programs are not appropriate for pre-processing and analyzing these data due to the lack of a template or reference for the whole gene. We present here the automated analysis pipeline we created for this purpose, which integrates various software packages of our own development and others', and demonstrate its performance. Our analysis pipeline presented here is highly modular, and makes it possible to analyze the data resulting from high-throughput sequencing of Ig genes, in spite of the lack of a template gene. An executable version of the Automation program (and its source code) is freely available for downloading from our website:
    Journal of Clinical Bioinformatics 08/2013; 3(1):15. DOI:10.1186/2043-9113-3-15
  • Source
    • "Here we demonstrate a new method for identifying clonally related sequences in large sets of rearranged IGH sequences, based on analysis of the highly variable CDR3 region of the VH domain. Sequences are partitioned using iHMMune-align [8] then clustered based on CDR3 similarity and common V and J genes. Clusters meeting an empirical quality criterion are then identified and extracted as sets of potentially clonally related sequences. "
    ABSTRACT: Clonal expansion of B lymphocytes coupled with somatic mutation and antigen selection allow the mammalian humoral immune system to generate highly specific immunoglobulins (IG) or antibodies against invading bacteria, viruses and toxins. The availability of high-throughput DNA sequencing methods is providing new avenues for studying this clonal expansion and identifying the factors guiding the generation of antibodies. The identification of groups of rearranged immunoglobulin gene sequences descended from the same rearrangement (clonally-related sets) in very large sets of sequences is facilitated by the availability of immunoglobulin gene sequence alignment and partitioning software that can accurately predict component germline gene, but has required painstaking visual inspection and analysis of sequences. We have developed and implemented an algorithm for identifying sets of clonally-related sequences in large human immunoglobulin heavy chain gene variable region sequence sets. The program processes sequences that have been partitioned using iHMMune-align, and uses pairwise comparisons of CDR3 sequences and similarity in IGHV and IGHJ germline gene assignments to construct a distance matrix. Agglomerative hierarchical clustering is then used to identify likely groups of clonally-related sequences. The program is available for download from The method was evaluated on several benchmark datasets and provided a more accurate and considerably faster identification of clonally-related immunoglobulin gene sequences than visual inspection by domain experts.
    Immunome Research 09/2010; 6 Suppl 1(Suppl 1):S4. DOI:10.1186/1745-7580-6-S1-S4