[show abstract][hide abstract] ABSTRACT: Proteins of the mammalian PYHIN (IFI200/HIN-200) family are involved in defence against infection through recognition of foreign DNA. The family member absent in melanoma 2 (AIM2) binds cytosolic DNA via its HIN domain and initiates inflammasome formation via its pyrin domain. AIM2 lies within a cluster of related genes, many of which are uncharacterised in mouse. To better understand the evolution, orthology and function of these genes, we have documented the range of PYHIN genes present in representative mammalian species, and undertaken phylogenetic and expression analyses.
No PYHIN genes are evident in non-mammals or monotremes, with a single member found in each of three marsupial genomes. Placental mammals show variable family expansions, from one gene in cow to four in human and 14 in mouse. A single HIN domain appears to have evolved in the common ancestor of marsupials and placental mammals, and duplicated to give rise to three distinct forms (HIN-A, -B and -C) in the placental mammal ancestor. Phylogenetic analyses showed that AIM2 HIN-C and pyrin domains clearly diverge from the rest of the family, and it is the only PYHIN protein with orthology across many species. Interestingly, although AIM2 is important in defence against some bacteria and viruses in mice, AIM2 is a pseudogene in cow, sheep, llama, dolphin, dog and elephant. The other 13 mouse genes have arisen by duplication and rearrangement within the lineage, which has allowed some diversification in expression patterns.
The role of AIM2 in forming the inflammasome is relatively well understood, but molecular interactions of other PYHIN proteins involved in defence against foreign DNA remain to be defined. The non-AIM2 PYHIN protein sequences are very distinct from AIM2, suggesting they vary in effector mechanism in response to foreign DNA, and may bind different DNA structures. The PYHIN family has highly varied gene composition between mammalian species due to lineage-specific duplication and loss, which probably indicates different adaptations for fighting infectious disease. Non-genomic DNA can indicate infection, or a mutagenic threat. We hypothesise that defence of the genome against endogenous retroelements has been an additional evolutionary driver for PYHIN proteins.
[show abstract][hide abstract] ABSTRACT: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution.
Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis.
A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/
Supplementary data are available at Bioinformatics online.