Article

Hotspots of mutation and breakage in dog and human chromosomes

Abstract

Sequencing of the dog genome allows an investigation of the location-dependent evolutionary processes that occurred since the common ancestor of primates and carnivores, approximately 95 million years ago. We investigated variations in G+C nucleotide fraction and synonymous nucleotide substitution rates (Ks) across dog and human genomes. Our results show that dog genes located either in subtelomeric and pericentromeric regions, or in short synteny blocks, possess significantly elevated G+C fraction and Ks values. Human subtelomeric, but not pericentromeric, genes also exhibit these elevations. We then examined 1.048 Gb of human sequence that is likely not to have been located near a primate telomere at any time since the common ancestor of dog and human. We observed that regions of highest G+C or Ks ("hotspots"; median sizes of 0.5 or 1.3 Mb, respectively) within this sequence were preferentially segregated to dog subtelomeres and pericentromeres during the rearrangements that eventually gave rise to the extant canine karyotype. Our data cannot be accounted for solely on the basis of gradually elevating G+C fractions in subtelomeric regions as a consequence of biased gene conversion. Rather, we propose that high G+C sequences are found preferentially within dog subtelomeres as a direct consequence of chromosomal fission occurring more frequently within regions elevated in G+C.

... Repetitive sequences. Repetitive sequences comprise approximately 50% of the mammal genomes 30 and are associated with syntenic breakpoints and chromosomal fragility [31][32][33] . Repetitive sequences of six species of mammals (horses 29 , humans 30 , mouse 34 , dogs 35 , cattle 36 and pigs 37 ) were examined in this study (Fig. 4a). ...
... We found that some types of repetitive sequences were significantly increased in the rearrangement regions (Fig. 4b, Supplementary Table S12). This result is consistent with previous findings [31][32] . Interestingly, the proportions of LINE_L1 and LTR_ERV1 increased, but the proportions of LINE_L2 and several other repetitive sequences decreased (Fig. 4c, Supplementary Table S13 to S16). ...
... Some studies have demonstrated that repetitive sequences are associated with syntenic breakpoints and chromosomal fragility 31,32 . This study did not reveal significant differences in repetitive sequences among different species and different chromosomes (except the X chromosome). ...
Article
Karyotypic diversification is more prominent in Equus species than in other mammals. Here, using next generation sequencing technology, we generated and de novo assembled quality genome sequences for a male wild horse (Przewalski's horse) and a male domestic horse (Mongolian horse), with about 93-fold and 91-fold coverage, respectively. Portions of the Y chromosome from the wild horse (3 Mbp) and Mongolian horse (2 Mbp) assemblies were also sequenced and de novo assembled. We confirmed a Robertsonian translocation event through the wild horse's chromosomes 23 and 24, which contained sequences that were highly homologous with those on the domestic horse's chromosome 5. The four main types of rearrangement (insertion of unknown origin, inserted duplication, inversion, and relocation) are not evenly distributed across the chromosomes: some chromosomes, such as the X chromosome, contain more rearrangements than others, and inversions are far less numerous than insertions and relocations in the horse genome. Furthermore, we discovered that the percentages of LINE_L1 and LTR_ERV1 are significantly increased in rearrangement regions. The analysis of these two representative Equus genomes improves our knowledge of Equus chromosome rearrangement and karyotype evolution.
... Estimated rates of chromosomal change for rodents are among the highest observed in mammals (Mouse Consortium 2002), but little is known about the effects of those rearrangements on the speciation process that separated mouse and rat. The mouse and rat genomes have been diverging for 12-24 Myr (Mouse Consortium 2002; Gibbs et al. 2004) and are very divergent from the common ancestor of eutherian mammals, both in terms of their highly rearranged genomes and in the relatively large number of nucleotide substitutions that they have accumulated at selectively neutral sites (Bourque et al. 2004; Mouse Consortium 2002; Gibbs et al. 2004; Webber and Ponting 2005). Considering all these facts, rodents are an excellent model to test for an association between chromosomal evolution and evolutionary rates. ...
... Given the results of recent papers, this kind of phenomenon may well be a major contributor to our observations. The relationship between rearrangement breakpoints and higher neutral evolution found in our papers has been recently confirmed by Webber and Ponting (2005), who analyzed the dog genome, thus suggesting a consistent effect throughout mammalian evolution. These authors found significant negative correlations between either G+C content or Ks and distance to a synteny breakpoint. ...
... This suggested that chromosomal rearrangements (especially chromosomal fissions) tend to happen in regions of ancestrally high GC content. In mammalian genomes, CpG dinucleotides tend to be mutated to TpG through methylation and deamination, leading to higher divergence measures (Ebersberger et al. 2002; Matassi et al. 1999; Webber and Ponting 2005; Yi et al. 2002). Thus, higher GC content could act at the same time as a source of chromosomal rearrangements (by an as yet undetermined mechanism) and as a source of higher mutation rates. ...
Article
Abstract
The main objectives of this work are:
a) To test the predictions of suppressed-recombination chromosomal speciation models on two different lineages of mammals: rodents and primates. Suppressed-recombination chromosomal speciation is still quite elusive as a mode of speciation in mammals. Experimental results are scarce, and the first objective of this work is to analyze whole-genome data looking for traces of chromosomal speciation events. Rodent and primate lineages were chosen for this search, not just because of their particular biological and cytological characteristics, which make them good candidates to have speciated by this mechanism, but also because they were the first mammalian organisms to be fully sequenced.
b) To study the effects of chromosomal rearrangements on genic evolutionary rates. As has been seen in the introduction, there are many potential interactions between chromosomal rearrangements and evolutionary rates, so the second goal of this work was to try to understand the impact of chromosomal rearrangements on substitution rates through mechanisms not related to speciation.
c) To distinguish the individual contributions of different genomic factors to the potential association between chromosomal rearrangements and evolutionary rates. The third main goal of this thesis was to discern among the different factors that could explain the many associations between chromosomal and genic evolution that have been detected in different studies.
... For a linear biological entity, such as a chromosome, the position along the entity is represented on the horizontal axis and the score, such as GC content, is represented on the vertical axis. This results in a line graph representing the GC content across the entire chromosome, which can then be used to identify GC content fluctuations across a chromosome [2]. ...
... The resulting landscapes can then be used for comparison across different chromosomes or genomes [2]. The vertical scale is not limited to interval or ratio values -it can also be ordinal values, such as binary. ...
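The landscape described in the two excerpts above (position on the horizontal axis, GC content on the vertical axis) can be sketched as a simple windowed scan. This is a minimal illustration only, not the method of any cited study: the function names `gc_fraction` and `gc_landscape`, the non-overlapping window scheme, and the toy sequence are all assumptions made for the example.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases; None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def gc_landscape(seq, window, step):
    """Return (window_start, gc_fraction) for every full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: an AT-only block, a GC-only block, and a mixed block.
toy = "ATATATATGCGCGCGCATATGCGC"
for start, gc in gc_landscape(toy, window=8, step=8):
    print(start, gc)
```

Plotting the returned pairs gives the line graph the excerpt describes. On real chromosomes the window would be far larger than in this toy case, and the choice of window and step sizes changes how sharply GC peaks stand out.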
... This suggests GC peaks may be targets of meiotic DSBs. Interestingly, GC rich regions also seem to be involved in genome rearrangements during canid genome evolution, where they have relocated to telomeric regions [28]. This could indicate that GC peaks are important targets of NAHR and often involved in rearrangements. ...
... However, it is also possible that it is simply GC-richness that promotes recombination, or that the peaks of high GC content are mainly a consequence of elevated recombination rate due to GC-biased gene conversion rather than its cause, which are detected as CpG islands regardless of methylation [41]. These results are particularly interesting in light of the suggestion that GC-rich regions have acted as novel target sites of chromosomal fissions during canid evolution [28]. ...
Article
Copy number variants (CNVs) account for substantial variation between genomes and are a major source of normal and pathogenic phenotypic differences. The dog is an ideal model to investigate mutational mechanisms that generate CNVs as its genome lacks a functional ortholog of the PRDM9 gene implicated in recombination and CNV formation in humans. Here we comprehensively assay CNVs using high-density array comparative genomic hybridization in 50 dogs from 17 dog breeds and 3 gray wolves. We use a stringent new method to identify a total of 430 high-confidence CNV loci, which range in size from 9 kb to 1.6 Mb and span 26.4 Mb, or 1.08%, of the assayed dog genome, overlapping 413 annotated genes. Of CNVs observed in each breed, 98% are also observed in multiple breeds. CNVs predicted to disrupt gene function are significantly less common than expected by chance. We identify a significant overrepresentation of peaks of GC content, previously shown to be enriched in dog recombination hotspots, in the vicinity of CNV breakpoints. A number of the CNVs identified by this study are candidates for generating breed-specific phenotypes. Purifying selection seems to be a major factor shaping structural variation in the dog genome, suggesting that many CNVs are deleterious. Localized peaks of GC content appear to be novel sites of CNV formation in the dog genome by non-allelic homologous recombination, potentially activated by the loss of PRDM9. These sequence features may have driven genome instability and chromosomal rearrangements throughout canid evolution.
... While the rebuttal of RBM caused a controversy [7, 43, 44], Peng et al., 2006 [36] and Alekseyev and Pevzner, 2007 [2] revealed some flaws in the arguments against FBM. Furthermore, the rebuttal of RBM was followed by many studies supporting FBM [6, 8, 9, 12, 14, 17, 19, 20, 21, 24, 27, 28, 29, 30, 31, 39, 40, 41, 46, 47, 49, 51]. Comparative analysis of the human chromosomes reveals many short adjacent regions corresponding to parts of several mouse chromosomes [16]. ...
... We demonstrate that data in [25] reveal rampant but elusive breakpoint reuse that cannot be detected via counting repeated breakages between various pairs of branches of the evolutionary tree. TFBM is an extension of FBM that reconciles seemingly contradictory results in [6, 8, 9, 12, 14, 17, 19, 20, 21, 24, 27, 28, 29, 30, 31, 39, 40, 41, 46, 47, 49, 51] and [25] and explains why they do not contradict each other. TFBM postulates that fragile regions have a limited lifespan and implies that they can migrate between different genomic locations. ...
Conference Paper
An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some doubts about their existence. We demonstrate that fragile regions are subject to a “birth and death” process, implying that fragility has limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.
... We assume that these CNVs have been sampled uniformly from those present in the human population. We tested whether CNVs occur more frequently, like synonymous substitutions [26], close to telomeres or to pericentromeres, and whether they contain unusually high densities of genes, repeats, or G + C base content. We also examined the relative evolutionary rates of CNV genes and their functions. ...
... Second, we found that the rates of synonymous substitution (K_S values) for genes within CNVs (median K_S = 0.653) are significantly higher (p = 1.5 × 10^-3) than those for non-CNV genes (median K_S = 0.593). As K_S values are known to be elevated in regions approaching telomeres [26], which are also overrepresented in CNVs (this report), we considered that these two observations might be causally connected. Nevertheless, the significant elevation in K_S persisted even when CNVs within 2 Mb from a telomeric end were discounted (p = 1.6 × 10^-2). ...
Article
Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs were observed to be unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease. Despite mouse CNVs also exhibiting a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs either suggest that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or suggest that they were preferentially fixed as a consequence of the larger effective population size of wild mice. 
It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.
... Expectedly, the association was more consistent in the gibbon genome, which is likely due to independent species-specific remodeling events in each lineage. The weakest overlap was observed in dog, possibly due to the highly rearranged nature of the dog genome compared to other mammals (Webber and Ponting, 2005). ...
Preprint
The relationship between evolutionary genome remodeling and the three-dimensional structure of the genome remains largely unexplored. Here we use the heavily rearranged gibbon genome to examine how evolutionary chromosomal rearrangements impact genome-wide chromatin interactions, topologically associating domains (TADs), and their epigenetic landscape. We use high-resolution maps of gibbon-human breaks of synteny (BOS), apply Hi-C in gibbon, measure an array of epigenetic features, and perform cross-species comparisons. We find that gibbon rearrangements occur at TAD boundaries, independent of the parameters used to identify TADs. This overlap is supported by a remarkable genetic and epigenetic similarity between BOS and TAD boundaries, namely presence of CpG islands and SINE elements, and enrichment in CTCF and H3K4me3 binding. Cross-species comparisons reveal that regions orthologous to BOS also correspond with boundaries of large (400-600kb) TADs in human and other mammalian species. The co-localization of rearrangement breakpoints and TAD boundaries may be due to higher chromatin fragility at these locations and/or increased selective pressure against rearrangements that disrupt TAD integrity. We also examine the small portion of BOS that did not overlap with TAD boundaries and gave rise to novel TADs in the gibbon genome. We postulate that these new TADs generally lack deleterious consequences. Lastly, we show that limited epigenetic homogenization occurs across breakpoints, irrespective of their time of occurrence in the gibbon lineage. Overall, our findings demonstrate remarkable conservation of chromatin interactions and epigenetic landscape in gibbons, in spite of extensive genomic shuffling.
... The relatively small number of EBRs in dog may be related to the fact that the dog presents the highest chromosome number among the Carnivora species. Previous studies revealed that EBRs are associated with several genomic features, such as high GC sequences [39], gene-rich regions [40], chromosome fragile sites [41], and elevated frequencies of segmental duplications and repeat elements [42]. In this study, the results showed that significant increases in gene density, GC content, and repeat elements were observed in the EBRs compared with the whole genome for the giant panda and dog, respectively, but no significant differences were detected between the cat EBRs and whole genome because the karyotype of the cat is closer to that of ancestral carnivore karyotype as compared to those of the dog and giant panda [3,43]. ...
Article
Background: Chromosome evolution is an important driver of speciation and species evolution. Previous studies have detected chromosome rearrangement events among different Carnivora species using chromosome painting strategies. However, few of these studies have focused on chromosome evolution at a nucleotide resolution due to the limited availability of chromosome-level Carnivora genomes. Although the de novo genome assembly of the giant panda is available, current short read-based assemblies are limited to moderately sized scaffolds, making the study of chromosome evolution difficult. Results: Here, we present a chromosome-level giant panda draft genome with a total size of 2.29 Gb. Based on the giant panda genome and published chromosome-level dog and cat genomes, we conduct six large-scale pairwise synteny alignments and identify evolutionary breakpoint regions. Interestingly, gene functional enrichment analysis shows that for all of the three Carnivora genomes, some genes located in evolutionary breakpoint regions are significantly enriched in pathways or terms related to sensory perception of smell. In addition, we find that the sweet receptor gene TAS1R2, which has been proven to be a pseudogene in the cat genome, is located in an evolutionary breakpoint region of the giant panda, suggesting that interchromosomal rearrangement may play a role in the cat TAS1R2 pseudogenization. Conclusions: We show that the combined strategies employed in this study can be used to generate efficient chromosome-level genome assemblies. Moreover, our comparative genomics analyses provide novel insights into Carnivora chromosome evolution, linking chromosome evolution to functional gene evolution.
... Although the nucleotide context surrounding a site is known to have a large impact on local mutation rates (Aggarwala and Voight 2016), the number of DNMs discovered here is underpowered to detect such effects. Additionally, we observed a statistically significant excess of DNMs on chromosome 10 (P = 0.004, multinomial test) and in subtelomeric regions, defined as 5 Mb from the ends of assembled chromosomes (Webber and Ponting 2005) (P = 0.012, binomial test), compared with segregating variants that were transmitted from parents to offspring (supplementary figs. S10 and S11 and supplementary table S1, Supplementary Material online). ...
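The subtelomeric definition and the binomial test mentioned in this excerpt can be sketched as follows. This is an illustration under stated assumptions, not the cited authors' pipeline: the helper names `is_subtelomeric` and `binomial_upper_tail`, the 0-based coordinates, and the one-sided form of the test are all assumed here, and the excerpt does not say whether the reported test was one- or two-sided. The counts in the usage lines are hypothetical and are not data from any study.

```python
from math import comb

# 5 Mb from either end of an assembled chromosome, following the quoted definition.
SUBTELOMERIC_BP = 5_000_000


def is_subtelomeric(pos, chrom_length, margin=SUBTELOMERIC_BP):
    """True if a 0-based position lies within `margin` bp of either chromosome end."""
    return pos < margin or pos >= chrom_length - margin


def binomial_upper_tail(k, n, p0):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs: 3 of 4 events fall in subtelomeric sequence, and the
# background proportion p0 (e.g. estimated from segregating variants) is 0.5.
print(is_subtelomeric(4_999_999, 60_000_000))
print(binomial_upper_tail(3, 4, 0.5))
```

In practice the background proportion `p0` would be estimated from the comparison set (in the excerpt, transmitted segregating variants) rather than fixed in advance.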
Article
Knowledge of mutation rates is crucial for calibrating population genetics models of demographic history in units of years. However, mutation rates remain challenging to estimate because of the need to identify extremely rare events. We estimated the nuclear mutation rate in wolves by identifying de novo mutations in a pedigree of seven wolves. Putative de novo mutations were discovered by whole-genome sequencing and were verified by Sanger sequencing of parents and offspring. Using stringent filters and an estimate of the false negative rate in the remaining observable genome, we obtain an estimate of ∼4.5 × 10^-9 per base pair per generation and provide conservative bounds of 2.6 × 10^-9 and 7.1 × 10^-9. Although our estimate is consistent with recent mutation rate estimates from ancient DNA (4.0 × 10^-9 and 3.0-4.5 × 10^-9), it implies a wider possible range. We also examined the consequences of our rate and the accompanying interval for dating several critical events in canid demographic history. For example, applying our full range of rates to coalescent models of dog and wolf demographic history implies a wide set of possible divergence times between the ancestral populations of dogs and extant Eurasian wolves (16,000-64,000 years ago), although our point estimate indicates a date between 25,000 and 33,000 years ago. Aside from one study in mice, ours provides the only direct mammalian mutation rate outside of primates, and is likely to be vital to future investigations of mutation rate evolution.
... This way, repetitive sequences may provide the fuel for karyotype variability, while coding regions retain high degree of sequence conservativity. The presence of elevated number of CMA+/GC-rich regions in both studied arapaimids may partly support our hypothesis as GC-rich regions, especially in conjunction with their terminal location on chromosomes, are more prone to high recombination rates (e.g., [86][87][88]). At the same time (or as an alternative explanation), higher flexibility of chromatin functional arrangement in interphase nuclei would be expected to be required to facilitate elevated plasticity for genome reshuffling and this flexibility might be, on the other hand, missing in Notopteridae fishes. ...
Article
Osteoglossiformes represents one of the most ancestral teleost lineages, currently widespread over almost all continents, except for Antarctica. However, data involving advanced molecular cytogenetics or comparative genomics are yet largely limited for this fish group. Therefore, the present investigations focus on the osteoglossiform family Arapaimidae, studying a unique fish model group with advanced molecular cytogenetic genomic tools. The aim is to better explore and clarify certain events and factors that had impact on evolutionary history of this fish group. For that, both South American and African representatives of Arapaimidae, namely Arapaima gigas and Heterotis niloticus, were examined. Both species differed markedly by diploid chromosome numbers, with 2n = 56 found in A. gigas and 2n = 40 exhibited by H. niloticus. Conventional cytogenetics along with fluorescence in situ hybridization revealed some general trends shared by most osteoglossiform species analyzed thus far, such as the presence of only one chromosome pair bearing 18S and 5S rDNA sites and karyotypes dominated by acrocentric chromosomes, resembling thus the patterns of hypothetical ancestral teleost karyotype. Furthermore, the genomes of A. gigas and H. niloticus display remarkable divergence in terms of repetitive DNA content and distribution, as revealed by comparative genomic hybridization (CGH). On the other hand, genomic diversity of single copy sequences studied through principal component analyses (PCA) based on SNP alleles genotyped by the DArT seq procedure demonstrated a very low genetic distance between the South American and African Arapaimidae species; this pattern contrasts sharply with the scenario found in other osteoglossiform species. Underlying evolutionary mechanisms potentially explaining the obtained data have been suggested and discussed.
... Chromosomal rearrangements essentially depend on these repeated sequences that are recombination hot spots, consisting of GC-rich sequences (Bailey et al., 2004). This is supported by the observation that groups with high recombination rates such as Muridae (Rodentia) have large quantities of GC repeats in pericentromeric regions as a result of the generation of neocentromeres after centric fission events (Webber and Ponting, 2005). This is clearly observed when rKDmicros are mapped onto the mammalian phylogeny ( Figure 1b): contrasting patterns are observed within the same phylogenetic group. ...
Article
Chromosomal rearrangements have a relevant role in organismic evolution. However, little is known about the mechanisms that lead different phylogenetic clades to have different chromosomal rearrangement rates. Here, we investigate the causes behind the wide karyotypic diversity exhibited by mammals. In particular, we analyzed the role of metabolic, reproductive, biogeographic and genomic characteristics on the rates of macro- and microstructural karyotypic diversification (rKD) using comparative phylogenetic methods. We found evidence that reproductive characteristics such as larger litter size per year and longevity, by allowing a higher number of meioses in absolute time, favor a higher probability of chromosomal change. Furthermore, families with large geographic distributions but containing species with restricted geographic ranges showed a greater probability of fixation of macrostructural chromosomal changes in different geographic areas. Finally, rKD does not evolve by Brownian motion because the mutation rate depends on the concerted evolution of repetitive sequences. The decisive factors of rKD evolution will be natural selection, genetic drift and meiotic drive that will eventually allow or not the fixation of the rearrangements. Our results indicate that mammalian karyotypic diversity is influenced by historical and adaptive mechanisms where reproductive and genomic factors modulate the rate of chromosomal change.
... Transposable elements have been invoked to explain the presence of multiple sex-determining regions in species of the Simuliidae, although direct evidence is still wanting and alternative models have been considered (Procunier 1982b, Bedo 1984, Brockhouse 1985). In support of a nonrandom model is the association of high G+C regions with areas of high-frequency breakage (Webber and Ponting 2005). Also relevant to general genome organization is the finding that AT-rich (heterochromatic) polytene bands of the Simulium vittatum complex are randomly dispersed throughout the complement (Procunier and Smith 1993). ...
Article
An extreme example of nonrandom rearrangements, especially inversion breaks, is described in the polytene chromosomes of the black fly Simulium bergi Rubtsov, 1956 from Armenia and Turkey. A total of 48 rearrangements was discovered, relative to the standard banding sequence for the subgenus Simulium Latreille, 1802. One rearrangement, an inversion (IIS-C) in the short arm of the second chromosome, was fixed. Six (12.5%) of the rearrangements were autosomal polymorphisms, and the remaining 41 (85.4%) were sex linked. More than 40 X- and Y-linked rearrangements, predominantly inversions, were clustered in the long arm of the second chromosome (IIL), representing about 15% of the total complement. The pattern conforms to a nonrandom model of chromosome breakage, perhaps associated with an underlying molecular mechanism.
... In fact, it appears that regions that are fragile with respect to rearrangements are also fragile with respect to the insertion of segmental duplications (see Chapter 1, Section 1.4.3). Other features, such as fragile sites, various repeated elements, GC content, and gene density, have been studied in the literature, but these studies are limited either to certain breakpoints or certain genomic regions (Gordon et al., 2007; Webber and Ponting, 2005) or by the resolution of the breakpoints considered (Murphy et al., 2005; Schibler et al., 2006; Ruiz-Herrera et al., 2006). We hope that the better resolution of our breakpoints will allow us to contribute new answers to these questions. We therefore propose to study the distribution of breakpoints along the human genome as a function of several genomic structures. ...
Article
Abridged version without copyrighted images
... Segmental duplication. In human and other eutherians, segmental duplications (defined as pairs of regions with ≥90% sequence similarity over ≥1 kb) are associated with chromosomal fragility and syntenic breakpoints 55,56 . The relative karyotypic stability of metatherians therefore indicated that they might have a low proportion of segmental duplications. ...
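The working definition of a segmental duplication quoted in this excerpt (a pair of regions with at least 90% sequence similarity over at least 1 kb) amounts to a two-threshold filter. A minimal sketch follows; the `AlignedPair` record and the function name are hypothetical, and the cited study's actual detection pipeline is not described in the excerpt.

```python
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """Hypothetical record for one pairwise alignment between two genomic regions."""
    identity: float  # fraction of identical aligned bases, in [0, 1]
    length_bp: int   # aligned length in base pairs


def is_segmental_duplication(pair, min_identity=0.90, min_length_bp=1_000):
    """Apply the quoted thresholds: at least 90% similarity over at least 1 kb."""
    return pair.identity >= min_identity and pair.length_bp >= min_length_bp


# Hypothetical alignments, for illustration only.
pairs = [AlignedPair(0.95, 2_500), AlignedPair(0.95, 800), AlignedPair(0.85, 5_000)]
print([is_segmental_duplication(p) for p in pairs])
```

Only the first hypothetical pair passes both thresholds; the second is too short and the third is too divergent.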
Article
Full-text available
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
... We further investigated whether variations of mutation rate or recombination rate would affect the power of these methods, because both of the rates are not uniform in different human genome regions (Webber and Ponting 2005; Coop et al. 2008; Berg et al. 2010; Conrad et al. 2011). In the scenarios with different mutation rates, all of these methods showed a trend of power increase along with increase of mutation rate (fig. ...
Article
Full-text available
Studies of natural selection, followed by functional validation, are shedding light on understanding of genetic mechanisms underlying human evolution and adaptation. Classic methods for detecting selection, such as the integrated haplotype score (iHS) and Fay&Wu's H statistic, are useful for candidate gene searching underlying positive selection. These methods, however, have limited capability to localize causal variants in selection target regions. In the present study, we developed a novel method based on conditional coalescent tree to detect recent positive selection by counting unbalanced mutations on coalescent gene genealogies. Extensive simulation studies revealed that our method is more robust than many other approaches against biases due to various demographic effects, including population bottleneck, expansion, or stratification, while not sacrificing its power. Furthermore, our method demonstrated its superiority in localizing causal variants from massive linked genetic variants. The rate of successful localization was about 20 to 40% higher than that of other state of the art methods on simulated data sets. On empirical data, validated functional causal variants of four well-known positive selected genes were all successfully localized by our method, such as ADH1B, MCM6, APOL1, and HBB. Finally, the computational efficiency of this new method was much higher than that of iHS implementations, i.e. 24 to 66 times faster than the REHH package, and more than 10,000 times faster than the original iHS implementation. These magnitudes make our method suitable for applying on large sequencing datasets. Software can be downloaded from https://github.com/wavefancy/scct.
... Moreover, this also holds for genes that became telomeric in one mammalian species and are not telomeric in other species. In the former case these genes accumulate nonsynonymous substitutions, whereas in the latter they do not (Webber, Ponting, 2005). Comparison of the human and dog genomes showed that this phenomenon is observed not only near telomeres but also in regions close to any evolutionary chromosome breaks, albeit to a lesser degree (Lindblad-Toh et al., 2005). ...
Article
Full-text available
Recent progress in sequencing mammalian genomes has facilitated their detailed comparison. This comparison can shed light on the patterns of genome evolution, including the reorganization of chromosomes over the course of evolution. The most important questions concern whether the positions of chromosomal rearrangements are random or predetermined, the features of chromosome organization in regions of evolutionary breaks, the relationship between evolutionary chromosome breaks and the positions of chromosomal aberrations arising in cancer cells, and the degree of reorganization of the ancestral genome in representatives of different mammalian orders. This paper reviews recently obtained data that shed light on these and some other aspects of mammalian chromosome evolution.
... With respect to the common ancestor of eutherian mammals (CAE, 2n = 42), their genome is substantially rearranged. However, mouse and rat genomes are also severely altered with respect to the CAE genome, as they are highly rearranged and have accumulated large numbers of nucleotide substitutions in neutral sites [22]. Nonetheless, the canine gene products seem to be more closely related to their human homologs than those of mice. ...
Article
Full-text available
Companion animals like dogs frequently develop tumors with age and similarly to human malignancies, display interpatient tumoral heterogeneity. Tumors are frequently characterized with regard to their mutation spectra, changes in gene expression or protein levels. Among others, these changes affect proteins involved in the DNA damage response (DDR), which served as a basis for the development of numerous clinically relevant cancer therapies. Even though the effects of different DNA damaging agents, as well as DDR kinetics, have been well characterized in mammalian cells in vitro, very little is so far known about the kinetics of DDR in tumor and normal tissues in vivo. Due to (i) the similarities between human and canine genomes, (ii) the course of spontaneous tumor development, as well as (iii) common exposure to environmental agents, canine tumors are potentially an excellent model to study DDR in vivo. This is further supported by the fact that dogs show approximately the same rate of tumor development with age as humans. Though similarities between human and dog osteosarcoma, as well as mammary tumors have been well established, only few studies using canine tumor samples addressed the importance of affected DDR pathways in tumor progression, thus leaving many questions unanswered. Studies in humans showed that misregulated DDR pathways play an important role during tumor development, as well as in treatment response. Since dogs are proposed to be a good tumor model in many aspects of cancer research, we herein critically investigate the current knowledge of canine DDR and discuss (i) its future potential for studies on the in vivo level, as well as (ii) its possible translation to veterinary and human medicine.
... The FBM postulates existence of fragile genomic regions that are more likely to be broken by rearrangements than the rest of the genome, implying (in contrast to the RBM) high breakpoint re-use rate. A variety of further studies argued for existence of fragile regions in mammalian genomes [12,13,14,15,16,17,18,19,20]. For example, Kikuta et al, 2007 [20] analyzed the links between genome fragility and the need to keep genome intact by regulatory elements and came to the conclusion that " the Nadeau and Taylor hypothesis is not possible for the explanation of synteny in general. ...
... However, Pevzner and Tesler, 2003 [3] recently refuted RBM and suggested an alternative Fragile Breakage Model of chromosome evolution. Murphy et al., 2005 [4] and a variety of other studies further argued for the existence of fragile regions in mammalian genomes [5, 6, 7, 8, 9, 10]. The standard rearrangement operations (reversals/translocations/fusions/fissions) can be modelled by making 2-breaks in a genome and gluing the resulting fragments in a new order. Most biologists believe that k-break rearrangements are unlikely for k > 3 and relatively rare for k = 3 (at least in mammalian evolution). ...
Article
Most genome rearrangements (e.g., reversals and translocations) can be represented as 2-breaks that break a genome at 2 points and glue the resulting fragments in a new order. Multi-break rearrangements break a genome into multiple fragments and further glue them together in a new order. While multi-break rearrangements were studied in depth for k = 2 breaks, the k-break distance problem for arbitrary k remains unsolved. We prove a duality theorem for multi-break distance problem and give a polynomial algorithm for computing this distance. © 2008 Elsevier B.V. All rights reserved.
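To make the idea of a 2-break concrete, the following is a minimal sketch of one such operation, a reversal, on a single linear chromosome. The representation (a list of signed integers standing for oriented synteny blocks) and the function name `reversal` are illustrative assumptions, not taken from the cited paper.

```python
def reversal(chromosome, i, j):
    """Apply a reversal to the segment chromosome[i:j].

    The chromosome is a list of signed integers (assumed representation):
    each integer is a synteny block, and its sign is the block's orientation.
    The operation breaks the chromosome at two points (before index i and
    before index j), then glues the middle fragment back in reverse order
    with every block's orientation flipped.
    """
    if not 0 <= i <= j <= len(chromosome):
        raise ValueError("breakpoints must satisfy 0 <= i <= j <= len(chromosome)")
    inverted = [-block for block in reversed(chromosome[i:j])]
    return chromosome[:i] + inverted + chromosome[j:]


if __name__ == "__main__":
    ancestral = [1, 2, 3, 4, 5]
    print(reversal(ancestral, 1, 4))
```

Applying the same reversal twice restores the original order, which is a quick sanity check on the two breakpoints chosen.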
... If we assume that recombination in dogs is targeted more at ancestrally GC-rich sequence than in other lineages, then this would predict that chromosomal rearrangements in dogs would occur preferentially at GC-rich sequence. In support of this prediction, Webber and Ponting (2005) previously reported that the evolution of the canid karyotype has been under the influence of chromosomal rearrangements that showed a tendency to target GC-rich sequence. It is therefore likely that the loss of PRDM9 influenced the distribution of recombination events and the nature of karyotype evolution in canids. ...
Article
Full-text available
Analysis of diverse eukaryotes has revealed that recombination events cluster in discrete genomic locations known as hotspots. In humans, a zinc-finger protein, PRDM9, is believed to initiate recombination in >40% of hotspots by binding to a specific DNA sequence motif. However, the PRDM9 coding sequence is disrupted in the dog genome assembly, raising questions regarding the nature and control of recombination in dogs. By analyzing the sequences of PRDM9 orthologs in a number of dog breeds and several carnivores, we show here that this gene was inactivated early in canid evolution. We next use patterns of linkage disequilibrium using more than 170,000 SNP markers typed in almost 500 dogs to estimate the recombination rates in the dog genome using a coalescent-based approach. Broad-scale recombination rates show good correspondence with an existing linkage-based map. Significant variation in recombination rate is observed on the fine scale, and we are able to detect over 4000 recombination hotspots with high confidence. In contrast to human hotspots, 40% of canine hotspots are characterized by a distinct peak in GC content. A comparative genomic analysis indicates that these peaks are present also as weaker peaks in the panda, suggesting that the hotspots have been continually reinforced by accelerated and strongly GC biased nucleotide substitutions, consistent with the long-term action of biased gene conversion on the dog lineage. These results are consistent with the loss of PRDM9 in canids, resulting in a greater evolutionary stability of recombination hotspots. The genetic determinants of recombination hotspots in the dog genome may thus reflect a fundamental process of relevance to diverse animal species.
... The subtelomeres harbor several multigene families, including the olfactory receptor and immunoglobulin heavy chain variable region genes which exhibit a fairly high level of sequence divergence as well as multiple duplication and deletion events (Linardopoulou et al. 2001; Das et al. 2008b). Recent studies suggest that the subtelomeric and the pericentromeric regions might be less constrained than other genomic regions for recombination, duplication, gene conversion, point mutation, and translocation (Linardopoulou et al. 2005; Webber and Ponting 2005). Thus, these regions may facilitate the birth and death of their harboring genes. ...
Article
Full-text available
Defensin genes encode small cationic antimicrobial peptides that form an important part of the innate immune system. They are divided into three families, alpha (α), beta (β), and theta (θ), according to arrangement of the disulfide bonding pattern between cysteine residues. Considering the functional importance of defensins, investigators have studied the evolution and the genomic organization of defensin genes. However, these studies have been restricted mainly to β-defensins. To understand the evolutionary dynamics of α-defensin genes among primates, we identified the α-defensin repertoires in human, chimpanzee, orangutan, macaque, and marmoset. The α-defensin genes in primates can be classified into three phylogenetic classes (class I, II, and III). The presence of all three classes in the marmoset indicates that their divergence occurred before the separation of New World and Old World monkeys. Comparative analysis of the α-defensin genomic clusters suggests that the makeup of the α-defensin gene repertoires between primates is quite different, as their genes have undergone dramatic birth-and-death evolution. Analysis of the encoded peptides of the α-defensin genes indicates that despite the overall high level of sequence divergence, certain amino acid residues or motifs are conserved within and between the three phylogenetic classes. The evolution of α-defensins in primates, therefore, appears to be governed by two opposing evolutionary forces. One force stabilizes specific amino acid residues and motifs to preserve the functional and structural integrity of the molecules and the other diversifies the sequences generating molecules with a wide range of activities against a large number of pathogens.
... Our results are consistent with some previous observations in studies made at a lower resolution [22,40]. Also, in more specific studies, these observations were made by looking at restricted parts of some genomes, for example when human/chicken evolutionary breakpoints were compared [24] between two chicken chromosomes (11 and 28), when human-dog orthologs were studied in high GC regions [41], or a whole set of human/gibbon or human/cattle breakpoints [42,43]. In the latter case, the presence of many translocation breakpoints in gene rich regions was interpreted as positive selection acting on those genes. ...
Article
Full-text available
The Intergenic Breakage Model, which is the current model of structural genome evolution, considers that evolutionary rearrangement breakages happen with a uniform propensity along the genome but are selected against in genes, their regulatory regions and in-between. However, a growing body of evidence shows that there exist regions along mammalian genomes that present a high susceptibility to breakage. We reconsidered this question taking advantage of a recently published methodology for the precise detection of rearrangement breakpoints based on pairwise genome comparisons. We applied this methodology between the genome of human and those of five sequenced eutherian mammals, which allowed us to delineate evolutionary breakpoint regions along the human genome with a finer resolution (median size 26.6 kb) than obtained before. We investigated the distribution of these breakpoints with respect to genome organisation into domains of different activity. In agreement with the Intergenic Breakage Model, we observed that breakpoints are under-represented in genes. Surprisingly however, the density of breakpoints in small intergenes (1 per Mb) appears significantly higher than in gene deserts (0.1 per Mb). More generally, we found a heterogeneous distribution of breakpoints that follows the organisation of the genome into isochores (breakpoints are more frequent in GC-rich regions). We then discuss the hypothesis that regions with an enhanced susceptibility to breakage correspond to regions of high transcriptional activity and replication initiation. We propose a model to describe the heterogeneous distribution of evolutionary breakpoints along human chromosomes that combines natural selection and a mutational bias linked to local open chromatin state.
... It seems, however, much more plausible that the pattern of recombination itself is dependent upon the distribution of open chromatin regions over the genome. Indeed, DNA duplications also occur more frequently in GC-rich compared to GC-poor isochores [44] and chromosomal fission takes place frequently within regions elevated in GC [46]. As already mentioned, in several cases the localizations of insertions/deletions in chromosomes indicate some specific preferences, such as those shown in Figure 5 and Table S1, which correspond to hot spots of recombination. ...
Article
Full-text available
The very recent availability of fully sequenced individual human genomes is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements. We mapped the insertions, deletions and SNPs (single nucleotide polymorphisms) that are present in Craig Venter's genome, more precisely on chromosomes 17 to 22, and compared them with the human reference genome hg17. Our results show that insertions and deletions are almost absent in L1 and generally scarce in L2 isochore families (GC-poor L1+L2 isochores represent slightly over half of the human genome), whereas they increase in GC-rich isochores, largely paralleling the densities of genes, retroviral integrations and Alu sequences. The distributions of insertions/deletions are in striking contrast with those of SNPs which exhibit almost the same density across all isochore families with, however, a trend for lower concentrations in gene-rich regions. Our study strongly suggests that the distribution of insertions/deletions is due to the structure of chromatin which is mostly open in gene-rich, GC-rich isochores, and largely closed in gene-poor, GC-poor isochores. The different distributions of insertions/deletions and SNPs are clearly related to the two different responsible mechanisms, namely recombination and point mutations.
... Supplement L compares MGRA and inferCARs on simulated data and illustrates that MGRA generates more accurate ancestral reconstructions for all choices of parameters. However, analyzing all these tools on simulated data may generate over-optimistic results since RBM does not reflect the realities of mammalian evolution (Bailey et al. 2004; van der Wind et al. 2004; Zhao et al. 2004; Murphy et al. 2005; Webber and Ponting 2005; Hinsch and Hannenhalli 2006; Ruiz-Herrera et al. 2006; Yue and Haaf 2006; Caceres et al. 2007; Gordon et al. 2007; Figure 7. The breakpoint graph G(M,R,D,Q,H,C) (the complete multi-edges are not shown) after MGRA Stage 1 (top panel) and after MGRA Stages 1–2 (bottom panel). ...
Article
Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomics analysis of entire genomes. We developed an algorithm MGRA for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of the multiple breakpoint graphs to overcome some limitations of the existing approaches to ancestral genome reconstructions. MGRA also generates the rearrangement-based characters guiding the phylogenetic tree reconstruction when the phylogeny is unknown.
... Pevzner and Tesler (2003b) came up with an alternative Fragile Breakage Model (FBM) that postulates existence of fragile genomic regions that are more likely to be broken by rearrangements than the rest of the genome, implying (in contrast to the RBM) high breakpoint re-use rate. A variety of further studies argued for existence of fragile regions in mammalian genomes (Murphy et al., 2005; van der Wind et al., 2004; Bailey et al., 2004; Zhao et al., 2004; Webber and Ponting, 2005; Hinsch and Hannenhalli, 2006; Ruiz-Herrera et al., 2006; Mehan et al., 2007; Kikuta et al., 2007; Caceres et al., 2007; Gordon et al., 2007). In the current paper, we extend the results from Alekseyev and Pevzner (2007a) to the case of linear genomes and provide the foundation for further identification and analysis of fragile regions in mammalian genomes (Alekseyev and Pevzner, 2008b). ...
Article
Multi-break rearrangements break a genome into multiple fragments and further glue them together in a new order. While 2-break rearrangements represent standard reversals, fusions, fissions, and translocations, 3-break rearrangements represent a natural generalization of transpositions. Alekseyev and Pevzner (2007a, 2008a) studied multi-break rearrangements in circular genomes and further applied them to the analysis of chromosomal evolution in mammalian genomes. In this paper, we extend these results to the more difficult case of linear genomes. In particular, we give lower bounds for the rearrangement distance between linear genomes and for the breakpoint re-use rate as functions of the number and proportion of transpositions. We further use these results to analyze comparative genomic architecture of mammalian genomes.
... Also, rearrangements may tend to occur or to be fixed in regions of relaxed purifying selection and, thus, of faster genic evolution [5,36]. Finally, chromosomal rearrangements (especially chromosomal fissions) have been found to be located in regions of ancestrally high GC content in mammals (at least in the dog genome) [46]. Thus, ancestral GC content could be contributing to the observed relationship between chromosomal rearrangements and higher mutation rates by means of methylation and deamination of CpG dinucleotides, leading to higher divergence measures in regions close to (and within) the rearrangements. ...
Article
Full-text available
It has been suggested that chromosomal rearrangements harbor the molecular footprint of the biological phenomena which they induce, in the form, for instance, of changes in the sequence divergence rates of linked genes. So far, all the studies of these potential associations have focused on the relationship between structural changes and the rates of evolution of single-copy DNA and have tried to exclude segmental duplications (SDs). This is paradoxical, since SDs are one of the primary forces driving the evolution of structure and function in our genomes and have been linked not only with novel genes acquiring new functions, but also with overall higher DNA sequence divergence and major chromosomal rearrangements. Here we take the opposite view and focus on SDs. We analyze several of the features of SDs, including the rates of intraspecific divergence between paralogous copies of human SDs and of interspecific divergence between human SDs and chimpanzee DNA. We study how divergence measures relate to chromosomal rearrangements, while considering other factors that affect evolutionary rates in single copy DNA. We find that interspecific SD divergence behaves similarly to divergence of single-copy DNA. In contrast, old and recent paralogous copies of SDs do present different patterns of intraspecific divergence. Also, we show that some relatively recent SDs accumulate in regions that carry inversions in sister lineages.
... The nonuniform distribution of CNVs may arise from nearby repetitive sequences, facilitating a duplication or deletion via nonallelic homologous recombination (NAHR) (Stankiewicz and Lupski 2002; Hurles 2005; Lupski and Stankiewicz 2005). CNVs, SDs, and, indeed, other fragile portions of genomes such as synteny breakpoints, are also associated with further mutational biases, such as elevated nucleotide substitution rates (Armengol et al. 2005; Webber and Ponting 2005; Nguyen et al. 2006). Genomic regions both rich in SDs and prone to recombination are expected also to be enriched in CNVs, as allelic homologous recombination and NAHR are intimately related (Lindsay et al. 2006). ...
Article
Full-text available
Copy number variation is a dominant contributor to genomic variation and may frequently underlie an individual's variable susceptibilities to disease. Here we question our previous proposition that copy number variants (CNVs) are often retained in the human population because of their adaptive benefit. We show that genic biases of CNVs are best explained, not by positive selection, but by reduced efficiency of selection in eliminating deleterious changes from the human population. Of four CNV data sets examined, three exhibit significant increases in protein evolutionary rates. These increases appear to be attributable to the frequent coincidence of CNVs with segmental duplications (SDs) that recombine infrequently. Furthermore, human orthologs of mouse genes, which, when disrupted, result in pre- or postnatal lethality, are unusually depleted in CNVs. Together, these findings support a model of reduced purifying selection (Hill-Robertson interference) within copy number variable regions that are enriched in nonessential genes, allowing both the fixation of slightly deleterious substitutions and increased drift of CNV alleles. Additionally, all four CNV sets exhibited increased rates of interspecies chromosomal rearrangement and nucleotide substitution and an increased gene density. We observe that sequences with high G+C contents are most prone to copy number variation. In particular, frequently duplicated human SD sequence, or CNVs that are large and/or observed frequently, tend to be elevated in G+C content. In contrast, SD sequences that appear fixed in the human population lie more frequently within low G+C sequence. These findings provide an overarching view of how CNVs arise and segregate in the human population.
... For small inversions in human, we used 1-kb intervals centered on each endpoint, covering 1.60 Mb. Our observations are summarized in Table 3. GC content around breakpoints is slightly higher than the genome average, but not as elevated as reported for dog chromosomal breaks by Webber and Ponting (2005). The breakpoint regions are substantially enriched for RefSeq genes, consistent with what Murphy et al. (2005) observed in larger (∼1 Mb) regions around breakpoints. ...
Article
This article analyzes mammalian genome rearrangements at higher resolution than has been published to date. We identify 3171 intervals, covering approximately 92% of the human genome, within which we find no rearrangements larger than 50 kilobases (kb) in the lineages leading to human, mouse, rat, and dog from their most recent common ancestor. Combining intervals that are adjacent in all contemporary species produces 1338 segments that may contain large insertions or deletions but that are free of chromosome fissions or fusions as well as inversions or translocations >50 kb in length. We describe a new method for predicting the ancestral order and orientation of those intervals from their observed adjacencies in modern species. We combine the results from this method with data from chromosome painting experiments to produce a map of an early mammalian genome that accounts for 96.8% of the available human genome sequence data. The precision is further increased by mapping inversions as small as 31 bp. Analysis of the predicted evolutionary breakpoints in the human lineage confirms certain published observations but disagrees with others. Although only a few mammalian genomes are currently sequenced to high precision, our theoretical analyses and computer simulations indicate that our results are reasonably accurate and that they will become highly accurate in the foreseeable future. Our methods were developed as part of a project to reconstruct the genome sequence of the last ancestor of human, dogs, and most other placental mammals.
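The comparison of GC content in 1-kb intervals centered on breakpoints, mentioned in the citing passage above, can be sketched as follows. This is a minimal illustration under stated assumptions: the sequence is an in-memory string, positions are 0-based, and the helper names `gc_fraction` and `window_around` are hypothetical rather than taken from any of the cited studies.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a DNA string, ignoring non-ACGT symbols.

    Ambiguous bases such as N are excluded from the denominator, so a
    window full of N yields NaN rather than a misleading 0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def window_around(seq, pos, width=1000):
    """Return the slice of seq of the given width centered on position pos.

    The window is truncated at the sequence ends instead of wrapping.
    """
    half = width // 2
    start = max(0, pos - half)
    end = min(len(seq), pos + half)
    return seq[start:end]


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
    toy = "AT" * 50 + "GC" * 50
    breakpoint_pos = 100  # hypothetical breakpoint at the junction
    print(gc_fraction(window_around(toy, breakpoint_pos, width=40)))
    print(gc_fraction(toy))
```

In practice the window fraction would be compared against a genome-wide or chromosome-wide average, as the cited studies describe; the toy sequence here only demonstrates the calculation.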
... The variation in G+C content in genomes is a complex phenomenon that probably interacts with the dI measure in multiple ways, such as their shared correlation with local recombination rate (Eyre-Walker 1992; Fullerton et al. 2001; Lander et al. 2001; Lercher and Hurst 2002; Waterston et al. 2002; Hardison et al. 2003; Hellmann et al. 2003), hypermutability of CpG dinucleotides, which more frequently occur in elevated G+C areas (Fryxell and Moon 2005), biased gene conversion to GC (Marais 2003) causing G+C content elevation in regions of high recombination, increased short interspersed nuclear element insertion in elevated G+C regions (Jurka 1997), and other effects. In particular, Webber and Ponting (2005) have observed elevated G+C content and human-dog dS values in dog genes in subtelomeric and pericentromeric regions or in syntenic blocks of <4 Mb, as well as human genes in subtelomeric regions, while noting that these phenomena are much less striking in the mouse and rat genomes. Due to the correlation between dS and dI, similar phenomena may affect dI, although the CpG hypermutability will affect intron sites differently from synonymous coding sites as noted earlier. ...
Article
Full-text available
Evolutionary biologists frequently rely on estimates of the neutral rate of evolution when characterizing the selective pressure on protein-coding genes. We introduce a new method to estimate this value based on intron nucleotide substitutions. The new method uses a metascript model that considers alternative splicing forms and an algorithm to pair orthologous introns, which we call Introndeuce. We compare the intron method with a widely used method that uses observed substitutions in synonymous coding nucleotides, by using both methods to estimate the neutral rate for human-dog and mouse-rat comparisons. The estimates of the 2 methods correlate strongly (r(S) = 0.75), but cannot be considered directly equivalent. We also investigate the effect of alignment error and G + C content on the variance in the intron method: in both cases there is an effect, and it is species-pair specific. Although the intron method may be more useful for shorter evolutionary distances, it is less useful at longer distances due to the poor alignment of less-conserved positions.
... However, the opossum genome had not yet been assembled, and the authors had to resort to chicken, which diverged ~310 Mya from the mammalian lineage, considerably earlier than the opossum did. Moreover, there is strong evidence for hotspots of breakage [12] and breakpoint reuse [13], discounting the "random breakage" model. The use of (nuclear) gene orderings to analyze rearrangements further exacerbates these issues, as it affords little power to resolve breakpoints and artificially increases inhomogeneities in breakage rates, because of large and highly variable intergenic distances. ...
Article
Full-text available
Article
A comprehensive, domain-wide comparative analysis of genomic imprinting between mammals that imprint and those that do not can provide valuable information about how and why imprinting evolved. The imprinting status, DNA methylation, and genomic landscape of the Dlk1-Dio3 cluster were determined in eutherian, metatherian, and prototherian mammals including tammar wallaby and platypus. Imprinting across the whole domain evolved after the divergence of eutherian from marsupial mammals and in eutherians is under strong purifying selection. The marsupial locus, at 1.6 megabases, is double that of eutherians due to the accumulation of LINE repeats. Comparative sequence analysis of the domain in seven vertebrates determined evolutionary conserved regions common to particular sub-groups and to all vertebrates. The emergence of Dlk1-Dio3 imprinting in eutherians has occurred on the maternally inherited chromosome and is associated with region-specific resistance to expansion by repetitive elements and the local introduction of noncoding transcripts including microRNAs and C/D small nucleolar RNAs. A recent mammal-specific retrotransposition event led to the formation of a completely new gene only in the eutherian domain, which may have driven imprinting at the cluster. Citation: Edwards CA, Mungall AJ, Matthews L, Ryder E, Gray DJ, et al. (2008) The evolution of the DLK1-DIO3 imprinted domain in mammals. PLoS Biol 6(6): e135. doi:10.1371/journal.pbio.0060135
Article
Full-text available
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Chapter
The dog genome illuminates several important features of mammalian genome evolution. The dog is a distant relative to the primate and rodent clades, so its genome sequence can be used to identify genetic changes that occurred on these lineages and as an aid in annotating the human genome. The dog also has a complex population history, shaped by ancient domestication events and, recently, strong artificial selection to form distinct breeds. The dog genome sequence enabled studies that shed light on canid phylogeny, revealing that this history created a genomic haplotype structure that is highly amenable to association mapping. Dogs also exhibit enormous phenotypic variation and suffer from many of the same disorders as humans, making them a powerful model organism for identifying genetic variants responsible for many common traits and diseases including cancer, diabetes, autoimmune disorders and epilepsy. Key concepts: The dog is distantly related to the primate and rodent clades. Dog genome evolution was characterized by a large number of chromosomal fissions and a low activity of transposons. Genetic evidence supports an ancient origin of dogs in East Asia. Dog domestication involved two population bottlenecks, one during dog domestication, over 15 000 years ago, and one during the formation of modern breeds in the past few hundred years. Modern dog breeds exhibit remarkable phenotypic diversity and increased rates of common diseases shared with humans, including cancer, diabetes, autoimmune disorders and epilepsy. Patterns of genetic variation in dogs are highly amenable to trait mapping by association studies.
Chapter
The emergence of high-quality mammalian whole genome sequence data has enabled for the first time a comprehensive investigation of the molecular mechanisms of mammalian chromosome evolution. New sequence data reveal an unexpected degree of chromosomal plasticity, both in the healthy human population and in cross-species evolutionary comparisons. In the context of the evolutionary framework established by comparative cytogenetics, the new data reveal lineage-specific chromosomal rearrangement patterns often linked to particular duplicated or repetitive sequences. Multispecies comparisons indicate evolutionary reuse of certain chromosomal breakpoints as well as some associations of breakpoints with cytogenetic chromosomal landmarks. Keywords: genome plasticity; segmental duplications; karyotype evolution; chromosome rearrangement mechanisms; comparative genetics
Article
Full-text available
Specific features of the evolution of chromosomes in mammals and other amniotes are reviewed. A comparative analysis of the chromosome architecture revealed the nonrandom distribution of chromosome rearrangement sites in the genome, the probable role of chromosome rearrangement in adaptation, and the evolution-mediated selection of conserved chromosome blocks. Chromosome sites stable during evolution are enriched for genes that contribute to early organism development. Rearrangements within these blocks are incompatible with the survival of the organism. Further analysis of chromosome evolution requires more information about completely sequenced genomes. Key words: comparative genomics; genome evolution; amniotes
Conference Paper
Multi-break rearrangements break a genome into multiple fragments and then glue them together in a new order. While 2-break rearrangements represent the standard reversal, fusion, fission, and translocation operations, 3-break rearrangements are a natural generalization of transpositions and inverted transpositions. Multi-break rearrangements in circular genomes were studied in depth in [1] and were further applied to the analysis of chromosomal evolution in mammalian genomes [2]. In this paper we extend these results to the more difficult case of linear genomes. In particular, we give lower bounds for the rearrangement distance between linear genomes and use these results to analyze the comparative genomic architecture of mammalian genomes.
Article
Cerastoderma edule (Cardiidae) has a diploid chromosome number of 2n = 38, its karyotype consisting of 12 submetacentric, 4 subtelocentric and 3 telocentric chromosome pairs. Hyperdiploid cells had previously been observed in two populations of the Northern Galician coasts (northwest of Spain). The supernumerary chromosomes are easily distinguished by their reduced differentiated size and by their intra- and inter-individual variability. After the recent observation of 35% of cells with supernumerary chromosomes in a population of the Southern Galician coasts (Vigo) and 15% of cells with supernumerary chromosomes in a population of the south of Portugal (Ria Formosa, Algarve), we attempted, in this paper, to elucidate the nature of these supernumerary chromosomes by applying a differential banding technique with restriction enzymes to these hyperdiploid cells. Analysis of the restriction enzyme banding of the 2n > 38 karyotypes led us to propose the occurrence of a chromosomal fission event involving the largest submetacentric chromosome pair. This study represents the first description of a possible chromosomal fission in marine bivalves. Different levels of environmental pollution are suggested as a possible explanation for the differences observed in the proportion of hyperdiploid cells between the Southern Portugal population and the three Galician ones.
Article
Full-text available
An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals raised some doubts about their existence. Here we demonstrate that fragile regions are subject to a birth and death process, implying that fragility has a limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions as a phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.
Article
The suggestion that chromosomal rearrangements play a role in speciation resulted from the observation that heterokaryotypes are often infertile. However, the first chromosomal speciation models were unsatisfactory and data available to test them was scarce. Recently, large amounts of data have become available and new theoretical models have been developed explaining how rearrangements facilitate speciation in the face of gene flow. Here, we re-examine theoretical predictions and revisit different sources of data. Although rearrangements are often associated with increased levels of divergence, unequivocal demonstration that their role in suppressing recombination results in speciation is often lacking. Finally, we question some previous predictions and suggest new empirical and theoretical approaches to understanding the relevance of rearrangements in the origin of species.
Article
The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and the evidence for great diversity in genome size, karyotype, and sex chromosome diversity, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome (predicted to be large and to possess a diverse repeat landscape on par with mammals and a birdlike sex chromosome system) to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates of different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.
Article
Full-text available
Recombination is typically thought of as a symmetrical process resulting in large-scale reciprocal genetic exchanges between homologous chromosomes. Recombination events, however, are also accompanied by short-scale, unidirectional exchanges known as gene conversion in the neighborhood of the initiating double-strand break. A large body of evidence suggests that gene conversion is GC-biased in many eukaryotes, including mammals and humans. AT/GC heterozygotes produce more GC- than AT-gametes, thus conferring a population advantage to GC-alleles in high-recombining regions. This apparently unimportant feature of our molecular machinery has major evolutionary consequences. Structurally, GC-biased gene conversion explains the spatial distribution of GC-content in mammalian genomes (the so-called isochore structure). Functionally, GC-biased gene conversion promotes the segregation and fixation of deleterious AT→GC mutations, thus increasing our genomic mutation load. Here we review the recent evidence for a GC-biased gene conversion process in mammals, and its consequences for genomic landscapes, molecular evolution, and human functional genomics.
Article
Metazoan genomes are being sequenced at an increasingly rapid rate. For each new genome, the number of protein-coding genes it encodes and the amount of functional DNA it contains are known only inaccurately. Nevertheless, there have been considerable recent advances in identifying protein-coding and non-coding sequences that have remained constrained in diverse species. However, these approaches struggle to pinpoint genomic sequences that are functional in some species but that are absent or not functional in others. Yet it is here, encoded in lineage-specific and functional sequence, that we expect physiological differences between species to be most concentrated.
Article
Full-text available
The domestic dog exhibits greater diversity in body size than any other terrestrial vertebrate. We used a strategy that exploits the breed structure of dogs to investigate the genetic basis of size. First, through a genome-wide scan, we identified a major quantitative trait locus (QTL) on chromosome 15 influencing size variation within a single breed. Second, we examined genetic variation in the 15-megabase interval surrounding the QTL in small and giant breeds and found marked evidence for a selective sweep spanning a single gene (IGF1), encoding insulin-like growth factor 1. A single IGF1 single-nucleotide polymorphism haplotype is common to all small breeds and nearly absent from giant breeds, suggesting that the same causal sequence variant is a major contributor to body size in all small dogs.
Article
Full-text available
Ore mineral and host lithologies have been sampled with 89 oriented samples from 14 sites in the Naica District, northern Mexico. Magnetic parameters allow the samples to be characterised: saturation magnetization, density, low- and high-temperature magnetic susceptibility, remanence intensity, Koenigsberger ratio, Curie temperature and hysteresis parameters. Rock magnetic properties are controlled by variations in titanomagnetite content and hydrothermal alteration. Post-mineralization hydrothermal alteration seems to be the major event that affected the minerals and magnetic properties. Curie temperatures are characteristic of titanomagnetites or titanomaghemites. Hysteresis parameters indicate that most samples have pseudo-single domain (PSD) magnetic grains. Alternating field (AF) demagnetization and isothermal remanence (IRM) acquisition both indicate that natural and laboratory remanences are carried by MD-PSD spinels in the host rocks. The trend of NRM intensity vs susceptibility suggests that the carrier of remanent and induced magnetization is the same in all cases (spinels). The Koenigsberger ratio ranges from 0.05 to 34.04, indicating the presence of MD and PSD magnetic grains. Constraints on the geometry of the intrusive source body developed in the model of the magnetic anomaly are obtained by quantifying the relative contributions of induced and remanent magnetization components.
Article
Full-text available
In the yeast Saccharomyces cerevisiae, meiotic recombination is initiated by double-strand DNA breaks (DSBs). Meiotic DSBs occur at relatively high frequencies in some genomic regions (hotspots) and relatively low frequencies in others (coldspots). We used DNA microarrays to estimate variation in the level of nearby meiotic DSBs for all 6,200 yeast genes. Hotspots were nonrandomly associated with regions of high G + C base composition and certain transcriptional profiles. Coldspots were nonrandomly associated with the centromeres and telomeres.
Article
Full-text available
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Article
Full-text available
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome (composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes) provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Article
Full-text available
The CpG dinucleotide is present at approximately 20% of its expected frequency in vertebrate genomes, a deficiency thought due to a high mutation rate from the methylated form of CpG to TpG and CpA. We examine the hypothesis that the 20% frequency represents an equilibrium between rate of creation of new CpGs and accelerated rate of CpG loss from methylation. Using this model, we calculate the expected reduction in the equilibrium frequency of the CpG dinucleotide and find that the observed CpG deficiency can be explained by mutation from methylated CpG to TpG/CpA at approximately 12 times the normal transition rate, the exact rate depending on the ratio of transitions to transversions. The observed rate of CpG dinucleotide loss in a human alpha-globin nonprocessed pseudogene, psi alpha 1, and the apparent replenishment of the CpG pool in this sequence by new mutations, agree with the above parameters. These calculations indicate that it would take 25 million years or less, a small fraction of the time for vertebrate evolution, for CpG frequency to be reduced from undepleted levels to the current depleted levels.
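The "expected frequency" in the abstract above is conventionally the product of the C and G base frequencies. As a minimal illustrative sketch (not the authors' equilibrium model), the observed/expected CpG ratio of a sequence could be computed as follows. The function name `cpg_obs_exp` and the choice to normalise by sequence length rather than by the number of dinucleotide positions are assumptions made for this example.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence (hypothetical helper).

    Expected CpG count is taken as (count of C * count of G) / length,
    so the ratio is (count of CpG * length) / (count of C * count of G).
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        # No C or no G: no CpG is possible, so report 0.0 by convention.
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    n_cpg = s.count("CG")
    return n_cpg * len(s) / (n_c * n_g)


if __name__ == "__main__":
    # Toy sequence, for illustration only.
    print(cpg_obs_exp("CCGG"))
```

A ratio near 1 means CpG occurs about as often as base composition alone predicts; the abstract reports a value of roughly 0.2 for vertebrate genomes.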
Article
Full-text available
In the yeast Saccharomyces cerevisiae, meiotic recombination is initiated by transient DNA double-strand breaks (DSBs) that are repaired by interaction of the broken chromosome with its homologue. To identify a large number of DSB sites and gain insight into the control of DSB formation at both the local and the whole chromosomal levels, we have determined at high resolution the distribution of meiotic DSBs along the 340 kb of chromosome III. We have found 76 DSB regions, mostly located in intergenic promoter-containing intervals. The frequency of DSBs varies at least 50-fold from one region to another. The global distribution of DSB regions along chromosome III is nonrandom, defining large (39-105 kb) chromosomal domains, both hot and cold. The distribution of these localized DSBs indicates that they are likely to initiate most crossovers along chromosome III, but some discrepancies remain to be explained.
Article
Full-text available
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
Article
Full-text available
Nucleotide substitution rates and G + C content vary considerably among mammalian genes. It has been proposed that the mammalian genome comprises a mosaic of regions - termed isochores - with differing G + C content. The regional variation in gene G + C content might therefore be a reflection of the isochore structure of chromosomes, but the factors influencing the variation of nucleotide substitution rate are still open to question. To examine whether nucleotide substitution rates and gene G + C content are influenced by the chromosomal location of genes, we compared human and murid (mouse or rat) orthologues known to belong to one of the chromosomal (autosomal) segments conserved between these species. Multiple members of gene families were excluded from the dataset. Sets of neighbouring genes were defined as those lying within 1 centiMorgan (cM) of each other on the mouse genetic map. For both synonymous substitution rates and G + C content at silent sites, neighbouring genes were found to be significantly more similar to each other than sets of genes randomly drawn from the dataset. Moreover, we demonstrated that the regional similarities in G + C content (isochores) and synonymous substitution rate were independent of each other. Our results provide the first substantial statistical evidence for the existence of a regional variation in the synonymous substitution rate within the mammalian genome, indicating that different chromosomal regions evolve at different rates. This regional phenomenon which shapes gene evolution could reflect the existence of 'evolutionary rate units' along the chromosome.
Article
Full-text available
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Article
Full-text available
There is considerable interest in understanding patterns of linkage disequilibrium (LD) in the human genome, to aid investigations of human evolution and facilitate association studies in complex disease. The relative influences of meiotic crossover distribution and population history on LD remain unclear, however. In particular, it is uncertain to what extent crossovers are clustered into 'hot spots' that might influence LD patterns. As a first step to investigating the relationship between LD and recombination, we have analyzed a 216-kb segment of the class II region of the major histocompatibility complex (MHC) already characterized for familial crossovers. High-resolution LD analysis shows the existence of extended domains of strong association interrupted by patchwork areas of LD breakdown. Sperm typing shows that these areas correspond precisely to meiotic crossover hot spots. All six hot spots defined share a remarkably similar symmetrical morphology but vary considerably in intensity, and are not obviously associated with any primary DNA sequence determinants of hot-spot activity. These hot spots occur in clusters and together account for almost all crossovers in this region of the MHC. These data show that, within the MHC at least, crossovers are far from randomly distributed at the molecular level and that recombination hot spots can profoundly affect LD patterns.
Article
Full-text available
Determination of recombination rates across the human genome has been constrained by the limited resolution and accuracy of existing genetic maps and the draft genome sequence. We have genotyped 5,136 microsatellite markers for 146 families, with a total of 1,257 meiotic events, to build a high-resolution genetic map meant to: (i) improve the genetic order of polymorphic markers; (ii) improve the precision of estimates of genetic distances; (iii) correct portions of the sequence assembly and SNP map of the human genome; and (iv) build a map of recombination rates. Recombination rates are significantly correlated with both cytogenetic structures (staining intensity of G bands) and sequence (GC content, CpG motifs and poly(A)/poly(T) stretches). Maternal and paternal chromosomes show many differences in locations of recombination maxima. We detected systematic differences in recombination rates between mothers and between gametes from the same mother, suggesting that there is some underlying component determined by both genetic and environmental factors that affects maternal recombination rates.
Article
Full-text available
Canidae species fall into two categories with respect to their chromosome composition: those with high numbered largely acrocentric karyotypes and others with a low numbered principally metacentric karyotype. Those species with low numbered metacentric karyotypes are derived from multiple independent fusions of chromosome segments found as acrocentric chromosomes in the high numbered species. Extensive chromosome homology is apparent among acrocentric chromosome arms within Canidae species; however, little chromosome arm homology exists between Canidae species and those from other Carnivore families. Here we use Zoo-FISH (fluorescent in situ hybridization, also called chromosomal painting) probes from flow-sorted chromosomes of the Japanese raccoon dog (Nyctereutes procyonoides) to examine two phylogenetically divergent canids, the arctic fox (Alopex lagopus) and the crab-eating fox (Cerdocyon thous). The results affirm intra-canid chromosome homologies, also implicated by G-banding. In addition, painting probes from domestic cat (Felis catus), representative of the ancestral carnivore karyotype (ACK), and giant panda (Ailuropoda melanoleuca) were used to define primitive homologous segments apparent between canids and other carnivore families. Canid chromosomes seem unique among carnivores in that many canid chromosome arms are mosaics of two to four homology segments of the ACK chromosome arms. The mosaic pattern apparently preceded the divergence of modern canid species since conserved homology segments among different canid species are common, even though those segments are rearranged relative to the ancestral carnivore genome arrangement. The results indicate an ancestral episode of extensive centric fission leading to an ancestral canid genome organization that was subsequently reorganized by multiple chromosome fusion events in some but not all Canidae lineages.
Article
Full-text available
To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC→AT substitutions over AT→GC substitutions: GC-rich isochores are slowly disappearing from the genome of these two mammalian orders. In rodents, our analyses suggest both a decrease in GC content of GC-rich isochores and an increase in GC-poor isochores, but more data will be necessary to assess the significance of this pattern. These observations question the conclusions of previous works that assumed that base composition was at equilibrium. Analysis of allele frequency in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias appears not strong enough to overcome the large excess of GC→AT mutations. Thus, whatever the evolutionary force (neutral or selective) at the origin of GC-rich isochores, this force is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals.
Article
Full-text available
Six measures of evolutionary change in the human genome were studied, three derived from the aligned human and mouse genomes in conjunction with the Mouse Genome Sequencing Consortium, consisting of (1) nucleotide substitution per fourfold degenerate site in coding regions, (2) nucleotide substitution per site in relics of transposable elements active only before the human-mouse speciation, and (3) the nonaligning fraction of human DNA that is nonrepetitive or in ancestral repeats; and three derived from human genome data alone, consisting of (4) SNP density, (5) frequency of insertion of transposable elements, and (6) rate of recombination. Features 1 and 2 are measures of nucleotide substitutions at two classes of "neutral" sites, whereas 4 is a measure of recent mutations. Feature 3 is a measure dominated by deletions in mouse, whereas 5 represents insertions in human. It was found that all six vary significantly in megabase-sized regions genome-wide, and many vary together. This indicates that some regions of a genome change slowly by all processes that alter DNA, and others change faster. Regional variation in all processes is correlated with, but not completely accounted for, by GC content in human and the difference between GC content in human and mouse.
Article
Full-text available
Competing hypotheses for the timing of the placental mammal radiation focus on whether extant placental orders originated and diversified before or after the Cretaceous-Tertiary (KT) boundary. Molecular studies that have addressed this issue suffer from single calibration points, unwarranted assumptions about the molecular clock, and/or taxon sampling that lacks representatives of all placental orders. We investigated this problem using the largest available molecular data set for placental mammals, which includes segments of 19 nuclear and three mitochondrial genes for representatives of all extant placental orders. We used the Thorne-Kishino method, which permits simultaneous constraints from the fossil record and allows rates of molecular evolution to vary on different branches of a phylogenetic tree. Analyses that used different sets of fossil constraints, different priors for the base of Placentalia, and different data partitions all support interordinal divergences in the Cretaceous followed by intraordinal diversification mostly after the KT boundary. Four placental orders show intraordinal diversification that predates the KT boundary, but only by an average of 10 million years. In contrast to some molecular studies that date the rat-mouse split as old as 46 million years, our results show improved agreement with the fossil record and place this split at 16-23 million years. To test the hypothesis that molecular estimates of Cretaceous divergence times are an artifact of increased body size subsequent to the KT boundary, we also performed analyses with a "KT body size" taxon set. In these analyses, interordinal splits remained in the Cretaceous.
Article
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.
Article
The compositional distributions of coding sequences and DNA molecules (in the 50-100-kb range) are remarkably narrower in murids (rat and mouse) compared to humans (as well as to all other mammals explored so far). In murids, both distributions begin at higher and end at lower GC values. A comparison of homologous coding sequences from murids and humans revealed that their different compositional distributions are due to differences in GC levels in all three codon positions, particularly of genes located at both ends of the distribution. In turn, these differences are responsible for differences in both codon usage and amino acids. When GC levels at first+second codon positions and third codon positions, respectively, of murid genes are plotted against corresponding GC levels of homologous human genes, linear relationships (with very high correlation coefficients and slopes of about 0.78 and 0.60, respectively) are found. This indicates a conservation of the order of GC levels in homologous genes from humans and murids. (The same comparison for mouse and rat genes indicates a conservation of GC levels of homologous genes.) A similar linear relationship was observed when plotting GC levels of corresponding DNA fractions (as obtained by density gradient centrifugation in the presence of a sequence-specific ligand) from mouse and human. These findings indicate that orderly compositional changes affecting not only coding sequences but also noncoding sequences took place since the divergence of murids. Such directional fixations of mutations point to the existence of selective pressures affecting the genome as a whole.
Article
Calf DNA preparations having molecular weights of 5 to 7 × 10⁶ have been fractionated by preparative Cs₂SO₄-Ag⁺ density gradient centrifugation into a number of components. These may be divided into three groups: (1) the main DNA component (1.697 g/cm³; all densities quoted are those determined in CsCl density gradients), the 1.704 and 1.709 g/cm³ components form about 50, 25 and 10% of the genome, respectively; they are characterized by having symmetrical CsCl bands and melting curves, both of which have standard deviations close to those of bacterial DNAs of comparable molecular weight, and by their G + C contents being equal to 39, 48 and 54%, respectively; after heat-denaturation and reannealing, their buoyant densities in CsCl are greater than native DNA by 12, 10 and 3 mg/cm³, respectively. (2) The 1.705, 1.710, 1.714 and 1.723 g/cm³ components represent 4, 1.5, 7 and 1.5% of the DNA, respectively, and exhibit the properties of “satellite” DNAs; their CsCl bands and melting curves have standard deviations lower than those of bacterial DNAs; after heat-denaturation and reannealing, their buoyant densities are identical to native DNA, except for the 1.705 g/cm³ component, which remains heavier by 5 mg/cm³; in alkaline CsCl, only the 1.714 g/cm³ component shows a strand separation. (3) A number of minor components, forming 1% of the DNA, have been recognized, but they have not been investigated in detail; two of them (1.719 and 1.699 g/cm³) might correspond to ribosomal cistrons and mitochondrial DNA, respectively.
Article
CpG dinucleotides mutate at a high rate because cytosine is vulnerable to deamination, cytosines in CpG dinucleotides are often methylated, and deamination of 5-methylcytosine (5mC) produces thymidine. Previous experiments have shown that DNA melting is the rate-limiting step in cytosine deamination. Here we show, through the analysis of human single-nucleotide polymorphisms (SNPs), that the mutation rate produced by 5mC deamination is highly dependent on local GC content. In fact, linear regression analysis showed that the log10 of the 5mC mutation rates (inferred from SNP frequencies) had slopes of -3 when graphed with respect to the GC content of neighboring sequences. This is the ideal slope that would be expected if the correlation between CpG underrepresentation and GC content had been solely caused by DNA melting. Moreover, this same result was obtained regardless of the SNP locations (all SNPs versus only SNPs in noncoding intergenic regions, excluding CpG islands) and regardless of the lengths over which GC content was calculated (SNP sequences with a modal length of 564 bp versus genomic contigs with a modal length of 163 kb). Several alternative interpretations are discussed.
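The regression described in this abstract can be illustrated with a minimal sketch. It fits an ordinary least-squares line to log10 mutation rate against GC content. The data points are synthetic and constructed to lie on a line of slope -3; they are not values from the study. Expressing GC content as a fraction between 0 and 1 is also an assumption, as is the helper name `least_squares_slope`.

```python
import math


def least_squares_slope(xs, ys):
    """Return the ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, illustrative data only (NOT from the study): rates chosen so that
# log10(rate) = -1 - 3 * GC, with GC content expressed as a fraction.
gc_content = [0.35, 0.40, 0.45, 0.50, 0.55]
rates = [10 ** (-1.0 - 3.0 * gc) for gc in gc_content]

log_rates = [math.log10(rate) for rate in rates]
slope = least_squares_slope(gc_content, log_rates)
print(f"fitted slope: {slope:.3f}")
```

With real SNP-derived rates the points would scatter around the fitted line, so the slope would come with a standard error that this sketch does not compute.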
Article
Break points of structural rearrangements of human chromosomes can be identified by banding techniques. The present study attempts to analyze the randomness and the distribution of the reported spontaneous break points in the human genome. Reports of break points in structural rearrangements of human chromosomes from the published sources up to October 1976 were analyzed. Based on the assumption that each unit length of band has an equal chance of being broken, chi-square tests show that positions of breakage are highly non-random; that is, breaks are more frequent in the negative band areas and in the centromeric and terminal regions. In double-break rearrangements the same band types tend to rejoin. The distribution of breaks is not proportional to the chromosome length. The longer chromosomes (i.e., 1-12, X) have a lower number of breaks per unit length, while the shorter chromosomes (i.e., 13-22, Y) have a greater number of breaks per unit length with the exception of chromosomes 4, 9, 10, 16, 17, 19, 20 and X. Out of the whole genome, chromosomes 9, 13, 18, 21, 22 and Y have the most breaks per unit length and chromosomes 16, 6, 2, 3 and 19 have the fewest. 18p11, 21q22 and Yp11 are the three bands with most frequent breaks. There are 53 bands where no breaks have been reported.
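The null model in this abstract (every unit length of band is equally likely to break) corresponds to a chi-square goodness-of-fit test in which expected counts are proportional to band length. The sketch below shows that calculation under stated assumptions: the band lengths and break counts are hypothetical, and the function name `chi_square_uniform_breakage` is introduced here for illustration only.

```python
def chi_square_uniform_breakage(observed_breaks, band_lengths):
    """Chi-square statistic against a null model in which the expected
    number of breaks in each band is proportional to the band's length.

    Returns (statistic, degrees_of_freedom).
    """
    total_breaks = sum(observed_breaks)
    total_length = sum(band_lengths)
    statistic = 0.0
    for observed, length in zip(observed_breaks, band_lengths):
        expected = total_breaks * length / total_length
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(observed_breaks) - 1


# Hypothetical example: three bands with relative lengths 2:1:1.
band_lengths = [2.0, 1.0, 1.0]
observed_breaks = [10, 20, 10]

statistic, dof = chi_square_uniform_breakage(observed_breaks, band_lengths)
print(f"chi-square = {statistic:.1f} with {dof} degrees of freedom")
```

A statistic that is large relative to the chi-square distribution with the reported degrees of freedom indicates that breaks are not spread in proportion to band length.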
Article
High resolution studies of structural rearrangements were carried out using the G-band technique. A total of 220 breakage points were identified within individual bands from 117 unrelated cases born with a structural rearrangement. Breakage points were not evenly distributed along chromosomes in terms of G-band patterns. There was an excess involvement of light bands and a striking lack of dark bands in both reciprocal translocations and inversions. In reciprocal translocations, the middle part of a chromosome arm has less chance of being the site of an exchange than the terminal and centromeric parts. The implications of these results are briefly discussed.
Article
Localization of chromosome breaks in human chromosomes was analyzed in 264 peripheral lymphocyte cultures. Three hundred and sixty-nine chromosome breaks could be exactly localized to a chromosome band or region of the Paris Conference nomenclature. The distribution of breaks in the chromosome regions was found to be nonrandom. Chromosome 3 alone had 23% of the breaks and region 3p2 had 13% of the total breaks. Some other chromosome regions, such as 5p1, 9q1, 14q2, and 16q2 also displayed clustering of breaks. Sex chromosomes had fewer breaks than expected. Spontaneous chromosome breaks were almost exclusively located in the lightly stained G bands.
Article
Mismatches arise during recombination, as errors of DNA replication, and from deamination of 5-methylcytosine to thymine. We determined the efficiency and specificity of mismatch correction in simian cells. Analysis of plaques, obtained after transfection with SV40 DNA molecules harboring a single mispair in a defined orientation within the intron of the large T antigen gene, revealed that all types of base/base mispairs were corrected, albeit with different efficiencies and specificities. Heterogeneous mispairs G/T, A/C, C/T, and A/G, corrected with 96%, 78%, 72%, and 39% efficiencies, respectively, tended to be corrected to G/C. Homogeneous mispairs G/G, C/C, A/A, and T/T were corrected with 92%, 66%, 58%, and 39% efficiencies, respectively, and repair bias was influenced by mismatch flanking sequences.
Article
In the traditional view of molecular evolution, the rate of point mutation is uniform over the genome of an organism and variation in the rate of nucleotide substitution among DNA regions reflects differential selective constraints. Here we provide evidence for significant variation in mutation rate among regions in the mammalian genome. We show first that substitutions at silent (degenerate) sites in protein-coding genes in mammals seem to be effectively neutral (or nearly so) as they do not occur significantly less frequently than substitutions in pseudogenes. We then show that the rate of silent substitution varies among genes and is correlated with the base composition of genes and their flanking DNA. This implies that the variation in both silent substitution rate and base composition can be attributed to systematic differences in the rate and pattern of mutation over regions of the genome. We propose that the differences arise because mutation patterns vary with the timing of replication of different chromosomal regions in the germline. This hypothesis can account for both the origin of isochores in mammalian genomes and the observation that silent nucleotide substitutions in different mammalian genes do not have the same molecular clock.
Article
5-Methylcytosine spontaneously deaminates to form thymine, thus generating G/T mispairs in DNA. We investigated the way in which these lesions are addressed in mammalian cells by introducing specific G/T mispairs into the genome of SV40 and determining the fate of the mismatched bases in simian cells. Mispairs were incorporated in 12 bp synthetic duplexes ligated into SV40 DNA between the BstXI and TaqI restriction sites. Analysis of 347 plaques obtained after transfection of this modified DNA indicated that mispairs were corrected in 343 cases (99%), revealing 314 repair events in favor of guanine (90%) and 29 in favor of thymine (8%). Correction in favor of guanine occurred regardless of the orientation of the mispair in DNA and regardless of whether the mispair was in the commonly methylated CpG dinucleotide. These results attest to a specific mismatch repair pathway that restores G/C pairs lost through deamination of 5-methylcytosine residues.
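The percentages quoted in this abstract follow directly from the plaque counts it reports. The short sketch below reproduces that arithmetic using only the numbers given above; the variable and function names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts as reported in the abstract.
total_plaques = 347
corrected = 343
repaired_to_guanine = 314
repaired_to_thymine = 29

# The two repair outcomes together account for every corrected plaque.
assert repaired_to_guanine + repaired_to_thymine == corrected

print(f"corrected:         {percent(corrected, total_plaques):.1f}% of plaques")
print(f"in favor of G:     {percent(repaired_to_guanine, total_plaques):.1f}% of plaques")
print(f"in favor of T:     {percent(repaired_to_thymine, total_plaques):.1f}% of plaques")
print(f"G among corrected: {percent(repaired_to_guanine, corrected):.1f}% of repair events")
```

The first three values round to the 99%, 90% and 8% quoted in the abstract, which shows that those percentages are taken relative to all 347 plaques rather than to the 343 corrected ones.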
Article
Reports of single base-pair mutations within gene coding regions causing human genetic disease were collated. Thirty-five per cent of mutations were found to have occurred within CpG dinucleotides. Over 90% of these mutations were C→T or G→A transitions, which thus occur within coding regions at a frequency 42-fold higher than that predicted from random mutations. These findings are consistent with methylation-induced deamination of 5-methyl cytosine and suggest that methylation of DNA within coding regions may contribute significantly to the incidence of human genetic disease.
Article
Chicken chromosomes were identified up to No. 18 by a sequential counterstain-enhanced fluorescence technique. A heterochromatin characterization of macro- and microchromosomes was performed; in general, the microchromosomes were GC-rich, but with a high degree of variation. The NORs are localized on chromosome No. 17.
Article
The evolutionarily important characteristics of gene conversion disparity extent and direction are surveyed in fungi. Temperature and background genotype can have small or large effects, sometimes even changing the direction of disparity. Disparity results from Sordaria and Ascobolus were very similar, with between-strain, between-data set and between-locus differences being larger than those between species or genera. In general, different loci in an organism show similar disparity properties when comparable types of mutation are considered, but may not do so in pooled results containing different proportions of different mutation types. Frameshifts typically have strong disparities, usually with negative signs for single base additions and positive signs for single base deletions. Base substitutions tend to have moderate disparities, favoring wild type more often than mutant in most data sets. Large deletions usually have significant disparity, either positive or negative. For comparable molecular types of mutation, spontaneous and induced mutations had roughly similar disparity properties. Experimental tests and theoretical considerations generally failed to support a number of assumptions and predictions made in previous treatments of gene conversion in evolution. In general, a mutation's conversion properties depend much more on its molecular type in relation to wild type than on any evolved conversion advantages or disadvantages.
Article
Most of the nuclear genome of warm-blooded vertebrates is a mosaic of very long (much greater than 200 kilobases) DNA segments, the isochores; these isochores are fairly homogeneous in base composition and belong to a small number of major classes distinguished by differences in guanine-cytosine (GC) content. The families of DNA molecules derived from such classes can be separated and used to study the genome distribution of any sequence which can be probed. This approach has revealed (i) that the distribution of genes, integrated viral sequences, and interspersed repeats is highly nonuniform in the genome, and (ii) that the base composition and ratio of CpG to GpC in both coding and noncoding sequences, as well as codon usage, mainly depend on the GC content of the isochores harboring the sequences. The compositional compartmentalization of the genome of warm-blooded vertebrates is discussed with respect to its evolutionary origin, its causes, and its effects on chromosome structure and function.
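The two compositional quantities named in this abstract, GC content and the ratio of CpG to GpC dinucleotides, can be computed directly from sequence. The sketch below is a minimal illustration, not the density-gradient fractionation approach the abstract describes. The function names, the window parameters and the toy sequence are all assumptions made for the example.

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def count_dinucleotide(seq, dinucleotide):
    """Count occurrences of a dinucleotide, allowing overlaps."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)


def cpg_to_gpc_ratio(seq):
    """Ratio of CpG (CG) to GpC (GC) counts; None if no GpC is present."""
    gpc = count_dinucleotide(seq, "GC")
    if gpc == 0:
        return None
    return count_dinucleotide(seq, "CG") / gpc


def windowed_gc(seq, window, step):
    """GC fraction in successive windows of the given size and step."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy sequence for illustration only.
toy = "ATGCGCATTA"
print(gc_fraction(toy), cpg_to_gpc_ratio(toy), windowed_gc(toy, 5, 5))
```

For isochore-scale analysis the window would be on the order of the segment lengths mentioned above (hundreds of kilobases) rather than the five bases used in this toy call.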
Article
Linkage relationships of homologous loci in man and mouse were used to estimate the mean length of autosomal segments conserved during evolution. Comparison of the locations of greater than 83 homologous loci revealed 13 conserved segments. Map distances between the outermost markers of these 13 segments are known for the mouse and range from 1 to 24 centimorgans. Methods were developed for using this sample of conserved segments to estimate the mean length of all conserved autosomal segments in the genome. This mean length was estimated to be 8.1 ± 1.6 centimorgans. Evidence is presented suggesting that chromosomal rearrangements that determine the lengths of these segments are randomly distributed within the genome. The estimated mean length of conserved segments was used to predict the probability that certain loci, such as peptidase-3 and renin, are linked in man given that homologous loci are χ centimorgans apart in the mouse. The mean length of conserved segments was also used to estimate the number of chromosomal rearrangements that have disrupted linkage since divergence of man and mouse. This estimate was shown to be 178 ± 39 rearrangements.
Article
A total of 770 breakpoints (80 of them identified by the authors) from unrelated patients with two-break rearrangements resulting in reciprocal translocations were studied to determine whether they were located preferentially. The distribution of breakpoints among the chromosome arms differs from that expected on the basis of their lengths, with more than expected on chromosome arms 4p, 9p, 9q, 13q, 18q, 21p, 21q, 22p, and 22q and fewer than expected on 1p, 1q, 3p, 3q, 5q, 6q, 7p, 12p, 16p, and the gonosomes. More breakpoints than expected occurred in the centromeric regions, and fewer in the median regions. Distribution of breakpoints within bands differed with the technique used: with G banding many more breakpoints were localized in the light bands and fewer in the dark bands. With R banding no fewer than expected were present in the light bands and only slightly more were found in the dark bands.
Article
Several lines of evidence are presented which suggest that sequence G + C content and recombination frequency are related in mammals: (i) chromosome G + C content is positively correlated to chiasmata density; (ii) the non-pairing region of the Y chromosome has one of the lowest G + C contents of any chromosomal segment; (iii) a reduction in the rate of recombination at several loci is mirrored by a decrease in G + C content; and (iv) when compared with humans, mice have a lower variance in chiasmata density which is reflected in a lower variance in G + C content. The observed relation between recombination frequency and sequence G + C content provides an elegant explanation of why gene density is higher in G + C rich isochores than in other parts of the genome, and why long interspersed elements (LINES) are exclusive to G + C poor isochores. However, the cause of the relation is as yet unknown. Several possibilities are considered, including gene conversion.
Article
We employed fluorescence in situ hybridization (FISH) with probes established by flow sorting metaphase chromosomes of the domestic cat (Felis catus, 2n = 38) to "paint" homologous segments on human chromosomes and, reciprocally, using human chromosome paints on feline metaphase preparations. The results revealed, by direct microscopic observation, widespread conservation of genome organization between the two mammalian orders and confirmed 90% of the homologous genes mapped to both species. Fourteen of 23 human chromosomes were hybridized with single cat probes, and 9 of 19 cat chromosomes were entirely labeled by a single human probe. All other chromosomes were labeled with only two or, at most, three probes of the respective species. Y-chromosome probes gave no signals. Approximately 30 syntenic segments were identified, and the number of translocations could be estimated to be on the order of one new translocation per 10 million years in the phylogenetic lines leading to human and cat. Using the principle of maximum parsimony, the primitive vs. derived human chromosome segments were identified by comparison to the feline, cattle, and pig genomes, a first step in reconstructing the evolutionary heritage of the mammalian radiations. The results suggest that reciprocal chromosome painting will help reconstruct the history of genomic changes by determining the polarity of chromosomal rearrangements and establishing the ancestral karyotype for each principal branching point in mammalian evolution.
Article
PAML, currently in version 1.2, is a package of programs for phylogenetic analyses of DNA and protein sequences using the method of maximum likelihood (ML). The programs can be used for (i) maximum likelihood estimation of evolutionary parameters such as branch lengths in a phylogenetic tree, the transition/transversion rate ratio, the shape parameter of the gamma distribution for variable evolutionary rates at sites, and rate parameters for different genes; (ii) likelihood ratio test of hypotheses concerning sequence evolution, such as rate constancy and independence among sites and rate constancy among lineages (the molecular clock); (iii) calculation of substitution rates at sites and reconstruction of ancestral nucleotide or amino acid sequences; and (iv) phylogenetic tree reconstruction by maximum likelihood and Bayesian methods. The strength of PAML, in comparison with other phylogenetic packages currently available, is its implementation of a variety of evolutionary models. These include several models of variable evolutionary rates among sites, models for combined analyses of multiple gene sequence data and models for amino acid sequences. Multifurcating trees are supported, as well as trees in which some sequences are ancestral to some others. A heuristic tree search algorithm (star decomposition) is used in the package, but tree making is not a strong point of the current version, although work is under way to implement efficient search algorithms. Major programs in the package, as well as the types of analyses they perform, are listed in Table 1. More details are available in the documentation included in the package, written using Microsoft Word. PAML is distributed free of charge for academic use only. 
The package, including ANSI C source codes, documentation, example data sets, and control files, can be obtained by anonymous ftp at mw511.biol.berkeley.edu/pub, or from the Indiana molecular biology ftp site at ftp.bio.indiana.edu under the directory Incoming or molbio/evolve. MAC and PowerMac executables are also available, although DOS executables are not prepared yet. Further information about the package is available from the World Wide Web at
Article
Codon usage in mammals is mainly determined by the spatial arrangement of genomic G + C-content, i.e., the isochore structure. Ancestral G + C-content at third codon positions of 27 nuclear protein-coding genes of eutherian mammals was estimated by maximum-likelihood analysis on the basis of a nonhomogeneous DNA substitution model, accounting for variable base compositions among present-day sequences. Data consistently supported a human-like ancestral pattern, i.e., highly variable G + C-content among genes. The mouse genomic structure (a narrower G + C-content distribution) would be a derived state. The circumstances of isochore evolution are discussed with respect to this result. A possible relationship between G + C-content homogenization in murid genomes and high mutation rate is proposed, consistent with the negative selection hypothesis for isochore maintenance in mammals.
Article
One of the most striking features of mammalian chromosomes is the variation in G+C content that occurs over scales of hundreds of kilobases to megabases, the so-called 'isochore' structure of the human genome. This variation in base composition affects both coding and non-coding sequences and seems to reflect a fundamental level of genome organization. However, although we have known about isochores for over 25 years, we still have a poor understanding of why they exist. In this article, we review the current evidence for the three main hypotheses.
Article
Using immunological and cytological approaches, researchers are beginning to reveal the complex molecular components of transcriptionally inert, pericentric heterochromatin. A new study shows that histone modifications and a non-Xist RNA component have essential structural roles in maintaining the tight organization of this type of chromatin.
Article
Understanding the co-variation of nucleotide diversity and local recombination rates is important both for the mapping of disease-associated loci and in understanding the causes of sequence evolution. It is known that single nucleotide polymorphisms (SNPs) around protein coding genes show higher diversity in regions of high recombination. Here, we find that this correlation holds for SNPs across the entire human genome, the great majority of which are not near exons or control elements. Contrasting with results from coding regions, we provide evidence that the higher nucleotide diversity in regions of high recombination is most likely due, at least in part, to a higher mutation rate. One possible explanation for this is that recombination is mutagenic.
Article
The centromere is essential for the proper segregation and inheritance of genetic information. Neocentromeres are ectopic centromeres that originate occasionally from noncentromeric regions of chromosomes. Despite the complete absence of normal centromeric alpha-satellite DNA, human neocentromeres are able to form a primary constriction and assemble a functional kinetochore. Since the discovery and characterization of the first case of a human neocentromere in our laboratory a decade ago, 60 examples of constitutional human neocentromeres distributed widely across the genome have been described. Typically, these are located on marker chromosomes that have been detected in children with developmental delay or congenital abnormalities. Neocentromeres have also been detected in at least two types of human cancer and have been experimentally induced in Drosophila. Current evidence from human and fly studies indicates that neocentromere activity is acquired epigenetically rather than by any alteration to the DNA sequence. Since human neocentromere formation is generally detrimental to the individual, its biological value must lie beyond the individual level, such as in karyotype evolution and speciation.
Article
The availability of a complete genome sequence allows the detailed study of intraspecies variability. Here we use high-density oligonucleotide arrays to discover 11,115 single-feature polymorphisms (SFPs) existing in one or more of 14 different yeast strains. We use these SFPs to define regions of genetic identity between common laboratory strains of yeast. We assess the genome-wide distribution of genetic variation on the basis of this yeast population. We find that genome variability is biased toward the ends of chromosomes and is more likely to be found in genes with roles in fermentation or in transport. This subtelomeric bias may arise through recombination between nonhomologous sequences because full-gene deletions are more common in these regions than in more central regions of the chromosome.
Article
One of the most striking findings to emerge from the study of genomic patterns of variation is that regions with lower recombination rates tend to have lower levels of intraspecific diversity but not of interspecies divergence. This uncoupling of variation within and between species has been widely interpreted as evidence that natural selection shapes patterns of genetic variability genomewide. We revisited the relationship between diversity, divergence, and recombination in humans, using data from closely related species and better estimates of recombination rates than previously available. We show that regions that experience less recombination have reduced divergence to chimpanzee and to baboon, as well as lower levels of diversity. This observation suggests that mutation and recombination are associated processes in humans, so that the positive correlation between diversity and recombination may have a purely neutral explanation. Consistent with this hypothesis, diversity levels no longer increase significantly with recombination rates after correction for divergence to chimpanzee.
Article
Classical genetic studies show that gene conversion can favour some alleles over others. Molecular experiments suggest that gene conversion could favour GC over AT basepairs, leading to the concept of biased gene conversion towards GC (BGC(GC)). The expected consequence of such a process is the GC-enrichment of DNA sequences under gene conversion. Recent genomic work suggests that BGC(GC) affects the base composition of yeast, invertebrate and mammalian genomes. Hypotheses for the mechanisms and evolutionary origin of such a strange phenomenon have been proposed. Most BGC(GC) events probably occur during meiosis, which has implications for our understanding of the evolution of sex and recombination.
Article
The human and mouse genomic sequences provide evidence for a larger number of rearrangements than previously thought and reveal extensive reuse of breakpoints from the same short fragile regions. Breakpoint clustering in regions implicated in cancer and infertility has been reported in previous studies; we report here on breakpoint clustering in chromosome evolution. This clustering reveals limitations of the widely accepted random breakage theory that has remained unchallenged since the mid-1980s. The genome rearrangement analysis of the human and mouse genomes implies the existence of a large number of very short "hidden" synteny blocks that were invisible in the comparative mapping data and ignored in the random breakage model. These blocks are defined by closely located breakpoints and are often hard to detect. Our results suggest a model of chromosome evolution that postulates that mammalian genomes are mosaics of fragile regions with high propensity for rearrangements and solid regions with low propensity for rearrangements.
Article
Translocations and gross deletions are important causes of both cancer and inherited disease. Such gene rearrangements are nonrandomly distributed in the human genome as a consequence of selection for growth advantage and/or the inherent potential of some DNA sequences to be frequently involved in breakage and recombination. Using the Gross Rearrangement Breakpoint Database [GRaBD; www.uwcm.ac.uk/uwcm/mg/grabd/grabd.html] (containing 397 germ-line and somatic DNA breakpoint junction sequences derived from 219 different rearrangements underlying human inherited disease and cancer), we have analyzed the sequence context of translocation and deletion breakpoints in a search for general characteristics that might have rendered these sequences prone to rearrangement. The oligonucleotide composition of breakpoint junctions and a set of reference sequences, matched for length and genomic location, were compared with respect to their nucleotide composition. Deletion breakpoints were found to be AT-rich whereas by comparison, translocation breakpoints were GC-rich. Alternating purine-pyrimidine sequences were found to be significantly over-represented in the vicinity of deletion breakpoints while polypyrimidine tracts were over-represented at translocation breakpoints. A number of recombination-associated motifs were found to be over-represented at translocation breakpoints (including DNA polymerase pause sites/frameshift hotspots, immunoglobulin heavy chain class switch sites, heptamer/nonamer V(D)J recombination signal sequences, translin binding sites, and the chi element) but, with the exception of the translin-binding site and immunoglobulin heavy chain class switch sites, none of these motifs were over-represented at deletion breakpoints. Alu sequences were found to span both breakpoints in seven cases of gross deletion that may thus be inferred to have arisen by homologous recombination. 
Our results are therefore consistent with a role for homologous unequal recombination in deletion mutagenesis and a role for nonhomologous recombination in the generation of translocations.