ABSTRACT: It has recently been shown that genomic integrity (with respect to copy number variants [CNVs]) is compromised in human induced pluripotent stem cells (iPSCs) generated by viral-based ectopic expression of specific transcription factors (e.g., Oct4, Sox2, Klf4, and c-Myc). However, it is unclear how different methods for iPSC generation compare with one another with respect to CNV formation. Because array-based methods remain the gold standard for detecting unbalanced structural variants (i.e., CNVs), we have used this approach to comprehensively identify CNVs in iPSCs as a proxy for determining whether our modified protein-based method minimizes genomic instability compared with retro- and lentiviral methods. In this study, we established an improved method for protein reprogramming by using partially purified reprogramming proteins, resulting in more efficient generation of iPSCs from C57/BL6J mouse hepatocytes than using protein extracts. We also developed a robust and unbiased 1 M custom array CGH platform to identify novel CNVs and previously described hot spots for CNV formation, allowing us to detect CNVs down to the size of 1.9 kb. The genomic integrity of these protein-based mouse iPSCs (p-miPSCs) was compared with miPSCs developed from viral-based strategies (i.e., retroviral: retro-miPSCs or lentiviral: lenti-miPSCs). We identified an increased CNV content in lenti-miPSCs and retro-miPSCs (29 to 53 CNVs) compared with p-miPSCs (9 to 10 CNVs), indicating that our improved protein-based reprogramming method maintains genomic integrity better than current viral reprogramming methods. Thus, our study, for the first time to our knowledge, demonstrates that reprogramming methods significantly influence the genomic integrity of resulting iPSCs.
ABSTRACT: In primates and other animals reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either 'retrogenes' coding for functioning proteins or expressed 'processed pseudogenes', which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We developed new methodologies allowing us to identify 'novel' retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of the 1000 Genomes Project. The accuracy of our dataset was corroborated by (i) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (ii) experimental validation, and (iii) the fact that we can reconstruct a correct phylogenetic tree of human sub-populations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle, and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and perhaps, is even a requirement for it.
Genome Research 09/2013; 23(12). DOI:10.1101/gr.154625.113 · 14.63 Impact Factor
ABSTRACT: Although nucleotide resolution maps of genomic structural variants (SVs) have provided insights into the origin and impact of phenotypic diversity in humans, comparable maps in nonhuman primates have thus far been lacking. Using massively parallel DNA sequencing, we constructed fine-resolution genomic structural variation maps in five chimpanzees, five orang-utans, and five rhesus macaques. The SV maps, which are comprised of thousands of deletions, duplications, and mobile element insertions, revealed a high activity of retrotransposition in macaques compared with great apes. By comparison, nonallelic homologous recombination is specifically active in the great apes, which is correlated with architectural differences between the genomes of great apes and macaque. Transcriptome analyses across nonhuman primates and humans revealed effects of species-specific whole-gene duplication on gene expression. We identified 13 gene duplications coinciding with the species-specific gain of tissue-specific gene expression in keeping with a role of gene duplication in the promotion of diversification and the acquisition of unique functions. Differences in the present day activity of SV formation mechanisms that our study revealed may contribute to ongoing diversification and adaptation of great ape and Old World monkey lineages.
Proceedings of the National Academy of Sciences 09/2013; 110(39). DOI:10.1073/pnas.1305904110 · 9.67 Impact Factor
ABSTRACT: Ancient population structure shaping contemporary genetic variation has been recently appreciated and has important implications regarding our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p < 10^-15). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes, but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human-Neandertal divergence and is evolving under balancing selection, especially among European populations.
ABSTRACT: Gene expression differences are shaped by selective pressures and contribute to phenotypic differences between species. We identified 964 copy number differences (CNDs) of conserved sequences across three primate species and examined their potential effects on gene expression profiles. Samples with copy number different genes had significantly different expression than samples with neutral copy number. Genes encoding regulatory molecules differed in copy number and were associated with significant expression differences. Additionally, we identified 127 CNDs that were processed pseudogenes and some of which were expressed. Furthermore, there were copy number-different regulatory regions such as ultraconserved elements and long intergenic noncoding RNAs with the potential to affect expression. We postulate that CNDs of these conserved sequences fine-tune developmental pathways by altering the levels of RNA.
Proceedings of the National Academy of Sciences 07/2012; 109(31):12656-61. DOI:10.1073/pnas.1205199109 · 9.67 Impact Factor
ABSTRACT: Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing data sets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact
ABSTRACT: Over the past decade, the ubiquity of copy number variants (CNVs, the gain or loss of genomic material) in the genomes of healthy humans has become apparent. Although some of these variants are associated with disorders, a handful of studies documented an adaptive advantage conferred by CNVs. In this review, we propose that CNVs are substrates for human evolution and adaptation. We discuss the possible mechanisms and evolutionary processes in which CNVs are selected, outline the current challenges in identifying these loci, and highlight that copy number variable regions allow for the creation of novel genes that may diversify the repertoire of such genes in response to rapidly changing environments. We expect that many more adaptive CNVs will be discovered in the coming years, and we believe that these new findings will contribute to our understanding of human-specific phenotypes.
Trends in Genetics 04/2012; 28(6):245-57. DOI:10.1016/j.tig.2012.03.002 · 9.92 Impact Factor
ABSTRACT: Copy number variants (CNVs), defined as losses and gains of segments of genomic DNA, are a major source of genomic variation. In this study, we identified over 2,000 human CNVs that overlap with orthologous chimpanzee or orthologous macaque CNVs. Of these, 170 CNVs overlap with both chimpanzee and macaque CNVs, and these were collapsed into 34 hotspot regions of CNV formation. Many of these hotspot regions of CNV formation are functionally relevant, with a bias toward genes involved in immune function, some of which were previously shown to evolve under balancing selection in humans. The genes in these primate CNV formation hotspots have significant differential expression levels between species and show evidence for positive selection, indicating that they have evolved under species-specific, directional selection. These hotspots of primate CNV formation provide a novel perspective on divergence and selective pressures acting on these genomic regions.
ABSTRACT: Two abundant classes of mobile elements, namely Alu and L1 elements, continue to generate new retrotransposon insertions in human genomes. Estimates suggest that these elements have generated millions of new germline insertions in individual human genomes worldwide. Unfortunately, current technologies are not capable of detecting most of these young insertions, and the true extent of germline mutagenesis by endogenous human retrotransposons has been difficult to examine. Here, we describe technologies for detecting these young retrotransposon insertions and demonstrate that such insertions indeed are abundant in human populations. We also found that new somatic L1 insertions occur at high frequencies in human lung cancer genomes. Genome-wide analysis suggests that altered DNA methylation may be responsible for the high levels of L1 mobilization observed in these tumors. Our data indicate that transposon-mediated mutagenesis is extensive in human genomes and is likely to have a major impact on human biology and diseases.
ABSTRACT: Although a large proportion (44%) of the human genome is occupied by transposons and transposon-like repetitive elements, only a small proportion (<0.05%) of these elements remain active today. Recent evidence indicates that approximately 35-40 subfamilies of Alu, L1 and SVA elements (and possibly HERV-K elements) remain actively mobile in the human genome. These active transposons are of great interest because they continue to produce genetic diversity in human populations and also cause human diseases by integrating into genes. In this review, we examine these active human transposons and explore mechanistic factors that influence their mobilization.
Trends in Genetics 04/2007; 23(4):183-91. DOI:10.1016/j.tig.2007.02.006 · 9.92 Impact Factor
ABSTRACT: Transposable genetic elements are abundant in the genomes of most organisms, including humans. These endogenous mutagens can alter genes, promote genomic rearrangements, and may help to drive the speciation of organisms. In this study, we identified almost 11,000 transposon copies that are differentially present in the human and chimpanzee genomes. Most of these transposon copies were mobilized after the existence of a common ancestor of humans and chimpanzees, approximately 6 million years ago. Alu, L1, and SVA insertions accounted for >95% of the insertions in both species. Our data indicate that humans have supported higher levels of transposition than have chimpanzees during the past several million years and have amplified different transposon subfamilies. In both species, approximately 34% of the insertions were located within known genes. These insertions represent a form of species-specific genetic variation that may have contributed to the differential evolution of humans and chimpanzees. In addition to providing an initial overview of recently mobilized elements, our collections will be useful for assessing the impact of these insertions on their hosts and for studying the transposition mechanisms of these elements.
The American Journal of Human Genetics 05/2006; 78(4):671-9. DOI:10.1086/501028 · 10.93 Impact Factor