Publications (4) · 29.74 Total impact
Article: Genomic binding of Pol III transcription machinery and relationship with TFIIS transcription factor distribution in mouse embryonic stem cells.
ABSTRACT: RNA polymerase (Pol) III synthesizes the tRNAs, the 5S ribosomal RNA and a small number of untranslated RNAs. In vitro, it also transcribes short interspersed nuclear elements (SINEs). We investigated the distribution of Pol III and its associated transcription factors on the genome of mouse embryonic stem cells using a highly specific tandem ChIP-Seq method. Only a subset of the annotated class III genes was bound and thus transcribed. A few hundred SINEs were associated with the Pol III transcription machinery. We observed that Pol III and its transcription factors were present at 30 unannotated sites on the mouse genome, only one of which was conserved in human. An RNA was associated with >80% of these regions. More than 2200 regions bound by the TFIIIC transcription factor were devoid of Pol III. These sites were associated with cohesins and often located close to CTCF-binding sites, suggesting that TFIIIC might cooperate with these factors to organize the chromatin. We also investigated the genome-wide distribution of the ubiquitous TFIIS variant, TCEA1. We found that, as in Saccharomyces cerevisiae, TFIIS is associated with class III genes and also with SINEs, suggesting that TFIIS is a Pol III transcription factor in mammals.
Nucleic Acids Research 09/2011; 40(1):270-83. · 8.03 Impact Factor
Article: The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations.
ABSTRACT: We have studied a genome-wide set of single-nucleotide polymorphism (SNP) allele frequency measures for African-American, East Asian, and European-American samples. For this analysis we derived a simple, closed mathematical formulation for the spectrum of expected allele frequencies when the sampled populations have experienced nonstationary demographic histories. The direct calculation generates the spectrum orders of magnitude faster than coalescent simulations do and allows us to generate spectra for a large number of alternative histories on a multidimensional parameter grid. Model-fitting experiments using this grid reveal significant population-specific differences among the demographic histories that best describe the observed allele frequency spectra. European and Asian spectra show a bottleneck-shaped history: a reduction of effective population size in the past followed by a recent phase of size recovery. In contrast, the African-American spectrum shows a history of moderate but uninterrupted population expansion. These differences are expected to have profound consequences for the design of medical association studies. The analytical methods developed for this study, i.e., a closed mathematical formulation for the allele frequency spectrum, correcting the ascertainment bias introduced by shallow SNP sampling, and dealing with variable sample sizes, provide a general framework for the analysis of public variation data.
Genetics 02/2004; 166(1):351-72. · 4.01 Impact Factor
Article: Sequence variations in the public human genome data reflect a bottlenecked population history.
ABSTRACT: Single-nucleotide polymorphisms (SNPs) constitute the great majority of variations in the human genome, and as heritable variable landmarks they are useful markers for disease mapping and resolving population structure. Redundant coverage in overlaps of large-insert genomic clones, sequenced as part of the Human Genome Project, comprises a quarter of the genome, and it is representative in terms of base compositional and functional sequence features. We mined these regions to produce 500,000 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and its regional variation within the genome. Distributions of marker density observed at different overlap length scales under a model of recombination and population size change show that the history of the population represented by the public genome sequence is one of collapse followed by a recent phase of mild size recovery. The inferred times of collapse and recovery are Upper Paleolithic, in agreement with archaeological evidence of the initial modern human colonization of Europe.
Proceedings of the National Academy of Sciences 02/2003; 100(1):376-81. · 9.68 Impact Factor
Article: Connected gene neighborhoods in prokaryotic genomes.
ABSTRACT: A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods including additional genes, which are adjacent to the genes from the clusters and are transcribed in the same direction, which resulted in a total of 2387 COGs being included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon 'genomic hitchhiking'. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genome hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.
Nucleic Acids Research 06/2002; 30(10):2212-23. · 8.03 Impact Factor