ABSTRACT: The vast majority of microscopic life on earth consists of microbes that do not grow in laboratory culture. To profile the
microbial diversity in environmental and clinical samples, we have devised and employed molecular probe technology, which
detects and identifies bacteria that do and do not grow in culture. The only requirement is a short sequence of contiguous
bases (currently 60 bases) unique to the genome of the organism of interest. The procedure is relatively fast, inexpensive,
customizable, robust, and culture independent and uses commercially available reagents and instruments. In this communication,
we report improving the specificity of the molecular probes substantially and increasing the complexity of the molecular probe
set by over an order of magnitude (>1,200 probes) and introduce a new final readout method based upon Illumina sequencing.
In addition, we employed molecular probes to identify the bacteria from vaginal swabs and demonstrate how a deliberate selection
of molecular probes can identify less abundant bacteria even in the presence of much more abundant species.
Preview · Article · May 2014 · Applied and Environmental Microbiology
ABSTRACT: To determine the vaginal microbiome in women undergoing IVF-ET and investigate correlations with clinical outcomes.
Thirty patients had blood drawn for estradiol (E₂) and progesterone (P₄) at four time points during the IVF-ET cycle and at 4-6 weeks of gestation, if pregnant. Vaginal swabs were obtained in different hormonal milieu, and the vaginal microbiome determined by deep sequencing of the 16S ribosomal RNA gene.
The vaginal microbiome underwent a transition during therapy in some but not all patients. Novel bacteria were found in 33% of women tested during the treatment cycle, but not at 6-8 weeks of gestation. Diversity of species varied across different hormonal milieu, and on the day of embryo transfer correlated with outcome (live birth/no live birth). The species diversity index distinguished women who had a live birth from those who did not.
This metagenomics approach has enabled discovery of novel, previously unidentified bacterial species in the human vagina in different hormonal milieu and supports a shift in the vaginal microbiome during IVF-ET therapy using standard protocols. Furthermore, the data suggest that the vaginal microbiome on the day of embryo transfer affects pregnancy outcome.
Full-text · Article · Feb 2012 · Journal of Assisted Reproduction and Genetics
ABSTRACT: The accurate and complete selection of candidate genomic regions from a DNA sample before sequencing is critical in molecular diagnostics. Several recently developed technologies await substantial improvements in performance, cost, and multiplex sample processing. Here we present the utility of long padlock probes (LPPs) for targeted exon capture followed by array-based sequencing. We found that on average 92% of 5,471 exons from 524 nuclear-encoded mitochondrial genes were successfully amplified from genomic DNA from 63 individuals. Only 144 exons did not amplify in any sample due to high GC content. One LPP was sufficient to capture sequences from <100-500 bp in length and only a single-tube capture reaction and one microarray was required per sample. Our approach was highly reproducible and quick (<8 h) and detected DNA variants at high accuracy (false discovery rate 1%, false negative rate 3%) on the basis of known sample SNPs and Sanger sequence verification. In a patient with clinical and biochemical presentation of ornithine transcarbamylase (OTC) deficiency, we identified copy-number differences in the OTC gene at exon-level resolution. This shows the ability of LPPs to accurately preserve a sample's genome information and provides a cost-effective strategy to identify both single nucleotide changes and structural variants in targeted resequencing.
Full-text · Article · Apr 2011 · Proceedings of the National Academy of Sciences
ABSTRACT: A common goal in the discovery of rare functional DNA variants via medical resequencing is to incur a relatively lower proportion
of false positive base-calls. We developed a novel statistical method for resequencing arrays (SRMA, sequence robust multi-array
analysis) to increase the accuracy of detecting rare variants and reduce the costs in subsequent sequence verifications required
in medical applications. SRMA includes single and multi-array analysis and accounts for technical variables as well as the
possibility of both low- and high-frequency genomic variation. The confidence of each base-call was ranked using two quality
measures. In comparison to Sanger capillary sequencing, we achieved a false discovery rate of 2% (false positive rate 1.2 × 10⁻⁵, false negative rate 5%), which is similar to automated second-generation sequencing technologies. Applied to the analysis
of 39 nuclear candidate genes in disorders of mitochondrial DNA (mtDNA) maintenance, we confirmed mutations in the DNA polymerase
gamma POLG in positive control cases, and identified novel rare variants in previously undiagnosed cases in the mitochondrial
topoisomerase TOP1MT, the mismatch repair enzyme MUTYH, and the apurinic-apyrimidinic endonuclease APEX2. Some patients carried
rare heterozygous variants in several functionally interacting genes, which could indicate synergistic genetic effects in
these clinically similar disorders.
Full-text · Article · Jan 2011 · Nucleic Acids Research
ABSTRACT: A sensitive, high-throughput method for monitoring pre-mRNA splicing on a genomic scale is needed to understand the spectrum of alternatively spliced mRNA in human cells.
We adapted Molecular Inversion Probes (MIPs), a padlock-probe based technology, for the multiplexed capture and quantitation of individual splice events in human tissues. Individual MIP capture probes can be quantified using either DNA microarrays or high-throughput sequencing, which permits independent assessment of each spliced junction. Using our methodology we successfully identified 100% of our positive controls and showed that there is a strong correlation between the data from our alternative splicing MIP (asMIP) assay and quantitative PCR.
The asMIP assay provides a sensitive, accurate and multiplexed means for measuring pre-mRNA splicing. Fully optimized, we estimate that the assay could accommodate a throughput of greater than 20,000 splice junctions in a single reaction. This would represent a significant improvement over existing technologies.
ABSTRACT: Tiling Array Design and Data Description; Ontology Analyses; Antisense Regulation Identification; Correlated Expression Between Two DNA Strands; Identification of Nonprotein Coding mRNA; Summary; Acknowledgments; References
ABSTRACT: Knowing gene structure is vital to understanding gene function, and accurate genome annotation is essential for understanding cellular function. To this end, we have developed a genome-wide assay for mapping introns in Saccharomyces cerevisiae. Using high-density tiling arrays, we compared wild-type yeast to a mutant deficient for intron degradation. Our method identified 76% of the known introns, confirmed 18 previously predicted introns, and revealed 9 formerly undiscovered introns. Furthermore, we discovered that all 13 meiosis-specific intronic yeast genes undergo regulated splicing, which provides posttranscriptional regulation of the genes involved in yeast cell differentiation. Moreover, we found that approximately 16% of intronic genes in yeast are incompletely spliced during exponential growth in rich medium, which suggests that meiosis is not the only biological process regulated by splicing. Our tiling-array assay provides a snapshot of the spliced transcriptome in yeast. This robust methodology can be used to explore environmentally distinct splicing responses and should be readily adaptable to the study of other organisms, including humans.
Full-text · Article · Feb 2007 · Proceedings of the National Academy of Sciences
ABSTRACT: There is abundant transcription from eukaryotic genomes unaccounted for by protein coding genes. A high-resolution genome-wide survey of transcription in a well annotated genome will help relate transcriptional complexity to function. By quantifying RNA expression on both strands of the complete genome of Saccharomyces cerevisiae using a high-density oligonucleotide tiling array, this study identifies the boundary, structure, and level of coding and noncoding transcripts. A total of 85% of the genome is expressed in rich media. Apart from expected transcripts, we found operon-like transcripts, transcripts from neighboring genes not separated by intergenic regions, and genes with complex transcriptional architecture where different parts of the same gene are expressed at different levels. We mapped the positions of 3' and 5' UTRs of coding genes and identified hundreds of RNA transcripts distinct from annotated genes. These nonannotated transcripts, on average, have lower sequence conservation and lower rates of deletion phenotype than protein coding genes. Many other transcripts overlap known genes in antisense orientation, and for these pairs global correlations were discovered: UTR lengths correlated with gene function, localization, and requirements for regulation; antisense transcripts overlapped 3' UTRs more than 5' UTRs; UTRs with overlapping antisense tended to be longer; and the presence of antisense associated with gene function. These findings may suggest a regulatory role of antisense transcription in S. cerevisiae. Moreover, the data show that even this well studied genome has transcriptional complexity far beyond current annotation.
Full-text · Article · May 2006 · Proceedings of the National Academy of Sciences
ABSTRACT: Functional analysis of a genome requires accurate gene structure information and a complete gene inventory. A dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis. Sequencing full-length cDNAs and hybridizations using RNA populations from various tissues to a set of high-density oligonucleotide arrays spanning the entire genome allowed the accurate annotation of thousands of gene structures. We identified 5817 novel transcription units, including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres. This approach resulted in completion of approximately 30% of the Arabidopsis ORFeome as a resource for global functional experimentation of the plant proteome.
ABSTRACT: The symbiotic nitrogen-fixing soil bacterium Sinorhizobium
meliloti contains three replicons: pSymA, pSymB, and the
chromosome. We report here the complete 1,354,226-nt sequence of pSymA.
In addition to a large fraction of the genes known to be specifically
involved in symbiosis, pSymA contains genes likely to be involved in
nitrogen and carbon metabolism, transport, stress, and resistance
responses, and other functions that give S. meliloti an
advantage in its specialized niche.
Full-text · Article · Aug 2001 · Proceedings of the National Academy of Sciences
ABSTRACT: The scarcity of usable nitrogen frequently limits plant growth. A tight metabolic association with rhizobial bacteria allows legumes to obtain nitrogen compounds by bacterial reduction of dinitrogen (N₂) to ammonium (NH₄⁺). We present here the annotated DNA sequence of the alpha-proteobacterium Sinorhizobium meliloti, the symbiont of alfalfa. The tripartite 6.7-megabase (Mb) genome comprises a 3.65-Mb chromosome, and 1.35-Mb pSymA and 1.68-Mb pSymB megaplasmids. Genome sequence analysis indicates that all three elements contribute, in varying degrees, to symbiosis and reveals how this genome may have emerged during evolution. The genome sequence will be useful in understanding the dynamics of interkingdom associations and of life in soil environments.
ABSTRACT: The genome of the flowering plant Arabidopsis thaliana has five chromosomes [1,2]. Here we report the sequence of the largest, chromosome 1, in two contigs of around 14.2 and 14.6 megabases. The contigs extend from the telomeres to the centromeric borders, regions rich in transposons, retrotransposons and repetitive elements such as the 180-base-pair repeat. The chromosome represents 25% of the genome and contains about 6,850 open reading frames, 236 transfer RNAs (tRNAs) and 12 small nuclear RNAs. There are two clusters of tRNA genes at different places on the chromosome. One consists of 27 tRNAPro genes and the other contains 27 tandem repeats of tRNATyr-tRNATyr-tRNASer genes. Chromosome 1 contains about 300 gene families with clustered duplications. There are also many repeat elements, representing 8% of the sequence.
ABSTRACT: The Database of Arabidopsis thaliana Annotation (DAtA) was created to enable easy access to and analysis of all the Arabidopsis genome project annotation. The database was constructed using the completed A. thaliana genomic sequence data currently in GenBank. An automated annotation process was used to predict coding sequences for GenBank
records that do not include annotation. DAtA also contains protein motifs and protein similarities derived from searches of the proteins in DAtA with motif databases and the non-redundant protein database. The database is routinely updated to include new GenBank submissions
for Arabidopsis genomic sequences and new BLAST and protein motif search results. A web interface to DAtA allows coding sequences to be searched by name, comment, BLAST similarity or motif field. In addition, browse options present
lists of either all the protein names or identified motifs present in the sequenced A. thaliana genome. The database can be accessed at http://baggage.stanford.edu/group/arabprotein/
Full-text · Article · Feb 2000 · Nucleic Acids Research