ABSTRACT: CRISPRz (http://research.nhgri.nih.gov/CRISPRz/) is a database of CRISPR/Cas9 target sequences that have been experimentally validated in zebrafish. Programmable RNA-guided CRISPR/Cas9 has recently emerged as a simple and efficient genome editing method in various cell types and organisms, including zebrafish. Because the technique is so easy and efficient in zebrafish, the most valuable asset is no longer a mutated fish (which has distribution challenges), but rather a CRISPR/Cas9 target sequence for the gene that is confirmed to have high mutagenic efficiency. With a highly active CRISPR target, a mutant fish can be quickly replicated in any genetic background anywhere in the world. However, sgRNAs vary widely in their activity, and models for predicting target activity are imperfect. Thus, it is very useful to collect validated CRISPR target sequences, with their relative mutagenic activities, in one place. A researcher could then select a target of interest in the database with an expected activity. Here, we report the development of CRISPRz, a database of validated zebrafish CRISPR target sites collected from published sources, as well as from our own in-house large-scale mutagenesis project. CRISPRz can be searched using multiple inputs such as ZFIN IDs, accession numbers, UniGene IDs, or gene symbols from zebrafish, human and mouse.
Nucleic Acids Research 10/2015; DOI:10.1093/nar/gkv998 · 9.11 Impact Factor
ABSTRACT: Endometrial cancer (EC) is the 8th leading cause of cancer death amongst American women. Most ECs are endometrioid, serous, or clear cell carcinomas, or an admixture of histologies. Serous and clear cell ECs are clinically aggressive tumors for which alternative therapeutic approaches are needed. The purpose of this study was to search for somatic mutations in the tyrosine kinome of serous and clear cell ECs, because mutated kinases can point to potential therapeutic targets.
In a mutation discovery screen, we PCR amplified and Sanger sequenced the exons encoding the catalytic domains of 86 tyrosine kinases from 24 serous, 11 clear cell, and 5 mixed histology ECs. For somatically mutated genes, we next sequenced the remaining coding exons from the 40 discovery screen tumors and sequenced all coding exons from another 72 ECs (10 clear cell, 21 serous, 41 endometrioid). We assessed the copy number of mutated kinases in this cohort of 112 tumors using quantitative real time PCR, and we used immunoblotting to measure expression of these kinases in endometrial cancer cell lines.
Overall, we identified somatic mutations in TNK2 (tyrosine kinase non-receptor, 2) and DDR1 (discoidin domain receptor tyrosine kinase 1) in 5.3% (6 of 112) and 2.7% (3 of 112) of ECs, respectively. Copy number gains of TNK2 and DDR1 were identified in another 4.5% and 0.9% of 112 cases, respectively. Immunoblotting confirmed TNK2 and DDR1 expression in endometrial cancer cell lines. Three of five missense mutations in TNK2 and one of two missense mutations in DDR1 are predicted to impact protein function by two or more in silico algorithms. The TNK2 P761Rfs*72 frameshift mutation was recurrent in EC, and the DDR1 R570Q missense mutation was recurrent across tumor types.
This is the first study to systematically search for mutations in the tyrosine kinome in clear cell endometrial tumors. Our findings indicate that high-frequency somatic mutations in the catalytic domains of the tyrosine kinome are rare in clear cell ECs. We uncovered ten new mutations in TNK2 and DDR1 within serous and endometrioid ECs, thus providing novel insights into the mutation spectrum of each gene in EC.
BMC Cancer 11/2014; 14(1):884. DOI:10.1186/1471-2407-14-884 · 3.36 Impact Factor
ABSTRACT: Background
Quantification of a transcriptional profile is a useful way to evaluate the activity of a cell at a given point in time. Although RNA-Seq has revolutionized transcriptional profiling, the costs of RNA-Seq are still significantly higher than microarrays, and often the depth of data delivered from RNA-Seq is in excess of what is needed for simple transcript quantification. Digital Gene Expression (DGE) is a cost-effective, sequence-based approach for simple transcript quantification: by sequencing one read per molecule of RNA, this technique can be used to efficiently count transcripts while obviating the need for transcript-length normalization and reducing the total numbers of reads necessary for accurate quantification. Here, we present trieFinder, a program specifically designed to rapidly map, parse, and annotate DGE tags of various lengths against cDNA and/or genomic sequence databases.
The trieFinder algorithm maps DGE tags in a two-step process. First, it scans FASTA files of RefSeq, UniGene, and genomic DNA sequences to create a database of all tags that can be derived from a predefined restriction site. Next, it compares the experimental DGE tags to this tag database, taking advantage of the fact that the tags are stored as a prefix tree, or “trie”, which allows for linear-time searches for exact matches. DGE tags with mismatches are analyzed by recursive calls in the data structure. We find that, in terms of alignment speed, the mapping functionality of trieFinder compares favorably with Bowtie.
trieFinder can quickly provide the user an annotation of the DGE tags from three sources simultaneously, simplifying transcript quantification and novel transcript detection, delivering the data in a simple parsed format, obviating the need to post-process the alignment results. trieFinder is available at http://research.nhgri.nih.gov/software/trieFinder/.
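The two-step mapping described in this abstract can be illustrated with a small sketch. This is not the trieFinder source code; it is a minimal Python illustration, assuming a `CATG` restriction site, 8-base tags, and hypothetical names (`TrieNode`, `extract_tags`, `build_trie`, `search`) chosen only for this example. It builds a prefix tree of every tag that follows the site in a set of reference sequences, then looks up an experimental tag, spending a mismatch budget recursively as it descends the tree.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of the prefix tree; annotations mark the end of a stored tag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.annotations: List[str] = []


def extract_tags(sequence: str, site: str, tag_length: int) -> List[str]:
    """Return every full-length tag that directly follows an occurrence of `site`."""
    tags = []
    start = sequence.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = sequence[tag_start:tag_start + tag_length]
        if len(tag) == tag_length:  # skip tags truncated by the sequence end
            tags.append(tag)
        start = sequence.find(site, start + 1)
    return tags


def build_trie(references: Dict[str, str], site: str, tag_length: int) -> TrieNode:
    """Step 1: store all reference-derived tags, labeled with their source name."""
    root = TrieNode()
    for name, sequence in references.items():
        for tag in extract_tags(sequence, site, tag_length):
            node = root
            for base in tag:
                node = node.children.setdefault(base, TrieNode())
            node.annotations.append(name)
    return root


def search(node: TrieNode, tag: str, max_mismatches: int,
           pos: int = 0, prefix: str = "") -> List[Tuple[str, List[str]]]:
    """Step 2: return (stored_tag, annotations) pairs within the mismatch budget.

    With max_mismatches=0 only one branch is followed per base, so the
    lookup takes time proportional to the tag length.
    """
    if pos == len(tag):
        return [(prefix, node.annotations)] if node.annotations else []
    hits: List[Tuple[str, List[str]]] = []
    for base, child in node.children.items():
        cost = 0 if base == tag[pos] else 1
        if cost <= max_mismatches:
            hits.extend(search(child, tag, max_mismatches - cost, pos + 1, prefix + base))
    return hits


# Hypothetical toy references; real input would be FASTA records.
REFERENCES = {
    "transcript_A": "AACATGACGTACGTTT",
    "transcript_B": "GGCATGACGTACGATT",
}
TRIE = build_trie(REFERENCES, site="CATG", tag_length=8)
```

Calling `search(TRIE, "ACGTACGT", 0)` follows a single path and returns only the tag from `transcript_A`, while raising the budget to one mismatch also returns the neighboring tag from `transcript_B`. Storing the source labels on the terminal nodes is one simple way to return annotation together with the match, which is the convenience the abstract emphasizes.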
ABSTRACT: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo using solely next-generation sequencing technologies.
Description: The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization.
We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best serves the specific needs of the scientific community.
ABSTRACT: Retroviruses integrate into the host genome in patterns specific to each virus. Understanding the causes of these patterns can provide insight into viral integration mechanisms, pathology and genome evolution, and is critical to the development of safe gene therapy vectors. We generated murine leukemia virus integrations in human HepG2 and K562 cells and subjected them to second-generation sequencing, using a DNA barcoding technique that allowed us to quantify independent integration events. We characterized >3 700 000 unique integration events in two ENCODE-characterized cell lines. We find that integrations were most highly enriched in a subset of strong enhancers and active promoters. In both cell types, approximately half the integrations were found in <2% of the genome, demonstrating genomic influences even narrower than previously believed. The integration pattern of murine leukemia virus appears to be largely driven by regions that have high enrichment for multiple marks of active chromatin; the combination of histone marks present was sufficient to explain why some strong enhancers were more prone to integration than others. The approach we used is applicable to analyzing the integration pattern of any exogenous element and could be a valuable preclinical screen to evaluate the safety of gene therapy vectors.
Nucleic Acids Research 01/2014; 42(7). DOI:10.1093/nar/gkt1399 · 9.11 Impact Factor
ABSTRACT: An understanding of ctenophore biology is critical for reconstructing events that occurred early in animal evolution. Toward this goal, we have sequenced, assembled, and annotated the genome of the ctenophore Mnemiopsis leidyi. Our phylogenomic analyses of both amino acid positions and gene content suggest that ctenophores rather than sponges are the sister lineage to all other animals. Mnemiopsis lacks many of the genes found in bilaterian mesodermal cell types, suggesting that these cell types evolved independently. The set of neural genes in Mnemiopsis is similar to that of sponges, indicating that sponges may have lost a nervous system. These results present a newly supported view of early animal evolution that accounts for major losses and/or gains of sophisticated cell types, including nerve and
ABSTRACT: Technological advances have greatly increased the availability of human genomic sequencing. However, the capacity to analyze genomic data in a clinically meaningful way lags behind the ability to generate such data. To help address this obstacle, we reviewed all conditions with genetic causes and constructed the Clinical Genomic Database (CGD) (http://research.nhgri.nih.gov/CGD/), a searchable, freely Web-accessible database of conditions based on the clinical utility of genetic diagnosis and the availability of specific medical interventions. The CGD currently includes a total of 2,616 genes organized clinically by affected organ systems and interventions (including preventive measures, disease surveillance, and medical or surgical interventions) that could be reasonably warranted by the identification of pathogenic mutations. To aid independent analysis and optimize new data incorporation, the CGD also includes all genetic conditions for which genetic knowledge may affect the selection of supportive care, informed medical decision-making, prognostic considerations, reproductive decisions, and allow avoidance of unnecessary testing, but for which specific interventions are not otherwise currently available. For each entry, the CGD includes the gene symbol, conditions, allelic conditions, clinical categorization (for both manifestations and interventions), mode of inheritance, affected age group, description of interventions/rationale, links to other complementary databases, including databases of variants and presumed pathogenic mutations, and links to PubMed references (>20,000). The CGD will be regularly maintained and updated to keep pace with scientific discovery. Further content-based expert opinions are actively solicited. Eventually, the CGD may assist the rapid curation of individual genomes as part of active medical care.
Proceedings of the National Academy of Sciences 05/2013; 110(24). DOI:10.1073/pnas.1302575110 · 9.67 Impact Factor
ABSTRACT: With the completion of the zebrafish genome sequencing project, it has become possible to analyze the function of zebrafish genes in a systematic way. The first step in such an analysis is to inactivate each protein-coding gene by targeted or random mutation. Here we describe a streamlined pipeline using proviral insertions coupled with high-throughput sequencing and mapping technologies to widely mutagenize genes in the zebrafish genome. We also report the first 6,144 mutagenized and archived F1s predicted to carry up to 3,776 mutations in annotated genes. Using in vitro fertilization, we have rescued and characterized roughly 0.5% of the predicted mutations, showing mutation efficacy and a variety of phenotypes relevant to both developmental processes and human genetic diseases. Mutagenized fish lines are being made freely available to the public through the Zebrafish International Resource Center. These fish lines establish an important milestone for zebrafish genetics research and should greatly facilitate systematic functional studies of the vertebrate genome.
Genome Research 02/2013; 23(4). DOI:10.1101/gr.151464.112 · 14.63 Impact Factor
ABSTRACT: ZInC (Zebrafish Insertional Collection, http://research.nhgri.nih.gov/ZInC/) is a web-searchable interface of insertional mutants in zebrafish. Over the last two decades, the zebrafish has become a popular model organism for studying vertebrate development as well as for modeling human diseases. To facilitate such studies, we are generating a genome-wide knockout resource that targets every zebrafish protein-coding gene. All mutant fish are freely available to the scientific community through the Zebrafish International Resource Center (ZIRC). To assist researchers in finding mutant and insertion information, we developed ZInC, a comprehensive database with a web front-end. It can be queried using multiple types of input such as ZFIN (Zebrafish Information Network) IDs, UniGene accession numbers and gene symbols from zebrafish, human and mouse. In the future, ZInC may include data from other insertional mutation projects as well. ZInC cross-references all integration data with ZFIN (http://zfin.org/).
ABSTRACT: Human immunodeficiency virus type 1 (HIV1) vectors poorly transduce rhesus hematopoietic cells due to species-specific restriction factors, including the tripartite motif-containing 5 isoform α (TRIM5α), which targets the HIV1 capsid. We previously developed a chimeric HIV1 (χHIV) vector system wherein the vector genome is packaged with the simian immunodeficiency virus (SIV) capsid for efficient transduction of both rhesus and human CD34(+) cells. To evaluate whether χHIV vectors could efficiently transduce rhesus hematopoietic repopulating cells, we performed a competitive repopulation assay in rhesus macaques, in which half of the CD34(+) cells were transduced with standard SIV vectors and the other half with χHIV vectors. As compared with SIV vectors, χHIV vectors achieved higher vector integration, and the transgene expression rates were two- to threefold higher in granulocytes and red blood cells and equivalent in lymphocytes and platelets for 2 years. A recipient of χHIV vector-only transduced cells reached transgene expression rates of up to 40% in granulocytes and lymphocytes and 20% in red blood cells. Similar to HIV1 and SIV vectors, the χHIV vector frequently integrated into gene regions, especially into introns. In summary, our χHIV vector demonstrated efficient transduction for rhesus long-term repopulating cells, comparable with SIV vectors. This χHIV vector should allow preclinical testing of HIV1-based therapeutic vectors in large animal models.
ABSTRACT: All nonmammalian vertebrates studied can regenerate inner ear mechanosensory receptors (i.e., hair cells) (Corwin and Cotanche, 1988; Lombarte et al., 1993; Baird et al., 1996), but mammals possess only a very limited capacity for regeneration after birth (Roberson and Rubel, 1994). As a result, mammals experience permanent deficiencies in hearing and balance once their inner ear hair cells are lost. The mechanisms of hair cell regeneration are poorly understood. Because the inner ear sensory epithelium is highly conserved in all vertebrates (Fritzsch et al., 2007), we chose to study the mechanism of hair cell regeneration in adult zebrafish, hoping the results would be transferable to inducing hair cell regeneration in mammals. We defined the comprehensive network of genes involved in hair cell regeneration in the inner ear of adult zebrafish with the powerful transcriptional profiling technique digital gene expression, which leverages the power of next-generation sequencing ('t Hoen et al., 2008). We also identified a key pathway, stat3/socs3, and demonstrated its role in promoting hair cell regeneration through stem cell activation, cell division, and differentiation. In addition, transient pharmacological inhibition of stat3 signaling accelerated hair cell regeneration without overproducing cells. Taking other published datasets into account (Sano et al., 1999; Schebesta et al., 2006; Dierssen et al., 2008; Riehle et al., 2008; Zhu et al., 2008; Qin et al., 2009), we propose that the stat3/socs3 pathway is a key response in all tissue regeneration and thus an important therapeutic target for a broad application in tissue repair and injury healing.
The Journal of Neuroscience : The Official Journal of the Society for Neuroscience 08/2012; 32(31):10662-73. DOI:10.1523/JNEUROSCI.5785-10.2012 · 6.34 Impact Factor
ABSTRACT: ATAD5, the human ortholog of yeast Elg1, plays a role in PCNA deubiquitination. Since PCNA modification is important to regulate DNA damage bypass, ATAD5 may be important for suppression of genomic instability in mammals in vivo. To test this hypothesis, we generated heterozygous (Atad5(+/m)) mice that were haploinsufficient for Atad5. Atad5(+/m) mice displayed high levels of genomic instability in vivo, and Atad5(+/m) mouse embryonic fibroblasts (MEFs) exhibited molecular defects in PCNA deubiquitination in response to DNA damage, as well as DNA damage hypersensitivity and high levels of genomic instability, apoptosis, and aneuploidy. Importantly, 90% of haploinsufficient Atad5(+/m) mice developed tumors, including sarcomas, carcinomas, and adenocarcinomas, between 11 and 20 months of age. High levels of genomic alterations were evident in tumors that arose in the Atad5(+/m) mice. Consistent with a role for Atad5 in suppressing tumorigenesis, we also identified somatic mutations of ATAD5 in 4.6% of sporadic human endometrial tumors, including two nonsense mutations that resulted in loss of proper ATAD5 function. Taken together, our findings indicate that loss-of-function mutations in mammalian Atad5 are sufficient to cause genomic instability and tumorigenesis.
ABSTRACT: This unit includes a basic protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The alternate protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family.
Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 04/2011; Chapter 18:Unit18.5. DOI:10.1002/0471142905.hg1805s69
ABSTRACT: Adverse events linked to perturbations of cellular genes by vector insertion reported in gene therapy trials and animal models have prompted attempts to better understand the mechanisms directing viral vector integration. The integration profiles of vectors based on MLV, ASLV, SIV and HIV have all been shown to be non-random, and novel vectors with a safer integration pattern have been sought. Recently, we developed a producer cell line called CatPac that packages standard MoMLV vectors with feline leukemia virus (FeLV) gag, pol and env gene products. We now report the integration profile of this vector, asking if the FeLV integrase and capsid proteins could modify the MoMLV integration profile, potentially resulting in a less genotoxic pattern. We transduced rhesus macaque CD34+ hematopoietic progenitor cells with CatPac or standard MoMLV vectors, and determined their integration profile by LAM-PCR. We obtained 184 and 175 unique integration sites (ISs) respectively for CatPac and standard MoMLV vectors, and these were compared with 10 000 in silico-generated random IS. The integration profile for CatPac vector was similar to MoMLV and equally non-random, with a propensity for integration near transcription start sites and in highly dense gene regions. We found an IS for CatPac vector localized 715 nucleotides upstream of LMO-2, the gene involved in the acute lymphoblastic leukemia developed by X-SCID patients treated by gene therapy using MoMLV vectors. In conclusion, we found that replacement of MoMLV env, gag and pol gene products with FeLV did not alter the basic integration profile. Thus, there appears to be no safety advantage for this packaging system. However, considering the stability and efficacy of CatPac vectors, further development is warranted, using potentially safer vector backbones, for instance those with a SIN configuration.
ABSTRACT: Derivation of induced pluripotent stem (iPS) cells requires the expression of defined transcription factors (among Oct3/4, Sox2, Klf4, c-Myc, Nanog, and Lin28) in the targeted cells. Lentiviral or standard retroviral gene transfer remains the most robust and commonly used approach. The low reprogramming frequency overall, and the higher efficiency of derivation utilizing integrating vectors compared to more recent nonviral approaches, suggest that gene activation or disruption via proviral integration sites (IS) may play a role in obtaining the pluripotent phenotype. We provide for the first time an extensive analysis of the lentiviral integration profile in human iPS cells. We identified a total of 78 independent IS in eight recently established iPS cell lines derived from either human fetal fibroblasts or newborn foreskin fibroblasts after lentiviral gene transfer of Oct4, Sox2, Nanog, and Lin28. The number of IS ranged from 5 to 15 IS per individual iPS clone, and 75 IS could be assigned to a unique chromosomal location. The different iPS clones had no IS in common. Expression analysis as well as extensive bioinformatic analysis did not reveal functional concordance of the lentiviral targeted genes between the different clones. Interestingly, in six of the eight iPS clones, some of the IS were found in pairs, integrated into the same chromosomal location within six base pairs of each other or in very close proximity. Our study supports recent reports that efficient reprogramming of human somatic cells is not dependent on insertional activation or deactivation of specific genes or gene classes.
ABSTRACT: This unit includes a Basic Protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The Alternate Protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family.
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 03/2010; Chapter 1:Unit 1.5.1-25. DOI:10.1002/0471250953.bi0105s29
ABSTRACT: We describe the creation of a specialized web-accessible database named the Pigment Cell Gene Resource, which contains information on the genetic pathways that regulate pigment cell development and function. This manually curated database comprises two sections, an annotated literature section and an interactive transcriptional network diagram. Initially, this database focuses on the transcription factor SOX10, which has essential roles in pigment cell development and function, but the database has been designed with the capacity to expand in the future, allowing inclusion of many more pigmentation genes.
Database URL: http://research.nhgri.nih.gov/pigment_cell/
Database The Journal of Biological Databases and Curation 01/2010; 2010:baq025. DOI:10.1093/database/baq025 · 3.37 Impact Factor