-
ABSTRACT: oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.
G3 (Bethesda, Md.). 09/2012; 2(9):987-1002.
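A minimal sketch of what "over-represented" can mean in practice, assuming a one-sided hypergeometric (Fisher-style) test on counts of sequences that contain a given TFBS. The abstract does not state which statistic oPOSSUM-3 uses, so the function name, its parameters, and the example counts below are illustrative assumptions and not the tool's actual interface.

```python
from math import comb


def enrichment_p_value(hits_target, n_target, hits_background, n_background):
    """One-sided hypergeometric p-value for TFBS over-representation.

    Hypothetical helper (not part of oPOSSUM-3). It returns the probability
    of seeing at least `hits_target` motif-containing sequences in the target
    set if the target set were a random draw from target + background.
    """
    total = n_target + n_background
    total_hits = hits_target + hits_background
    denominator = comb(total, n_target)
    upper = min(n_target, total_hits)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_target - k)
        for k in range(hits_target, upper + 1)
    )
    return tail / denominator


# Toy counts (assumed): 3 of 3 target sequences carry the site,
# 0 of 3 background sequences do.
p = enrichment_p_value(3, 3, 0, 3)
```

A small p-value suggests the site occurs in the target set more often than a random draw would explain. A real analysis would also correct for the number of TFBS profiles tested.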
-
ABSTRACT: We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions.
PLoS Computational Biology 12/2011; 7(12):e1002256. · 5.22 Impact Factor
-
Carles Vilariño-Güell,
Christian Wider,
Owen A Ross,
Justus C Dachsel,
Jennifer M Kachergus,
Sarah J Lincoln,
Alexandra I Soto-Ortolaza,
Stephanie A Cobb,
Greggory J Wilhoite,
Justin A Bacon, [......],
Tim Lynch,
Eldad Melamed,
Alex Rajput,
Ali H Rajput,
Alessandra Solida,
Ruey-Meei Wu,
Ryan J Uitti,
Zbigniew K Wszolek,
François Vingerhoets,
Matthew J Farrer
ABSTRACT: The identification of genetic causes for Mendelian disorders has been based on the collection of multi-incident families, linkage analysis, and sequencing of genes in candidate intervals. This study describes the application of next-generation sequencing technologies to a Swiss kindred presenting with autosomal-dominant, late-onset Parkinson disease (PD). The family has tremor-predominant dopa-responsive parkinsonism with a mean onset of 50.6 ± 7.3 years. Exome analysis suggests that an aspartic-acid-to-asparagine mutation within vacuolar protein sorting 35 (VPS35 c.1858G>A; p.Asp620Asn) is the genetic determinant of disease. VPS35 is a central component of the retromer cargo-recognition complex, is critical for endosome-trans-golgi trafficking and membrane-protein recycling, and is evolutionarily highly conserved. VPS35 c.1858G>A was found in all affected members of the Swiss kindred and in three more families and one patient with sporadic PD, but it was not observed in 3,309 controls. Further sequencing of familial affected probands revealed only one other missense variant, VPS35 c.946C>T (p.Pro316Ser), in a pedigree with one unaffected and two affected carriers, and thus the pathogenicity of this mutation remains uncertain. Retromer-mediated sorting and transport is best characterized for acid hydrolase receptors. However, the complex has many types of cargo and is involved in a diverse array of biologic pathways from developmental Wnt signaling to lysosome biogenesis. Our study implicates disruption of VPS35 and retromer-mediated trans-membrane protein sorting, rescue, and recycling in the neurodegenerative process leading to PD.
The American Journal of Human Genetics 07/2011; 89(1):162-7. · 10.60 Impact Factor
-
ABSTRACT: In the central nervous system (CNS), myelin is produced from spirally-wrapped oligodendrocyte plasma membrane and, as exemplified by the debilitating effects of inherited or acquired myelin abnormalities in diseases such as multiple sclerosis, it plays a critical role in nervous system function. Myelin sheath production coincides with rapid up-regulation of numerous genes. The complexity of their subsequent expression patterns, along with recently recognized heterogeneity within the oligodendrocyte lineage, suggests that the regulatory networks controlling such genes drive multiple context-specific transcriptional programs. Conferring this nuanced level of control likely involves a large repertoire of interacting transcription factors (TFs). Here, we combined novel strategies of computational sequence analyses with in vivo functional analysis to establish a TF network model of coordinate myelin-associated gene transcription. Notably, the network model captures regulatory DNA elements and TFs known to regulate oligodendrocyte myelin gene transcription and/or oligodendrocyte development, thereby validating our approach. Further, it links to numerous TFs with previously unsuspected roles in CNS myelination and suggests collaborative relationships amongst both known and novel TFs, thus providing deeper insight into the myelin gene transcriptional network.
Nucleic Acids Research 07/2011; 39(18):7974-91. · 8.03 Impact Factor
-
Antony Le Béchec,
Elodie Portales-Casamar,
Guillaume Vetter,
Michèle Moes,
Pierre-Joachim Zindy,
Anne Saumet,
David Arenillas,
Charles Theillet,
Wyeth W Wasserman,
Charles-Henri Lecellier,
Evelyne Friederich
ABSTRACT: To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regulatory molecular interactions at a genome-wide scale would be of great interest to biologists, but is not available to date.
To identify and analyze molecular interaction networks, we developed MIR@NT@N, an integrative approach based on a meta-regulation network model and a large-scale database. MIR@NT@N uses a graph-based approach to predict novel molecular actors across multiple regulatory processes (i.e. TFs acting on protein-coding or miRNA genes, or miRNAs acting on messenger RNAs). Exploiting these predictions, the user can generate networks and further analyze them to identify sub-networks, including motifs such as feedback and feedforward loops (FBL and FFL). In addition, networks can be built from lists of molecular actors with an a priori role in a given biological process to predict novel and unanticipated interactions. Analyses can be contextualized and filtered by integrating additional information such as microarray expression data. All results, including generated graphs, can be visualized, saved and exported into various formats. The performance of MIR@NT@N was evaluated using published data, and the approach was then applied to the regulatory program underlying epithelium to mesenchyme transition (EMT), an evolutionarily conserved process that is implicated in embryonic development and disease.
MIR@NT@N is an effective computational approach to identify novel molecular regulations and to predict gene regulatory networks and sub-networks including conserved motifs within a given biological context. Taking advantage of the M@IA environment, MIR@NT@N is a user-friendly web resource freely available at http://mironton.uni.lu which will be updated on a regular basis.
BMC Bioinformatics 03/2011; 12:67. · 2.75 Impact Factor
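The feedforward loops (FFL) mentioned in the abstract above can be sketched as a search over a directed regulatory graph. This is only an illustration of the motif, assuming an FFL is a triple X, Y, Z with edges X→Y, Y→Z and X→Z. The function and the node names are hypothetical and do not reflect how MIR@NT@N is implemented.

```python
def find_feedforward_loops(edges):
    """Return sorted (x, y, z) triples where x->y, y->z and x->z all exist.

    `edges` is an iterable of (regulator, target) pairs; hypothetical format.
    """
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    loops = []
    for x, x_targets in successors.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            for z in successors.get(y, ()):
                # z must be a third node that x also regulates directly.
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Toy network (assumed names): a TF regulates a miRNA and a gene,
# and the miRNA also targets that gene.
toy_edges = [
    ("TF1", "miR-a"),
    ("TF1", "geneB"),
    ("miR-a", "geneB"),
    ("geneB", "geneC"),
]
loops = find_feedforward_loops(toy_edges)
```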
-
ABSTRACT: Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disruptive variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.
Genome Medicine 01/2011; 3(10):65.
-
Elodie Portales-Casamar,
Douglas J Swanson,
Li Liu,
Charles N de Leeuw,
Kathleen G Banks,
Shannan J Ho Sui,
Debra L Fulton,
Johar Ali,
Mahsa Amirabbasi,
David J Arenillas, [......],
Bibiana K Y Wong,
Siaw H Wong,
Tony Y T Wong,
George S Yang,
Athena R Ypsilanti,
Steven J M Jones,
Robert A Holt,
Daniel Goldowitz,
Wyeth W Wasserman,
Elizabeth M Simpson
ABSTRACT: The Pleiades Promoter Project integrates genomewide bioinformatics with large-scale knockin mouse production and histological examination of expression patterns to develop MiniPromoters and related tools designed to study and treat the brain by directed gene expression. Genes with brain expression patterns of interest are subjected to bioinformatic analysis to delineate candidate regulatory regions, which are then incorporated into a panel of compact human MiniPromoters to drive expression to brain regions and cell types of interest. Using single-copy, homologous-recombination "knockins" in embryonic stem cells, each MiniPromoter reporter is integrated immediately 5' of the Hprt locus in the mouse genome. MiniPromoter expression profiles are characterized in differentiation assays of the transgenic cells or in mouse brains following transgenic mouse production. Histological examination of adult brains, eyes, and spinal cords for reporter gene activity is coupled to costaining with cell-type-specific markers to define expression. The publicly available Pleiades MiniPromoter Project is a key resource to facilitate research on brain development and therapies.
Proceedings of the National Academy of Sciences 09/2010; 107(38):16589-94. · 9.68 Impact Factor
-
ABSTRACT: Laboratory Animal Management Assistant (LAMA) is an internet-based system for tracking large laboratory mouse colonies. It has a user-friendly interface with powerful search capabilities that ease day-to-day tasks such as tracking breeding cages and weaning litters. LAMA was originally developed to manage hundreds of new mouse strains generated by a large functional genomics program, the Pleiades Promoter Project ( http://www.pleiades.org ). The software system has proven to be highly flexible, suitable for diverse management approaches to mouse colonies. It allows custom tagging and grouping of animals, simplifying project-specific handling and access to data. Finally, LAMA was developed in close collaboration with mouse technicians to ease the transition from paper- or Excel-based management systems to computerized tracking, allowing data export in a popular spreadsheet format and automatic printing of cage cards. LAMA is an open-access software tool, freely available to the research community at http://launchpad.net/mousedb .
Mammalian Genome 06/2010; 21(5-6):224-30. · 2.89 Impact Factor
-
Deepti Malhotra,
Elodie Portales-Casamar,
Anju Singh,
Siddhartha Srivastava,
David Arenillas,
Christine Happel,
Casper Shyr,
Nobunao Wakabayashi,
Thomas W Kensler,
Wyeth W Wasserman,
Shyam Biswal
ABSTRACT: The Nrf2 (nuclear factor E2 p45-related factor 2) transcription factor responds to diverse oxidative and electrophilic environmental stresses by circumventing repression by Keap1, translocating to the nucleus, and activating cytoprotective genes. Nrf2 responses provide protection against chemical carcinogenesis, chronic inflammation, neurodegeneration, emphysema, asthma and sepsis in murine models. Nrf2 regulates the expression of a plethora of genes that detoxify oxidants and electrophiles and repair or remove damaged macromolecules, such as through proteasomal processing. However, many direct targets of Nrf2 remain undefined. Here, mouse embryonic fibroblasts (MEF) with either constitutive nuclear accumulation (Keap1(-/-)) or depletion (Nrf2(-/-)) of Nrf2 were utilized to perform chromatin-immunoprecipitation with parallel sequencing (ChIP-Seq) and global transcription profiling. This unique Nrf2 ChIP-Seq dataset is highly enriched for Nrf2-binding motifs. Integrating ChIP-Seq and microarray analyses, we identified 645 basal and 654 inducible direct targets of Nrf2, with 244 genes at the intersection. Modulated pathways in stress response and cell proliferation distinguish the inducible and basal programs. Results were confirmed in an in vivo stress model of cigarette smoke-exposed mice. This study reveals global circuitry of the Nrf2 stress response emphasizing Nrf2 as a central node in cell survival response.
Nucleic Acids Research 05/2010; 38(17):5718-34. · 8.03 Impact Factor
-
ABSTRACT: JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. The changes in the database set the system ready for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches.
Nucleic Acids Research 11/2009; 38(Database issue):D105-10. · 8.03 Impact Factor
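A matrix profile of the kind the JASPAR abstract above describes can be used to score candidate binding sites. The sketch below assumes a position frequency matrix given as per-base count lists, converts it to log-odds weights with a pseudocount and a uniform background, and scans a sequence for the best window. The helper names and the toy matrix are assumptions for illustration and are not taken from JASPAR.

```python
from math import log2

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log-odds weights."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        column_total = sum(pfm[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: log2(((pfm[b][i] + pseudocount) / column_total) / background)
            for b in BASES
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window.

    Expects an uppercase A/C/G/T sequence at least as long as the matrix.
    """
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max(
        (sum(pwm[j][sequence[i + j]] for j in range(width)), i)
        for i in windows
    )


# Toy 3-column matrix (assumed) that strongly prefers "TGA".
toy_pfm = {
    "A": [0, 0, 8],
    "C": [0, 0, 0],
    "G": [0, 8, 0],
    "T": [8, 0, 0],
}
score, start = best_hit(pfm_to_pwm(toy_pfm), "CCTGACC")
```

The pseudocount keeps unseen bases from producing a log of zero. Real profiles and background frequencies would replace the toy values.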
-
ABSTRACT: The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein-DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data 'boutiques' within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk.
Nucleic Acids Research 11/2008; 37(Database issue):D54-60. · 8.03 Impact Factor
-
Anne Saumet,
Guillaume Vetter,
Manuella Bouttier,
Elodie Portales-Casamar,
Wyeth W Wasserman,
Thomas Maurin,
Bernard Mari,
Pascal Barbry,
Laurent Vallar,
Evelyne Friederich,
Khalil Arar,
Bruno Cassinat,
Christine Chomienne,
Charles-Henri Lecellier
ABSTRACT: Micro(mi)RNAs are small noncoding RNAs that orchestrate many key aspects of cell physiology and their deregulation is often linked to distinct diseases including cancer. Here, we studied the contribution of miRNAs in a well-characterized human myeloid leukemia, acute promyelocytic leukemia (APL), targeted by retinoic acid and arsenic trioxide therapy. We identified several miRNAs transcriptionally repressed by the APL-associated PML-RAR oncogene which are released after treatment with all-trans retinoic acid. These coregulated miRNAs were found to control, in a coordinated manner, crucial pathways linked to leukemogenesis, such as HOX proteins and cell adhesion molecules whose expressions are thereby repressed by the chemotherapy. Thus, APL appears linked to transcriptional perturbation of miRNA genes, and clinical protocols able to successfully eradicate cancer cells may do so by restoring miRNA expression. The identification of abnormal miRNA biogenesis in cancer may therefore provide novel biomarkers and therapeutic targets in myeloid leukemias.
Blood 11/2008; 113(2):412-21. · 9.90 Impact Factor
-
ABSTRACT: Inactivation of the transcription factor and tumor suppressor p53, and overexpression or mutational activation of PIK3CA, which encodes the p110alpha catalytic subunit of phosphatidylinositol-3-kinase (PI3K), are two of the most common deleterious genomic changes in cancer, including in ovarian carcinomas. We investigated molecular mechanisms underlying interactions between these two mediators and their possible roles in ovarian tumorigenesis. We identified two alternate PIK3CA promoters and showed direct binding of and transcriptional inhibition by p53 to one of these promoters. Conditional suppression of functional p53 increased p110alpha transcripts, protein levels and PI3K activity in immortalized, non-tumorigenic ovarian surface epithelial (OSE) cells, the precursors of ovarian carcinoma. Conversely, overexpression of p53 by adenoviral infection and activation of p53 by gamma-irradiation both diminished p110alpha protein levels in normal OSE and ovarian cancer cells. The demonstration that p53 binds directly to the PIK3CA promoter and inhibits its activity identifies a novel mechanism whereby these two mediators regulate cellular functions, and whereby inactivation of p53 and subsequent upregulation of PIK3CA might contribute to the pathophysiology of ovarian cancer.
Journal of Cell Science 04/2008; 121(Pt 5):664-74. · 6.11 Impact Factor
-
ABSTRACT: In this study, genome-wide expression analyses were used to study the response of Saccharomyces cerevisiae to stress throughout a 15-day wine fermentation. Forty per cent of the yeast genome significantly changed expression levels to mediate long-term adaptation to fermenting grape must. Among the genes that changed expression levels, a group of 223 genes that were dramatically induced at various points during fermentation was identified and designated as fermentation stress response (FSR) genes. FSR genes sustained high levels of induction up to the final time point and exhibited changes in expression levels ranging from four- to 80-fold. The FSR is novel; 62% of the genes involved have not been implicated in global stress responses and 28% of the FSR genes have no functional annotation. Genes involved in respiratory metabolism and gluconeogenesis were expressed during fermentation despite the presence of high concentrations of glucose. Ethanol, rather than nutrient depletion, seems to be responsible for entry of yeast cells into the stationary phase.
FEMS Yeast Research 03/2008; 8(1):35-52. · 2.40 Impact Factor
-
ABSTRACT: Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.
PLoS Computational Biology 02/2008; 4(1):e5. · 5.22 Impact Factor
-
ABSTRACT: We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets.
The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation.
The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/.
PLoS ONE 02/2008; 3(1):e1440. · 4.09 Impact Factor
-
ABSTRACT: A central problem in systems biology research is the identification and extension of biological modules: groups of genes or proteins participating in a common cellular process or physical complex. As a result, there is a persistent need for practical, principled methods to infer the modular organization of genes from genome-scale data.
We introduce a novel approach for the identification of modules based on the persistence of isolated gene groups within an evolving graph process. First, the underlying genomic data is summarized in the form of ranked gene-gene relationships, thereby accommodating studies that quantify the relevant biological relationship directly or indirectly. Then, the observed gene-gene relationship ranks are viewed as the outcome of a random graph process and candidate modules are given by the identifiable subgraphs that arise during this process. An isolation index is computed for each module, which quantifies the statistical significance of its survival time.
The Miso (module isolation) method predicts gene modules from genomic data and the associated isolation index provides a module-specific measure of confidence. Improving on existing alternatives, such as graph clustering and the global pruning of dendrograms, this index offers two intuitively appealing features: (1) the score is module-specific; and (2) different choices of threshold correlate logically with the resulting performance, i.e. a stringent cutoff yields high quality predictions, but low sensitivity. Through the analysis of yeast phenotype data, the Miso method is shown to outperform existing alternatives, in terms of the specificity and sensitivity of its predictions.
PLoS ONE 02/2008; 3(10):e3358. · 4.09 Impact Factor
-
Obi L. Griffith,
Stephen Montgomery,
Bridget Bernier,
Bryan Chu,
Katayoon Kasaian,
Stein Aerts,
Shaun Mahony,
Monica C. Sleumer,
Mikhail Bilenky,
Maximilian Haeussler, [......],
Ian J. Donaldson,
Gordon Robertson,
Claes Wadelius,
Pieter J. De Bleser,
Dominique Vlieghe,
Marc S. Halfon,
Wyeth W. Wasserman,
Ross C. Hardison,
Casey M. Bergman,
Steven J. M. Jones
Nucleic Acids Research. 01/2008; 36:107-113.
-
ABSTRACT: The identification of over-represented transcription factor binding sites from sets of co-expressed genes provides insights into the mechanisms of regulation for diverse biological contexts. oPOSSUM, an internet-based system for such studies of regulation, has been improved and expanded in this new release. New features include a worm-specific version for investigating binding sites conserved between Caenorhabditis elegans and C. briggsae, as well as a yeast-specific version for the analysis of co-expressed sets of Saccharomyces cerevisiae genes. The human and mouse applications feature improvements in ortholog mapping, sequence alignments and the delineation of multiple alternative promoters. oPOSSUM2, introduced for the analysis of over-represented combinations of motifs in human and mouse genes, has been integrated with the original oPOSSUM system. Analysis using user-defined background gene sets is now supported. The transcription factor binding site models have been updated to include new profiles from the JASPAR database. oPOSSUM is available at http://www.cisreg.ca/oPOSSUM/
Nucleic Acids Research 08/2007; 35(Web Server issue):W245-52. · 8.03 Impact Factor
-
ABSTRACT: PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business.
Genome Biology 02/2007; 8(10):R207. · 6.63 Impact Factor