Stein Aerts

Aix-Marseille Université, Marseille, Provence-Alpes-Côte d'Azur, France

Publications (31) · 222.49 Total impact

  • Article: Comparative motif discovery combined with comparative transcriptomics yield accurate targetome and enhancer predictions.
    ABSTRACT: The identification of transcription factor binding sites, enhancers, and transcriptional target genes often relies on the integration of gene expression profiling and computational cis-regulatory sequence analysis. Methods for the prediction of cis-regulatory elements can take advantage of comparative genomics to increase signal-to-noise levels. However, gene expression data are usually derived from only one species. Here we investigate tissue-specific cross-species gene expression profiling by high-throughput sequencing, combined with cross-species motif discovery. First, we compared different methods for expression level quantification and cross-species integration using Tag-Seq data. Using the optimal pipeline we derived a set of genes with conserved expression during retinal determination across D. melanogaster, D. yakuba, and D. virilis. These genes are enriched for binding sites of eye-related transcription factors including the zinc-finger Glass, a master regulator of photoreceptor differentiation. Validation of predicted Glass targets using RNA-Seq in homozygous glass mutants confirms that the majority of our predictions are expressed downstream of Glass. Finally, we tested nine candidate enhancers by in vivo reporter assays and found eight of them to drive GFP in the eye disc, of which seven co-localize with the Glass protein, namely scrt, chp, dpr10, CG6329, retn, Lim3, and dmrt99B. In conclusion, we show for the first time the combined use of cross-species expression profiling with cross-species motif discovery as a method to define a core developmental program and we augment the candidate Glass targetome from a single known target gene, lozenge, to at least 62 conserved transcriptional targets.
    Genome Research 10/2012; · 13.61 Impact Factor
  • Article: i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules.
    ABSTRACT: The field of regulatory genomics today is characterized by the generation of high-throughput data sets that capture genome-wide transcription factor (TF) binding, histone modifications, or DNase I hypersensitive regions across many cell types and conditions. In this context, a critical question is how to make optimal use of these publicly available datasets when studying transcriptional regulation. Here, we address this question in Drosophila melanogaster for which a large number of high-throughput regulatory datasets are available. We developed i-cisTarget (where the 'i' stands for integrative), for the first time enabling the discovery of different types of enriched 'regulatory features' in a set of co-regulated sequences in one analysis, being either TF motifs or 'in vivo' chromatin features, or combinations thereof. We have validated our approach on 15 co-expressed gene sets, 21 ChIP data sets, 628 curated gene sets and multiple individual case studies, and show that meaningful regulatory features can be confidently discovered; that bona fide enhancers can be identified, both by in vivo events and by TF motifs; and that combinations of in vivo events and TF motifs further increase the performance of enhancer prediction.
    Nucleic Acids Research 06/2012; 40(15):e114. · 8.03 Impact Factor
  • Article: Using cisTargetX to predict transcriptional targets and networks in Drosophila.
    ABSTRACT: Gene expression regulation is a fundamental biological process leading to complete organism development by controlling processes like cell type specification and differentiation. The accuracy of this process is governed by transcription factors (TFs) acting within a complex gene regulatory network. CisTargetX has been developed to enable a user to predict TFs, enhancers, and target genes involved in the regulation of co-expressed genes. It uses a strategy that incorporates the genome-wide prediction of clusters of transcription factor binding sites (TFBSs), starting from a large, unbiased collection of position weight matrices (PWMs) and uses comparative genomics criteria to filter potential TFBS. We describe in this chapter, step-by-step, how to use cisTargetX starting from a set of genes or TF(s) to predict transcriptional targets with their putative binding sites and networks in Drosophila. Next, we illustrate this approach on a particular developmental system, namely, sensory organ development, and identify relevant TFs, DNA regions regulating gene expression, and TF/target gene interactions. CisTargetX is available at http://med.kuleuven.be/lcb/cisTargetX.
    Methods in molecular biology (Clifton, N.J.) 01/2012; 786:291-314.
  • Article: Variations in the exome of the LNCaP prostate cancer cell line.
    ABSTRACT: The LNCaP cell line is widely used as a model for prostate cancer. However, information on protein-changing mutations, genetic heterogeneity and genetic (in)stability is largely lacking for these cells. Next-generation sequencing of the LNCaP exome revealed many single nucleotide variants (SNVs). To help identify the mutations that are most likely drivers of the oncogenic process, we developed an in silico protocol, which can be adapted for other exome analyses. We detected 1,802 non-synonymous SNVs and 218 small insertions and deletions in the LNCaP exome. We confirm the known mutations in the androgen receptor and the PTEN gene, but most other mutations remained undescribed until now. The presence of 38 out of 42 SNVs was confirmed in monoclonal as well as in polyclonal LNCaP derivatives. Moreover, most variants were also detectable in LNCaP mRNA. We provide an extensive database of genetic variations in the protein-coding part of the genome of LNCaP cells, which should be taken into consideration when using LNCaP cells or its derivatives as models for prostate cancer. From the analysis of several LNCaP-derived cultures and clones, we can confirm that the cell line is heterozygous for a large number of variants and that both the variant and the wild-type allele can be simultaneously expressed as mRNA. The fact that the SNVs in the E-cadherin, CDK4, Notch1, and PlexinB1 genes are absent in some of the subclones strongly indicates a degree of genetic instability.
    The Prostate 12/2011; 72(12):1317-27. · 3.48 Impact Factor
  • Article: Mutation analysis of the tyrosine phosphatase PTPN2 in Hodgkin's lymphoma and T-cell non-Hodgkin's lymphoma.
    ABSTRACT: We recently reported deletion of the protein tyrosine phosphatase gene PTPN2 in T-cell acute lymphoblastic leukemia. Functional analyses confirmed that PTPN2 acts as a classical tumor suppressor repressing the proliferation of T cells, in part through inhibition of JAK/STAT signaling. We investigated the expression of PTPN2 in leukemia as well as lymphoma cell lines. We identified bi-allelic inactivation of PTPN2 in the Hodgkin's lymphoma cell line SUP-HD1 which was associated with activation of the JAK/STAT pathway. Subsequent sequence analysis of Hodgkin's lymphoma and T-cell non-Hodgkin's lymphoma identified bi-allelic inactivation of PTPN2 in 2 out of 39 cases of peripheral T-cell lymphoma not otherwise specified, but not in Hodgkin's lymphoma. These results, together with our own data on T-cell acute lymphoblastic leukemia, demonstrate that PTPN2 is a tumor suppressor gene in T-cell malignancies.
    Haematologica 07/2011; 96(11):1723-7. · 6.42 Impact Factor
  • Article: Robust target gene discovery through transcriptome perturbations and genome-wide enhancer predictions in Drosophila uncovers a regulatory basis for sensory specification.
    ABSTRACT: A comprehensive systems-level understanding of developmental programs requires the mapping of the underlying gene regulatory networks. While significant progress has been made in mapping a few such networks, almost all gene regulatory networks underlying cell-fate specification remain unknown and their discovery is significantly hampered by the paucity of generalized, in vivo validated tools of target gene and functional enhancer discovery. We combined genetic transcriptome perturbations and comprehensive computational analyses to identify a large cohort of target genes of the proneural and tumor suppressor factor Atonal, which specifies the switch from undifferentiated pluripotent cells to R8 photoreceptor neurons during larval development. Extensive in vivo validations of the predicted targets for the proneural factor Atonal demonstrate a 50% success rate of bona fide targets. Furthermore we show that these enhancers are functionally conserved by cloning orthologous enhancers from Drosophila ananassae and D. virilis in D. melanogaster. Finally, to investigate cis-regulatory cross-talk between Ato and other retinal differentiation transcription factors (TFs), we performed motif analyses and independent target predictions for Eyeless, Senseless, Suppressor of Hairless, Rough, and Glass. Our analyses show that cisTargetX identifies the correct motif from a set of coexpressed genes and accurately predicts target genes of individual TFs. The validated set of novel Ato targets exhibit functional enrichment of signaling molecules and a subset is predicted to be coregulated by other TFs within the retinal gene regulatory network.
    PLoS Biology 01/2010; 8(7):e1000435. · 11.45 Impact Factor
  • Article: The atonal proneural transcription factor links differentiation and tumor formation in Drosophila.
    ABSTRACT: The acquisition of terminal cell fate and onset of differentiation are instructed by cell type-specific master control genes. Loss of differentiation is frequently observed during cancer progression, but the underlying causes and mechanisms remain poorly understood. We tested the hypothesis that master regulators of differentiation may be key regulators of tumor formation. Using loss- and gain-of-function analyses in Drosophila, we describe a critical anti-oncogenic function for the atonal transcription factor in the fly retina, where atonal instructs tissue differentiation. In the tumor context, atonal acts by regulating cell proliferation and death via the JNK stress response pathway. Combined with evidence that atonal's mammalian homolog, ATOH1, is a tumor suppressor gene, our data support a critical, evolutionarily conserved, function for ato in oncogenesis.
    PLoS Biology 03/2009; 7(2):e40. · 11.45 Impact Factor
  • Article: Integrating computational biology and forward genetics in Drosophila.
    ABSTRACT: Genetic screens are powerful methods for the discovery of gene-phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of "omics" data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, as well as a web application for functional prioritization in Drosophila, called Endeavour-HighFly, and the Atonal network, are publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the process of gene function and gene-gene association discovery.
    PLoS Genetics 02/2009; 5(1):e1000351. · 8.69 Impact Factor
  • Article: ENDEAVOUR update: a web resource for gene prioritization in multiple species.
    ABSTRACT: Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, besides our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and now includes numerous databases: ontologies and annotations, protein-protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis.
    Nucleic Acids Research 08/2008; 36(Web Server issue):W377-84. · 8.03 Impact Factor
  • Article: ModuleMiner - improved computational detection of cis-regulatory modules: are there different modes of gene regulation in embryonic development and adult tissues?
    ABSTRACT: We present ModuleMiner, a novel algorithm for computationally detecting cis-regulatory modules (CRMs) in a set of co-expressed genes. ModuleMiner outperforms other methods for CRM detection on benchmark data, and successfully detects CRMs in tissue-specific microarray clusters and in embryonic development gene sets. Interestingly, CRM predictions for differentiated tissues exhibit strong enrichment close to the transcription start site, whereas CRM predictions for embryonic development gene sets are depleted in this region.
    Genome biology 05/2008; 9(4):R66. · 6.63 Impact Factor
  • Article: ORegAnno: an open-access community-driven resource for regulatory annotation.
    ABSTRACT: ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org.
    Nucleic Acids Research 02/2008; 36(Database issue):D107-13. · 8.03 Impact Factor
  • Article: Text-mining assisted regulatory annotation.
    ABSTRACT: Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature. We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process. Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.
    Genome biology 02/2008; 9(2):R31. · 6.63 Impact Factor
  • Article: Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures.
    ABSTRACT: Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
    Nature 12/2007; 450(7167):219-32. · 36.28 Impact Factor
  • Article: Fine-tuning enhancer models to predict transcriptional targets across multiple genomes.
    ABSTRACT: Networks of regulatory relations between transcription factors (TF) and their target genes (TG), implemented through TF binding sites (TFBS), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. Particularly, for CRM-modelling we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM-prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
    PLoS ONE 02/2007; 2(11):e1115. · 4.09 Impact Factor
  • Article: Gene prioritization through genomic data fusion.
    ABSTRACT: The identification of genes involved in health and disease remains a challenge. We describe a bioinformatics approach, together with a freely accessible, interactive and flexible software termed Endeavour, to prioritize candidate genes underlying biological processes or diseases, based on their similarity to known genes involved in these phenomena. Unlike previous approaches, ours generates distinct prioritizations for multiple heterogeneous data sources, which are then integrated, or fused, into a global ranking using order statistics. In addition, it offers the flexibility of including additional data sources. Validation of our approach revealed it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates of 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified a novel gene involved in craniofacial development from a 2-Mb chromosomal region, deleted in some patients with DiGeorge-like birth defects. The approach described here offers an alternative integrative method for gene discovery.
    Nature Biotechnology 06/2006; 24(5):537-44. · 23.27 Impact Factor
  • Article: Identification of conserved modes of expression profiles during hippocampal development and neuronal differentiation in vitro.
    ABSTRACT: Gene expression profiles can be regarded as sums of simpler modes, analogous to the modes of a vibrating violin string. Decomposition of temporal gene expression profiles into modes by singular value decomposition (SVD) was reported before, but the question as to what degree the SVD modes can be interpreted in terms of biology remains open. We report and compare the results of SVD of published datasets from hippocampal development, neuronal differentiation in vitro, and a control time-series hippocampal dataset. We demonstrate that the first SVD mode reflects the magnitude of expression, interpretable on the Affymetrix platform. In the datasets from gene profiling of hippocampal development and neuronal differentiation, the second mode reflects a monotonic change in expression, either up- or down-regulation, over the time course of the experiment. We demonstrate that the top two SVD modes are conserved between datasets and therefore likely reflect properties of the underlying system (gene expression in hippocampus) rather than of a particular experiment or dataset. Our results also indicate that the magnitude of expression, and the direction of change in expression during hippocampal development, are uncorrelated, suggesting that they are regulated by largely independent mechanisms.
    Journal of Neurochemistry 05/2006; 97 Suppl 1:87-91. · 4.06 Impact Factor
  • Article: Prediction of a key role of motifs binding E2F and NR2F in down-regulation of numerous genes during the development of the mouse hippocampus.
    ABSTRACT: We previously demonstrated that gene expression profiles during neuronal differentiation in vitro and hippocampal development in vivo were very similar, due to a conservation of the important second singular value decomposition (SVD) mode (Mode 2) of expression. The conservation of Mode 2 suggests that it reflects a regulatory mechanism conserved between the two systems. In either dataset, the expression vectors of all the genes form two large clusters that differ in the sign of the contribution of Mode 2, which for the majority of them reflects the difference between down- or up-regulation. In the current work, we used a novel approach of analyzing cis-regulation of gene expression in a subspace of a single SVD mode of temporal expression profiles. In the putative upstream regulatory sequences identified by mouse-human homology for all the genes represented in either dataset, we searched for simple features (motifs and pairs of motifs) associated with either sign of the loading of Mode 2. Using a cross-system training-test set approach, we identified E2F binding sites as predictors of down-regulation of gene expression during hippocampal development. NR2F binding sites, for the transcription factors Nr2f/COUP and Hnf4, and also NR2F_SP1 pairs of binding sites, were predictors of down-regulation of expression both during hippocampal development and neuronal differentiation. Analysis of another dataset, from gene profiling of myoblast differentiation in vitro, shows that the conservation of Mode 2 extends to the differentiation of mesenchymal cells. This permitted the identification of two more pairs of motifs, one of which included the CDE/CHR tandem element, as features associated with down-regulation both in the differentiating myoblasts and in the developing hippocampus. 
    Of the features we identified, the E2F and CDE/CHR motifs may be associated with the cycling progenitor cell status, while NR2F may be related to the entry into differentiation along the neuronal pathway. Our results constitute the first prediction of an expression pattern from the genomic sequence for the developing mammalian brain, and demonstrate a potential for the analysis of gene regulation in a subspace of a single SVD mode of expression.
    BMC Bioinformatics 02/2006; 7:367. · 2.75 Impact Factor
  • Article: TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis.
    ABSTRACT: We present the second and improved release of the TOUCAN workbench for cis-regulatory sequence analysis. TOUCAN implements and integrates fast state-of-the-art methods and strategies in gene regulation bioinformatics, including algorithms for comparative genomics and for the detection of cis-regulatory modules. This second release of TOUCAN has become open source and thereby carries the potential to evolve rapidly. The main goal of TOUCAN is to allow a user to come to testable hypotheses regarding the regulation of a gene or of a set of co-regulated genes. TOUCAN can be launched from this location: http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php.
    Nucleic Acids Research 08/2005; 33(Web Server issue):W393-6. · 8.03 Impact Factor
  • Article: A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes.
    ABSTRACT: The implementation of a genetic algorithm is described that provides a fast method of searching for the optimal combination of transcription factor binding sites in a set of regulatory sequences. AVAILABILITY: The algorithm can be used transparently as a web service from within the Toucan software. Toucan can be accessed at http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php. A standalone version of the software is available upon request.
    Bioinformatics 09/2004; 20(12):1974-6. · 5.47 Impact Factor

Institutions

  • 2012
    • Aix-Marseille Université
      Marseille, Provence-Alpes-Côte d'Azur, France
  • 2006–2012
    • Leuven University College
      Leuven, VLG, Belgium
  • 2003–2010
    • KU Leuven
      • Department of Human Genetics
      • Department of Electrical Engineering (ESAT)
      Leuven, VLG, Belgium
  • 2007–2009
    • Vlaams Instituut voor Biotechnologie
      • Laboratory of Neurogenetics
      Gent, VLG, Belgium