Per Unneberg

French National Centre for Scientific Research, Paris, Île-de-France, France

Publications (15) · 90.56 Total impact

  • Per Unneberg, Jean-Michel Claverie
    ABSTRACT: Recent studies on chromosome conformation show that chromosomes colocalize in the nucleus, bringing together active genes in transcription factories. This spatial proximity of actively transcribing genes could provide a means for RNA interaction at the transcript level. We have screened public databases for chimeric EST and mRNA sequences with the intent of mapping transcription-induced interchromosomal interactions. We suggest that chimeric transcripts may be the result of close encounters of active genes, either as functional products or "noise" in the transcription process, and that they could be used as probes for chromosome interactions. We have found a total of 5,614 chimeric ESTs and 587 chimeric mRNAs that meet our selection criteria. Due to their higher quality, the mRNA findings are of particular interest and we hope that they may serve as food for thought for specialists in diverse areas of molecular biology.
    PLoS ONE 02/2007; 2(2):e254. · 3.53 Impact Factor
  • ABSTRACT: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
    Science 10/2006; 313(5793):1596-604. · 31.20 Impact Factor
  • ABSTRACT: The Populus genus has evolved as the model organism for forest tree genomics, which has been further emphasised with the sequencing of the Populus trichocarpa genome. Populus species are widely spread over the Northern Hemisphere and provide a great source of genetic diversity, which can be used for mapping of quantitative trait loci, positional cloning, association mapping and studies in environmental adaptation. Collections of expressed sequence tags (ESTs) are rich sources in studies of genetic diversity. Here, we report on an in-depth analysis of 70,000 ESTs from two Populus species, Populus tremula and Populus trichocarpa. We present data on the level of conservation in transcript sequences and supply a collection of potential single nucleotide polymorphisms.
    Tree Genetics & Genomes 10/2005; 1(3):109-115. · 2.40 Impact Factor
  • ABSTRACT: Forage is an application which uses two neural networks for detecting single nucleotide polymorphisms (SNPs). Potential SNP candidates are identified in multiple alignments. Each candidate is then represented by a vector of features, which is classified as SNP or monomorphic by the networks. A validated dataset of SNPs was constructed from experimentally verified SNP data and used for network training and method evaluation. AVAILABILITY: The package is available at biobase.biotech.kth.se/forage/
    Bioinformatics 06/2005; 21(10):2528-30. · 5.32 Impact Factor
  • ABSTRACT: Trees present a life form of paramount importance for terrestrial ecosystems and human societies because of their ecological structure and physiological function and provision of energy and industrial materials. The genus Populus is the internationally accepted model for molecular tree biology. We have analyzed 102,019 Populus ESTs that clustered into 11,885 clusters and 12,759 singletons. We also provide >4,000 assembled full clone sequences to serve as a basis for the upcoming annotation of the Populus genome sequence. A public web-based EST database (POPULUSDB) provides digital expression profiles for 18 tissues that comprise the majority of differentiated organs. The coding content of Populus and Arabidopsis genomes shows very high similarity, indicating that differences between these annual and perennial angiosperm life forms result primarily from differences in gene regulation. The high similarity between Populus and Arabidopsis will allow studies of Populus to directly benefit from the detailed functional genomic information generated for Arabidopsis, enabling detailed insights into tree development and adaptation. These data will also be valuable for functional genomic efforts in Arabidopsis.
    Proceedings of the National Academy of Sciences 09/2004; 101(38):13951-6. · 9.81 Impact Factor
  • ABSTRACT: The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
    PLoS Biology 07/2004; 2(6):e162. · 12.69 Impact Factor
  • American Society of Plant Biologists Newsletter. 01/2004; 38(101):13951-13956.
  • ABSTRACT: Monitoring of differential gene expression is an important step towards understanding of gene function. We describe a comparison of the representational difference analysis (RDA) subtraction process with corresponding microarray analysis. The subtraction steps are followed in a quantitative manner using a shotgun cloning and sequencing procedure that includes over 1900 gene sequences. In parallel, the enriched transcripts are spotted onto microarrays facilitating large scale hybridization analysis of the representations and the difference products. We show by the shotgun procedure that there is a high diversity of gene fragments represented in the iterative RDA products (92-67% singletons) with a low number of shared sequences (<9%) between subsequent subtraction cycles. A non-redundant set of 1141 RDA clones was immobilized on glass slides and the majority of these clones (97%) gave repeated good fluorescent signals in a subsequent hybridization of the labelled and amplified original cDNA. We observed only a low number of false positives (<2%) and a more than twofold differential expression for 32% (363) of the immobilized RDA clones. In conclusion, we show that by random sequencing of the difference products we obtained an accurate transcript profile of the individual steps and that large-scale confirmation of the obtained transcripts can be achieved by microarray analysis.
    Gene 05/2003; 310:39-47. · 2.20 Impact Factor
  • ABSTRACT: There exist a number of gene expression profiling techniques that utilize restriction enzymes for generation of short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of tags generated in an experiment. We have also investigated various aspects of in silico transcript identification that these profiling methods rely on. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1-30% of the sequences lack a given restriction enzyme recognition site. Moreover, 1-5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90-95%, which increases only slightly with longer tags, due to the existence of closely related transcripts. Furthermore, 3-30% of upstream 10 bp tags are identical to 3' tags, introducing a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16-17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it to an existing mapping database. The mappings agreed to 79-83%, where the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may serve to improve the interpretation of sequence-based expression studies and the design of hybridization arrays, by identifying short tags that have a high reliability and separating them from tags that carry an inherent ambiguity in their capacity to discriminate between genes. To this end, supplementary information in the form of a web companion to this paper is located at http://biobase.biotech.kth.se/tagseq.
    Nucleic Acids Research 05/2003; 31(8):2217-26. · 8.81 Impact Factor
  • ABSTRACT: Various approaches to the study of differential gene expression are applied to compare cell lines and tissue samples in a wide range of biological contexts. The compromise between focusing on only the important genes in certain cellular processes and achieving a complete picture is critical for the selection of strategy. We demonstrate how global microarray technology can be used for the exploration of the differentially expressed genes extracted through representational difference analysis (RDA). The subtraction of ubiquitous gene fragments from the two samples was demonstrated using cDNA microarrays including more than 32 000 spotted, PCR-amplified human clones. Hybridizations indicated the expression of 9100 of the microarray elements in a macrophage/foam cell atherosclerosis model system, of which many were removed during the RDA process. The stepwise subtraction procedure was demonstrated to yield an efficient enrichment of gene fragments overrepresented in either sample (18% in the representations, 86% after the first subtraction, and 88% after the second subtraction), many of which were impossible to detect in the starting material. Interestingly, the method allowed for the observation of the differential expression of several members of the low-abundant nuclear receptor gene family. We also observed a certain background level in the difference products of nondifferentially expressed gene fragments, warranting a verification strategy for selected candidate genes. The differential expression of several genes was verified by real-time PCR.
    BioTechniques 07/2002; 32(6):1348-50, 1352, 1354-6, 1358. · 2.40 Impact Factor
  • ABSTRACT: We describe a novel method for transcript profiling based on high-throughput parallel sequencing of signature tags using a non-gel-based microtiter plate format. The method relies on the identification of cDNA clones by pyrosequencing of the region corresponding to the 3'-end of the mRNA preceding the poly(A) tail. Simultaneously, the method can be used for gene discovery, since tags corresponding to unknown genes can be further characterized by extended sequencing. The protocol was validated using a model system for human atherosclerosis. Two 3'-tagged cDNA libraries, representing macrophages and foam cells, which are key components in the development of atherosclerotic plaques, were constructed using a solid phase approach. The libraries were analyzed by pyrosequencing, giving on average 25 bases. As a control, conventional expressed sequence tag (EST) sequencing using slab gel electrophoresis was performed. Homology searches were used to identify the genes corresponding to each tag. Comparisons with EST sequencing showed identical, unique matches in the majority of cases when the pyrosignature was at least 18 bases. A visualization tool was developed to facilitate differential analysis using a virtual chip format. The analysis resulted in identification of genes with possible relevance for development of atherosclerosis. The use of the method for automated massive parallel signature sequencing is discussed.
    Gene 06/2002; 289(1-2):31-9. · 2.20 Impact Factor
  • P Unneberg, J J Merelo, P Chacón, F Morán
    ABSTRACT: This article presents SOMCD, an improved method for the evaluation of protein secondary structure from circular dichroism spectra, based on Kohonen's self-organizing maps (SOM). Protein circular dichroism (CD) spectra are used to train a SOM, which arranges the spectra on a two-dimensional map. Location in the map reflects the secondary structure composition of a protein. With SOMCD, the prediction of beta-turn has been included. The number of spectra in the training set has been increased, and it now includes 39 protein spectra and 6 reference spectra. Finally, SOM parameters have been chosen to minimize distortion and make the network produce clusters with known properties. Estimation results show improvements compared with the previous version, K2D, which, in addition, estimated only three secondary structure components; the accuracy of the method is more uniform over the different secondary structures.
    Proteins Structure Function and Bioinformatics 04/2001; 42(4):460-70. · 3.34 Impact Factor
  • ABSTRACT: This article presents SOMCD, an improved method for the evaluation of protein secondary structure from circular dichroism spectra, based on Kohonen's self-organizing maps (SOM). Protein circular dichroism (CD) spectra are used to train a SOM, which arranges the spectra on a two-dimensional map. Location in the map reflects the secondary structure composition of a protein. With SOMCD, the prediction of β-turn has been included. The number of spectra in the training set has been increased, and it now includes 39 protein spectra and 6 reference spectra. Finally, SOM parameters have been chosen to minimize distortion and make the network produce clusters with known properties. Estimation results show improvements compared with the previous version, K2D, which, in addition, estimated only three secondary structure components; the accuracy of the method is more uniform over the different secondary structures. Proteins 2001;42:460–470. © 2001 Wiley-Liss, Inc.
    Proteins Structure Function and Bioinformatics 02/2001; 42(4):460 - 470. · 3.34 Impact Factor
  • ABSTRACT: An implementation of the method presented in this article will be available online at http://somcd.geneura.org
    Proteins Structure Function and Bioinformatics 01/2001; 42(4):460. · 3.34 Impact Factor

Publication Stats

2k Citations
90.56 Total Impact Points

Institutions

  • 2007
    • French National Centre for Scientific Research
      Paris, Île-de-France, France
  • 2005–2006
    • AlbaNova University Center
      Stockholm, Stockholm, Sweden
  • 2001–2005
    • KTH Royal Institute of Technology
      • School of Biotechnology (BIO)
      Stockholm, Stockholm, Sweden