Publications (17) · 73.72 Total impact
Article: Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation.
ABSTRACT: InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings are comprised of the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models. Database URL: http://www.ebi.ac.uk/interpro. The complete InterPro2GO mappings are available at: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/interpro2go.
Database: The Journal of Biological Databases and Curation 01/2012; 2012:bar068. · 2.07 Impact Factor
Article: InterPro in 2011: new developments in the family and domain prediction database.
ABSTRACT: InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Nucleic Acids Research 11/2011; 40(Database issue):D306-12. · 8.03 Impact Factor
Article: InterPro: the integrative protein signature database.
ABSTRACT: The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have also started to display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
Nucleic Acids Research 11/2008; 37(Database issue):D211-5. · 8.03 Impact Factor
Article: Building a biological space based on protein sequence similarities and biological ontologies.
ABSTRACT: Assignment of function to protein sequence is a task of growing importance in the life sciences, as new high-throughput DNA sequencing technologies generate ever increasing quantities of genomic and meta-genomic data. Patterns within the sequence space, caused by the evolutionary conservation and assembly of protein domains, make possible the inference of function from sequence similarity. Clustering similar sequences is a useful technique for finding conserved sequences; the CluSTr database is a publicly available database arranging proteins in a hierarchy structured by similarity. The protein classification tool InterProScan builds on this approach by applying a range of methods to detect proteins that contain signatures indicative of the presence of particular conserved domains. The use of ontologies to describe protein function provides a flexible and abstract language to classify proteins. Together, these techniques can provide an understanding of the shape of the protein space, and can be used to explore the uncharted waters of the emerging metagenomic world.
Combinatorial Chemistry & High Throughput Screening 10/2008; 11(8):653-60. · 1.78 Impact Factor
Article: The Rice Annotation Project Database (RAP-DB): 2008 update.
ABSTRACT: The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for a comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.
Nucleic Acids Research 02/2008; 36(Database issue):D1028-33. · 8.03 Impact Factor
Article: Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana.
ABSTRACT: We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is approximately 32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.
Genome Research 03/2007; 17(2):175-83. · 13.61 Impact Factor
Article: New developments in the InterPro database.
ABSTRACT: InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.
Nucleic Acids Research 02/2007; 35(Database issue):D224-8. · 8.03 Impact Factor
Article: InterPro, progress and status in 2005.
ABSTRACT: InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Nucleic Acids Research 02/2005; 33(Database issue):D201-5. · 8.03 Impact Factor
Article: The InterPro Database, 2003 brings increased coverage and new features.
ABSTRACT: InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Nucleic Acids Research 02/2003; 31(1):315-8. · 8.03 Impact Factor
Article: Mendel-ESTS: Database of Plant ESTs in dbEST Annotated with Gene Family Numbers and Gene Family Names
ABSTRACT: The rapid expansion of gene sequencing has led to an exponential increase in the number of genes being deposited in the sequence databases. Associated with this is the proliferation of idiosyncratic gene names. The DE field descriptions in closely related sequences, whether they be from the same or different species, particularly in the EMBL and TrEMBL databases, are often inconsistent and sometimes incorrect and misleading (Galperin and Koonin 1998). In an attempt to unify gene nomenclature and DE field descriptions, initially within the plants, two databases have been created: Mendel-GFDb (genes arranged into gene families) and Mendel-ESTS (plant EST and STS sequences related to genes in Mendel-GFDb by gene family numbers). The database web addresses are: Mendel-GFDb and Mendel-ESTS: http://www.mendel.ac.uk/; US Mirror: http://genome.cornell.edu/
Plant Molecular Biology Reporter 08/1999; 17(3):239-247. · 2.45 Impact Factor
Article: Wheat regenerated from scutellum callus as a source of material for transformation
ABSTRACT: Reports of wheat transformation efficiencies vary from less than 1% to more than 5% for individual experiments. Rarely are negative experiments reported, though we estimate that between one in two and one in three of all experiments fail to produce transformed plants. Consequently, if transformation efficiencies were calculated from the total number of scutella bombarded rather than from only those experiments which produced transformed plants, there would be a significant fall in reported efficiencies. The use of scutellum-derived material from plants regenerated from scutellum callus and grown in a controlled environment room significantly reduced the number of experiments failing to produce plants. Though there is a small but significant increase in transformation efficiencies for individual experiments, the recovery of plants, as a direct consequence of the reduction in the number of failed experiments, increases nearly 350%, from 4.8 plants/1000 scutella from seed grown plants bombarded to 17 plants/1000 embryos bombarded from tissue culture regenerated plants. There appear to be no additional gains to be made from plants which are cycled through tissue culture more than one time.
Plant Cell Tissue and Organ Culture 04/1999; 57(2):153-156. · 3.09 Impact Factor
Article: Localization and organization of tRNA genes on the mitochondrial genomes of fertile and male sterile lines of maize
ABSTRACT: Maize mitochondrial (mt) tRNA genes were localized on the mt master circles of two fertile lines (WF9-N and B37-N) and of one cytoplasmic male sterile line (B37-cmsT) of maize. The three genomes contain 16 tRNA genes with 14 different anticodons which correspond to 13 amino acids. Out of these 16 tRNA genes, 6 show a high degree of homology with the corresponding chloroplast (cp) tRNA genes and were shown to originate from cp DNA insertions and to be expressed in the mitochondria. The organization of the mt tRNA genes in both fertile lines is similar. The same genes are found, in the same environment, as judged from the restriction maps, in fertile and male sterile lines that have the same nuclear background, but the relative organization of the mt tRNA genes on the master circle is completely different.
MGG - Molecular and General Genetics 08/1990; 223(2):224-232.
Article: Sequence analysis of the tRNATyr and tRNALys genes and evidence for the transcription of a chloroplast-like tRNAMet in maize mitochondria
ABSTRACT: The nucleotide sequences of three tRNA genes and their flanking regions from the maize mitochondrial genome are reported. These genes, which are located in the same region of the genome between the 14-kb inverted repeats, are transcribed in the mitochondria and code for tRNALys (anticodon UUU), tRNAMet (CAU) and tRNATyr (GUA). The very high homology shown by the tRNAMet gene with its chloroplast counterpart indicates that it probably originates from a chloroplast DNA insertion. The analysis of the upstream regions of these genes showed that the tRNATyr and the tRNALys genes possess the consensus sequence AAGAANRR, which could act as a promoter sequence in higher plant mitochondria.
Current Genetics 08/1989; 16(3):195-201. · 2.56 Impact Factor
Article: Mitochondrial genome organization of the maize cytoplasmic male sterile type T
ABSTRACT: A complete SmaI, XhoI, BamHI restriction map of the maize mitochondrial genome from the T male sterile cytoplasm (cmsT) of maize has been established. The genome exists in the form of a complex multicircular structure as found for the maize normal (N) type (Lonsdale et al. 1984) where the entire sequence complexity with a content of 540 kb can be arranged on a single circular master chromosome. However, most of the repeats (inverted or direct) present in the maize cmsT genome are different from those found in the maize N genome. Recombinational events between these repeats generate a population of circular molecules rather different from the multipartite organization of the N genome. The mitochondrial genes are dispersed throughout the genome. The open reading frame coding for a 13 kDa polypeptide associated with cytoplasmic male sterility (Dewey et al. 1986, 1987) has also been located on the map.
MGG - Molecular and General Genetics 03/1989; 216(2):395-401.
Article: Homozygous transgenic wheat plants with increased luciferase activity do not maintain their high level of expression in the next generation
ABSTRACT: In an effort to assess transgene expression and stability in wheat (Triticum aestivum L.), we followed expression of the luciferase gene through the T1 and T2 generations. We showed that only 48% of T1 homozygous plants expressed the luciferase gene at double the level of hemizygous plants. The homozygous state of these T1 plants was confirmed by studying the segregation of the transgene in the next generation and by using fluorescence in situ hybridisation. We showed that in all homozygous T1 plants showing transgene dosage, the level of luciferase expression was reduced in the next generation. In contrast, all low-expressing T1 plants conserved their expression levels in the T2 generation. Possible reasons for this reduction in transgene activity are discussed.
Plant Science.
Article: New developments in the InterPro database
Nucleic Acids Res. 35:D224-D228.
Article: Additional introns inserted within the luciferase reporter gene stabilise transgene expression in wheat
ABSTRACT: As part of a study to understand factors affecting transgene expression, constructs containing maize introns within the luciferase reporter gene coding region, in addition to a 5′ untranslated leader intron, were transformed into wheat plants. Adding a single intron at one particular position increased transgene activity by 1.5- to 2-fold; however, the additional presence of a second intron reduced expression levels in both T1 and T2 populations. These results are in agreement with the transient assay results that we have previously reported. More importantly, we showed that both intron combinations led to a stabilisation of luciferase expression between generations in transgenic wheat plants.
Plant Science.
Institutions
2007: EMBL-EBI, Cambridge, ENG, United Kingdom
1999: John Innes Centre, Norwich, ENG, United Kingdom
1990: University of Utah, Salt Lake City, UT, USA