Matthew Pocock

Newcastle University, Newcastle-on-Tyne, England, United Kingdom

Publications (39) · 189.56 Total impact

  • ABSTRACT: While the first version of the Synthetic Biology Open Language (SBOL) has been adopted by several academic and commercial genetic design automation (GDA) software tools, it only covers a limited number of the requirements for a standardized exchange format for synthetic biology. In particular, SBOL Version 1.1 is capable of representing DNA components and their hierarchical composition via sequence annotations. This proposal revises SBOL Version 1.1, enabling the representation of a wider range of components with and without sequences, including RNA components, protein components, small molecules, and molecular complexes. It also introduces modules to instantiate groups of components on the basis of their shared function and assert molecular interactions between components. By increasing the range of structural and functional descriptions in SBOL and allowing for their composition, the proposed improvements enable SBOL to represent and facilitate the exchange of a broader class of genetic designs.
    ACS Synthetic Biology 06/2014; · 3.95 Impact Factor
  • Source
    ABSTRACT: The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.
    Nature Biotechnology 06/2014; 32(6):545-550. · 32.44 Impact Factor
  • Source
    ABSTRACT: BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.
    Journal of integrative bioinformatics 01/2013; 10(2):224.
  • Source
    ABSTRACT: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration. We have created an integrated dataset for Saccharomyces cerevisiae using the semantic data integration tool Ondex, and have developed a view-based visualization technique that allows for concise graphical representations of the integrated data. The technique was implemented in a plug-in for Cytoscape, called OndexView. We used OndexView to investigate telomere maintenance in S. cerevisiae. The Ondex yeast dataset and the OndexView plug-in for Cytoscape are accessible at http://bsu.ncl.ac.uk/ondexview.
    Bioinformatics 03/2011; 27(9):1299-306. · 5.47 Impact Factor
  • Source
    ABSTRACT: The use of computational modeling to describe and analyze biological systems is at the heart of systems biology. Model structures, simulation descriptions and numerical results can be encoded in structured formats, but there is an increasing need to provide an additional semantic layer. Semantic information adds meaning to components of structured descriptions to help identify and interpret them unambiguously. Ontologies are one of the tools frequently used for this purpose. We describe here three ontologies created specifically to address the needs of the systems biology community. The Systems Biology Ontology (SBO) provides semantic information about the model components. The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. The Terminology for the Description of Dynamics (TEDDY) categorizes dynamical features of the simulation results and general systems behavior. The provision of semantic information extends a model's longevity and facilitates its reuse. It provides useful insight into the biology of modeled processes, and may be used to make informed decisions on subsequent simulation experiments.
    Molecular Systems Biology 01/2011; 7:543. · 11.34 Impact Factor
  • Source
    ABSTRACT: An increasing number of biomedical resources provide their information on the Semantic Web, and this creates the basis for a distributed knowledge base which has the potential to advance biomedical research [1]. This potential, however, cannot be realized until researchers from the life sciences can interact with information in the Semantic Web. In particular, there is a need for tools that provide data reduction, visualization and interactive analysis capabilities. Ondex is a data integration and visualization platform developed to support Systems Biology Research [2]. At its core is a data model based on two main principles: first, all information can be represented as a graph and, second, all elements of the graph can be annotated with ontologies. This data model conforms to the Semantic Web framework, in particular to RDF, and therefore Ondex is ideally positioned as a platform that can exploit the Semantic Web. The Ondex system offers a range of features and analysis methods of potential value to Semantic Web users, including:
      - An interactive graph visualization interface (Ondex user client), which provides data reduction and representation methods that leverage the ontological annotation.
      - A suite of importers from a variety of data sources to Ondex (http://ondex.org/formats.html).
      - A collection of plug-ins which implement graph analysis, graph transformation and graph-matching functions.
      - An integration toolkit (Ondex Integrator) which allows users to compose workflows from these modular components.
      - In addition, all importers and plug-ins are available as web services which can be integrated in other tools, such as Taverna [3].
    The developments that will be presented in this demo have made this functionality interoperable with the Semantic Web framework. In particular, we have developed an interactive importer, based on SPARQL, that allows the query-driven construction of datasets, bringing together information from different RDF data resources into Ondex. These datasets can then be further refined, analysed and annotated, both interactively using the Ondex user client and via user-defined workflows. The results of these analyses can be exported in RDF, which can be used to enrich existing knowledge bases or to provide application-specific views of the data. Both the importer and the exporter focus only on a subset of the Ondex and RDF data models, which is shared between these two data representations [4]. In this demo we will show how Ondex can be used to query, analyse and visualize Semantic Web knowledge bases. In particular, we will present real use cases focused on, but not limited to, resources relevant to plant biology. We believe that Ondex can be a valid contribution to the adoption of the Semantic Web in Systems Biology research and in biomedical investigation more generally. We welcome feedback on our current import/export prototype and suggestions for the advancement of Ondex for the Semantic Web.
    References
      1. Ruttenberg, A. et al.: Advancing translational research with the Semantic Web. BMC Bioinformatics 8 (Suppl. 3): S2 (2007).
      2. Köhler, J., Baumbach, J., Taubert, J., Specht, M., Skusa, A., Ruegg, A., Rawlings, C., Verrier, P., Philippi, S.: Graph-based analysis and visualization of experimental results with Ondex. Bioinformatics 22 (11): 1383-1390 (2006).
      3. Rawlings, C.: Semantic Data Integration for Systems Biology Research. Technology Track at ISMB’09, http://www.iscb.org/uploaded/css/36/11846.pdf (2009).
      4. Splendiani, A. et al.: Ondex semantic definition (Web document), http://ondex.svn.sourceforge.net/viewvc/ondex/trunk/doc/semantics/ (2009).
    Nature Precedings 12/2010;
  • Source
    ABSTRACT: Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.
    Journal of integrative bioinformatics 01/2010; 7(3).
  • Source
    ABSTRACT: BACKGROUND: The creation of accurate quantitative Systems Biology Markup Language (SBML) models is a time-intensive, manual process often complicated by the many data sources and formats required to annotate even a small and well-scoped model. Ideally, the retrieval and integration of biological knowledge for model annotation should be performed quickly, precisely, and with a minimum of manual effort. RESULTS: Here we present rule-based mediation, a method of semantic data integration applied to systems biology model annotation. The heterogeneous data sources are first syntactically converted into ontologies, which are then aligned to a small domain ontology by applying a rule base. We demonstrate proof-of-principle of this application of rule-based mediation using off-the-shelf semantic web technology through two use cases for SBML model annotation. Existing tools and technology provide a framework around which the system is built, reducing development time and increasing usability. CONCLUSIONS: Integrating resources in this way accommodates multiple formats with different semantics, and provides richly-modelled biological knowledge suitable for annotation of SBML models. This initial work establishes the feasibility of rule-based mediation as part of an automated SBML model annotation system. AVAILABILITY: Detailed information on the project files as well as further information on and comparisons with similar projects is available from the project page at http://cisban-silico.cs.ncl.ac.uk/RBM/.
    Journal of biomedical semantics. 01/2010; 1 Suppl 1:S3.
  • Source
    ABSTRACT: Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.
    Journal of biomedical semantics. 01/2010; 1(1):8.
  • Source
    ABSTRACT: Saint is a web application which provides a lightweight annotation integration environment for quantitative biological models. The system enables modellers to rapidly mark up models with biological information derived from a range of data sources. Availability and Implementation: Saint is freely available for use on the web at http://www.cisban.ac.uk/saint. The web application is implemented in Google Web Toolkit and Tomcat, with all major browsers supported. The Java source code is freely available for download at http://saint-annotate.sourceforge.net. The Saint web server requires an installation of libSBML and has been tested on Linux (32-bit Ubuntu 8.10 and 9.04).
    Bioinformatics 10/2009; 25(22):3026-7. · 5.47 Impact Factor
  • Source
    ABSTRACT: The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.
    Omics: a journal of integrative biology 06/2009; 13(3):239-51. · 2.29 Impact Factor
  • AL Lister, M Pocock, A Wipat
    03/2009
  • Source
    ABSTRACT: Many biological systems can be modeled as networks. Hence, network analysis is of increasing importance to systems biology. We describe an evolutionary algorithm for selecting clusters of nodes within a large network based upon network topology together with a measure of the relevance of nodes to a set of independently identified genes of interest. We apply the algorithm to a previously published integrated functional network of yeast genes, using a set of query genes derived from a whole genome screen of yeast strains with a mutation in a telomere uncapping gene. We find that the algorithm identifies biologically plausible clusters of genes which are related to the cell cycle, and which contain interactions not previously identified as potentially important. We conclude that the algorithm is valuable for the querying of complex networks, and the generation of biological hypotheses.
    Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2009, Nashville, TN, USA, March 30 - April 2, 2009; 01/2009
  • Source
    ABSTRACT: BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language. AVAILABILITY: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.5 or higher. All queries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists.
    Bioinformatics 10/2008; 24(18):2096-7. · 5.47 Impact Factor
  • Source
    ABSTRACT: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. 
Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
    BMC Bioinformatics 02/2008; 9:334. · 3.02 Impact Factor
  • Bioinformatics. 01/2008; 24:2096-2097.
  • Source
    ABSTRACT: The analysis of microbial genome sequences can identify protein families that provide potential drug targets for new antibiotics. With the rapid accumulation of newly sequenced genomes, this analysis has become a computationally intensive and data-intensive problem. This paper describes the development of a Web-service-enabled, component-based, architecture to support the large-scale comparative analysis of complete microbial genome sequences and the subsequent identification of orthologues and protein families (Microbase). The system is coordinated through the use of Web-service-based notifications and integrates distributed computing resources together with genomic databases to realize all-against-all comparisons for a large volume of genome sequences and to present the data in a computationally amenable format through a Web service interface. We demonstrate the use of the system in searching for orthologues and candidate protein families, which ultimately could lead to the identification of potential therapeutic targets.
    IEEE Transactions on Information Technology in Biomedicine 08/2007; 11(4):435-42. · 1.98 Impact Factor
  • Source
    AL Lister, M Pocock, A Wipat
    ABSTRACT: The creation of quantitative, simulatable, Systems Biology Markup Language (SBML) models that accurately simulate the system under study is a time-intensive manual process that requires careful checking. Currently, the rules and constraints of model creation, curation, and annotation are distributed over at least three separate documents: the SBML schema document (XSD), the Systems Biology Ontology (SBO), and the "Structures and Facilities for Model Definition" document. The latter document contains the richest set of constraints on models, and yet it is not amenable to computational processing. We have developed a Web Ontology Language (OWL) knowledge base that integrates these three structure documents, and that contains a representative sample of the information contained within them. This Model Format OWL (MFO) performs both structural and constraint integration and can be reasoned over and validated. SBML Models are represented as individuals of OWL classes, resulting in a single computationally a
    Journal of Integrative Bioinformatics. 01/2007; 4:80.
  • Source
    ABSTRACT: Life sciences research is based on individuals, often with diverse skills, assembled into research groups. These groups use their specialist expertise to address scientific problems. The in silico experiments undertaken by these research groups can be represented as workflows involving the co-ordinated use of analysis programs and information repositories that may be globally distributed. With regard to Grid computing, the requirements relate to the sharing of analysis and information resources rather than sharing computational power. The myGrid project has developed the Taverna Workbench for the composition and execution of workflows for the life sciences community. This experience paper describes lessons learnt during the development of Taverna. A common theme is the importance of understanding how workflows fit into the scientists' experimental context. The lessons reflect an evolving understanding of life scientists' requirements for a workflow environment, which is relevant to other areas of data intensive and exploratory science. Copyright © 2005 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 01/2006; 18:1067-1100. · 0.85 Impact Factor
  • Source
    ABSTRACT: Genome comparison and analysis can reveal the structures and functions of genome sequences of different species. As more genomes are sequenced, genomic data sources are rapidly increasing such that their analysis is beyond the processing capabilities of most research institutes. The grid is a powerful solution to support large-scale genomic data processing and genome analysis. This paper presents the Microbase project that is developing a grid-based system for genome comparison and analysis, and discusses the first implementation of the system (called MicrobaseLite). MicrobaseLite uses a scalable computing environment to support computationally intensive microbial genome comparison and analysis, employing state-of-the-art technologies of Web services, notification, comparative genomics and parallel computing. Microbase will support not only system-defined genome comparison and analysis but also user-defined, remotely conceived genome analysis.
    Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on; 06/2005

Publication Stats

3k Citations
189.56 Total Impact Points

Institutions

  • 2004–2014
    • Newcastle University
      • School of Computing Science
      • Institute for Cell and Molecular Biosciences
      Newcastle-on-Tyne, England, United Kingdom
  • 2006
    • University of Newcastle
      • Department of Computer Science
      Newcastle, New South Wales, Australia
  • 1998–2000
    • Wellcome Trust Sanger Institute
      • Cancer Genome Project
      Cambridge, England, United Kingdom