Katy J. Wolstencroft

Leiden University | LEI · Leiden Institute of Advanced Computer Science

PhD, MSc, BSc

About

65
Publications
13,375
Reads
3,551
Citations
Additional affiliations
October 2004 - August 2013
The University of Manchester
Position
  • PostDoc Position


Publications (65)
Article
Full-text available
The goal of developing therapies and dosage regimes for characterized subgroups of the general population can be facilitated by the use of simulation models able to incorporate information about inter-individual variability in drug disposition (pharmacokinetics), toxicity and response effect (pharmacodynamics). Such observed variability can have mu...
Article
Full-text available
A recent community survey conducted by Infrastructure for Systems Biology Europe (ISBE) informs requirements for developing an efficient infrastructure for systems biology standards, data and model management. © 2015 The Authors. Published under the terms of the CC BY 4.0 license.
Article
Full-text available
Data-driven research requires many people from different domains to collaborate efficiently. The domain scientist collects and analyzes scientific data, the data scientist develops new techniques, and the tool developer implements, optimizes and maintains existing techniques to be used throughout science and industry. Today, however, this data scie...
Article
Full-text available
Background Systems biology research typically involves the integration and analysis of heterogeneous data types in order to model and predict biological processes. Researchers therefore require tools and resources to facilitate the sharing and integration of data, and for linking of data to systems biology models. There are a large number of public...
Article
Full-text available
One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, an...
Article
Full-text available
The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics,...
Article
The increase in volume and complexity of biological data has led to increased requirements to reuse that data. Consistent and accurate metadata is essential for this task, creating new challenges in semantic data annotation and in the construction of terminologies and ontologies used for annotation. The BioSharing community are developing standards...
Article
Katy Wolstencroft is a Research Fellow in the School of Computer Science, University of Manchester and a visiting researcher in the Molecular Cell Physiology group at the Vrije Universiteit, Amsterdam. She has a PhD and MSc in Bioinformatics from the University of Manchester, and a BSc in Biochemistry from the University of Leeds. Katy’s work is p...
Conference Paper
RightField is a Java application that provides a mechanism for embedding ontology annotation support for scientific data in Microsoft Excel or Open Office spreadsheets. The result is semantic annotation by stealth, with an annotation process that is less error-prone, more efficient, and more consistent with community standards. By automatically gen...
Conference Paper
The interpretation and integration of experimental data depends on consistent metadata and uniform annotation. However, there are many barriers to the acquisition of this rich semantic metadata, not least the overhead and complexity of its collection by scientists. We present RightField, a lightweight spreadsheet-based annotation tool for lowering...
Article
The combination of highly complex biology problems and varying IT skills among life scientists poses a unique challenge in designing bioinformatics programs. The set of tools and initiatives described in this work shows new ways of making life science workflows more accessible to the community. Our aim is to help bioinformaticians help biologists....
Article
Full-text available
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-St...
Article
Full-text available
Background Ontologies are being developed for the life sciences to standardise the way we describe and interpret the wealth of data currently being generated. As more ontology based applications begin to emerge, tools are required that enable domain experts to contribute their knowledge to the growing pool of ontologies. There are many barriers tha...
Conference Paper
Full-text available
In this position paper we present a set of best practices for workflow design to prevent workflow decay and increase reuse and re-purposing of scientific workflows. MyExperiment provides access to a large number of scientific workflows. However, scientists find it difficult to reuse or re-purpose these workflows for mainly two reasons: workflows su...
Article
Systems biology research is typically performed by multidisciplinary groups of scientists, often in large consortia and in distributed locations. The data generated in these projects tend to be heterogeneous and often involve high-throughput “omics” analyses. Models are developed iteratively from data generated in the projects and from the literat...
Conference Paper
The paper presents a novel web-based platform for experimental workflow development in historical document digitisation and analysis. The platform has been developed as part of the IMPACT project, providing a range of tools and services for transforming physical documents into digital resources. It explains the main drivers in developing the techni...
Article
Full-text available
We have developed an online model constructor and validator called OneStop, which is compliant with SBGN, SBML and MIRIAM standards. Key features of OneStop are: 1) a human readable input form (in addition to SBML upload and saving); 2) live visualization (SBGN graphics) of the reaction network during the construction phase; and 3) online access fr...
Article
Full-text available
Motivation: In the Life Sciences, guidelines, checklists and ontologies describing what metadata is required for the interpretation and reuse of experimental data are emerging. Data producers, however, may have little experience in the use of such standards and require tools to support this form of data annotation. Results: RightField is an open...
Conference Paper
Full-text available
The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind. It is distinctive for its focus on sharing methods, its researcher-centric design and its facility to aggregate content into sharable 'research objects'. This evolution of myExperimen...
Article
Full-text available
The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing enviro...
Article
Full-text available
We present Populous, a tool for gathering content with which to populate an ontology. Domain experts need to add content that is often repetitive in form, without having to tackle the underlying ontological representation. Populous presents users with a table-based form in which columns are constrained to take values from particular ontolo...
Article
Full-text available
The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind. It is distinctive for its focus on sharing methods, its researcher-centric design and its facility to aggregate content into sharable 'research objects'. This evolution of myExperimen...
Conference Paper
Computational and data-intensive science increasingly depends on a large Web Service infrastructure, as services that provide a broad array of functionality can be composed into workflows to address complex research questions. In this context, the goal of service registries is to offer accurate search and discovery functions to scientists. Their ef...
Article
Full-text available
The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help t...
Article
Full-text available
We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epige...
Article
Automation in science is increasingly marked by the use of workflow technology. The sharing of workflows through repositories supports the verifiability, reproducibility and extensibility of computational experiments. However, the subsequent discovery of workflows remains a challenge, both from a sociological and technological viewpoint. Based on a...
Conference Paper
Life science workflow systems are developed to help life scientists to conveniently connect various programs and web services. In practice however, much time is spent on data conversion, because web services provided by different organisations use different data formats. We have analysed all the Taverna workflows available at the myExperiment web s...
Article
Full-text available
Web Services have gained momentum as a means for packaging existing data and computational resources in a form that is amenable for use and composition by third party applications. The life science community is certainly among the first adopters of Web Services. For example, Taverna (http://www.mygrid.org.uk), a workflow workbench that is popular...
Article
Workflow systems are steadily finding their way into the work practices of scientists. This is particularly true in the in silico science of bioinformatics, where biological data can be processed by Web Services. In this paper, we investigate the potential of evolving the users’ interaction with workflow environments so that it more closely relate...
Article
Full-text available
In bioinformatics, we are familiar with the idea of curated data as a prerequisite for data integration. We neglect, often to our cost, the curation and cataloguing of the processes that we use to integrate and analyse our data. Programmatic access to services, for data and processes, means that compositions of services can be made that represent t...
Article
Full-text available
The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication with the consequence that data is being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fr...
Conference Paper
Full-text available
A growing array of biotechnologies is being used to study the genetics of complex biomolecular traits in laboratory mice as models for human disease. Combined analysis of these datasets provides much of the power of the approach of functional genomics but this depends on the ability of databases to exchange data with each other and with analytical...
Conference Paper
Full-text available
There seems to be a general consensus on the crucial role metadata can play for enhancing the functionalities of scientific workflows systems, e.g., workflow and service discovery, composition and provenance browsing, among others. However, in most cases their management is under-specified, if not left unaddressed at all. A step in this direction,...
Conference Paper
Workflow systems are steadily finding their way into the work practices of scientists. This is particularly true in the in silico science of bioinformatics, where biological data can be processed by Web services. In this paper we investigate the potential of evolving the users' interaction with workflow environments so that it more closely relates...
Article
Full-text available
It is increasingly common to combine genome-wide expression data with quantitative trait mapping data to aid in the search for sequence polymorphisms responsible for phenotypic variation. By joining these complex but different data types at the level of the biological pathway, we can take advantage of existing biological knowledge to systematically...
Article
Much has been written of the facilities for ontology building and reasoning offered for ontologies expressed in the Web Ontology Language (OWL). Less has been written about how the modelling requirements of different areas of interest are met by OWL-DL's underlying model of the world. In this paper we use the disciplines of biology and bioinformati...
Conference Paper
In silico experiments have hitherto required ad hoc collections of scripts and programs to process and visualise biological data, consuming substantial amounts of time and effort to build, and leading to tools that are difficult to use, are architecturally fragile and scale poorly. With examples of the systems applied to real biological problems, w...
Article
Full-text available
(my)Grid supports in silico experiments in the life sciences, enabling the design and enactment of workflows as well as providing components to assist service discovery, data and metadata management. The (my)Grid ontology is one component in a larger semantic discovery framework for the identification of the highly distributed and heterogeneous bio...
Article
Full-text available
It is increasingly common to combine Microarray and Quantitative Trait Loci data to aid the search for candidate genes responsible for phenotypic variation. Workflows provide a means of systematically processing these large datasets and also represent a framework for the re-use and the explicit declaration of experimental methods. In this article,...
Article
Full-text available
Much has been written of the facilities for ontology building and reasoning offered for ontologies expressed in the Web Ontology Language (OWL). Less has been written about how the modelling requirements of different areas of interest are met by OWL-DL's underlying model of the world. In this paper we use the disciplines of biology and bioinformati...
Conference Paper
Full-text available
Knowledge artifacts that have been labeled as ontologies have many different qualities and intended outcomes. This is particularly true of bio-ontologies where high demand has led to a rapid growth in the number of these artifacts. Good communication between the human agents involved in the life cycle of ontologies is essential for the ontologist t...
Chapter
Full-text available
The core part of the Web Ontology Language (OWL) is based on Description Logic (DL) theory, which has been investigated for more than 25 years. OWL reasoning systems offer various DL-based inference services such as (i) checking class descriptions for consistency and automatically organizing them into classification hierarchies, (ii) checking descr...
Chapter
Life Science research has extended beyond in vivo and in vitro bench-bound science to incorporate in silico knowledge discovery, using resources that have been developed over time by different teams for different purposes and in different forms. The myGrid project has developed a set of software components and a workbench, Taverna, for building, ru...
Article
myGrid supports in silico experiments in the life sciences, enabling the design and enactment of workflows as well as providing components to assist service discovery, data and metadata management. The myGrid ontology is one component in a larger semantic discovery framework for the identification of the highly distributed and heterogeneous bioinfo...
Article
myGrid supports in silico experiments in the life sciences, enabling the design and enactment of workflows as well as providing components to assist service discovery, data and metadata management. The myGrid ontology is one component in a larger semantic discovery framework for the identification of the highly distributed and heterogeneous bioinfo...
Article
Full-text available
The classification of proteins expressed by an organism is an important step in understanding the molecular biology of that organism. Traditionally, this classification has been performed by human experts. Human knowledge can recognise the functional properties that are sufficient to place an individual gene product into a particular protein family...
Article
Full-text available
Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-l...
Conference Paper
Full-text available
The Digital Database for Screening Mammography (DDSM) is an invaluable resource for digital mammography research. However, there are two particular shortcomings that can pose a significant barrier to many of those who may want to use the resource: 1) the actual mammographic image data is encoded using a non-standard lossless variant of the JPEG ima...
Conference Paper
Full-text available
We show how state-of-the-art Semantic Web technology can be used in e-Science, in particular to automate the classification of proteins in biology. We show that the resulting classification was of comparable quality to one performed by a human expert, and how investigations using the classified data even resulted in the discovery of significant...
Article
Full-text available
Protein family databases provide a central focus for scientific communities as well as providing useful resources to aid research. However, such resources require constant curation and often become outdated and discontinued. We have developed an ontology-driven system for capturing and managing protein family data that addresses the problems of ma...
Conference Paper
Full-text available
The Taverna e-Science Workbench is a central component of myGrid, a loosely coupled suite of middleware services designed to support in silico experiments in biology. Taverna enables the construction and enactment of complex workflows over resources on local and remote machines, allowing the automation of otherwise labour-intensive multi-step bioin...
Article
PhosphaBase is an ontology-driven database resource containing information on the protein phosphatase family. It is the first public resource dedicated to protein phosphatases, which are enzymes that perform dephosphorylation reactions. In conjunction with the phosphorylation action of protein kinases, phosphatases are involved in important control...
Article
Full-text available
We have investigated a family with an autosomal dominant form of spondyloepiphyseal dysplasia (SED) characterised by short stature and severe premature degenerative arthropathy. Previous studies have excluded linkage between this condition and the locus for the type II collagen gene. Here we report the identification of linkage between this disorde...
Article
Computational biology manifests itself in many flavours. It comprises the data analysis and management of sequences, structures, the observed and synthetic variants of these, static or dynamic interactions, and serves the modelling of biological processes in physiological and pathophysiological conditions. The field gained an enormous moment...
Article
Full-text available
We present Populous, an open source application for gathering content for an ontology and populating that ontology en masse. Populous presents authors with a table-based form where columns are tied to take values from particular ontologies; the user can select a concept from an ontology via its meaningful label to give a value for a given entity....
Article
The Digital Database for Screening Mammography (DDSM) is an invaluable resource for digital mammography research. However, there are two particular shortcomings that can pose a significant barrier to many of those who may want to use the resource: 1) the actual mammographic image data is encoded using a non-standard lossless variant of the JPEG im...
