-
ABSTRACT: Provenance is a critical ingredient for establishing trust in published
scientific content. This is true whether we are considering a data set, a
computational workflow, a peer-reviewed publication or a simple scientific
claim with supportive evidence. Existing vocabularies such as DC Terms and the
W3C PROV-O are domain-independent and general-purpose, and they allow and
encourage extensions to cover more specific needs. We identify the specific
need for identifying or distinguishing between the various roles assumed by
agents manipulating digital artifacts, such as author, contributor and curator.
We present the Provenance, Authoring and Versioning ontology (PAV): a
lightweight ontology for capturing just enough descriptions essential for
tracking the provenance, authoring and versioning of web resources. We argue
that such descriptions are essential for digital scientific content. PAV
distinguishes between contributors, authors and curators of content and
creators of representations in addition to the provenance of originating
resources that have been accessed, transformed and consumed. We explore five
projects (and communities) that have adopted PAV, illustrating their usage
through concrete examples. Moreover, we present mappings that show how PAV
extends the PROV-O ontology to support broader interoperability.
The authors strove to keep PAV lightweight and compact by including only
those terms that have proven to be pragmatically useful in existing
applications, and by recommending terms from existing ontologies when
plausible.
We analyze and compare PAV with related approaches, namely Provenance
Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their
differences with PAV, outlining strengths and weaknesses of our proposed model.
We specify SKOS mappings that align PAV with DC Terms.
04/2013
-
Proceedings of the Joint EDBT/ICDT 2013 Workshops; 01/2013
-
Khalid Belhajjame,
Oscar Corcho,
Daniel Garijo,
Jun Zhao,
Paolo Missier,
David Newman,
Raúl Palma,
Sean Bechhofer,
Esteban García Cuesta,
José Manuel Gómez-Pérez,
Graham Klyne,
Kevin Page,
Marco Roos,
José Enrique Ruiz,
Stian Soiland-Reyes,
Lourdes Verdes-Montenegro,
David De Roure,
Carole A Goble
ABSTRACT: A workflow-centric research object bundles a workflow, the provenance of the results obtained by its enactment, other digital objects that are relevant for the experiment (papers, datasets, etc.), and annotations that semantically describe all these objects. In this paper, we propose a model to specify workflow-centric research objects, and show how the model can be grounded using semantic technologies and existing vocabularies, in particular the Object Reuse and Exchange (ORE) model and the Annotation Ontology (AO). We describe the life-cycle of a research object, which resembles the life-cycle of a scientific experiment.
Second International Conference on the Future of Scholarly Communication and Scientific Publishing (SePublica 2012), Crete; 05/2012
-
Transactions on Large-Scale Data- and Knowledge-Centered Systems. 01/2012; 5:126-157.
-
ABSTRACT: Some of the shared digital artefacts of digital research are executable in the sense that they describe an automated process which generates results. One example is the computational scientific workflow which is used to conduct automated data analysis, predictions and validations. We describe preservation challenges of scientific workflows, and suggest a framework to discuss the reproducibility of workflow results. We describe curation techniques that can be used to avoid the 'workflow decay' that occurs when steps of the workflow are vulnerable to external change. Our approach makes extensive use of provenance information and also considers aggregate structures called Research Objects as a means for promoting workflow preservation.
iPRES 2011 - 8th International Conference on Preservation of Digital Objects, Singapore; 11/2011
-
Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences; 01/2011
-
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011; 01/2011
-
IEEE 7th International Conference on E-Science, e-Science 2011, Stockholm, Sweden, December 5-8, 2011; 01/2011
-
CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 9-12, 2011, Online Proceedings; 01/2011
-
ABSTRACT: The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs,
combined with opportunities for incremental refinement, enabling a “pay as you go” approach. As such, dataspaces join a long
stream of research activities that aim to build tools that simplify integrated access to distributed data. To address dataspace
challenges, many different techniques may need to be considered: data integration from multiple sources, machine learning
approaches to resolving schema heterogeneity, integration of structured and unstructured data, management of uncertainty,
and query processing and optimization. Results that seek to realize the different visions exhibit considerable variety in
their contexts, priorities and techniques. This chapter presents a classification of the key concepts in the area, encouraging
the use of consistent terminology, and enabling a systematic comparison of proposals. This chapter also seeks to identify
common and complementary ideas in the dataspace and search computing literatures, in so doing identifying opportunities for
both areas and open issues for further research.
03/2010: pages 114-134
-
01/2010
-
EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings; 01/2010
-
Proceedings of the 2010 EDBT/ICDT Workshops, Lausanne, Switzerland, March 22-26, 2010; 01/2010
-
Current Trends in Web Engineering - 10th International Conference on Web Engineering, ICWE 2010 Workshops, Vienna, Austria, July 2010, Revised Selected Papers; 01/2010
-
EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings; 01/2010
-
Andrew R Jones,
Allyson L Lister,
Leandro Hermida,
Peter Wilkinson,
Martin Eisenacher,
Khalid Belhajjame,
Frank Gibson,
Phil Lord,
Matthew Pocock,
Heiko Rosenfelder,
Javier Santoyo-Lopez,
Anil Wipat,
Norman W Paton
ABSTRACT: The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.
Omics: a journal of integrative biology 06/2009; 13(3):239-51. · 2.29 Impact Factor
-
Conference Proceeding:
Dataspaces.
Search Computing: Challenges and Directions [outcome of the first SeCO Workshop on Search Computing Challenges and Directions, Como, Italy, June 17-19, 2009]; 01/2009
-
Dataspace: The Final Frontier, 26th British National Conference on Databases, BNCOD 26, Birmingham, UK, July 7-9, 2009. Proceedings; 01/2009
-
Advanced Information Systems Engineering, 21st International Conference, CAiSE 2009, Amsterdam, The Netherlands, June 8-12, 2009. Proceedings; 01/2009
-
ABSTRACT: The Functional Genomics Experiment Object Model (FuGE) supports modelling of experimental processes either directly or through extensions that specialize FuGE for use in specific contexts. FuGE applications commonly include components that capture, store and search experiment descriptions, where the requirements of different applications have much in common.
We describe a toolkit that supports data capture, storage and web-based search of FuGE experiment models; the toolkit can be used directly on FuGE compliant models or configured for use with FuGE extensions. The toolkit is illustrated using a FuGE extension standardized by the proteomics standards initiative, namely GelML.
The toolkit and a demonstration are available at http://code.google.com/p/fugetoolkit
Bioinformatics 10/2008; 24(22):2647-9. · 5.47 Impact Factor