Article

Abstract

Purpose – The purpose of this paper is to present a four-level architecture that aims at integrating, publishing and retrieving ecological data using linked data (LD). It allows scientists to explore taxonomic, spatial and temporal ecological information, access trophic-chain relations between species and complement this information with other data sets published on the Web of data. The development of ecological information repositories is a crucial step in organizing and cataloging natural reserves. However, such repositories face challenges in providing a shared, global view of biodiversity data, such as data heterogeneity, lack of metadata standardization and limited data interoperability. LD has emerged as a promising technology for addressing some of these challenges.

Design/methodology/approach – Ecological data, produced and collected from different media resources, are stored in distinct relational databases and published as RDF triples, using a relational-to-Resource Description Framework mapping language. An application ontology reflects a global view of these datasets and shares the same vocabulary with them. Scientists specify their data views by selecting their objects of interest in a user-friendly way. A data view is internally represented as an algebraic scientific workflow that applies data transformation operations to integrate data sources.

Findings – Despite years of investment, data integration still challenges scientists seeking consolidated views of large numbers of heterogeneous scientific data sources. The semantic integration approach presented in this paper simplifies this process, both in terms of mappings and of query answering through data views.

Social implications – This work provides knowledge about the Guanabara Bay ecosystem and serves as a source of answers regarding the anthropic and climatic impacts on that ecosystem. Additionally, it will enable evaluating the adequacy, with respect to marine ecology, of the actions being taken to clean up Guanabara Bay.

Originality/value – Mapping complexity is traded for the process of generating the exported ontology. The approach reduces the problem of integration to that of mappings between homogeneous ontologies. As a byproduct, data views are easily rewritten into queries over data sources. The architecture is general and, although applied here to the ecological context, it can be extended to other domains.
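The relational-to-RDF publishing step described above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical `species` table; the URI scheme, predicate names and table layout are invented for the example and are not the paper's actual mapping language.

```python
# Minimal sketch: map each row of a relational table to RDF triples.
# The "species" table, base URI and vocabulary URI are illustrative assumptions.
import sqlite3

def rows_to_triples(conn, table, subject_col, base_uri, vocab_uri):
    """Emit one (subject, predicate, object) triple per non-key column value."""
    cur = conn.execute(f"SELECT * FROM {table}")      # fine for a sketch; real
    cols = [d[0] for d in cur.description]            # mappings use R2RML-style rules
    triples = []
    for row in cur:
        subj = f"{base_uri}{table}/{row[cols.index(subject_col)]}"
        for col, val in zip(cols, row):
            if col != subject_col and val is not None:
                triples.append((subj, f"{vocab_uri}{col}", val))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER, name TEXT, trophic_level TEXT)")
conn.execute("INSERT INTO species VALUES (1, 'Mugil liza', 'detritivore')")
triples = rows_to_triples(conn, "species", "id",
                          "http://example.org/", "http://example.org/vocab/")
```

In a real deployment this role is played by a dedicated relational-to-RDF mapping language rather than ad hoc code; the sketch only shows the shape of the transformation.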


... Web data integration and extraction using semantic technologies extend to many knowledge areas, as illustrated in [12], where ecological data from the Guanabara Bay ecosystem are stored in distinct relational databases, giving rise to heterogeneity, lack of metadata standardization, and reduced interoperability. To tackle these problems, a four-level architecture is proposed to integrate, publish and retrieve ecological data from repositories using linked data; data are published as RDF triples using a relational-to-Resource Description Framework mapping language, with an application ontology providing a common vocabulary and a global view of the generated datasets. ...
Article
Full-text available
The paper describes the design and implementation of a semantic web service that retrieves theses and extends the keyword-based search of a DSpace repository, taking into account the roles of advisors and steering-committee members formally represented in a custom-made ontology. The service uses SPARQL queries and the RDF serialization module of DSpace, which links the item submission process to the ontology; thus, the more theses are added to a repository, the more instances are inserted into the ontology. The paper provides empirical insights about how to reuse thesis metadata and includes the results of an exploratory, self-administered usability survey, based on heuristic evaluation, of a website that provides access to the proposed service. The heuristics were rated by a purposive sample of students, teachers, and managers; the results indicated a high level of satisfaction and showed that the service increased the accessibility of theses in the web environment. The service also generates semantically enriched datasets that coexist with the repository; these are of utility and value to educational organizations, as they give institutional visibility.
... The semantic richness of ontologies has prompted broad study of their applications. Ontologies are mainly used in knowledge integration [2], knowledge reuse [3], and knowledge disambiguation [4]. In the legal domain, ontologies enjoy quite some reputation as a way to model knowledge about laws and jurisprudence. ...
Conference Paper
Full-text available
With today's innovative technology, many courts across the country are moving to paperless systems, which has resulted in a tremendous increase in the number of e-judgments. The sheer volume and heterogeneous nature of e-judgments demand text-mining techniques to extract legal relations from them. Relation extraction plays a major role in the legal domain, answering queries and identifying similar cases. Though many methods are available for extracting relations from natural-language text, ontology-based relation extraction is ideal for domain-specific tasks. In this paper, we present a legal case ontology that incorporates the concepts and relations of the legal case domain, including relevant terms from a set of real-life judicial decisions. The proposed ontology aims to support the extraction of domain-specific taxonomic and non-taxonomic relations from e-judgments. These relations would later be useful for text summarization, question answering, and legal case-based reasoning.
... The semantic richness of ontologies prompts broad study of their applications: for instance, ontology for knowledge integration [1], ontology for knowledge reuse [2], and ontology as a support for resolving knowledge disambiguation [3]. ...
Article
The lack of synthesized information regarding biodiversity is a major problem among researchers, leading to a pervasive cycle in which ecologists run field campaigns to collect information that already exists but has not been made available to a broader audience. This problem has long-lasting effects on public policy, such as money being spent multiple times to conduct similar studies in the same area. We aim to identify this knowledge gap by synthesizing the information available regarding two Brazilian long-term biodiversity programs and the metadata generated by them. Using a unique dataset containing 1,904 metadata records, we identified patterns of metadata distribution and the intensity of research conducted in Brazil, as well as where research efforts should be concentrated in the coming decades. We found that the majority of metadata concerned vertebrates, followed by plants, invertebrates, and fungi. Caatinga was the biome with the least metadata, and there is still a lack of information regarding all biomes in Brazil, none of which is sufficiently sampled. We hope that these results will have implications for broader conservation and management guidance, as well as for funding-allocation programs.
Article
Full-text available
Ontology-based data access (OBDA) is a novel paradigm for accessing large data repositories through an ontology, that is, a formal description of a domain of interest. Supporting the management of OBDA applications poses new challenges, as it requires effective tools for (i) allowing both expert and non-expert users to analyze the OBDA specification, (ii) collaboratively documenting the ontology, (iii) exploiting OBDA services, such as query answering and automated reasoning over ontologies, e.g., to support data quality checks, and (iv) tuning the OBDA application toward optimized performance. To fulfill these challenges, we have built a novel system, called MASTRO STUDIO, based on a tool for automated reasoning over ontologies, enhanced with a suite of tools and optimization facilities for managing OBDA applications. To show the effectiveness of MASTRO STUDIO, we demonstrate its usage in an OBDA application developed in collaboration with the Italian Ministry of Economy and Finance.
Article
Full-text available
While the Web of Linked Data grows rapidly, the development of Linked Data applications is still cumbersome, hampered by the lack of software libraries for accessing, integrating and cleansing Linked Data from the Web. In order to make it easier to develop Linked Data applications, we provide LDIF, the Linked Data Integration Framework. LDIF can be used as a component within Linked Data applications to gather Linked Data from the Web and to translate the gathered data into a clean local target representation while keeping track of data provenance. LDIF provides a Linked Data crawler as well as components for accessing SPARQL endpoints and remote RDF dumps. It provides an expressive mapping language for translating data from the various vocabularies that are used on the Web to a consistent, local target vocabulary. LDIF includes an identity resolution component which discovers URI aliases in the input data and replaces them with a single target URI based on flexible, user-provided matching heuristics. For provenance tracking, the LDIF framework employs the Named Graphs data model. LDIF contains a data quality assessment module and a data fusion module, which allow Web data to be filtered according to different data quality assessment policies and provide for fusing Web data using different conflict resolution methods. In order to deal with use cases of different sizes, we provide an in-memory implementation of the LDIF framework as well as an RDF-store-backed implementation and a Hadoop implementation that can be deployed on Amazon EC2.
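The identity-resolution step can be illustrated with a toy sketch: URI aliases discovered by a user-provided matching heuristic are replaced with a single target URI across all triples. The heuristic here (case-insensitive label equality), the deterministic target choice, and all URIs are illustrative assumptions, not LDIF's actual API.

```python
# Toy identity resolution: merge subjects that share the same normalized label.
# Heuristic and URIs are invented for the example.

def resolve_aliases(triples, label_pred):
    # group subject URIs by their normalized label value
    by_label = {}
    for s, p, o in triples:
        if p == label_pred:
            by_label.setdefault(str(o).lower(), []).append(s)
    alias_map = {}
    for uris in by_label.values():
        target = sorted(uris)[0]          # pick one target URI deterministically
        for u in uris:
            alias_map[u] = target
    # rewrite every subject and object through the alias map
    return [(alias_map.get(s, s), p, alias_map.get(o, o)) for s, p, o in triples]

data = [
    ("ex:A", "rdfs:label", "Guanabara Bay"),
    ("ex:B", "rdfs:label", "guanabara bay"),
    ("ex:B", "ex:area", "412"),
]
merged = resolve_aliases(data, "rdfs:label")
```

After resolution, every triple about `ex:B` is restated about the single target URI `ex:A`, which is what allows downstream fusion and quality assessment to see one entity.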
Article
Full-text available
We aim to inform the development of decision support tools for resource managers who need to examine large complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of linked open data. In previous work, we designed and implemented a semantically enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. In this work, we discuss SemantEco’s new architecture that supports modular extensions and makes it easier to support additional domains. Our enhanced framework includes foundational ontologies to support modeling of wildlife observation and wildlife health impacts, thereby enabling deeper and broader support for more holistically examining the effects of environmental pollution on ecosystems. We conclude with a discussion of how, through the application of semantic technologies, modular designs will make it easier for resource managers to bring in new sources of data to support more complex use cases.
Conference Paper
Full-text available
Ontology-based data access (OBDA) is a novel paradigm for accessing large data repositories through an ontology, that is, a formal description of a domain of interest. Supporting the management of OBDA applications poses new challenges, as it requires effective tools for (i) allowing both expert and non-expert users to analyze the OBDA specification, (ii) collaboratively documenting the ontology, (iii) exploiting OBDA services, such as query answering and automated reasoning over ontologies, e.g., to support data quality checks, and (iv) tuning the OBDA application toward optimized performance. To fulfill these challenges, we have built a novel system, called MASTRO STUDIO, based on a tool for automated reasoning over ontologies, enhanced with a suite of tools and optimization facilities for managing OBDA applications. To show the effectiveness of MASTRO STUDIO, we demonstrate its usage in an OBDA application developed in collaboration with the Italian Ministry of Economy and Finance.
Conference Paper
Full-text available
Linked data continues to grow at a rapid rate, but much of the data being published lacks a semantic description. There are tools, such as D2R, that allow a user to quickly convert a database into RDF, but these tools do not provide a way to easily map the data into an existing ontology. This paper presents a semi-automatic approach to mapping structured sources to ontologies in order to build semantic descriptions (source models). Since the precise mapping is sometimes ambiguous, we also provide a graphical user interface that allows a user to interactively refine the models. The resulting source models can then be used to convert data into RDF with respect to a given ontology, or to define a SPARQL endpoint that can be queried with respect to an ontology. We evaluated the overall approach on a variety of sources and show that it can be used to quickly build source models with minimal user interaction.
Article
Full-text available
We propose a new family of description logics (DLs), called DL-Lite, specifically tailored to capture basic ontology languages while keeping the complexity of reasoning low. Reasoning here means not only computing subsumption between concepts and checking satisfiability of the whole knowledge base, but also answering complex queries (in particular, unions of conjunctive queries) over the instance level (ABox) of the DL knowledge base. We show that, for the DLs of the DL-Lite family, the usual DL reasoning tasks are polynomial in the size of the TBox, and query answering is LogSpace in the size of the ABox (i.e., in data complexity). To the best of our knowledge, this is the first result of polynomial-time data complexity for query answering over DL knowledge bases. Notably, our logics allow for a separation between TBox and ABox reasoning during query evaluation: the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine, thus taking advantage of the query optimization strategies provided by current database management systems. Since even slight extensions to the logics of the DL-Lite family make query answering at least NLogSpace in data complexity, thus ruling out the possibility of using off-the-shelf relational technology for query processing, we can conclude that the logics of the DL-Lite family are the maximal DLs supporting efficient query answering over large amounts of instances.
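The TBox/ABox separation described above can be sketched in miniature: the TBox-dependent part rewrites a concept query into a union over its subconcepts, and the resulting SQL is evaluated directly by the database. The tiny TBox, the table schema, and the restriction to direct subclass axioms (no transitive closure) are simplifying assumptions for illustration, not the DL-Lite rewriting algorithm in full.

```python
# Sketch: answer "all instances of C" by rewriting over the TBox only,
# then delegating ABox access to SQL. Concept and table names are invented.
import sqlite3

tbox = {"Fish": "Animal", "Bird": "Animal"}   # subclass axioms: Fish ⊑ Animal, ...

def subconcepts(concept):
    # direct subclasses only; a full rewriter would take the transitive closure
    return {concept} | {c for c, sup in tbox.items() if sup == concept}

def rewrite_to_sql(concept):
    parts = [f"SELECT name FROM assertions WHERE concept = '{c}'"
             for c in sorted(subconcepts(concept))]
    return " UNION ".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assertions (name TEXT, concept TEXT)")
conn.executemany("INSERT INTO assertions VALUES (?, ?)",
                 [("tuna", "Fish"), ("gull", "Bird"), ("coral", "Animal")])
answers = {r[0] for r in conn.execute(rewrite_to_sql("Animal"))}
```

The key point the sketch mirrors is that no ontology reasoning happens at data-access time: once the query is rewritten, an unmodified SQL engine does all the work.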
Article
Full-text available
Biodiversity data derive from myriad sources stored in various formats on many distinct hardware and software platforms. An essential step towards understanding global patterns of biodiversity is to provide a standardized view of these heterogeneous data sources to improve interoperability. Fundamental to this advance are definitions of common terms. This paper describes the evolution and development of Darwin Core, a data standard for publishing and integrating biodiversity information. We focus on the categories of terms that define the standard, differences between simple and relational Darwin Core, how the standard has been implemented, and the community processes that are essential for maintenance and growth of the standard. We present case-study extensions of the Darwin Core into new research communities, including metagenomics and genetic resources. We close by showing how Darwin Core records are integrated to create new knowledge products documenting species distributions and changes due to environmental perturbations.
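A "simple Darwin Core" record is a flat set of term-value pairs. The sketch below uses standard Darwin Core term names (`occurrenceID`, `basisOfRecord`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) with invented values; the required-terms check is an illustrative publishing convention, not part of the standard itself.

```python
# A minimal simple-Darwin-Core occurrence record; values are illustrative.
record = {
    "occurrenceID": "urn:example:occ:1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Mugil liza",
    "eventDate": "2019-07-14",
    "decimalLatitude": -22.77,
    "decimalLongitude": -43.10,
}

# Hypothetical minimal-completeness rule a publisher might enforce.
REQUIRED = {"occurrenceID", "basisOfRecord", "scientificName"}

def is_valid(rec):
    """Check that the record carries the chosen minimal set of terms."""
    return REQUIRED <= rec.keys()
```

Because every term name is drawn from a shared vocabulary, records from different providers can be concatenated and queried uniformly, which is the interoperability point the paper makes.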
Conference Paper
Full-text available
Motivated by the ongoing success of Linked Data and the growing amount of semantic data sources available on the Web, new challenges to query processing are emerging. Especially in distributed settings that require joining data provided by multiple sources, sophisticated optimization techniques are necessary for efficient query processing. We propose novel join processing and grouping techniques to minimize the number of remote requests, and develop an effective solution for source selection in the absence of preprocessed metadata. We present FedX, a practical framework that enables efficient SPARQL query processing on heterogeneous, virtually integrated Linked Data sources. In experiments, we demonstrate the practicability and efficiency of our framework on a set of real-world queries and data sources from the Linked Open Data cloud. With FedX we achieve a significant improvement in query performance over state-of-the-art federated query engines.
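The request-minimizing join grouping can be illustrated with a toy bound join: intermediate bindings are shipped to a remote source in batches rather than one request per binding. `remote_eval` stands in for a SPARQL endpoint call, and all names and data are invented for the sketch; FedX's actual techniques are considerably more elaborate.

```python
# Toy bound join: batch bindings to cut the number of remote round-trips.

def batched(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

request_count = 0
endpoint_data = {"s1": "x", "s2": "y", "s3": "z"}   # stand-in for a remote source

def remote_eval(bindings):
    """Simulate one network round-trip answering a batch of bindings."""
    global request_count
    request_count += 1
    return {b: endpoint_data[b] for b in bindings if b in endpoint_data}

bindings = ["s1", "s2", "s3", "s4"]
results = {}
for group in batched(bindings, 2):      # 2 requests instead of 4
    results.update(remote_eval(group))
```

With a batch size of 2, four bindings cost two remote requests instead of four; in a real engine the batch is encoded as a single SPARQL query using UNION or VALUES.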
Conference Paper
Full-text available
In this contribution a system is presented which provides access to distributed data sources using Semantic Web technology. While it was primarily designed for data sharing and scientific collaboration, it is regarded as a base technology useful for many other Semantic Web applications. The proposed system allows data to be retrieved using SPARQL queries, data sources can register and leave freely, and any RDF Schema or OWL vocabulary can be used to describe their data, as long as it is accessible on the Web. Data heterogeneity is addressed by RDF wrappers, such as D2R-Server, placed on top of local information systems. A query does not refer directly to actual endpoints; instead, it contains graph patterns adhering to a virtual data set. A mediator finally pulls and joins RDF data from the different endpoints, providing a transparent on-the-fly view to the end user. The SPARQL protocol has been defined to enable systematic data access to remote endpoints. However, remote SPARQL queries require the explicit notion of endpoint URIs. The presented system allows users to execute queries without needing to specify target endpoints. Additionally, it is possible to execute join and union operations across different remote endpoints. The optimization of such distributed operations is a key factor in the performance of the overall system; therefore, proven concepts from database research can be applied.
Conference Paper
Full-text available
Integrated access to multiple distributed and autonomous RDF data sources is a key challenge for many semantic web applications. As a reaction to this challenge, SPARQL, the W3C Recommendation for an RDF query language, supports querying of multiple RDF graphs. However, the current standard does not provide transparent query federation, which makes query formulation hard and lengthy. Furthermore, current implementations of SPARQL load all RDF graphs mentioned in a query to the local machine. This usually incurs a large overhead in network traffic, and sometimes is simply impossible for technical or legal reasons. To overcome these problems we present DARQ, an engine for federated SPARQL queries. DARQ provides transparent query access to multiple SPARQL services, i.e., it gives the user the impression of querying a single RDF graph despite the real data being distributed on the web. A service description language enables the query engine to decompose a query into sub-queries, each of which can be answered by an individual service. DARQ also uses query rewriting and cost-based query optimization to speed up query execution. Experiments show that these optimizations significantly improve query performance even when only a very limited amount of statistical information is available. DARQ is available under GPL License at http://darq.sf.net/ .
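The decomposition by service description can be sketched as follows: each service advertises the predicates it can answer, and each triple pattern of a query is routed to every matching service. The endpoint URLs and predicate sets are invented for the example, and DARQ's actual service-description language is richer than a bare predicate list.

```python
# Toy source selection: route triple patterns to services by declared predicate.
# Endpoint URLs and capabilities are illustrative assumptions.
services = {
    "http://endpointA/sparql": {"foaf:name", "foaf:mbox"},
    "http://endpointB/sparql": {"dc:title"},
}

def decompose(patterns):
    """Assign each triple pattern to every service that declares its predicate."""
    plan = {}
    for s, p, o in patterns:
        for url, preds in services.items():
            if p in preds:
                plan.setdefault(url, []).append((s, p, o))
    return plan

plan = decompose([("?x", "foaf:name", "?n"), ("?x", "dc:title", "?t")])
```

Each entry of the resulting plan becomes one sub-query against one endpoint; the mediator then joins the sub-query results on the shared variable `?x`.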
Conference Paper
Full-text available
The Linked Data paradigm has evolved into a powerful enabler for the transition from the document-oriented Web into the Semantic Web. While the amount of data published as Linked Data grows steadily and has surpassed 25 billion triples, less than 5% of these triples are links between knowledge bases. Link discovery frameworks provide the functionality necessary to discover missing links between knowledge bases. Yet, this task requires a significant amount of time, especially when it is carried out on large data sets. This paper presents and evaluates LIMES, a novel time-efficient approach for link discovery in metric spaces. Our approach utilizes the mathematical characteristics of metric spaces during the mapping process to filter out a large number of those instance pairs that do not satisfy the mapping conditions. We present the mathematical foundation and the core algorithms employed in LIMES. We evaluate our algorithms with synthetic data to elucidate their behavior on small and large data sets with different configurations, and compare the runtime of LIMES with that of another state-of-the-art link discovery tool.
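The core filtering idea can be shown in a few lines: in a metric space, the triangle inequality yields the lower bound d(s, t) ≥ |d(s, e) − d(t, e)| for any exemplar e, so candidate pairs whose bound already exceeds the threshold can be discarded without computing d(s, t). The 1-D Euclidean metric, the exemplar choice, and the data below are illustrative assumptions, not LIMES's full algorithm.

```python
# Toy metric-space filter: skip pairs whose triangle-inequality lower bound
# already exceeds the linking threshold theta.
comparisons = 0

def dist(a, b):                 # a genuine metric (1-D Euclidean), instrumented
    global comparisons
    comparisons += 1
    return abs(a - b)

def link(sources, targets, exemplar, theta):
    d_t = {t: dist(t, exemplar) for t in targets}   # precompute vs. exemplar
    d_s = {s: dist(s, exemplar) for s in sources}
    links = []
    for s in sources:
        for t in targets:
            if abs(d_s[s] - d_t[t]) > theta:        # lower bound > theta: skip
                continue
            if dist(s, t) <= theta:                 # full comparison only here
                links.append((s, t))
    return links

pairs = link([0.0, 10.0], [0.2, 50.0], exemplar=0.0, theta=1.0)
```

Of the four candidate pairs, only one survives the bound and needs a full distance computation; the four remaining calls are the exemplar precomputation, which is what makes the approach pay off on large data sets.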
Article
Full-text available
In this paper, we present a three-level mediator-based framework for linked data integration. In our approach, the mediated schema is represented by a domain ontology, which provides a conceptual representation of the application. Each relevant data source is described by a source ontology, published on the Web according to the Linked Data principles, thereby becoming part of the Web of linked data. Each source ontology is rewritten as an application ontology, whose vocabulary is restricted to a subset of the vocabulary of the domain ontology. The three-level architecture permits dividing the mapping definition into two stages: local mappings and mediated mappings. Due to this architecture, the problem of query answering can also be broken into two steps. First, the query is decomposed, using the mediated mappings, into a set of elementary sub-queries expressed in terms of the application ontologies. Then, these sub-queries are rewritten, using the local mappings, in terms of their local source schemas. The main contribution of the paper is an algorithm for reformulating a user query into sub-queries over the data sources. The reformulation algorithm exploits inter-ontology links to return more complete query results. Our approach is illustrated by an example of a virtual store mediating access to online booksellers.
Conference Paper
Full-text available
Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real-world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Conference Paper
Full-text available
We review the use of ontologies for the integration of heterogeneous information sources. Based on an in-depth evaluation of existing approaches to this problem, we discuss how ontologies are used to support the integration task. We evaluate and compare the languages used to represent the ontologies and the use of mappings between ontologies as well as to connect ontologies with information sources. We also enquire into ontology engineering methods and tools used to develop ontologies for...
Article
Full-text available
This paper describes QEF, a query evaluation framework designed to support complex applications on the grid. QEF has been extended to support querying within a number of different applications, including scientific visualization and a web-service semantic search engine. Application requests take the form of a workflow in which tasks are represented as algebraic operators and specific data types are enveloped in a common tuple structure. The implemented system is automatically deployed onto scheduled grid nodes and autonomously manages query evaluation according to grid environment conditions. The generality of our approach has been tested with a number of applications, leading to a full grid web-service implementation available at http://codims.epfl.ch
Article
Full-text available
In the above titled paper (ibid., vol. 33, no. 8, pp. 526-543, Aug 07), there were several mistakes. The corrections are presented here.
Article
Full-text available
Recent work in Artificial Intelligence (AI) is exploring the use of formal ontologies as a way of specifying content-specific agreements for the sharing and reuse of knowledge among software entities. We take an engineering perspective on the development of such ontologies. Formal ontologies are viewed as designed artifacts, formulated for specific purposes and evaluated against objective design criteria. We describe the role of ontologies in supporting knowledge sharing activities, and then present a set of criteria to guide the development of ontologies for these purposes. We show how these criteria are applied in case studies from the design of ontologies for engineering mathematics and bibliographic data. Selected design decisions are discussed, and alternative representation choices are evaluated against the design criteria.
Article
This paper presents a framework to manage, treat and integrate ecological data in the context of the PELD project, currently in development in Brazil. These data, which are produced and collected from different resources, are stored in distinct relational databases and transformed later into RDF triples, using a traditional relational-RDF mapping. Taxonomical, spatial and trophic relations are explored by means of ontological properties, which make it possible to discover interesting information about existing marine species of different bays in the country, illustrated by SPARQL queries. Additionally, the endpoint thus generated allows data to be accessed on the Web of data, as linked data.
Conference Paper
The Linked Data cloud contains large amounts of RDF data generated from databases.
Article
Large‐scale scientific experiments based on computer simulations are typically modeled as scientific workflows, which eases the chaining of different programs. These scientific workflows are defined, executed, and monitored by scientific workflow management systems (SWfMS). As these experiments manage large amounts of data, it becomes critical to execute them in high‐performance computing environments, such as clusters, grids, and clouds. However, few SWfMS provide parallel support. The ones that do so are usually labor‐intensive for workflow developers and have limited primitives to optimize workflow execution. To address these issues, we developed a workflow algebra to specify and enable the optimization of parallel execution of scientific workflows. In this paper, we show how the workflow algebra is efficiently implemented in Chiron, an algebra-based parallel scientific workflow engine. Chiron has a unique native distributed provenance mechanism that enables runtime queries in a relational database. We developed two studies to evaluate the performance of our algebraic approach implemented in Chiron; the first study compares Chiron with different approaches, whereas the second one evaluates the scalability of Chiron. By analyzing the results, we conclude that Chiron is efficient in executing scientific workflows, with the benefits of declarative specification and runtime provenance support. Copyright © 2013 John Wiley & Sons, Ltd.
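The algebraic view of workflows can be sketched minimally: activities are wrapped as operators over sets of tuples, which is what lets an engine reason about, reorder, and parallelize their execution. The `Map`/`Filter` operator names and the sequential engine loop are illustrative simplifications, not Chiron's actual algebra.

```python
# Toy workflow algebra: operators are functions from tuple lists to tuple lists,
# so a workflow is just a composable sequence of operators.

def Map(fn):
    """Operator applying `fn` to every tuple (one output per input)."""
    return lambda tuples: [fn(t) for t in tuples]

def Filter(pred):
    """Operator keeping only tuples satisfying `pred`."""
    return lambda tuples: [t for t in tuples if pred(t)]

def run(workflow, tuples):
    for op in workflow:          # the engine: apply operators in sequence
        tuples = op(tuples)
    return tuples

workflow = [
    Filter(lambda t: t["depth"] < 50),          # keep shallow samples
    Map(lambda t: {**t, "zone": "coastal"}),    # annotate each surviving tuple
]
out = run(workflow, [{"depth": 10}, {"depth": 120}])
```

Because each activity is a uniform tuple-to-tuple operator, an engine can partition the input tuples across nodes or push a `Filter` before an expensive `Map` without changing the workflow's meaning.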
Article
In this paper, we discuss the use of ontologies for data integration. We consider two different settings depending on the system architecture: central and peer-to-peer data integration. Within those settings, we discuss five different case studies that illustrate the use of ontologies in metadata representation, in global conceptualization, in high-level querying, in declarative mediation, and in mapping support. Each case study is described in detail and accompanied by examples.
Chapter
We propose a layered framework for the integration of syntactically, schematically, and semantically heterogeneous networked data sources. Their heterogeneity stems from different models (e.g., relational, XML, or RDF), different schemas within the same model, and different terms associated with the same meaning. We use a semantic based approach that uses a global ontology to mediate among the schemas of the data sources. In our framework, a query is expressed in terms of one of the data sources or of the global ontology and is then translated into subqueries on the other data sources using mappings based on a common vocabulary. Metadata representation, global conceptualization, declarative mediation, mapping support, and query processing are addressed in detail in our discussion of a case study.
Conference Paper
Research on ontology is becoming increasingly widespread in the computer science community, and its importance is being recognized in a multiplicity of research fields and application areas, including knowledge engineering, database design and integration, information retrieval and extraction. We shall use the generic term "information systems", in its broadest sense, to collectively refer to these application perspectives. We argue in this paper that so-called ontologies present their own methodological and architectural peculiarities: on the methodological side, their main peculiarity is the adoption of a highly interdisciplinary approach, while on the architectural side the most interesting aspect is the centrality of the role they can play in an information system, leading to the perspective of ontology-driven information systems.
Conference Paper
For the integration of data residing in autonomous data sources, Software AG uses ontologies. Data source ontologies describe the data sources themselves. Business ontologies provide an integrated view of the data. F-Logic rules are used to describe mappings between data objects in data source or business ontologies. Furthermore, F-Logic is used as the query language. F-Logic rules are perfectly suited to describing the mappings between objects and their properties. Some of these mapping rules can be generated automatically from the data sources' metadata. Some patterns frequently recur in user-defined mapping rules, for instance rules which establish inverse object relations or rules which create new object relations based on the objects' property values. Within our first project, access to information is still typical data retrieval rather than knowledge inference; therefore, a lot of effort in this project has concentrated on query functionality and, even more, on performance. But these are only first steps. To strengthen this development and gain more experience in this field, Software AG recently joined several EU research projects, all of which focus on the exploitation of semantic technology with concrete business cases.
Article
Research in ecology increasingly relies on the integration of small, focused studies, to produce larger datasets that allow for more powerful, synthetic analyses. The results of these synthetic analyses are critical in guiding decisions about how to sustainably manage our natural environment, so it is important for researchers to effectively discover relevant data, and appropriately integrate these within their analyses. However, ecological data encompasses an extremely broad range of data types, structures, and semantic concepts. Moreover, ecological data is widely distributed, with few well-established repositories or standard protocols for their archiving and retrieval. These factors make the discovery and integration of ecological data sets a highly labor-intensive task. Metadata standards such as the Ecological Metadata Language and Darwin Core are important steps for improving our ability to discover and access ecological data, but are limited to describing only a few, relatively specific aspects of data content (e.g., data owner and contact information, variable “names”, keyword descriptions, etc.). A more flexible and powerful way to capture the semantic subtleties of complex ecological data, its structure and contents, and the inter-relationships among data variables is needed.
Article
Biodiversity research requires associating data about living beings and their habitats, constructing sophisticated models and correlating all kinds of information. The data handled are inherently heterogeneous, being provided by distinct (and distributed) research groups, which collect these data using different vocabularies, assumptions, methodologies and goals, and under varying spatio-temporal frames. Ontologies are being adopted as one of the means to alleviate these heterogeneity problems, thus helping cooperation among researchers. While ontology toolkits offer a wide range of operations, they are self-contained and cannot be accessed by external applications. Thus, the many proposals for adopting ontologies to enhance interoperability in application development are based either on ontology servers or on ontology frameworks. The latter support many functions, but impose application recoding whenever ontologies change, whereas the former support ontology evolution, but for a limited set of functions. This paper presents Aondê, a Web service geared towards the biodiversity domain that combines the advantages of frameworks and servers, supporting ontology sharing and management on the Web. By clearly separating storage concerns from semantic issues, the service provides independence between ontology evolution and the applications that need them. The service provides a wide range of basic operations to create, store, manage, analyze and integrate multiple ontologies. These operations can be repeatedly invoked by client applications to construct more complex manipulations. Aondê has been validated for real biodiversity case studies.
Article
The Ecological Metadata Language is an effective specification for describing data for long-term storage and interpretation. When used in conjunction with a metadata repository such as Metacat, and a metadata editing tool such as Morpho, the Ecological Metadata Language allows a large community of researchers to access and to share their data. Although the Ecological Metadata Language/Morpho/Metacat toolkit provides a rich data documentation mechanism, current methods for retrieving metadata-described data can be laborious and time consuming. Moreover, the structural and semantic heterogeneity of ecological data sets makes the development of custom solutions for integrating and querying these data prohibitively costly for large-scale synthesis. The Data Manager Library leverages the Ecological Metadata Language to provide automated data processing features that allow efficient data access, querying, and manipulation without custom development. The library can be used for many data management tasks and was designed to be immediately useful as well as extensible and easy to incorporate within existing applications. In this paper we describe the motivation for developing the Data Manager Library, provide an overview of its implementation, illustrate ideas for potential use by describing several planned and existing deployments, and describe future work to extend the library.
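The core idea above — using EML attribute metadata to drive automated data access without custom code per dataset — can be sketched with Python's standard library. The XML fragment below is a deliberately simplified, hypothetical stand-in for real EML (which is far richer), and the field names are invented; this is not the Data Manager Library's actual API:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment in the spirit of an EML
# attributeList; real EML metadata carries much more detail.
eml = """<attributeList>
  <attribute><attributeName>site</attributeName><storageType>string</storageType></attribute>
  <attribute><attributeName>count</attributeName><storageType>integer</storageType></attribute>
</attributeList>"""

# Build a typed schema from the metadata, then use it to load the data file.
casts = {"string": str, "integer": int, "float": float}
schema = [(a.findtext("attributeName"), casts[a.findtext("storageType")])
          for a in ET.fromstring(eml).iter("attribute")]

data = "site,count\nGB01,12\nGB02,7\n"
rows = [{name: cast(value) for (name, cast), value in zip(schema, record)}
        for record in list(csv.reader(io.StringIO(data)))[1:]]

assert rows[0] == {"site": "GB01", "count": 12}
```

The point is that the loader is generic: swapping in a different metadata document changes the schema and type conversions without touching the loading code.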
Article
The emergence of web standards for representing the semantics of information has enabled a new level of integration of information resources. In this paper we will discuss a number of issues related to semantic integration and the techniques we employed for integrating disparate resources. We will detail our approach, which includes the use of ontologies, information mapping and distributed queries. We will also show how we used RDF to represent different types of metadata within the system. Additionally, we will discuss some techniques for using certain types of semantic information to assist in the navigation of very large search spaces. In particular we will show how the tools we developed support an interactive querying paradigm for browsing information.
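Representing metadata as RDF-style subject-predicate-object triples, as described above, makes a uniform query mechanism possible across disparate resources. A minimal sketch, with made-up resource identifiers and `None` standing in for a query variable:

```python
# Metadata from different resources expressed as one uniform triple set.
triples = [
    ("ds1", "dc:title", "Plankton survey"),
    ("ds1", "dc:creator", "Lab A"),
    ("ds2", "dc:creator", "Lab A"),
]

def match(pattern, store):
    """Return all triples matching a pattern; None acts as a wildcard."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which resources were created by Lab A?"
assert [t[0] for t in match((None, "dc:creator", "Lab A"), triples)] == ["ds1", "ds2"]
```

Real systems would use SPARQL over an RDF store rather than list scans, but the pattern-matching model of querying is the same.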
Article
Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontologies. This paper provides a brief survey of the approaches to semantic integration developed by researchers in the ontology community. We focus on the approaches that differentiate ontology research from other related areas. The goal of the paper is to provide a reader who may not be very familiar with ontology research with an introduction to the major themes in this research and with pointers to different research projects. We discuss techniques for finding correspondences between ontologies, declarative ways of representing these correspondences, and the use of these correspondences in various semantic-integration tasks.
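A central idea in the survey above is representing ontology correspondences declaratively and then applying them in integration tasks. A toy sketch, with invented vocabulary terms, showing a correspondence table used to rewrite statements from a source vocabulary into a target one:

```python
# Declarative correspondences between two (hypothetical) vocabularies.
correspondences = {"Taxon": "Species", "livesIn": "hasHabitat"}

def translate(statement):
    """Rewrite a (subject, predicate, object) statement into the target vocabulary."""
    s, p, o = statement
    return (s, correspondences.get(p, p), correspondences.get(o, o))

assert translate(("crab12", "rdf:type", "Taxon")) == ("crab12", "rdf:type", "Species")
assert translate(("crab12", "livesIn", "mangrove")) == ("crab12", "hasHabitat", "mangrove")
```

Because the mapping is data, not code, the same translation machinery serves any pair of ontologies for which correspondences have been found.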
Book
The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first-class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards: the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes: the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study.
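One of the Linked Data practices covered by the lecture above is dereferencing HTTP URIs while asking for a machine-readable representation via content negotiation. A minimal sketch using Python's standard library; the URI is illustrative only, and no request is actually sent here:

```python
from urllib.request import Request

# A (hypothetical) Linked Data URI identifying a real-world resource.
uri = "http://example.org/resource/guanabara-bay"

# Content negotiation: ask the server for an RDF serialization (Turtle)
# instead of the human-readable HTML page.
req = Request(uri, headers={"Accept": "text/turtle"})

assert req.get_header("Accept") == "text/turtle"
assert req.full_url == uri
```

Sending the request with `urllib.request.urlopen(req)` against a real Linked Data server would, per the Linked Data principles, return RDF triples describing the resource.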
Article
Data integration is a perennial issue in bioinformatics, with many systems being developed and many technologies offered as a panacea for its resolution. The fact that it is still a problem indicates a persistence of underlying issues. Progress has been made, but we should ask "what lessons have been learnt?", and "what still needs to be done?" Semantic Web and Web 2.0 technologies are the latest to find traction within bioinformatics data integration. Now we can ask whether the Semantic Web, mashups, or their combination, have the potential to help. This paper is based on the opening invited talk by Carole Goble given at the Health Care and Life Sciences Data Integration for the Semantic Web Workshop collocated with WWW2007. The paper expands on that talk. We attempt to place some perspective on past efforts, highlight the reasons for success and failure, and indicate some pointers to the future.
Conference Paper
Ecological and environmental data cover a wide range of topics, from biodiversity surveys to measurements of trace gas fluxes, and are modeled using a tremendous variety of schemas. We have developed Morpho, a data management application designed to assist researchers in managing this heterogeneous collection of ecological data. Our goal in developing Morpho was to ease the burden of data management on scientists while improving access to and documentation for ecological data. Morpho allows ecological researchers to describe their data using a comprehensive and flexible metadata specification, and to share their data publicly or to specific collaborators over the Knowledge Network for Biocomplexity (KNB). Morpho's main features include: (1) flexible metadata creation and editing using an XML syntax for metadata exchange; (2) a 'wizard' interface for collecting metadata; (3) automated metadata extraction while importing data; (4) an XML editor that is configurable using multiple XML DTDs; (5) compliance with the Ecological Metadata Language; (6) powerful metadata search on the network or locally; and, (7) comprehensive revision control for data and metadata.
Article
For single databases, primary hindrances for end-user access are the volume of data that is becoming available, the lack of abstraction, and the need to understand the representation of the data. When information is combined from multiple databases, the major concern is the mismatch encountered in information representation and structure. Intelligent and active use of information requires a class of software modules that mediate between the workstation applications and the databases. It is shown that mediation simplifies, abstracts, reduces, merges, and explains data. A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. A model of information processing and information system components is described. The mediator architecture, including mediator interfaces, sharing of mediator modules, distribution of mediators, and triggers for knowledge maintenance, are discussed.
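The mediator idea above — a software layer that merges and abstracts heterogeneous sources into one view for applications — can be sketched briefly. The two "sources", their field names, and their unit conventions are all invented for illustration:

```python
# Two hypothetical autonomous sources with mismatched vocabularies and units.
source_a = [{"species": "Callinectes sapidus", "temp_f": 77.0}]
source_b = [{"taxon": "Mugil liza", "temp_c": 24.0}]

def mediate():
    """Mediator: return a unified view with one vocabulary and one unit system."""
    unified = [{"taxon": r["species"],
                "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
               for r in source_a]
    unified += [{"taxon": r["taxon"], "temp_c": r["temp_c"]} for r in source_b]
    return unified

view = mediate()
assert view[0] == {"taxon": "Callinectes sapidus", "temp_c": 25.0}
```

Applications query the mediator's unified view and never see the representation mismatches between the underlying databases, which is the division of labor Wiederhold's architecture proposes.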
PostgreSQL, "PostgreSQL beta release", available at: www.postgresql.org/ (accessed April 2013).
Berners-Lee, T. (2006), "Linked data - design issues", available at: www.w3.org/DesignIssues/LinkedData.html (accessed April 2014).
Bizer, C., Heath, T. and Berners-Lee, T. (2006), "D2R Server - publishing relational databases on the Web as SPARQL endpoints", Proceedings of the 15th International World Wide Web Conference, Edinburgh.
Malhotra, A. (2009), "W3C RDB2RDF incubator group report", available at: www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf-20090126/ (accessed April 2013).
Manola, F. and Miller, E. (2004), "RDF primer", W3C Recommendation, available at: www.w3.org/TR/2004/REC-rdf-primer-20040210/ (accessed May 2014).
Mittermeier, R.A., Gil, P.R. and Mittermeier, C.G. (1997), Megadiversity: Earth's Biologically Wealthiest Nations, 1st ed., Cemex, Mexico (in Spanish).
Patton, E.W., Seyed, P., Wang, P., Fu, L., Dein, F.J., Bristol, R.S. and McGuinness, D.L. (2014), "SemantEco: a semantically powered modular architecture for integrating distributed environmental and ecological data", Future Generation Computer Systems, Vol. 36, pp. 430-440.