Article

An ontology-based knowledge management framework for a distributed water information system


Abstract

With the increasing complexity of hydrologic problems, data collection and data analysis are often carried out in distributed heterogeneous systems. It is therefore critical for users to determine the origin of data and its trustworthiness. Provenance describes the information life cycle of a data product and has been recognised as one of the most promising means of improving data transparency. However, because of the complexity of the information life cycle involved, it is challenging to query provenance information that may be generated by distributed systems with different vocabularies and conventions, and that may draw on knowledge from multiple domains. In this paper, we present a semantic knowledge management framework that tracks and integrates provenance information across distributed heterogeneous systems. It is underpinned by the Integrated Knowledge model, which describes the domain knowledge and the provenance information involved in the information life cycle of a particular data product. We evaluate the proposed framework in the context of two real-world water information systems.
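The framework itself is not reproduced on this page; as a minimal, hypothetical sketch of the kind of lineage query a provenance system must answer (all artefact names are invented), a data product's derivation history can be modelled as a directed graph and traversed for its upstream sources:

```python
from collections import defaultdict

class ProvenanceGraph:
    """Toy provenance store: edges point from a product to its sources."""
    def __init__(self):
        self.derived_from = defaultdict(set)

    def record(self, product, source):
        self.derived_from[product].add(source)

    def lineage(self, product):
        """All upstream artefacts a product was (transitively) derived from."""
        seen, stack = set(), [product]
        while stack:
            for src in self.derived_from[stack.pop()]:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

g = ProvenanceGraph()
g.record("daily_flow_report", "qc_timeseries")
g.record("qc_timeseries", "raw_gauge_readings")
print(sorted(g.lineage("daily_flow_report")))
# → ['qc_timeseries', 'raw_gauge_readings']
```

A real implementation would, as the paper describes, also attach domain vocabulary to each node so the same traversal works across systems with different conventions.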


... Yi et al. (2011) presented an ontology- and domain-modelling-based design method for integrated modelling and assessment DSSs in hydroinformatics. Liu et al. (2013b) presented a generic knowledge model and knowledge management framework that captures hydrological data from different sources, integrating four different ontologies into a single model. ...
Article
Big data generated by remote sensing, ground-based measurements, models and simulations, social media and crowdsourcing, and a wide range of structured and unstructured sources necessitates significant data and knowledge management efforts. Innovations and developments in information technology over the last couple of decades have made it possible to manage the immense amount of data collected and generated in that time. This has enabled open knowledge networks to be built that have led to new ideas in scientific research and the business world. To design and develop open knowledge networks, ontologies are essential, since they form the backbone of the conceptualization of a given knowledge domain. A systematic literature review was conducted to examine research involving ontologies related to hydrological processes and water resource management. Ontologies in the hydrology domain support the comprehension, monitoring, and representation of the hydrologic cycle's complex structure, as well as predictions of its processes. They contribute to the development of ontology-based information and decision support systems; understanding of environmental and atmospheric phenomena; development of climate and water resiliency concepts; creation of educational tools with artificial intelligence; and strengthening of related cyberinfrastructures. This review explains key issues and challenges in ontology development for hydrologic processes to guide the development of next-generation artificial intelligence applications. The study also discusses future research prospects at the intersection of artificial intelligence and hydroscience.
... An ontology [29] based on the SSN was proposed to describe time series and assess the reliability of a hydrological sensor network. Liu et al. [30] presented a WaterML- and SSN-based ontology framework that represented information from sensors, observation values, and time series for a distributed water information system. In summary, most related studies have been based on the SSN ontology, and the following is a comparison of the SSN ontology with the proposed ontology. ...
Article
The increasing deterioration of aquatic environments has attracted more attention to water quality monitoring techniques, with most researchers focusing on the acquisition and assessment of water quality data, but seldom on the discovery and tracing of pollution sources. In this study, a semantic-enhanced modeling method for ontology modeling and rule building is proposed, which can be used for river water quality monitoring and the processing of related observation data. The observational process ontology (OPO) method can describe the semantic properties of water resources and observation data. In addition, it can provide the semantic relevance among the different concepts involved in the observational process of water quality monitoring. A pollution alert can be achieved using the reasoning rules for the water quality monitoring stations. In this study, a case study demonstrates the usability of the OPO models and reasoning rules in a water quality monitoring system. The system supports the water quality observational monitoring process and traces the source of pollutants using sensors, observation data, process models, and observation products that users can access in a timely manner.
Article
This article proposes a framework of linked software agents that continuously interact with an underlying knowledge graph to automatically assess the impacts of potential flooding events. It builds on the idea of connected digital twins based on the World Avatar dynamic knowledge graph to create a semantically rich asset of data, knowledge, and computational capabilities accessible to humans, applications, and artificial intelligence. We develop three new ontologies to describe and link environmental measurements and their respective reporting stations, flood events, and their potential impact on population and built infrastructure as well as the built environment of a city itself. These coupled ontologies are deployed to dynamically instantiate near real-time data from multiple fragmented sources into the World Avatar. Sequences of autonomous agents connected via the derived information framework automatically assess consequences of newly instantiated data, such as newly raised flood warnings, and cascade respective updates through the graph to ensure up-to-date insights into the number of people and building stock value at risk. Although we showcase the strength of this technology in the context of flooding, our findings suggest that this system-of-systems approach is a promising solution to build holistic digital twins for various other contexts and use cases to support truly interoperable and smart cities.
Article
The world is transforming into a predominantly urban space, meaning that cities have to be ready to provide services, for instance, to ensure the availability and sustainable management of water and sanitation for all. In this scenario, water quality evaluation plays a crucial role and often depends on multiple segregated data sources. Our purpose is to build bridges between these data silos to provide an integrated and interoperable view in which different datasets can be provided and combined through knowledge graphs in order to characterize water quality. This work shows the quality of the Bogota river basin's water bodies by analyzing physicochemical and biological properties using spatio-temporal and legal elements. Our knowledge graphs allow us to discover what, when, and where infractions on water quality happened in the river basin of one of the most populated cities of Latin America during a critical period (2007–2013), highlighting high values of suspended solids and nitrites, low amounts of dissolved oxygen, and the worst water quality during the driest periods (reaching a maximum of 63 infractions in a year). HIGHLIGHTS: A new water quality ontology with three modules composed of diverse international standards. Multi-dimensional knowledge graphs about the water quality of the Bogota river basin. Water quality characterization using spatio-temporal distribution and the legal framework in an integrated and interoperable scenario.
Article
It is increasingly recognized that water scarcity, rather than a lack of arable land, will be the major constraint on increasing agricultural production over the next few decades. Therefore, water represents a unique agricultural asset to drive agricultural sustainability. However, its planning, management and usage are often influenced by a mix of interdependent economic, engineering, social, hydrologic, environmental, and even political factors. Such a complex interdependency suggests that a sociotechnical approach to water resources management, a subject of the field of Hydroinformatics, represents a viable path forward to achieve sustainable agriculture. Thus, this paper presents an overview of the intersection between hydroinformatics and agriculture to introduce a new research field called agricultural hydroinformatics. In addition, it proposes a general conceptual framework taking into account the distinctive features associated with the sociotechnical dimension of hydroinformatics when applied in agriculture. The framework is designed to serve as a stepping-stone to achieve not only integrated water resources management, but also agricultural sustainability transitions in general. Using examples from agricultural water development to horticultural and livestock farming, the paper highlights facets of the framework's applicability as a new paradigm for the consideration of data flows and sources, and for information and simulation model engineering, as well as their integration, for a holistic approach to water resources management in agriculture. Finally, it discusses opportunities and challenges associated with the implementation of agricultural hydroinformatics and the development of new research areas needed to achieve the full potential of this emerging framework. These areas include, for example, sensor deployment and development, signal processing, information modeling and storage, artificial intelligence, and new kinds of simulation model development approaches.
Article
To determine a suitable hydrological model structure for a specific application context using integrated modelling frameworks, modellers usually need to manually select the required hydrological processes, identify the appropriate algorithm for each process, and couple the algorithms' software components. However, these modelling steps are difficult and require corresponding knowledge. It is not easy for modellers to master all of the required knowledge. To alleviate this problem, a knowledge-based method is proposed to automatically determine hydrological model structures. First, modelling knowledge for process selection, algorithm identification, and component coupling is formalized in the formats of the Rule Markup Language (RuleML) and Resource Description Framework (RDF). Second, the formalized knowledge is applied to an inference engine to determine model structures. The method is applied to three hypothetical experiments and a real experiment. These experiments show how the knowledge-based method could support modellers in determining suitable model structures. The proposed method has the potential to reduce the knowledge burden on modellers and would be conducive to the promotion of integrated modelling frameworks.
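The paper's RuleML/RDF machinery is not shown on this page; as a minimal sketch of the underlying idea only (the rule conditions and process names below are hypothetical), process selection can be expressed as condition–action rules evaluated against facts about a catchment:

```python
# Hypothetical rules mapping catchment facts to hydrological processes.
# Each rule pairs a condition over the facts with a process to include.
RULES = [
    (lambda f: f["climate"] == "snow-dominated", "snowmelt"),
    (lambda f: f["soil_depth_m"] > 1.0, "deep_percolation"),
    (lambda f: True, "evapotranspiration"),  # always included
    (lambda f: True, "runoff"),              # always included
]

def select_processes(facts):
    """Return the processes whose rule conditions hold for these facts."""
    return [proc for cond, proc in RULES if cond(facts)]

facts = {"climate": "snow-dominated", "soil_depth_m": 0.4}
print(select_processes(facts))
# → ['snowmelt', 'evapotranspiration', 'runoff']
```

A production system would encode such rules declaratively (e.g. in RuleML) and hand them to an inference engine rather than hard-coding them, but the selection logic is of this shape.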
Conference Paper
A process trace describes the processes taken in a workflow to generate a particular result. Given many process traces, each with a large amount of very low level information, it is a challenge to make process traces meaningful to different users. It is more challenging to compare two complex process traces generated by heterogenous systems and have different levels of granularity. We present CTrace, a system that (1) lets users explore the conceptual abstraction of large process traces with different levels of granularity, and (2) provides semantic comparison among traces in which both the structural and the semantic similarity are considered. The above functions are underpinned by a novel notion of multi-granularity process trace and efficient multi-granularity similarity comparison algorithms.
Article
We aim to inform the development of decision support tools for resource managers who need to examine large complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of linked open data. In previous work, we designed and implemented a semantically enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. In this work, we discuss SemantEco’s new architecture that supports modular extensions and makes it easier to support additional domains. Our enhanced framework includes foundational ontologies to support modeling of wildlife observation and wildlife health impacts, thereby enabling deeper and broader support for more holistically examining the effects of environmental pollution on ecosystems. We conclude with a discussion of how, through the application of semantic technologies, modular designs will make it easier for resource managers to bring in new sources of data to support more complex use cases.
Article
The web, and more recently the concept and technology of the Semantic Web, has created a wealth of new ideas and innovative tools for data management, integration and computation in an open framework and at a very large scale. One area of particular interest to the science of hydrology is the capture, representation, inference and presentation of provenance information: information that helps to explain how data were computed and how they should be interpreted. This paper is among the first to bring recent developments in the management of provenance developed for e-science and the Semantic Web to the problems of hydrology. Our main result is a formal ontological model for the representation of provenance information driven by a hydrologic case study. Along the way, we support usability, extensibility and reusability for provenance representation, relying on the concept of modelling both domain-independent and domain-specific aspects of provenance. We evaluate our model with respect to its ability to satisfy identified requirements arising from the case study on streamflow forecasting for the South Esk River catchment in Tasmania, Australia.
Conference Paper
Sensor web applications such as real-time environmental decision support systems require the use of sensors from multiple heterogeneous sources for purposes beyond the scope of the original sensor design and deployment. In such cyberenvironments, provenance plays a critical role, as it enables users to understand, verify, reproduce, and ascertain the quality of derived data products. Such capabilities are yet to be developed in many sensor web enablement (SWE) applications. This paper develops a provenance-aware "Virtual Sensor" system, where a new persistent live "virtual" sensor is re-published in real time after some model-based computational transformations of the raw sensor data streams. We describe the underlying OPM (Open Provenance Model) APIs (Application Programming Interfaces), the architecture for provenance capture, the creation of the provenance graph, and the publishing of the provenance-aware virtual sensor, where the new virtual sensor time-series data is augmented with OPM-compliant provenance information. A case study on creating real-time provenance-aware virtual rainfall sensors is illustrated. Such a provenance-aware virtual sensor system allows digital preservation and verification of the new virtual sensors.
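The OPM APIs themselves are not reproduced here; the following is a minimal sketch (all names are hypothetical) of the core idea: publish a derived "virtual" reading together with a provenance record naming the transformation and the raw inputs it used:

```python
def publish_virtual_sensor(name, raw_readings, transform):
    """Apply a model transform to raw readings and attach an
    OPM-style provenance record (process + inputs) to the output."""
    value = transform(raw_readings)
    provenance = {
        "artifact": name,            # the derived data product
        "process": transform.__name__,  # what generated it
        "used": list(raw_readings),  # the raw inputs consumed
    }
    return {"value": value, "provenance": provenance}

def mean_mm(readings):
    """Hypothetical model step: average rainfall in millimetres."""
    return sum(readings) / len(readings)

out = publish_virtual_sensor("virtual_rain_gauge_7", [2.0, 3.0, 4.0], mean_mm)
print(out["value"], out["provenance"]["process"])
# → 3.0 mean_mm
```

In the paper's system the provenance record would be OPM-compliant and streamed alongside the live time series rather than returned inline.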
Article
Computational provenance--a record of the antecedents and processing history of digital information--is key to properly documenting computer-based scientific research. To support investigations in hydrologic science, we produce the daily fractional snow-covered area from NASA's moderate-resolution imaging spectroradiometer (MODIS). From the MODIS reflectance data in seven wavelengths, we estimate the fraction of each 500 m pixel that snow covers. The daily products have data gaps and errors because of cloud cover and sensor viewing geometry, so we interpolate and smooth to produce our best estimate of the daily snow cover. To manage the data, we have developed the Earth System Science Server (ES3), a software environment for data-intensive Earth science, with unique capabilities for automatically and transparently capturing and managing the provenance of arbitrary computations. Transparent acquisition spares scientists from having to express their computations in specific languages or schemas in order for provenance to be acquired and maintained. ES3 models provenance as relationships between processes and their input and output files. It is particularly suited to capturing the provenance of an evolving algorithm whose components span multiple languages and execution environments.
Conference Paper
In the past five years, we have designed and evolved an interlingua for sharing explanations generated by various automated systems such as hybrid web-based question answering systems, text analytics, theorem proving, task processing, web services execution, rule engines, and machine learning components. In this paper, we present our recent major updates including: (i) splitting the interlingua into three modules (i.e. provenance, information manipulation or justifications, and trust) to reduce maintenance and reuse costs and to support various modularity requirements; (ii) providing representation primitives capable of representing four critical types of justifications identified in past work. We also discuss some examples of how this work can be and is being used in a variety of distributed application settings.
Conference Paper
The provenance of data has recently been recognized as central to the trust one places in data. It is also important to annotation, to data integration and to probabilistic databases. Three workshops have been held on the topic, and it has been the focus of several research projects and prototype systems. This tutorial will attempt to provide an overview of research in provenance in databases with a focus on recent database research and technology in this area. This tutorial is aimed at a general database research audience and at people who work with scientific data.
Article
The increasing ability for the sciences to sense the world around us is resulting in a growing need for data-driven e-Science applications that are under the control of workflows composed of services on the Grid. The focus of our work is on provenance collection for these workflows, which is necessary to validate the workflow and to determine the quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework is based on generating discrete provenance activities during the lifecycle of a workflow execution that can be aggregated to form complex data and process provenance graphs that can span across workflows. The implementation uses a loosely coupled publish-subscribe architecture for propagating these activities, and the capabilities of the system satisfy the needs of detailed provenance collection. A performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight-service workflow using 271 data products).
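As a rough illustration of the loosely coupled publish-subscribe pattern described above (the topic and payload names here are invented, not the paper's), provenance activities can be broadcast to any interested collectors without the services knowing who consumes them:

```python
from collections import defaultdict

class ActivityBus:
    """Minimal pub/sub channel for provenance activity notifications."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, activity):
        # Services emit activities; collectors receive them asynchronously
        # in a real system, synchronously in this toy version.
        for handler in self.subscribers[topic]:
            handler(activity)

bus = ActivityBus()
log = []                      # a provenance collector aggregating activities
bus.subscribe("provenance", log.append)
bus.publish("provenance", {"service": "interpolate", "produced": "gap_filled_series"})
print(log)
# → [{'service': 'interpolate', 'produced': 'gap_filled_series'}]
```

The decoupling is the point: services only publish, so adding or replacing provenance collectors requires no changes to the workflow engine or the services themselves.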
Article
The need to understand and manage provenance arises in almost every scientific application. In many cases, information about provenance constitutes the proof of correctness of results that are generated by scientific applications. It also determines the quality and amount of trust one places in the results. For these reasons, the knowledge of provenance of a scientific result is typically regarded as being as important as the result itself. In this paper, we provide an overview of research in provenance in databases and discuss some future research directions. The content of this paper is largely based on the tutorial presented at SIGMOD 2007 (11).
Article
In a service-oriented environment, heterogeneous data from distributed data archiving centers and various geo-processing services are chained together dynamically to generate on-demand data products. Creating an executable service chain requires detailed specification of metadata for data sets and service instances. Using metadata tracking, semantics-enabled metadata are generated and propagated through a service chain. This metadata can be employed to validate a service chain, e.g. whether metadata preconditions on the input data of services can be satisfied. This paper explores how this metadata can be further exploited to augment geospatial data provenance, i.e., how a geospatial data product is derived. Provenance information is automatically captured during the metadata tracking process. Semantic Web technologies, including OWL and SPARQL, are used for representation and query of this provenance information. The approach can not only contribute to the automatic recording of geospatial data provenance, but also provide a more informed understanding of provenance information using Semantic Web technologies.
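The paper uses OWL and SPARQL for representation and query; as a simplified stand-in (the triples below merely echo PROV-style vocabulary and are not from the paper), querying provenance reduces to triple-pattern matching, which is what a SPARQL basic graph pattern does:

```python
# Toy provenance triples for a hypothetical geospatial product.
TRIPLES = [
    ("ndvi_map", "prov:wasDerivedFrom", "red_band"),
    ("ndvi_map", "prov:wasDerivedFrom", "nir_band"),
    ("ndvi_map", "prov:wasGeneratedBy", "band_math_service"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    analogous to a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# "SELECT ?src WHERE { ndvi_map prov:wasDerivedFrom ?src }"
print(match(TRIPLES, s="ndvi_map", p="prov:wasDerivedFrom"))
```

A real triple store adds joins across patterns, inference over the ontology, and indexing, but the query model is this pattern-with-wildcards idea.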
Article
VisTrails is a new workflow and provenance management system that provides support for scientific data exploration and visualization. Whereas workflows have traditionally been used to automate repetitive tasks, for applications that are exploratory in nature, change is the norm. VisTrails uses a new change-based provenance mechanism designed to handle rapidly evolving workflows. It uniformly and automatically captures provenance information for data products and for the evolution of the workflows used to generate these products. In this paper, we describe how the VisTrails provenance data is organized in layers and present a first approach for querying this data that we developed to tackle the Provenance Challenge queries.
Article
Our research focuses on creating and executing large-scale scientific workflows that often involve thousands of computations over distributed, shared resources. We describe an approach to workflow creation and refinement that uses semantic representations to 1) describe complex scientific applications in a data-independent manner, 2) automatically generate workflows of computations for given data sets, and 3) map the workflows to available computing resources for efficient execution. Our approach is implemented in the Wings/Pegasus workflow system and has been demonstrated in a variety of scientific application domains. This paper illustrates the application-level provenance information generated by Wings during workflow creation and the refinement provenance generated by the Pegasus mapping system for execution over grid computing environments. We show how this information is used in answering the queries of the First Provenance Challenge.