Article

Abstract

Governments are publishing enormous amounts of open data on the web every day in an effort to increase transparency and reusability. Linking data from multiple sources on the web enables the performance of advanced data analytics, which can lead to the development of valuable services and data products. However, Canada's open government data portals are isolated from one another and remain unlinked to other resources on the web. In this paper, we first expose the statistical data sets in Canadian provincial open data portals as Linked Data, and then integrate them using the RDF Data Cube vocabulary, thereby making different open data portals available through a single search endpoint. We leverage Semantic Web technologies to publish open data sets taken from two provincial portals (Nova Scotia and Alberta) as RDF (the Linked Data format), and to connect them to one another. The success of our approach illustrates its high potential for linking open government data sets across Canada, which will in turn enable greater data accessibility and improved search results.

Article
Accident, injury, and fatality rates remain disproportionately high in the construction industry. Information from past mishaps provides an opportunity to acquire insights, gather lessons learned, and systematically improve safety outcomes. Advances in data science and Industry 4.0 present unprecedented opportunities for the industry to leverage, share, and reuse safety information more efficiently. However, potential benefits of information sharing are missed because accident data are inconsistently formatted, non-machine-readable, and inaccessible. Hence, learning opportunities and insights cannot be captured and disseminated to proactively prevent accidents. To address these issues, a novel information sharing system is proposed utilizing linked data, ontologies, and knowledge graph technologies. An ontological approach is developed to semantically model safety information and formalize knowledge pertaining to accident cases. A multi-algorithmic approach is developed for automatically processing and converting accident case data to the Resource Description Framework (RDF), and the SPARQL protocol is deployed to enable query functionalities. Trials and test scenarios utilizing a dataset of 200 real accident cases confirm the effectiveness and efficiency of the system in improving information access, retrieval, and reusability. The proposed development facilitates a new “open” information sharing paradigm with major implications for Industry 4.0 and data-driven applications in construction safety management.
Article
An important part of Open Data is of a statistical nature and describes economic and social indicators monitoring population size, inflation, trade, and employment. Combining and analyzing Open Data from multiple datasets and sources enables advanced data analytics scenarios that could result in valuable services and data products. However, it is still difficult to discover and combine Open Statistical Data that reside in different data portals. Although Linked Open Statistical Data (LOSD) provide standards and approaches to facilitate combining statistics on the Web, various interoperability challenges still exist. In this paper, we propose an Interoperability Framework for LOSD, comprising definitions of LOSD interoperability conflicts as well as modelling practices currently used by six official open government data portals. Towards this end, we combine a top-down approach that studies interoperability conflicts in the literature with a bottom-up approach that studies the modelling practices of data portals. We define two types of LOSD schema-level conflicts, namely naming conflicts and structural conflicts. Naming conflicts result from using different URIs. Structural conflicts result from different practices of modelling the structure of data cubes. Only two out of the 19 conflicts are currently resolved and 11 can be resolved according to the literature.
Article
Building on the promise of open data, government agencies support a continuously growing number of open data initiatives that are driven mainly by expectations of unprecedented value generation from an underutilized resource. Although data in general have undoubtedly become an essential resource for the economy, it has remained largely unclear how, or even whether, open data repositories generate any significant value. We addressed this void with a study that examines how sustainable value is generated from open data. Subsequently, we developed a model that explains how open data generate sustainable value through two underlying mechanisms. The first, the information sharing mechanism, explicates how open data are beneficial to forging informational content that creates value for society through increased transparency and improved decision making. The second, the market mechanism, explicates how open data are beneficial as a resource in products and services offered on the market, as well as how open data are used to make processes more efficient or to satisfy previously unmet needs. We tested and validated the model using PLS with secondary quantitative data from 76 countries. The study provides empirical support to the conjecture that openness of data as well as the digital governance and digital infrastructure in a country have a positive effect on the country's level of sustainable value. Overall, the study provides empirical evidence in favor of nurturing open data culture and insights about the conditions that support turning it into sustainable value for the benefit of citizens, business organizations, and society at large.
Article
Purpose: The purpose of this paper is to conduct a usability evaluation of governmental data portals and provide a list of best practices for improving stakeholders’ ability to discover, access, and reuse these online information sources. Design/methodology/approach: The methodology was based on a comprehensive literature review that resulted in a benchmarking framework of the most important criteria. A usability testing method was then applied in accordance with the unique requirements of open data portals. This approach was demonstrated using a case study. Findings: The main weakness found was a lack of support for active engagement of stakeholders. A list of best practices was introduced to improve the quality of these portals. This should help to improve discoverability and facilitate access to data sets in order to increase their reuse by stakeholders. Social implications: The creation of appropriate open data portals aims to fulfill the principles of open government, i.e., to promote transparency and openness through the publication of government data, enhance the accountability of public officials, and encourage public participation, collaboration, and cooperation of involved stakeholders. Originality/value: This paper proposed a new approach for the usability evaluation of open data portals at the national level from an ordinary citizen’s point of view and provided important insights on improving their quality regarding data discoverability, accessibility, and reusability.
Article
Open data has attracted considerable attention in the construction of smart cities, both for delivering useful city information to citizens and, from the city council's perspective, for interacting with them. In this paper, we present an overview of the current status and issues of the open data published by seven different Canadian cities. We start by presenting the characteristics of open data, followed by a summary of data formats and a detailed explanation of the datasets for each Canadian city (Calgary, Halifax, Surrey, Waterloo, Ottawa, Vancouver, and Toronto), including the different data catalogues and their detailed characteristics. Next, we discuss the state of the art of the tools and applications developed over each city’s open data. Here, we not only illustrate the most successful examples, but also consider the potential issues arising from the characteristics of the city datasets. This paper is beneficial not only for governments that wish to compare their open data status with that of the Canadian cities but also for users or companies interested in developing tools over open city data.
Article
The open government data (OGD) movement has rapidly expanded worldwide with high expectations for substantial benefits to society. However, recent research has identified considerable social and technical barriers that stand in the way of achieving these benefits. This paper uses sociotechnical systems theory and a review of open data research and practice guidelines to develop a preliminary ecosystem model for planning and designing OGD programs. Findings from two empirical case studies in New York and St. Petersburg, Russia produced an improved general model that addresses three questions: How can a given government's open data program stimulate and support an ecosystem of data producers, innovators, and users? In what ways and for whom do these ecosystems produce benefits? Can an ecosystem approach help governments design effective open government data programs in diverse cultures and settings? The general model addresses policy and strategy, data publication and use, feedback and communication, benefit generation, and advocacy and interaction among stakeholders. We conclude that an ecosystem approach to planning and design can be widely used to assess existing conditions and to consider policies, strategies, and relationships that address realistic barriers and stimulate desired benefits.
Conference Paper
Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we overview how statistical data can be managed on the Web. With OLAP2DataCube and CSV2DataCube we present two complementary approaches on how to extract and publish statistical data. We also discuss the linking, repair and the visualization of statistical data. As a comprehensive use case, we report on the extraction and publishing on the Web of statistical data describing 10 years of life in Brazil.
Conference Paper
CubeViz is a flexible exploration and visualization platform for statistical data represented adhering to the RDF Data Cube vocabulary. If statistical data is provided adhering to the Data Cube vocabulary, CubeViz exhibits a faceted browsing widget allowing to interactively filter observations to be visualized in charts. Based on the selected structural part, CubeViz offers suitable chart types and options for configuring the visualization by users. In this demo we present the CubeViz visualization architecture and components, sketch its underlying API and the libraries used to generate the desired output. By employing advanced introspection, analysis and visualization bootstrapping techniques CubeViz hides the schema complexity of the encoded data in order to support a user-friendly exploration experience.
Article
Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies. CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing of data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.
Conference Paper
Government-released statistical data, particularly health census data, holds valuable information that can be used for critical needs assessment in public health policy and the development of health services. However, the available Canadian health census data is found primarily in raw data formats (e.g., CSV and TXT), which discourages its rapid manipulation for critical decision making. Given the importance of health information, and to promote its widespread usage, we chose to republish the Canadian health census data using a contemporary methodology, LOD (Linked Open Data), a W3C-recommended, flexible, and interoperable standard based on RDF (Resource Description Framework). We have published the data in accordance with the LOD cloud schema and incorporated well-known semantic web vocabularies. The integration of semantic vocabularies with health census data not only supports stable automatic linkage with the LOD cloud, but also enhances the quality and interoperability of the data. Furthermore, we have provided an LOD explorer interface and a SPARQL endpoint to help data seekers find target data for reuse in mashups and the creation of comparative analyses. This initiative will enhance access to data that is already “open”, serving as an easy-to-use portal and information conduit for citizens interested in understanding health data and policy.
Article
The improvement of public health is one of the main indicators for societal progress. Statistical data for monitoring public health is highly relevant for a number of sectors, such as research (e.g. in the life sciences or economy), policy making, health care, the pharmaceutical industry, and insurance. Such data is now available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). GHO comprises more than 50 different datasets; it covers all 198 WHO member countries and is updated as more recent or revised data becomes available or when there are changes to the methodology being used. However, this data is only accessible via complex spreadsheets and, therefore, queries over the 50 different datasets as well as combinations with other datasets are very tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier for data re-use and integration. In this article, we describe the conversion and publication process as well as use cases, which can be implemented using the GHO data.
Conference Paper
The amount of available Linked Data on the Web is increasing, and data providers start to publish statistical datasets that comprise numerical data. Such statistical datasets differ significantly from the currently predominant network-style data published on the Web. We explore the possibility of integrating statistical data from multiple Linked Data sources. We provide a mapping from statistical Linked Data into the Multidimensional Model used in data warehouses. We use an extract-transform-load (ETL) pipeline to convert statistical Linked Data into a format suitable for loading into an open-source OLAP system, and thus demonstrate how standard OLAP infrastructure can be used for elaborate querying and visualisation of integrated statistical Linked Data. We discuss lessons learned from three experiments and identify areas which require future work to ultimately arrive at a well-interlinked set of statistical data from multiple sources which is processable with standard OLAP systems.
Article
Data discovery on the Semantic Web requires crawling and indexing of statements, in addition to the 'linked-data' approach of de-referencing resource URIs. Existing Semantic Web search engines are focused on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over Semantic Web resources. Our index allows applications to automatically locate documents containing information about a given resource. In addition, we allow resource retrieval through inverse-functional properties, offer a full-text search and index SPARQL endpoints. Finally, we extend the sitemap protocol to efficiently index large datasets with minimal impact on data providers.
Article
A major part of Open Data concerns statistics such as economic and social indicators. Statistical data are structured in a multidimensional manner, creating data cubes. Recently, National Statistical Institutes and public authorities adopted the Linked Data paradigm to publish their statistical data on the Web. Many vocabularies have been created to enable modeling data cubes as RDF graphs, thus creating Linked Open Statistical Data (LOSD). However, the creation of LOSD remains a demanding task, mainly because of modeling challenges related either to the conceptual definition of the cube or to the way of modeling cubes as linked data. The aim of this paper is to identify and clarify (a) modeling challenges related to the creation of LOSD and (b) approaches to address them. Towards this end, nine LOSD experts were involved in an interactive feedback collection and consensus-building process based on the Delphi method. We anticipate that the results of this paper will contribute towards the formulation of best practices for creating LOSD, and thus facilitate combining and analyzing statistical data from diverse sources on the Web.
Chapter
The Japanese Statistics Center began publishing a statistical linked open data (LOD) site in 2016. The data currently consists of approximately 1.3 billion triples. The publication of statistical data as LOD enables datasets and categorizations to be clarified. This allows users not only to search objective data easily, but also to combine the data with other domestic or international data. This paper first introduces a design policy for LOD and a method for representing geographic areas. Then, it explains the method used to query the LOD by using SPARQL or GeoSPARQL, and provides one example application.