Int. J. , Vol. x, No. x, xxxx 1
Copyright © 200x Inderscience Enterprises Ltd.
Towards Linked Provincial Open Data
in Canada
Enayat Rajabi
Shannon School of Business
Cape Breton University
Sydney, NS, Canada
Email: enayat_rajabi@cbu.ca
Abstract: Governments are publishing enormous amounts of open data on the
Web every day in an effort to increase transparency and reusability. Linking data
from multiple sources on the Web enables the performance of advanced data
analytics, which can lead to the development of valuable services and data
products. However, Canada’s provincial government open data portals are
isolated from one another, and remain unlinked to other resources on the Web.
In this paper, we first expose the statistical datasets in Canadian provincial open
data portals as Linked Data, and then integrate them using the RDF Cube
vocabulary, thereby making different open data portals available through a single
search endpoint. We leverage Semantic Web technologies to publish open data
sets taken from two provincial portals (Nova Scotia and Alberta) as RDF (the
Linked Data format), and to connect them to one another. The success of our
approach illustrates its high potential for linking open data sets across Canada,
which will in turn enable greater data accessibility and improved search results.
Keywords: Open data; RDF cube; Linked Data; semantic web
Biographical notes: Enayat Rajabi is an Assistant Professor of Business Analytics
with the Shannon School of Business at Cape Breton University in Canada, as well
as an adjunct professor of Information Management with Dalhousie University’s
School of Information Management. Dr. Rajabi received his Ph.D. in Information
and Knowledge Engineering from the University of Alcala, Spain. His work has
focused on Semantic Web and Linked Data since 2010, and he has published his
related research in several JCR journals.
1. Introduction
E. Rajabi
The volume and variety of open governmental data are growing exponentially as new
data sources become available on the Web. In particular, the extraordinary volume
of statistical data published by governments around the world is a significant
resource that enables not only increased public transparency and accountability, but
also greater innovation (Jetzek et al., 2019; van Ooijen et al., 2019). For example,
most of the datasets published on the European Commission’s open data portal¹ are
statistical in nature. This portal was designed to allow European institutions and
other bodies access to an array of open datasets in the hopes that they will apply
these data in new and innovative ways, thereby unlocking their economic potential.
Open statistical data are published by governments with the aim of such data being
used, analyzed, and commercialized by a wide range of actors, from professional
statisticians to the lay public. However, publishing and sharing data openly is only
the first step in successfully reusing data. Indeed, as Wallis et al. (2013) inquire, “if
we share data, will anyone use them?” Dawes et al. (2016) have similarly
identified the promotion of data re-use as a key challenge, especially considering
the many purposes it can serve. A survey conducted by the European Commission
(2017) identified five main benefits of re-using open statistical data: innovation,
reduced costs, data harmonization, enhanced business models, and increased
company reliability. For many of the companies interviewed in this study, open data
is a core component of their operations, and it is one of the key resources that
enabled them to start their business.
The Linked Data approach has been widely used by different governments
to increase data re-usability, as it allows data from different disciplines to be
interconnected (Zaveri et al., 2013; Gür et al., 2012; Oren et al., 2008; Caracciolo
et al., 2012). Linked statistical datasets can be applied in various domains for
different purposes. For instance, the integration of statistical data from multiple
sources can enable the performance of advanced data analytics, which can in turn
lead to the production of valuable services and data products (Kalampokis,
Karamanou, et al., 2019). One example of such an application can be seen in the
work of Kämpgen and Harth (2011), who built a data warehouse by leveraging
Linked Data in order to analyze related statistical datasets.
Although open data in Canada is published and shared through an open data
portal,² the re-use of this data is still in its infancy, and little research has
been devoted to this area. Searching for and connecting relevant datasets in
provincial open data portals is costly, as most statistical datasets are published in
spreadsheet formats. Furthermore, processing the data across all of the datasets in
the different open data portals can be a tedious and time-consuming task due to the
lack of uniformity in the structures and vocabularies that are used to express it (Dong
et al., 2017). While some of the provincial data portals have developed an API to
¹ https://data.europa.eu/euodp/en/data
² http://open.canada.ca
Towards Linked Open Provincial Data in Canada
access data, they do not follow a unified standard. This is a significant shortcoming,
as a unified approach to searching multiple linked provincial datasets would enable
users to analyze and compare topically-related datasets with greater ease. In this
paper, we attempt to employ a unified structure to integrate statistical open data from
multiple Canadian sources in order to promote the domestic and national utilization
of this statistical data. Put differently, this research seeks to answer the key question:
“can we integrate the isolated data islands (provincial open datasets) and provide a
search platform to query the unified and linked statistical datasets?”
To answer this question, we expose the statistical datasets in the provincial
open data portals as Linked Data, and integrate them using the RDF Cube vocabulary.¹
We leveraged this World Wide Web Consortium (W3C) standard vocabulary to expose the current
statistical and multi-dimensional data in the Resource Description Framework (RDF).
The RDF Cube vocabulary is designed to publish multi-dimensional data in a way that
links all related data. This integration makes different open data portals available
through a single search endpoint, and consists of four phases:
a) Data selection: a common statistical dataset is selected from two or more
open data portals;
b) Vocabulary preparation: a new vocabulary is defined or an existing
vocabulary is reused;
c) Defining a data structure: identifying a structure that will allow the
selected datasets to be integrated; and
d) Data Storage: storing the exposed data in a triple store to run the queries.
The statistical datasets in the provincial open data portals are currently available in
raw data formats (e.g., CSV or XLS), and are converted to RDF when they are connected
to datasets in another portal. Thus, RDF links connect common statistical
datasets to each other, as well as to the other datasets in the different provincial data
portals. The process of converting a statistical dataset to RDF is described in
Section 2. Furthermore, datasets can also be connected to external sources and
ontologies. Details on publishing the data and interlinking provincial
data with other vocabularies are presented in Section 3. As a proof of concept, the
proposed approach was implemented using statistical datasets taken from two
Canadian provincial data portals (Nova Scotia and Alberta). The selected datasets,
which were relevant to the same subject, were manually downloaded from the data
portal websites and transformed to RDF after defining a data structure for both.
Section 4 details the process that was used to link the two statistical datasets. Finally,
Section 5 provides a discussion of the lessons learned as a result of this research.
2. Background
¹ https://www.w3.org/TR/vocab-data-cube/
2.1. RDF Cube
Statistical data consist of measures (e.g., birth rate) and dimensions describing the
measures (e.g., province, country, and year). Statistical data are structured as data
cubes and can be modelled as RDF graphs using the RDF Cube vocabulary, which
is the global standard for publishing multi-dimensional data in RDF. As such, the
RDF Cube vocabulary can be used to promote the utilization of statistics, both
domestically and internationally. The RDF Cube vocabulary has been widely
accepted by the Semantic Web community, and it is used to represent a large
proportion of existing linked statistical datasets on the Web (Martin et al., 2015).
One of the reasons for the RDF Cube vocabulary’s popularity is that it allows
publishers to integrate and slice across their datasets. Data publishers can also
leverage the RDF Cube to publish the information model along with the raw data
using common terms for the dimensions and units in their datasets. This vocabulary
is compatible with the Statistical Data and Metadata eXchange XML format
(SDMX) (van Ooijen et al., 2019), which is defined by an initiative established in
2001 to support the exchange of statistical data.
The main element in an open statistical dataset is the observation, which
consists of one or more dimensions, one or more measures, and optional attributes.
The RDF Data Cube Vocabulary uses one subject per observation, with the
dimensions, measures, and attributes being attached to the subject of the observation
in the dataset. An observation is connected to a dataset using an outgoing link.
Figure 1 depicts an observation from an open dataset (a provincial dataset in
Canada), which shows the quantity of an incident (124) in a given area (NS) for a
particular time period (2015).
Figure 1: An example of an observation.
We applied the same concept in order to present statistical data records, with
the RDF Cube serving as the building block for integrating the two datasets used in
our approach.
2.2. Related Works
Kalampokis, Zeginis, et al. (2019) investigated and proposed approaches
to dealing with the modeling challenges associated with creating a linked
statistical open dataset. The present research adopts some of their proposed
guidelines (for example, those related to the naming of dimensions, measures,
and dataset structure) in creating a provincial dataset structure using RDF Cube.
Additionally, Asano et al. (2014) proposed a software template based on
the RDF Cube standard, which could be used to manipulate statistical data (browse
and edit) in RDF; however, this tool was unavailable for use in this research. In
terms of data integration using Linked Data, Zaveri et al. (2013) translated over
50 statistical health-related datasets into a Linked Data format using RDF Data
Cube Vocabulary, which were then interlinked in order to lower the barrier for
data re-use and integration. Similarly, in their work for the Japanese Statistical
Center, Matsuda et al. (2018) were able to publish statistical and governmental
data with around 300 million triples by first publishing statistical datasets as
RDF and linking their vocabularies to existing vocabularies on the Web.
Several other studies, including Bukhari and Baker (2013), Salas et al.
(2012) and Máchová et al. (2018), have focused on reusing open data in national
open data portals. For example, Salas et al. (2012) proposed using an open data
framework to make open data portals more discoverable and intelligible for
potential data reuse purposes in the agriculture domain. However, their research
utilized a tagging and annotating approach and did not consider linking different
open datasets in their framework. Alternately, Máchová et al. (2018) proposed
using a usability evaluation approach to identify deficiencies in the usability of
several individual data portals, including Canada open data. Dong et al. (2017)
also examined this issue within a Canadian context, presenting an overview of
the status and issues associated with the open data provided by seven Canadian
cities.
The present research attempts to integrate the different Canadian statistical
open data portals, which, to the best of our knowledge, has yet to be studied. To
this end, we present an approach that enables us to publish statistical open
datasets, which are currently available in provincial open data portals, and to link
them to each other based on common vocabularies on the Web.
3. Materials and Methods
This section details the context of our study and the methodology we employed
to publish and link provincial open data portals in Canada.
3.1 Provincial Open Datasets in Canada
We performed an exploratory analysis by gathering metadata from existing
provincial open data portals in Canada. Table 1 shows the results of this analysis,
including each portal’s number of open datasets and its current web address. Our
findings revealed that 11 provinces and territories had published approximately
11,771 datasets in different domains ranging from “Business and Economy” to
“Nature and Environment.” Notably, most of the open data portals used different
standards to categorize their datasets, with some not using any categories at all. In
some cases, it was necessary to explore the entire website to find the published open
datasets, which was both tedious and time-consuming. As Table 1 shows, British
Columbia and Alberta published more open data than the other provinces. Many of
these portals presented their data using different formats, including CSV, JSON, and
Excel. Although a few of these data portals allowed users to export data in RDF
format, they did not follow Linked Data vocabulary standards such as the RDF
Cube vocabulary, and they did not link their datasets to those of other provinces.
Table 1: Provincial open data portals in Canada based on province/territory.
Province/Territory         Number of datasets   Website URL
-------------------------  -------------------  --------------------------
British Columbia           2,939                data.gov.bc.ca
Alberta                    2,777                open.alberta.ca
Ontario                    2,656                ontario.ca
Yukon                      1,177                open.yukon.ca
Manitoba                   789                  data.winnipeg.ca
Nova Scotia                575                  data.novascotia.ca
Saskatchewan               354                  www.saskatchewan.ca
Prince Edward Island       202                  data.princeedwardisland.ca
New Brunswick              121                  gnb.socrata.com
Northwest Territories      100                  statsnwt.ca
Newfoundland & Labrador    81                   opendata.gov.nl.ca
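The totals quoted in Section 3.1 can be re-derived from Table 1 with a short tally. The following Python sketch is only an arithmetic check: the dictionary copies the counts from Table 1, and the function and variable names are illustrative.

```python
# Dataset counts copied from Table 1 (province/territory -> number of datasets).
DATASET_COUNTS = {
    "British Columbia": 2939,
    "Alberta": 2777,
    "Ontario": 2656,
    "Yukon": 1177,
    "Manitoba": 789,
    "Nova Scotia": 575,
    "Saskatchewan": 354,
    "Prince Edward Island": 202,
    "New Brunswick": 121,
    "Northwest Territories": 100,
    "Newfoundland & Labrador": 81,
}


def summarize(counts):
    """Return (number of portals, total datasets, portals ranked by size)."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return len(counts), sum(counts.values()), ranked


portals, total, ranked = summarize(DATASET_COUNTS)
print(portals, total, ranked[:2])
```

The tally gives 11 portals and 11,771 datasets, with British Columbia and Alberta ranked first and second.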
3.2 Methodology
We propose a three-layer architecture to integrate all of the provincial open data
portals in Canada, wherein open datasets are exposed as RDF and linked to the
other datasets based on a common vocabulary and ontology. As depicted in
Figure 2, different statistical open datasets are extracted from different data
portals and converted to an RDF format based on a common predefined
vocabulary.
Figure 2: Architecture of the linked provincial datasets.
Since most of the open data portals do not provide an API to retrieve data, it
is necessary to manually download the statistical datasets directly from
those websites. Based on the subject, a vocabulary is used to transform a
statistical dataset from its raw format (which can be CSV or XLS) to the RDF
format. In this process, data can also be linked to external sources or
vocabularies. For example, if there is a disease in a dataset, it can be linked to
the disease ontology on the Web. Once the integration process has been
completed, the integrated data is stored in a data store (known as a “triple
store”), and a query service is provided as a single search point to retrieve query
results (see Figure 2).
To map the statistical data to a graph database, we leveraged the RDF Data
Cube Vocabulary discussed in Section 2. The core concept of the Data Cube
Vocabulary is the observation class (qb:Observation), which is used to mark each
statistical observation as part of a data cube. Every observation must
follow a specific structure that is defined using the class,
qb:DataStructureDefinition (DSD), and referenced by a dataset resource (DS) of
type qb:DataSet. Since every observation should refer to one specific dataset
(which again refers to the corresponding DSD), the structure of the observation
is fully specified. DSD components are defined as a set of dimensions
(qb:DimensionProperty), attributes (qb:AttributeProperty), and measures
(qb:MeasureProperty) to encode the semantics of the observations. These
component properties are also used to link the corresponding elements of
dimensions, measure values, and units with the respective observational
resource. Furthermore, it is possible to define groups and slices of observations,
as well as hierarchical orders of dimension, using respective concepts.
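The constraint that every observation must follow the structure declared in its DSD can be illustrated with a small conformance check. The sketch below is a plain-Python stand-in for the Data Cube classes, not the implementation used in this work; the class, the method names, and the component names (causeOfDeath, refPeriod, quantity, refArea, gender) are assumptions chosen for illustration, and the structure is simplified to a single measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Plain-Python stand-in for a qb:DataStructureDefinition (DSD)."""
    dimensions: frozenset
    measures: frozenset
    attributes: frozenset = frozenset()

    def missing(self, observation):
        """Required dimensions and measures absent from the observation."""
        return sorted((self.dimensions | self.measures) - set(observation))

    def undeclared(self, observation):
        """Observation properties that the DSD does not declare."""
        declared = self.dimensions | self.measures | self.attributes
        return sorted(set(observation) - declared)

    def conforms(self, observation):
        """True when nothing is missing and nothing is undeclared."""
        return not self.missing(observation) and not self.undeclared(observation)


# Hypothetical component names, loosely modelled on a cause-of-death cube.
DSD = DataStructureDefinition(
    dimensions=frozenset({"causeOfDeath", "refPeriod"}),
    measures=frozenset({"quantity"}),
    attributes=frozenset({"refArea"}),
)

complete = {"causeOfDeath": "Neoplasms", "refPeriod": 2015, "quantity": 2699}
incomplete = {"causeOfDeath": "Neoplasms", "quantity": 2699, "gender": "F"}

print(DSD.conforms(complete))
print(DSD.missing(incomplete))
print(DSD.undeclared(incomplete))
```

Treating optional attributes separately from required dimensions and measures mirrors the distinction the vocabulary draws between the three component types.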
According to what we have described in this section, the integration of open
statistical datasets will be achieved via the following steps (see Figure 3):
• Data selection: In this phase, a statistical dataset common to two or
more provincial open data portals is selected. Since the goal of
this integration is to provide a single-point search mechanism, it is
especially beneficial to identify datasets that are common to several
portals. Regardless, any dataset in an open data portal can be imported
into the data store.
• Defining the dataset and data structure (DSD): The structure of an
open dataset should be defined during the integration process. Datasets
with the same subject but different structures can be unified using the Data
Cube standard.
• Vocabulary preparation: The terms needed to express the target
data as RDF are prepared. When a standard vocabulary exists, we use it; when a
standard vocabulary does not exist, we define a new one.
• Conversion of observations: After defining the structure of each
observation, we convert them to the RDF format. Once each dataset
has been converted, it is imported into a triple store for further analysis
and querying.
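The conversion step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the conversion program written for this study. The CSV column names (cause, year, quantity), the base URI, and the dataset URI pattern are assumptions; the prefixed names follow the vocabulary terms used in this paper, and the prefix declarations are omitted, so the output is a Turtle fragment rather than a complete document.

```python
import csv
import io

BASE_URI = "http://www.example.org"  # placeholder base URI (assumption)


def turtle_string(text):
    """Quote and escape a Python string as a Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def row_to_observation(row, province_code, dataset_name, index):
    """Render one CSV row as a qb:Observation in Turtle syntax."""
    dataset = f"<{BASE_URI}/{province_code}/{dataset_name}>"
    subject = f"<{BASE_URI}/{province_code}/{dataset_name}/obs-{index:02d}>"
    quantity = int(row["quantity"].replace(",", ""))  # "2,699" -> 2699
    return "\n".join([
        f"{subject} a qb:Observation ;",
        f"    qb:dataSet {dataset} ;",
        f"    od:causeOfDeath {turtle_string(row['cause'])} ;",
        f"    sdmx-dimension:refPeriod {int(row['year'])} ;",
        f"    od-measure:quantity {quantity} .",
    ])


def convert(csv_text, province_code, dataset_name):
    """Convert every row of a CSV document into a Turtle observation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row_to_observation(row, province_code, dataset_name, index)
        for index, row in enumerate(reader, start=1)
    ]


# Two illustrative rows in the assumed column layout.
SAMPLE_CSV = (
    "cause,year,quantity\n"
    'Neoplasms,2015,"2,699"\n'
    'Diseases of the Circulatory System,2015,"2,317"\n'
)

observations = convert(SAMPLE_CSV, "NS", "Cause_of_Death")
print("\n\n".join(observations))
```

In practice, an RDF library would handle escaping, datatypes, and prefix management; the string templates here only make the row-to-observation mapping visible.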
Figure 3: Open statistical dataset integration process.
4. Implementation and Discussion
To implement the proposed approach, we present a scenario wherein one
statistical dataset common to two provincial open data platforms is selected. In
this scenario, we selected the “cause of death” dataset from the Nova Scotia and
Alberta data portals. Table 2 shows a set of records for this dataset in each
province.
Table 2: Cause of death dataset in Nova Scotia and Alberta.
Cause of death                                          Year   Quantity   Dataset area
------------------------------------------------------  -----  ---------  ------------
Neoplasms                                               2015   2,699      Nova Scotia
Diseases of the Circulatory System                      2015   2,317      Nova Scotia
Mental and Behavioral Disorders                         2014   591        Nova Scotia
Diseases of the Nervous System                          2014   477        Nova Scotia
Malignant neoplasms of colon                            2016   477        Alberta
Mental and behavioral disorders due to use of alcohol   2015   193        Alberta
Alzheimer's disease                                     2014   299        Alberta
All other diseases of nervous system                    2014   253        Alberta
As shown in Table 2, the datasets for both provinces consist of three
columns: two dimensions (year and cause of death) and one measure (quantity,
i.e., the number of deaths). For example,
the first row of Table 2 shows 2,699 deaths from neoplasms in Nova Scotia
in 2015. It is possible that each open data portal contains other datasets with
additional dimensions or measures (e.g., gender or ranking). We downloaded the datasets in
CSV format directly from the provincial open data portals, but the Nova Scotia
dataset is also accessible in RDF format via their Socrata API.¹ However, the
RDF does not follow the W3C statistical standard for publishing data in the
proper format. Therefore, as noted above, we unified the two datasets by
transforming the data into RDF based on the RDF Cube vocabulary standard.
Prior to this transformation, we conducted a data cleaning step wherein we
assigned additional vocabularies to each dataset.
Since many of the causes of death in both datasets were related to diseases,
we linked each disease to the disease ontology,² which is an open-source
ontology organized around inheritable diseases, environmental factors leading
to disease, and the infectious origins of diseases. To find the corresponding
ontological term for each disease in the downloaded dataset, we used the disease
ontology lookup service. This type of vocabulary linking can be used both to
categorize the diseases in each dataset, and to connect the common terms in both
datasets using a common URI (Uniform Resource Identifier) (see Table 3). The
information for each ontological term was retrieved by manually searching for
the cause of death in the disease ontology and assigning the relevant information to
each disease in the open dataset. For example, the disease ontology website lists
campylobacteriosis as a kind of gastrointestinal disease; as such, we linked its
URI to the corresponding record in the dataset.
¹ https://dev.socrata.com/
² http://disease-ontology.org/
Table 3: Connecting diseases to the disease ontology.
Disease in open data portal          Disease ontology URI                            Disease ontology term
-----------------------------------  ----------------------------------------------  ---------------------
Neoplasms                            http://purl.obolibrary.org/obo/HP_0002664       Neoplasm
Diseases of the Circulatory System   http://purl.obolibrary.org/obo/UBERON_0015228   Circulatory Organ
Mental and Behavioral Disorders      http://purl.obolibrary.org/obo/DOID_10652       Alzheimer's disease
Diseases of the Nervous System       http://purl.obolibrary.org/obo/UBERON_0001016   Nervous system
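Once such mappings have been established manually, reapplying them to further records can be automated with a simple lookup. The sketch below is illustrative only: the dictionary copies the four rows of Table 3, while the normalization rule and the function names are assumptions. Labels without a mapping return None, which signals that manual review is still required.

```python
# Mappings copied from Table 3: portal label -> (ontology URI, ontology term).
ONTOLOGY_TERMS = {
    "neoplasms": (
        "http://purl.obolibrary.org/obo/HP_0002664", "Neoplasm"),
    "diseases of the circulatory system": (
        "http://purl.obolibrary.org/obo/UBERON_0015228", "Circulatory Organ"),
    "mental and behavioral disorders": (
        "http://purl.obolibrary.org/obo/DOID_10652", "Alzheimer's disease"),
    "diseases of the nervous system": (
        "http://purl.obolibrary.org/obo/UBERON_0001016", "Nervous system"),
}


def normalize(label):
    """Lower-case a label and collapse whitespace so spelling variants match."""
    return " ".join(label.lower().split())


def lookup(label):
    """Return (URI, term) for a portal label, or None if no mapping is known."""
    return ONTOLOGY_TERMS.get(normalize(label))


print(lookup("  Diseases of the  Nervous System "))
print(lookup("Malignant neoplasms of colon"))
```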
After linking the diseases to their corresponding disease ontology terms, we
exposed all of the necessary items of an observation (measures, dimensions, and
attributes) using the RDF Cube vocabulary. Beyond reusing existing
vocabularies (e.g., Dublin Core) to present the attributes of an observation, we
defined a new vocabulary with the aim of making it compatible with other
vocabularies. We also used certain aspects of the Statistical Data and Metadata
eXchange (SDMX) vocabulary, which is associated with statistical regions.
URIs were defined based on unique identifiers for each item in the dataset using
the following naming convention:
Base_URI/province_code/dataset_name/observation_id
This URI is then used to find an observation in the entire system. The base
URI is a web address, which is a common element in all datasets and
observations. Each province is assigned a code (e.g., NS for Nova Scotia), while
the observation identifier (observation_id) serves as a unique identifier for each
individual observation in a dataset. For example, the following URI could be
used as an identifier for a record from Table 2:
http://www.example.org/NS/Cause_of_Death/obs-01
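The naming convention can be expressed as a small helper function. The sketch below is an assumption-laden illustration: the function name, the replacement of spaces with underscores, and the zero-padded "obs-NN" identifier format are inferred from the single example URI above rather than taken from the actual implementation.

```python
from urllib.parse import quote


def observation_uri(base_uri, province_code, dataset_name, observation_number):
    """Build Base_URI/province_code/dataset_name/observation_id."""
    observation_id = f"obs-{observation_number:02d}"  # assumed identifier format
    parts = [province_code, dataset_name.replace(" ", "_"), observation_id]
    # Percent-encode each path segment so that unusual names stay valid in a URI.
    return base_uri.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


uri = observation_uri("http://www.example.org/", "NS", "Cause of Death", 1)
print(uri)
```

For the inputs shown, the helper reproduces the example URI given above.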
To define a structure for the “cause of death” dataset, we create a
“DataStructure” scheme, which is shown below. Each structure scheme contains
the following attributes: Data Cube dimensions, measures, description, and area
of observation, in this case, the province.
od:causeOfDeath-structure a qb:DataStructureDefinition ;
    rdfs:comment "cause of death structure"@en ;
    qb:component
        # Dimensions: cause of death and reference period (year)
        [ qb:dimension od:causeOfDeath ] ,
        [ qb:dimension sdmx-dimension:refPeriod ] ,
        # Measures: number of deaths and rank
        [ qb:measure od-measure:quantity ] ,
        [ qb:measure od-measure:rank ] ,
        # Area of observation (the province), attached at the dataset level
        [ qb:attribute sdmx-dimension:refArea ;
          qb:componentRequired "true"^^xsd:boolean ;
          qb:componentAttachment qb:DataSet ] .
After designing the dataset structure, the RDF Cube vocabulary is used to
define metadata for each statistical dataset. The following dataset attributes are
considered in the metadata: title, unique address of dataset (URI), information
about the data publisher, area of dataset (e.g., Nova Scotia), published date,
dataset subject, and the web address from which the dataset is derived. The
following example shows a dataset definition in RDF Cube vocabulary derived
from the province of Alberta's "cause of death" dataset:
od:dataset-causeOfDeath-AB a qb:DataSet ;
    # Link the dataset to the structure definition shown above
    qb:structure od:causeOfDeath-structure ;
    dct:creator "Alberta Open Government" ;
    dct:title "Causes of death in Alberta"@en ;
    dct:issued "2016-04-11"^^xsd:date ;
    dct:publisher "Open Government–Alberta"@en ;
    dct:subject sdmx-subject:1.4 , od:CauseOfDeath ;
    # Source portal, given as an IRI rather than a string literal
    prov:wasDerivedFrom <https://open.alberta.ca/> ;
    sdmx-dimension:refArea od:Alberta .
Note that the subject created for the above dataset (od:CauseOfDeath) can be
used in different areas and with other data portals. We also linked the dataset to
the SDMX vocabulary. With the dataset obtained, each observation and its
related information are also described in the RDF format. The following snippet
shows a single observation in the Nova Scotia dataset, which records 1,330
deaths from “Acute myocardial infarction” in 2001. As can also be seen, this
observation is linked to the “cardiovascular system disease” category of the
disease ontology by the following URL:
http://purl.obolibrary.org/obo/DOID_1287.
od:obs-ab-1 a qb:Observation ;
    # Every observation points to the dataset it belongs to
    qb:dataSet od:dataset-causeOfDeath-AB ;
    od:category "cardiovascular system disease" ;
    od:causeOfDeath "Acute myocardial infarction" ;
    sdmx-dimension:refPeriod 2001 ;
    od-measure:quantity 1330 ;
    # Link to the disease ontology category, given as an IRI
    skos:broader <http://purl.obolibrary.org/obo/DOID_1287> .
This approach can be easily extended to the other provincial open data portals
as well as other types of datasets, such as censuses. Using the above-mentioned
components (i.e., the dataset and its observations), we generated an RDF dataset
that included all of the observations downloaded from both provincial data
portals. We wrote a Python program to convert each provincial open dataset to
RDF using the Python library RDFLib.¹ This library was used for three purposes: 1) to create
the main structure of the datasets and their corresponding observations; 2) to
create the structure of each dataset; and 3) to generate the observations. We then
created a semantic graph that included all of the observations from both
provincial datasets and loaded it into a data store (triple store) for further analysis
and querying. Specifically, we used Apache Jena Fuseki² as the triple store and
wrote two SPARQL queries to compare the two datasets. The scripts of the
SPARQL queries are provided in the Appendix. To construct the queries, we
designed the questions such that the two isolated data portals could be seen as a
single endpoint. In the first query, we asked: “What was the most prevalent cause
of death in both provinces in 2015?” As Figure 4 illustrates, “chlamydia” in
Nova Scotia and “chronic heart diseases” in Alberta accounted for the most
deaths in the datasets.
Figure 4. Query results for the first query
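The logic of the first question can be illustrated in plain Python. The sketch below is not the SPARQL query from the Appendix; it applies the same "largest quantity per province for a given year" selection to the eight sample rows of Table 2 only. Because the full datasets contain many more records, its output is not expected to match Figure 4. The tuple layout and function name are assumptions.

```python
from collections import namedtuple

Observation = namedtuple("Observation", "area cause year quantity")

# Sample rows copied from Table 2 only; the full datasets are much larger.
OBSERVATIONS = [
    Observation("Nova Scotia", "Neoplasms", 2015, 2699),
    Observation("Nova Scotia", "Diseases of the Circulatory System", 2015, 2317),
    Observation("Nova Scotia", "Mental and Behavioral Disorders", 2014, 591),
    Observation("Nova Scotia", "Diseases of the Nervous System", 2014, 477),
    Observation("Alberta", "Malignant neoplasms of colon", 2016, 477),
    Observation("Alberta",
                "Mental and behavioral disorders due to use of alcohol", 2015, 193),
    Observation("Alberta", "Alzheimer's disease", 2014, 299),
    Observation("Alberta", "All other diseases of nervous system", 2014, 253),
]


def most_prevalent_cause(observations, year):
    """Return {area: (cause, quantity)} with the largest quantity per area."""
    best = {}
    for obs in observations:
        if obs.year != year:
            continue
        current = best.get(obs.area)
        if current is None or obs.quantity > current[1]:
            best[obs.area] = (obs.cause, obs.quantity)
    return best


result = most_prevalent_cause(OBSERVATIONS, 2015)
for area, (cause, quantity) in sorted(result.items()):
    print(area, cause, quantity)
```

Because both provinces are held in one collection, a single function call answers the question for all areas at once, which is the effect the single search endpoint is intended to achieve.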
¹ https://rdflib.readthedocs.io/en/stable/
² https://jena.apache.org/documentation/fuseki2/
In the second query, we asked: “What were the common causes of death in both
provinces in 2014?” According to the results, “gastrointestinal system disease” was
a common disease category in both datasets (see Figure 5), with 189 deaths due to
“alcoholic liver disease” in Alberta and 609 deaths due to “clostridium difficile”
in Nova Scotia.
To verify our results, we explored the CSV files that had been downloaded
from the open data portals and checked the number of cases manually for these
diseases. All of the numbers matched up.
Figure 5: Results for the second query.
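The second question amounts to intersecting the disease categories reported by each province for a given year. The following plain-Python sketch illustrates that idea and is not the SPARQL query from the Appendix. The first two rows use the figures reported above; the third row takes its cause, year, and quantity from Table 2, but its category label is a hypothetical value added only to show a category that is not shared.

```python
# Rows: (area, category, year, cause, quantity).
ROWS = [
    ("Alberta", "gastrointestinal system disease", 2014,
     "alcoholic liver disease", 189),
    ("Nova Scotia", "gastrointestinal system disease", 2014,
     "clostridium difficile", 609),
    # Category label below is an assumption made for illustration.
    ("Alberta", "nervous system disease", 2014, "Alzheimer's disease", 299),
]


def common_categories(rows, year):
    """Return the categories reported by every area in the given year."""
    by_area = {}
    for area, category, row_year, _cause, _quantity in rows:
        if row_year == year:
            by_area.setdefault(area, set()).add(category)
    return set.intersection(*by_area.values()) if by_area else set()


shared = common_categories(ROWS, 2014)
print(shared)
```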
5. Conclusion
In this paper, we proposed a semantic web approach to integrating open statistical
data using the RDF Cube vocabulary. As a proof of concept, we implemented our
approach using two provincial data portals in Canada. Although we were able to
successfully transform one statistical dataset common to both provinces to the
Linked Data format, we encountered a few problems in the designed process,
such as cumbersome conversion and a lack of common categories and
vocabularies.
One of the key lessons we learned from exposing open data as RDF relates
to the integration phase, namely, that the observation descriptions did not follow a
universal standard. Many open data portals use free text to describe the
observations in statistical datasets without assigning them a vocabulary. This lack
of a universal standard creates the need to manually review the datasets to
unify the concepts and observations. Another lesson learned relates to the
structure of the datasets. Specifically, datasets with the same subject, but in different
data portals, may have different dimensions. For example, one dataset may contain
the quantity of observations, while the other may contain quantity and ranking.
Having different dimensions and measures makes the integration process
complicated, as some dimensions will not be used in the final integrated search
mechanism.
Linking observation items to the external vocabularies (e.g., the disease ontology)
was another challenge that was encountered in the interlinking phase. However,
this issue might be addressed by developing a software program to retrieve the
matched item from the external vocabulary.
We will continue to work towards linking more datasets from diverse open
provincial data portals. To this end, designing an ontology that is capable of
covering all of the open statistical data will be a key next step in this research
programme.
Acknowledgments
The work presented in this paper was partly funded by an NSERC (Natural Sciences
and Engineering Research Council) Discovery Grant (RGPIN-2020-05869):
Semantic Web Analysis over the Nova Scotia Open Data.
References
Asano, Y., Iwayama, M., Takeda, H., Koide, S., Kato, F., & Kobayashi, I.
(2014). A Template for Handling Statistical Data in RDF. Second
International Workshop on Semantic Statistics (SemStats2014).
Bukhari, S. A. C., & Baker, C. (2013). The Canadian health census as Linked
Open Data: Towards policy making in public health.
Caracciolo, C., Stellato, A., Rajbhandari, S., Morshed, A., Johannsen, G.,
Jaques, Y., & Keizer, J. (2012). Thesaurus maintenance,
alignment, and publication as linked data: The AGROVOC use case.
International Journal of Metadata, Semantics and Ontologies, 7(1), 65–75.
https://doi.org/10.1504/IJMSO.2012.048511
Dawes, S. S., Vidiasova, L., & Parkhimovich, O. (2016). Planning and
designing open government data programs: An ecosystem approach.
Government Information Quarterly, 33(1), 15–27.
https://doi.org/10.1016/j.giq.2016.01.003
Dong, H., Singh, G., Attri, A., & El Saddik, A. (2017). Open data-set of seven
Canadian Cities. IEEE Access, 5, 529–543.
https://doi.org/10.1109/ACCESS.2016.2645658
European Commission. (2017). Re-using open data.
Gür, N., Díaz, L., & Kauppinen, T. (2012). GI Systems for Public Health with
an Ontology Based Approach.
Jetzek, T., Avital, M., & Bjorn-Andersen, N. (2019). The Sustainable Value of
Open Government Data. Journal of the Association for Information
Systems, 702–734. https://doi.org/10.17705/1jais.00549
Kalampokis, E., Karamanou, A., & Tarabanis, K. (2019). Interoperability
Conflicts in Linked Open Statistical Data. Information, 10(8), 249.
https://doi.org/10.3390/info10080249
Kalampokis, E., Zeginis, D., & Tarabanis, K. (2019). On modeling linked open
statistical data. Journal of Web Semantics, 55, 56–68.
https://doi.org/10.1016/j.websem.2018.11.002
Kämpgen, B., & Harth, A. (2011). Transforming statistical linked data for use
in OLAP systems. ACM International Conference Proceeding Series,
33–40. https://doi.org/10.1145/2063518.2063523
Máchová, R., Hub, M., & Lnenicka, M. (2018). Usability evaluation of open
data portals: Evaluating data discoverability, accessibility, and
reusability from a stakeholders’ perspective. Aslib Journal of
Information Management, 70(3), 252–268.
https://doi.org/10.1108/AJIM-02-2018-0026
Martin, M., Abicht, K., Stadler, C., Auer, S., Ngomo, A. C. N., & Soru, T.
(2015). CubeViz—Exploration and Visualization of Statistical Linked
Data. WWW 2015 Companion - Proceedings of the 24th International
Conference on World Wide Web, 219–222.
https://doi.org/10.1145/2740908.2742848
Matsuda, J., Mizutani, A., Asano, Y., Yamamoto, D., Takeda, H., Ohmukai, I.,
Kato, F., Koide, S., Harada, H., & Nishimura, S. (2018). Publication of
statistical linked open data in Japan. Lecture Notes in Computer Science
(Including Subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), 11341 LNCS, 307–319.
https://doi.org/10.1007/978-3-030-04284-4_21
Oren, E., Delbru, R., Catasta, M., Cyganiak, R., Stenzhorn, H., & Tummarello,
G. (2008). Sindice.com: A document-oriented lookup index for open
linked data. International Journal of Metadata, Semantics and
Ontologies, 3(1), 37–52. https://doi.org/10.1504/IJMSO.2008.021204
Salas, P. E. R., Martin, M., Mota, F. M. Da, Auer, S., Breitman, K., &
Casanova, M. A. (2012). Publishing statistical data on the web.
Proceedings - IEEE 6th International Conference on Semantic
Computing, ICSC 2012, 285–292.
https://doi.org/10.1109/ICSC.2012.16
van Ooijen, C., Ubaldi, B., & Welby, B. (2019). A data-driven public sector.
33. https://doi.org/10.1787/09ab162c-en
Wallis, J. C., Rolando, E., & Borgman, C. L. (2013). If We Share Data, Will
Anyone Use Them? Data Sharing and Reuse in the Long Tail of
Science and Technology. PLoS ONE, 8(7), e67332.
https://doi.org/10.1371/journal.pone.0067332
Zaveri, A., Lehmann, J., Auer, S., Hassan, M. M., Sherif, M. A., & Martin, M.
(2013). Publishing and interlinking the global health observatory
dataset. Semantic Web, 4(3), 315–322. https://doi.org/10.3233/SW-130102