Cartography and GIS
MAKING MORE OGC SERVICES AVAILABLE ON THE WEB
DISCOVERABLE FOR THE SDI COMMUNITY
Dr. Tomáš Kliment 1, Assoc. Prof. Dr. Vlado Cetl 1, Dr. Marcel Kliment 2, Dr. Martin Tuchyňa 3
1 University of Zagreb, Croatia
2 Slovak University of Agriculture, Slovakia
3 Slovak Environment Agency, Slovakia
ABSTRACT
The work presented in this paper describes preliminary results of the main activity performed in the research project Bolegweb, which is connected to activities done by the SmartOpenData project. The Bolegweb project aims at the development of a geospatial meta-search crawler to collect online accessible geospatial information (GI) resources and harvest geospatial metadata. Deployment of graphical user and application programming interfaces providing access to the collected resources will provide a gateway to GI resources for users and applications at all levels of the Web with global coverage. More than 15 thousand OGC services were collected within the last period of 17 months (October 2013 – April 2015), and metadata for both the services and their resources are catalogued.
Keywords: SDI, OGC services, Geospatial Web, Metadata, Discovery
INTRODUCTION
Nowadays, Spatial Data Infrastructures (SDIs) deployed at the global, regional, national or local level provide a gateway to vast amounts of Geospatial Information (GI) resources available on the Web. SDIs are therefore no longer only a sharing platform for traditional geospatial data publishers serving a particular market of professionals; they can also be used as a principal source of GI for Web GIS, the Semantic Web and Linked Data, as well as for mobile market applications. Publishing geospatial data using standardized web services connected to applications deployed in the geospatial web layer has become a relatively easy task. However, if an application developer searches for those services using available SDI catalogues [1,2], he/she may not find everything available out there on the web. The reason is that not all deployed OGC services have descriptive information (metadata) available online through an SDI catalogue published as a Catalogue Service for the Web (OGC CSW). Therefore, such services might never be found within any current SDI platform, even when accessing global (e.g. GEOSS) or community (e.g. INSPIRE) infrastructures.
15th International SGEM GeoConference
The present work
describes preliminary results of the main activity performed in the research project Bolegweb [3], which is directly connected to activities performed within the SmartOpenData [4] project. The Bolegweb project aims at the development of a geospatial meta-search crawler to collect online accessible geospatial information (GI) resources and harvest geospatial metadata. Deployment of graphical user and application programming interfaces providing access to the collected resources will provide a gateway to GI resources for users and applications at all levels of the Web with global coverage. A methodology for discovering OGC resources via the mainstream web search engine (SE) provided by Google Inc. is described. The results of a quantitative analysis of the seven types of web service interfaces defined by open standards of the Open Geospatial Consortium (OGC) are introduced and discussed.
METHODOLOGY
A simple workflow was designed and assembled from the following steps:
(i) a multipurpose scraper gathers OGC service endpoints from the Google search engine and imports the GetCapabilities URLs into the crawler database (import);
(ii) the crawler's verification script checks the availability of the collected services and extracts the service type and version (verify);
(iii) a harvesting script collects the metadata for the services and the resources they operate on into an SDI catalogue (harvest) [5].
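The verify step above can be sketched as follows. This is an illustrative assumption, not the project's actual code (which is implemented in PHP): the function name and the tag-to-service mapping are the author's own, shown in Python for brevity.

```python
# Hypothetical sketch of the "verify" step: classify a fetched
# GetCapabilities response by OGC service type and version.
import xml.etree.ElementTree as ET

def classify_capabilities(xml_text):
    """Return (service_type, version) for a capabilities document,
    or None if it cannot be recognised."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    tag = root.tag.split('}')[-1]              # drop any XML namespace prefix
    version = root.attrib.get('version', 'unknown')
    if tag == 'WMT_MS_Capabilities':           # legacy WMS 1.1.x root element
        return ('WMS', version)
    if tag.endswith('_Capabilities'):          # WMS_Capabilities, WFS_Capabilities, ...
        return (tag[:-len('_Capabilities')], version)
    if tag == 'Capabilities':                  # OWS Common root (e.g. WCS 2.x, WPS)
        return ('OWS', version)
    return None
```

In the real crawler the XML would first be fetched from each candidate GetCapabilities URL; endpoints that fail to download or parse would be classified as non-active.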
The system architecture depicted in Figure 1 consists of three main groups of components. The OGC crawler is built upon three main procedures: import, verify and harvest, implemented as PHP scripts. The SDI catalogue is built on top of individual clusters deployed for each OGC service type, implemented with the GeoNetwork opensource catalogue software [6]. The Data Portal offers an easy-to-use way to discover the OGC resources collected by the crawler, implemented using CKAN, the world's leading open-source data portal platform [7]. Besides the main groups, the input data are collected by the Web collection engine OutWit Hub, using an extraction workflow developed for automated scraping of OGC service GetCapabilities URLs from the Google SE result list [8]. The input data collected by the scraper are stored in the database (ogcwxs) and verified, and for each active service a harvesting profile is created and executed in the catalogue instance corresponding to the service type. The catalogues are harvested by CKAN through the CSW API using its geospatial and harvesting capabilities. The data from each component are accessible through both graphical user and application programming interfaces (API). OGC resources available on the web have been collected through these interfaces since October 2013.
Figure 1 Bolegweb project system architecture [source: own processing]
RESULTS
The meta-search scraping and crawling processes were launched in October 2013. The scraping process is currently executed manually, at least once per month, while crawling runs continuously. The availability of OGC services is represented in Figures 2 and 3.
Figure 2 Total number of services per period (Oct–Dec 2013, full 2014, Jan–Apr 2015) for each service type (WMS, WFS, WCS, WPS, SOS, WMTS, CSW) [Source: own processing].
Figure 3 Monthly average number of services found in the periods Oct–Dec 2013, full 2014 and Jan–Apr 2015, for each service type (WMS, WFS, WCS, WPS, SOS, WMTS, CSW) [Source: own processing].
Figure 2 represents the total number of OGC service endpoint URL candidates collected between October 2013 and April 2015 (15,036 URLs), divided into the seven OGC service types. Figure 3 displays the monthly average calculated for each period; this makes the yearly trends comparable across October 2013 – April 2015, since only 2014 was a full year. Figure 3 shows a decreasing trend for metadata services (CSW), data portrayal services (WMS) and spatial feature download services (WFS), and an increasing trend for coverage and observation data download services (WCS and SOS). Web processing services (WPS) represent the smallest group in number, but with balanced values between 2014 and the first third of 2015. The results described above represent candidate OGC WxS GetCapabilities URLs collected by the meta-search scraper. Not all collected URLs point to an active service or a valid endpoint; non-active service endpoints (or errors) are also collected by the system, which can recognize and classify them. Figure 4 represents the ratio between active and non-active (or non-valid) URLs.
Taking the total number collected so far, the share of errors (non-active or non-valid endpoints) is significant (~38%, Figure 4), even though the queries sent to the Google SE were defined precisely using the advanced search operator inurl for the GetCapabilities parameters service and request, with the values (e.g. WMS, GetCapabilities) defined individually. However, when divided by year of crawling, the 2015 period shows an improvement: the share of active services reaches almost 88%, which is a positive outcome.
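The queries described above could be assembled along the following lines. The exact query strings used by the project are not given in the paper, so this is a sketch under that assumption:

```python
# Illustrative construction of Google advanced-search queries using the
# "inurl" operator for the GetCapabilities parameters; one query per
# OGC service type. The precise strings used by the project may differ.
SERVICE_TYPES = ['WMS', 'WFS', 'WCS', 'WPS', 'SOS', 'WMTS', 'CSW']

def build_query(service_type):
    """Match URLs containing both service=<type> and request=GetCapabilities."""
    return f'inurl:"service={service_type}" inurl:"request=GetCapabilities"'

queries = [build_query(s) for s in SERVICE_TYPES]
```

Each query would be submitted to the SE and the result list scraped for candidate endpoint URLs, which explains why some candidates turn out to be non-active or non-valid.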
Figure 4 Number of active and non-active services, in total and per year (2013, 2014, 2015).
Geographic distribution was identified by extracting the location of the servers providing access to the discovered services. Figure 5 displays the countries with more than 200 OGC services available. Eleven of the 107 countries identified cover almost 80% of all URLs discovered on the Google SE.
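The first step of this geolocation analysis can be sketched as follows; the paper does not detail the method, so the helper names and sample URLs here are purely illustrative. A GeoIP lookup (not shown) would then map each host's IP address to a country:

```python
# Illustrative sketch: extract the server host name from each collected
# endpoint URL and count services per host; a GeoIP database lookup
# would follow to aggregate hosts into countries.
from urllib.parse import urlparse
from collections import Counter

def host_of(endpoint_url):
    """Return the server host name of an endpoint URL."""
    return urlparse(endpoint_url).hostname

urls = [  # sample candidate endpoints, not real project data
    'http://example.org/geoserver/wms?service=WMS&request=GetCapabilities',
    'http://example.org/geoserver/wfs?service=WFS&request=GetCapabilities',
    'http://maps.example.com/cgi-bin/mapserv?service=WMS&request=GetCapabilities',
]
hosts = Counter(host_of(u) for u in urls)   # services per server host
```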
Figure 5 Number of services per country (countries with > 200 services) [Source: own processing]: United States 7,166; Germany 1,157; Spain 746; Australia 584; Puerto Rico 459; United Kingdom 414; France 413; Italy 400; no server located 262; Switzerland 261; Netherlands 226; Canada 201.
Figure 6 Number of countries grouped by range of services per country [Source: own processing]: 1–50 services: 79 countries (73%); 51–100: 6 (5%); 101–150: 7 (6%); 151–200: 4 (4%); 200–300: 4 (4%); 400–750: 6 (6%); 1,157: 1 (1%); 7,166: 1 (1%).
On the other hand, the majority of the discovered server locations (~73%) offer up to 50 services per country (Figure 6), geographically distributed across almost the whole globe: Europe, North and South America, Asia and Australia (Figure 7).
Figure 7 Map of server locations discovered by the crawler, grouped by range of services per country [Source: own processing].
All OGC services are available via a simple HTML and JavaScript GUI offering a tabular view of the database populated by the meta-search scraper and OGC crawler [9]. Simple filtering is provided for the data in each column; an example is reported in Figure 8, which shows the endpoints of OGC SOS services of version 2.0.0 located in Italy.
Figure 8 HTML List of OGC Services: Example of filtering for OGC SOS of version 2.0.0 located in
Italy [Source: own processing].
A simple double click on a selected service opens the capabilities XML document in a new browser tab. Each OGC service that proved to be an active endpoint was prepared for a metadata harvesting task, configured and executed in the corresponding node of the CSW catalogue. The harvesting configuration separates metadata for services from metadata for the resources they operate on: for instance, the CSW catalogue of WMS services publishes metadata for both services and layers, and likewise for WFS services and features, WCS services and coverages, and SOS services and observations. Each CSW catalogue offers its standard GUI as well as OGC CSW API virtual endpoints deployed for each service and related resource type.
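A client could query those virtual CSW endpoints with a standard GetRecords request. The following is a minimal CSW 2.0.2 request body of the kind that could be POSTed to such an endpoint; the filter keyword and parameter values are illustrative only, not taken from the project:

```python
# Minimal CSW 2.0.2 GetRecords POST body with a full-text (AnyText) filter.
# The keyword and maxRecords values are placeholders for illustration.
GETRECORDS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                xmlns:ogc="http://www.opengis.net/ogc"
                service="CSW" version="2.0.2"
                resultType="results" maxRecords="{max_records}">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
          <ogc:PropertyName>AnyText</ogc:PropertyName>
          <ogc:Literal>%{keyword}%</ogc:Literal>
        </ogc:PropertyIsLike>
      </ogc:Filter>
    </ogc:Constraint>
  </csw:Query>
</csw:GetRecords>"""

def get_records_request(keyword, max_records=10):
    """Fill the template with a search keyword and a record limit."""
    return GETRECORDS_TEMPLATE.format(keyword=keyword, max_records=max_records)
```

Posting such a body to a catalogue's CSW endpoint would return brief metadata records whose text matches the keyword, which is how the harvested services and resources become discoverable to external clients.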
CONCLUSION
The work presented describes the main activity performed in the research project Bolegweb, which is connected to activities performed by the SmartOpenData project. Deploying graphical user and application programming interfaces allowed the creation of a gateway to GI resources for users and applications at all levels of the Web with global coverage. More than 15 thousand OGC services were collected within a period of 17 months (October 2013 – April 2015). The collected services and the metadata for both services and resources were categorized and catalogued in SDI CSWs. The fact that more and more services are easily found on the web can be explained as follows: (i) European and national regulations for information infrastructures encourage stakeholders to share and make available their services and data in accordance with the current regulations and related technical documentation; (ii) cheaper yet more powerful hardware makes it feasible to build complex data collection platforms to perform such research queries; (iii) an increasing number of advanced, freely available software solutions are becoming easier and easier to use. These facts are making more OGC services available and discoverable on the web at a global scale.
ACKNOWLEDGEMENTS
This work was supported by the International Fellowship Mobility Programme for
Experienced Researchers in Croatia – NEWFELPRO – co-financed by the Government
of the Republic of Croatia, the Ministry of Science, Education and Sport (MSES), and
through the Marie Curie FP7-PEOPLE-2011-COFUND program.
REFERENCES
[1] Kliment, T., Granell, C., Cetl, V., & Kliment, M. (2013, May). Publishing OGC
resources discovered on the mainstream web in an SDI catalogue. In The 16th AGILE
International Conference on Geographic Information Science.
[2] Lopez-Pellicer, F. J., Béjar, R., Florczyk, A. J., Muro-Medrano, P. R., & Zarazaga-
Soria, F. J. (2011). A review of the implementation of OGC Web Services across
Europe. International Journal of Spatial Data Infrastructures Research, 6(1), 168-186.
[3] Bolegweb project website. (n.d.) Retrieved May 10, 2015 from
http://bolegweb.geof.unizg.hr/
[4] SmartOpenData project website. (n.d.) Retrieved May 10, 2015 from
http://www.smartopendata.eu/
[5] Kliment, T., Gálová, L., Ďuračiová, R., Fencík, R., & Kliment, M. (2014).
Geospatial Information Relevant to the Flood Protection Available on The Mainstream
Web. Slovak Journal of Civil Engineering, 22(1), 9-18.
[6] Ticheler, J., & Hielkema, J. U. (2007). Geonetwork opensource internationally
standardized distributed spatial information management. OSGeo Journal, 2(1).
[7] Winn, J. (2013). Open data and the academy: An evaluation of CKAN for research
data management.
[8] Kliment, T., Cetl, V., & Tuchyňa, M. Discovery of Geospatial Information Resources on the Web. SDI DAYS.
[9] HTML List of OGC Services. (n.d.) Retrieved May 10, 2015 from
http://bolegweb.geof.unizg.hr/site8/products