
Improving the documentation and findability of data services and repositories: A review of (meta)data management approaches



This scientific review paper aims at challenging a common point of view on metadata as a necessary evil and something mandatory to the data creating and dataset publishing process. Metadata are instead presented as a crucial element to ensure the findability of data services and repositories. This paper describes a way through four levels of metadata management and publication, from default unstructured data, through schema-based metadata with literal values and/or URIs, towards linked open (meta)data providing explicit linkage between reliable data resources. Such research was conducted within the European Union's project PoliVisu. Special attention is given to the following: (1) guidance on publication aimed at the broad audience of search engine users and (2) the publication of geo (meta)data not only via standard technologies, such as the OGC Catalogue Service for Web and open data portals, but also through leading search engines (that are
Tomáš Řezník, Lieven Raes, Andrew Stott, Bart De Lathouwer, Andrea Perego, Karel Charvát, Štěpán Kafka
Masaryk University, Faculty of Science, Department of Geography, Brno, Czech Republic
Digitaal Vlaanderen, Brussels, Belgium
University of Cambridge, Centre for Science and Policy, Cambridge, United Kingdom
Open Geospatial Consortium, Wayland, USA
European Parliament, Luxembourg
Help Service Remote Sensing Ltd., Benešov, Czech Republic
Metadata review
Open linked metadata
Open linked data
Search engines
1. Introduction
The importance of metadata is often underestimated. Data users commonly regard metadata as a necessary evil and something mandatory to the data creation and dataset publishing process. Metadata specialists, a small group drawn from the IT, geospatial, scientific, and archive communities, have their own conceptions of an ideal metadata description. Today, several metadata approaches exist that involve different levels of effort, complexity, and profit (Dublin Core, 2020; FGDC, 1998; ISO 19115, 2003; ISO 19115, 2014; W3C, 2020b; JoinUp, 2020). Metadata are a crucial element in data-driven decision making because of the essential link between reliable data and the reliability of the decision outcome. They play an essential role during the data collection process regarding dataset characteristics, provenance, and data quality. The concept of metadata, as presented within this review, stems from research conducted in the PoliVisu project (PoliVisu, 2020).
The 1990s and the beginning of the new millennium could be characterized as an era of metadata enlightenment. The importance of metadata was stressed where applicable, and the (misleading) definition of ‘data about data’ was emphasized on almost all relevant occasions, such as scientific conferences, scientific/technical papers, workshops, and commercial (software) presentations (Weibel et al., 1995; Shien-Chiang et al., 2003; FGDC, 1998; Jensen et al., 2000). This period culminated in the adoption of several legally binding texts that require the creation and maintenance of metadata. For instance, in Europe, the INSPIRE Directive (European Commission, 2007) and its (chronologically first) accompanying Commission Regulation No 1205/2008 on metadata (European Commission, 2008) are the most evident results of the above-described efforts originating from the 1990s. The primary motivation of the INSPIRE Directive is clear: (geo)data can only be shared and re-used if they are findable. The FAIR Guiding Principles for scientific data management and stewardship are being followed across the world (Wilkinson et al., 2016).
* Corresponding author. Masaryk University, Department of Geography, Kotlářská 2, 611 37, Brno, Czech Republic.
E-mail address: (T. Řezník).
The views expressed are purely those of the author and may not in any circumstances be regarded as stating an official position of the European Parliament.
Metadata describe various assets in addition to data, such as Web services, applications, models, field sessions, projects, sensors etc. (ISO 19115, 2003).
Contents lists available at ScienceDirect
Computers and Geosciences
Received 2 October 2020; Received in revised form 16 November 2021; Accepted 10 July 2022
The next era, since 2004, is most commonly referred to as the era of the ‘Semantic Web’ (W3C, 2015a). It is characterized as an extension of the World Wide Web (WWW) rather than a new concept. The principle remains the same as in the case of the INSPIRE Directive, i.e., to re-use and share data across the WWW, with the intention of obtaining more knowledge from the flood of existing data. The Semantic Web Framework is considered an integrator across different contents, information systems, and communities. It has been acknowledged as a feature of the WWW since 1989 in a paper by Tim Berners-Lee, the founder of the WWW and the originator of the term Semantic Web.
Semantic metadata are metadata that describe the ‘meaning’ of data (Melton and Buxton, 2006) in order to promote them for re-use and sharing. Another benefit of semantic metadata is not commonly mentioned: the removal of the artificial boundary between data and metadata.
The contemporary desire for everything to be open, interconnected, customizable, shareable, scalable, and fit for use with artificial intelligence applications may create the impression that the Semantic Web and its applications are successors to traditional approaches, including metadata. This document is a guide about whether to:
• remain using non-semantic metadata approaches;
• follow semantic metadata approaches, and if so, at which level;
• combine both approaches.
This paper provides a review (section 3) and guidance at a decision-making level (section 4) on whether, and if so to what extent, a metadata creator/administrator should shift towards semantic metadata approaches. It also shows how such a modification should be made in order to be both the most beneficial and the most cost-efficient (sections 5 and 6).
2. Methods and related work
This study aims to provide geo-based communities with a comparison and evaluation of advances in metadata development towards the open linked (meta)data paradigm, as well as the motivation to adopt this paradigm. The concept of open linked (meta)data makes it possible to annul the artificial boundary between data and metadata maintained in the geosciences (and other domains). This artificial boundary originates from separate conceptualisations of metadata, e.g., in the case of Dublin Core (Dublin Core, 2020), CSDGM (FGDC, 1998), and ISO 19115 (ISO 19115, 2003; ISO 19115, 2014).
The interoperability, flexibility, and openness of metadata have become crucial questions for any location-based information system since the number of assets that need to be described through metadata has multiplied.
Many studies on metadata in geosciences can be identified, including linked open (meta)data approaches, as described in detail below. As far as the authors are aware, no study similar to this one has been performed so far. In contrast to state-of-the-art studies, this study provides a synthesizing picture, as it comprises the following three major features: (1) a review of metadata management in geosciences, explaining the evolutions/revolutions in metadata approaches & metadata publication systems, (2) guidance on publication aimed at the broad audience of search engine users, and (3) guidance on whether, and if so to what extent, a shift towards semantic metadata approaches should be undertaken.
Geosciences make up the sixth largest component within the Linked
Open Data (LOD) cloud, after life sciences, government, linguistics,
publications, and social networking (McCrae, 2020a).
Scientific papers indexed in the Web of Science (Clarivate Analytics, 2020) and applicable to metadata within geosciences were analysed from the following points of view:
• numbers of available papers concerning the ten most common phrases related to the scope of this paper (see Fig. 1 for results);
• analysis of review papers relevant to the scope of open linked metadata, both within and beyond geosciences (see Fig. 2 for results);
• analysis of papers relevant to the scope of open linked metadata concerning applicability at the following levels (see Fig. 3 for results):
  • inspirations beyond geosciences that may apply to geosciences,
  • applications that are entirely or partly related to geosciences (as a multidisciplinary approach).
In general, the topic of metadata receives strong emphasis in geosciences. The most common type of paper describes an application accompanied by metadata (Di et al., 2009; Ho et al., 2011; Maue et al., 2012; Zhang et al., 2013; Da Silva et al., 2014; Kalantari et al., 2014; Hu et al., 2015; Lafia et al., 2016; Palma et al., 2016; Reznik et al., 2016; McGee et al., 2017; Neumaier et al., 2018). Such papers mostly deal with an explicitly defined scope, like the integration of open linked metadata into cloud platforms, sensor webs, geovisual analytics, Volunteered Geographic Information, e-government, libraries for geo-resources, cartographic models, and Web services. So far, the only geo application of open linked metadata to open data in general has been presented by Neumaier et al. (2018).
The books by Moellering et al. (2005) and Nogueras-Iso et al. (2005) still contain the most comprehensive overviews of international as well as national metadata initiatives, structures and standards, despite being published fifteen years ago. Since then, a number of relevant studies have appeared, dealing, for example, with the following aspects: positioning metadata in the framework of international standardization (Furner, 2020), the evaluation of and outlook on the international standardization of geographic information metadata (Brodeur et al., 2019), and metadata life cycles, spirals, use cases, hierarchies, and their interconnections through links (Habermann, 2018). The analysis of types of metadata and their use by Danko (2012) has a unique position, as no similar study has been published since. The management of metadata collections remains an open challenge, primarily concerning their completeness, topicality, and accuracy, as Giles (2011) described.
Analysis of all the studies examined for this paper confirmed the following. Dublin Core (Dublin Core, 2020) and ISO 19115 (ISO 19115, 2003; ISO 19115, 2014; together with related ISO 19100 series standards) remain the only two significant players regarding international metadata standards for geosciences. Australia and New Zealand (ANZLIC, 2020), China (Baiquan et al., 2013), Europe (European Commission, 2008) and the United States of America (FGDC, 1998) are all examples of countries/regions shifting towards ISO 19115 and Dublin Core in geosciences, although national metadata standards are also being used in parallel.
Only five review papers were found concerning open linked metadata in geosciences. Smits and Friis-Christenson (2006) as well as Govedarica et al. (2010) discuss the potential of open linked metadata in general. Di et al. (2009) present an in-depth discussion of sensor metadata. This work was extended by Tagliolato et al. (2019), who developed semantic profiles for SensorML descriptions. An in-depth discussion of Earth observation data is provided by Harris and Olby (2001). The most comprehensive recent review paper on the topic of semantic geoinformation modelling was written by Kokla and Guilbert (2020). One section of that paper is dedicated to semantic search and knowledge discovery based on ontologies. No review paper with a holistic view of metadata in geosciences has yet been found. The most comprehensive summaries have so far been published in the W3C/OGC Spatial Data on the Web Best Practices (W3C, 2017b). The presented research is in line with these best practices, as it follows and enhances them in terms of open linked metadata.
This review aims to fill relevant knowledge gaps with respect to metadata approaches and publication systems at various levels. The concept of open linked metadata is emphasized as the most novel one, and as the one not described sufficiently in existing papers.
T. Řezník et al.
3. Characteristics of metadata approaches & metadata publication systems
This section provides a management-level overview of the different stages of metadata approaches. These metadata (management) approaches are tightly connected to metadata publication systems. The primary intention of the following text is to present an overview that is demarcated by historic milestones. Note that the presented levels indicate different metadata approaches; there is no intention to rank levels as better or worse. The main difference is historical: a level with a lower number was created earlier. Nevertheless, metadata are now produced at all levels and will presumably continue to be in the future.
The various metadata approaches (see also use cases in Supplementary material 1) presented in this section can also be understood in the following way:
• Level 0 (section 3.1.1) is the default unstructured metadata used within many typical IT systems.
• Level 1 (section 3.1.2) represents a revolution that changes the paradigm of metadata management and publication to schema-based metadata with literal values.
• Level 2 (section 3.1.3) is an evolution of Level 1, with an emphasis on the usage of unique identifiers in addition to literals whenever applicable, giving schema-based metadata with unique identifiers.
• Level 3 (section 3.1.4) represents a further revolution that changes the paradigm of metadata management and publication, as it removes the artificial border between data and metadata. The metadata is itself linked open (meta)data.
The terms ‘revolution’ and ‘evolution’ are used here in the following ways. ‘Revolution’ means that the metadata management from the previous level is discarded and entirely replaced by a new one. In contrast, ‘evolution’ implies that the majority of the features of the prior level are kept.
3.1. The data management step-up process
3.1.1. Level 0: default unstructured metadata
The most basic level of metadata management relies solely on metadata created automatically in a system. Typical examples in a data repository are file size or date of last modification, or information implicitly associated with a data resource, such as a file name or type of file (format).
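The following minimal sketch illustrates what Level 0 metadata may look like in practice. It assumes an ordinary file system as the "system"; the function name `level0_metadata`, the dictionary keys, and the sample file are hypothetical and serve only as an illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def level0_metadata(path):
    """Collect only what the file system already knows about a file."""
    info = os.stat(path)
    name = os.path.basename(path)
    return {
        "file_name": name,
        # The extension stands in for the implicit "type of file (format)".
        "file_type": os.path.splitext(name)[1].lstrip(".").lower(),
        "file_size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Hypothetical sample file, created only for this demonstration.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "MySampleGeodata.csv")
    with open(sample_path, "wb") as handle:
        handle.write(b"id,noise_db\n1,55\n")
    sample_metadata = level0_metadata(sample_path)

print(sample_metadata)
```

Nothing in this record was typed in by a person, and nothing beyond these elements (for instance a spatial extent) can be added without leaving Level 0.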
Three primary benefits may be defined for Level 0 metadata management as follows:
1. Easy to deploy. Automatically/implicitly managed metadata are a part of several IT systems, including database solutions. In many cases, automatic/implicit metadata management is the default one. Even if not, it can be deployed quickly from the administrator's console. Level 0 does not require any specialized metadata software, as metadata remain a natural part of data. As a consequence of such tight bindings, the life cycle of metadata is equal to the life cycle of data.
2. User-friendliness. No training is needed, as metadata are so simple that everyone understands metadata elements like file size, file name, date of last modification, access rights etc.
3. Tight (meta)data bindings. Metadata are automatically created/updated/deleted together with the data they describe. Life cycles of data and metadata are equal.
The simple approach of Level 0 also brings challenges that are tightly connected to the benefits of easy deployment, user-friendliness and tight (meta)data bindings:
1. Fixed metadata elements. Metadata elements are clear to users; however, they are based on the capabilities of the system used. Their customization/extension is usually not supported (or only to a minimal extent). It is very complicated, or even infeasible, to add other metadata elements, such as spatial extent, even if there is a user need.
2. Absence of publication. Metadata are presented only within a system; their export and/or presentation in a different way from the default one is not supported.
3. Limited findability. Metadata are presented only at the level of data display. For example, searching for relevant data according to given criteria, as in a metadata-based catalogue service, is not supported. Findability capabilities are limited to metadata automatically created/implicitly associated with a data resource, and only if supported by the application logic of the system used.
3.1.2. Level 1: schema-based metadata with literal values
Traditional, i.e. non-semantic, metadata approaches can be characterised as the management of textual key-value pairs. For instance, we have a key ‘title’ and a value ‘MySampleGeodata’. Such a concept is at least as old as relational databases (Codd, 1970) and has remained in use since the 1970s. The many different metadata standards and definitions vary in, among others, the following aspects:
• Wrapper: whether metadata are used as stand-alone data (ISO 19115, 2003; ISO 19115, 2014) or as a fraction of something bigger (Dublin Core, 2020), like a Website, database, Web server response, or a table linked to other data etc.;
• Complexity and structure: a metadata record could be represented by a flat list of (by default up to 15) key-value pairs (as in Dublin Core, 2020), though a more complex structure is possible, with dozens of metadata elements and very complex structures (as in GeoDCAT-AP, 2015), or hundreds of metadata elements and several hierarchical levels (as in ISO 19115, 2003; ISO 19115, 2014).
• Exchange format: This could be null, in the case of metadata managed in a table within an internal database, or it could be CSV (Comma Separated Values); exported table structures, such as XLS (Microsoft Excel Spreadsheet); XML (eXtensible Markup Language), according to the standard's XML schema definition; or RDF (Resource Description Framework; W3C, 2014a). The Level 1 (traditional) approaches encode metadata in any of the above-mentioned (and several other) exchange formats, including their combinations. However, not all the formats are self-describing. Consumers of exchange formats such as CSV and XLS would need to know that they contain metadata in a particular layout (which might mean a further level of metadata!), or the CSV or XLS produced might have to comply with strict standards to be ingestible into established catalogues or search services.
• Publication: in general, a broad portfolio of cataloguing solutions is available. However, the given structure determines which publication tools are suitable. For instance, the OGC Catalogue Service for the Web (OGC, 2016) supports by default a structure based on Dublin Core, but, through its application profiles, also structures following ISO 19115/19119, ebRIM (registries) (OGC, 2006), or (North American) CSDGM (FGDC, 1998).
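The difference between exchange formats that are and are not self-describing can be sketched with a flat key-value record. The example below is illustrative only: the record values other than the title and creator, the `metadata` wrapper element, and the function names are assumptions; the namespace is the one of the Dublin Core element set.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# A Level 1 record: a flat list of key-value pairs with literal values.
# The date and format values are invented for illustration.
record = {
    "title": "MySampleGeodata",
    "creator": "John First",
    "date": "2019-12-31",
    "format": "CSV",
}


def to_csv(rec):
    """Serialise as a two-column table; the layout itself is not self-describing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "value"])
    writer.writerows(rec.items())
    return buffer.getvalue()


def to_xml(rec):
    """Serialise with namespaced elements, so each key points to its definition."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for key, value in rec.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{key}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


print(to_csv(record))
print(to_xml(record))
```

A consumer of the CSV output has to be told that the first column holds Dublin Core element names, whereas the XML output carries that information in its namespace declaration.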
Level 1: schema-based metadata with literal values address the challenges of Level 0 in terms of fixed metadata elements, absence of publication, and limited findability. The following benefits document the shift towards Level 1:
1. Strict, yet flexible, structure. A structure that follows a given schema is the essential feature of Level 1. Such an approach brings a number of clearly defined metadata elements and their organisation. Metadata elements are commonly defined by their names, textual descriptions, cardinalities (including obligations), data types, and domains. However, users may set up their own metadata profile, as long as it follows the rules given by the schema. A user may change an optional metadata element to conditional/mandatory, or change the data type from a character string to a code list, etc. Under certain conditions, users may also add new metadata elements that are not included in the schema.
2. Metadata publication. The schema is usually defined on both the conceptual level (textual/tabular descriptions accompanied by graphics, most typically expressed as a UML class diagram, e.g. in ISO 19115, 2003; ISO 19115, 2014) and the implementation level. XML Schema (XSD) is a typical technology used to capture encoding rules. A schema definition on the implementation level makes it possible to incorporate (semi)automatic validation tools, like XML schema validation and Schematron validations (Schematron, 2020).
3. Findability layer. Searching and findability represent application logic on top of the published metadata records. Catalogue services make it possible to search for and discover relevant resources according to given criteria. An example could be: ‘Show me all the datasets that provide measurements between 2017 and 2019 in Pilsen city by using traffic intensity detectors’.
4. Interoperability. The ability of systems to work with each other is enabled, as the structure as well as catalogue APIs (Application Programming Interfaces) are clearly defined and originate from an (international) standard. Metadata may be transferred from one system to another; related catalogues may be connected, etc.
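The interplay of a metadata profile and a findability layer can be sketched in a few lines. This is a simplified illustration, not a catalogue service implementation: the profile, the three records, and the function names are all hypothetical, and real catalogues expose such logic through standardised query interfaces rather than Python functions.

```python
from datetime import date

# Hypothetical profile: element name -> (obligation, expected data type).
PROFILE = {
    "title": ("mandatory", str),
    "place": ("mandatory", str),
    "sensor": ("optional", str),
    "start": ("mandatory", date),
    "end": ("mandatory", date),
}

# Hypothetical catalogue content with literal values only.
CATALOGUE = [
    {"title": "Traffic intensity 2018", "place": "Pilsen",
     "sensor": "traffic intensity detector",
     "start": date(2018, 1, 1), "end": date(2018, 12, 31)},
    {"title": "Traffic intensity 2015", "place": "Pilsen",
     "sensor": "traffic intensity detector",
     "start": date(2015, 1, 1), "end": date(2015, 12, 31)},
    {"title": "Noise map 2018", "place": "Brno",
     "start": date(2018, 1, 1), "end": date(2018, 12, 31)},
]


def validate(record):
    """Return a list of profile violations (empty when the record conforms)."""
    problems = []
    for element, (obligation, expected) in PROFILE.items():
        if element not in record:
            if obligation == "mandatory":
                problems.append(f"missing mandatory element: {element}")
        elif not isinstance(record[element], expected):
            problems.append(f"wrong data type for element: {element}")
    return problems


def search(records, place, sensor, start, end):
    """Select records for a place and sensor whose period overlaps [start, end]."""
    return [
        r for r in records
        if r["place"] == place
        and r.get("sensor") == sensor
        and r["start"] <= end
        and r["end"] >= start
    ]


# "Show me all the datasets that provide measurements between 2017 and 2019
# in Pilsen city by using traffic intensity detectors."
hits = search(CATALOGUE, "Pilsen", "traffic intensity detector",
              date(2017, 1, 1), date(2019, 12, 31))
print([r["title"] for r in hits])
```

The query works only because every record spells the place and the sensor type in exactly the same way, which anticipates the ambiguity challenge of literal values.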
Level 1: schema-based metadata with literal values bring challenges, similarly to any change of a paradigm. The following challenges were identified:
1. Difficult to deploy. Metadata management based on schema-based metadata with literal values usually requires specific metadata tools and staff acquainted with their set-up, customization, and maintenance. The required efforts are proportional to the complexity of a given structure/schema. Deployment can be relatively easy for structures/schemas like Dublin Core, with up to 15 metadata elements in a flat structure (Dublin Core, 2020), but very complex for ISO 19115/19119, which includes hundreds of hierarchically organised metadata elements (ISO 19115, 2003; ISO 19115, 2014). Note that efforts with respect to ISO 19100 series metadata can vary considerably, as most of their metadata elements are optional. An ISO 19100 metadata profile can contain dozens or hundreds of metadata elements. Among other issues, ambiguities in Level 0 metadata are dependent on the documentation of a code list granularity. Also, the deployment of catalogue services is specific and requires trained staff.
2. User-friendliness. More complex metadata structures/schemas are difficult to understand for both metadata administrators and users. Complex hierarchical structures may decrease the clarity of the information presented in the metadata.
3. Loose (meta)data bindings. Metadata in Level 1 require specific structures, tools and handling, which results in various (and commonly isolated) life cycles of data and metadata.
4. Ambiguities. The provided character strings and/or code list values are commonly non-intuitive, as they capture values that are not easily interpretable, such as ‘DTM’, ‘007’ or ‘NoiMesAboGro2mHei’.
3.1.3. Level 2: schema-based metadata with unique identifiers
Level 2 is understood, in contrast to the previous two levels, as the first semantic-oriented level. In comparison to Level 1, values are presented as identifiers to reduce ambiguities. The shift to Level 2 addresses the above-raised points as follows:
• Wrapper: remains the same as in Level 1. There are no changes regarding the metadata wrapper (the wrapper is the most important change in comparison to Level 0). Websites, databases, Web server responses, tables linked to other data etc. are still being used in Level 2;
• Complexity and structure: where applicable, it could use a triple-based (i.e. subject-predicate-object; W3C, 2014a) or even more complex structure. For example, the relation between the key ‘coordinate reference system’ and the value ‘urn:ogc:def:crs:EPSG::4326’ (EPSG, 2019) uses a typical unique identifier from geosciences metadata. A user is then sure of what kind of coordinate system is meant. The unique identifier ‘urn:ogc:def:crs:EPSG::4326’ is used explicitly by the Open Geospatial Consortium (OGC) for the two-dimensional expression of WGS84 (World Geodetic System 1984). In Level 2, unique identifiers are re-used as much as possible. New unique identifiers can be created if existing ones are not available and/or not applicable for the given purpose. However, although this ensures uniqueness, it does not necessarily aid the association or combination of datasets unless the same unique identifiers are used for the same entities throughout, say, an organisation.
• Exchange format: This remains the same as in Level 1. The given structure can still be exchanged in a table within an internal database, through CSV, exported table structures like XLS, XML, or RDF.
• Publication: This can remain the same as in Level 1, i.e. a broad portfolio of cataloguing solutions is available in general, while the given structure determines suitable publication tools. The benefits of the findability layer (publication) appear when a cataloguing solution has an application logic that supports interpretations of unique identifiers. A user may then see the types and definitions of a value provided as a unique identifier instead of seeing only the unique identifier itself.
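A catalogue that interprets unique identifiers can be sketched as follows. The registry content, the subject identifier `dataset:MySampleGeodata`, and the function name are assumptions made for illustration; the coordinate reference system identifier is written with an empty version field (‘::’), as is usual for OGC URNs.

```python
# Hypothetical local registry: unique identifier -> (type, definition).
REGISTRY = {
    "urn:ogc:def:crs:EPSG::4326": (
        "coordinate reference system",
        "WGS84 (World Geodetic System 1984), two-dimensional",
    ),
}

# Level 2 metadata as (subject, predicate, object) triples; literals and
# unique identifiers may be mixed.
TRIPLES = [
    ("dataset:MySampleGeodata", "title", "MySampleGeodata"),
    ("dataset:MySampleGeodata", "coordinate reference system",
     "urn:ogc:def:crs:EPSG::4326"),
]


def display_value(value):
    """Show the type and definition behind an identifier when it is known."""
    if value in REGISTRY:
        kind, definition = REGISTRY[value]
        return f"{definition} [{kind}: {value}]"
    return value  # literals and unknown identifiers are shown unchanged


for subject, predicate, obj in TRIPLES:
    print(f"{subject} | {predicate} | {display_value(obj)}")
```

Without such application logic the user sees only the raw identifier, so the benefit of Level 2 depends on the cataloguing solution as much as on the metadata.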
We may identify the following differences on top of the modifications mentioned above. The semantic approach based on unique identifiers adds the ‘types and definitions of data’, a term which requires an explanation so as not to be interpreted differently. The key-value pair concept is addressed in the semantic web as follows:
• Keys that add the types and definitions of the names of things/relevant concepts (such as noise, which can be defined according to the altitude above the ground, the methodology, etc.); examples of Keys are ‘keywords’, ‘spatial extent’, ‘coordinate reference system’, ‘identification of an organisation’ etc.;
• Values that are provided together with their types and definitions (such as units of measure); examples of Values are ‘gemetKeyword:noise_measurement’, ‘vocab.gettytgn/7011723’ (for Pilsen), ‘urn:ogc:def:crs:EPSG::4326’ (for WGS84), ‘czechGov:Ministry_of_Transportation’. Values are unique identifiers agreed and followed in a domain/area of use.
Level 2 schema-based metadata with unique identifiers have benefits identical to those described in Level 1, i.e.:
1. Strict, yet flexible, structure.
2. Metadata publication.
3. Findability layer.
4. Interoperability.
Furthermore, Level 2 of metadata management brings one new motivation for its use:
5. Clarity. The unique identifier-based approach aims at explicit designation. It is evident through a unique identifier ‘vocab.gettytgn/1014734’ that the described ‘East York’ is the one in Canada, Ontario and not the one in the United States, Pennsylvania, or any other East York around the world. The unique identifier-based approach is used for locations, keywords, identifications of underlying data resources, coordinate transformation systems, etc.
The fth benet is described more in detail below.
In any context in which data exist, there are values that represent certain concepts; human decisions, including policy-making, are then (evidence-)based on interpretations of such values. Associating specific values with specific concepts is one way of assigning types and definitions to data made up of those values. For example, the following pairs of values all express the same concepts in traffic noise measurements: ‘day’ and ‘night’, ‘1’ and ‘0’, ‘Lday’ and ‘Ln’, or ‘L07-19’ and ‘L20-06’. A metadata registry captures the allowed values for some key as managed by some registration authority. A mechanism by which the ‘names of things’ and the values assigned to them can be managed makes them easier to find and interpret in various data sources.
As a result, the semantic approach prefers unique identifiers over text strings to populate a certain value. A value for noise could, for instance, look like ‘urn:noiseAuthority:measuremements:registry:noisePeriod:day’ instead of ‘day’.
The added value for a user is receiving the types and definitions relating to this value, e.g. that a noise measurement conducted during the day means between 7 a.m. and 7 p.m., at a height of 2 m above the ground, while other bias noises were suppressed by further processing.
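The role of a metadata registry can be sketched as a mapping from locally used spellings to one registered identifier. The sketch below reuses the example values and the example URN from the text verbatim; the dictionary layout, the function names, and the wording of the stored definition are assumptions, and no definition is invented for the night period.

```python
# Prefix of the example identifier, copied verbatim from the text.
REGISTRY_PREFIX = "urn:noiseAuthority:measuremements:registry:noisePeriod:"

# Local spellings of the same two concepts, mapped to the registered term.
LOCAL_TO_TERM = {
    "day": "day", "1": "day", "Lday": "day", "L07-19": "day",
    "night": "night", "0": "night", "Ln": "night", "L20-06": "night",
}

# Definitions served by the registry; only the day period is described.
DEFINITIONS = {
    REGISTRY_PREFIX + "day": (
        "noise measured between 7 a.m. and 7 p.m. "
        "at a height of 2 m above the ground"
    ),
}


def to_identifier(local_value):
    """Replace a local literal by the registered unique identifier."""
    term = LOCAL_TO_TERM.get(str(local_value).strip())
    if term is None:
        raise ValueError(f"value not in registry: {local_value!r}")
    return REGISTRY_PREFIX + term


def definition_of(identifier):
    """Return the registered definition, or None when none is stored."""
    return DEFINITIONS.get(identifier)


for value in ("day", 1, "Lday", "L07-19"):
    print(value, "->", to_identifier(value))
print(definition_of(to_identifier("Lday")))
```

Four different literals from four data sources end up with the same identifier, which is what makes the underlying datasets comparable.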
In particular, sensor networks including the IoT (Internet of Things; Gubbi et al., 2013) benefit from semantic metadata through concepts like SOSA (Sensor, Observation, Sample, and Actuator), the core of the Semantic Sensor Network Ontology (W3C, 2017a), a joint effort of the World Wide Web Consortium and the Open Geospatial Consortium.
Versioning in a unique identifier provides an opportunity to capture the evolutions of a resource. Versioning commonly includes a version number, e.g. ‘1.3.0’, or a date of revision/update, most commonly in line with ISO 8601-01 (2019), e.g. a date like ‘2021-11-10’ or a timestamp like ‘2021-11-10T12:00:07’. Moreover, EPSG (2019), for example, assumes the latest version when the version number is missing. For instance, the unique identifier ‘urn:ogc:def:crs:EPSG::4326’ contains ‘::’ precisely for such a purpose. The greatest challenge in versioning lies at the implementation level. All implementations using such unique identifiers with resource versioning must follow all the evolutions, a non-trivial task.
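How an implementation might read the version out of such an identifier can be sketched as below. The sketch assumes the layout urn:ogc:def:crs:AUTHORITY:VERSION:CODE; the class and function names are hypothetical, and the version label in the second identifier is invented purely to show a non-empty version field.

```python
from datetime import datetime
from typing import NamedTuple


class CrsUrn(NamedTuple):
    authority: str
    version: str  # an empty string means that no version was given
    code: str


def parse_crs_urn(urn):
    """Split an identifier of the form urn:ogc:def:crs:AUTHORITY:VERSION:CODE."""
    parts = urn.split(":")
    prefix = [p.lower() for p in parts[:4]]
    if len(parts) != 7 or prefix != ["urn", "ogc", "def", "crs"]:
        raise ValueError(f"not an OGC CRS URN: {urn!r}")
    return CrsUrn(authority=parts[4], version=parts[5], code=parts[6])


def effective_version(parsed):
    """A missing version is read as 'latest', as described for EPSG."""
    return parsed.version or "latest"


unversioned = parse_crs_urn("urn:ogc:def:crs:EPSG::4326")
# The version label below is hypothetical.
versioned = parse_crs_urn("urn:ogc:def:crs:EPSG:1.3.0:4326")

print(unversioned, effective_version(unversioned))
print(versioned, effective_version(versioned))

# Date-based versions in the ISO 8601 style can be parsed with the
# standard library.
revision = datetime.fromisoformat("2021-11-10T12:00:07")
print(revision.date())
```

The parsing itself is trivial; the non-trivial part mentioned above is that every consumer must decide what ‘latest’ resolves to each time the registry evolves.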
Linking a value to a corresponding registry, thesaurus, and/or
gazetteer can be achieved for any application. When moving to semantic
approaches, starting from Level 2, geosciences are being seamlessly in-
tegrated into the e-government concept: geo- and non-geo- resources are
handled equally and can be linked together.
The challenges identified at Level 2 of metadata management remain the same as in Level 1 with one exception: ambiguities are no longer a challenge but rather a benefit. The following challenges from Level 1 also remain valid for Level 2 metadata management:
1. Difficult to deploy.
2. User-friendliness.
3. Loose (meta)data bindings.
3.1.4. Level 3: linked open (meta)data
The shift to Level 3 of metadata management includes the adoption of dereferenceable URIs. Open data available on the Web are linked to other data through URIs. The shift to Level 3: linked open (meta)data addresses these points as follows:
• Wrapper: This is enhanced in comparison to previous levels. In general, anything WWW-related can be re-used, from a Website to a Web server response. However, some kinds of resources remain inaccessible to metadata applications: for instance, even a Semantic Web application cannot address the metadata of an e-mail.
• Complexity and structure: any (meta)data structure identified in Levels 1 and 2 can be used as an underlying one (Dublin Core, CSDGM, ISO 19115, etc.). Level 3 is the most complex as it provides URIs in a dereferenceable way. Unique identifiers in Level 2, like ‘gemetKeyword:noise_measurement’, are provided as URL links. A user may click on the link and get to another source of information. Such an approach enables relevant pieces of information to be connected and improves the decision making process on top of the new connections (links). Moreover, Level 3 allows users and applications to identify equivalencies, hierarchies (parents, children, siblings), broader terms (like ‘measuring’; GEMET, 2020a) or related terms (like ‘noise analysis’; GEMET, 2020b), homonyms (words/terms which sound alike or are spelt alike but have different meanings), etc.
Another point of view lies in the structure of the metadata element itself. The shift between different levels can be illustrated on the example of a creator metadata element as follows:
At Level 0, only the implicitly provided name of the file owner at the current time is available. This also means that a contributor's identification depends on the system used.
At Level 1, key-value pairs with literal values are provided. For instance, a key creator has a value ‘John First’.
At Level 2, values are supported by unique identifiers to identify the corresponding resources uniquely. For instance, a key creator has a value as the unique identifier ‘JohnFirst003’. Such an approach enables us to identify the right John First explicitly.
At Level 3, a collection of FOAF (Friend of a Friend; for further details see FOAF, 2014) objects is provided. For instance, the (semantic) triple can be expressed as ‘dc:creator = http://myOrganisation.policy/staff/JohnFirst’. Complexity should be depicted in ontologies, as the object (value) http://myOrganisation.policy/staff/JohnFirst can itself be the subject of further statements pointing to other objects (values), e.g. linking http://myOrganisation.policy/staff/JohnFirst with information about him on a web page.
Such an example illustrates that statements on semantic metadata approaches are not black or white, as semantic approaches are not ‘all or nothing’.
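The creator example above can be sketched in Python as follows. This is only an illustrative sketch: the dataset URI, the registry dictionary, the helper function, and the use of foaf:name for the second triple are assumptions made for the example; only the values ‘John First’, ‘JohnFirst003’ and the staff URI come from the text.

```python
# Illustrative sketch: one 'creator' element at metadata Levels 1-3.
# STAFF_REGISTRY, DATASET and creator_name() are hypothetical names.

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
PERSON = "http://myOrganisation.policy/staff/JohnFirst"
DATASET = "http://myOrganisation.policy/dataset/1"  # hypothetical dataset URI

# Level 1: key-value pair with a literal value (which John First?).
level1 = {"creator": "John First"}

# Level 2: the value is a unique identifier resolved in a local registry.
STAFF_REGISTRY = {"JohnFirst003": "John First"}
level2 = {"creator": "JohnFirst003"}

# Level 3: (subject, predicate, object) triples built from URIs.
# The object of the first triple is the subject of the second one,
# which is how further information about the creator gets linked.
level3 = [
    (DATASET, DC_CREATOR, PERSON),
    (PERSON, FOAF_NAME, "John First"),
]


def creator_name(triples, dataset):
    """Follow dc:creator and then foaf:name to obtain a readable name."""
    for subject, predicate, obj in triples:
        if subject == dataset and predicate == DC_CREATOR:
            for subject2, predicate2, obj2 in triples:
                if subject2 == obj and predicate2 == FOAF_NAME:
                    return obj2
    return None


print(STAFF_REGISTRY[level2["creator"]])
print(creator_name(level3, DATASET))
```

At Level 2 the identifier is only meaningful to whoever holds the registry, whereas at Level 3 the same lookup is expressed through links that any application could follow.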
Exchange format: three formats are defined as the default ones: RDF/XML (W3C, 2014c), Turtle (W3C, 2014b), and JSON-LD (W3C, 2020a), as they are all capable of handling links to other information resources and are ‘self-defining’ in the sense of allowing a consuming process to interpret the metadata without any additional information. However, referenceable URIs may also be handled in XML through XLink (W3C, 2010), as in <gmx:Anchor> tags in metadata compliant with ISO 19139 (ISO/TS 19139, 2007; ISO/TS 19139-1, 2019).
Publication: follows the structure(s) used and the exchange format(s). For instance, CKAN (Comprehensive Knowledge Archive Network) with proper extensions (GitHub, 2020) is used as a widespread cataloguing solution. Linked open (meta)data also open new publication possibilities for leading search engines. Such an approach may attract more users than before. Moreover, such attraction may appear in terms of user-friendliness, as depicted in Fig. 4. Note that any search engine following Schema.org can be re-used for such publication. The leading search engines mostly require metadata to be embedded in Web pages using HTML+RDFa (W3C, 2015b) or JSON-LD (W3C, 2020a) snippets. If metadata are expressed in RDF, this could facilitate indexing, provided that they are not available only separately, but embedded in HTML pages following SEO (Search Engine Optimisation) techniques. Interoperability is not guaranteed. Moreover, these rules are different for different information resources. As an example, see the differences in recipes (Fig. 4) and datasets (Fig. 5). When metadata are expressed using Dublin Core (Dublin Core, 2020) and DCAT (W3C, 2020b), they can be indexed without the need to convert them to Schema.org. Moreover, the Schema.org terms for datasets and catalogues are modelled on DCAT, so the correspondence is fairly straightforward.
A successful publication in Level 3 also requires the following to be specified:
which leading search engine(s) are desired,
which information resources are desired.
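The embedding of metadata in Web pages through JSON-LD snippets, as described above, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the example dataset, its URL and its bounding box are hypothetical, and the ‘latitude longitude’ ordering of the GeoShape box follows common search engine guidance rather than anything stated in this paper.

```python
import json


def dataset_jsonld(name, description, url, keywords, box=None):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    box is an optional (south, west, north, east) tuple in decimal degrees.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }
    if box is not None:
        south, west, north, east = box
        # Spatial extent: the part that search engines tend to simplify.
        doc["spatialCoverage"] = {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                "box": f"{south} {west} {north} {east}",
            },
        }
    return doc


def as_html_snippet(doc):
    """Wrap the JSON-LD document in a script tag for an HTML page."""
    body = json.dumps(doc, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical dataset, used only to illustrate the markup.
example = dataset_jsonld(
    name="Noise measurements (example)",
    description="Hypothetical dataset used only to illustrate the markup.",
    url="https://example.org/datasets/noise",
    keywords=["noise measurement"],
    box=(50.7, 2.5, 51.5, 5.9),
)
print(as_html_snippet(example))
```

The resulting snippet would be placed in the HTML page that describes the dataset, so that each dataset has its own indexable URL.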
Search engines mostly omit or simplify the metadata of geo resources (typically by avoiding spatial extent information) due to their specificity. According to Clarke (2012), semantic metadata can be used to increase traffic from search engines, as such metadata can provide search engines with more information about the content being searched. For this reason, several of the leading search engines have begun working together, via Schema.org (2020), on the development of standards that facilitate a greater level of metadata exposure. As a consequence, the leading search engines provide the user with an answer that best matches his/her search history, i.e. a user profile. To sum up, semantic metadata are one of the ingredients of machine learning algorithms - and this not only within the leading search engines.
There are dozens of kinds of Schema.org rich (meta)data content supported by the leading search engines: from ‘article’ through ‘dataset’, ‘event’ or ‘recipe’ to ‘video’. In general, ‘dataset’-rich results seem to be the most common way of describing geoscience resources. However, ‘dataset’-rich results do not appear on the entry pages of some leading search engines (Brickley et al., 2019). The primary advantage of rich results is thus lost: a user-friendly visualisation that is common to non-geo metadata, as depicted in Fig. 4. Instead, geo metadata are presented without the benefits of rich results, as depicted in Fig. 5.
The benefits and challenges of Level 3, open linked (meta)data, are described in greater detail in section 4 due to their complexity.
4. The benefits and challenges of open linked (meta)data management and publication
The text in the following sections presents the benefits of, and challenges facing, open linked (meta)data management and publication in more detail.
Level 3 is a further shift in comparison to Level 2. The main benefit of Level 2 is clarity in the types and definitions of data. Level 3 uses, contrary to Level 2, dereferenceable URIs that point to other relevant resources. Thesauri, gazetteers and/or registries are used as primary sources. For instance, the Level 2 unique identifier ‘vocab.gettytgn/1014734’ is modified for Level 3 as follows: http://vocab.getty.edu/page/tgn/1014734. The linked open (meta)data approach increases user-friendliness, as a user can click on a link and obtain the information directly from its source. Moreover, the application logic of a system is capable of employing improved processes that automatically connect relevant pieces of information.
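The step from a Level 2 identifier to a Level 3 dereferenceable URI can be sketched as a simple prefix expansion. The sketch below is illustrative only: the compact ‘prefix:local’ notation, the prefix table and the function name are assumptions; only the Getty TGN base URL and the identifier 1014734 are taken from the text.

```python
# Hypothetical prefix table; only the Getty TGN base URL comes from the text.
PREFIXES = {"tgn": "http://vocab.getty.edu/page/tgn/"}


def to_uri(identifier, prefixes=PREFIXES):
    """Expand a 'prefix:local' identifier into a dereferenceable URI."""
    prefix, separator, local = identifier.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {identifier!r}")
    return prefixes[prefix] + local


print(to_uri("tgn:1014734"))
```

Rejecting unknown prefixes, rather than passing the identifier through unchanged, keeps values that cannot be dereferenced from silently entering Level 3 metadata.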
4.1. Benefits
As noted at the beginning of section 3, Level 3 represents a revolution with respect to both metadata management and publication. For this reason, the benefits, as well as challenges, relating to it will be described in greater depth in comparison with the previous levels.
4.1.1. Openness
Open data is a paradigm that, in the last decade, has been emphasized more and more in two major communities: in research and in public administrations. Regarding the latter, the open data paradigm enables the re-use and creation of added value and evidence-based decision making on top of already published data. For this reason too, public administration bodies commonly desire to publish openly as much data as is feasible. Such a situation leads, in some cases, to a heap of data in which it is difficult for users to orient themselves. For example, Fig. 6 shows an overwhelming number of results on available visualisations at the EU Open Data Portal ( /visualisation-home), which, however, are not connected in a user-friendly way. Such difficulties from a user experience point of view result in quite a low number of visitors to the EU Open Data Portal in comparison to national geoportals.
4.1.2. Dissemination to the masses
The basic weakness of both open data portals and geoportals is the fact that users need to know that they exist. Finding the right tool for findability may be an even more severe obstacle than searching within the discovered tool. Geoportals are not commonly known to people outside of the geo bubble, while open data portals are not commonly known to people outside of the open data (e-government) bubble. Level 3 (the linked open (meta)data approach) facilitates the user-friendly publication of available (meta)data in leading search engines via publication through Schema.org (see Fig. 7).
Fig. 1. Numbers of papers available at the Web of Science concerning eleven phrases related to the scope of this paper. Note that the analysis of papers for the term ‘semantic web’ was not performed in this study as it would have been too demanding for the capacity of our team.
4.1.3. Describing only relevant aspects
An easy-to-use structure, especially when compared with the complexity of standard geo metadata structures, is another feature and benefit of semantic (meta)data. Contemporary traditional metadata standards, like CSDGM (the Content Standard for Digital Geospatial Metadata, published by the Federal Geographic Data Committee, FGDC, of the United States; FGDC, 1998) or ISO 19115 Geographic information - Metadata (ISO 19115, 2003; ISO 19115, 2014), have very complex structures including hundreds of metadata elements in several hierarchical levels. In these, a metadata creator/administrator is also pushed to document mandatory metadata elements that 1) are not needed and 2) do not describe a resource appropriately according to his/her scope of applications. Such an impulse often results in non-equivalent descriptions of identical or similar concepts. Semantic approaches enable only those descriptions that are relevant according to the scope of the metadata application to be documented. This benefit may easily become a disadvantage as there is no minimal set of describing metadata elements, as in the case of the core metadata elements in ISO 19115 or a legally required set of elements in INSPIRE.
4.1.4. Links within/between information resources
Semantic approaches by default aim at linking open data. The added value lies, among other advantages, in clearly linking relevant pieces of information, such as in an example of visualisations of traffic measurements (see Fig. 8). When supporting semantics, we may visualise the most relevant resources as the primary ones and leave others to be shown as links, if more information is desired. Semantic approaches assist in estimations of the most relevant resources. For instance, a user is searching for a ‘river’; however, the metadata contains the term ‘stream’. A catalogue service will provide datasets to a user for both terms, i.e. ‘river’ as well as ‘stream’, thanks to an associated thesaurus that has indicated the terms ‘river’ and ‘stream’ as synonyms.
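The river/stream example can be sketched as a query expansion step in front of a catalogue search. This is a toy sketch: the thesaurus, the catalogue records and the function names are hypothetical, and a real catalogue service would query a published thesaurus rather than an in-memory set.

```python
# Hypothetical thesaurus: each set groups terms declared as synonyms.
SYNONYMS = [{"river", "stream"}]

# Hypothetical catalogue records with their metadata keywords.
CATALOGUE = {
    "dataset-a": {"keywords": {"river"}},
    "dataset-b": {"keywords": {"stream"}},
    "dataset-c": {"keywords": {"road"}},
}


def expand(term, synonym_sets=SYNONYMS):
    """Return the search term together with all of its synonyms."""
    terms = {term}
    for group in synonym_sets:
        if term in group:
            terms |= group
    return terms


def search(term, catalogue=CATALOGUE):
    """Return identifiers of records whose keywords match the expanded term."""
    wanted = expand(term)
    return sorted(
        record_id
        for record_id, metadata in catalogue.items()
        if metadata["keywords"] & wanted
    )


print(search("river"))
```

A search for either term returns both matching datasets, while unrelated records stay out of the result.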
Fig. 2. Analysis of review papers relevant to the scope of open linked metadata at the Web of Science.
Fig. 3. Analysis of papers relevant to the scope of open linked metadata concerning their applicability in geosciences.
4.1.5. Revoke the (artificial) boundary between data and metadata
Level 0 and Level 3 have a common aspect: they both apply the same rules to data as well as metadata. Level 3 revokes an artificial boundary that is the cost of the paradigm used in Levels 1 and 2. Level 3, open linked (meta)data, uses the triple-based construction for data and metadata. Data and metadata follow a common life cycle. Metadata accompany data where desired and at several levels, such as a series of datasets, a single dataset, dataset visualisation, the e-shop offering the dataset, the layers of a dataset, the object type as part of a layer, and object instance, etc. Findability, as well as other processes, may be designed in a new, more complex way.
4.2. Challenges
The following challenges will be, similarly to benefits, described in more depth:
4.2.1. Updating mechanisms
Updates need to be set up, most commonly in geosciences through ETL (Extract, Transform, Load) mechanisms. The updating mechanisms are mostly automatic; however, a certain amount of manual input is usually needed, which influences the regular costs invested into metadata.
4.2.2. Lack of concepts
The LOD (Linked Open Data) cloud (McCrae, 2020a) provides an excellent basis for (meta)data integration with relevant semantically-rich content. However, the LOD cloud is poorly balanced when speaking about different scientific domains as well as concepts in a scientific domain. The LOD cloud (Fig. 9) in November 2021 contained 1301 datasets with 16,283 links.
Fig. 4. Demonstration of rich results as a user-friendly means of linked open (meta)data publication, as appearing in search engines (images adopted from, modified).
Fig. 5. Demonstration of a rich result of a geo-domain dataset, as appearing in search engines (texts adopted from,, modified).
Geosciences, identified in the LOD cloud as a separate domain, contributed 44 datasets; see also Fig. 10. It should be noted that geosciences are mentioned in the LOD cloud as ‘geography’. For example, in reality, the ‘geography’ domain in the LOD cloud contains, among others, the Geological Survey of Austria (GBA) Thesaurus. Therefore, the authors in this paper prefer the term geosciences when speaking about the ‘geography’ domain in the LOD cloud. The major geoscience databases within the LOD cloud are the DBpedia (a semantic equivalent of Wikipedia), LinkedGeoData (including a semantic version of OpenStreetMap) and GeoNames (as the primary source for geocoding, with 25 million geographical names and 150 million web service requests per day). As depicted in Fig. 10, these three databases are the main hubs, as all the remaining semantic databases in geosciences are linked to them.
Fig. 6. Results brought to a user when searching for visualisations at the EU Open Data Portal (adopted from:
Fig. 7. Demonstration of Schema.org metadata: primarily, answers are presented to users directly instead of metadata behind the answers (although metadata were used) (map adopted from, modified).
Fig. 8. Visualisation of open data in a structured and linked way: PoliVisu prototype on Traffic intensity detectors in Pilsen (development inspired by https://data. and
Fig. 9. All the databases within the LOD cloud and their linkages. Adopted from:
Regarding their trajectory, geosciences are growing faster than the LOD cloud in general. From another point of view, the geosciences are the sixth most represented domain within the LOD cloud (after life sciences, government, linguistics, publications and social networking). Another perspective is philosophical: What is and what is not related to geosciences? For instance, OECD Linked Data (McCrae, 2020b) contains geolocated information; however, it is not classified within the LOD cloud as geo. Therefore, the inclusion of only 44 linked geo datasets in the LOD cloud is misleading.
4.2.3. Weak support in geosciences
This situation seems to be a result of the rigidness of the geo community. With the exception of a few pioneers, we can see a lack of best practice (Hu et al., 2015; Di et al., 2009; Ho et al., 2011; Narock and Fox, 2012; Kalantari et al., 2014; Wilson et al., 2014; Lafia et al., 2016; McGee et al., 2017; Da Silva et al., 2014; Zhang et al., 2013; Maue et al., 2012; Neumaier et al., 2018). The geo community, having reached a peak with respect to the use of non-semantic metadata approaches (i.e. Levels 1 and 2), still, as a whole, hesitates whether or not to adopt semantic metadata approaches. It seems that the identified challenges, together with a lack of application support, prevent the geosciences community from taking significant steps towards using semantic metadata. Findability mechanisms present the most visible obstacle. Geoportals have not evolved to comply with publication techniques (the so-called SEO techniques) commonly used by Web developers to ensure that a Web site is indexed by search engines. These not only include embedding metadata in Web pages - via HTML+RDFa and/or JSON-LD snippets, but also - and more importantly - the use of basic Web publication best practices (such as having a URL for each page to be indexed). Nevertheless, some catalogue platforms are moving in this direction, such as GeoNetwork (2020). The leading search engines have adopted semantic web principles while geo catalogues mostly remain as they were designed and built ten years ago, even though semantic approaches were successfully tested within a geo catalogue as early as in 2006 (HarmonISA, 2020). A shift to the development of semantic applications will also mean a shift on the part of the geo community to the employment of semantic-based use cases.
4.2.4. Invested efforts
Efforts are needed when shifting from one level to another, no matter how great or small the shift. The most expensive are revolutions; i.e. shifts from Level 0 to Level 1 and from Level 2 to Level 3. The geo community has made the shifts to Levels 1 and 2 over the last two decades. However, the will to finance another revolution in terms of metadata management and publication seems to be low. Creation and maintenance costs on the one hand and low benefits, especially when the scope of application is not sufficiently specified, on the other hand are the major economic disadvantages. See section 6 for further details.
4.2.5. Formalisation
Two extreme situations could be identified:
Limited loose semantics (typical for linked data) makes implementation easier and more re-useable.
Strong formalised semantics allow for powerful reasoning but make it hard to combine information from different sources.
Fig. 10. The selection of geosciences within the LOD cloud. Adopted from:
The occurrence of both situations simultaneously is unlikely, at least not with formalisms like OWL (Web Ontology Language), which are based on classical logic (Augusto, 2019), or, more precisely, description logic (Horrocks et al., 2003; Horrocks, 2005). As a result, a successful semantic web application is either powerful, but limited in scope and hard to integrate with other systems, or it is ‘dumb’ but easy to integrate and use. It is therefore necessary to decide which type of success is desired.
4.2.6. Performance
As far as the authors are aware, the most complex semantically interlinked geodatabase has been developed within the FOODIE and DataBio projects (Reznik et al., 2017): more than 700 million triples were maintained in the Virtuoso triple store, with responses within seconds when using the Poznań Supercomputing and Networking Centre (in Poland). Nevertheless, the performance of a semantic-based findability service becomes worse in cases of:
the presence of low-end hardware (on the part of the server),
the inputting of complex queries (usually dozens or more conditions in one query) or
the existence of very strong formalisation (especially the number of linkages and the capability of related stores to respond under stress).
5. Open linked (meta)data: incremental versus an all in one go
As noted earlier, open linked (meta)data is not an ‘all or nothing’ concept. Two major approaches are feasible: incremental implementation on the one hand and all in one go implementation on the other.
As stated in section 3, Level 3 open linked (meta)data approaches are considered as another revolution in terms of (meta)data management and publication. Open linked (meta)data are often understood as a completely new paradigm that requires abandoning the existing approach. This section summarises the advantages of incremental implementation, i.e. a revolution divided into several smaller steps, on the one hand, and of a complete revolution, i.e. changing everything at once, on the other.
Both revolutionary approaches are valid; both have advantages and disadvantages. The steps identified in Fig. 11 need to be considered. The strengths and weaknesses of each approach are summarised in Table 8.
6. Discussion
The lack of best practices in semantic approaches in geosciences sometimes seems to have a paralysing effect on their wider adoption (and vice versa). This discussion is intended as a guide to selecting the most appropriate approach for an organisation at a particular point in time, taking account of the type of content, the organisation's objectives, the extent of openness and connectivity with other data, and the resources and skills available.
6.1. Suitability for a semantic approach
6.1.1. How suitable is the resource for this approach?
As also discussed in section 4, semantic approaches are applicable only for some kinds of resources. For instance, e-mails cannot be (at least so far) linked with/to any other related concept in existing semantic approaches. Video, audio and images can be linked only partially (Isaac and Haslhofer, 2013; Sikos, 2017). The basic geo resources like datasets, web services, (map) compositions, and/or applications could follow the best practices defined within the OGC document on GeoDCAT-AP (Raes et al., 2019). Even such a portfolio of resources is not commonly described by semantic approaches in geosciences. Linking to relevant concepts is both the basic idea and the greatest benefit of semantic approaches. Higher impact would be achieved in cases when other kinds of resources are supported within implementations.
Fig. 11. Steps identified for ‘incremental implementation’ as well as for the ‘all in one go’ approach.
Table 8
Strengths and weaknesses of incremental implementation versus all in one go.
Strengths
Incremental implementation: preservation of existing …; open linked (meta)data only in a publication data …; existing processes remain …
All in one go: linkages between all the defined data resources; one (new) infrastructure for data and metadata; metadata and data life cycles are …
Weaknesses
Incremental implementation: linkages only between some data resources; metadata and data life cycles remain different; difficult updating; training needed to set up and maintain the combined …
All in one go: costs for changing the …; training needed to set up and maintain the combined …; missing best practices; division of publicly unavailable …; linkage to historical data (before shifting to the linked open (meta)data approach)
6.1.2. How much will the (meta)data be shared, now or in the future?
Semantic approaches are based on linking concepts, and one of the main objectives has been to make it easier both to share data in a meaningful way and to use data from other sources, sometimes by combining them with data already held. Therefore, the greater the expectation that data will be shared, the greater the realisable value of a semantic approach.
Many semantic approaches are associated with data openness goals. However, this does not mean that all the metadata need to be publicly available now, or in the future. It would be possible to design a semantic approach with a view to a future sharing of the organisation's data and with the possibility of using data from other sources more easily straight away. It would also be possible to publish some parts of the metadata but not other parts, with a clear and managed division between publicly available metadata and confidential metadata.
6.1.3. How much can existing semantic resources be used?
The more that semantic concepts and ontologies relevant to your data have already been developed, the easier the application of a semantic approach, and the greater the potential benefits of linking with other data. Over the last ten years, significant progress has been made in developing useful standard resources. Some key resources are:
The LOD cloud: as presented in section 4.
Thesauri (that may be a part of the LOD cloud):
domain-related thesauri like GEMET, AGROVOC, USGS Thesaurus,
gazetteers, i.e. thesauri with geolocated concepts, like:
Getty Thesaurus of Geographic Names (TGN),
GEOnet Names Server (GNS),
The World Gazetteer,
GeoNames (the most used one),
registries that provide types and definitions typically for the coding of list values, such as the INSPIRE registries (INSPIRE, 2020) or the EPSG registry.
Other resources relevant to the scope of a (metadata) application.
It is not essential that all necessary concepts have been developed. There are cases in which equivalent concepts do not (yet) exist or are not applicable and/or it is not desirable/could be misleading to link to existing concepts. In such cases, a (meta)data creator/administrator has an opportunity to define an ontology as well as to set up URIs. However, if no concepts are linked, many of the benefits of a semantic approach are lost.
6.2. What benefits would be gained from a semantic approach?
6.2.1. How important is precision and uniqueness in the metadata?
Traditional metadata management provides key-value pairs that already aim at adding types and definitions of data. However, these types and definitions of data may be:
known only to some community;
very limited, as the key-value pair concept does not allow more complex explanations.
The benefits of semantic approaches include explicit types and definitions of data for the provided (meta)data, including explicit position: for instance, when Dublin is indicated as a place of origin, semantic metadata is able to indicate whether it is Dublin the capital of Ireland, Dublin a city in Ohio, the United States of America, or some other Dublin.
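The Dublin example can be sketched by comparing a literal value with a URI value. The gazetteer below, its example.org URIs and its attributes are entirely hypothetical and stand in for a real gazetteer; the sketch only shows why a literal leaves several candidates open while a URI selects exactly one.

```python
# Hypothetical gazetteer: the URIs and attributes are made up for illustration.
GAZETTEER = {
    "https://example.org/place/dublin-ie": {
        "label": "Dublin",
        "country": "Ireland",
    },
    "https://example.org/place/dublin-oh-us": {
        "label": "Dublin",
        "country": "United States of America",
    },
}


def candidates(label, gazetteer=GAZETTEER):
    """Return all gazetteer URIs whose label equals the literal value."""
    return sorted(uri for uri, entry in gazetteer.items() if entry["label"] == label)


# Literal value: the place of origin stays ambiguous.
literal_record = {"placeOfOrigin": "Dublin"}
# URI value: the place of origin is identified explicitly.
linked_record = {"placeOfOrigin": "https://example.org/place/dublin-ie"}

print(len(candidates(literal_record["placeOfOrigin"])))
print(GAZETTEER[linked_record["placeOfOrigin"]]["country"])
```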
6.2.2. How important is it that public search engines can find your content?
In summary, full-text based search, as provided by the leading (Web) search engines, brings vast numbers of users from various domains. Full-text search engines have several times higher numbers of users than the most visited geoportal on the planet. Such users are also eager to discover geodata. It is important to emphasise that not all users are willing to receive the answer in the form of a map. Nevertheless, the question is often a location-based one, such as ‘What noise from traffic do I encounter when walking from my home to my work?’ (Kraak and Brown, 2000).
Two related levels can be identified:
If you are capable of delivering a direct answer instead of metadata, do it.
Only if you cannot directly provide an answer, present the user with metadata on a resource where (s)he can find the searched information.
For instance, a user is searching for ‘2 + 2’; (s)he immediately receives an answer. The same applies when searching for a location-based answer like ‘noise map Flanders’. A user receives a preview of a collection of maps instead of textual metadata (, 2020).
6.2.3. How important is it for your metadata to be integrated with sensor
The integration of (meta)data within and beyond sensor networks brings new perspectives as well as added value (Di et al., 2009; Tagliolato et al., 2019). Complex queries in semantic approaches may become even more complex as they can also address the original (sensor) measurements. The joint recommendation of the Open Geospatial Consortium and the World Wide Web Consortium called the Semantic Sensor Network Ontology (W3C, 2017a) addresses this step in detail. This document also discusses the differences between so-called live and static datasets.
6.3. What is the ability to deliver a semantic approach?
6.3.1. What skills are available?
Human resources skilled in metadata management are also needed for semantic approaches. However, it may not be sufficient to understand a metadata standard, define a metadata profile, develop a template for metadata encoding, publish a Web service, or define validation mechanisms. The following changes in comparison to traditional metadata approaches may be identified:
1. Analysis of user requirements (as ‘traditionally’ this step is omitted within geosciences at the metadata level; Ho et al., 2011; Lafia et al., 2016; Maue et al., 2012; Neumaier et al., 2018),
2. Definition of high-level business process workflows including a decision on the re-use of an existing ontology versus developing new one(s) (Kokla and Guilbert, 2020),
3. Determination of whether the required (meta)data will also be of a sensitive nature and require any special handling,
4. Determination of whether (meta)data will also be findable by leading search engines that follow Schema.org,
5. Decision about semantic-ready storage and its implementation (typically, a triple/quadruple store in comparison to a traditional relational database),
6. Decision about exchange formats and their implementation,
7. The setting up of publication mechanisms (typically, an open API supporting SPARQL queries),
8. Definition and development of quality assurance and/or quality control measures (especially to verify whether the underlying, i.e. interlinked, concepts are still reachable).
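The quality control measure in step 8, verifying that interlinked concepts are still reachable, can be sketched with the Python standard library. This is a minimal sketch under assumptions: the function names are hypothetical, a plain HTTP HEAD request is taken as the reachability test, and any status of 400 or above is treated as broken. Some servers reject HEAD requests, so a production check would probably fall back to GET.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def http_status(uri, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    try:
        request = Request(uri, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except URLError as error:
        # HTTPError (a URLError subclass) carries a status code; others do not.
        return getattr(error, "code", None)
    except (ValueError, OSError):
        # Malformed URI, timeout or other network-level failure.
        return None


def unreachable_concepts(uris, fetch_status=http_status):
    """Return the URIs whose concepts can no longer be dereferenced."""
    broken = []
    for uri in uris:
        status = fetch_status(uri)
        if status is None or status >= 400:
            broken.append(uri)
    return broken
```

The fetch_status parameter allows the network call to be replaced, for example by a cached lookup during scheduled ETL runs or by a stub in tests. A call such as unreachable_concepts(list_of_concept_uris) returns the URIs that need attention.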
6.3.2. To what extent would you need to define your own concepts concerning semantic web definitions?
Semantic approaches could be used even when there are no relevant publicly available concepts for integration. However, in such a case, the efforts invested into semantic approaches will be considerably higher, as the definition of one's own concepts is not a trivial task (Corcho et al., 2003). This means that a registry/thesaurus/OWL ontology or any similar entity needs to be created within your organisation. Additional requirements arise both in terms of skills and financial resources.
6.3.3. How much funding is available?
In general, semantic approaches are initially costlier to implement in comparison to traditional metadata approaches; however, semantic approaches may give greater benefits, allowing other costs to be avoided over the data lifecycle. The highest cost is likely to be human resources with the necessary semantic design and implementation skills - either within the organisation (including the costs of developing those skills if necessary) or by contracting temporary specialist skills.
6.3.4. Suitability of existing IT infrastructure and services?
The implementation of semantic (meta)data may involve building
interfaces and ensuring compatibility with the existing IT corporate
infrastructure. If organisation policies permit, it may be possible - and, in
the short term, desirable - to use cloud services to host the semantic
element in order to avoid costs and delays in implementing semantic
software in the corporate IT infrastructure.
6.4. Reaching a decision on the most suitable approach
The decision on which approach to adopt is not just technical: it should also consider issues of business strategy, manageability, costs of scarce technical and non-technical resources and the benefits case for the organisation. It will need to look not just at the short-term and pragmatic issues but also at the longer-term benefits of a semantic approach. It will often be worth preparing a Business Case for the proposed approach so that there is a clear and sustainable basis for the formal decision.
The choice among the options will depend on the circumstances of
individual organisations, but broadly four approaches are possible.
Where there are clear and substantial benefits of a semantic approach, particularly in making the organisation's data more publicly available and findable, and there is a substantial body of existing semantic data to which to link, organisations should seriously consider committing the necessary financial and human resources to a fully semantic approach.
Where the immediate benefits are less clear, but there is nevertheless a vast amount of data involved, the advantages of precise and unique metadata may still justify a fully semantic approach.
Where the organisation is not yet sure about the strength of the benefits case, it might choose in the short-term to adopt the ‘semi-semantic’ approach of using unique semantic identifiers in its (meta)data. This would allow some of the benefits of unique data to be realised and may also be a suitable way of developing the organisation's skills for fuller semantic approaches at a later stage. This could be particularly attractive where the organisation is in a position to develop ontologies that other stakeholders could use.
Where the organisation's data is not suitable for a semantic approach or is unlikely to be shared with others, then traditional metadata approaches may be adequate. However, to enable linking of data within the organisation itself, there may still be an advantage in using URIs to describe key entities uniquely and commonly across different departments of the organisation.
7. Conclusions
The presented paper provides comprehensive reference material for
adopting the principles of linked open (meta)data. Three major
advantages can be identified for adopting such semantic-based
(meta)data approaches:
• clear types and definitions of data used for decision making,
• flexibility of metadata descriptions,
• sharing relevant information with the broad audience of
Schema.org-based search engine users.
The first point aims at better decisions, as data in semantic
approaches are linked to explicit evidence on the procedures used, units
of measure, the quality of the data (measurements), etc. Benefits of the
first point can appear with even meagre investments, as linking data can
be handled through URIs in existing (geo) solutions. A (dereferenceable)
URI provides clearer information on types and definitions, as well as
further relevant information, than the traditional maintenance of
(free text) character strings in metadata.
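The difference between a free-text keyword and a keyword identified by a URI can be illustrated with a minimal Python sketch. The record layout, the helper name `keyword_uri`, and the `example.org` identifier are assumptions made for illustration only; they are not taken from any metadata standard or from the PoliVisu project.

```python
from urllib.parse import urlparse

# Free-text keyword: its meaning depends on who reads it.
literal_record = {"title": "Noise measurements", "keyword": "noise"}

# The same keyword accompanied by a (hypothetical) URI that identifies
# one specific concept in a shared vocabulary.
uri_record = {
    "title": "Noise measurements",
    "keyword": {"label": "noise", "uri": "https://example.org/concept/noise"},
}


def keyword_uri(record):
    """Return the keyword's http(s) URI, or None if it is only a literal."""
    keyword = record.get("keyword")
    if not isinstance(keyword, dict):
        return None
    uri = keyword.get("uri", "")
    parsed = urlparse(uri)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return uri
    return None
```

Only the second record gives a client something it can follow to obtain a definition, which is the low-investment step described above. The check is syntactic: it does not test whether the URI actually dereferences.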
The second point aims at the flexibility of metadata descriptions.
Metadata creators/administrators are no longer pushed to provide
mandatory metadata elements according to some complex (international)
standard. Instead, only relevant (meta)data are linked together
according to the needs of an application. This also means that metadata
may easily be provided in cases not supported by an (international)
metadata standard. For instance, ISO 19115 does not support a
description of the data structure in the metadata. In contrast, semantic
approaches enable metadata to be linked to feature types, attribute
types, and/or any similar structure of the described dataset. Semantic
approaches thus enable the removal of the artificial boundary between
data and metadata that has been present within geosciences for more than
three decades.
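A minimal sketch of this idea, using plain Python tuples in place of an RDF library, is given below. All identifiers and predicate names (`ex:hasFeatureType`, `ex:hasAttributeType`, `ex:unitOfMeasure`) are hypothetical and chosen only to show that a dataset description and its attribute definitions can live in one graph.

```python
# All identifiers below are hypothetical and serve only as an illustration.
DATASET = "https://example.org/dataset/noise-map"
FEATURE_TYPE = "https://example.org/schema/NoiseMeasurement"
ATTRIBUTE = "https://example.org/schema/NoiseMeasurement#level"

# Each triple is (subject, predicate, object), mirroring the RDF data model.
TRIPLES = [
    (DATASET, "ex:title", "Noise map"),
    (DATASET, "ex:hasFeatureType", FEATURE_TYPE),
    (FEATURE_TYPE, "ex:hasAttributeType", ATTRIBUTE),
    (ATTRIBUTE, "ex:unitOfMeasure", "dB"),
]


def reachable(start, triples):
    """Return every node that can be reached from `start` by following links."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for subject, _predicate, obj in triples:
            if subject == node and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen
```

Starting from the dataset identifier, `reachable(DATASET, TRIPLES)` walks from the dataset-level description down to the attribute type and its unit without crossing any formal boundary between "metadata" and "data structure".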
Sharing relevant information with the broad audience of
Schema.org-based search engine users makes it possible, if desired, to
step outside the geo and open data bubbles. It also addresses a common
challenge that has, so far, been only partially addressed within
geo/open data communities: discovering a proper tool for finding the
data seems in several cases to be even more complicated than discovering
the data in that tool. Such a step requires knowledge as well as
investments in time and effort. Metadata can be advertised in an
attractive, user-friendly way in leading search engines.
Four levels of metadata management and publication were identified
and described:
• Level 0: default unstructured metadata, which is generated
automatically/implicitly within many typical IT systems.
• Level 1: schema-based metadata with literal values, which is a
‘revolution’ where the semantics of these values is ambiguous (clear
usually only within a domain and/or interest group).
• Level 2: schema-based metadata with unique identifiers, which is
an evolution of Level 1 with an emphasis on the use of unique
identifiers that are added to literal values whenever applicable, these
providing explicit identification of reliable data resources.
• Level 3: linked open (meta)data, which provides explicit linkage
(and types plus definitions) between reliable data resources.
Moreover, Level 3 facilitates the indexing of (meta)data by leading
search engines.
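As an illustration of how Level 3 metadata may be exposed to search engines, the following Python sketch builds a Schema.org `Dataset` description in JSON-LD and wraps it for embedding in a dataset landing page. The helper names and all field values are placeholders assumed for this example; only the Schema.org terms (`Dataset`, `name`, `description`, `url`, `keywords`) and the `application/ld+json` script type are existing conventions.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a minimal Schema.org Dataset description as a Python dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


def as_script_tag(document):
    """Serialise the description for embedding in an HTML landing page."""
    body = json.dumps(document, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset; every value below is a placeholder.
example = dataset_jsonld(
    name="Road traffic noise measurements",
    description="Illustrative record only.",
    url="https://example.org/dataset/noise-map",
    keywords=["noise", "traffic"],
)
print(as_script_tag(example))
```

A production record would typically carry further properties (for example licence and spatial coverage) and, in line with Level 2, URIs in place of plain keyword strings. The sketch shows only the packaging step.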
The decision on which level to adopt is not just technical; it will also
need to consider issues of business strategy, manageability, opportunity
costs of scarce technical and non-technical resources, and the benefits
case for the organisation. It will need to look not just at short-term and
pragmatic issues but also at the longer-term benefits of a semantic
approach. It will often be worth preparing a Business Case for the
proposed approach so that there is a clear and sustainable basis for the
formal decision. In any case, two approaches are feasible: an incremental
implementation, i.e. a revolution divided into several smaller steps, on
the one hand, and a complete revolution, i.e. an ‘all in one go’
implementation, on the other hand.
The benefits of semantic approaches increase with the number of
involved stakeholders. To date, the Linked Open Data (LOD) cloud
contains 1301 datasets with 16,283 links. Geosciences are the sixth
most-represented component within the LOD cloud (after life sciences,
government, linguistics, publications, and social networking) with 44
datasets. However, the real number of linked geo datasets seems to be
higher. For instance, OECD Linked Data contains geolocated information,
which is classified as government and not assigned to geosciences.
Future work will focus on alignment of the developed guidelines
with existing best practices for geo (meta)data. Such work remains a
part of the activity of the Metadata Working Group of the Open Geo-
spatial Consortium (OGC). Revisions and amendments to the OGC
GeoDCAT-AP with respect to the outcomes of this paper are one of the
anticipated results.
Funding sources
This research was supported by the European Union's Horizon 2020
research and innovation programme under grant agreement No. 769608
titled Policy Development based on Advanced Geospatial Data Analytics
and Visualisation (PoliVisu) and by the European Union's Horizon 2020
research and innovation programme under grant agreement No. 818346
called Sino-EU Soil Observatory for intelligent Land Use Management.
Author contributions
Tomáš Řezník conducted the study, wrote the text, prepared figures and
tables, and revised the text. Lieven Raes conducted the study, wrote
the text and revised the text. Andrew Stott conducted the study, wrote
the text and revised the text. Bart De Lathouwer wrote and revised the
text. Andrea Perego wrote and revised the text. Karel Charvát revised
the text. Štěpán Kafka conducted the study.
Computer code availability
No specific software/script is related to the presented work.
Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.
References
Clarivate Analytics, 2020. Web of Science.
ANZLIC, 2020. Metadata ANZLIC Spatial Resource Discovery and Access Toolkit 2009.
Augusto, L.M., 2019. Formal Logic: Classical Problems and Proofs. College Publications,
Baiquan, X., Shiqiang, Y., Qianju, W., Jian, L., Xiaoping, W., Keyong, D., 2013.
Geospatial data infrastructure: the development of metadata for geo-information in
China. In: 35th International Symposium on Remote Sensing of Environment
(ISRSE35), Beijing, China. IOP Conference Series: Earth and Environmental
Science, vol. 17, 012259.
Brickley, D., Burgess, M., Noy, N., 2019. Google Dataset Search: building a search engine
for datasets in an open Web ecosystem. In: The World Wide Web Conference. ACM,
New York, NY, USA, pp. 1365-1375.
Brodeur, J., Coetzee, S., Danko, D., Garcia, S., Hjelmager, J., 2019. Geographic
information metadata - an outlook from the international standardization
perspective. ISPRS Int. J. Geo-Inf. 8 (6), 280.
Clarke, M., 2012. In: Campbell, R., Pentz, E., Borthwick, I. (Eds.), The Digital Revolution.
Academic and Professional Publishing. Chandos Publishing, pp. 79-98. https://doi.
Codd, E.F., 1970. A relational model of data for large shared data banks. Commun. ACM
13 (6), 377-387.
Corcho, O., Fernández-López, M., Gómez-Pérez, A., 2003. Methodologies, tools and
languages for building ontologies. Where is their meeting point? Data Knowl. Eng.
46 (1), 41-64.
Da Silva, J.R., Castro, J.A., Ribeiro, C., Honrado, J., Lomba, A., Goncalves, J., 2014.
Beyond INSPIRE: an ontology for biodiversity metadata records. In: Meersman, R.,
et al. (Eds.), On the Move to Meaningful Internet Systems: OTM 2014 Workshops.
OTM 2014, Lecture Notes in Computer Science, vol. 8842. Springer, Berlin,
Heidelberg, pp. 597-607.
Danko, D.M., 2012. Geospatial metadata. In: Kresse, W., Danko, D.M. (Eds.), Springer
Handbook of Geographic Information. Springer, pp. 191-244.
Di, L., Moe, K.L., Yu, G.N., 2009. Metadata requirements analysis for the emerging Sensor
Web. Int. J. Digit. Earth 2, 3-17.
Dublin Core, 2020. Dublin Core Metadata Initiative.
EPSG, 2019., 4326.
European Commission, 2007. Directive 2007/2/EC of the European Parliament and of
the Council of 14 March 2007 establishing an Infrastructure for Spatial Information
in the European Community (INSPIRE).
European Commission, 2008. PDF, EN. Commission Regulation (EC) No 1205/2008 of 3
December 2008 implementing Directive 2007/2/EC of the European Parliament and
of the Council as regards metadata.
FGDC, 1998. Content Standard for Digital Geospatial Metadata (CSDGM). https://www.
Foaf, 2014. FOAF Vocabulary Specification 0.99.
Furner, J., 2020. Definitions of metadata: a brief survey of international standards.
J. Assoc. Inf. Sci. Technol. 71 (6), E33-E42.
GEMET, 2020a. Measuring.
GEMET, 2020b. Noise Analysis.
GeoDCAT-AP, 2015. A Geospatial Extension for the DCAT Application Profile for Data
Portals in Europe.
GeoNetwork, 2020. GeoNetwork.
Giles, J.R.A., 2011. Geoscience Metadata - No Pain, No Gain.
GitHub, 2020. CKAN.
Govedarica, M., Boskovic, D., Petrovacki, D., Ninkov, T., Ristic, A., 2010. Metadata
catalogues in spatial information systems. Geod. List. 4, 313-334. URL: https://h
Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M., 2013. Internet of Things (IoT): a
vision, architectural elements, and future directions. Future Generat. Comput. Syst.
29 (7), 1645-1660.
Habermann, T., 2018. Metadata life cycles, use cases and hierarchies. Geosciences 8 (5),
HarmonISA, 2020. Harmonisation of land-use data.
Harris, R., Olby, N., 2001. Earth observation data archiving in the USA and Europe.
Space Pol. 17, 35-48.
Ho, Q.V., Lundblad, P., Astrom, T., Jern, M., 2011. A web-enabled visualization toolkit
for geovisual analytics. In: Wong, P.C., et al. (Eds.), Proceedings of SPIE, the
International Society for Optical Engineering: SPIE: Electronic Imaging Science and
Technology. Visualization and Data Analysis, San Francisco USA.
10.1117/12.872250, 78680R.
Horrocks, I., 2005. OWL: a description logic based Ontology Language. In: van Beek, P.
(Ed.), Principles and Practice of Constraint Programming - CP 2005. CP 2005,
Lecture Notes in Computer Science, vol. 3709. Springer, Berlin, Heidelberg. https://
Horrocks, I., Patel-Schneider, P.F., van Harmelen, F., 2003. From SHIQ and RDF to OWL:
the making of a web Ontology Language. J. Web Semant. 1, 7-26.
Hu, Y.J., Janowicz, K., Prasad, S., Gao, S., 2015. Metadata topic harmonization and
semantic search for linked-data-driven geoportals: a case study using ArcGIS online.
Trans. GIS 19, 398-416.
INSPIRE, 2020. Registry.
Isaac, A., Haslhofer, B., 2013. Europeana linked open data - data.europeana.eu. Semantic
Web 4 (3), 291-297.
ISO 19115, 2003. ISO 19115:2003 Geographic information - Metadata. https://www.
ISO 19115, 2014. ISO 19115-1:2014 Geographic information - Metadata - Part 1:
ISO 8601-01, 2019. ISO 8601-01:2019 Date and time - Representations for information
interchange - Part 1: Basic rules.
ISO/TS 19139, 2007. ISO/TS 19139:2007 Geographic information - Metadata - XML
schema implementation.
ISO/TS 19139-1, 2019. ISO/TS 19139-1:2019 Geographic information - XML schema
implementation - Part 1: Encoding rules.
Jensen, J., Saalfeld, A., Broome, F., Cowen, D., Price, K., Ramsey, D., Lapine, L., 2000.
Spatial data acquisition and integration.
JoinUp, 2020. GeoDCAT-AP 1.0.1 PDF.
Kalantari, M., Rajabifard, A., Olfat, H., Williamson, I., 2014. Geospatial metadata 2.0-an
approach for volunteered geographic information. Comput. Environ. Urban Syst. 48,
Kokla, M., Guilbert, E., 2020. A review of geospatial semantic information modeling and
elicitation approaches. ISPRS Int. J. Geo-Inf. 9, 146.
Kraak, J.-M., Brown, A., 2000. Web Cartography. CRC Press, p. 213.
Lafia, S., Jablonski, J., Kuhn, W., Cooley, S., Medrano, F.A., 2016. Spatial discovery and
the research library. Trans. GIS 20, 399-412.
Maue, P., Michels, H., Roth, M., 2012. Injecting semantic annotations into (geospatial)
web service descriptions. Semantic Web 3, 385-395.
McCrae, J.P., 2020a. The Linked Open Data Cloud.
McCrae, J.P., 2020b. Organisation for Economic Co-operation and Development (OECD)
Linked Data.
McGee, M., Durante, K., Weimer, K.H., 2017. Applications and projections toward a
linked data model for describing cartographic resources. J. Map Geogr. Libr. 13,
Melton, J., Buxton, S., 2006. Metadata - an overview. In: Melton, J., Buxton, S. (Eds.),
Querying XML. Morgan Kaufmann, Burlington, USA, pp. 67-84.
Moellering, H., Aalders, H.J.G.L., Crane, A., 2005. World Spatial Metadata Standards.
Elsevier, Amsterdam, Netherlands, 710 pp.
Narock, T., Fox, P., 2012. From science to e-Science to Semantic e-Science: a
Heliophysics case study. Comput. Geosci. 46, 248-254.
Neumaier, S., Savenkov, V., Polleres, A., 2018. Geo-semantic labelling of open data.
Procedia Comput. Sci. 137, 9-20.
Nogueras-Iso, J., Zarazaga-Soria, F.J., Bejar, R., Alvarez, P.J., Muro-Medrano, P.R., 2005.
OGC Catalog services: a key element for the development of Spatial Data
Infrastructures. Comput. Geosci. 31, 199-209.
OGC, 2006. OpenGIS® catalogue services - ebRIM (ISO/TS 15000-3) profile of CSW. les/?artifact_id=12604&version=1&format
OGC, 2016. Catalogue Services 3.0 - General Model.
Palma, R., Reznik, T., Esbri, M., Charvat, K., Mazurek, C., 2016. An INSPIRE-based
vocabulary for the publication of agricultural linked data. In: Tamma, V.,
Dragoni, M., Gonçalves, R., Ławrynowicz, A. (Eds.), Ontology Engineering. OWLED
2015, Lecture Notes in Computer Science, vol. 9557. Springer, Cham, pp. 124-133.
PoliVisu, 2020. PoliVisu EU Project - Policy & Data Results Hub. https://www.polivisu.
Raes, L., VanDenbroucke, D., Reznik, T., 2019. GeoDCAT-AP.
Reznik, T., Chudy, R., Micietova, E., 2016. Normalized evaluation of the performance,
capacity and availability of catalogue services: a pilot study based on INfrastructure
for SPatial InfoRmation in Europe (INSPIRE). Int. J. Digit. Earth 9, 325-341. https://
Reznik, T., Charvat, K., Palma, R., Kozuch, D., Cerba, O., Jedlicka, K., Berzins, R.,
Bergheim, R., 2017. Integration of open Land use, smart point of interest and open
transport map using RDF. In: G20 Summit in Berlin, Meeting of Chief Agricultural
Schemaorg, 2020.
Schematron, 2020. Schematron.
Shien-Chiang, Y., Kun-Yung, L., Ruey-Shun, C., 2003. Metadata management system:
design and implementation. Electron. Libr. 21, 154-164.
Sikos, L.F., 2017. RDF-powered semantic video annotation tools with concept mapping
to Linked Data for next-generation video indexing: a comprehensive review.
Multimed. Tool. Appl. 76 (12), 14437-14460.
Smits, P.C., Friis-Christensen, A., 2006. Resource discovery in a European spatial data
infrastructure. IEEE Trans. Knowl. Data Eng. 19, 85-95.
Tagliolato, P., Fugazza, C., Oggioni, A., Carrara, P., 2019. Semantic profiles for easing
SensorML description: review and proposal. ISPRS Int. J. Geo-Inf. 8, 340. https://
W3C, 2010. XML Linking Language (XLink) Version 1.1.
W3C, 2014a. RDF.
W3C, 2014b. RDF 1.1 Turtle.
W3C, 2014c. RDF 1.1 XML Syntax.
W3C, 2015a. Semantic Web.
W3C, 2015b. HTML+RDFa 1.1 - second edition. Support for RDFa in HTML4 and
W3C, 2017a. Semantic Sensor Network Ontology.
W3C, 2017b. Spatial Data on the Web Best Practices.
W3C, 2020a. JSON-LD 1.1.
W3C, 2020b. Data Catalog Vocabulary (DCAT) - Version 2.
Weibel, S., Godby, J., Miller, E., 1995. OCLC/NCSA metadata workshop report.
Wilkinson, M., Dumontier, M., Aalbersberg, I., et al., 2016. The FAIR Guiding Principles
for scientific data management and stewardship. Sci. Data 3, 160018. https://doi.
Wilson, A., Cox, M., Elsborg, D., Lindholm, D., Traver, T., 2014. A semantically enabled
metadata repository for scientific data. Earth Science Informatics 8, 649-661.
Zhang, M.D., Yuan, J., Gong, J.Y., Yue, P., 2013. An interlinking approach for linked
geospatial data. Int. Arch. Photogram. Rem. Sens. Spatial Inf. Sci. XL-7/W2,
Conference Paper
There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others' work, and providing data journalists easier access to information and its provenance. In this paper, we discuss Google Dataset Search, a dataset-discovery tool that provides search capabilities over potentially all datasets published on the Web. The approach relies on an open ecosystem, where dataset owners and providers publish semantically enhanced metadata on their own sites. We then aggregate, normalize, and reconcile this metadata, providing a search engine that lets users find datasets in the “long tail” of the Web. In this paper, we discuss both social and technical challenges in building this type of tool, and the lessons that we learned from this experience.