4as Jornadas de Engenharia Hidrográfica, Lisbon, 21-23 June 2016
Data discovery mechanisms and metadata handling in RAIA Coastal
Observatory
Artur Rocha (1), Marco Amaro Oliveira (1), Filipe Freire (1), Gabriel David (1), Pedro Monteiro
Vilar (2), Begoña Vila Taboada (2), Isabel Iglesias (3), Clara Lázaro (3), Luísa Bastos (3), Ilmer van
Golde (4), A. Jorge Silva (4)
(1) Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência (INESC TEC),
artur.rocha@inesctec.pt.
(2) Instituto Tecnolóxico para o Control do Medio Mariño de Galicia (INTECMAR).
(3) Centro Interdisciplinar de Investigação Marinha e Ambiental (CIIMAR).
(4) Instituto Hidrográfico.
Abstract: Effective metadata handling is decisive for successful data discovery among the
participating organizations and in the context of a Distributed Oceanographic Observatory. However, in any
distributed system, harvesting consistent metadata from data services implemented by distinct providers is
not without obstacles. This paper describes how the authors are dealing with these issues within a service
ecosystem that aims to interoperate with analogous infrastructures at a global scale. The topics discussed
are: i) the adoption of controlled vocabularies; ii) the use of standard encodings for both data and
metadata; and iii) the use of service implementations that conform to the INSPIRE technical
recommendations. The paper also addresses how to keep the process of metadata inclusion efficient by
maximizing the metadata descriptors that can be automatically harvested from data, thus minimizing the
impact on data producers.
Key words: THREDDS, INSPIRE, catalogue service, nearshore forecasts, metadata, RAIA.
1. INTRODUCTION
Oceanographic Observatories such as RAIA are a
joint effort undertaken by several organizations,
distinct in competences and geographically dispersed,
that bring together specific thematic knowledge
along with established practices for dealing with
their data.
In this context, adherence to the technical standards
and recommendations that enable the
implementation of the INSPIRE directive goes beyond
the need to discover and invoke services. That need
could be met in an ad hoc way, but doing so would
pose serious threats to cross-referencing data
produced at different sites. Therefore, the pursuit of
truly interoperable and harmonized services, as
foreseen in the INSPIRE implementation
recommendations (Council of European Union, 2007)
and in annexes 6 and 7 of the directive, has been an
underlying principle of the RAIA Coastal Observatory
since its start (Vila et al., 2012).
The Observatory includes several data and service
providers, which are themselves consumers of the
data and services delivered by their peers. In such
an architecture, each provider acts as an independent
node of the Observatory's infrastructure.
As a node, each provider can not only have its own
“customers”, but can also act as a link in the value
chain that provides added-value products and services
to end-users. Furthermore, each node can act as a
“display case” or even a broker for products and
services delivered by other nodes.
From the computational perspective, this builds up
to a federated distributed computing infrastructure
which, in order to interoperate properly, relies on a
set of standards for the Application Programming
Interfaces (APIs) used to query both data and
metadata, as well as for the message formats and
encodings that these APIs accept and return.
Several standards are in use, since the RAIA Coastal
Observatory needs to deal with a variety of
geographical data of different natures, from
observations to forecasts, in both discrete and
aggregated form, and must also account for distinct
ways of exploring and enhancing this data according
to the needs of target user groups.
However, aside from the proper implementation of
these standards, the quality and completeness of the
metadata describing the services and the data they
contain are of paramount importance to the overall
quality of the resulting infrastructure, since they
influence the user’s ability to find and effectively
make use of the offered products and services.
2. PROBLEM DESCRIPTION
Past attempts revealed that setting up a catalogue
service relying on geographically distributed data
sources may lead to a poor end-user experience,
even if the sources comply with standard APIs, such
as Web Map Service (WMS) (OGC, 2006), Web
Feature Service (WFS) (OGC, 2010) or Web
Coverage Service (WCS) (OGC, 2012a), and
return results that comply with standard schemas.
This is because most data providers are driven by
the goal of delivering “maps on the web”, using
either free or commercial implementations of the
standards, without being sensitive to the rationale
behind good metadata.
On the other hand, the process of duly annotating
published datasets can be tedious and time-consuming
unless it is properly automated to maximize what
can be directly extracted from the data.
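The kind of gap described above can be illustrated with a minimal sketch that audits a WMS 1.3.0 capabilities document for empty descriptive fields. It is not the tooling used in RAIA: the sample document, the layer names and the choice of Abstract and KeywordList as the descriptors to check are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of WMS 1.3.0 capabilities documents.
WMS_NS = {"wms": "http://www.opengis.net/wms"}

# Hypothetical, trimmed-down capabilities response: schema-valid in spirit,
# yet with little information a catalogue user could search on.
SAMPLE_CAPABILITIES = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Service>
    <Name>WMS</Name>
    <Title>Example node</Title>
    <Abstract></Abstract>
  </Service>
  <Capability>
    <Layer>
      <Title>All layers</Title>
      <Layer>
        <Name>sst_forecast</Name>
        <Title>Sea surface temperature forecast</Title>
        <Abstract>Daily forecast of sea surface temperature.</Abstract>
        <KeywordList><Keyword>temperature</Keyword></KeywordList>
      </Layer>
      <Layer>
        <Name>layer_17</Name>
        <Title>layer_17</Title>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""


def missing_descriptors(element):
    """Return the descriptive fields that are absent or empty on an element."""
    missing = []
    abstract = element.findtext("wms:Abstract", default="", namespaces=WMS_NS)
    if not abstract.strip():
        missing.append("Abstract")
    if element.find("wms:KeywordList/wms:Keyword", WMS_NS) is None:
        missing.append("KeywordList")
    return missing


def audit_capabilities(xml_text):
    """Map the service and each named layer to its missing descriptors."""
    root = ET.fromstring(xml_text)
    report = {}
    service_abstract = root.findtext(
        "wms:Service/wms:Abstract", default="", namespaces=WMS_NS
    )
    if not service_abstract.strip():
        report["<service>"] = ["Abstract"]
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.findtext("wms:Name", namespaces=WMS_NS)
        if name is None:
            continue  # container layers without a Name cannot be requested
        gaps = missing_descriptors(layer)
        if gaps:
            report[name] = gaps
    return report


if __name__ == "__main__":
    for target, gaps in audit_capabilities(SAMPLE_CAPABILITIES).items():
        print(target, "is missing", ", ".join(gaps))
```

Run against the sample, the audit flags the service abstract and the layer named layer_17, while the fully described layer passes.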
3. METHODS AND IMPLEMENTATION
As previously stated, the Observatory needs to deal
with geographical data of different natures, so it was
established that providers would use the adequate
implementation for each, in accordance with INSPIRE
technical recommendations (INSPIRE, 2014).
For well-known interfaces such as WMS, WFS and
WCS, popular implementations such as GeoServer
(http://geoserver.org/) have been used. Nevertheless,
particular attention was paid to filling in service
metadata as foreseen in the INSPIRE Metadata
Annexes for the respective themes, such as
Ocean Features and Meteorological Conditions.
This had a strong impact on the quality of the
metadata harvested by the geo-catalogue
implementation, based on GeoNetwork
(http://geonetwork-opensource.org/).
Observation data resulting from sensor systems such
as buoys, meteorological stations or wind towers
were conveyed to implementations of the Sensor
Observation Service (SOS) standard (OGC, 2012b),
which returns sensor data using the Observations
and Measurements (O&M) standard (OGC, 2011).
The major advantage of this standard is the ability to
store, in a semantically rich way, any kind of
observation about any feature (a real-world
phenomenon or an indirect way to observe it),
comprising one or several observed properties, along
with a description of the process used to make the
observation, such as the characteristics of the sensor,
algorithm or system of these used to observe the
properties. In summary, the use of this standard
ensures that observations of any application domain
are stored and published in a self-describing way.
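The elements of an observation listed above can be sketched as a small data structure. This is only an illustration of the concepts (feature of interest, observed property, procedure, time and result), not the O&M XML schema or the SOS implementation used by the providers; all identifiers and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Procedure:
    """The sensor, algorithm or system used to make the observation."""
    identifier: str
    description: str


@dataclass(frozen=True)
class Observation:
    """One observed property of one feature at one instant."""
    feature_of_interest: str   # what is being observed
    observed_property: str     # which property of the feature
    procedure: Procedure       # how the value was obtained
    phenomenon_time: datetime  # when the value applies
    result: float
    unit: str

    def describe(self) -> str:
        """Render the observation without needing any external context."""
        return (
            f"{self.observed_property} = {self.result} {self.unit} "
            f"at {self.feature_of_interest} "
            f"({self.phenomenon_time.isoformat()}, via {self.procedure.identifier})"
        )


# Hypothetical reading from a buoy-mounted temperature sensor.
example = Observation(
    feature_of_interest="buoy-A",
    observed_property="sea_water_temperature",
    procedure=Procedure("ctd-01", "hypothetical CTD sensor"),
    phenomenon_time=datetime(2016, 6, 21, 12, 0, tzinfo=timezone.utc),
    result=16.4,
    unit="degC",
)

if __name__ == "__main__":
    print(example.describe())
```

Because every record carries its own feature, property and procedure, a consumer can interpret it without prior knowledge of the producing node, which is the self-describing quality referred to above.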
Graphical interfaces to explore and download the
contents of the SOS have been developed (Fig. 1)
and its contents are also continuously harvested by
the geo-catalogue implementation.
Forecasts and other multi-variable aggregated data
are often voluminous, since they usually encompass
several high-resolution geographic coverages for
multiple time frames. The O&M standard is therefore
presently not the most efficient way to encode them.
For these, we opted to use the Network Common
Data Form (NetCDF) (Unidata, 2016a) encoding, as
it is an increasingly popular format that is both
efficient and semantically rich.
Fig. 1. Sensor Observation Service Client.
In fact, the NetCDF file header can accommodate
the metadata required by both the Common Data
Format (CDF) (NASA, 2015) and ISO 19115-2 (ISO,
2009) standards, and can also make use of the CF
(Climate and Forecast) (Eaton et al., 2011)
conventions for attribute names and units, which
seem to suit most uses and include clear rules on
how to generate new attribute names. It was agreed
that data providers would use CF as the controlled
vocabulary to describe data, using version 1.4 or
newer.
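A minimal sketch of how such an agreement could be checked is given below. It assumes the header has already been read into plain dictionaries (for instance with a NetCDF library, which is not shown), and the specific rules applied (a Conventions attribute of CF-1.4 or newer, a standard_name or long_name, and units on every variable) are a simplified, illustrative subset of CF rather than a full compliance check. The sample attributes are hypothetical.

```python
import re

# Minimum CF version agreed among providers, as stated in the text.
MIN_CF_VERSION = (1, 4)


def cf_version(conventions):
    """Extract (major, minor) from a Conventions string such as 'CF-1.6'."""
    match = re.search(r"CF-(\d+)\.(\d+)", conventions or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_header(global_attrs, variables):
    """Return a list of human-readable problems found in a header."""
    problems = []
    version = cf_version(global_attrs.get("Conventions"))
    if version is None:
        problems.append("global: Conventions attribute missing or not CF")
    elif version < MIN_CF_VERSION:
        problems.append(f"global: CF-{version[0]}.{version[1]} is older than CF-1.4")
    for name, attrs in variables.items():
        if "standard_name" not in attrs and "long_name" not in attrs:
            problems.append(f"{name}: no standard_name or long_name")
        if "units" not in attrs:
            problems.append(f"{name}: no units")
    return problems


# Hypothetical header contents of a forecast file.
SAMPLE_GLOBAL = {"Conventions": "CF-1.6", "title": "Example forecast"}
SAMPLE_VARIABLES = {
    "temp": {"standard_name": "sea_water_temperature", "units": "degC"},
    "u": {"long_name": "eastward velocity"},
}

if __name__ == "__main__":
    for problem in check_header(SAMPLE_GLOBAL, SAMPLE_VARIABLES):
        print(problem)
```

A check of this kind can run when a file is published, so that gaps are reported to the data producer before the catalogue harvests the dataset.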
Providers deployed and configured Unidata’s
THREDDS Data Servers (TDS) (Unidata, 2016b) to
publish this type of data. TDS implementations offer
a large variety of standard APIs to get forecast data,
as well as to verify the completeness of the
associated metadata (see Fig. 2).
Fig. 2. THREDDS Data Server: detail on metadata.
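As an illustration of how a client or harvester might enumerate what a TDS publishes, the sketch below parses a THREDDS catalogue document and derives OPeNDAP access URLs. The catalogue content, host name and dataset names are hypothetical, and the sketch assumes a single flat OPeNDAP service entry whose serviceType is spelled exactly as in the sample; real catalogues may nest services or use a different spelling.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS inventory catalogues.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical, trimmed-down catalogue document.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Example forecasts" version="1.0.1">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Forecasts" ID="forecasts">
    <dataset name="waves_20160621.nc" ID="forecasts/waves_20160621.nc" urlPath="forecasts/waves_20160621.nc"/>
    <dataset name="waves_20160622.nc" ID="forecasts/waves_20160622.nc" urlPath="forecasts/waves_20160622.nc"/>
  </dataset>
</catalog>"""


def list_opendap_urls(catalog_xml, host):
    """Map each accessible dataset name to its OPeNDAP URL on the given host."""
    root = ET.fromstring(catalog_xml)
    service = root.find(f".//{{{THREDDS_NS}}}service[@serviceType='OPENDAP']")
    if service is None:
        return {}
    base = service.get("base")
    urls = {}
    for dataset in root.iter(f"{{{THREDDS_NS}}}dataset"):
        path = dataset.get("urlPath")
        if path:  # container datasets have no urlPath and are skipped
            urls[dataset.get("name")] = f"{host}{base}{path}"
    return urls


if __name__ == "__main__":
    for name, url in list_opendap_urls(SAMPLE_CATALOG, "https://thredds.example.org").items():
        print(name, "->", url)
```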
Forecast data and similar products are also
harvested by the geo-catalogue implementations.
Fig. 3 illustrates the geo-catalogue of Instituto de
Engenharia de Sistemas e Computadores, Tecnologia
e Ciência (INESC TEC), although a similar service
exists at Instituto Tecnolóxico para o Control do
Medio Mariño de Galicia (INTECMAR). In fact,
both catalogues index each other, in a process called
catalogue federation.
Fig. 3. Geo-catalogue interface example.
4. FINAL CONSIDERATIONS
Although the implementation of a Coastal
Observatory is a permanent work in progress,
due to the increasing number of data sources
and the diversity of new products and services,
the applied methods have improved the ability
of end users to find and use the resources and
value-added services made available by RAIA.
Furthermore, the adoption of standard
encodings and APIs increases the complexity
of the implementation (and may slow its pace),
but this is largely compensated by enabling the
realization of a system of systems composed of
similar infrastructures across the world.
Acknowledgements
This work is financed by the European Regional
Development Fund (ERDF) through the
Operational Programme for Competitiveness
and Internationalisation (COMPETE
Programme), and by National Funds through the
Fundação para a Ciência e a Tecnologia (FCT,
Portuguese Foundation for Science and
Technology) within projects
POCI-01-0145-FEDER-006961, 0520-RAIA-CO-1-E
and 0688-RAIATEC-1-P.
REFERENCES
Council of European Union (2007). Council
Directive 2007/2/EC of the European
Parliament and of the Council, of 14
March 2007 establishing an Infrastructure
for Spatial Information in the European
Community (INSPIRE), available online
http://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32007L0002,
published on 2007-03-14.
Eaton, B., Gregory, J., Drach, B., Taylor, K.,
Hankin, S., Caron, J., Signell, R., Bentley,
P., Rappa, G., Höck, H., Pamment, A.,
Juckes, M. (2011). NetCDF Climate and
Forecast (CF) Metadata Conventions,
available online
http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.pdf,
published on 2011-12-05.
INSPIRE (2014). INSPIRE Data Specifications
Drafting Team, Technical Guidance for
INSPIRE Spatial Data Services and
services allowing spatial data services to
be invoked, version 3.1, available online
http://inspire.ec.europa.eu/index.cfm/pageid/241/documentid/3510,
published on 2014-12-17.
ISO (2009). International Organization for
Standardization, ISO 19115-2:2009
Geographic information - Metadata - Part 2:
Extensions for imagery and gridded data,
published on 2009-02-15.
NASA (2015). CDF User's Guide, NASA /
Goddard Space Flight Center, available
online
http://cdaweb.gsfc.nasa.gov/pub/software/cdf/doc/cdf361/cdf361ug.pdf,
published on 2015-09-20.
OGC (2006). Open Geospatial Consortium,
Web Map Server Implementation
Specification, version 1.3.0 (also ISO
19128), Doc: 06-042, available online
http://portal.opengeospatial.org/files/?artifact_id=14416,
published on 2006-03-15.
OGC (2010). Open Geospatial Consortium,
Web Feature Service 2.0 Interface Standard,
version 2.0.0 (also ISO 19142),
Doc: 09-025r1, available online
http://portal.opengeospatial.org/files/?artifact_id=39967,
published on 2010-11-02.
OGC (2011). Open Geospatial Consortium,
Observations and Measurements - XML
Implementation, available online
http://portal.opengeospatial.org/files/?artifact_id=41510,
published on 2011-03-22.
OGC (2012a). Open Geospatial Consortium,
WCS 2.0 Interface Standard - Core, version
2.0.1, Doc: 09-110r4, available online
https://portal.opengeospatial.org/files/09-110r4,
published on 2012-07-12.
OGC (2012b). Open Geospatial Consortium,
OGC® Sensor Observation Service Interface
Standard, OGC 12-006, available online
https://portal.opengeospatial.org/files/?artifact_id=47599,
published on 2012-04-20.
Unidata (2016a). Unidata, Network Common
Data Form, version 4.4.3, available online
http://doi.org/10.5065/D6H70CW6,
published on 2016-02-21.
Unidata (2016b). Unidata, THREDDS Data
Server version 4.6.4, available online
http://doi.org/10.5065/D6N014KG,
published on 2016-02-16.
Vila, B., Gómez, A., Cortizas, C., Díaz, P.,
Oliveira, M.A., Rocha, A., Méndez,
X. (2012). RAIA Observatory: Visualization
of Oceanographic Data under INSPIRE
Directive, in: Atas das II Jornadas de
Engenharia Hidrográfica, Instituto
Hidrográfico, 2012.