D1.5 DATA MANAGEMENT PLAN
Grant Agreement Nr: 857840
Deliverable Leader: Wärtsilä
Related Task(s): T1.3 Ensure compliance with data management principles, including commercial and regulatory aspects
Author(s): Lokukaluge Prasad Perera
Dissemination level: Public
Due Submission Date: 30.11.2020
Actual Submission: 30.11.2020
Status: Updated version v1.2
The opinions expressed in this document reflect only the author’s view and in no way reflect the European
Commission’s opinions. The European Commission is not responsible for any use that may be made of the
information it contains.
SEATECH-D1-4-Nov30-DataManagementPlan-v1.2
Contents
Executive Summary
Version and contribution control
1 Data Summary
1.1 Data purpose
1.2 Data type
1.3 Data size
1.4 Data protection officer
2 FAIR data
2.1 Making data findable, including provisions for metadata
2.2 Making data openly accessible
2.3 Making data interoperable
2.4 Increase data re-use (through clarifying licenses)
3 Allocation of resources
4 Data security
5 Ethical and legal aspects
6 References
Executive Summary
This document presents the data management plan (DMP) that has been developed for the SeaTech project. This DMP describes all the data sets that will be generated and collected under this project and explains how they will be exploited and whether they will be shared for verification and re-use. The SeaTech project proposes to develop two symbiotic ship engine and propulsion innovations which, when combined, lead to an increase in fuel efficiency and a radical emission reduction. Both innovations will collect a considerable number of data sets that will be combined and analysed under the advanced data analytics framework (ADAF). However, only selected data sets from these innovations will be shared as open access research data, as outlined in the initial project proposal, with the consent of the data protection officer. Furthermore, this document describes the data: what the data sources are, how the data will be stored and backed up, how and where they can be downloaded, who owns them and who is responsible for the different data sets. One should note that this DMP has been developed in accordance with the Horizon 2020 DMP template guidelines. In general terms, the research data sets should follow the 'FAIR' principles, that is, they should be findable, accessible, interoperable and re-usable, in accordance with the same guidelines. This document will be updated as the project progresses.
Version and contribution control
Version | Date | Modified by | Modification description
V1.1 | 23.11.2020 | Lokukaluge Prasad Perera | This version of the DMP was published as part of the respective deliverable on the management of the project data sets.
V1.2 | 24.11.2020 | Anders Öster and Jukka Tiainen | Checked and approved.
V1.2 | 30.11.2020 | Lokukaluge Prasad Perera | The next version is published.
All logos, trademarks, images and brand names used herein are the property of their respective owners. Images used are for illustration purposes only.
This work is licensed under the Creative Commons License “BY-NC-SA”.
1 Data Summary
1.1 Data purpose
The SeaTech project proposes to develop two symbiotic ship engine and propulsion innovations which, when combined, lead to an increase in fuel efficiency and a radical emission reduction. The proposed renewable-energy-based propulsion innovation consists of a bio-mimetic dynamic wing mounted at the ship bow, which augments ship propulsion in moderate and higher sea states by capturing wave energy, producing extra thrust and damping ship motions. The proposed power generation innovation consists of an engine that achieves ultra-high energy conversion efficiency by precisely controlling its combustion process to obtain optimally reduced emissions. The ultimate objective of the SeaTech project is to scale up both technologies, demonstrate them in a relevant environment and combine the expected complementarities and synergy effects of deploying both innovations in a short-sea vessel scenario by extrapolating demonstration data with the help of an Advanced Data Analytics Framework (ADAF). The ADAF will be developed as a software module in a data science environment, where machine learning (ML) and artificial intelligence (AI) algorithms will be implemented with various data sets from laboratory tests, sea trials and industrial processes to quantify the engine and propulsion innovations proposed under the SeaTech project.
Figure 1: Advanced data analytics framework.
An overview of the ADAF is presented in Fig. 1. It consists of three main sections: life cycle and cost analysis (LCCA), data pre-processing and data post-processing. Together these represent the overall data flow and utilization process of this project. In the LCCA section, the costs associated with bunker fuel, the engine innovation, the propeller technology and the bio-mimetic dynamic wing, from materials to product disposal, will be estimated. This information, along with the engine and propulsion innovation data sets collected in laboratory tests and large-scale model sea trials, will be delivered to the next section, data pre-processing.
The data pre-processing section consists of the main steps of data handling, mapping and quality assurance for the large-scale industrial sensor and business process data sets. These data will be scaled to a selected vessel on an assigned ship route, with the respective wind, wave and current conditions. The engine-propeller combinator diagram can play an important role in this section by providing a basis for the respective data sets. Hence, the various data sets collected in the previous sections will be clustered with respect to the engine operational modes and then the trim and draft conditions, and the respective parameter correlations in each data cluster will be investigated with respect to a selected set of key performance indicators (KPIs). Such KPIs will represent the main comparison indexes: propulsion power vs. engine emissions; system installation, operational, maintenance and disposal costs vs. benefits; and environmental impact. Therefore, the respective KPIs can be used to quantify the performance of both the engine and propulsion innovations, and this performance can also be compared with various energy efficiency and emission control technology considerations in shipping.
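As an illustration only, the clustering and KPI steps described above could be sketched as follows. The load thresholds, the field names and the sample values are hypothetical and are not taken from the SeaTech data sets, and the KPI shown (fuel consumption per unit of engine power) is only one possible choice; the actual ADAF may cluster and score the data differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical engine-load thresholds (fraction of rated power) per mode.
MODE_LIMITS = [(0.25, "low_load"), (0.75, "medium_load"), (1.01, "high_load")]


def operational_mode(load_fraction):
    """Map an engine load fraction to an assumed operational mode."""
    for upper, mode in MODE_LIMITS:
        if load_fraction < upper:
            return mode
    return "overload"


def kpi_by_mode(records, rated_power_kw):
    """Cluster records by mode and average the fuel use per kWh in each cluster."""
    clusters = defaultdict(list)
    for rec in records:
        mode = operational_mode(rec["power_kw"] / rated_power_kw)
        clusters[mode].append(rec["fuel_kg_per_h"] / rec["power_kw"])
    return {mode: mean(values) for mode, values in clusters.items()}


# Made-up sample measurements for a hypothetical 5000 kW engine.
sample = [
    {"power_kw": 1000.0, "fuel_kg_per_h": 220.0},
    {"power_kw": 3000.0, "fuel_kg_per_h": 600.0},
    {"power_kw": 3500.0, "fuel_kg_per_h": 700.0},
    {"power_kw": 4500.0, "fuel_kg_per_h": 855.0},
]
kpis = kpi_by_mode(sample, rated_power_kw=5000.0)
for mode, value in sorted(kpis.items()):
    print(f"{mode}: {value:.3f} kg/kWh")
```

In a fuller version, the clusters would be split further by trim and draft, and each cluster would carry several KPIs rather than one.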
One should note that these innovations will not be tested under full-scale ocean-going conditions in this project; that will be a future step taken by Wärtsilä under its product and solution development program. Since the engine innovation will be tested under limited laboratory conditions and the propulsion innovation under (large) model-scale sea trial conditions, it would be difficult to quantify both innovations as an integrated solution. Therefore, this project proposes to augment the data sets that are collected by both innovations under the ADAF on the basis of an engine-propeller combinator diagram. The respective data layers that will be collected under this project to support the ADAF are summarised in Fig. 2.
Figure 2: The data layers of the ADAF.
1.2 Data type
This project will collect several data types, which are presented in the following table.
Data source | Data type | Contributor
1. Engine testing/measurement data from the engine innovation | Sensor data | Wärtsilä
2. LCCA calculation data from the product development and retrofitting operations | Product cost and retrofitting calculation data (i.e. business process data) | Wärtsilä & Huygens Engineers
3. Model-scale vessel measurement data from the propulsion innovation (the bio-mimetic dynamic wing) | Sensor data | NTUA
4. Full-scale ship performance and navigation data from selected ocean-going vessels | Sensor data | Utkilen
Table 1: Data types collected by the project.
The data set in item (1) of Table 1 will be collected from the development and testing of the engine innovation under laboratory conditions. The data set in item (2) of Table 1 already exists in part at Wärtsilä and Huygens Engineers. It is expected that such data sets, i.e. from product development and retrofitting operations, can be obtained from the respective project partners. Furthermore, these data can also be obtained from the existing literature, i.e. research studies, if required. The data set in item (3) of Table 1 will be collected from the development and testing of the propulsion innovation during the model-scale sea trials. The data set in item (4) of the same table will be collected from ocean-going full-scale vessels with their existing sensors and data acquisition systems.
1.3 Data size
These data sets are expected to be in the range of 1-10 GB in size, since they consist of numerical values. One should note that these data sets will be analysed under the proposed ADAF, which will be developed by UiT with the support of NTUA.
1.4 Data protection officer
Due to the commercial nature of this project, some data sets may be categorized as confidential by the data protection officer, who has been appointed by the consortium. Jukka Tiainen has been nominated as the data protection officer for this project.
2 FAIR data
2.1 Making data findable, including provisions for metadata
The research data sets should follow the 'FAIR' principles, that is, they should be findable, accessible, interoperable and re-usable. Hence, the following approaches will be initiated by the consortium to support the 'FAIR' principles. The SeaTech project data sets will be archived in a database supported and managed by the UiT University Library (see footnote 1) through the UiT open research data service. The archive is based on the Dataverse software, with metadata templates that comply with the requirements of DataCite (http://schema.datacite.org/). These data sets will be assigned unique Digital Object Identifiers (DOIs). The research data may be linked to the corresponding publications of this project, and vice versa, via their DOIs.
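To make the metadata requirement more concrete, the following minimal sketch checks a record against the six mandatory properties of the DataCite Metadata Schema (version 4.x): identifier, creator, title, publisher, publication year and resource type. The dictionary keys are informal labels rather than exact schema element names, and the record itself, including the placeholder DOI, is hypothetical. In practice the Dataverse metadata templates are expected to enforce such checks at deposit time.

```python
# Mandatory properties of the DataCite Metadata Schema (version 4.x),
# written here as informal dictionary keys.
DATACITE_MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_fields(record):
    """Return the mandatory properties that are absent or empty in a record."""
    return [key for key in DATACITE_MANDATORY if not record.get(key)]


# Hypothetical record: the DOI, creator and title are placeholders only.
record = {
    "identifier": "10.xxxx/EXAMPLE",
    "creators": ["SeaTech consortium"],
    "titles": ["Example engine test data set"],
    "publisher": "",  # deliberately left empty to show the check
    "publicationYear": 2020,
    "resourceType": "Dataset",
}
print(missing_fields(record))
```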
According to the principle of "Good Scientific Practice", data files cannot be changed after they have been published. However, several versions of the same data set can be introduced in such situations under the UiT open research data service. One of the core functions of the proposed archive is versioning, which allows new versions of published records to be added while previous versions are kept available (e.g. in order to guarantee correct citation). A new DOI is assigned to every new version. Previous and new versions are linked to each other automatically. Another core function is the possibility to set an ‘embargo’ (i.e. a blocking period) to block the publication of full texts and research data until a certain date.
Several naming conventions that are used by the shipping industry will be adopted to name the respective parameters of the data sets. One should note that these data sets often use the most common parameter names in the maritime industry, e.g. engine power, rpm, fuel consumption, speed over ground, speed over water, and relative wind speed and direction. Hence, these data sets will follow the same naming structure during this project.
The following keywords will be used in the respective data sets, which will make it possible for researchers to re-use the data sets in future studies.
Data source | Keywords
1. Engine testing/measurement data for the engine innovation | Maritime, shipping, marine engine data
2. LCCA calculation data from the product development and retrofitting operations | Maritime, shipping, retrofit data, product development, engine cost, propeller cost
3. Model-scale vessel measurement data from the propulsion innovation | Maritime, shipping, model-scale data, sea trials
4. Full-scale ship performance and navigation data from selected ocean-going vessels | Maritime, shipping, full-scale data
Table 2: Keywords for data sources.
Footnote 1: URL: https://dataverse.no/dataverse/uit
One of the principles of the repository will be to use standard interfaces, protocols, metadata, etc. For data exchange, standard interfaces and standard protocols such as OAI-PMH and SWORD (Simple Web-service Offering Repository Deposit) will be used in this project. Using standard metadata schemas in the repository means that metadata can easily be converted into other metadata schemas. Where possible, standard vocabularies and machine-readable file formats can be used when storing the project research data. In the course of the quality check after the submission of the data, the UiT library staff will verify whether the data correspond to the metadata standard.
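For readers unfamiliar with OAI-PMH, the short sketch below shows how a metadata harvesting request is formed. The six verbs and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 protocol; the base URL is a placeholder and not the actual endpoint of the UiT service, and no network request is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI-PMH 2.0 protocol.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Placeholder endpoint; a real repository publishes its own base URL.
url = oai_request_url(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

A harvester would send this URL as an HTTP GET request and parse the XML response, which is how external catalogues can index the published data sets.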
2.2 Making data openly accessible
The consortium is aware of the mandate for open access to publications in H2020 projects and of its participation in the Open Research Data Pilot. Selected data sets from Table 1, i.e. representative data sets from each item in the table, will be made available to the public through the UiT open research data service. This is a certified data deposit service, and its use for the SeaTech project has been discussed with the UiT library, as mentioned before. The data sets can be downloaded from this service without additional software. One should note that the authors have not observed any restrictions or limitations on accessing these data sets so far.
Due to the commercial nature of this project, the complete data sets from this project, as presented in Table 1, will not be available to the public. The selected data sets will be presented to the data protection officer and, after his consent, the representative data sets will be published. The data protection officer will ensure that historical third-party data, data generated within the project and other types of data are handled according to regulatory and commercial principles. This applies especially to the data sets from the commercial partners, such as Wärtsilä, Huygens Engineers and Utkilen.
The proposed ADAF will be developed as a software toolbox, the so-called ‘ShipAI’, that can be used to analyse these data sets. This toolbox will not be available to the public; therefore, the respective code will not be open source. The required IP rights for this software toolbox will be obtained by UiT in the future. However, the code will be shared with selected universities, including NTUA, to support their research projects.
There will be no data access committee for this project; the data protection officer takes this role instead. This officer will decide on any access limitations on the data sets, if required. The web service will track the IP addresses of the persons who download these data sets; therefore, no additional registration will be required under the UiT open research data service.
2.3 Making data interoperable
The data sets in the project will be interoperable, which allows data exchange and re-use between researchers, institutions, organizations, countries, etc. The data sets will be available in the CSV (comma-separated values) format as text files. This format makes it easier to combine them with other data sets, not only from this project but also from other available data sources, e.g. weather data sets from external providers. Since these data sets are limited to a selected set of parameters, no standard vocabularies, standards, methodologies or specific ontologies will be used in this project.
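The following sketch illustrates how CSV files of this kind could be combined with an external weather data set using only a shared timestamp column. The column names and values are invented for the example and do not reflect the actual SeaTech file layout.

```python
import csv
import io

# Invented sample content standing in for a ship data file.
ship_csv = """timestamp,engine_power_kw,speed_over_ground_kn
2020-11-01T00:00:00Z,3200,12.1
2020-11-01T01:00:00Z,3350,12.4
2020-11-01T02:00:00Z,3300,12.3
"""

# Invented sample content standing in for an external weather file.
weather_csv = """timestamp,wind_speed_ms,wave_height_m
2020-11-01T00:00:00Z,7.5,1.2
2020-11-01T01:00:00Z,9.0,1.6
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_timestamp(left, right):
    """Inner-join two row lists on the shared 'timestamp' column."""
    lookup = {row["timestamp"]: row for row in right}
    return [
        {**row, **lookup[row["timestamp"]]}
        for row in left
        if row["timestamp"] in lookup
    ]


merged = join_on_timestamp(read_rows(ship_csv), read_rows(weather_csv))
for row in merged:
    print(row)
```

Rows without a matching weather record are dropped here; whether to drop, interpolate or keep such rows is a design choice that depends on the analysis.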
2.4 Increase data re-use (through clarifying licenses)
Several licensing options are available in the UiT open research data service. It has been decided to apply the MIT Creative Commons License, unless otherwise noted in the data sets. Therefore, the data sets can be used for non-commercial research and development studies. For commercial use of the SeaTech project data, the respective party should contact the consortium. The partner who has shared the respective data sets will then make the final decision on the sharing and contracting requirements with the third party.
The data sets will be uploaded throughout the project life cycle. In general, the data sets will be uploaded within two weeks after the laboratory experiment or sea trial. This period will be used by the data protection officer to decide whether the data sets can be shared with the public. These data sharing updates will be announced to the general public through the social media channels of LinkedIn (https://www.linkedin.com/company/seatech2020/) and ResearchGate (https://www.researchgate.net/project/SeaTech-Horizon2020-Project-Next-generation-short-sea-ship-dual-fuel-engine-and-propulsion-retrofit-technologies). Under the current conditions of the UiT open research data service, the service is free for an indefinite time period. Therefore, the consortium expects these data sets to remain online for an indefinite time period.
Data quality is an important part of any data analysis. Hence, data anomaly detection and recovery steps will be implemented under the ADAF. These data sets will therefore be adequately cleaned by the ADAF during this project, and such information will also be included in the metadata section. As a result, the persons who download these data sets will have access to much cleaner parameter values. Furthermore, some information on the data anomalies and recovery steps will be published under this project and will be available to the public as conference and journal papers. One should note that machine-readable electronic copies of conference and journal papers (the final version or the final peer-reviewed manuscript accepted for publication) will be made available through open access publishing (gold open access) or, after the embargo period, through green open access. These publications will also be shared on the same social media channels, LinkedIn and ResearchGate, where the respective data sets and possible download locations will be linked.
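The anomaly detection and recovery methods of the ADAF are not specified in this document. Purely as an illustration of the idea, the sketch below flags outliers with a median-based modified z-score (a common rule of thumb, with the usual constant 0.6745 and threshold 3.5) and replaces them by linear interpolation between the nearest valid neighbours. The technique, the function names and the sample rpm values are assumptions made for this example only.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Flag points whose median-based modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be ranked as an outlier.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def recover(values, flags):
    """Replace flagged points by interpolating between valid neighbours."""
    good = [i for i, bad in enumerate(flags) if not bad]
    cleaned = list(values)
    if not good:
        return cleaned
    for i, bad in enumerate(flags):
        if not bad:
            continue
        prev = max((g for g in good if g < i), default=None)
        nxt = min((g for g in good if g > i), default=None)
        if prev is None or nxt is None:
            # At a series edge, fall back to the nearest valid value.
            cleaned[i] = values[nxt if prev is None else prev]
        else:
            weight = (i - prev) / (nxt - prev)
            cleaned[i] = values[prev] + weight * (values[nxt] - values[prev])
    return cleaned


# Made-up engine speed readings with one obvious sensor spike.
rpm = [720.0, 722.0, 721.0, 5000.0, 723.0, 724.0]
flags = flag_anomalies(rpm)
cleaned = recover(rpm, flags)
print(flags)
print(cleaned)
```

Recording which points were flagged and how they were replaced is the kind of information that could be carried in the metadata section mentioned above.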
3 Allocation of resources
The UiT University Library will cover the costs related to data curation and preservation for this project. Therefore, no SeaTech project funds have been allocated for this process, other than the human resources that will be used to collect, prepare and upload the data sets into the UiT open research data service. The data management activities of the SeaTech project will be carried out by UiT. Under the current conditions of the UiT open research data service, the service is free for an indefinite time period, as mentioned before. Therefore, these data sets are expected to remain online for an indefinite time period.
4 Data security
The required data security will be provided by the UiT open research data service. Therefore, no additional services will be considered during this project. Data sets of a sensitive or commercial nature will not be published under this project, on the recommendation of the data protection officer. Therefore, it is expected that critical data security issues will not arise during this process.
5 Ethical and legal aspects
The SeaTech project does not handle any personal data; therefore, no ethical aspects will be considered under this project. Since a data protection officer has been assigned to look into the sensitive or commercial nature of the respective data sets, any legal aspects of the data sets will be addressed under the guidance of the same officer.
6 References
1. Template Horizon 2020 Data Management Plan (DMP), H2020 templates: Data management plan v1.0, 13.10.2016. URL: https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-data-management/data-management_en.htm
2. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
3. Guidelines on FAIR Data Management in Horizon 2020, Version 3.0, 26 July 2016.