Towards an e-infrastructure for Open Science in Soils
Sérgio Manuel Serra da Cruz1,2,3, Marcos Bacis Ceddia1, Eber Assis Schmitz3,
Gabriel S. Rizzo2, Renan C. T. Miranda2, Sabrina O. Cruz2, Ana Clara Correa2,
Felipe Klinger2, Elton Marinho3, Pedro Vieira Cruz2
1 Universidade Federal Rural do Rio de Janeiro PPGMMC/UFRRJ
2 Programa de Educação Tutorial - PET-SI/UFRRJ
3 Universidade Federal do Rio de Janeiro PPGI/UFRJ
Abstract. Soils security is a critical and growing global concern. The objective of
OpenSoils is to host, connect and share large amounts of curated soil data and
knowledge at the Brazilian and South American level. The e-infrastructure consists
of several layers of services, a database of soil profiles, and a cloud-based
computational framework to compute and share soil data, integrated with map
visualization tools. OpenSoils is an open, elastic, provenance-oriented and lightweight
computational e-infrastructure that collects, stores, describes, curates, harmonizes
and directs to various soil resource types: large datasets of soil profiles,
services/applications, documents, projects and external links. To the best of our
knowledge, OpenSoils is the first open science-based computational framework for soils
security in the literature.
1. Introduction
From a data-centric point of view, agriculture is a complex science that spans different
disciplines (from genomics to soil sciences) and different scales (from genes to
geolocation). The ability to explore these complex datasets is crucial to tackling new
agricultural and societal challenges such as food and soils security (WOLFERT et al., 2017).
According to Koch et al. (2013), soils are probably the most important natural resource and
biosystem supporting human and terrestrial life. Soil is a primary, finite natural resource
from which other resources, goods, and services derive.
Soils security is an emerging key concept in soil sciences, motivated by sustainable
development and precision agriculture. It concerns the maintenance and improvement of
the global soil resource to produce food, fiber and fresh water, to support human health and
carbon sequestration, to contribute to energy and climate sustainability, and to maintain
biodiversity and the overall protection of the ecosystem (KOCH et al., 2013). Soils security,
like food security, has several dimensions (e.g., capability, condition, capital, connectivity,
and codification) that interact with environmental, social, and economic components
(MCBRATNEY, FIELD & KOCH, 2014). Soils security is a data-intensive research domain
whose life cycle starts with the collection of new soils data in the field and finishes at the
scientist's visualization workstation or the decision maker's desk (Figure 1). It is important
to highlight that Figure 1 does not capture the full complexity of soils security, since it does
not encompass the interconnection of the five dimensions or the political, economic and
sociological aspects of soil use and management. Instead, Figure 1 summarizes the life cycle
of soil information at the research and academic level, which is the primary focus of this
research.
Figure 1 Example of soil horizons and the main phases of the life-cycle of soils
investigations (maps adapted from MELO et al., 2016).
Soils and food security investigations are undergoing rapid transformation. However, these
disciplines have not drawn the same degree of attention as other e-science subjects such as
bioinformatics, astronomy, and computational chemistry. We advocate interdisciplinary
research that considers the roles of computer science, data governance, supply chain data
integration and mathematical modeling in soils security to face these challenges. We
foresee that open data, semantic web, open science, big data, and data science
approaches may help the soils community to conduct broader investigations, make more
accurate predictions in precision agriculture and deliver more knowledge to society.
The goal of this paper is to present the big picture of OpenSoils. It was conceived to
guide Brazilian policies by designing and laying the groundwork for a long-term effort to
build an e-infrastructure for open science in soils security that would position Brazil as
a major global player at the forefront of research and innovation in this area. This paper is
organized as follows. Section 2 presents the background. Section 3 presents the OpenSoils
conceptual architecture and its uses. Section 4 discusses related work, and Section 5 presents
concluding remarks and future work.
2. Soil, Soils Data, and Open Science
The development of soil from inorganic and organic materials is a complex natural process.
Soil is defined as the layer(s) of generally loose mineral and organic material that are
affected by physical, chemical, and/or biological processes at or near the planetary
surface and usually hold liquids, gases, and biota and support plants (VAN ES, 2017).
Soil is considered an open system that interacts with other components of the geologic
cycle. The characteristics of a soil are a function of parent material, climate, relief,
organisms and time (PANSU & GAUTHEYROU, 2006). Soils are evaluated in the field
through soil profiles. A soil profile is defined as a two-dimensional section composed of a
vertical succession of horizons, commonly named O, A, B, C (beginning at the surface), that
have been subjected to soil-forming processes (Figure 1). Each soil profile has very specific
mineralogical, morphological, chemical, physical, biological and environmental properties.
Soil investigations require work both in the field and in wet laboratories because
soil properties are diverse and are hard to collect, map, analyze, store and
share as soils data in databases.
Soils investigations, like those of any other scientific domain, have a life cycle and
characteristics that deserve efforts to improve long-term data management and the use of
strategic data assets (YAMSON et al., 2016; ARROUAYS et al., 2017). Soil data has key
features; for instance, there is a large amount of unanalyzed legacy raw data. Moreover, both
new and existing soils data are heterogeneous in their values and semi-structured in their
formats. Currently, there are many isolated data silos that store legacy soils data (e.g., as
scientific papers, spreadsheets, text, PDF files or web pages) with poor semantics and
no metadata descriptors. Additionally, several soil databases are either inaccessible to
structured queries or are presented as simple spreadsheets or text files, so they are hard for
farmers and policymakers to share and reuse (ARROUAYS et al., 2017). Much soil data and
knowledge is still fragmented and at risk of getting lost in digital data silos or even
in simple tables in scientific papers. Consequently, reproducing the results from scratch from
several soils experiments is both time-consuming and error-prone at best, and sometimes
Recent evidence from meta-research studies points to problems with research
integrity and reproducibility in several scientific domains (BAKER, 2016; NEVES et al.,
2017; FANELLI, 2018; HUTSON, 2018; FREIRE & CHIRIGATI, 2018). Many scientists,
journals, and funders are concerned about biased, poorly reproducible and irreproducible
scientific findings in soils security as well. Thus, one approach that may increase the
reliability and robustness of soils security investigations is the adoption of open science
(MUNAFÒ, 2016), e-science (HEY et al., 2009) and data provenance (BUNEMAN et al.,
2000; FREIRE et al., 2008).
Open science is an umbrella term encompassing a multitude of assumptions about the
future of knowledge construction (FECHER & FRIESIKE, 2013). It is a global movement to
make scientific research, data, and dissemination accessible at all levels of an inquiring
society. Nowadays, there are some open science infrastructures (e.g., OpenAIRE, OSF,
EOSC, among others), but they were not designed around the challenges of soils security.
An e-infrastructure is a computational tool that promotes open, centralized workflows by
enabling the capture of different aspects and products of the research life cycle, including
developing a research idea, designing an investigation, storing and analyzing collected data,
and writing and publishing reports or papers. E-infrastructures support a variety of scientific
tools and services to assist in the research process (FOSTER & DEARDORFF, 2017).
3. OpenSoils e-infrastructure
It is useful to start from a theoretical e-infrastructure that frames the complexity of the
challenges and demystifies the role of big data in soils security. OpenSoils is an open,
elastic, provenance-oriented and lightweight computational open science e-infrastructure
that relies on four overarching layers. Figure 2 illustrates the e-infrastructure and its layers
and summarizes the life cycle of soil data (shown as arrows) (DEELMAN et al., 2009; CRUZ,
CAMPOS & MATTOSO, 2009; MATTOSO et al., 2010).
(i) The end-users layer comprises the web portal and mobile applications used by end users
(e.g., soil specialists, data managers, policy makers). These tools are used to collect and
ingest new soil data directly from the field into OpenSoilsDB through the OpenSoils app, or
to query data through the web portal, helping policy makers to make decisions (DSS) and
urban planners to envision new soil uses (PSS).
Specialists and researchers use this layer to handle data. Specialists can use mobile,
IoT and web applications (e.g., the OpenSoils App and Wet Lab tools) to collect data directly
in the field and to trace the route of each soil sample that is collected and sent to the
chemistry and physics laboratories (i.e., wet labs) for further analysis. Usually, the
specialists submit each soil sample to morphological analyses in situ, and the OpenSoils app
sends the raw data to the database. After that, each soil sample is tagged and shipped to
laboratories, where scientists perform (in vitro) wet experiments and then run (in silico)
computational scientific experiments with SisGExp (CRUZ & NASCIMENTO, 2016), which
evaluate specific physicochemical properties of each soil horizon.
Figure 2 Overview of the conceptual architecture of OpenSoils (the arrows
describe data operations within the phases of life-cycle of soils investigations).
(ii) The services layer uses scientific and business models to generate curated data.
It is composed of a set of data-centric scientific workflows, which ingest legacy soils data
and analyze the consistency of the incoming data. RFlow (NASCIMENTO, 2015) is part of
this layer. It is a provenance-based approach that helps researchers to reproduce
scientific experiments based on R scripts. RFlow manages, shares, and enacts the
computational scientific workflows that encapsulate legacy R scripts; it transparently
captures the provenance of R scripts and supports the reproducibility of experiments.
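The idea of transparent provenance capture can be illustrated with a minimal sketch. It is written in Python rather than R and is not RFlow's implementation; the decorator `capture_provenance`, the in-memory `PROVENANCE_LOG` and the step `mean_clay_content` are hypothetical names introduced only for this example. Each call to a wrapped step appends a record with fingerprints of its inputs and output and its start and end times.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# In-memory provenance log (assumption: a real system would persist this).
PROVENANCE_LOG = []


def _digest(value):
    """Return a short, stable fingerprint of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def capture_provenance(step):
    """Wrap a workflow step so that every call appends a provenance record."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = step(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "inputs": _digest({"args": args, "kwargs": kwargs}),
            "output": _digest(result),
            "started": started,
            "ended": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@capture_provenance
def mean_clay_content(horizons):
    """Hypothetical step: average clay percentage over a profile's horizons."""
    return sum(h["clay_pct"] for h in horizons) / len(horizons)


profile = [{"name": "A", "clay_pct": 20.0}, {"name": "B", "clay_pct": 40.0}]
average = mean_clay_content(profile)
print(average, PROVENANCE_LOG[0]["step"])
```

Because the fingerprints depend only on the values, two runs over the same inputs produce the same input digest, which gives a simple way to check whether a repeated experiment used the same data.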
(iii) The data layer is the core of OpenSoils; it stores, describes, and curates various
soils datasets and their metadata descriptors. The internal structure supports varying degrees
of data granularity and uses a relational database named OpenSoilsDB (formerly InfoSoilsBR)
(RIZZO, CEDDIA & CRUZ, 2017). It can store new curated soils data annotated with
Much of the information needed to assure data quality and to allow researchers to
reproduce soils security experiments can be obtained by systematically capturing
provenance. Provenance refers to the record trail that accounts for the origin of a piece of
data (FREIRE et al., 2008). OpenSoilsDB can store workflow and script provenance.
Workflow provenance consists of the record of the derivation of a result (e.g., a soil profile,
an image, a map) by a computational process represented as a scientific workflow. Script
provenance is obtained by analyzing the source code of soils security experiments represented
as R scripts (PIMENTEL et al., 2017). OpenSoilsDB uses the W3C PROV-DM recommendation
to store prospective and retrospective provenance for workflows and scripts (MOREAU &
MISSIER, 2013). In addition, OpenSoilsDB supports the FAIR guidelines (Findable,
Accessible, Interoperable, and Reusable) for scientific data management and sharing
(WILKINSON et al., 2016).
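The distinction between prospective and retrospective provenance can be sketched with a few PROV-DM concepts (entity, activity, used, wasGeneratedBy, wasDerivedFrom, and the prov:Plan type). The sketch below is an illustration only and does not reflect the actual OpenSoilsDB schema: the class `ProvDocument`, the `ex:` identifiers and the text notation, which is loosely modelled on PROV-N, are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Minimal container for PROV-style statements (not a full PROV-DM model)."""
    statements: list = field(default_factory=list)

    def add(self, relation, *args, attrs=None):
        """Record one statement, e.g. add("used", activity_id, entity_id)."""
        self.statements.append((relation, args, dict(attrs or {})))

    def render(self):
        """Serialize the statements in a notation loosely modelled on PROV-N."""
        lines = []
        for relation, args, attrs in self.statements:
            text = ", ".join(args)
            if attrs:
                pairs = ", ".join(f"{k}='{v}'" for k, v in sorted(attrs.items()))
                text += f", [{pairs}]"
            lines.append(f"{relation}({text})")
        return "\n".join(lines)

    def sources_of(self, entity):
        """Follow wasDerivedFrom edges back to every upstream entity."""
        found, frontier = [], [entity]
        while frontier:
            current = frontier.pop()
            for relation, args, _ in self.statements:
                if (relation == "wasDerivedFrom" and args[0] == current
                        and args[1] not in found):
                    found.append(args[1])
                    frontier.append(args[1])
        return found


doc = ProvDocument()
# Prospective provenance: the workflow definition, typed as a plan.
doc.add("entity", "ex:texture_workflow", attrs={"prov:type": "prov:Plan"})
# Retrospective provenance: one concrete execution and the data it touched.
doc.add("entity", "ex:field_sheet_42")
doc.add("entity", "ex:profile_42")
doc.add("entity", "ex:texture_map_42")
doc.add("activity", "ex:run_001")
doc.add("used", "ex:run_001", "ex:profile_42")
doc.add("wasGeneratedBy", "ex:texture_map_42", "ex:run_001")
doc.add("wasDerivedFrom", "ex:profile_42", "ex:field_sheet_42")
doc.add("wasDerivedFrom", "ex:texture_map_42", "ex:profile_42")

print(doc.render())
print(doc.sources_of("ex:texture_map_42"))
```

The lineage query at the end shows why such records matter for reproducibility: starting from a map, a researcher can walk back to the curated profile and the original field sheet it came from.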
The database also supports the ingestion of legacy soils data imported through ETL
tools (e.g., Pentaho/Kettle). The layer can store operational and governance data. To
support open data, we use CKAN, which stores curated open data sets.
CKAN is an open-source open data platform that provides a streamlined way to make
curated soils data publishable, usable, discoverable and interoperable by third-party soils
applications. CKAN supports data annotation with a thesaurus, ensuring semantic
interoperability so that computer systems, research teams or community users can exchange
data with unambiguous meaning.
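A typical ETL step for legacy soils data is the harmonization of spreadsheet exports. The following sketch uses plain Python as a stand-in for a Pentaho/Kettle transformation; the sample rows, the legacy column names and the harmonized names in `COLUMN_MAP` are hypothetical. It renames columns, converts decimal commas to decimal points, and rejects rows with missing or inconsistent depths.

```python
import csv
import io

# Hypothetical legacy export: semicolon-separated, decimal commas.
LEGACY_CSV = """perfil;horizonte;prof_sup_cm;prof_inf_cm;argila_pct
P001;A;0;18;21,5
P001;B;18;60;38,0
P002;A;0;;19,2
"""

# Hypothetical mapping from legacy headers to harmonized attribute names.
COLUMN_MAP = {
    "perfil": "profile_code",
    "horizonte": "horizon",
    "prof_sup_cm": "top_cm",
    "prof_inf_cm": "bottom_cm",
    "argila_pct": "clay_pct",
}
NUMERIC_FIELDS = ("top_cm", "bottom_cm", "clay_pct")


def to_float(text):
    """Parse a number written with a decimal comma; empty cells become None."""
    text = text.strip().replace(",", ".")
    return float(text) if text else None


def harmonize(csv_text):
    """Split legacy rows into accepted records and rejected (line, record) pairs."""
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        record = {COLUMN_MAP[key]: value.strip() for key, value in row.items()}
        for name in NUMERIC_FIELDS:
            record[name] = to_float(record[name])
        complete = all(record[name] is not None for name in NUMERIC_FIELDS)
        if complete and record["top_cm"] < record["bottom_cm"]:
            accepted.append(record)
        else:
            rejected.append((line_no, record))
    return accepted, rejected


accepted, rejected = harmonize(LEGACY_CSV)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the rejected rows together with their line numbers, instead of silently dropping them, lets a data curator go back to the legacy source and repair it.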
The thesaurus is used to semantically annotate soils data, allowing us to link it as
RDF triples in DBpedia (2018), as depicted in Figure 2. The thesaurus used in the
e-infrastructure is Agrovoc (CARACCIOLO et al., 2013). Currently, Agrovoc is a SKOS-XL
concept scheme published as Linked Open Data that covers all areas of interest of the Food
and Agriculture Organization (FAO), including food, agriculture, and environment. It is
published by FAO, edited by a community of experts, and consists of over 34,000 concepts
available in 29 languages. It is used by researchers, librarians and information managers for
indexing, retrieving and organizing data in agricultural information systems.
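As an illustration of semantic annotation, the sketch below serializes the link between a soil profile and thesaurus concepts as N-Triples lines. Several parts are assumptions: the base IRI under example.org, the choice of the Dublin Core `subject` property as the annotation predicate, and the concept identifier `c_0000`, which is a placeholder rather than a real Agrovoc concept. Only the Agrovoc namespace and the standard vocabulary IRIs are real.

```python
AGROVOC_NS = "http://aims.fao.org/aos/agrovoc/"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical base IRI for soil profiles; not an actual OpenSoils address.
PROFILE_BASE = "https://example.org/opensoils/profile/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang=None):
    """Build a plain literal with minimal escaping and an optional language tag."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def annotate(profile_id, label, concept_ids):
    """Return N-Triples lines linking one profile to thesaurus concepts."""
    subject = iri(PROFILE_BASE + profile_id)
    lines = [f"{subject} {iri(RDFS_LABEL)} {literal(label, 'en')} ."]
    for concept_id in concept_ids:
        concept = iri(AGROVOC_NS + concept_id)
        lines.append(f"{subject} {iri(DCTERMS_SUBJECT)} {concept} .")
    return lines


# "c_0000" is a placeholder concept identifier used only for illustration.
triples = annotate("P001", "Soil profile P001", ["c_0000"])
print("\n".join(triples))
```

Once the data are expressed as triples that point to shared concept IRIs, two systems that use different local column names can still agree on what an attribute means.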
The OpenSoilsDB database has two abstraction layers (operational and governance).
The lower, operational layer aims to serve a quality-assessed, georeferenced soil profile
database to the Brazilian and international communities after standardization and
harmonization. Each soil profile description recorded in the database spans more than 40
entities and 250 attributes that store soil properties and soil experiments (e.g.,
mineralogical, morphological, chemical, physical and environmental data). Furthermore, the
database supports data versioning and data provenance, and stores georeferenced soil data as
text and images, including physicochemical analytical data from each horizon and from soil
samples analyzed in wet laboratories.
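To make the relational structure more concrete, the following sketch reduces the operational layer to two tables, a georeferenced and versioned soil profile and its horizons. It is a simplification for illustration and not the actual OpenSoilsDB schema: the table names, column names and sample values are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE soil_profile (
    profile_id INTEGER PRIMARY KEY,
    code       TEXT NOT NULL,
    version    INTEGER NOT NULL DEFAULT 1,
    latitude   REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude  REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180),
    UNIQUE (code, version)             -- one row per version of a profile
);
CREATE TABLE horizon (
    horizon_id  INTEGER PRIMARY KEY,
    profile_id  INTEGER NOT NULL REFERENCES soil_profile(profile_id),
    designation TEXT NOT NULL,         -- e.g. O, A, B, C
    top_cm      REAL NOT NULL,
    bottom_cm   REAL NOT NULL,
    clay_pct    REAL,
    CHECK (top_cm < bottom_cm)         -- depths must increase downwards
);
""")

# Illustrative values only; they do not describe a real soil profile.
cursor = conn.execute(
    "INSERT INTO soil_profile (code, latitude, longitude) VALUES (?, ?, ?)",
    ("P001", -22.76, -43.69),
)
profile_id = cursor.lastrowid
conn.executemany(
    "INSERT INTO horizon (profile_id, designation, top_cm, bottom_cm, clay_pct) "
    "VALUES (?, ?, ?, ?, ?)",
    [(profile_id, "B", 18, 60, 38.0), (profile_id, "A", 0, 18, 21.5)],
)

# Read the horizons back as a vertical succession starting at the surface.
rows = conn.execute(
    "SELECT designation, top_cm, bottom_cm FROM horizon "
    "WHERE profile_id = ? ORDER BY top_cm",
    (profile_id,),
).fetchall()
print(rows)
```

The CHECK and UNIQUE constraints push basic consistency rules into the database itself, so that malformed depths or duplicate profile versions are rejected at ingestion time.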
The upper layer of the OpenSoilsDB improves the accessibility and reuse of soil data
and knowledge. Data governance and data literacy are two important building blocks in the
knowledge base of information professionals involved in supporting data-intensive research,
and both address data quality and research data management. Adopting data governance in
OpenSoils is advantageous because it is a service based on standardized, repeatable processes
and is designed to enable the transparency of data-related processes and cost reduction. It
refers to rules, policies, standards; decision rights; accountabilities and methods of
(iv) The governance layer is composed of data management, data licensing,
analytical and visualization tools, and map generation services that can be connected to other
software (e.g., QGIS, ArcGIS, R, Tableau or scikit-learn) to generate analytical reports, soil
predictions, and raster maps, to name a few.
Although it has received little attention in soils research communities, this layer is
foundational for soils security. The prime function of the layer is to improve and maintain the
quality of the soils datasets; thus, for governance to succeed, quality must be continuously
measured and the results continuously fed back to the data and services layers. We stress that
this layer also defines roles for individuals. For instance, application owners,
data custodians and application data architects are responsible for complying with data
standards, resolving data-related issues, sharing the soil datasets, and supporting the
enforcement of data/soil standards.
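One simple way to measure quality continuously is a completeness score per attribute. The sketch below is only an example of such a metric, not one prescribed by OpenSoils; the field list `REQUIRED_FIELDS`, the sample records and the 0.8 threshold are hypothetical.

```python
# Hypothetical attributes that a curated horizon record is expected to carry.
REQUIRED_FIELDS = ("designation", "top_cm", "bottom_cm", "clay_pct", "ph")


def completeness(records, fields=REQUIRED_FIELDS):
    """Return, for each field, the fraction of records with a non-missing value."""
    if not records:
        return {name: 0.0 for name in fields}
    total = len(records)
    return {
        name: sum(1 for record in records if record.get(name) is not None) / total
        for name in fields
    }


def fields_below(report, threshold):
    """List the fields whose completeness falls below the agreed threshold."""
    return sorted(name for name, score in report.items() if score < threshold)


# Illustrative records only; None marks a missing measurement.
horizons = [
    {"designation": "A", "top_cm": 0, "bottom_cm": 18, "clay_pct": 21.5, "ph": 5.2},
    {"designation": "B", "top_cm": 18, "bottom_cm": 60, "clay_pct": 38.0, "ph": None},
    {"designation": "A", "top_cm": 0, "bottom_cm": 20, "clay_pct": None, "ph": 4.9},
    {"designation": "C", "top_cm": 60, "bottom_cm": 110, "clay_pct": 30.1},
]

report = completeness(horizons)
print(report)
print(fields_below(report, threshold=0.8))
```

A data custodian could run such a report after every ingestion and feed the list of weak attributes back to the teams responsible for the data and services layers.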
3.1 Daily uses of OpenSoils
OpenSoils was conceived as an e-infrastructure because the term refers to a combination
and interworking of digitally-based software technologies, resources (data, services, digital
repositories), communications (protocols and data access rights), and the people and
organizational structures needed to support modern and collaborative research in soils
security. OpenSoils has three primary uses:
(i) Offer diverse, integrated, timely and trustworthy digital repositories to researchers
(e.g., for statistical studies of soil quality, soil mapping, evaluation of
contamination by heavy metals, and organic waste management systems).
(ii) Offer tools that help city planners, agronomists and farmers make better decisions using
high-quality harmonized open data (e.g., studies of erosion, risk of landslides, risk of
flooding, potential for agricultural use of soils; environmental, economic and
ecological zoning; insurance of agronomic enterprises; land classification for
irrigation; support in the recommendation of fertilizers and limestone).
(iii) Help students to increase their knowledge and skills about soils. The e-infrastructure
is connected to the Brazilian Soils Museum at UFRRJ, where users can
explore the collection of soil monoliths, soil artifacts and pictures, and browse
the data.
4. Related Work
Traditionally, soils security has operated along disciplinary lines in using and applying its
data and analytical tools. Soils data management, curation, and governance are issues that are
still underestimated in soil sciences: data are analyzed for isolated applications, and
small groups of researchers work with isolated data silos on their personal computers
without properly sharing them (LOKERS et al., 2016). To the best of our knowledge, there are
currently no open science software platforms that support the full cycle of research in soils
security. Thus, we conceived OpenSoils as an open e-infrastructure that can be used by
researchers, decision makers, data curators, city planners, farmers and students.
Investigations of soils security in Brazil and Latin America are still in their early stages.
They take the form of several isolated investigations and data silos of legacy soils data. For
instance, BDSolos (BDSOLOS, 2018) is a relational database developed by EMBRAPA
Solos that stores about 9,000 soil profiles. The database has neither provenance nor metadata
descriptors, and there are no public interfaces that allow researchers to insert new soils data.
Furthermore, the query interfaces are hard to use even for soil specialists. Last but
not least, it addresses neither soils security nor map visualization. The same
limitations are shared by Fe.BR (FEBR, 2018). It is a single HTML
website that stores the same type of data as BDSolos. The dataset is presented as a set of
Google Docs stored in a virtual drive on the Web; its authors claim that it is open data.
However, we stress that it fails to fulfill the eight Open Data principles (OGP, 2018), has no
governance policies and does not follow semantic web best practices
(GYRARD et al., 2015).
OpenSoils differs from the related work; it was conceived to adopt the emerging trends of
open science, e-science, open data and data provenance. First, it is
a multi-disciplinary, community-oriented and data-integrative e-infrastructure. Second, it
supports the movement to make scientific experiments more reproducible and to make
publications and scientific data available as open access. Third, it can handle large amounts
of data from soils security investigations. Fourth, it is based on web, workflow services, and
cloud infrastructure, which offer access to elastic and abundant resources that can be
provisioned and de-provisioned on demand.
5. Concluding Remarks
Conditions are now ripe for a step change in the interplay between soil
science and computer science, a change that will not only spur economic growth and
competitive advantage, but will also help scientists to develop solutions to our societal
challenges, understand climate change, and explore new frontiers of knowledge.
Soil has an integral part to play in the global environmental sustainability
challenges. Nevertheless, a lot of computational work still needs to be
developed in soil sciences. The growth of open science and of curated open soils databases
may help scientists to increase the reliability, robustness, and reproducibility of soils security
In this paper we presented OpenSoils, a novel e-infrastructure that provides
knowledge about soils security to different kinds of users, not only researchers. The
infrastructure enhances reproducibility and delivers high-quality soils datasets, knowledge
and maps based on curated open data. OpenSoils is under development; the mobile apps can be
found at PET-SI Google Play, and further information about the Wet Lab applications, the
scientific workflows or the ETL components can be found at
As future work, we plan to finish the implementation of the e-infrastructure and
investigate alternative semantic relationships between soils data, digital objects and related
domains to enhance solutions and improve data sharing, data curation, and long-term data
stewardship policies.
Acknowledgments
This work was supported in part by the Brazilian funding agencies FNDE, PIBIC/CNPq and
Petrobras. The authors thank PET-SI/UFRRJ, MEC/SESU, and the CYTED networks BigDSSAgro
and SmartLogistics@IB.
References
Arrouays, D. et al., Soil legacy data rescue via GlobalSoilMap and other international and national
initiatives. GeoResJ 14, pages 1-19, 2017.
Baker, M., 1,500 scientists lift the lid on reproducibility. Nature. 533:7604, 2016.
Buneman, P., Khanna, S., Tan, W-C. Data Provenance: Some Basic Issues. In: Kapoor S., Prasad S.
(eds) FST TCS 2000: Foundations of Software Technology and Theoretical Computer Science.
FSTTCS 2000. Lecture Notes in Computer Science, vol 1974. Springer, Berlin, Heidelberg.
BDSolos, Banco de Dados de Solos. (accessed on 9.3.2018).
Caracciolo, C. et al. The AGROVOC Linked Dataset. Semantic Web, 4:3, pages 341-348, 2013.
Cruz, S.M.S, Campos, M. L. M and Mattoso, M. Towards a Taxonomy of Provenance in Scientific
Workflow Management Systems. In: SERVICES I, pages 259-266, USA, 2009.
Cruz, S.M.S, Nascimento, J.A.P. SisGExp: Rethinking Long-Tail Agronomic Experiments. IPAW,
2016.
DBpedia. (accessed on 24.3.2018).
Deelman, E. et al., Workflows and e-Science: An overview of workflow system features and
capabilities. Future Generation Computer Systems, 25:5, pages 528-540, 2009.
Fanelli, D. Opinion: Is science really facing a reproducibility crisis, and do we need it to? Proceedings
of the National Academy of Sciences of the USA, March 2018.
FeBR, Repositório de dados de solos. (accessed on 9.3.2018).
Freire, J., Koop, D., Santos, E., Silva C.T. Provenance for Computational Tasks: A Survey. Computing
in Science and Engineering, 10:3, pages 11-21, 2008.
Freire, J. Chirigati, F. Provenance and the Different Flavors of Computational Reproducibility. IEEE
Data Engineering Bulletin, 41:1, pages 15-26, 2018.
Fecher, B. and Friesike, S. Open Science: One Term, Five Schools of Thought. Opening Science,
pages 17-47, 2013.
Foster, E. D., Deardorff, A. Open Science Framework (OSF). J Med Libr Assoc. 105:2, pages 203-206,
2017.
Gyrard, A., Serrano, M., Atemezing G. A. Semantic Web Methodologies, Best Practices and Ontology
Engineering Applied to Internet of Things. 2nd IEEE World Internet of Things, 2015.
Hey, T., Tansley, S., Tolle, K. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft
Research, 2009.
Hutson, M., Artificial intelligence faces reproducibility crisis. Science, 359:6377, pages 725-726, 2018.
Koch, A. et al., Soil Security: Solving the Global Soil Crisis. Global Policy, 4:4, pages 434-441, 2013.
Lokers, R. et al., Analysis of Big Data technologies for use in agro-environmental science.
Environmental Modelling & Software, 84, pages 494-504, 2016.
Mattoso, M. et al., Towards supporting the life cycle of large-scale scientific experiments. International
Journal of Business Process Integration and Management, 5:1, pages 79-92, 2010.
McBratney, A., Field, D. J and Koch, A. The dimensions of soil security. Geoderma. 213, pages 203-
213, 2014.
Melo, A. A. B. et al., Spatial distribution of organic carbon and humic substances in irrigated soils
under different management systems in a semi-arid zone in Ceará, Brazil. SEMINA: CIENCIAS
AGRARIAS, 37:4, pages 1845-1856, 2016.
Moreau, L., Missier, P. PROV-DM: The PROV Data Model. (accessed on 24.3.2018).
Munafò, M. Open Science and Research Reproducibility. Ecancer medical science. 10, ed56. 2016.
Nascimento, J. A. P. RFLOW: uma arquitetura para execução e coleta de proveniência de workflows
estatísticos. Dissertação de Mestrado, UFRRJ, 2015.
Neves V. C. et al., Managing Provenance of Implicit Data Flows in Scientific Experiments.
ACM Transactions on Internet Technology, 17:4, Article No. 36, 2017.
OGP, Open Government Partnership, 2017. (accessed on 9.3.2018).
Pansu, M., Gautheyrou, J., Handbook of Soil Analysis, Springer, 2006.
Pimentel, J. F et al., noWorkflow: a Tool for Collecting, Analyzing, and Managing Provenance from
Python Scripts. Proceedings of the VLDB, 10:12, pages 1841-1844, 2017.
Rizzo, G.S.C, Ceddia, M. B., Cruz, S. M. S. Banco de Dados Pedológico: Primeiros Estudos. V RAIC
UFRRJ, 2017.
Wilkinson, M. D. et al., The FAIR Guiding Principles for scientific data management and stewardship.
Scientific Data 3, Article number: 160018, 2016.
Wolfert, S. et al., Big Data in Smart Farming - A review. Agricultural Systems, v. 153, pages 69-80,
2017.
Yamson, D. O., et al., Putting Soils Security on the Policy Agenda: Need for a Familiar Framework.
Challenges. 4:2 15 pages. 2016.
van Es, H., A New Definition of Soil. CSA News 62:20-21, 2017.
... O objetivo deste trabalho é divulgar o OpenSoils nas comunidades latinoamericanas [38]. Ele foi concebido para contribuir com as políticas brasileiras de proteção e mapeamento de solos, projetando e estabelecendo as bases para um esforço de longo prazo, sendo baseado nos fundamentos de open science e e-science para a área de segurança de solos. ...
... Dependendo da natureza da amostra ela é encaminhada poderá ser encaminha para solotecas ou museus de solos do Brasil e Américas. [38]. ...
... Neste artigo, apresentamos o OpenSoils [38], uma nova infraestrutura eletrônica multiusuário que fornece apoio ao ciclo de vida de estudos e projetos em segurança de solos. A infraestrutura armazena e compartilha datasets curados e permite a coleta de dados in situ, in vitro e in silico e elaboração de mapas digitais de solos de alta qualidade com base nesses dados curados. ...
Conference Paper
Full-text available
A segurança dos solos é um problema global, crescente e crítico que afeta a todos os países do mundo. O objetivo do OpenSoils é conectar e compartilhar grandes quantidades de dados curados de solos nos níveis brasileiro e sul-americano. OpenSoils é um framework leve, aberto, elástico, multiusuário que armazena, descreve, organiza, harmoniza grandes conjuntos de dados de perfis de solos. Também oferece dados abertos e mapas permitem que os usuários naveguem pelos principais dados de solo da região. O OpenSoils é uma das primeiras infraestruturas voltados para segurança de solos baseada em conceitos de e-Science e Open Science.
... The architecture is an open, provenance-oriented, and lightweight computational e-infrastructure which rely on layers to store, compute and share curated data of (STE and LTE) soils experiments [5]. Figure 1 illustrates a conceptual view and the flow of information in the architecture. ...
... Layer 3 (Data layer) -stores and describes various soils datasets with metadata. The internal structure supports a diversified degree of data granularity and uses a database named OpenSoilsDB [5,6] which can store new curated soils data annotated with provenance metadata. Much of the information needed to assure the data quality and to allow researchers to reproduce STE experiments can be obtained by systematically capturing data provenance [4]. ...
Full-text available
Soils are probably the most critical natural resource in Agriculture, and soils security represents a critical growing global issue. Soils experiments require vast amounts of high-quality data, are very hard to be reproduced, and there are few studies about data provenance of such tests. We present OpenSoils; it shares knowledge about data-centric soils experiments. OpenSoils is a provenance-oriented and lightweight e-infrastructure that collects, stores, describes, curates and, harmonizes various soil datasets.
Full-text available
We present noWorkflow, an open-source tool that systematically and transparently collects provenance from Python scripts, including data about the script execution and how the script evolves over time. During the demo, we will show how noWorkflow collects and manages provenance, as well as how it supports the analysis of computational experiments. We will also encourage attendees to use noWorkflow for their own scripts.
Full-text available
Legacy soil data have been produced over 70 years in nearly all countries of the world. Unfortunately, data, information and knowledge are still currently fragmented and at risk of getting lost if they remain in a paper format. To process this legacy data into consistent, spatially explicit and continuous global soil information, data are being rescued and compiled into databases. Thousands of soil survey reports and maps have been scanned and made available online. The soil profile data reported by these data sources have been captured and compiled into databases. The total number of soil profiles rescued in the selected countries is about 800,000. Currently, data for 117, 000 profiles are compiled and harmonized according to GlobalSoilMap specifications in a world level database (WoSIS). The results presented at the country level are likely to be an underestimate. The majority of soil data is still not rescued and this effort should be pursued. The data have been used to produce soil property maps. We discuss the pro and cons of top-down and bottom-up approaches to produce such maps and we stress their complementarity. We give examples of success stories. The first global soil property maps using rescued data were produced by a top-down approach and were released at a limited resolution of 1km in 2014, followed by an update at a resolution of 250m in 2017. By the end of 2020, we aim to deliver the first worldwide product that fully meets the GlobalSoilMap specifications.
Full-text available
The Open Science Framework (OSF) is a free, open source,r esearch workflow web application developed and maintained by the Center for Open Science (COS).
Full-text available
Smart Farming is a development that emphasizes the use of information and communication technology in the cyber-physical farm management cycle. New technologies such as the Internet of Things and Cloud Computing are expected to leverage this development and introduce more robots and artificial intelligence in farming. This is encompassed by the phenomenon of Big Data, massive volumes of data with a wide variety that can be captured, analysed and used for decision-making. This review aims to gain insight into the state-of-the-art of Big Data applications in Smart Farming and identify the related socio-economic challenges to be addressed. Following a structured approach, a conceptual framework for analysis was developed that can also be used for future studies on this topic. The review shows that the scope of Big Data applications in Smart Farming goes beyond primary production; it is influencing the entire food supply chain. Big data are being used to provide predictive insights in farming operations, drive real-time operational decisions, and redesign business processes for game-changing business models. Several authors therefore suggest that Big Data will cause major shifts in roles and power relations among different players in current food supply chain networks. The landscape of stakeholders exhibits an interesting game between powerful tech companies, venture capitalists and often small start-ups and new entrants. At the same time there are several public institutions that publish open data, under the condition that the privacy of persons must be guaranteed. The future of Smart Farming may unravel in a continuum of two extreme scenarios: 1) closed, proprietary systems in which the farmer is part of a highly integrated food supply chain or 2) open, collaborative systems in which the farmer and every other stakeholder in the chain network is flexible in choosing business partners as well for the technology as for the food production side. 
The further development of data and application infrastructures (platforms and standards) and their institutional embedment will play a crucial role in the battle between these scenarios. From a socio-economic perspective, the authors propose to give research priority to organizational issues concerning governance and suitable business models for data sharing in different supply chain scenarios.
Soils generate agricultural, environmental, and socio-economic benefits that are vital to human life. The enormity of threats to global soil stocks raises the imperative for securing this vital resource. To contribute to the security framing and advancement of the soil security concept and discourse, this paper provides a working definition and proposes dimensions that can underpin the conceptualization of soil security. In this paper, soil security refers to safeguarding and improving the quality, quantity and functionality of soil stocks from critical and pervasive threats in order to guarantee the availability, access, and utilization of soils to sustainably generate productive goods and ecosystem services. The dimensions proposed are availability, accessibility, utilization, and stability, which are obviously similar to the dimensions of food security. Availability refers to the quality and spatial distribution of soils of a given category. Accessibility relates to the conditions or mechanisms by which actors negotiate and gain entitlements to occupy and use a given soil. Utilization deals with the use or purpose to which a given soil is put and the capacity to manage and generate optimal private and public benefits from the soil. Finally, stability refers to the governance mechanisms that safeguard and improve the first three dimensions. These dimensions, their interactions, and how they can be operationalized in a strategy to secure soils are presented and discussed.
Knowledge of the spatial variability in soil properties can contribute to effective use and management. This study was conducted to evaluate the spatial distribution of the levels of total organic carbon (TOC) and humic substances (humic acid fraction (C-FAH), fulvic acid fraction (C-FAF), and humin fraction (C-HUM)) in an Ultisol under different land uses, located in the irrigated perimeter of Baixo Acaraú-CE, transition to semiarid Ceará. The distribution and spatial dependence of the humic fractions were evaluated using descriptive statistics, including semivariogram analysis and data interpolation (kriging). The TOC showed a pure nugget effect, whereas the other fractions showed moderate spatial dependence. Forested and banana cultivation areas showed similar distributions of C-FAH and C-FAF, due to the high input of organic matter (leaves and pseudostems) in the area of banana cultivation and the absence of soil disturbance in the forested area. Data interpolation (kriging) and mapping were useful tools to assess the distribution and spatial dependence of soil attributes.
Reproducibility is a major feature of Science. Even agronomic research of exemplary quality may have irreproducible empirical findings because of random or systematic error. This work presents SisGExp, a provenance-based approach that aids researchers in managing, sharing, and enacting the computational scientific workflows that encapsulate legacy R scripts. SisGExp transparently captures the provenance of R scripts and endows experiments with reproducibility. SisGExp is non-intrusive and does not require users to change the way they work; it wraps agronomic experiments as scientific workflows.
Scientific experiments modeled as scientific workflows may create, change, or access data products not explicitly referenced in the workflow specification, leading to implicit data flows. The lack of knowledge about implicit data flows makes the experiments hard to understand and reproduce. In this article, we present ProvMonitor, an approach that identifies the creation, change, or access to data products even within implicit data flows. ProvMonitor links this information with the workflow activity that generated it, allowing scientists to compare data products within and throughout trials of the same workflow, identifying side effects on data evolution caused by implicit data flows. We evaluated ProvMonitor and observed that it could answer queries for scenarios that demand specific knowledge related to implicit provenance.