Robert Huber’s research while affiliated with University of Bremen and other places


Publications (31)


Overview of PANGAEA's workflow from data submission to dissemination.
Example of a usage statistic for https://doi.org/10.1594/PANGAEA.937574. Details on usage statistics can be found in the Wiki.
Simplified technical architecture of PANGAEA.
Simplified data model of PANGAEA.
PANGAEA - Data Publisher for Earth & Environmental Science
  • Article
  • Full-text available

June 2023 · 144 Reads · 45 Citations · Scientific Data

Janine Felden · Lars Möller · Uwe Schindler · [...]

The information system PANGAEA provides targeted support for research data management as well as long-term data archiving and publication. PANGAEA is operated as an open access library for archiving, publishing, and distributing georeferenced data from earth and environmental sciences. It focuses on observational and experimental data. Citability, comprehensive metadata descriptions, interoperability of data and metadata, a high degree of structural and semantic harmonization of the data inventory, and the commitment of the hosting institutions ensure the long-term usability of archived data. PANGAEA is a pioneer of FAIR and open data infrastructures that enable data-intensive science, and an integral component of national and international science and technology activities. This paper provides an overview of the recent organisational, structural, and technological advancements in developing and operating the information system.


An automated solution for measuring the progress toward FAIR research data

October 2021 · 179 Reads · 59 Citations · Patterns

With a rising number of scientific datasets published and the need to test their Findable, Accessible, Interoperable, and Reusable (FAIR) compliance repeatedly, data stakeholders have recognized the importance of an automated FAIR assessment. This paper presents a programmatic solution for assessing the FAIRness of research data. We describe the translation of the FAIR data principles into measurable metrics and the application of the metrics in evaluating FAIR compliance of research data through an open-source tool we developed. For each metric, we conceptualized and implemented practical tests drawn upon prevailing data curation and sharing practices, and the paper discusses their rationales. We demonstrate the work by evaluating multidisciplinary datasets from trustworthy repositories, followed by recommendations and improvements. We believe our experience in developing and applying the metrics in practice and the lessons we learned from it will provide helpful information to others developing similar approaches to assess different types of digital objects and services.
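The idea of turning principles into measurable metrics, each backed by a practical test, can be illustrated with a minimal sketch. This is not the tool described in the paper: the metric names, the metadata fields, and the pass/fail rules below are hypothetical and chosen only to show the shape of such an assessment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]


@dataclass(frozen=True)
class Metric:
    identifier: str                    # hypothetical label, not an official metric ID
    principle: str                     # one of "F", "A", "I", "R"
    test: Callable[[Metadata], bool]   # practical test applied to a metadata record


# Illustrative metrics only; real assessments use many more, finer-grained tests.
METRICS: List[Metric] = [
    Metric("has-persistent-identifier", "F",
           lambda m: str(m.get("identifier", "")).startswith("https://doi.org/")),
    Metric("has-descriptive-title", "F", lambda m: bool(m.get("title"))),
    Metric("states-access-level", "A",
           lambda m: m.get("access") in {"public", "restricted", "embargoed"}),
    Metric("uses-structured-format", "I",
           lambda m: m.get("format") in {"application/ld+json", "application/rdf+xml"}),
    Metric("has-license", "R", lambda m: bool(m.get("license"))),
]


def assess(metadata: Metadata, metrics: List[Metric] = METRICS) -> Dict[str, float]:
    """Return the fraction of passed tests per FAIR principle."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for metric in metrics:
        total[metric.principle] = total.get(metric.principle, 0) + 1
        if metric.test(metadata):
            passed[metric.principle] = passed.get(metric.principle, 0) + 1
    return {principle: passed.get(principle, 0) / total[principle] for principle in total}


# A made-up record: it has an identifier, title, and access level,
# but no structured format and no license.
record = {
    "identifier": "https://doi.org/10.1234/example",
    "title": "Example dataset",
    "access": "public",
    "format": "text/csv",
}
scores = assess(record)
print(scores)
```

For this made-up record, the findability and accessibility tests pass while the interoperability and reusability tests fail, so the per-principle scores show where metadata improvements would have the most effect.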



Figure 1: Tasks of the RDA WG I-ADOPT.
Figure 2: The I-ADOPT Framework.
Figure 3: Quantitative example: Concentration of endosulfan sulfate in Ostrea edulis.
Figure 4: Qualitative example: Shape of cell.
The I-ADOPT Interoperability Framework for FAIRer data descriptions of biodiversity

July 2021 · 127 Reads · 1 Citation

Biodiversity, the variation within and between species and ecosystems, is essential for human well-being and the equilibrium of the planet. It is critical for the sustainable development of human society and is an important global challenge. Biodiversity research has become increasingly data-intensive and it deals with heterogeneous and distributed data made available by global and regional initiatives, such as GBIF, ILTER, LifeWatch, BODC, PANGAEA, and TERN, that apply different data management practices. In particular, a variety of metadata and semantic resources have been produced by these initiatives to describe biodiversity observations, introducing interoperability issues across data management systems. To address these challenges, the InteroperAble Descriptions of Observable Property Terminology WG (I-ADOPT WG) was formed by a group of international terminology providers and data center managers in 2019 with the aim to build a common approach to describe what is observed, measured, calculated, or derived. Based on an extensive analysis of existing semantic representations of variables, the WG has recently published the I-ADOPT framework ontology to facilitate interoperability between existing semantic resources and support the provision of machine-readable variable descriptions whose components are mapped to FAIR vocabulary terms. The I-ADOPT framework ontology defines a set of high level semantic components that can be used to describe a variety of patterns commonly found in scientific observations. This contribution will focus on how the I-ADOPT framework can be applied to represent variables commonly used in the biodiversity domain.



Figure 1 Research data lifecycle.
Figure 6 An automated assessment of the FAIRness of data objects through the F-UJI service.
Figure 7 FAIR scores of the PANGAEA datasets before (upper part) and after improvement of metadata (lower part).
From Conceptualization to Implementation: FAIR Assessment of Research Data Objects

February 2021 · 155 Reads · 33 Citations · Data Science Journal

Funders and policy makers have strongly recommended the uptake of the FAIR principles in scientific data management. Several initiatives are working on the implementation of the principles and standardized applications to systematically evaluate data FAIRness. This paper presents practical solutions, namely metrics and tools, developed by the FAIRsFAIR project to pilot the FAIR assessment of research data objects in trustworthy data repositories. The metrics are mainly built on the indicators developed by the RDA FAIR Data Maturity Model Working Group. The tools’ design and evaluation followed an iterative process. We present two applications of the metrics: an awareness-raising self-assessment tool and an automated FAIR data assessment tool. Initial results of testing the tools with researchers and data repositories are discussed, and future improvements are suggested, including the next steps to enable FAIR data assessment in the broader research data ecosystem.


Fig. 1. A schematic overview of HTTP-based methods to expose metadata in a machine- as well as human-friendly manner, offering various routes for machine-based discovery of links to downloadable data.
Integrating data and analysis technologies within leading environmental research infrastructures: Challenges and approaches

February 2021 · 202 Reads · 31 Citations · Ecological Informatics

Analyzing data typically requires significant preparation effort to make the data analysis-ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve onto landing pages that must be (manually) navigated to identify how data are accessed. This navigation is typically challenging or impossible for machines. This paper surveys existing approaches for improving environmental data access to facilitate more rapid data analyses in computational environments, and thus contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world-leading environmental research infrastructures, we highlight the existing practices to interface data repositories with computational environments and the challenges moving forward. We found that while the level of standardization has improved during recent years, it is still challenging for machines to discover and access data based on persistent identifiers. This is problematic with regard to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data, in general, and for seamless integration of data and analysis, in particular. There are a number of promising approaches that would improve the state-of-the-art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures.
We argue that the development and maintenance of specialized libraries for each RI and a range of programming languages used in data analysis does not scale well. Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, will enable the development of RI and programming language independent software libraries with much reduced effort required for library implementation and maintenance as well as considerably lower learning requirements on users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that we argue will build the foundation for efficient and effective integration of data and analysis.
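One web practice that lets a machine find downloadable data without navigating a landing page is to expose typed links in an HTTP `Link` header (RFC 8288). The abstract does not say which standards the paper proposes, so the following is only a sketch under that assumption: the header value, the example URLs, and the choice of the `item` and `describedby` relation types are illustrative, and the parser is deliberately simplified (it does not handle `<` inside quoted parameter values).

```python
import re
from typing import Dict, List, Optional

# One link-value: "<target>" followed by its parameters up to the next "<".
_LINK = re.compile(r'<(?P<target>[^>]*)>(?P<params>[^<]*)')
# One parameter: ;key="quoted value" or ;key=bare-token
_PARAM = re.compile(
    r';\s*(?P<key>[A-Za-z*]+)\s*=\s*(?:"(?P<quoted>[^"]*)"|(?P<bare>[^;,\s]+))'
)


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Split a Link header value into dicts holding the target and its parameters."""
    links: List[Dict[str, str]] = []
    for link in _LINK.finditer(value):
        entry = {"target": link.group("target")}
        for param in _PARAM.finditer(link.group("params")):
            quoted = param.group("quoted")
            entry[param.group("key").lower()] = (
                quoted if quoted is not None else param.group("bare")
            )
        links.append(entry)
    return links


def find_targets(links: List[Dict[str, str]], rel: str,
                 media_type: Optional[str] = None) -> List[str]:
    """Return targets whose rel contains `rel` and, if given, whose type matches."""
    return [
        link["target"]
        for link in links
        if rel in link.get("rel", "").split()
        and (media_type is None or link.get("type") == media_type)
    ]


# Hypothetical header as a repository landing page might send it.
header = (
    '<https://example.org/data.csv>; rel="item"; type="text/csv", '
    '<https://example.org/meta.jsonld>; rel="describedby"; type="application/ld+json"'
)
links = parse_link_header(header)
data_urls = find_targets(links, "item")
metadata_urls = find_targets(links, "describedby", "application/ld+json")
print(data_urls, metadata_urls)
```

Because the client only depends on generic link relations and media types, the same few lines would work against any repository that exposes such links, which is the kind of infrastructure-independent access the abstract argues for.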


Der späteiszeitliche Tüttensee-Komplex als Ergebnis der Abschmelzgeschichte am Ostrand des Chiemsee-Gletschers und sein Bezug zum „Chiemgau Impakt“ (Landkreis Traunstein, Oberbayern)

August 2020 · 217 Reads · 7 Citations · E&G Quaternary Science Journal

Based on sedimentological and geomorphological field investigations, the deglaciation history of the southeastern Chiemsee glacier is described. As the Bad Adelholzen-Erlstätter Rinne channel fell dry during the late Würm, the melting of the ice lobe in the Grabenstätter Bucht embayment produced a concentric sequence of initially peripheral drainage channels at progressively lower levels. The oldest channels of this phase turn toward the centripetal direction near Chieming, and the younger ones correspondingly farther south. The formation of the Tüttensee complex must be seen in the context of this development. It is the result of glaciofluvial and glaciolacustrine sedimentation under the influence of the successive ice decay in the Grabenstätter Bucht, combined with the formation of dead ice in the area of the present-day Tüttensee. This is supported by the step-like sequence of the described peripheral drainage channels with their progressively lower drainage levels, by the matching elevations of three of these channels and the Tüttensee terraces, and by the glaciofluvial or delta-like sediment structure and maturity typical of the formation of each terrace. This result corrects the Chiemgau impact hypothesis, according to which the Tüttensee is supposed to be an impact crater. Because this now falsified assumption is propagated by numerous media, particularly in German-speaking countries, the article is written in German so that it is accessible to a broad readership.


Setting up an Interdisciplinary Data Infrastructure: Why Cooperation between Domain Experts and Computer Scientists Matters - An Experience Report from the GFBio Project

August 2017 · 123 Reads · 1 Citation · Biodiversity Information Science and Standards

The German Federation for Biological Data (GFBio; Diepenbroek et al. 2014) is implementing a national infrastructure for the preservation, integration, and publication of biological data collected in German research projects. GFBio is built upon an archive infrastructure comprised of nine data centers including PANGAEA and the major German Natural Science Collections (German Federation for Biological Data (GFBio) 2017a). Creating and running GFBio requires close collaborations within a highly interdisciplinary consortium. Bringing together expertise from collections, scientists in the relevant fields, biodiversity informaticians and computer scientists proved to be essential for designing and building this system. GFBio is currently in its second funding phase. Essential services, required for the operation of the future infrastructure, have been successfully implemented. The realized technologies and tools use globally accepted standards as well as innovative concepts, e.g., for data visualisation or semantic integration. A portal (https://www.gfbio.org) provides a common point of access to all GFBio services: data submission, data discovery, data visualisation and analysis, a terminology service, and a help desk. In addition, archived research data is shared with international information infrastructures such as the Global Biodiversity Information Facility (GBIF) and the Biological Collection Access Service (BioCASE). As the data centers use different systems and thus internally build upon different data structures (German Federation for Biological Data (GFBio) 2017b), the search functionality integrated in the portal is a good example of the collaboration between teams of different expertise. Since the aim was to provide an integrated, faceted search, it was necessary to agree on common fields that can be used to feed the facets.
Therefore, the GFBio data centers agreed on using ABCD 2.06 (Access to Biological Collection Data) as a common standard and specified thirty elements for data exchange. Here, it was essential to bring together (1) domain experts for defining which facets they consider useful for an effective search, (2) computer scientists for providing the implementation based on Elasticsearch (Elasticsearch 2017), (3) biodiversity informaticians for defining mappings between different standards and (4) data curators from the GFBio data centers and long-term repositories for negotiating the set of mandatory fields. The starting point for broader research data management workflows was derived from high-quality data provided via publishing pipelines established at each data center. With that, primary collection and research data are available with metadata and data units according to the ABCD community standard and are ready to be reused following the FAIR data principles (Wilkinson et al. 2016): Findable, Accessible, Interoperable, Re-usable. Consequently, interdisciplinary cooperation is the GFBio data portal’s measure of success.


Terminology supported archiving and publication of environmental science data in PANGAEA

July 2017 · 532 Reads · 18 Citations · Journal of Biotechnology

Using the information system PANGAEA as an example, we describe the application of terminologies for archiving and publishing environmental science data. A Terminology Catalogue (TC) was embedded into the system, with interfaces that allow terminologies to be replicated and manually edited. For data ingest and archiving, we show how the TC can improve structuring and harmonizing lineage and content descriptions of data sets. Key is the conceptualization of measurement and observation types (parameters) and methods, for which we have implemented a basic syntax and rule set. For data access and dissemination, we have improved findability of data through enrichment of metadata with TC terms. Semantic annotations, e.g. adding term concepts (including synonyms and hierarchies) or mapped terms of different terminologies, facilitate comprehensive data retrievals. The PANGAEA thesaurus of classifying terms, which is part of the TC, is used as an umbrella vocabulary that links the various domains and allows drill downs and side drills with various facets. Furthermore, we describe how TC terms can be linked to nominal data values. This improves data harmonization and facilitates structural transformation of heterogeneous data sets to a common schema. Technical developments are complemented by work on the metadata content. Over the last 20 years, more than 100 new parameters have been defined on average per week. Recently, PANGAEA has increasingly been submitting new terms to various terminology services. Matching terms from terminology services with our parameter or method strings is supported programmatically. However, the process ultimately needs manual input by domain experts. The quality of terminology services is an additional limiting factor, and varies with respect to content, editorial, interoperability, and sustainability. Good quality terminology services are the building blocks for the conceptualization of parameters and methods.
In our view, they are essential for data interoperability and arguably the most difficult hurdle for data integration. In summary, the application of terminologies has a mutual positive effect for terminology services and information systems such as PANGAEA. On both sides, the application of terminologies improves content, reliability and interoperability.


Citations (17)


... Moreover, there are repositories from related disciplines that, among others, manage agricultural research data, such as PANGAEA (www.pangaea.de, Felden et al. 2023) for earth system research, the open access repository OpenAgrar (www.openagrar.de), or the electronic data archive library for plant genomics and phenomics e!DAL-PGP (edal-pgp.ipkgatersleben.de, Arend et al. 2016). ...

Reference:

Facilitating Effective Reuse of Soil Research Data: The BonaRes Repository
PANGAEA - Data Publisher for Earth & Environmental Science

Scientific Data

... These efforts can enhance the FAIRification of restricted datasets, promoting interdisciplinary collaboration and improving dataset discovery across domains [Vlachidis et al., 2021] [Sasse et al., 2022]. The annotation of column-level metadata has been shown to facilitate the retrieval and use of sensitive datasets, including medical records and microdata, by preserving privacy while adding valuable context [Dugas et al., 2016] [Magagna et al., 2021] [Jonquet et al., 2023] [Razick et al., 2014]. ...

The I-ADOPT Interoperability Framework for FAIRer data descriptions of biodiversity

... In the scientific domain, the FAIR Guiding Principles are turned into objectively measurable metrics. These are often based on intermediate concretization into dedicated requirements [Deva21]. Thus, metrics represent even more specific requirements, which are investigated to derive the domain-specific FAIRness criteria of manufacturing sensor data. ...

An automated solution for measuring the progress toward FAIR research data

Patterns

... In the literature, there is a remarkable amount of recent work that points to the relationship between interoperability and Sustainable Development. Themes such as smart cities (Jeong et al. 2020), energy efficiency (Martínez et al. 2021), cultural heritage (Turillazzi et al. 2021), biodiversity (Buttigieg et al. 2019; Magagna et al. 2021), agriculture (Adam-Blondon et al. 2016; Alreshidi 2019), among others, show the diversity of research in the search for the construction of information systems that break up information silos (Pennington and Cagnazzo 2019), in pursuit of effective sharing and re-use of open and research data (Charalabidis et al. 2018), concerning FAIR principles (Findable, Accessible, Interoperable, Reusable) (Grandcolas 2019). These principles were proposed in 2016 and have since been recommended by several organizations (Wilkinson et al. 2016). ...

The I-ADOPT Interoperability Framework for FAIRer data descriptions of biodiversity

... Interoperability issues with regard to observational data have recently gained attention [1]. Observational data can have different formats, structures, and diversity in semantic representation of observational characteristics, resulting in interoperability issues at all these levels. ...

The I-ADOPT Interoperability Framework: a proposal for FAIRer observable property descriptions

... Long-term observations of ecosystem processes and function, including carbon and water cycling, biodiversity and phenology monitoring, provide the information needed to understand ecosystem response to climate extremes and societal pressures. Long-term observations enable detection of ecosystem disturbance and stress events, as well as track ecosystem resistance to and recovery from events like drought, heat and water stress that are difficult to attribute with short-term monitoring (Baldocchi et al 2018, Huber et al 2021, Beringer et al 2022). When combined with remote sensing data, long-term observations can be scaled to understand ecosystem response to water stress at the landscape and regional level (Cleugh et al 2007, Reichstein et al 2007, Yang et al 2020). ...

Integrating data and analysis technologies within leading environmental research infrastructures: Challenges and approaches

Ecological Informatics

... As one may observe, the principles of Table 2 themselves are abstract, meaning that they do not explicitly define the metrics in order to achieve FAIRness. However, there are certain efforts towards defining metrics and developing tools to assess data compliance to FAIRness principles [12, 33, 34, 35]. For our analysis, we chose F-UJI [35], a tool that evaluates FAIRness metrics defined by the FAIRsFAIR project [36]. ...

From Conceptualization to Implementation: FAIR Assessment of Research Data Objects

Data Science Journal

... With regard to the DTM, it is noteworthy that the larger direct companion crater of the Lake Tüttensee crater, shown in Fig. 2, did not exist at all because it was covered by dense vegetation. A special and important aspect of the application of the extremely high-resolution DTM to the identification of smaller and very shallow craters arises in view of the increasing realization that, contrary to earlier and still held and published that meteorite airbursts only occur at higher altitudes (e.g. [6]), the so-called low-altitude touchdown airburst impacts, which can now also be described using hydrocode modeling methods [6], are leading to completely new statistics on the frequency of terrestrial impact events. ...

Der späteiszeitliche Tüttensee-Komplex als Ergebnis der Abschmelzgeschichte am Ostrand des Chiemsee-Gletschers und sein Bezug zum „Chiemgau Impakt“ (Landkreis Traunstein, Oberbayern)

E&G Quaternary Science Journal

... To fulfil our goal of creating a usable information system rather than another burdensome requirement for the researchers involved, it is necessary to get a precise understanding of the research workflow conducted by experimental scientists. Other data infrastructure projects affirm the importance of including domain experts into the design process as well [5]. ...

Setting up an Interdisciplinary Data Infrastructure: Why Cooperation between Domain Experts and Computer Scientists Matters - An Experience Report from the GFBio Project

Biodiversity Information Science and Standards

... Those terms are specified in external terminologies such as WoRMS (World Register of Marine Species), QUDT (Quantities, Units, Dimensions and Types) and PATO [9]. At present, we have developed tailor-made client applications customized to each of the terminologies [6]. The clients import terminologies from the external repositories into the PANGAEA data system and update them periodically. ...

Terminology supported archiving and publication of environmental science data in PANGAEA

Journal of Biotechnology