Figure 1 - uploaded by Armando Stellato
Source publication
Born in the early 1980s as a multilingual agricultural thesaurus, AGROVOC has steadily evolved over the last fifteen years, moving to an electronic version around the year 2000 and embracing the Semantic Web shortly thereafter. Today AGROVOC is a SKOS-XL concept scheme published as Linked Open Data, containing links (as well as backlinks) and ref...
Contexts in source publication
Context 1
... The AGROVOC thesaurus (its name is a portmanteau of Agriculture and Vocabulary) was first published at the beginning of the 1980s by the Food and Agriculture Organization of the United Nations (FAO). At its birth AGROVOC was available in three languages (English, Spanish and French), and its purpose was to serve as a controlled vocabulary for indexing publications in agricultural science and technology, including forestry, animal husbandry, aquatic sciences, fisheries, aquaculture and human nutrition. Its primary users were the FAO library and the International System for Agricultural Science and Technology (AGRIS), a global public-domain database coordinated by FAO and containing approximately 3.5 million bibliographic records. In 2000, AGROVOC abandoned paper printing and went digital, with data stored in a relational database. This greatly eased maintenance, but it also brought limitations, especially given the distributed community of editors, which had grown over the years. Moreover, data were available to third parties only through database dumps or web services. The models and technologies developed within the Semantic Web, together with the publication methodologies and best practices promoted by Linked Open Data [1], offered a way to overcome these limitations. AGROVOC was remodelled first in OWL [2] and then in SKOS (see [3] for a detailed description of the evolution of the model). With the adoption of SKOS-XL, AGROVOC finally met the modelling requirements of a multilingual, linguistically detailed thesaurus. Today, the AGROVOC SKOS-XL concept scheme is a Linked Open Data (LOD) dataset of more than 32,000 concepts available in over 20 languages (five additional languages are under development), with up to 40,000 terms in each language.
AGROVOC is still managed by FAO, and is owned and maintained by an international community of experts and institutions active in agriculture. It is widely used in specialized libraries, digital libraries and repositories to index content, and it serves as a specialized tagging resource for knowledge and content organization at FAO and among third-party stakeholders. This paper provides an overall description of the AGROVOC Linked Dataset and details its maintenance and publication process. As many thesaurus managers are embracing Semantic Web technologies, we believe our work is of general interest and may serve as a use case for the community. The rest of this paper is organized as follows: section two gives more detail on the publication of the linked dataset; section three presents the process followed to generate links between AGROVOC and relevant resources such as vocabularies, glossaries and thesauri; section four summarizes and discusses the entire AGROVOC data flow, from maintenance to LOD publication; section five reports on the use of the AGROVOC Linked Dataset; and section six concludes. Information about the AGROVOC thesaurus is available from the FAO website at the address: The RDF version of AGROVOC has been made available as a Linked Open Dataset at the address: and it is also available as a data dump on the project site¹ of its main editing platform, VocBench (see [4] for a detailed description of this collaborative editing tool developed by FAO and other partners). AGROVOC data are freely usable under the terms of the Creative Commons 3.0 license². The server handles content negotiation for the LOD dataset: clients requesting HTML content (e.g. ordinary web browsers) receive an HTML representation of the RDF data describing the requested concept, generated by the Pubby³ application (see Figure 1).
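To make the content-negotiation step concrete, here is a minimal sketch of how a server might choose between HTML and RDF for a concept URI from the client's `Accept` header. It is illustrative only and is not the AGROVOC or Pubby implementation: the set of supported media types, the HTML default and the simplified q-value handling are all assumptions.

```python
# Minimal content-negotiation sketch (illustrative, not the AGROVOC/Pubby
# server code): choose a representation from the client's Accept header.

SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]  # assumed set


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    entries = []
    for index, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type:
            entries.append((media_type, q, index))
    # Highest q first; ties keep the order in which the client listed them.
    entries.sort(key=lambda e: (-e[1], e[2]))
    return [(media_type, q) for media_type, q, _ in entries]


def negotiate(accept_header, default="text/html"):
    """Pick a supported media type, or None (a server might reply 406)."""
    if not accept_header:
        return default
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for candidate in SUPPORTED:
            if media_type == "*/*" or media_type == candidate or (
                media_type.endswith("/*")
                and candidate.startswith(media_type[:-1])
            ):
                return candidate
    return None


# A browser-style header gets HTML; an RDF-aware client gets RDF/XML.
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))  # → text/html
print(negotiate("text/turtle;q=0.5, application/rdf+xml;q=0.9"))  # → application/rdf+xml
```

A production server would normally delegate this to a tested HTTP library; the point here is only that one concept URI can yield different representations depending on who asks.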
A description file following the VoID (Vocabulary of Interlinked Datasets) specification [5] is available alongside the AGROVOC Linked Open Dataset: This VoID file contains statistical information about the linked dataset, as well as the coordinates needed to access and query it automatically. After the changes that its modelling requirements dictated over the past years, AGROVOC finally found in SKOS-XL a model that fits it perfectly. SKOS-XL features “reified” labels, i.e. labels that are resources in their own right and can therefore be enriched with properties of their own. As an example, consider the concept shown in Figure 1, “maize” in English. Its Chinese preferred label⁴, “玉米”, is expressed in SKOS-XL by means of two triples⁵ ...
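The two triples themselves are cut from this excerpt, so the sketch below only illustrates the general SKOS-XL pattern: the concept points to a label resource, and that resource carries the literal. The concept identifier `c_12332` comes from the text, with `agv:` taken to expand to `http://aims.fao.org/aos/agrovoc/`; the label URI `xl_zh_EXAMPLE` is hypothetical. Triples are kept as plain Python tuples to stay dependency-free.

```python
# Illustrative SKOS-XL triples for the Chinese preferred label of "maize".
# The label URI is a HYPOTHETICAL placeholder; the excerpt omits the real one.

AGV = "http://aims.fao.org/aos/agrovoc/"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"

concept = AGV + "c_12332"          # the concept "maize"
zh_label = AGV + "xl_zh_EXAMPLE"   # hypothetical reified-label URI

# Literals are modelled as (lexical form, language tag) pairs.
triples = {
    # 1) the concept points to a label *resource*, not to a plain string
    (concept, SKOSXL + "prefLabel", zh_label),
    # 2) the label resource carries the actual string
    (zh_label, SKOSXL + "literalForm", ("玉米", "zh")),
}


def pref_label(graph, concept_uri, lang):
    """Follow skosxl:prefLabel, then skosxl:literalForm, for one language."""
    for subject, predicate, label in graph:
        if subject == concept_uri and predicate == SKOSXL + "prefLabel":
            for s, p, literal in graph:
                if s == label and p == SKOSXL + "literalForm" and literal[1] == lang:
                    return literal[0]
    return None


print(pref_label(triples, concept, "zh"))  # → 玉米
```

Because the label is a node rather than a string, further triples can use `zh_label` as their subject, which is what allows properties of its own to be attached to it.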
Context 3
... note that when providing a human-readable representation of the data (Figure 1), we preferred to show visitors the actual label rather than its reified version. This is why the classic SKOS representation of terms is also available: Beyond enabling finer linguistic modelling of the resource (for instance, lexical relationships between labels without involving the attached concepts), SKOS-XL also makes it possible to refine editorial notes down to the language level, by adding separate revision information for each concept as well as for each label in each language. Thus we are able to state that “玉米” has a code, inherited from the pre-RDF versions of AGROVOC, equal to “12332”. This is expressed by the triple⁶: As a further example, we are also able to state that this term was created on December 12, 2002: The Agrontology seen in the above example is a companion to AGROVOC, providing domain-specific properties for enriching the description of concepts. Agrontology is enriched with VOAF⁸ (Vocabulary of a Friend) descriptors, mostly to link it to AGROVOC (and to other datasets adopting it, such as the FAO Biotech Glossary⁹) and to have it listed in the LOV dataset. AGROVOC is currently undergoing a thorough analysis to make explicit the modelling style adopted in its various topic areas (see Figure 2). A parallel analysis of the Agrontology is under way, with the aim of achieving full harmony between the domain modelling and the vocabulary currently used for it. All 32,000+ concepts of the AGROVOC thesaurus are hierarchically organized under its 25 top concepts. These top concepts are very general, high-level concepts such as “activities”, “organisms”, “locations” and “products”.
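Two ideas in this paragraph lend themselves to a small sketch: metadata attached to a label rather than to its concept, and the derivation of the "classic" SKOS label from the reified one. The flattening step follows the SKOS-XL rule that `skosxl:prefLabel` followed by `skosxl:literalForm` entails `skos:prefLabel` (likewise for alt and hidden labels). The excerpt omits the actual triples, so the label URI, the `legacyCode` property and the use of `dcterms:created` for the creation date are all assumptions.

```python
AGV = "http://aims.fao.org/aos/agrovoc/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
DCT = "http://purl.org/dc/terms/"

concept = AGV + "c_12332"
zh_label = AGV + "xl_zh_EXAMPLE"                           # hypothetical label URI
LEGACY_CODE = "http://example.org/agrontology#legacyCode"  # hypothetical property

# Literals are (lexical form, language tag or None) pairs.
graph = {
    (concept, SKOSXL + "prefLabel", zh_label),
    (zh_label, SKOSXL + "literalForm", ("玉米", "zh")),
    # Editorial metadata attached to the label, not to the concept:
    (zh_label, LEGACY_CODE, ("12332", None)),
    (zh_label, DCT + "created", ("2002-12-12", None)),     # property assumed
}

XL_TO_SKOS = {
    SKOSXL + "prefLabel": SKOS + "prefLabel",
    SKOSXL + "altLabel": SKOS + "altLabel",
    SKOSXL + "hiddenLabel": SKOS + "hiddenLabel",
}


def flatten(graph):
    """Derive plain SKOS label triples (concept -> literal) from SKOS-XL ones."""
    literal_forms = {s: o for s, p, o in graph if p == SKOSXL + "literalForm"}
    return {
        (s, XL_TO_SKOS[p], literal_forms[o])
        for s, p, o in graph
        if p in XL_TO_SKOS and o in literal_forms
    }


def label_metadata(graph, label_uri):
    """Collect everything said about a reified label except its literal form."""
    return {p: o for s, p, o in graph if s == label_uri and p != SKOSXL + "literalForm"}


print(flatten(graph))
print(label_metadata(graph, zh_label))
```

The flattened output is what a human-oriented page can show directly, while the label-level code and creation date stay available only in the SKOS-XL view.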
The fact that 20,000+ concepts fall under the top concept “organisms” confirms how strongly AGROVOC is oriented towards the agriculture sector (see Figure 2 for the complete distribution of the dataset's concepts under its 25 top concepts). Other important areas of AGROVOC include “substances”, “entities”, “products” and “locations”. Besides being listed on the AGROVOC website, the top concepts can be found in the VoID file of the AGROVOC Linked Open Dataset. Moreover, an in-depth study of AGROVOC's current coverage is under way, with the aim of supporting human and machine users alike in their search for information within the thesaurus and its links. AGROVOC is today published as a Linked Open Dataset with links to thirteen vocabularies, thesauri and ontologies. Five of the linked resources are general in scope: the Library of Congress Subject Headings (LCSH)¹⁰, RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié), Eurovoc¹², DBpedia¹³, and an experimental Linked Data version of the Dewey Decimal Classification¹⁴. The remaining eight resources are specific to various domains: the NAL Thesaurus¹⁵ for agriculture, GEMET¹⁶ for the environment, STW for economics and TheSoz¹⁷ for the social sciences, while GeoNames and the FAO Geopolitical Ontology both cover countries and political regions. ASFA¹⁸ covers aquatic science, and the aptly named Biotechnology Glossary covers biotechnology. Most of these linked resources are available as RDF/SKOS resources. The linked resources were considered in their entirety, except for RAMEAU, for which only agriculture-related concepts were considered (some 10% of its 150,000 concepts). Candidate mappings were found by applying string-similarity matching algorithms to pairs of preferred labels [6], and the resulting matches were managed with the Ontology Alignment API [7].
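The excerpt does not say which string-similarity measures from [6] were applied, so the sketch below swaps in Python's standard `difflib.SequenceMatcher` ratio purely as a stand-in. It compares normalized preferred labels pairwise and keeps pairs above a threshold; the 0.9 threshold and the sample identifiers are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())


def candidate_matches(source, target, threshold=0.9):
    """Return (source_id, target_id, score) for preferred-label pairs whose
    similarity reaches the threshold, best score first.

    source, target: dicts mapping concept identifier -> preferred label,
    both in the matching language (English, or French for RAMEAU).
    SequenceMatcher.ratio() is a stand-in measure; the threshold is assumed.
    """
    candidates = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold:
                candidates.append((s_id, t_id, round(score, 3)))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates


# Hypothetical identifiers and labels, for illustration only.
agrovoc = {"ex:a1": "Maize", "ex:a2": "wheat"}
other = {"ex:b1": "maize", "ex:b2": "rice"}
print(candidate_matches(agrovoc, other))  # → [('ex:a1', 'ex:b1', 1.0)]
```

Comparing every pair is quadratic; at the scale of a full thesaurus one would more likely index normalized labels first. A high threshold fits the stated goal of favouring a few reliable anchors over broad coverage.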
English was used as the matching language in all cases except the AGROVOC-RAMEAU alignment, for which French was used. Table 1 shows, for each resource linked to AGROVOC (column 1), its area of coverage (column 2), the language used for mapping with AGROVOC (column 3), and the number of matches resulting from the evaluation (column 4, see below). Candidate links were presented to a domain expert for evaluation in the form of a spreadsheet. Once validated, the mappings were loaded into the same triple store that holds the linked data version of AGROVOC. All validated matches were asserted as skos:exactMatch, as in agv:c_12332 skos:exactMatch ("maize" in English). skos:exactMatch has the advantage of expressing a looser notion of equivalence than the formal equivalence/identity properties of OWL, such as owl:sameAs. The objective when linking AGROVOC to other resources was to provide only main anchors, privileging precision over recall. This is why we (mostly) rejected skos:closeMatch and relied exclusively on skos:exactMatch, found by means of string-similarity techniques rather than more sophisticated context-based approaches. Also, the One Sense per Domain assumption which, in analogy to the “one sense per discourse” [8] assumption, specifies that “the more specialized a domain is, ...
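The validation loop (expert review in a spreadsheet, then loading into the triple store) can be pictured as turning approved rows into `skos:exactMatch` statements. In this sketch the CSV column names, the "accept"/"reject" convention and the target URIs are hypothetical, since the excerpt does not describe the spreadsheet layout; only the `skos:exactMatch` property and the concept `c_12332` come from the text.

```python
import csv
import io

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def validated_links(csv_text):
    """Turn reviewer-approved spreadsheet rows into N-Triples lines.

    Assumed (hypothetical) columns: agrovoc_uri, target_uri, decision.
    Only rows whose decision is "accept" become skos:exactMatch links.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("decision") or "").strip().lower() == "accept":
            lines.append(
                f"<{row['agrovoc_uri']}> <{SKOS_EXACT_MATCH}> <{row['target_uri']}> ."
            )
    return lines


# Hypothetical review sheet exported as CSV; target URIs are placeholders.
SAMPLE = """agrovoc_uri,target_uri,decision
http://aims.fao.org/aos/agrovoc/c_12332,http://example.org/target/maize,accept
http://example.org/agrovoc/c_hypothetical,http://example.org/target/other,reject
"""

for line in validated_links(SAMPLE):
    print(line)
```

N-Triples is used here only because it is line-oriented and easy to bulk-load; the excerpt does not say which serialization or triple store was used.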
Similar publications
Linked Data technologies are increasingly being implemented to enhance cataloguing workflows in libraries, archives and museums. We review current best practice in library cataloguing, how Linked Data is used to link collections and provide consistency in indexing, and briefly describe the relationship between Linked Data, library data models and d...
Reutilization and interoperability are major issues in the fields of knowledge representation and extraction, as reflected in initiatives such as the Semantic Web and the Linked Open Data Cloud. This paper shows how terminological resources can be integrated and reused within different types of application. EcoLexicon is a multilingual terminologic...
Wikidata is a free, open, multilingual and collaboratively edited knowledge base. It works as a repository of structured and linked data that can be used by other projects, such as Scholia, to generate scholarly profiles and data visualizations. In this paper we present our contribution to enrich NOVA School of Business and Economics (NOVA SBE...
A case study involving the semantic enrichment of a multilingual archive is presented with the aim of assessing the relevance of natural language processing techniques such as named-entity recognition and entity linking for cultural heritage material. In order to improve the search experience of the end users of historical collections, we map entiti...
The third edition of the open challenge on Question Answering over Linked Data (QALD-3) was conducted as a half-day lab at CLEF 2013. Unlike previous editions, QALD-3 put a strong emphasis on multilinguality, offering two tasks: one on multilingual question answering and one on ontology lexicalization. While no submissi...
Citations
... Most consortia related to Life Science have turned to Bioschemas profiles as they already provide some guidance on how to use SchemaOrg in this domain. For instance, FAIRagro will build upon and extend Bioschemas specifications, taking also into account well-known vocabularies in the agri-domain (e.g., AgroVoc [7]). Work on extensions and adoption will involve a variety of domain experts, expert associations and service providers to work collaboratively, via two AgriHackathons. ...
Schema.org is a controlled vocabulary that makes it easier for web pages to describe their actual content in a semantic, structured and machine-processable way. It is recognized by major search engines and data aggregators, making it easier for researchers to expose metadata describing their research outcomes. Here we present how Schema.org is used (or planned to be used) by some NFDI consortia, becoming a lightweight approach to harmonize digital objects coming from different sources so they can be connected to each other in a meaningful way.
... However, such mappings can only be made between individuals of skos:Concept, and thus require an additional step of converting aligned skos:Concept individuals into OWL classes to create a multilingual ontology. An example of SKOS-based mappings is the AGROVOC multilingual thesaurus, which uses cross-lingual mappings, where SKOS concepts are mapped to external vocabularies such as the Chinese Agricultural Thesaurus using skos:exactMatch and skos:closeMatch [54,55,56]. Annane et al. [57] used SKOS and the General Ontology for Linguistic Description (GOLD) [58] to generate 228,000 mappings between English ontologies on BioPortal and its French equivalent. ...
The Multilingual Semantic Web has been in focus for over a decade. Multilingualism in Linked Data and RDF has seen substantial adoption, but the situation for ontologies has been unclear since the last review 15 years ago. One of the design goals for OWL was internationalisation, with the aim that an ontology is usable across languages and cultures. Much research to improve multilingual ontologies has taken place in the meantime, and presumably multilingual linked data could use multilingual ontologies. Therefore, this review seeks to (i) elucidate and compare the modelling options for multilingual ontologies, (ii) examine extant ontologies for their multilingualism, and (iii) evaluate ontology editors for their ability to manage a multilingual ontology. Nine different principal approaches for modelling multilinguality in ontologies were identified, which fall into one of the following approaches: using multilingual labels, linguistic models, or a mapping-based approach. They are compared on their design (by means of an ad hoc visualisation of how each models multilingual information for ontologies), their shortcomings, and the issues they aim to solve. For the ontologies, we extracted production-level and accessible ontologies from the BioPortal and LOV repositories, which had, at best, 6.77% and 15.74% multilingual ontologies, respectively, where most of them have only partial translations and all use a labels-based approach only. Based on a set of nine tool requirements for managing multilingual ontologies, the assessment of seven relevant ontology editors showed that there are significant gaps in tooling support, with VocBench 3 coming closest to meeting them all. This stock-taking may function as a new baseline and motivate new research directions for multilingual ontologies.
... The LEAP4FNSSA lexicon combines three semantic resources, used as inputs to construct the final lexicon: Agrovoc terms associated with these concepts are manually extracted from the online¹ resource. Agrovoc is a multilingual thesaurus dedicated to the agricultural domain, developed by FAO (Food and Agriculture Organization) [3]. This thesaurus is used for different applications, e.g. ...
The main objective of the project LEAP4FNSSA (Long-term EU-AU Research and Innovation Partnership for Food and Nutrition Security and Sustainable Agriculture) is to provide a tool for European and African institutions to engage in a sustainable partnership platform for research and innovation on Food and Nutrition Security, and Sustainable Agriculture (FNSSA). The FNSSA roadmap facilitates the involvement of stakeholders in addressing food security issues and linking research to innovation. In this context, the LEAP4FNSSA project helps drive the roadmap. Research and innovation activities were captured in different kinds of data, i.e. the LEAP4FNSSA database and heterogeneous textual data including project reports, websites, scientific publications, workshop reports and student theses. The Knowledge Extractor Pipeline System (KEOPS) was implemented to support the processing and analysis of textual data associated with FNSSA activities. KEOPS is based on the LEAP4FNSSA lexicon presented in this data paper. The LEAP4FNSSA lexicon, based on 331 keywords associated with 12 concepts dealing with the food security domain, is the result of three steps of work and brainstorming. The lexicon enables the capture of research and innovation topics dealing with food security that are addressed by African and European partners. This data paper presents the resulting lexicon and a summary of the method used to build it.
... As mentioned before, the LLOD cloud [14] was established in 2011 as a means "to measure and visualize the adoption of linked and open data within the linguistics community" [41]. It is the result of an effort by the Open Linguistics Working Group³⁷ and contains two kinds of resources: linguistic resources in a strict sense (e.g., dictionaries, wordnets, annotated corpora such as treebanks) and other linguistically relevant resources (e.g., thesauri from tourism or life sciences, such as EARTh, the Environmental Applications Reference Thesaurus [2], or AGROVOC [13]); various downstream tasks can make use of them in data processing. ...
The need for reusable, interoperable, and interlinked linguistic resources in Natural Language Processing downstream tasks has been proved by the increasing efforts to develop standards and metadata suitable to represent several layers of information. Nevertheless, despite these efforts, the achievement of full compatibility for metadata in linguistic resource production is still far from being reached. Access to resources observing these standards is hindered either by (i) lack of or incomplete information, (ii) inconsistent ways of coding their metadata, and (iii) lack of maintenance. In this paper, we offer a quantitative and qualitative analysis of descriptive metadata and resources availability of two main metadata repositories: LOD Cloud and Annohub. Furthermore, we introduce a metadata enrichment, which aims at improving resource information, and a metadata alignment to META-SHARE ontology, suitable for easing the accessibility and interoperability of such resources.
... Although this proof-of-concept database is large and fairly comprehensive, in reality it must be connected to broader networks of information to be truly useful. Using controlled vocabularies, like FAO AGROVOC or CABI (Caracciolo et al., 2013; CABI, 2014), can give stakeholder groups the flexibility to do this in real time: new issues can be added; new links made to existing issues; new indicators can be searched for, updated, or changed; and new ontological relationships between issues and indicators can be made. The malleability of this Semantic Web of food sustainability information will be necessary for most use cases, and hence this open, linked-data framework will be essential for interoperability within and across communities of practice. ...
A variety of stakeholders are concerned with many issues regarding the sustainability of our complex global food system. Yet navigating and comparing the plethora of issues and indicators across scales, commodities, and regions can be daunting, particularly for different communities of practice with diverse goals, perspectives, and decision-making workflows. This study presents a malleable workflow to help different stakeholder groups identify the issues and indicators that define food system sustainability for their particular use case. By making information used in such workflows semantically-consistent, the output from each unique case can be easily compared and contrasted across domains, contributing to both a deeper and broader understanding of what issues and indicators define a resilient global food system.
... Moreover, Senaratne et al. (2017) review literature to assemble quality indicators and according measurement methods to assess the quality of volunteered geographic information. Focusing on semantic aspects, the daQ Ontology facilitates describing the quality of linked datasets (Debattista et al., 2014), which can be considered influential in the development of DQV (Albertoni and Isaac (2021), Debattista et al. (2016)). ...
Extensive data quality descriptions as a vital part of a dataset’s metadata are widely accepted, although their provision in a formalized manner is often lacking. This is due to a number of problems frequently encountered by scientists who produce geodata. As one of these problems, we identified missing, unknown or unused options to model inhomogeneity of data quality across space, time, and theme in a dataset’s metadata. Detailed information on inhomogeneous geodata quality beyond dataset-wide statistical measures (variance, min, max, etc.) is often only given in papers accompanying the dataset or in quality reports. These text-based approaches prevent precise querying and hinder the development of advanced data discovery tools that could make valuable use of inhomogeneous data quality information. We propose a profile for the Data Quality Vocabulary (DQV) that allows modelling inhomogeneous geodata quality. Considering established vocabularies typically used to describe geographic metadata, and ensuring compatibility with the default version of DQV, enhances usability and thus minimizes the effort for data producers to provide formalized descriptions of inhomogeneous data quality.
... Such standards are defined and promoted by standard agencies such as the International Standards Organisation (ISO) [60]. The Agricultural Information Management Standards (AIMS) [61], the Agricultural Metadata Element Set (AgMES) [62], and Agrovoc [63] are three notable metadata initiatives in the agriculture area [53]. Moreover, ontologies and taxonomies can enable better interoperability by allowing data to be linked at the semantic level. ...
With the rapid growth of population and the increasing demand for food worldwide, improving productivity in farming procedures is essential. Smart farming is a concept that emphasizes the use of modern technologies such as the Internet of Things (IoT) and artificial intelligence (AI) to enhance productivity in farming practices. In a smart farming scenario, large amounts of data are collected from diverse sources such as wireless sensor networks, network-connected weather stations, monitoring cameras, and smartphones. These data are valuable resources to be used in data-driven services and decision support systems (DSS) in farming applications. However, one of the major challenges with these large amounts of agriculture data is their immense diversity in terms of format and meaning. Moreover, the different services and technologies in a smart farming ecosystem have limited capability to work together due to the lack of standardized practices for data and system integration. These issues create a significant challenge in cooperative service provision, data and technology integration, and data-sharing practices. To address these issues, in this paper, we propose the platform approach, a design approach intended to guide building effective, reliable, and robust smart farming systems. The proposed platform approach considers six requirements for seamless integration, processing, and use of farm data. These requirements in a smart farming platform include interoperability, reliability, scalability, real-time data processing, end-to-end security and privacy, and standardized regulations and policies. A smart farming platform that considers these requirements leads to increased productivity, profitability, and performance of connected smart farms. In this paper, we aim at introducing the platform approach concept for smart farming and reviewing the requirements for this approach.
... This can be illustrated with FoodOn (Dooley et al., 2018), which was initially built to be used in collaboration with the Genomic Epidemiology Ontology (GenEpiO³¹) to specify foodborne disease risks and not food science or technology. AGROVOC, on the other hand, is a generic multilingual thesaurus developed by the Food and Agriculture Organisation (FAO) with direct interest for KT and covering many fields in agriculture and food (Caracciolo et al., 2013). ...
Background
Scientific software incorporates models that capture fundamental domain knowledge. This software is becoming increasingly more relevant as an instrument for food research. However, scientific software is currently hardly shared among and (re-)used by stakeholders in the food domain, which hampers effective dissemination of knowledge, i.e. knowledge transfer.
Scope and approach
This paper reviews selected approaches, best practices, hurdles and limitations regarding knowledge transfer via software and the mathematical models embedded in it to provide points of reference for the food community.
Key findings and conclusions
The paper focusses on three aspects. Firstly, the publication of digital objects on the web, which enables the valorisation of software as a scientific asset. Secondly, building transferable software as a way to share knowledge through collaboration with experts and stakeholders. Thirdly, developing food engineers' modelling skills through the use of food models and software in education and training.
... The CGIAR (Consultative Group for International Agricultural Research) [15] proposed the development of big data software for agriculture, using the agricultural domain ontology Crop Ontology (Crop and Agronomy Ontology Community), and the AGROVOC project [16] likewise uses an agriculture-oriented vocabulary, shares terms and reuses ontologies. ...
Technologies in the agronomic field aim to propose the best solutions to the challenges found in agriculture, especially to the problems that affect cultivars. One of the obstacles is supporting users' own language in applications that interact with users in Brazilian agribusiness. Therefore, this work uses Natural Language Processing techniques to develop an automatic and effective computer system that interacts with the user and assists in identifying pests and diseases in soybean crops, stored in a non-relational database repository, to provide accurate diagnostics and simplify the work of farmers and agricultural stakeholders who deal with a lot of information. To build dialogues and provide rich consultations, a data structure covering 108 soybean pests and diseases was built from agriculture manuals, and the spaCy tool was used to pre-process the texts, recognize entities and support the requirements for developing the conversational system.
...²⁰ In addition, a large number of semantic resources exist, e.g. AGROVOC (a blend of agriculture and vocabulary)²¹ or the general ontologies of the Open Biological and Biomedical Ontology (OBO) Foundry.²² In the area of data structures, concepts for generalization have been proposed, such as Investigation-Study-Assay (ISA-TAB)²³ or, more recently, the Core Scientific Dataset Model,²⁴ which abstracts individual data structures into a self-describing generic data structure. ...
With the ongoing cost decrease of genotyping and sequencing technologies, accurate and fast phenotyping remains the bottleneck in utilizing plant genetic resources for breeding and breeding research. Although cost-efficient high-throughput phenotyping platforms are emerging for specific traits and/or species, manual phenotyping is still widely used and is a time- and money-consuming step. Approaches that improve data recording, processing or handling are pivotal steps towards the efficient use of genetic resources and are demanded by the research community. Therefore, we developed PhenoApp, an open-source Android app for tablets and smartphones to facilitate the digital recording of phenotypical data in the field and in greenhouses. It is a versatile tool that offers the possibility to fully customize the descriptors/scales for any possible scenario, also in accordance with international information standards such as MIAPPE (Minimum Information About a Plant Phenotyping Experiment) and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Furthermore, PhenoApp enables the use of pre-integrated, ready-to-use BBCH (Biologische Bundesanstalt für Land- und Forstwirtschaft, Bundessortenamt und CHemische Industrie) scales for apple, cereals, grapevine, maize, potato, rapeseed and rice. Additional BBCH scales can easily be added. The simple and adaptable structure of input and output files enables easy data handling by either spreadsheet software or even integration into the workflow of laboratory information management systems (LIMS). PhenoApp is therefore a decisive contribution to increasing the efficiency of digital data acquisition in genebank management, and it also contributes to breeding and breeding research by accelerating the labour-intensive and time-consuming acquisition of phenotyping data.