Figure 3 - uploaded by Armando Stellato
Source publication
Born in the early 1980s as a multilingual agricultural thesaurus, AGROVOC has steadily evolved over the last fifteen years, moving to an electronic version around the year 2000 and embracing the Semantic Web shortly thereafter. Today AGROVOC is a SKOS-XL concept scheme published as Linked Open Data, containing links (as well as backlinks) and ref...
Context in source publication
... separate information on the revision of concepts as well as of each label in each language. Thus we are able to state that “玉米” has a code, originating from the pre-RDF versions of AGROVOC, equal to “12332”. This is expressed by a single triple attached to the label. As a further example, we are also able to state that the term was created on December 12, 2002. The Agrontology, whose properties are used in such statements, is a complement to AGROVOC that provides domain-specific properties for enriching the description of concepts. Agrontology is enriched with VOAF (Vocabulary of a Friend) descriptors, mostly to link it to AGROVOC (and to other datasets adopting it, such as the FAO Biotech Glossary) and to have it mentioned in the LOV Dataset.

AGROVOC is currently undergoing an in-depth analysis to make explicit the modelling style adopted in each of its topic areas (see Figure 2). A parallel analysis of the Agrontology is ongoing, with the aim of fully harmonizing the domain modelling with the vocabulary currently used for it.

All 32,000+ concepts of the AGROVOC thesaurus are hierarchically organized under its 25 top concepts. These top concepts are very general, high-level concepts such as “activities”, “organisms”, “locations” and “products”. The fact that 20,000+ concepts fall under the top concept “organisms” confirms how strongly AGROVOC is oriented towards the agriculture sector (see Figure 2 for the complete statistical distribution of the dataset's concepts under its 25 top concepts). Other important areas of AGROVOC include “substances”, “entities”, “products” and “locations”. Besides being listed on the AGROVOC website, the top concepts can be found in the VoID file for the AGROVOC Linked Open Dataset. Moreover, an in-depth study of the current coverage of AGROVOC is under way, with the purpose of supporting human and machine users alike in their quest for information within the thesaurus and its links.
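The label-level statements described earlier in this passage (a legacy code and a creation date attached to a label rather than to its concept) can be pictured as a handful of triples. The following is only a minimal sketch in plain Python, with triples modelled as tuples: the label identifier `agv:xl_zh_example` is hypothetical, and the property names `skosxl:prefLabel`, `skosxl:literalForm`, `agron:hasCodeAgrovoc` and `dct:created` are assumptions chosen for illustration, not the actual statements from the dataset.

```python
from datetime import date

# Triples are modelled as (subject, predicate, object) tuples.
CONCEPT = "agv:c_12332"        # concept identifier quoted in the text ("maize")
LABEL = "agv:xl_zh_example"    # hypothetical identifier for the Chinese label

triples = [
    # The concept points to a label *resource*, not to a bare string.
    (CONCEPT, "skosxl:prefLabel", LABEL),
    # The label resource carries the literal form and its language tag.
    (LABEL, "skosxl:literalForm", ("玉米", "zh")),
    # Metadata hangs off the label itself (property names are assumptions).
    (LABEL, "agron:hasCodeAgrovoc", "12332"),
    (LABEL, "dct:created", date(2002, 12, 12)),
]


def describe(subject, store):
    """Collect all predicate/object pairs stated about one subject."""
    return {p: o for s, p, o in store if s == subject}


label_info = describe(LABEL, triples)
concept_info = describe(CONCEPT, triples)

print(label_info["agron:hasCodeAgrovoc"])
print(label_info["dct:created"].isoformat())
```

Because the label is a resource in its own right, the code and the creation date are found when describing the label and not when describing the concept, which is what allows revision information to be kept separately for each label in each language.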
AGROVOC is today published as a Linked Open Dataset with links to thirteen vocabularies, thesauri and ontologies. Five of the linked resources are general in scope: the Library of Congress Subject Headings (LCSH), RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié), Eurovoc, DBpedia, and an experimental Linked Data version of the Dewey Decimal Classification. The remaining eight resources are specific to various domains: the NAL Thesaurus for agriculture, GEMET for environment, STW for economics, TheSoz for social science, GeoNames and the FAO Geopolitical Ontology for countries and political regions, ASFA for aquatic science, and the aptly named Biotechnology glossary for biotechnology. These linked resources are mostly available as RDF/SKOS resources. They were considered in their entirety, except for RAMEAU, for which only agriculture-related concepts were considered (amounting to some 10% of its 150 000 concepts). Candidate mappings were found by applying string similarity matching algorithms to pairs of preferred labels [6] and by using the Ontology Alignment API [7] to manage the produced matches. The common analysis language was English in all cases except the AGROVOC-RAMEAU alignment, for which French was used. Table 1 shows, for each resource linked to AGROVOC (column 1), its area of coverage (column 2), the language considered for mapping with AGROVOC (column 3), and the number of matches resulting from the evaluation (column 4, see below). Candidate links were presented to a domain expert for evaluation in the form of a spreadsheet. Once validated, the mappings were loaded into the same triple store where the linked data version of AGROVOC is stored. All validated candidate matches were considered to be skos:exactMatch, as in the skos:exactMatch statement for agv:c_12332 ("maize" in English).
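The candidate-generation step described above (string similarity over pairs of preferred labels, followed by expert validation) can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure of [6] or the Ontology Alignment API: the similarity measure is the ratio from Python's standard `difflib.SequenceMatcher`, the 0.9 threshold is arbitrary, and all identifiers other than `agv:c_12332` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(source, target, threshold=0.9):
    """Pair up concepts whose preferred labels are nearly identical.

    `source` and `target` map concept identifiers to preferred labels in
    the common analysis language. The result is only a list of candidates
    for later validation, not a list of accepted mappings.
    """
    candidates = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = similarity(s_label, t_label)
            if score >= threshold:
                candidates.append((s_id, t_id, round(score, 3)))
    # Best-scoring candidates first, so they appear at the top of the sheet.
    return sorted(candidates, key=lambda c: -c[2])


# Toy data: only agv:c_12332 comes from the text, the rest is made up.
agrovoc = {"agv:c_12332": "maize", "agv:c_hypo_1": "soil fertility"}
other = {"ext:001": "Maize", "ext:002": "soil fertilisation", "ext:003": "wheat"}

candidates = candidate_matches(agrovoc, other)
for source_id, target_id, score in candidates:
    print(source_id, "skos:exactMatch?", target_id, score)
```

A high threshold keeps the candidate list short and favours accuracy: near-misses such as "soil fertility" and "soil fertilisation" are left out, and every remaining pair would still go to a domain expert before being asserted as skos:exactMatch.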
skos:exactMatch has the advantage of expressing a looser notion of equivalence than the more formal equivalence and identity properties of OWL, such as owl:sameAs. The objective when linking AGROVOC to other resources was to provide only the main anchors, privileging accuracy over recall. This is why we (mostly) rejected skos:closeMatch and relied on skos:exactMatch, found by means of string-similarity techniques as opposed to more sophisticated context-based approaches. Also, the One Sense per Domain assumption which, in analogy to the “one sense per discourse” [8] assumption, states that “the more specialized a domain is, the less is the influence of word sense ambiguity”, supports our claim that (in our case) similar strings correspond to equivalent meanings. More sophisticated approaches would probably have filtered out candidate results rather than widened their number (thus raising precision over recall); however, the potential loss of precision from relying on string similarity alone was well compensated by the manual validation of candidate links by a domain expert.

Figure 3 provides a high-level view of AGROVOC's entire maintenance process and its publication as linked data. The figure emphasises the three levels of data maintenance (bottom layer), data storage (middle layer), and data publication (top layer). The relational database is still necessary, as many existing applications interface with this legacy model via SQL. Conversions between the data stores are thus needed to synchronize the data accessed by editors using legacy tools. This duplication of data repositories, and the consequent data conversions, is obviously not ideal and in principle should be limited.
On the other hand, AGROVOC has for decades supported a worldwide community of users (people and institutions) who have developed a number of applications relying on the legacy relational model: these conversion steps are thus currently unavoidable, and they give an idea of the complexity inherent in historic, distributed collaboration scenarios. Elaborate procedures are necessary, and the conversion effort, modelling issues and information needs are just the tip of the iceberg compared to the real effort spent on content and services maintenance. Several conversion steps are therefore present in the AGROVOC lifecycle. Note that this data flow is not always monotonic. Although the main authoring tool is VocBench, contributions to AGROVOC may also occasionally come from legacy formats such as spreadsheets and SQL files. This updated content is thus contributed separately (through different modalities) and then merged to produce a new copy. When a VocBench version has been finalized with contributions coming from different sources and formats, it is converted back to the relational DB for legacy applications. At the same time, a SKOS-XL version is produced and enriched with further information, such as metadata descriptors from the VoID vocabulary, to feed the LOD endpoint with updated data. Currently, no versioning information for the dataset as a whole (i.e. which AGROVOC release a client is accessing) is explicitly reported inside its triples, while editorial notes provide fine-grained details about its content, with creation and modification dates for all concepts and labels present in the dataset.

The linked data version of AGROVOC is now available online thanks to a collaboration between FAO and MIMOS Berhad. Data is stored in an RDF triple store (AllegroGraph) hosted on a server in Kuala Lumpur, Malaysia. A SPARQL endpoint, combined with HTTP resolution of AGROVOC entities, allows for publication as linked data.
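HTTP resolution of entity URIs in a linked data setting typically relies on content negotiation: the same identifier is redirected to an RDF document or to an HTML page depending on the client's `Accept` header. The sketch below shows one way such a decision could be made. It is a simplified illustration, not the FAO or MIMOS configuration: the host `data.example.org`, the `/rdf/` and `/page/` paths, and the sets of recognised media types are all assumptions.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle"}      # assumed RDF formats
HTML_TYPES = {"text/html", "application/xhtml+xml"}     # assumed HTML formats
BASE = "https://data.example.org"                       # hypothetical host


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equally weighted types keep their header order.
    return sorted(entries, key=lambda e: -e[1])


def redirect_target(entity_id, accept_header):
    """Choose the URL a request for an entity should be redirected to."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in RDF_TYPES:
            return f"{BASE}/rdf/{entity_id}"
        if media_type in HTML_TYPES or media_type == "*/*":
            return f"{BASE}/page/{entity_id}"
    # No usable preference: fall back to the human-readable page.
    return f"{BASE}/page/{entity_id}"


print(redirect_target("c_12332", "text/turtle"))
print(redirect_target("c_12332", "text/html,application/rdf+xml;q=0.5"))
```

In a deployed service the chosen URL would be returned in the `Location` header of an HTTP redirect (commonly a 303 response); the sketch covers only the selection logic.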
The Pubby service mentioned in section 2 is also hosted by MIMOS. Both RDF and HTML access are resolved through content negotiation on FAO servers and redirected to the proper MIMOS service.

During the more than 30 years of its existence, AGROVOC has seen a growing community of users exploiting its content for a progressively wider set of uses. In this section we report the most important uses of which we are aware. In some cases they exploit AGROVOC to make a further explicit contribution to the LOD cloud itself, while in others the availability of AGROVOC data as LOD will foster wider access possibilities and will probably lead to increased use, as the number of potential users grows through their interaction with the entirety of the LOD cloud and through AGROVOC's position within it.

In 2011, following a wave of enthusiasm caused by Linked Open Data initiatives, and benefiting from the successful experiences of the AIMS group working on AGROVOC and other concept schemes and vocabularies ported to the Semantic Web, FAO's Information Technology Division (CIO) chose to add a taste of the Semantic Web to its ambitious data integration project, data.fao.org. The project, which launches publicly in December 2012, brings much of FAO's statistical, textual and geographical data under one roof, fostering data integration and harmonization first within FAO itself, and later publicly via LOD. Many models are being exploited, mainly covering domain representation (OWL and SKOS as core modelling vocabularies, flanked by “standard” vocabularies such as FOAF [9]) and statistical data reporting (Data Cube Vocabulary [10]). AGROVOC, the first FAO resource to embrace the Semantic Web and to be published on the LOD cloud, was chosen as a common, controlled vocabulary for tagging the information resources (documents, media, etc.) in data.fao.org.
AGROVOC will also act as an interlingua for easily matching RDF resources from different datasets, which still maintain a certain independence and thus expose potential overlaps with other datasets. A new, potentially wider, set of linksets in a star configuration with AGROVOC at the centre will be elaborated to establish a global interconnected network of resources within FAO. AGROVOC and other vocabularies hosted on VocBench (e.g. Journal Authority Descriptions) have for some years been supported by an extensive set of SOAP web services that ...
Citations
... Most consortia related to Life Science have turned to Bioschemas profiles as they already provide some guidance on how to use SchemaOrg in this domain. For instance, FAIRagro will build upon and extend Bioschemas specifications, taking also into account well-known vocabularies in the agri-domain (e.g., AgroVoc [7]). Work on extensions and adoption will involve a variety of domain experts, expert associations and service providers to work collaboratively, via two AgriHackathons. ...
Schema.org is a controlled vocabulary that makes it easier for web pages to describe their actual content in a semantic, structured and machine-processable way. It is recognized by major search engines and data aggregators, making it easier for researchers to expose metadata describing their research outcomes. Here we present how Schema.org is used (or planned to be) by some NFDI consortia, becoming a lightweight approach to harmonize digital objects coming from different sources so they can be connected to each other in a meaningful way
... However, such mappings can only be made between individuals of skos:Concept, and thus require an additional step of converting aligned skos:Concept individuals into OWL classes to create a multilingual ontology. An example of SKOS-based mappings is the AGROVOC multilingual thesaurus, which uses cross-lingual mappings in which SKOS concepts are mapped to external vocabularies such as the Chinese Agricultural Thesaurus using skos:exactMatch and skos:closeMatch [54,55,56]. Annane et al. [57] used SKOS and the General Ontology for Linguistic Description (GOLD) [58] to generate 228 000 mappings between English ontologies on BioPortal and its French equivalent. ...
The Multilingual Semantic Web has been in focus for over a decade. Multilingualism in Linked Data and RDF has shown substantial adoption, but this is unclear for ontologies since the last review 15 years ago. One of the design goals for OWL was internationalisation, with the aim that an ontology is usable across languages and cultures. Much research to improve on multilingual ontologies has taken place in the meantime, and presumably multilingual linked data could use multilingual ontologies. Therefore, this review seeks to (i) elucidate and compare the modelling options for multilingual ontologies, (ii) examine extant ontologies for their multilingualism, and (iii) evaluate ontology editors for their ability to manage a multilingual ontology. Nine different principal approaches for modelling multilinguality in ontologies were identified, which fall into either of the following approaches: using multilingual labels, linguistic models, or a mapping-based approach. They are compared on design by means of an ad hoc visualisation mode of modelling multilingual information for ontologies, shortcomings, and what issues they aim to solve. For the ontologies, we extracted production-level and accessible ontologies from BioPortal and the LOV repositories, which had, at best, 6.77% and 15.74% multilingual ontologies, respectively, where most of them have only partial translations and they all use a labels-based approach only. Based on a set of nine tool requirements for managing multilingual ontologies, the assessment of seven relevant ontology editors showed that there are significant gaps in tooling support, with VocBench 3 nearest of meeting them all. This stock-taking may function as a new baseline and motivate new research directions for multilingual ontologies.
... The LEAP4FNSSA lexicon is a combination of 3 semantic resources, i.e. the inputs used to construct the final lexicon: Agrovoc terms associated with these concepts are manually extracted from the online resource. Agrovoc is a multilingual thesaurus dedicated to the agricultural domain, developed by FAO (Food and Agriculture Organization) [3]. This thesaurus is used for different applications, e.g. ...
The main objective of the project LEAP4FNSSA (Long-term EU-AU Research and Innovation Partnership for Food and Nutrition Security and Sustainable Agriculture) is to provide a tool for European and African institutions to engage in a sustainable partnership platform for research and innovation on Food and Nutrition Security, and Sustainable Agriculture (FNSSA). The FNSSA roadmap facilitates the involvement of stakeholders for addressing and linking research to innovation dealing with food security issues. In this context, the LEAP4FNSSA project supports the driving of the roadmap. Research and innovation activities were captured in different data, i.e. LEAP4FNSSA database and heterogeneous textual data including project reports, websites, scientific publications, workshop reports and student theses. The Knowledge Extractor Pipeline System (KEOPS) was implemented to support the processing and analysis of textual data associated with FNSSA activities. KEOPS is based on the LEAP4FNSSA lexicon presented in this data paper. The LEAP4FNSSA lexicon based on 331 keywords associated with 12 concepts dealing with the food security domain is the result of 3 steps of work and brainstorming. The lexicon enables the capturing of research and innovation topics dealing with food security and conducted by African and European partners. This data paper presents the obtained lexicon and a summary of the method to build it.
... As mentioned before, the LLOD cloud [14] was established in 2011 as a means "to measure and visualize the adoption of linked and open data within the linguistics community" [41]. It is the result of an effort by the Open Linguistics Working Group 37 and contains two kinds of resources: linguistic resources in a strict sense (e.g., dictionaries, wordnets, annotated corpora such as treebanks) and other linguistically-relevant resources (e.g., thesauri from tourism or life sciences, such as EARTh -the Environmental Applications Reference Thesaurus [2] or AGROVOC [13]); various downstream tasks can make use of them in data processing. ...
The need for reusable, interoperable, and interlinked linguistic resources in Natural Language Processing downstream tasks has been proved by the increasing efforts to develop standards and metadata suitable to represent several layers of information. Nevertheless, despite these efforts, the achievement of full compatibility for metadata in linguistic resource production is still far from being reached. Access to resources observing these standards is hindered either by (i) lack of or incomplete information, (ii) inconsistent ways of coding their metadata, and (iii) lack of maintenance. In this paper, we offer a quantitative and qualitative analysis of descriptive metadata and resources availability of two main metadata repositories: LOD Cloud and Annohub. Furthermore, we introduce a metadata enrichment, which aims at improving resource information, and a metadata alignment to META-SHARE ontology, suitable for easing the accessibility and interoperability of such resources.
... Although this proof-of-concept database is large and fairly comprehensive, in reality it must be connected to broader networks of information to be truly useful. Using controlled vocabularies, like FAO AGROVOC or CABI (Caracciolo et al., 2013; CABI, 2014), can give stakeholder groups flexibility to do this in real time: new issues can be added; new links made to existing issues; new indicators can be searched for, updated, or changed; and new ontological relationships between issues and indicators can be made. The malleability of this Semantic Web of food sustainability information will be necessary for most use cases, and hence this open, linked-data framework will be essential for interoperability within and across communities of practice. ...
A variety of stakeholders are concerned with many issues regarding the sustainability of our complex global food system. Yet navigating and comparing the plethora of issues and indicators across scales, commodities, and regions can be daunting, particularly for different communities of practice with diverse goals, perspectives, and decision-making workflows. This study presents a malleable workflow to help different stakeholder groups identify the issues and indicators that define food system sustainability for their particular use case. By making information used in such workflows semantically-consistent, the output from each unique case can be easily compared and contrasted across domains, contributing to both a deeper and broader understanding of what issues and indicators define a resilient global food system.
... Moreover, Senaratne et al. (2017) review literature to assemble quality indicators and according measurement methods to assess the quality of volunteered geographic information. Focusing on semantic aspects, the daQ Ontology facilitates describing the quality of linked datasets (Debattista et al., 2014), which can be considered influential in the development of DQV (Albertoni and Isaac (2021), Debattista et al. (2016)). ...
Extensive data quality descriptions as a vital part of a dataset’s metadata are widely accepted, albeit their provision in a formalized manner is often lacking. This is due to a number of problems that are frequently encountered by geodata producing scientists. As one of these problems, we identified missing, unknown or unused options to model inhomogeneity of data quality across space, time, and theme in a dataset’s metadata. Detailed information of inhomogeneous geodata quality beyond dataset-wide statistical measures (variance, min, max, etc.) is often only described in dataset accompanying papers or quality reports. These text-based approaches prevent precise querying and hinder the development of advanced data discovery tools that could make valuable use of inhomogeneous data quality information. We propose a profile for the data quality vocabulary (DQV) that allows to model inhomogeneous geodata quality. Considering established vocabularies typically used to describe geographic metadata, as well as ensuring compatibility with the default version of DQV, enhances the usability and thus, minimizes the effort for data producers to provide formalized descriptions of inhomogeneous data quality.
... Such standards are defined and promoted by standard agencies such as the International Standards Organisation (ISO) [60]. The Agricultural Information Management Standards (AIMS) [61], the Agricultural Metadata Element Set (AgMES) [62], and Agrovoc [63] are three notable metadata initiatives in the agriculture area [53]. Moreover, ontologies and taxonomies can enable better interoperability by allowing data to be linked at the semantic level. ...
With the rapid growth of population and the increasing demand for food worldwide, improving productivity in farming procedures is essential. Smart farming is a concept that emphasizes the use of modern technologies such as the Internet of Things (IoT) and artificial intelligence (AI) to enhance productivity in farming practices. In a smart farming scenario, large amounts of data are collected from diverse sources such as wireless sensor networks, network-connected weather stations, monitoring cameras, and smartphones. These data are valuable resources to be used in data-driven services and decision support systems (DSS) in farming applications. However, one of the major challenges with these large amounts of agriculture data is their immense diversity in terms of format and meaning. Moreover, the different services and technologies in a smart farming ecosystem have limited capability to work together due to the lack of standardized practices for data and system integration. These issues create a significant challenge in cooperative service provision, data and technology integration, and data-sharing practices. To address these issues, in this paper, we propose the platform approach, a design approach intended to guide building effective, reliable, and robust smart farming systems. The proposed platform approach considers six requirements for seamless integration, processing, and use of farm data. These requirements in a smart farming platform include interoperability, reliability, scalability, real-time data processing, end-to-end security and privacy, and standardized regulations and policies. A smart farming platform that considers these requirements leads to increased productivity, profitability, and performance of connected smart farms. In this paper, we aim at introducing the platform approach concept for smart farming and reviewing the requirements for this approach.
... This can be illustrated with FoodOn (Dooley et al., 2018), which was initially built to be used in collaboration with Genomic Epidemiology Ontology (GenEpiO 31 ) to specify foodborne disease risks and not food science or technology. AGROVOC, on the other hand, is a generic multilingual thesaurus developed by the Food and Agriculture Organisation (FAO) with direct interest for KT and covering many fields in agriculture and food (Caracciolo et al., 2013). ...
Background
Scientific software incorporates models that capture fundamental domain knowledge. This software is becoming increasingly more relevant as an instrument for food research. However, scientific software is currently hardly shared among and (re-)used by stakeholders in the food domain, which hampers effective dissemination of knowledge, i.e. knowledge transfer.
Scope and approach
This paper reviews selected approaches, best practices, hurdles and limitations regarding knowledge transfer via software and the mathematical models embedded in it to provide points of reference for the food community.
Key findings and conclusions
The paper focusses on three aspects. Firstly, the publication of digital objects on the web, which offers valorisation software as a scientific asset. Secondly, building transferrable software as way to share knowledge through collaboration with experts and stakeholders. Thirdly, developing food engineers' modelling skills through the use of food models and software in education and training.
... CGIAR (Consultative Group for International Agricultural Research) [15] suggested developing big data software for agriculture using the Crop Ontology, a domain ontology for agriculture (Crop and Agronomy Ontology Community), and the AGROVOC project [16] likewise uses an agriculture-oriented vocabulary, shares terms and reuses ontologies. ...
The presence of technologies in the agronomic field has the purpose of proposing the best solutions to the challenges found in agriculture, especially to the problems that affect cultivars. One of the obstacles is supporting the users' own language in applications that interact with them in Brazilian agribusiness. Therefore, this work uses Natural Language Processing techniques to develop an automatic and effective computer system that interacts with the user and assists in the identification of pests and diseases in soybean crops, stored in a non-relational database repository, in order to provide accurate diagnostics and simplify the work of farmers and agricultural stakeholders who deal with a lot of information. To build dialogues and provide rich consultations, a data structure covering 108 pests and diseases of the soybean cultivar was compiled from agriculture manuals, and the spaCy tool made it possible to pre-process the texts, recognize the entities and support the requirements for the development of the conversational system.
... 20 In addition, a large number of semantic resources exist, e.g. AGROVOC (word combination of agriculture and vocabulary) 21 or the general ontologies of the Open Biological and Biomedical Ontology (OBO) Foundry. 22 In the area of data structures, concepts for generalization have been proposed, such as Investigation-Study-Assay (ISA-TAB) 23 or, more recently, the Core Scientific Dataset Model, 24 which abstracts individual data structures to a self-describing generic data structure. ...
With the ongoing cost decrease of genotyping and sequencing technologies, accurate and fast phenotyping remains the bottleneck in the utilizing of plant genetic resources for breeding and breeding research. Although cost-efficient high-throughput phenotyping platforms are emerging for specific traits and/or species, manual phenotyping is still widely used and is a time- and money-consuming step. Approaches that improve data recording, processing or handling are pivotal steps towards the efficient use of genetic resources and are demanded by the research community. Therefore, we developed PhenoApp, an open-source Android app for tablets and smartphones to facilitate the digital recording of phenotypical data in the field and in greenhouses. It is a versatile tool that offers the possibility to fully customize the descriptors/scales for any possible scenario, also in accordance with international information standards such as MIAPPE (Minimum Information About a Plant Phenotyping Experiment) and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Furthermore, PhenoApp enables the use of pre-integrated ready-to-use BBCH (Biologische Bundesanstalt für Land- und Forstwirtschaft, Bundessortenamt und CHemische Industrie) scales for apple, cereals, grapevine, maize, potato, rapeseed and rice. Additional BBCH scales can easily be added. The simple and adaptable structure of input and output files enables an easy data handling by either spreadsheet software or even the integration in the workflow of laboratory information management systems (LIMS). PhenoApp is therefore a decisive contribution to increase efficiency of digital data acquisition in genebank management but also contributes to breeding and breeding research by accelerating the labour intensive and time-consuming acquisition of phenotyping data.