Figure 1 - uploaded by Leonardo Candela
Source publication
Species data are scattered among several databases and information systems. In recent years, considerable progress has been made in developing online species databases. However, there is no single database that can claim to host, and make available in a seamless way, all the species data needed by the communities willing to have access t...
Similar publications
The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learn...
Citations
... In particular, every mediator relies on mappings (Lenzerini, 2002) supporting (i) the rewriting of queries from the unifying SPD query language to the query language supported by the specific data provider, and (ii) the transformation of results from the specific data provider format to the unifying SPD format. Details on the SPD query language, the SPD unifying data format and the mapping of retrieved data into the unifying format are extensively discussed by Candela et al. (2014). It is important to highlight that records, once described in the unified data model, contain details on their provenance, produced according to the citation policies promoted by each database. ...
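The two mappings described in the excerpt above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `Mediator` and `UnifiedRecord`, the field names, the query syntax and the `ExampleDB` provider are hypothetical and do not reproduce the actual SPD query language, the SPD data format or any real provider API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UnifiedRecord:
    """Hypothetical unified record; the field names are illustrative only."""
    scientific_name: str
    provider: str   # provenance: which data provider supplied the record
    citation: str   # provenance: citation text following the provider's policy


@dataclass
class Mediator:
    """One mediator per data provider, built from the two mappings."""
    provider: str
    citation: str
    rewrite_query: Callable[[Dict[str, str]], str]      # mapping (i)
    transform_result: Callable[[dict], Dict[str, str]]  # mapping (ii)

    def search(self, query: Dict[str, str],
               fetch: Callable[[str], List[dict]]) -> List[UnifiedRecord]:
        # (i) rewrite the unified query into the provider's own query language
        native_query = self.rewrite_query(query)
        # (ii) transform each native result and attach provenance details
        return [
            UnifiedRecord(provider=self.provider, citation=self.citation,
                          **self.transform_result(raw))
            for raw in fetch(native_query)
        ]


def fake_fetch(native_query: str) -> List[dict]:
    # Stand-in for a call to a remote provider; returns one canned result.
    return [{"sciname": "Thunnus albacares", "query_used": native_query}]


mediator = Mediator(
    provider="ExampleDB",
    citation="ExampleDB (hypothetical citation string)",
    rewrite_query=lambda q: f"name={q['scientific_name']}",
    transform_result=lambda raw: {"scientific_name": raw["sciname"]},
)
records = mediator.search({"scientific_name": "Thunnus albacares"}, fake_fetch)
```

Keeping the two mappings as separate, swappable functions means that supporting another provider only requires a new pair of mappings, while the provenance fields are attached uniformly to every record.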
In recent years, considerable progress has been made in developing online species occurrence databases. These are crucial in environmental and agricultural challenges, e.g., they are a basic element in the generation of species distribution models. Unfortunately, their exploitation is still difficult and time-consuming for many scientists. No database currently exists that can claim to host, and make available in a seamless way, all the species occurrence data needed by the ecology scientific community. Occurrence data are scattered among several databases and information systems. It is not easy to retrieve records from them, because of differences in the adopted protocols, formats and granularity. Once collected, datasets have to be selected, homogenized and pre-processed before being ready to use in scientific analysis and modeling. This paper introduces a set of facilities offered by the D4Science Data Infrastructure to support these phases of the scientific process. It also exemplifies how they help reduce the time spent on data quality assessment and curation, thus improving the overall performance of the scientific investigation.
Over 300 million arthropod specimens are housed in North American natural history collections. These collections represent a “vast hidden treasure trove” of biodiversity: 95% of the specimen label data have yet to be transcribed for research, and less than 2% of the specimens have been imaged. Specimen labels contain crucial information to determine species distributions over time and are essential for understanding patterns of ecology and evolution, which will help assess the growing biodiversity crisis driven by global change impacts. Specimen images offer indispensable insight and data for analyses of traits, and ecological and phylogenetic patterns of biodiversity. Here, we review North American arthropod collections using two key metrics, specimen holdings and digitization efforts, to assess the potential for collections to provide needed biodiversity data. We include data from 223 arthropod collections in North America, with an emphasis on the United States. Our specific findings are as follows: (1) The majority of North American natural history collections (88%) and specimens (89%) are located in the United States. Canada has comparable holdings to the United States relative to its estimated biodiversity. Mexico has made the furthest progress in terms of digitization, but its specimen holdings should be increased to reflect the estimated higher Mexican arthropod diversity. The proportion of North American collections that has been digitized, and the number of digital records available per species, are both much lower for arthropods when compared to chordates and plants. (2) The National Science Foundation’s decade-long ADBC program (Advancing Digitization of Biological Collections) has been transformational in promoting arthropod digitization. However, even if this program became permanent, at current rates, by the year 2050 only 38% of the existing arthropod specimens would be digitized, and less than 1% would have associated digital images. (3) The number of specimens in collections has increased by approximately 1% per year over the past 30 years. We propose that this rate of increase is insufficient to provide enough data to address biodiversity research needs, and that arthropod collections should aim to triple their rate of new specimen acquisition. (4) The collections we surveyed in the United States vary broadly in a number of indicators. Collectively, there is depth and breadth, with smaller collections providing regional depth and larger collections providing greater global coverage. (5) Increased coordination across museums is needed for digitization efforts to target taxa for research and conservation goals and address long-term data needs. Two key recommendations emerge: collections should significantly increase both their specimen holdings and their digitization efforts to empower continental and global biodiversity data pipelines, and stimulate downstream research.
Purpose
– Marine species data are scattered across a series of heterogeneous repositories and information systems. No repository can claim to hold all marine species data. Moreover, information on marine species is made available through different formats and protocols. The purpose of this paper is to provide models and methods for integrating such information for publishing, browsing or querying. To provide a valid and reliable knowledge foundation for semantic interoperability of marine species data, the authors motivate a top-level ontology, called MarineTLO, and discuss its use for creating MarineTLO-based warehouses.
Design/methodology/approach
– The authors introduce a set of motivating scenarios that highlight the need for a top-level ontology. They then describe the main data sources (Fisheries Linked Open Data, ECOSCOPE, WoRMS, FishBase and DBpedia) that will be used as a basis for constructing the MarineTLO.
Findings
– The paper discusses the use of MarineTLO for constructing a warehouse. A series of uses of the MarineTLO-based warehouse is also reported.
Originality/value
– The authors describe the design of a top-level ontology for the marine domain that can satisfy the need for maintaining integrated sets of facts about marine species, thus assisting ongoing research on biodiversity. Apart from the ontology, the authors also elaborate on the mappings required for building integrated warehouses.