
Spatial discovery and the research library


Abstract

Academic libraries have always supported research across disciplines by integrating access to diverse contents and resources. They now have the opportunity to reinvent their role in facilitating interdisciplinary work by offering researchers new ways of sharing, curating, discovering, and linking research data. Spatial data and metadata support this process because location often integrates disciplinary perspectives, enabling researchers to make their own research data more discoverable, to discover data of other researchers, and to integrate data from multiple sources. The Center for Spatial Studies at the University of California, Santa Barbara (UCSB) and the UCSB Library are undertaking joint research to better enable the discovery of research data and publications. The research addresses the question of how to spatially enable data discovery in a setting that allows for mapping and analysis in a GIS while connecting the data to publications about them. It suggests a framework for an integrated data discovery mechanism and shows how publications may be linked to associated data sets exposed either directly or through metadata on Esri's Open Data platform. The results demonstrate a simple form of linking data to publications through spatially referenced metadata and persistent identifiers. This linking adds value to research products and increases their discoverability across disciplinary boundaries.
UC Santa Barbara
UC Santa Barbara Previously Published Works
Transactions in GIS, 20(3)
Lafia, Sara K
Kuhn, Werner
Peer reviewed
Spatial Discovery and the Research Library
Sara Lafia1, Jon Jablonski2, Werner Kuhn1, Savannah Cooley1, F. Antonio Medrano1
1 Center for Spatial Studies
3512 Phelps Hall
Department of Geography
1832 Ellison Hall
Department of Geography
University of California at Santa Barbara
Santa Barbara, CA 93106-4060
2 UC Santa Barbara Library
University of California at Santa Barbara
Keywords: digital libraries, spatial data portals, Semantic Web
1. Introduction
Location plays a key role in the organization and integration of knowledge. In an interdisciplinary setting,
location can reveal patterns and trends in diverse and seemingly disparate information. For example, a
“geographic prism” on social mobility data in the United States reveals vast regional differences that can
then produce hypotheses about causes, based on local differences in factors like family structure or
schools (“Mobility, measured”, The Economist 2014). Data discovery tools that exploit location can offer
users a spatial view of phenomena, and in doing so, bridge disciplines in research and policy making. The
design of such tools, with an emphasis on connecting the discovered data to publications about them, is
the focus of this paper.
1.1. Problem statement
Enabling the spatial discovery of research publications and datasets, hereafter called research objects,
is the next step in the evolving role of the modern research library. The notion of the extensible and
reusable research object originates from the domain of e-Science (Bechhofer et al. 2010). Over time, the
set of research objects, beginning with documents, has expanded beyond texts to include artifacts, models,
games, and works of art (Buckland 1997). Today, research libraries are increasingly called upon to build
links between research objects, such as journal articles or electronic theses, and auxiliary data, both of
which may have embedded locational references and may reside in external data repositories.
Emerging research object repositories, which hold data and publications, are still unstable as
architectures and face challenges in handling spatially referenced content (Hey et al. 2009). There is a
growing need for a stable yet flexible discovery mechanism that can thrive in an evolving spatial and non-
spatial information landscape (Cooley et al. 2015). At the same time, e-Science is producing sophisticated
models of research objects (Bechhofer et al. 2010) that are exceedingly complex for the needs of data and
publication discovery at libraries. Much work remains to be done in the development of simple search and
discovery tools that span multiple collections (van Hooland et al. 2014).
In this article, we address the primary challenge of stability by implementing a simple linked data
model that exploits basic relationships between research data and research publications in a way that does
not break when repositories change. In doing so, we also address a second challenge, that of supporting
discovery, resulting in enhanced integrative capacity for spatially referenced research objects. Combining
these two challenges in the proposed pragmatic form is expected to result in progress on a third, broader
goal: that of supporting interdisciplinarity in scientific workflows through data reusability, within and
across domains. These three challenges translate into the following set of guiding research questions:
1. How can libraries generate stable links for research objects across repositories?
2. How can libraries support the discovery of research objects based on location?
3. How can libraries promote cross-disciplinary data sharing and reuse?
The research, undertaken by the Center for Spatial Studies at the University of California, Santa Barbara
(UCSB), in partnership with the UCSB Library and Esri Inc., seeks to make spatial references and
relationships explicit in research objects, thereby integrating diverse contents and contextualizing
published research data by connecting them to publications.
1.2. Motivation
Research institutions generate massive quantities of data from diverse disciplines in a wide variety of
formats (Mayernik et al. 2015). Recent efforts to increase transparency and reproducibility encourage, and
often mandate, that researchers make publications and data publicly accessible through open-access
licenses (University of California Regents 2014). A proliferation of associated data is contributing to a
growing imbalance between an institution’s ability to collect data and its ability to curate resources
(Cragin et al. 2010), resulting in a trade-off between quality assurance and ingestion capacity, a trend that
the UCSB Library can attest has accelerated in the intervening years.
Interdisciplinary research presents additional unique challenges, including disciplinary
differences in frames of reference, operational agendas, research methods, and vocabularies (Brewer
2015; MacMillan 2014). Many data discovery portals support only domain-specific vocabularies, data
structures, and metadata formats, severely limiting the applicability and reuse of data across domains
(Golding 2009). More limitations arise when subsets of domain research are published in expensive
subscription journals. This diminishes research impact and potential for data reuse across domains, both of
which could be enhanced if the research were made available through open-access policies (Harnad et al. 2008).
Library repositories have developed detailed workflows for generating metadata that use rigorous
metadata content standards and controlled vocabularies, but result in extremely limited ingest capacity. In
contrast, repositories for self-deposit of spatial content, such as ArcGIS Online,1 have minimal metadata
constraints and see 8,000 to 12,000 new and mostly undescribed objects added per day (Szukalski 2015).
Metadata that describe the lifecycle of a dataset are often very granular and must account for both library
and spatial needs. Metadata are valuable for long-term preservation, yet they are not central to resource
discoverability (Hardy and Durante 2014), which is the primary focus of this research. Enforcing
particular metadata requirements may do more to hinder data availability, especially across diverse
domains that have their own metadata standards, than to aid in their discovery.
Library-run and self-deposit systems of research data management can be complementary, as they
approach control and sharing in two distinct ways. However, they are not currently connected. This
research proposes to align the traditional library ingest process with the self-deposit approach of cloud-
based GIS, such as ArcGIS Online, through the generation of links between two sets of research objects:
researcher publications and researcher data. This approach combines the best aspects of both worlds:
spatial discovery of data from the GIS world and document curation from the library world, connected
through a lightweight and stable linked data solution.
Creating links between publications held in a tightly controlled library repository and data stored
across external databases increases the discoverability of these research objects. This work connects
research objects held in separate repositories without the need to formally align their metadata schemas.
In our proposed framework, a research object in a self-deposit environment like ArcGIS Online, which
has minimal metadata constraints, is semantically linked to a related research object in an institutional
environment with tightly controlled metadata.
This article presents a proof-of-concept model for linking spatial data to the research publications
that utilize them. Using OpenRefine with its Resource Description Framework (RDF) extension for data
processing and cleaning2, we link sample publications to data hosted on Esri’s Open Data platform by
Dublin Core metadata relationships. The linked data are stored as triples, which allows for queries on the
associated RDF about publication data. Such formalized relationships are key to developing a rich
publication and data repository that allows for discovery of research resources and advances cross-
disciplinary sharing of knowledge, as illustrated in Figure 1.
Figure 1—Project vision for data discovery and publication integration across domains
2. Background and related work
This work builds on a long tradition of spatially enabled digital libraries and uses the latest semantic and
geospatial technologies to demonstrate the potential for spatial discovery and the interlinking of research
resources. As university researchers are increasingly expected to share the data associated with their
publications under open data mandates, university libraries find themselves being called upon to curate
increasing volumes and additional types of researcher-generated data. In this context, enhancing users’
ability to share, discover, and make sense of content is of great importance.
2.1. Library repositories
In the mid-1990s, the Alexandria Digital Library (ADL) at UCSB was the first distributed digital library
(Freeston 2004) to offer collections of georeferenced materials, hosted online, searchable by spatial and
temporal criteria (Goodchild 2004). ADL eventually lapsed, relegating UCSB researcher data, such as the
popular Maya Forest GIS collection (Ford 1995), to offline discovery and curation. Reinstating such
legacy collections through an open-access digital presence increases their utility in an interdisciplinary
research context. Further, linking these datasets to publications in a manner that can be exploited by
Semantic Web tools improves their discoverability.
Many university libraries have implemented hybrid ad-hoc solutions for spatial data collection,
discovery, access, storage, and archiving in the context of the changing landscape of user needs and
technologies (Scaramozzino et al. 2014). Libraries have generally promoted interdisciplinary
collaboration by supporting geospatial research platforms and tools for analysis and post-data discovery.
However, they do not yet combine spatial and semantic approaches to expose connections between
existing data silos that span diverse disciplines. In practice, most library-curated research objects are
locally stored, have limited access points, and are undiscoverable from related content (Padilla 2016).
The UCSB Map & Imagery Laboratory (MIL) is in the process of developing a spatial metadata
workflow using ArcCatalog for the purposes of preparing spatial datasets for ingest into the new
Alexandria Digital Research Library (ADRL)3. The ISO 19115 standard4, the Open Geoportal Metadata
Creation Guide5 and the Stanford University metadata creation workflow6 inform this metadata model.
The work described in this paper couples these ongoing efforts with the production of linked data.
2.2. Emerging spatial data technologies
Achieving the dual purposes of enhancing spatial discovery and linking research objects requires a novel
solution. Some contemporary data management solutions address the need to enable the spatial discovery
of resources, but do not enhance discovery of resources through semantic links. GeoBlacklight7, for
instance, is an open source, multi-institutional software project that many libraries are currently adopting
(Addison et al. 2015; Durante and Hardy 2015). It offers users text-based, spatial, and faceted semantic
search to enable discovery of GIS-consumable resources across organizations (Hardy and Durante 2014).
GeoBlacklight also allows users to connect to data as a service, which enables analysis from a desktop
GIS, comparable with Esri Open Data. While a GeoBlacklight instance for the UCSB Library would
support spatial discovery by relating spatially referenced content based on location, it would not connect
the data to publications held in the library’s own repositories, nor to other external repositories. Another
possible data management solution is the California Digital Library’s Dash system, which has been
adopted by several University of California campuses and offers self-deposit, data search, data sharing,
and preservation services (Tsang 2015). However, Dash does not offer inherent
spatial functionality, although efforts to achieve this are underway at UC Irvine8.
Considering these existing alternatives, utilizing Esri’s ArcGIS Online platform as a foundation
for combined spatial and semantic search makes sense for several reasons. Since GIS software has
become ubiquitous for performing spatial analysis across a variety of academic disciplines, universities
often administer an ArcGIS Online enterprise account through their libraries. ArcGIS Online is a cloud-
based GIS that acts as a self-deposit data system with basic geoprocessing functionality. Additionally,
ArcGIS Online now includes Esri Open Data9, which is a spatial data repository with native access
controls and search features. Enabling Open Data on ArcGIS Online allows organizations to make content
available to the public or restricted to users authorized by the institution.
There are many advantages to using Esri Open Data as a spatial data discovery solution, not the
least of which is publishing spatial data in a way that allows open access and download. Users are not
required to have ArcGIS Online credentials to access data hosted through Esri Open Data, which
increases both accessibility to data and reproducibility of results derived from that data. ArcGIS Online
also supports various metadata standards, increasing the potential to share data across domains. Its
interface allows for visualization and filtering of the data for basic geoprocessing and analysis. This adds
immediate value to the discovery process, as users can begin making sense of datasets even before
downloading them. Many organizations are adopting Esri’s Open Data platform because ArcGIS Online
offers web-based analysis and search. UCSB’s instance of Esri Open Data10 is shown in Figure 2.
Figure 2: UCSB’s Open Data instance leverages ArcGIS Online
While Esri’s Open Data platform is an excellent tool for publishing, discovering, and accessing
spatial datasets, it is not a stable repository solution in either the traditional sense of institutional preprint
repositories or in the emerging sense of Trusted Digital Repositories (Tsang 2015). However, when
linking data with a controlled resource, such as UCSB’s Alexandria Digital Research Library11 (ADRL)
repository, which hosts theses and dissertations, or University of California’s eScholarship12, which offers
open-access to researcher publications, the power of the Semantic Web can be brought to bear on the
systems. This design choice provides flexibility that many current repositories cannot offer.
2.3. State of the art
Many institutions, including libraries, archives and museums, are adopting linked data approaches to
improve the discoverability of the growing number of resources that they curate (van Hooland et al.
2014). Institutions, such as the Linked Data for Libraries university consortium, the Library of Congress
and the Tate Modern Gallery, leverage linked open data technologies to enhance access to their
collections. These management models offer users access to data and metadata through Application
Programming Interfaces (APIs) and extend query capabilities through accessible endpoints.
The Linked Data for Libraries (LD4L) 13 initiative is a multi-institutional effort, including
Stanford, Rice and Harvard universities, aimed toward applying the Library of Congress Bibliographic
Framework Initiative to describe library resources. Transforming traditional MARC (MAchine-Readable
Cataloging) metadata descriptions, which are flat, text-based, and fielded (Avram 2003) into linked
BIBFRAME descriptions for cartographic and geospatial materials leverages Library of Congress
controlled vocabularies alongside DBPedia and GeoNames, to model places, creators, themes, and events
(Durante et al. 2016). Library of Congress is a notable early organizational contributor to the production
of API-accessible linked open data for authority files14. Many institutions use these services for
authoritative reconciliation (Heath and Bizer 2011). These services allow institutions, such as the Tate
Modern Gallery, to contribute collection metadata to repositories, like GitHub15, that they neither own nor
manage, increasing discoverability, content exposure and creative reuse (Padilla 2016).
The development of Semantic Web technologies enables linked data driven portals. Linked data
portals provide new opportunities to organize metadata and retrieve information resources such as text
documents, datasets, and multimedia content (Baierer et al. 2014; Hu et al. 2015-1; Hu et al. 2015-2).
Linked data resource discovery systems can index domain-specific information with terms from
ontologies. Ontologies are formal explicit specifications of a shared conceptualization using a vocabulary
of classes and relations, expressed in RDF, which is a data model that stores metadata attributes as nodes
and links to constitute an interconnected graph.
Whereas other methods for publishing data rely on multiple data models, the RDF data model
provides an integrated and simple access mechanism that also supports hyperlink-based data discovery
using uniform resource identifiers (URIs) as global identifiers for entities (Heath and Bizer 2011). For
instance, Athanasis et al. (2009) described data with domain-specific spatial ontologies in a linked data
discovery tool, and Keßler et al. (2012) developed a linked data portal for the GIScience community to
explore and visualize geographic distributions of publications by conference location and editor or author
affiliations. Scheider et al. (2014) have leveraged linked spatiotemporal data to enhance access to diverse
formats of library materials, from paper maps to scientific datasets. Taken together, the interlinking of
research objects and their metadata creates a semantically linked graph.
Adopting semantic technologies addresses issues of interoperability that arise from online portals
featuring spatial data in various standards and formats. In particular, relationships between research
publications and associated data can be captured through RDF subject-predicate-object triples, which
bridge gaps between data and metadata, as well as differing metadata content standards.
3. Methods
Publicly available research objects, namely researcher datasets and researcher publications, drive the data
discovery mechanism developed in this research. The design and evaluation of a linked data model is
informed by user personas, which structure the relationship between published research and associated
data. The extensible triple model developed in this work allows for future expansion of the vocabulary.
3.1. User personas
The current designs of most access systems do not support the spatial integration of research object
collections across various domains. Adopting the personas of domain scientists and considering the types
of data that each might search for or contribute, along with their motivations for doing so, informed the
design specifications of our system. The UCSB Esri Open Data instance contains collections of test data
that span research domains, data formats, and user needs. The current three exemplary data collections
represent a small but diverse range of disciplines, from archaeology to political science, and diverse
formats, including shapefiles, imagery, text documents, external repositories, and map services.
Table 1: Personas, domains, and datasets of researchers currently discoverable through UCSB Open Data
The first collection corresponds to Anabel Ford’s Maya Forest GIS and was obtained from a CD archive
(Ford 1995). The data include shapefiles and imagery complete with full ISO compliant metadata created
by UCSB Library staff. The second collection comes from a meta-analysis conducted by Benjamin
Halpern, a UCSB ecologist. His collection of sampling sites has a global extent and is hosted in an
external repository, a practice typical of UCSB researchers in the life sciences for disseminating research
(Halpern et al. 2009). While these spatial data are open-access and publicly shared, they are not
currently discoverable through a search of UCSB Library holdings. The third data collection comes from
Thomas Patterson, a political scientist at Stanford University, and represents world boundaries of disputed
areas. The data are part of the broader Natural Earth collection, currently discoverable through the UCSB
Library, but not yet formally associated with Patterson’s research publications (Patterson 2009).
The personas cover various data sharing scenarios. Anabel Ford has locally hosted resources that
she intends to share with a global public audience through open access. Benjamin Halpern is from a
domain that favors data distribution through a repository external to UCSB. Thomas Patterson is from
another institution and has spatially relevant contents that might interest Ford, Halpern, or other scientists
at UCSB or anywhere else. The datasets share a spatial overlap that would not otherwise be obvious. For
instance, Patterson’s contested borders dataset is a feature collection with a global extent, yet intersects
with Ford’s and Halpern’s regions of research. The potential to expose the spatial complementarity of
resources would go unrecognized without the assignment of spatial footprints to these objects. Users can
then benefit from discovering useful and seemingly unrelated datasets or publications from unfamiliar
domains by exploring the spatial relations of the research objects.
Taken together, these exemplary researcher personas and associated datasets provide a foundation
for several competency questions that capture the kinds of queries that users may want to construct:
Find datasets referenced by a particular publication.
Find publications that have a particular dataset associated with them.
Find research objects that overlap with a particular spatial extent.
Performing such queries is frequently relevant to a resource discovery process, but relationships between
research data, the publications that reference them, and the locational extents that they cover are not
currently exposed in the metadata. The onus of relating publications with datasets, as well as relating both
the publications and datasets with location, is currently placed on the end-user. The linked data
relationship between research publications and data, taken together with the spatial extent of the dataset
represented in Open Data, addresses the types of thematic and spatial queries that users would currently
like to ask of a library catalog but cannot.
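The three competency questions can be illustrated with a minimal Python sketch over an in-memory set of triples and bounding-box footprints. This is not the system described in this article: the URIs, the footprint coordinates, and the function names are hypothetical placeholders chosen for illustration, and footprints are simplified to longitude/latitude bounding boxes.

```python
from typing import NamedTuple

# Dublin Core terms predicates used in the article's data model.
REFERENCES = "http://purl.org/dc/terms/references"
IS_REFERENCED_BY = "http://purl.org/dc/terms/isReferencedBy"


class BBox(NamedTuple):
    """Spatial footprint as a longitude/latitude bounding box (degrees)."""
    west: float
    south: float
    east: float
    north: float


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two boxes share any area or edge (no antimeridian handling)."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Hypothetical URIs and illustrative footprints, not the project's real values.
REPORT = "https://example.org/publications/field-report"
SITES = "https://example.org/datasets/archaeological-sites"
BORDERS = "https://example.org/datasets/disputed-boundaries"

TRIPLES = [
    (REPORT, REFERENCES, SITES),
    (SITES, IS_REFERENCED_BY, REPORT),
]
FOOTPRINTS = {
    SITES: BBox(-90.0, 16.0, -88.0, 18.0),
    BORDERS: BBox(-180.0, -90.0, 180.0, 90.0),
}


def datasets_referenced_by(publication: str) -> list[str]:
    """Question 1: datasets referenced by a particular publication."""
    return [o for s, p, o in TRIPLES if s == publication and p == REFERENCES]


def publications_referencing(dataset: str) -> list[str]:
    """Question 2: publications associated with a particular dataset."""
    return [o for s, p, o in TRIPLES if s == dataset and p == IS_REFERENCED_BY]


def objects_overlapping(extent: BBox) -> list[str]:
    """Question 3: research objects whose footprint meets a spatial extent."""
    return sorted(uri for uri, box in FOOTPRINTS.items() if overlaps(box, extent))


print(datasets_referenced_by(REPORT))
print(publications_referencing(SITES))
print(objects_overlapping(BBox(-89.5, 17.0, -89.0, 17.5)))
```

In this sketch, a dataset with a global footprint is returned for any query extent, which mirrors how a global feature collection can surface next to a regional one during spatial search.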
3.2. Experimental design
The purpose of using linked data in our approach is to formalize relationships between data hosted
through Esri Open Data or any other spatial repository, and publications hosted anywhere. The linked
data publishing pattern followed in this research generates linked data from static structured data in the
manner of Heath and Bizer (2011). Static input data, in the form of spatial and non-spatial contents, are
published as services, and a triplestore is generated to reference the URIs of the data services and
associated publications, using OpenRefine and its RDF extension. The stepwise procedure is summarized
as follows:
1. Data hosting: Spatial and non-spatial research data are published to a local server and shared via
ArcGIS Online as image or feature services, which are shared with the UCSB Open Data group
by a system administrator and are made publicly referenceable through Open Data source URIs.
2. URIs: Identifiers for corresponding publications and dataset services referenced by Open Data
content are retrieved from open access document repositories or publisher pages.
3. Vocabularies: OpenRefine with its RDF extension generates a graph using the identifiers of
publications and research object relationships defined by Dublin Core predicates.
4. Reconciliation: The graph is referenced against Library of Congress Subject Headings to enrich
users’ ability to explore and discover thematically linked content.
5. Implementation: Publication-data relationships are serialized as triples that can be queried using
the SPARQL Protocol and RDF Query Language.
Because the data described in the previous section are hosted as web services, they are easily referenced
through their URIs. Researchers at UCSB can currently share their spatial and non-spatial research data
through the institutional instance of ArcGIS Online. Any content currently available through this platform
can be migrated into Open Data by changing system permissions. A small subset of data is currently
hosted for this research, but by hosting data directly on ArcGIS Online and by connecting additional
external resources associated with UCSB researchers to UCSB Open Data, we hope to expand content.
The URIs of research objects correspond to either a data layer or a publication. Datasets that
share a common base name are parts of collections and are indicated by a URI container. Data creators have
only partial control over assignment of the URI resource domain name, which, as a best practice, should
be self-descriptive and human readable (Heath and Bizer 2011).
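One possible reading of the URI container convention is that layers of the same collection share a parent path. The Python sketch below groups dataset URIs on that assumption. The URIs and the function names are hypothetical, and the actual Open Data URI layout may differ.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def container_of(uri: str) -> str:
    """Return the parent ("container") URI of a layer URI.

    Assumes the last path segment names the layer and the preceding
    path names the collection; query strings and fragments are dropped.
    """
    parts = urlsplit(uri)
    parent = posixpath.dirname(parts.path.rstrip("/"))
    return urlunsplit((parts.scheme, parts.netloc, parent.rstrip("/") + "/", "", ""))


def group_by_collection(uris: list[str]) -> dict[str, list[str]]:
    """Group layer URIs by their shared container URI."""
    groups: dict[str, list[str]] = defaultdict(list)
    for uri in uris:
        groups[container_of(uri)].append(uri)
    return dict(groups)


# Hypothetical layer URIs for illustration only.
layers = [
    "https://example.org/datasets/maya-forest/sites",
    "https://example.org/datasets/maya-forest/imagery",
    "https://example.org/datasets/boundaries/disputed-areas",
]
for container, members in group_by_collection(layers).items():
    print(container, len(members))
```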
When selecting a technology for RDF creation, it was important to consider the provided data
formats, mechanisms of access and desired output. While initial stages of this research tested the
Callimachus linked data application builder, a locally hosted triplestore was deemed to be inefficient and
limiting. Several other RDF converters and services were considered, but many of these tools perform
script-based extraction, transformation and loading from web pages. Semi-automatic RDF creation, rather
than script-based extraction, is better suited to our purposes.
The nature of the data and the questions asked about the data determine the choice of vocabulary.
Using predicates from existing vocabularies increases data interoperability and reuse. Other datasets and
applications that use shared vocabularies can also be more readily cross-linked without additional
processing, increasing their discoverability (Heath and Bizer 2011). The Dublin Core Metadata Initiative
(DCMI) vocabulary is widely used and is well maintained with dereferenceable URIs that point to a
retrieval protocol. These factors motivated the decision to use DCMI instead of specialized vocabularies,
which are typically less stable. DCMI metadata elements define general attributes such as title and
subject. Our data model does not rely on metadata standards, but rather on the two simple associative
relationships, isReferencedBy and references, defined in the Dublin Core ontology16 shown in Figure 3.
Figure 3—Generic Dublin Core Metadata Initiative (DCMI) data model
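The two associative relationships can be sketched in Python as a pair of inverse triples serialized in Turtle. The URIs in the usage lines are hypothetical placeholders, and the helper names link_triples and to_turtle are introduced here for illustration only. OpenRefine's RDF extension produces its export through its own interface, not through this code.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def link_triples(publication_uri: str, dataset_uri: str) -> list[tuple[str, str, str]]:
    """Return the two inverse triples linking a publication and a dataset."""
    return [
        (publication_uri, "dcterms:references", dataset_uri),
        (dataset_uri, "dcterms:isReferencedBy", publication_uri),
    ]


def to_turtle(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples as Turtle.

    Assumes subjects and objects are absolute URIs without characters
    that would need escaping, and predicates are dcterms-prefixed names.
    """
    lines = [f"@prefix dcterms: <{DCTERMS}> .", ""]
    lines += [f"<{s}> {p} <{o}> ." for s, p, o in triples]
    return "\n".join(lines) + "\n"


# Hypothetical URIs for illustration only.
publication = "https://example.org/publications/field-report"
dataset = "https://example.org/datasets/archaeological-sites"
print(to_turtle(link_triples(publication, dataset)))
```

Storing both directions explicitly lets a query start from either the publication or the dataset without relying on inference over inverse properties.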
One of the motivations for producing linked data is to forge associations with other data sets,
which is a step achieved during the reconciliation process. URIs of the research objects can be interlinked
with Library of Congress authority files17 and even extended to link with other contextually relevant
ontologies, such as Wikipedia’s knowledge graph DBPedia18, by referencing the SPARQL endpoints.
These links enable exploration of other works associated with authors and datasets.
Once the linked data model has been applied to the publications and dataset URIs, OpenRefine
generates an RDF skeleton. The interface allows users to preview the RDF schema and manually edit
nodes in the graph. Once the structure is formalized, it is possible to export the data to a variety of
formats, such as RDF/XML or Turtle, depending on the intended use, as shown in Figure 4.
Figure 4: Reconciled OpenRefine template (above) and RDF skeleton (below)
We used OpenRefine with its RDF extension to implement our simple linked data model.
OpenRefine generates a static profile triplestore, which is an internally hosted RDFa document that
references the URIs assigned to the publication and research data. These static files can then be uploaded
to a web server, offering users a web-accessible interface that supports queries.
In OpenRefine, a class is a set of RDF resources that use the same templates. Classes such as
publications and data are defined as instances. A new Publications class template uses an RDFa
serialization, embedding RDF as triples in HTML documents and encoding the semantic properties and
relationships captured in Figure 5.
Figure 5: RDF triples for datasets and publications exported in Turtle syntax
The triplestore can be queried using SPARQL. A SPARQL endpoint is a web service to which
queries against a triplestore can be submitted over the SPARQL protocol (Powell 2014). User queries pertaining to datasets
referenced within a publication or publications that utilize a particular dataset can be formulated in this
way. General queries across all relationships as well as between specific publications or datasets can then
be generated. A SPARQL query for all publications that reference datasets can be formulated against the
triples, as shown in Figure 6.
Figure 6: A generic SPARQL query against the triples
In this query, a user requests the attributes of data associated with publications, which are then
optionally filtered by matching author name and sorted by title. By formalizing the relationships between
subjects and objects through the use of DCMI prefixes during the data production phase, it is possible to
map the relationships between research publications and datasets. In this example, matching triples for all
publications referencing datasets produced by Dr. Anabel Ford are returned, sorted by title. Additional
queries could be constructed using any combination of predefined attributes and predicates.
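The logic of such a query can be sketched without a SPARQL engine. The example below is a pure-Python stand-in, not the query shown in Figure 6: it matches triple patterns, optionally filters by the creator of the referenced dataset, and sorts by publication title. The sample graph, the compact `ex:` identifiers, the author names, and the use of the `creator` and `title` terms are all assumptions made for illustration.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical sample graph; identifiers and literal values are placeholders.
TRIPLES = [
    ("ex:pub1", DCTERMS + "references", "ex:data1"),
    ("ex:pub1", DCTERMS + "title", "Report B"),
    ("ex:pub2", DCTERMS + "references", "ex:data1"),
    ("ex:pub2", DCTERMS + "title", "Report A"),
    ("ex:pub3", DCTERMS + "references", "ex:data2"),
    ("ex:pub3", DCTERMS + "title", "Report C"),
    ("ex:data1", DCTERMS + "creator", "Author One"),
    ("ex:data2", DCTERMS + "creator", "Author Two"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def value(triples, subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    found = match(triples, subject=subject, predicate=predicate)
    return found[0][2] if found else None


def publications_referencing(triples, creator=None):
    """List (title, publication, dataset) rows, optionally filtered by dataset creator."""
    rows = []
    for publication, _, dataset in match(triples, predicate=DCTERMS + "references"):
        if creator is not None and value(triples, dataset, DCTERMS + "creator") != creator:
            continue
        rows.append((value(triples, publication, DCTERMS + "title"), publication, dataset))
    # Sort by title; publications without a title sort first.
    return sorted(rows, key=lambda row: row[0] or "")


rows = publications_referencing(TRIPLES, creator="Author One")
for title, publication, dataset in rows:
    print(title, publication, dataset)
```

Swapping the predicate in the first pattern for `isReferencedBy` gives the reverse lookup, from a dataset to the publications that cite it.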
4. Results
The research datasets tested in the model included a Maya Forest GIS layer featuring archaeological sites
on UCSB Open Data as the object and a published report from the researcher on the 2000 Field Season19
as the subject (Ford and Wernecke 2000), which are illustrated in Table 2. Queries for datasets associated
with a particular publication use the DCMI predicate references to point users to linked datasets hosted
through UCSB Open Data. Conversely, users can query for publications associated with datasets through
the predicate isReferencedBy, which points back to objects in their respective repositories using URIs.
Table 2: Example of a triple stored in the RDF framework
The two parameters defined within the OpenRefine template include a publication URI resource,
which is provided by the user, and a data URI resource, which in the case of the sample data comes from
Esri Open Data. The relationship between these entities is manually defined. The template references the
DCMI vocabulary and makes assignments to each resource based on the user asserted relationship.
OpenRefine with RDF extension offers a flexible template that can easily be extended to include
additional prefixes and connect the research objects to other collections.
Once linked data are generated from the research objects, publications and datasets are
discoverable from their URIs. Users can spatially browse for datasets through the UCSB Open Data
instance and discover linked datasets based on the associated attributes formalized in the data model.
Importantly, this process enables the spatial discovery of research publications associated with spatial
datasets, which are not traditionally conceptualized as objects with footprints. Retrieving datasets from
publications is also possible through the linked data model, as pointers to the hosted data can be exposed
during a search on an external repository.
We have deliberately avoided developing a complex model of authorial relationships between
data and publications. With a data model for generating simple associative triples in place, scaling up the
number of resources referenced in the system from our current small set will be possible. User testing to
ensure that the data model is adequate, together with a critical mass of datasets and associated
publications, will eventually result in a cross-disciplinary discovery resource.
5. Discussion and Conclusions
This article presents a first step in establishing a linked data discovery mechanism that prioritizes stability
and supports the discoverability and reusability of research data with spatial references, whether these are
in the data themselves or just metadata. It demonstrates how academic libraries can spatially enable the
discovery of research objects across disciplines and systems. Formalized relationships between
publications and researcher-generated data expose the interplay between researchers and the data that they
use or produce. Linking research data, hosted for example through Esri Open Data, to publications, such
as those accessible through the UCSB Alexandria Digital Research Library repository, adds value to both
sets of research objects. By creating links through the use of linked data predicates taken from Dublin
Core, library users are led from publications to data and back, leveraging the spatial search in Esri Open
Data on a much broader scale. Making an increasing amount of content compatible through a linked data
model will make more library holdings discoverable through a spatial search interface.
5.1. Limitations
Current institutional policies support research sharing through open-access licensing, yet incentives and
formal channels for sharing currently exist only for publications, not necessarily for associated datasets
(University of California Regents 2014). Therefore, in order to lower hurdles to participation, the system
described here opts to give researchers full control over what data they want to make available and how.
Another open issue is the long-term maintenance of such a system. The production of linked data
is currently a manual process undertaken using OpenRefine. Transitioning to a system that automatically
scrapes repositories and generates links may be desirable. The use of semi-automatic RDF creation in this
research enabled reconciliation of resources through a graphical user interface, yet this required manual
effort. The process could be expedited through the use of server-side tools like Apache Jena20 to automate
the workflow by running periodic scrapes and generating triplestores from URIs.
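A batch step of this kind might look like the sketch below, which turns a list of publication and dataset URI pairs into N-Triples lines and drops duplicates. It does not use Apache Jena and does not scrape any repository: the two-column CSV input, the function name `pairs_to_ntriples`, and the example URIs are assumptions standing in for whatever a harvesting job would deliver.

```python
import csv
import io

DCTERMS = "http://purl.org/dc/terms/"


def pairs_to_ntriples(csv_text):
    """Turn rows of (publication URI, dataset URI) into N-Triples lines, without duplicates."""
    lines = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        publication, dataset = (cell.strip() for cell in row)
        if not publication or not dataset:
            continue
        for s, p, o in (
            (publication, DCTERMS + "references", dataset),
            (dataset, DCTERMS + "isReferencedBy", publication),
        ):
            if (s, p, o) not in seen:
                seen.add((s, p, o))
                lines.append("<%s> <%s> <%s> ." % (s, p, o))
    return lines


# Hypothetical harvest result: the second row repeats the first.
SAMPLE = """https://example.org/pub/1,https://example.org/data/1
https://example.org/pub/1,https://example.org/data/1
https://example.org/pub/2,https://example.org/data/1
"""

lines = pairs_to_ntriples(SAMPLE)
print("\n".join(lines))
```

Because the output is deduplicated, rerunning the step over an unchanged input yields the same set of statements.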
5.2. Next steps
Metadata for objects in ADRL recently became available as RDF triples, which can be accessed through a
dedicated API. UCSB Library staff harvest metadata in ADRL from the MARC metadata in the library
catalog. Aligning this collection with the triplestores for research objects currently generated in
OpenRefine can increase the amount of campus resources accessible as linked data, expanding the
university’s knowledge graph. Linking these systems through common vocabularies could increase
awareness of research efforts across domains and increase discoverability of curated research objects.
The ADRL efforts will also result in the eventual contribution of name records for all new
electronic thesis and dissertation authors to Library of Congress Name Authority Files, which are
referenced by libraries as a controlled vocabulary for bibliographic records. Name records will be
available through the Library of Congress Linked Data service as URIs. UCSB Open Data and the ADRL
content now available as linked data could readily reference these authority headings (Maali 2011).
Different linked data sets do not necessarily have to share a single schema, yet their structure
allows them to support cooperation without a need to coordinate. Adopting a linked data approach that
defines the relationship between objects regardless of their format, location, or metadata schema
expands the scope of content discovery beyond that which any single system can offer. By extension, this
expands discovery beyond an individual campus to the broader research community.
6. Acknowledgements
We would like to thank the UCSB Library, UCSB Center for Spatial Studies, and Esri Inc. as well as
Anabel Ford and Ben Halpern for supporting this research project. The research reported here has been
partially supported by an anonymous private donor.
7. References
Addison, A., Moore, J., and Hudson-Vitale, C. (2015). Forging partnerships: Foundations of
geospatial data stewardship. Journal of Map and Geography Libraries 11(3): 359–375.
Athanasis, N., Kalabokidis, K., Vaitis, M., and Soulakellis, N. (2009). Towards a semantics-based
approach in the development of geographic portals. Computers and Geosciences 35(2): 301–308.
Avram, H. D. (2003). Machine-readable cataloging (MARC) program. Encyclopedia of library and
information science 3: 1712.
Baierer, K., Dröge, E., Trkulja, V., Petras, V. (2014). Linked Data Mapping Cultures: An
Evaluation of Metadata Usage and Distribution in a Linked Data Environment. In Proceedings of
the International Conference on Dublin Core and Metadata Applications.
Bechhofer, S., De Roure, D., Gamble, M., Goble, C., & Buchan, I. (2010). Research objects:
Towards exchange and reuse of digital knowledge. The Future of the Web for Collaborative
Brewer, G. D. (2015). The challenges of interdisciplinarity. Policy Sciences 32(4): 327–337.
Buckland, M. K. (1997). What is a document?. Journal of the American Society for Information
Science (1986-1998), 48(9), 804.
Cooley, S., Lafia, S., Medrano, A., Stephens, D., and Kuhn, W. (2015) Spatial Discovery Expert
Meeting Final Report. Center for Spatial Studies, University of California, Santa Barbara Library.
Cragin, M., Palmer, C., Carlson, J., and Witt, M. (2010). Data sharing, small science and
institutional repositories. Philosophical Transactions of the Royal Society 4023–4038.
Durante, K., and Hardy, D. (2015). Discovery, management, and preservation of geospatial data
using hydra. Journal of Map and Geography Libraries 11(2): 123–154.
Durante, K., Weimer, K. H. and McGee, M. (2016) “Linked Open Data Modeling for Library
Cartographic Resources.” Presentation at the Annual Association of American Geographers
Conference, San Francisco, CA, March 29.
Ford, A., Wernecke, C. (2000). Assessing the Situation at El Pilar: Chronology, Survey,
Conservation, and Management Planning for the 21st Century. MesoAmerican Research Center.
UC Santa Barbara: MesoAmerican Research Center. eScholarship:
Ford, A. (1995) Archaeological Sites Maya Forest GIS. WWW document,
Freeston, M. (2004). The Alexandria Digital Library and the Alexandria Digital Earth prototype. In
Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries – JCDL 2004, p. 410.
New York, New York, USA: ACM Press.
Golding, C. (2009). Integrating the disciplines: Successful interdisciplinary subjects. Centre for the
Study of Higher Education, University of Melbourne.
Goodchild, M. F. (2004). The Alexandria Digital Library Project. D-Lib Magazine, pp. 1–8.
Halpern, B. Lester, S. and Grorud-Colvert, K. (2009). PISCO: Partnership for Interdisciplinary
Studies of Coastal Oceans. Science of Marine Reserves: Meta-analysis: Global synthesis. KNB
Data Repository.
Hardy, D., and Durante, K. (2014). A Metadata Schema for Geospatial Resource Discovery Use
Cases. Code4lib Journal 25.
Harnad, S., Brody, T., Vallieres, F., Carr, L., Hitchcock, S., Gingras, Y., Oppenheim, C., Hajjem,
C., and Hilf, E. (March 2008). The Access/impact Problem and the Green and Gold Roads to Open
Access: An Update. Serials Review 34 (1): 36–40. doi:10.1016/j.serrev.2007.12.005.
Heath, T., and Bizer, C. (2011). Linked Data: Evolving the Web into a Global Data Space.
Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan and Claypool.
Hey, T., Tansley, S., and Tolle, K., eds. (2009). The fourth paradigm: Data-intensive scientific
discovery. Redmond, WA: Microsoft Research.
Hu, Y., Janowicz, K., Prasad, S. and Gao, S. (2015-1), Metadata Topic Harmonization and
Semantic Search for Linked-Data-Driven Geoportals: A Case Study Using ArcGIS Online.
Transactions in GIS, 19: 398–416. doi: 10.1111/tgis.12151
Hu, Y., Janowicz, K., Prasad, S., and Gao, S. (2015-2). Enabling Semantic Search and Knowledge
Discovery for ArcGIS Online: A Linked-Data-Driven Approach. In AGILE, pp. 1–16.
Keßler C, Janowicz, K., and Kauppinen, T. (2012). Exploring the research field of GIScience with
linked data. In Xiao N, Kwan M-P, Goodchild M F, and Shekhar S (eds) Geographic Information
Science: Seventh International Conference, GIScience 2012, Columbus, OH. September 18–21,
2012, Proceedings. Berlin, Springer Lecture Notes in Computer Science 7478: 102–115.
Maali, F., Cyganiak, R., & Peristeras, V. (2011). Re-using Cool URIs: Entity reconciliation against
LOD hubs. CEUR Workshop Proceedings, 813.
MacMillan, D. (2014). Data Sharing and Discovery: What Librarians Need to Know. The Journal
of Academic Librarianship Vol. 40 (5): 541–549.
Mayernik, M. S. (2015). Research data and metadata curation as institutional issues. Journal of the
Association for Information Science and Technology,
Mobility, measured: America is no less socially mobile than it was a generation ago (2014,
February 1). The Economist. Retrieved from
Padilla, T. (2016). Humanities Data in the Library: Integrity, Form, Access. D-Lib Magazine, 22
(3/4), 1–12.
Patterson, T., Kelso, N. V., & North American Cartographic Information Society. (2009). Natural
earth. WWW document
Powell, J. (2014). A Librarian's Guide to Graphs, Data and the Semantic Web. Chandos
Publishing. Oxford.
Regents of the University of California. “UC Open Access Policies – Office of Scholarly
Communication.” Accessed January 12, 2016.
Scaramozzino, J., White, R., Essic, J., Fullington, L. A., Mistry, H., Henley, A., and Olivares, M.
(2014). Map Room to Data and GIS Services: Five University Libraries Evolving to Meet Campus
Needs and Changing Technologies. Journal of Map and Geography Libraries 10 (1): 6–47.
Scheider, S., Degbelo, A., Kuhn, W., and Przibytzin, H. (2014). Content and context – How linked
spatio-temporal data enables novel information services for libraries. GIS.Science (4): 138–149.
Szukalski, B. (2015, June 17) "ArcGIS Online Demo: A Very Spatial Update." Spatial Discovery
Expert Meeting. Upham Hotel, Santa Barbara, California. Lecture.
Tsang, Daniel C. (2015). Academic Librarians & Open Access of Data: Challenges &
Opportunities in Research Data Management. UC Irvine: UC Irvine Libraries.
van Hooland, Seth and Verborgh, R. (2014). Linked Data for Libraries, Archives and Museums:
How to clean, link and publish your metadata. Facet.
8. Tables
Table 1: Personas, domains, and datasets of researchers currently discoverable through UCSB Open Data
[Table body not recoverable from the source text.]
Table 2: Example of a triple stored in the RDF framework
Subject | Predicate | Object
Archaeological Sites Maya Forest GIS | isReferencedBy | Assessing the Situation at El Pilar
Assessing the Situation at El Pilar | references | Archaeological Sites Maya Forest GIS
9. List of table numbers and captions
Table 1: Personas, domains, and datasets of researchers currently discoverable through UCSB Open Data
Table 2: Example of a triple stored in the RDF framework
10. List of illustration numbers and captions
Figure 1: Project vision for data discovery and publication integration across domains
Figure 2: UCSB’s Open Data instance27 leverages ArcGIS Online
Figure 3: Generic Dublin Core Metadata Initiative (DCMI) data model
Figure 4: Reconciled OpenRefine template (above) and RDF skeleton (below)
Figure 5: RDF triples for datasets and publications exported in Turtle syntax
Figure 6: A generic SPARQL query against the triples
... In the last few decades, demand and production of spatial data such as maps, images, and other geographically referenced data have been increasing continuously among different government and private agencies (Klinkenberg 2003). For efficacious discovery and access of spatial data, it is necessary to develop metadata documentation (Brovelli et al. 2019;Lafia et al. 2016;Wohner et al. 2019) which will also promote the sharing of spatial data between organizations (Greenberg 2005;Ma 2006;Foresman 2008). Metadata refers to "data about data" (Vaduva and Dittrich 2001) and the metadata related to spatial data contains information like projection, scale, resolution, reliability, data provider, and distribution policy (Leyk et al. 2019). ...
Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, and users that primarily function for the production and sharing of large geospatial data. Metadata refers to “data about data” and the metadata related to spatial data contains information like projection, scale, resolution, reliability, data provider, and distribution policy. Metadata is a key component of National Spatial Data Infrastructure (NSDI) of all nations as this aims to provide data discovery and access. The metadata standard of Indian NSDI is based upon standards taken from existing international standards like FGDC, ANZLIC, Dublin Core, and CSDGM. The drawbacks faced by Indian NSDI include duplication of metadata elements; non-availability of automatic metadata generation capability and metadata standard template; duplication of effort and time in the generation of metadata over the data server and catalogue server separately; and catalogue repository updates in future. To address challenges faced in Indian NSDI and the existing research gap, the present study is undertaken to develop novel metadata standard and framework for modifying and generation of metadata elements automatically using Open Source Software (OSS) for Indian NSDI. The proposed Indian NSDI metadata standard is more efficient as well as the modified metadata elements are more compatible with the other metadata standards. The developed framework for automatic metadata generation is based on the principle of three tier client–server architecture. These concepts are then implemented efficaciously for developing a prototype model of public health SDI for Prayagraj city (acronym GeoMeta4Health) using OSS. The functionality of GeoMeta4Health will help in the search, discovery, access, and visualization of geospatial data and metadata. This research will help geospatial communities not only to generate metadata automatically but also to expand and exchange their geospatial data in a wider domain.
... The ability to automatically extract and spatially represent this geographic information would enable researchers to organize and find information using not just keywords but also spatial criteria, as is done for other types of text using Geographic Information Retrieval (GIR) techniques [10]. Organizing and visualizing scientific corpora by space would facilitate geographically-aware meta-analyses [11], enable studies to be cross-referenced by location [12,13], and allow for the discovery of geographical research gaps such as understudied regions in a particular scientific discipline [14,15]. Though scientific articles have become a frequent object of study for researchers, common research objectives are to analyze and visualize (often large) article collections [16][17][18], and to extract or summarize specific information from publications through text mining, usually in a particular domain such as biomedical research [19,20]. ...
Full-text available
Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.
... However, the growth of the spatial humanities, the spatial turn in digital humanities and the history of sciences, has led to efforts to create large-scale databases of geographic references in historical documents [18,54,23]. Furthermore, spatial search has been proposed as a way of organizing and discovering scientific research objects [28]. A significant body of research has also focused on modeling locations with web text data, especially shorter microblog text, and ranking locations given a query [45,53,27,3]. ...
Full-text available
Domain-based learning and research are important applications driving the development of exploratory search systems. A wealth of historical information about events from around the world resides within documents on the web, yet contemporary search engines do not take advantage of the closely integrated temporal and spatial information found within these web pages for indexing and design of search user interfaces. This gap limits the use of the web as a resource for historical and geohistorical information seeking. In this paper we propose chronotopic information interaction as a new interaction concept for web search that explicitly links temporal and spatial entities to keywords using a space-time grid index and a paired search user interface. The space-time grid index allows different modes of interaction between spatial, temporal, and keyword-based views in the search user interface. We demonstrate use of the space-time grid index and chronotopic information interaction concept with the development of Pteraform, a prototype of a search engine that enables users to explore information in the English version of Wikipedia through a geo-historical lens.
... While libraries have long been the traditional brokers of knowledge, today's queries are largely mediated by commercial digital search engines [12]. Yet, libraries are taking on new roles, facilitating discovery, and often co-production, of knowledge [8]. Semantically annotated data can be more easily discovered and retrieved via queries that traverse knowledge graphs, regardless of the endpoints where they are hosted. ...
Conference Paper
Full-text available
We describe a method and system design for improved data discovery in an integrated network of open geospatial data that supports collaborative policy development between governments and local constituents. Metadata about civic data (such as thematic categories, user-generated tags, geo-references, or attribute schemata) primarily rely on technical vocabularies that reflect scientific or organizational hierarchies. By contrast, public consumers of data often search for information using colloquial terminology that does not align with official metadata vocabularies. For example, citizens searching for data about bicycle collisions in an area are unlikely to use the search terms with which organizations like Departments of Transportation describe relevant data. Users may also search with broad terms, such as “traffic safety”, and will then not discover data tagged with narrower official terms, such as “vehicular crash”. This mismatch raises the question of how to bridge the users’ ways of talking and searching with the language of technical metadata. In similar situations, it has been beneficial to augment official metadata with semantic annotations that expand the discoverability and relevance recommendations of data, supporting more inclusive access. Adopting this strategy, we develop a method for automated semantic annotation, which aggregates similar thematic and geographic information. A novelty of our approach is the development and application of a crosscutting base vocabulary that supports the description of geospatial themes. The resulting annotation method is integrated into a novel open access collaboration platform (Esri’s ArcGIS Hub) that supports public dissemination of civic data and is in use by thousands of government agencies. 
Our semantic annotation method improves data discovery for users across organizational repositories and has the potential to facilitate the coordination of community and organizational work, improving the transparency and efficacy of government policies.
Full-text available
This scientific review paper aims at challenging a common point of view on metadata as a necessary evil and something mandatory to the data creating and dataset publishing process. Metadata are instead presented as a crucial element to ensure the findability of data services and repositories. This paper describes a way through four levels of metadata management and publication, from default unstructured data, through schema-based metadata with literal values and/or URIs, towards linked open (meta)data providing explicit linkage between reliable data resources. Such research was conducted within the European Union's project PoliVisu. Special attention is given to the following: (1) guidance on publication aimed at the broad audience of search engine users and (2) the publication of geo (meta)data not only via standard technologies, such as the OGC Catalogue Service for Web and open data portals, but also through leading search engines (that are
Where does one look to study cities around the world? How does a librarian build a collection that moves beyond a limited Western focus to incorporate post-colonial and indigenous experiences? And how can such analysis be automated to allow practitioners at disparate institutions to diversify their own collections? These questions are important as Urban Planning tries to incorporate a variety of practices in human settlement from across the world. Building on previous research related to an Urban Planning book collection, this study uses GIS analysis to address DEI questions on a global scale by highlighting disparities in scholarly focus. By analyzing the geographic subject content of top journal articles in the field of Urban Planning in comparison to books within the library, the study examines ways that a collection can address gaps in analysis of human settlements around the world, especially in the global south. These analyses are then used to guide collection development, building a global focus in the book collection, filling in gaps that may arise from limits in the current journal coverage. Material is analyzed both in the specific collection, but also in the larger scholarly community, comparing the specific gaps in the collection to larger gaps in the scholarship of Urban Planning. In addition to the primary study, this article includes details about using Excel macros for textual analysis of a corpus of metadata, with instructions for how to use these open-source macros to do analysis at a variety of institutions.
Conference Paper
Full-text available
It is challenging for scholars to discover thematically related research in a multidisciplinary setting, such as that of a university library. In this work, we use spatialization techniques to convey the relatedness of research themes without requiring scholars to have specific knowledge of disciplinary search terminology. We approach this task conceptually by revisiting existing spatialization techniques and reframing them in terms of core concepts of spatial information, highlighting their different capacities. To apply our design, we spatialize masters and doctoral theses (two kinds of research objects available through a university library repository) using topic modeling to assign a relatively small number of research topics to the objects. We discuss and implement two distinct spaces for exploration: a field view of research topics and a network view of research objects. We find that each space enables distinct visual perceptions and questions about the relatedness of research themes. A field view enables questions about the distribution of research objects in the topic space, while a network view enables questions about connections between research objects or about their centrality. Our work contributes to spatialization theory a systematic choice of spaces informed by core concepts of spatial information. Its application to the design of library discovery tools offers two distinct and intuitive ways to gain insights into the thematic relatedness of research objects, regardless of the disciplinary terms used to describe them.
Full-text available
The wide expansion of digital technologies has influenced research in all fields of science as well as educational activities. Scientific objective: The purpose of this article is to examine critical areas of academic library activity, in a significant or requiring far-reaching changes in all aspects, in the context of needs of the scientific community. Research methods: It was decided that the method that will allow to outline the situation in this area will be qualitative content analysis texts from leading journals. For this purpose, the main databases of Web of Science articles have been searched: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI), using the instruction TS = (“academic library” OR “academic libraries” OR “university library” OR “university libraries”) AND TS = (scholars or scientists or faculty or researchers or academics). The query limited to the last five years yielded gave as results 170 articles, of which 51 were deemed relevant to the issues discussed. Results and conclusions: In the light of qualitative content analysis of those texts, it is possible to distinguish following areas as important: general approach of scholars and librarians to cooperation, practices of research support, access to information resources adapted to scholars’ needs, data curation support, publication strategies support. On this basis, conclusions have been drawn about the role and type of support that academic libraries may provide in the process of scholarly communication. Cognitive value: This study has contributed to the research into the evaluation of academic library’s support services in the process of scholarly communication.
Current publishing practices in academia tend to result in datasets that are difficult to discover. This is because datasets are not well-integrated across academic domains and they are often not linked to the documents that reference them. For these reasons, discovering datasets across domains can be challenging; for example, discovering archeological observations and biological specimens using the same search is not widely supported, even if both datasets share a similar spatial extent, like Mesoamerica. It is also challenging to retrieve relevant documents that reference datasets; for example, retrieving a series of field reports that reference archeological observations is typically not supported. Our work develops an extensible method for: (1) geographically integrating collections across disciplinary repositories and (2) connecting datasets to related documents. We describe a collection of spatially-referenced researcher datasets, capturing their metadata elements and encoding them as linked open data. We then leverage existing library services to formalize links from datasets to documents. The system described in this work has been deployed, resulting in an experimental open data site for the UCSB campus. Results indicate that this system can be scaled-up with support from an institutional repository in the near future.
Full-text available
Space, time and thematic content are essential dimensions that allow libraries and their users to efficiently describe, search and access information media. The latter include not only documents and traditional media, such as paper maps, but to an increasing extent also scientific data sets, as well as all kinds of metadata describing these documents and data sets, both content-wise and in terms of their provenance. How can libraries be supported in their role as information broker for these diverse media? In this paper, we discuss a number of library services which have been challenging or impossible to realize in the past, especially with respect to linking media with spatio-temporal content descriptions and descriptions of their spatio-temporal accessibility. We argue that linked spatio-temporal data (LSTD) provide a way of realizing these services, in a manner which may substantially broaden the current scope of library information services. We illustrate these services based on examples from the Linked Open Data initiative of the University of Münster (LODUM) and related research. We discuss (based on a variety of illustrative tools) how LSTD is suitable to tackle these challenges.
The research access/impact problem arises because journal articles are not accessible to all of their would-be users; hence, they are losing potential research impact. The solution is to make all articles open access (OA, i.e., accessible online, free for all). OA articles have significantly higher citation impact than non-OA articles. There are two roads to OA: the “golden” road (publish your article in an OA journal) and the “green” road (publish your article in a non-OA journal but also self-archive it in an OA archive). About 10% of journals are gold, but over 90% are already green (i.e., they have given their authors the green light to self-archive); yet only about 10–20% of articles have been self-archived. To reach 100% OA, self-archiving needs to be mandated by researchers’ employers and funders, as they are now increasingly beginning to do.
ArcGIS Online is a unified Web portal designed by the Environmental Systems Research Institute (Esri). It contains a rich collection of Web maps, layers, and services contributed by GIS users throughout the world. The metadata about these GIS resources reside in data silos that can be accessed via a Web API. While this is sufficient for simple syntax-based searches, it does not support more advanced queries, e.g., finding maps based on the semantics of the search terms, or performing customized queries that are not pre-designed in the API. In metadata, titles and descriptions are commonly available attributes which provide important information about the content of the GIS resources. However, such data cannot be easily used since they are in the form of unstructured natural language. To address these difficulties, we combine data-driven techniques with theory-driven approaches to enable semantic search and knowledge discovery for ArcGIS Online. We develop an ontology for ArcGIS Online data, convert the metadata into Linked Data, and enrich the metadata by extracting thematic concepts and geographic entities from titles and descriptions. Based on a human participant experiment, we calibrate a linear regression model for semantic search, and demonstrate flexible queries for knowledge discovery that are not possible in the existing Web API. While this research is based on the ArcGIS Online data, the presented methods can also be applied to other GIS cloud services and data infrastructures.
This highly practical handbook teaches you how to unlock the value of your existing metadata through cleaning, reconciliation, enrichment and linking and how to streamline the process of new metadata creation. Libraries, archives and museums are facing up to the challenge of providing access to fast growing collections whilst managing cuts to budgets. Key to this is the creation, linking and publishing of good quality metadata as Linked Data that will allow their collections to be discovered, accessed and disseminated in a sustainable manner. Metadata experts Seth van Hooland and Ruben Verborgh introduce the key concepts of metadata standards and Linked Data and how they can be practically applied to existing metadata, giving readers the tools and understanding to achieve maximum results with limited resources. Readers will learn how to critically assess and use (semi-)automated methods of managing metadata through hands-on exercises within the book and on the accompanying website. Each chapter is built around a case study from institutions around the world, demonstrating how freely available tools are being successfully used in different metadata contexts. This handbook delivers the necessary conceptual and practical understanding to empower practitioners to make the right decisions when making their organisations' resources accessible on the Web. Key topics include: the value of metadata; metadata creation (architecture, data models and standards); metadata cleaning; metadata reconciliation; metadata enrichment through Linked Data and named-entity recognition; importing and exporting metadata; ensuring a sustainable publishing model.
This will be an invaluable guide for metadata practitioners and researchers within all cultural heritage contexts, from library cataloguers and archivists to museum curatorial staff. It will also be of interest to students and academics within information science and digital humanities fields. IT managers with responsibility for information systems, as well as strategy heads and budget holders, at cultural heritage organisations, will find this a valuable decision-making aid.
Since 1994, the Alexandria Digital Library Project has developed three prototype digital libraries for georeferenced information. This paper describes the most recent of these efforts, a three-tier client-server architecture that relies heavily on a middleware layer to present a single uniform set of interfaces to multiple heterogeneous servers. These standard interfaces, all of which are implemented in HTTP, support session management, collection discovery and evaluation, metadata searching, metadata retrieval, and online holding retrieval. An XML-based metadata encoding scheme and a simple Boolean query language have also been developed. The architecture described by these interfaces has been implemented at UCSB.
Digitally inflected Humanities scholarship and pedagogy are on the rise. Librarians are engaging this activity in part through a range of digital scholarship initiatives. While these engagements bear value, efforts to reshape library collections in light of demand remain nascent. This paper advances principles derived from practice to inform development of collections that can better support data-driven research and pedagogy, examines existing practice in this area for strengths and weaknesses, and extends to consider possible futures.