Chapter

Spatializing and analyzing digital texts: Corpora, GIS, and places

... Unlike ontology construction, which can be fully automated, ontology enrichment typically remains a semi-automated procedure. Several dedicated methods have been described in the literature [2,5,10,11,15,23,40] that require some manual interaction with a domain expert to review, accept, or reject the system's proposals. Ontology enrichment has also drawn on other resources such as BabelNet [40], Wikipedia, and GeoNames [23] to extend the number of interrelations, validate the ontology structure and schema, and distinguish between concepts and instances when necessary. ...
... In the context of Geographic Information Retrieval (GIR), ontologies have been enriched to formalize geographic concepts and have been used for the spatialization [11,15] (including Bruggmann and Fabrikant [10]) and the discovery/exploration of text corpora [17], as well as for semantic search [35]. Another enrichment technique exists, namely ontology learning. ...
Article
The ontology enrichment process is text-based, and the application domain at hand is circumscribed by the content of the related texts. However, the main challenge in ontology enrichment is learning, since there is still a lack of approaches able to achieve automatic enrichment from a textual corpus or dataset covering various topics. In this paper, we describe a new approach for the automatic learning of terminological ontologies from a textual corpus based on probabilistic models. Two topic modeling algorithms, LDA and pLSA, are explored for learning a topic ontology. The objective is to capture word-topic and topic-document semantic relationships as probability distributions in order to build a topic ontology and an ontology graph with minimal human intervention. Experimental analysis on building a topic ontology and retrieving the corresponding topic ontology for a user query demonstrates the effectiveness of the proposed approach.
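The abstract refers to capturing word-topic and topic-document relationships as probability distributions. As a rough, non-authoritative illustration of what pLSA estimates, the following pure-Python sketch fits P(w|z) and P(z|d) to a small document-word count matrix by expectation-maximization. It is not the authors' implementation; the function name `plsa`, the seeded random initialization, and the fixed iteration count are assumptions made for the sketch.

```python
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-word count matrix given as a list of lists.

    Returns (p_z_d, p_w_z) where p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalise(row):
        total = sum(row)
        if total == 0:
            return [1.0 / len(row)] * len(row)  # fall back to a uniform distribution
        return [x / total for x in row]

    # Seeded random initialisation of both conditional distributions.
    p_z_d = [normalise([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalise([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                for z in range(n_topics):
                    exp_w_z[z][w] += n * post[z]
                    exp_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts into new distributions.
        p_w_z = [normalise(row) for row in exp_w_z]
        p_z_d = [normalise(row) for row in exp_z_d]
    return p_z_d, p_w_z
```

For example, `plsa([[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 3, 4], [0, 1, 4, 5]], n_topics=2)` returns one topic mixture per document and one word distribution per topic (the toy counts are invented). pLSA converges to a local optimum, so results depend on the seed; LDA, the other model named in the abstract, adds Dirichlet priors and is not shown here.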
... For these reasons, new methods of collecting intercity relational data are being explored. One promising method is geographic text analysis, which extracts information about places from text through keywords, structure, and content (Gregory, Cooper, Hardie, & Rayson, 2015; Porter, Atkinson, & Gregory, 2015). Methods of this type are of particular interest because of their accessibility and their ability to obtain large-scale intercity relational data. ...
Article
Compared to the burgeoning literature discussing the importance of agglomeration externalities for development, limited attention has been given to network externalities. This is largely due to limited data availability. We propose a general measure to proxy city network externalities based on toponym co-occurrences that indicate the relatedness between cities. This paper extracts intercity relationships based on the co-occurrence of Chinese place names on 2.5 billion webpages. We calculate and map absolute and relative network positions, which we use to explain urban labour productivity. We found that a stronger embeddedness in networks of cities is significantly and positively associated with urban productivity. Smaller cities benefit comparatively more from being well embedded in city networks, suggesting that these relations can compensate for a lack of agglomeration externalities. We also compare the importance for urban performance of city network externalities vis-à-vis agglomeration externalities. City network externalities turn out to be more important in explaining urban performance than agglomeration externalities. This calls for new theorizing on a relational approach to urban and regional development. Rather than stimulating further concentration of urbanization, our findings suggest that fostering relationships between cities is a viable alternative urban development strategy. We conclude with suggestions for a research agenda that delves deeper into city network externalities.
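The paper's data are co-occurrences of Chinese place names on webpages. A minimal sketch of how such a toponym co-occurrence network might be tallied is shown below, assuming each document is a plain string and that a place counts as mentioned when its name appears as a substring. The functions `cooccurrence_counts` and `weighted_degree` are hypothetical helpers, and weighted degree is only one possible proxy for network position, not necessarily the measure the authors used.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, place_names):
    """For each unordered pair of places, count the documents that mention both."""
    pairs = Counter()
    for text in documents:
        # Naive substring test; real pipelines need tokenisation and disambiguation.
        present = sorted({p for p in place_names if p in text})
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

def weighted_degree(pairs):
    """Sum each place's co-occurrence weights: one simple proxy for network position."""
    strength = Counter()
    for (a, b), n in pairs.items():
        strength[a] += n
        strength[b] += n
    return strength
```

With the invented documents `["Beijing and Shanghai signed a deal", "Shanghai to Hangzhou rail", "Beijing, Shanghai, Hangzhou expo"]`, the pair (Beijing, Shanghai) is counted twice and Shanghai has the highest weighted degree (4). Substring matching is especially fragile for Chinese text, where one place name can be contained inside another.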
... It relies on computer-aided analysis of large bodies of text, counting the frequency of words or word sequences and displaying each instance of a word or phrase as a concordance line. A concordance is the output listing every match for a search term within a corpus, showing the keyword together with its co-text (i.e., the words occurring near the search term) (Gregory et al., 2015). In concordance analysis, the researcher examines keywords in relation to their co-text through qualitative analysis to draw conclusions about their occurrences (Anthony, 2013). ...
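A keyword-in-context (KWIC) concordance of the kind described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the behaviour of any particular concordancing tool; the lower-casing, the `\w+` tokenization, and the window of `width` words on either side are assumptions.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left co-text, keyword, right co-text) for every match of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    key = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def print_kwic(lines, pad=30):
    """Print concordance lines with the keyword aligned in a central column."""
    for left, key, right in lines:
        print(f"{left:>{pad}} [{key}] {right}")
```

For the invented sentence "The lake was calm. We walked past the lake towards the fells.", `concordance(..., "lake")` yields `("the", "lake", "was calm we")` and `("walked past the", "lake", "towards the fells")`; note that the co-text window ignores sentence boundaries.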
Article
Climate resilient development is emerging as a global policy strategy that integrates climate adaptation and mitigation into sustainable development decisions. For the Caribbean small island developing state (SIDS) of Antigua and Barbuda, the national government is pursuing climate resilient development through multilateral climate funds to protect economic growth from climate and weather-related disasters. Critical adaptation literature argues that interpreting climate vulnerability through an economic growth lens prioritizes economic solutions over other development concerns, which can further the uneven distribution of climate vulnerability and risk. Despite revealing the consequences of market-based climate actions, research has yet to fully understand the economization of vulnerability, which describes the political techniques that render and reconfigure vulnerability in calculated ways. By tracing the discursive interactions between multilateral climate financial institutions and the Antigua and Barbuda national government, this paper empirically examines how vulnerability is economized through climate resilient development. Findings identify the construction of ‘adaptation economies’ in watershed areas, which are economies that can capitalize upon climate challenges within areas of highest vulnerability through fee-for-climate services. The results illustrate that economic growth rationalities characterize climate vulnerability problematizations, which incentivize solutions that enforce the economic development of areas with the highest disaster impacts. Based on these findings, this study emphasizes a need to critically evaluate national actor efforts to re-organize development under climate financing rationales, and its vulnerability-inducing effects.
... In computing and language analysis, extracting locations from human language has seen several advances and approaches. For example, corpus linguistics is a methodology for studying language using a large, naturally occurring body of text (a corpus) at various levels, including lexis, syntax, semantics, and pragmatics or discourse [8]. Corpus techniques are increasingly exploited across a wide range of areas within linguistics, such as the description of grammar, the analysis of literary style, or the investigation of language change. ...
Conference Paper
This paper envisions a pipeline for automating the generation of augmented reality tours of contested heritage sites while employing a critical approach toward the representation of history. Through the design of a generative pipeline, the paper identifies and discusses the potential and pitfalls associated with extracting spatial features from archival manuscripts and presenting them using an augmented reality application. The paper proposes a number of design approaches that assist in automating the transformation of manuscripts into interactive tours while taking into consideration historical, narrative, and technical challenges.
... This allowed simple quantitative analysis of the pattern of Betty's daily activity, supplemented with qualitative textual data from the diary to provide more detail and context. All data extraction and recording were done manually, as there is no digital record of the diary and, given the varied nature of the entries, it would have been hard to extract relevant information using computer text-analysis software (Gregory et al., 2015a, 2015b). The database records only the number of times that an activity was recorded in the diary (on a day-by-day basis) and does not record duration. ...
Article
Most of the time, everyday life is composed of a variety of mundane activities that go almost unnoticed and unrecorded. Many of these will follow a regular rhythm or routine that may vary over the life course as personal and family circumstances change. They may also change over a weekly or seasonal cycle. Although individually such activities could be viewed as trivial, collectively these routines and rhythms construct the fabric of all societies, economies and communities. Studying everyday life in the past is hard because few sources record mundane activities in their entirety or over a whole life span. In this paper the diaries of one woman who lived in north Lancashire (UK) from 1928 to 2018 are analysed to chart the changing rhythms and routines of everyday activities over her life course. She began writing a diary at the age of 13 and completed a detailed daily account of her activities every year until shortly before her death. By sampling the extensive run of diaries, I identify the ways in which her activities changed over her life course, and how they fluctuated over weekly and seasonal cycles. I identify seven key life-course stages during which her commitments to employment, housework, caring and leisure activities varied in response to her changing circumstances. The paper uses both quantitative and qualitative evidence from the diaries to illustrate a rarely seen aspect of change over the life course, and relates this evidence to theories of everyday life, including Lefebvre’s work on ‘rhythmanalysis’.
... The data that support the findings of this study are available from the corresponding author upon reasonable request. ... (2014), da Silveira (2014), Gregory and Geddes (2014), Bodenhamer, Corrigan, and Harris (2015), Gregory, Cooper, Hardie, and Rayson (2015), Juvan (2015), Juvan and Dokler (2015), Yuan, McIntosh, and Delozier (2015), and Travis and von Lünen (2016). ...
Article
This article examines how GIS can be used as a heuristic tool to reconstruct spatial–temporal events from narratives in order to examine whether a scenario is conceivable within the narrative world. The narrative about Paul's escape from Berea (Acts 17:14–15) is used as a case study. Several interpretive issues related to spatial and temporal questions surround these texts. In the case study, three methods are applied: (a) least-cost path analysis on elevation data to construct journeys and travel times for Roman roads; (b) network analysis to find seafaring routes valid for ancient times; and (c) the integration of spatial and temporal data in a space-time cube. Our main finding is that the method yields insights into the spatial–temporal dynamics of the narrative. This helps a modern reader to better understand the narrative conceivability of a story in the mind of a first-century reader.
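The first of the three methods, least-cost path analysis on elevation data, can be illustrated with Dijkstra's algorithm over a small elevation grid. The abstract does not state its cost function, so this sketch assumes Tobler's hiking function (walking speed 6·exp(-3.5·|slope + 0.05|) km/h) to turn each grid step into a travel time. The grid, the 1 km cell size, and the function names are hypothetical, and the Roman roads and seafaring network used in the study are not modelled.

```python
import heapq
import math

def tobler_speed_kmh(slope):
    """Tobler's hiking function (an assumption here): walking speed in km/h for slope dh/dx."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))

def least_cost_path(elev, start, goal, cell_m=1000.0):
    """Dijkstra over a 4-connected elevation grid (metres); edge cost = walking time in hours."""
    rows, cols = len(elev), len(elev[0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if t > best[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            slope = (elev[nr][nc] - elev[r][c]) / cell_m
            nt = t + (cell_m / 1000.0) / tobler_speed_kmh(slope)
            if nt < best.get((nr, nc), math.inf):
                best[(nr, nc)] = nt
                prev[(nr, nc)] = (r, c)
                heapq.heappush(heap, (nt, (nr, nc)))
    # Walk predecessors back from the goal to recover the route.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], best[goal]
```

On the invented grid `[[0, 0, 0], [0, 900, 0], [0, 0, 0]]`, the route from (0, 0) to (2, 2) goes around the 900 m centre cell, because climbing it would cut walking speed to roughly 0.2 km/h.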
... This research begins with the occurrence, identification and extraction of geography within Llwyd's texts. In the Digital Humanities, by considering references to toponyms in written texts, research teams have used a variety of digital techniques and tools to burrow deep into genres as diverse as fiction, poetry and travelogues, early newspapers and medical reports (Lang 2014; Gregory et al. 2015a; Murrieta-Flores et al. 2015; Gregory and Donaldson 2016; Donaldson et al. 2017; Taylor et al. 2018a; Taylor et al. 2018b; Baker et al. 2019). Advancing from the close reading procedures of manually identifying geographic references within written texts, several projects have employed varying forms of distant reading and mixed-method approaches to combine the more traditional qualitative forms of literary research with digital and quantitative mechanisms (Moretti 2013; Underwood 2017). ...
Article
Digital technologies are rapidly altering the approaches used to analyse and visualise the content of early texts. This is especially evident in the growth and popularity of digital and spatial humanities projects exploring the geographies of historical and literary sources. Despite twenty-first century advances, this research has so far been limited by the common isolation and separation of different mediums of text which often form associated components of an overall narrative. This paper challenges this separation by offering a combined analysis and re-examination of the written and cartographic corpus of the Welsh antiquary, Humphrey Llwyd (c.1527-1568). Llwyd’s outputs are re-evaluated via an innovative fusion of previously disparate avenues of investigation commonly employed in the digital humanities, literary geography, the history of cartography and Geographic Information Systems (GIS). The analyses reveal hitherto ‘hidden’ geographies that Llwyd drew upon in his written and visual chorography of early Britain and Wales, and they expose previously unknown connections between his different forms of chorography. The paper concludes with a recommendation that we think outside of our core skill-set and re-imagine our approach to textual research to provide a more complete and connected view of the layers of geography in early cultural texts.
... A common resource in the annotation of geographical named entities is the use of lists of geographical entities (gazetteers), which provide, in addition to the toponym, complementary information useful for disambiguation and georeferencing (Leidner 2007, p. 51; Southall, Mostern & Berman 2011), particularly geographic coordinates in terms of latitude and longitude. When gazetteers are applied to automatic annotation and a term in the text matches a toponym in the list, we assign it the attribute of geographical entity and retrieve the relevant available information according to the objectives and the problem to be solved: either simple recognition and classification of named entities, or more specific tasks of toponym resolution and geographical analysis (Gregory et al. 2013; 2015). However, even when a specific list is available, simply applying the toponyms produces ambiguities (e.g. ...
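The gazetteer lookup described above can be sketched as a greedy longest-match over word tokens. This minimal sketch assumes a toy gazetteer (the entries and rounded coordinates are approximate and illustrative, not taken from the cited study) and simply flags toponyms with more than one candidate as ambiguous rather than resolving them; exact string matching of this kind is precisely what produces the ambiguities and false positives the passage warns about.

```python
import re

# Toy gazetteer (illustrative entries; coordinates are approximate).
GAZETTEER = {
    "Keswick": [{"type": "town", "lat": 54.60, "lon": -3.13}],
    "Derwent Water": [{"type": "lake", "lat": 54.58, "lon": -3.15}],
    "Grasmere": [
        {"type": "village", "lat": 54.46, "lon": -3.02},
        {"type": "lake", "lat": 54.45, "lon": -3.03},
    ],
}

def annotate(text, gazetteer, max_len=3):
    """Greedy longest-match lookup of multi-word toponyms over word tokens."""
    tokens = re.findall(r"\w+", text)
    i, hits = 0, []
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                entries = gazetteer[candidate]
                hits.append({"toponym": candidate, "candidates": entries,
                             "ambiguous": len(entries) > 1})
                i += n
                break
        else:
            i += 1  # no toponym starts at this token
    return hits
```

On the invented sentence "From Keswick we rowed on Derwent Water, then walked to Grasmere.", the sketch tags Keswick, Derwent Water, and Grasmere, and flags only Grasmere (village or lake) as ambiguous.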
Article
In the automatic annotation of geographical named entities, specialized toponym lists have to deal with ambiguities and with contexts in which the geographic value of an expression is not evident. This article studies the practical case of a toponym index used to create an annotated corpus of Mendes Pinto's Peregrinação. The difficulties encountered are used to classify the types of errors that occur when a toponym is resolved by simple matching of expressions, and they introduce criteria for the identification of geographical entities, a task that must precede, and has a direct impact on, the results obtained in the automatic annotation process.
... Cooper et al. [128] proposed Geographical Text Analysis (GTA) for the spatialization and analysis of digital texts. The information extraction process identifies place names as well as thematic tags representing topics (e.g., education, warfare, and farming), assigned using the UCREL Semantic Analysis System (USAS). ...
Article
The present paper provides a review of two research topics that are central to geospatial semantics: information modeling and elicitation. The first topic deals with the development of ontologies at different levels of generality and formality, tailored to various needs and uses. The second topic involves a set of processes that aim to draw out latent knowledge from unstructured or semi-structured content: semantic-based extraction, enrichment, search, and analysis. These processes focus on eliciting a structured representation of information in various forms such as: semantic metadata, links to ontology concepts, a collection of topics, etc. The paper reviews the progress made over the last five years in these two very active areas of research. It discusses the problems and the challenges faced, highlights the types of semantic information formalized and extracted, as well as the methodologies and tools used, and identifies directions for future research.
... Once the entities have been georeferenced, whether manually or with semi-automatic procedures, we obtain a georeferenced corpus amenable to geographical text analysis (Gregory et al., 2015; Gregory et al., 2016) (Nugteren, 2011: 298). ...
Conference Paper
Work on georeferencing Asian place names mentioned in Fernão Mendes Pinto's Peregrinação. Presented at the 12th Conference of the Associação Internacional de Lusitanistas held in Macau, July 2017. (In Portuguese) In Natural Language Processing (NLP), geographical named entities are treated as part of the Named Entity Recognition and Classification problem when annotating toponyms. Geographical entities also receive special attention in geographical text analysis, particularly in georeferencing, understood as the link between the expression of a toponym and the geographic object, preferably resolved by obtaining coordinates. The importance of both problems, annotation and georeferencing, has led to a growing number of solutions and approaches. In this paper I present the annotation and georeferencing of geographical named entities through the practical case of Fernão Mendes Pinto's Peregrinação.
... The combination of corpus linguistics techniques and GIS is emerging as a new area with applications in the humanities, especially because it allows the geographies of a text to be visualized (Gregory & Baron, 2013; Alves & Queiroz, 2015; Cooper, Gregory, Hardie & Rayson, 2015; DeLozier, Wing, Baldridge & Nesbit, 2016). In a typical geographical analysis procedure (fig. ...
Article
Abstract: There have been various interdisciplinary approaches to reconstructing the routes and locations of Fernão Mendes Pinto's travels in the Peregrinação. Until now, none has been able to georeference the whole geography of the work for all the areas involved. This article describes a further contribution that combines techniques from Natural Language Processing (NLP) and Geographic Information Systems (GIS) to produce a new index of geographical named entities. Based on a comparative analysis of specialized works, mainly from history and historical geography, the index provides an exact georeference for all locations that were previously known and do not contradict Pinto's description. For the remaining locations, an initial model provides a relative georeference: each entity is classified by a physical or administrative geographic type and linked to a holonym through the is_Part_of relation. The resulting taxonomy is processed as an ontology and stored, together with additional data from the corpus analysis, in a relational database. Methods and results are illustrated with examples, the final products are described, and the article concludes that further interdisciplinary analysis is needed to develop the relative georeferences. Keywords: Fernão Mendes Pinto, georeferencing, geographical named entities, historical geography, text mining.
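The relative georeference described in the abstract (an entity linked to a holonym through is_Part_of) can be sketched as a walk up the holonym chain until an entity with known coordinates is found. The taxonomy below is entirely hypothetical, and inheriting the nearest ancestor's coordinates is an assumption made for illustration, not the article's stated procedure.

```python
# Hypothetical toy taxonomy: name -> (geographic type, holonym, exact (lat, lon) or None).
TAXONOMY = {
    "Region A": ("administrative", None, (10.0, 100.0)),
    "Valley B": ("physical", "Region A", None),
    "Town C": ("administrative", "Valley B", None),
    "Port D": ("administrative", "Region A", (11.5, 101.2)),
}

def holonym_chain(name, taxonomy):
    """Follow is_Part_of links upwards; assumes the taxonomy is a tree (no cycles)."""
    chain = []
    while name is not None:
        chain.append(name)
        name = taxonomy[name][1]
    return chain

def georeference(name, taxonomy):
    """Return (coords, source entity, is_exact) from the nearest entity with coordinates."""
    for entity in holonym_chain(name, taxonomy):
        coords = taxonomy[entity][2]
        if coords is not None:
            return coords, entity, entity == name
    return None, None, False
```

Here "Port D" has an exact georeference, while "Town C" only inherits a relative one from "Region A" via "Valley B", which is the kind of entry the article says still needs interdisciplinary work.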
... Another relevant area of development, that we return to below, is the intersection of corpus linguistics and GIS, seen for example in the work of Ian Gregory and others at Lancaster on geographical textual analysis, and especially in the Corpus of Lake District Writing project, which focuses on place names and the identification of geographical features such as waterfalls, woodland, or farms (Gregory, Cooper, Hardie, & Rayson, 2015; Gregory & Donaldson, 2016; Rayson et al., 2017). Similar questions are tackled by Kim, Vasardani, and Winter (2017), who focus on the question of resolving ambiguous place names by exploring their relations to other spatial features, and Song et al. (2017), who developed and tested a method for detecting and extracting vague cognitive regions. ...
Article
In this article we reflect back on our decade‐long collaboration on the geographies of the Holocaust to argue for a GIS of place. Our previous work on ghettoization in Budapest and on the spatio‐temporal patterns of Jewish persecution in Italy had a marked spatial dimension, both in the research questions we set out to answer and the methods we used, which were largely quantitative. During the course of our research, we progressively came to realize that a spatial perspective favors the voice of the perpetrator and that to fully comprehend and understand the geography of the Holocaust, we needed to engage with the voice of the victim, extend the set of methods and tools used, and broaden our epistemology. While proposing a fully‐fledged model of a qualitative GIS of the places and spaces of the Holocaust is beyond the scope of this article, we: (a) argue for the integration of social network analysis, corpus linguistics, and spatio‐temporal methods and for a mixed‐methods analytical approach and (b) note how the topological and relational foundations we identify as fundamental to a GIS of place parallel the long‐standing call for an “integrated history” of the Holocaust.
... In this paper we merge two methodologies that are ordinarily separate, namely Geographical Information Systems (GIS), a technology usually employed to analyse the spatial patterns within quantitative data, and corpus linguistics, a method which is used to analyse large volumes of digital texts but which, to date, has largely ignored geography. By combining these to create a set of techniques called Geographical Text Analysis (GTA) (Author et al., 2015; Murrieta-Flores et al., 2015) we are able to explore the geographies within the Registrar-General's Reports. This is achieved by extracting disease-related keywords and associated place-names from the reports and allows us to examine which diseases the Registrar-General was most interested in, which places he associated with these diseases, and how this changed over time. ...
Article
This paper uses a combination of Geographic Information Systems (GIS) and corpus linguistic analysis to extract and analyse disease related keywords from the Registrar-General's Decennial Supplements. Combined with known mortality figures, this provides, for the first time, a spatial picture of the relationship between the Registrar-General's discussion of disease and deaths in England and Wales in the nineteenth and early twentieth centuries. Techniques such as collocation, density analysis, the Hierarchical Regional Settlement matrix and regression analysis are employed to extract and analyse the data resulting in new insight into the relationship between the Registrar-General's published texts and the changing mortality patterns during this time.
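Collocation, the first technique listed, can be approximated by counting how often place names fall within a fixed window of disease terms. The following sketch is a hypothetical illustration: the window size, the single-token place names, and the example sentence are assumptions, and it reports raw co-occurrence counts rather than any collocation statistic the authors may have used.

```python
import re
from collections import Counter

def collocates(text, node_words, targets, window=5):
    """Count target terms (e.g. place names) within `window` tokens of any node word."""
    tokens = re.findall(r"\w+", text.lower())
    nodes = {w.lower() for w in node_words}
    wanted = {t.lower() for t in targets}
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in nodes:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in wanted:
                    counts[(tok, tokens[j])] += 1
    return counts
```

On the invented text "Cholera was reported in Hull. Fever was common in Leeds." with a window of 4, the sketch also links "fever" to "hull" because the window crosses the sentence boundary; real analyses typically bound windows by sentence and replace raw counts with an association measure.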
Article
The massive amount of user-generated content available today presents a new challenge for the geospatial domain and a great opportunity to delve into linguistic, semantic, and cognitive aspects of geographic information. Ontology-based information extraction is a new, prominent field in which a domain ontology guides the extraction process and the identification of pre-defined concepts, properties, and instances from natural language texts. The paper describes an approach for enriching and populating a geospatial ontology using both a top-down and a bottom-up approach in order to enable semantic information extraction. The top-down approach is applied in order to incorporate knowledge from existing ontologies. The bottom-up approach is applied in order to enrich and populate the geospatial ontology with semantic information (concepts, relations, and instances) extracted from domain-specific web content.
Article
Accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardized into single officially recognized forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geo-taggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitized corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardization of place-name spellings. It suggests how procedural developments may be undertaken to account for such geo-referential issues in the Named Entity Recognition (NER) strategies employed by future projects. 
Similarly, the benefits of such multigenre corpora in helping to complete onomastic records are also shown via examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardization (an aspect not typically accounted for in traditional onomastic study) to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.
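One way to act on documented spelling variation during geotagging is to map a variant to its closest standard form by string similarity. The sketch below uses Python's standard `difflib.get_close_matches`; the standard forms, the variant spellings in the usage example, and the 0.8 cutoff are assumptions for illustration, not findings of the article.

```python
import difflib

def normalise_toponym(variant, standard_forms, cutoff=0.8):
    """Map a variant spelling to the closest standard form, or None if nothing is close enough."""
    lookup = {s.lower(): s for s in standard_forms}  # match case-insensitively
    match = difflib.get_close_matches(variant.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None

STANDARD = ["Keswick", "Grasmere", "Ullswater"]
```

Invented variants such as "Keswicke" and "Grassmere" map to Keswick and Grasmere, while an unrelated name returns None. Similarity matching cannot recover variants that differ substantially in form, which is where the documented onymic records the article argues for would be needed.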
Article
This paper begins with the so-called spatial turn and goes on to examine one of its most recent offshoots: the cartographic turn. After analysing the implications that this turn, particularly its digital aspect, may have on a possible mappability of literature and on the definition of an emerging field like spatial humanities, the paper will discuss the broad disciplinary spectrum of digital humanities and its possible convergences with this cartographic and spatialising trend through the changes experienced by the contemporary textual condition (from the large-scale digitization of texts to the spread of multimedia). The paper also explores the split between an eminently quantitative approach and a qualitative one, within both digital and spatial humanities, when tackling the study of texts, whether they be literary or otherwise. This duality leads to the current debate between defenders and detractors of what Franco Moretti dubbed distant reading, a critical practice that opposes the traditional method of close reading. As the paper attempts to argue, that distant perspective is closely linked to the cartographic turn and does not necessarily involve using exclusively quantitative tools and giving up close reading as a means of accessing texts. In this sense, through the underlying concept of some literary GIS and of the emerging notion of deep or thick mapping, the paper argues for the possibility of a telescopic reading which, as part of the approaches and interests of spatial and digital humanities, combines quantitative and qualitative methods and makes a distant focus (that is, cartographic) compatible with a close reading of texts.
Article
Participatory GIS (PGIS) was born out of the cauldron of the GIS and Society debates and the social theoretic critique of GIS. The form and practice of PGIS continues to reflect its origins. At its core PGIS remains focused on integrating local knowledge that is multivalent, equivocal, and often conflictual within a reductionist GIS technology and extensive Spatial Data Infrastructure. Recent conceptual developments in deep mapping and spatial storytelling have the potential to advance the representation of community knowledge through participatory deep mapping. Deep mapping explicitly recognizes that social life is contingent, implicated, and unpredictable. In representing a critical engagement between Geographic Information Science (GISc) and community knowledge and representation, deep mapping potentially challenges the misalignment in representing community knowledge in GIS and in bending geospatial technologies to the needs of communities.